Wanwei He
Grocery
AI & ML interests
LLM
Recent Activity
upvoted a paper 10 days ago
Learning Ordinal Probabilistic Reward from Preferences
liked a model 20 days ago
Qwen/Qwen3.5-35B-A3B
commented on a paper 7 months ago
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR