Hao Peng

Wesleythu · h-peng17

AI & ML interests

None yet

Recent Activity

updated a collection 4 days ago

updated a model 4 days ago

THU-KEG/WildReward-8B

updated a model 4 days ago

THU-KEG/WildReward-4B

updated a collection 4 days ago

WildReward

Learning Reward Models from In-the-Wild Interactions • 5 items • Updated 4 days ago • 2

updated 2 models 4 days ago

THU-KEG/WildReward-8B

Text Classification • 8B • Updated 4 days ago • 41 • 3

THU-KEG/WildReward-4B

Text Classification • 4B • Updated 4 days ago • 46 • 4

liked a dataset 4 days ago

THU-KEG/WildFB

Updated 4 days ago • 15 • 1

updated a collection 4 days ago

WildReward

Learning Reward Models from In-the-Wild Interactions • 5 items • Updated 4 days ago • 2

updated a dataset 4 days ago

THU-KEG/WildFB

Updated 4 days ago • 15 • 1

published a dataset 4 days ago

THU-KEG/WildFB

Updated 4 days ago • 15 • 1

upvoted a paper 20 days ago

WildReward: Learning Reward Models from In-the-Wild Human Interactions

Paper • 2602.08829 • Published 20 days ago • 3

submitted a paper to Daily Papers 20 days ago

WildReward: Learning Reward Models from In-the-Wild Human Interactions

Paper • 2602.08829 • Published 20 days ago • 3

upvoted a collection 22 days ago

WildReward

Learning Reward Models from In-the-Wild Interactions • 5 items • Updated 4 days ago • 2

liked 2 models 22 days ago

THU-KEG/WildReward-8B

Text Classification • 8B • Updated 4 days ago • 41 • 3

THU-KEG/WildReward-4B

Text Classification • 4B • Updated 4 days ago • 46 • 4

updated a collection 22 days ago

WildReward

Learning Reward Models from In-the-Wild Interactions • 5 items • Updated 4 days ago • 2

upvoted a paper about 2 months ago

Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

Paper • 2601.06021 • Published Jan 9 • 47

published 2 models 2 months ago

THU-KEG/WildReward-8B

Text Classification • 8B • Updated 4 days ago • 41 • 3

THU-KEG/WildReward-4B

Text Classification • 4B • Updated 4 days ago • 46 • 4

upvoted 2 papers 5 months ago

Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models

Paper • 2510.11683 • Published Oct 13, 2025 • 15

DeepPrune: Parallel Scaling without Inter-trace Redundancy

Paper • 2510.08483 • Published Oct 9, 2025 • 24