🔄 In a Training Loop

weiliu

thinkwee

https://thinkwee.top/about/

AI & ML interests

LLM reasoning, agents

Recent Activity

submitted a paper 27 days ago

Large Language Models Hack Rewards, and Society

upvoted a paper 27 days ago

Large Language Models Hack Rewards, and Society

upvoted a paper about 1 month ago

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Organizations

None yet

New activity in thinkwee/DDRBench_10K_trajectory 5 months ago

Add paper link, project page, and code links to dataset card

#2 opened 5 months ago by

New activity in thinkwee/NOVEReason_5k 11 months ago

[bot] Conversion to Parquet

#1 opened 11 months ago by parquet-converter

commented on 3 papers about 1 year ago

NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning

Paper • 2505.16022 • Published May 21, 2025 • 4