Siye Wu

Siye01

AI & ML interests

None yet

Recent Activity

updated a model 3 days ago

Siye01/CODA-14B

updated a model 3 days ago

Siye01/CODA-4B

updated a model 3 days ago

Siye01/CODA-8B

Organizations

None yet

updated 3 models 3 days ago

updated a collection 4 days ago

CODA

Collection

3 items • Updated 4 days ago

published 3 models 4 days ago

Siye01/CODA-14B

15B • Updated 3 days ago • 25

Siye01/CODA-4B

4B • Updated 3 days ago • 25

Siye01/CODA-8B

8B • Updated 3 days ago • 35

authored 3 papers 18 days ago

How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?

Paper • 2404.03302 • Published Apr 4, 2024 • 2

From Persona to Personalization: A Survey on Role-Playing Language Agents

Paper • 2404.18231 • Published Apr 28, 2024 • 1

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Paper • 2602.10604 • Published 30 days ago • 189

upvoted a paper 29 days ago

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Paper • 2602.10604 • Published 30 days ago • 189

liked a model about 1 month ago

stepfun-ai/Step-3.5-Flash

Text Generation • 199B • Updated 4 days ago • 103k • 707

upvoted an article 3 months ago

The 4 Things Qwen-3’s Chat Template Teaches Us

Article • Apr 30, 2025

updated a model 8 months ago

Siye01/test_arm_3b

Text Generation • 3B • Updated Jul 30, 2025

published a model 8 months ago

Siye01/test_arm_3b

Text Generation • 3B • Updated Jul 30, 2025

upvoted 2 articles 9 months ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Article • Feb 11, 2025 • 110

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Dec 9, 2022 • 403

upvoted 2 papers 9 months ago

Is Extending Modality The Right Path Towards Omni-Modality?

Paper • 2506.01872 • Published Jun 2, 2025 • 24

ARIA: Training Language Agents with Intention-Driven Reward Aggregation

Paper • 2506.00539 • Published May 31, 2025 • 30
