Yinxu Pan

cppowboy


https://github.com/Cppowboy

AI & ML interests

RL for LLM, Code & Math Reasoning, Function Calling, Code Interpreter, Vision-Language Pretraining

Recent Activity

liked a dataset 10 days ago

ByteDance-Seed/EdgeBench

liked a dataset 24 days ago

LiberCoders/CLI-Gym

liked a dataset 29 days ago

nvidia/ProCUA-SFT



upvoted 3 papers 29 days ago

ProCUA-SFT Technical Report

Paper • 2606.17321 • Published Jun 15 • 10

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

Paper • 2606.18216 • Published about 1 month ago • 63

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

Paper • 2606.18023 • Published about 1 month ago • 209

upvoted 3 papers about 1 month ago

Agents' Last Exam

Paper • 2606.05405 • Published Jun 3 • 381

MemTrain: Self-Supervised Context Memory Training

Paper • 2606.03197 • Published Jun 2 • 17

Self-Distilled Policy Gradient

Paper • 2606.04036 • Published Jun 2 • 28

upvoted 3 papers about 2 months ago

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

Paper • 2605.26114 • Published May 25 • 66

TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks

Paper • 2605.22535 • Published May 21 • 11

Synthetic Computers at Scale for Long-Horizon Productivity Simulation

Paper • 2604.28181 • Published Apr 30 • 20

upvoted a paper 2 months ago

δ-mem: Efficient Online Memory for Large Language Models

Paper • 2605.12357 • Published May 12 • 132

upvoted 7 papers 3 months ago

ClawEnvKit: Automatic Environment Generation for Claw-Like Agents

Paper • 2604.18543 • Published Apr 20 • 30

Qwen3.5-Omni Technical Report

Paper • 2604.15804 • Published Apr 17 • 60

Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

Paper • 2604.13010 • Published Apr 14 • 20

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Paper • 2604.12374 • Published Apr 14 • 39

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Paper • 2604.13016 • Published Apr 14 • 114

SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

Paper • 2604.08377 • Published Apr 9 • 295

ClawBench: Can AI Agents Complete Everyday Online Tasks?

Paper • 2604.08523 • Published Apr 9 • 265

upvoted 3 papers 4 months ago

FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

Paper • 2603.19835 • Published Mar 20 • 353

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Paper • 2603.27538 • Published Mar 29 • 150

SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks

Paper • 2603.24755 • Published Mar 25 • 30