Shuo Xing
shuoxing
AI & ML interests
MLLMs, LLMs
Recent Activity
updated a model 9 days ago
shuoxing/llama3-8b-full-pretrain-wash-c4-4-5m-bs4
published a model 9 days ago
shuoxing/llama3-8b-full-pretrain-wash-c4-4-5m-bs4
updated a model 9 days ago
shuoxing/llama3-8b-full-pretrain-wash-c4-4-2m-bs4
Organizations
MLLM Reasoning, Rewarding, and Understanding
Papers on reasoning, rewarding, and understanding in MLLMs and LLMs
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 281
- SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
  Paper • 2506.02096 • Published • 52
- OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
  Paper • 2506.02397 • Published • 36
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 146
LLM4Math
- BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
  Paper • 2510.04721 • Published
- FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
  Paper • 2505.02735 • Published • 33
- PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
  Paper • 2504.18428 • Published
- MathConstruct: Challenging LLM Reasoning with Constructive Proofs
  Paper • 2502.10197 • Published
models 227
shuoxing/llama3-8b-full-pretrain-wash-c4-4-5m-bs4
8B • Updated • 21
shuoxing/llama3-8b-full-pretrain-wash-c4-4-2m-bs4
Text Generation • 8B • Updated • 145
shuoxing/llama3-8b-full-pretrain-wash-c4-3-9m-bs4
Text Generation • 8B • Updated • 156
shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4
Text Generation • 8B • Updated • 291
shuoxing/llama3-8b-full-pretrain-wash-c4-3-3m-bs4
Text Generation • 8B • Updated • 180
shuoxing/llama3-8b-full-pretrain-wash-c4-3-0m-bs4
Text Generation • 8B • Updated • 277
shuoxing/llama3-8b-full-pretrain-wash-c4-2-7m-bs4
Text Generation • 8B • Updated • 184
shuoxing/llama3-8b-full-pretrain-wash-c4-2-4m-sft-bs64
Text Generation • 8B • Updated • 195
shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-sft-bs64
Text Generation • 8B • Updated • 210
shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64
Text Generation • 8B • Updated • 224
datasets 7
shuoxing/yt_ugc_public
Updated • 1.41k
shuoxing/AutoTrust
Updated • 6
shuoxing/KoNViD_1k_videos
Viewer • Updated • 1.2k • 60
shuoxing/Tweet_demo
Viewer • Updated • 100 • 13
shuoxing/MapBench_VQA
Viewer • Updated • 96 • 6 • 1
shuoxing/MapBench
Viewer • Updated • 97 • 5
shuoxing/tweet-scholar
Viewer • Updated • 95 • 4