Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning Paper • 2510.10959 • Published Oct 13, 2025 • 2
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published Feb 11 • 193
Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy Paper • 2507.01327 • Published Jul 2, 2025 • 1
TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas Paper • 2603.16448 • Published 3 days ago • 51
Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization Paper • 2406.16743 • Published Jun 24, 2024 • 1
When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning Paper • 2505.15400 • Published May 21, 2025 • 23