Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents Paper • 2606.06036 • Published 12 days ago • 53
Skip a Layer or Loop It? Learning Program-of-Layers in LLMs Paper • 2606.06574 • Published 12 days ago • 13
Redesign Mixture-of-Experts Routers with Manifold Power Iteration Paper • 2606.12397 • Published 6 days ago • 85
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It Paper • 2606.11052 • Published 7 days ago • 15
FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention Paper • 2606.09079 • Published 8 days ago • 61
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 516 • 43
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 516
Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding Paper • 2605.29707 • Published 19 days ago • 145
NITP: Next Implicit Token Prediction for LLM Pre-training Paper • 2605.24956 • Published 23 days ago • 35
LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws Paper • 2605.23901 • Published 25 days ago • 13
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published May 12 • 195
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention Paper • 2605.22791 • Published 26 days ago • 31