
khtsly

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper about 5 hours ago

Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents

Paper • 2606.06036 • Published 12 days ago • 53

upvoted a paper about 7 hours ago

Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

Paper • 2606.06574 • Published 12 days ago • 13

upvoted a paper 3 days ago

MiniMax Sparse Attention

Paper • 2606.13392 • Published 5 days ago • 129

upvoted a paper 4 days ago

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Paper • 2606.12397 • Published 6 days ago • 85

upvoted a paper 5 days ago

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Paper • 2606.11052 • Published 7 days ago • 15

upvoted a paper 6 days ago

FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

Paper • 2606.09079 • Published 8 days ago • 61

New activity in sapientinc/HRM-Text-1B 9 days ago

Hrm can't calculate 2+2

#8 opened 11 days ago by Xhub1880

commented on a paper 11 days ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 516

upvoted a paper 11 days ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 516

upvoted 2 papers 12 days ago

Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding

Paper • 2605.29707 • Published 19 days ago • 145

dMoE: dLLMs with Learnable Block Experts

Paper • 2605.30876 • Published 18 days ago • 36

upvoted a paper 13 days ago

NITP: Next Implicit Token Prediction for LLM Pre-training

Paper • 2605.24956 • Published 23 days ago • 35

upvoted a paper 21 days ago

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Paper • 2605.23901 • Published 25 days ago • 13

upvoted a paper 23 days ago

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

Paper • 2605.11609 • Published May 12 • 195

upvoted 2 papers 24 days ago

HRM-Text: Efficient Pretraining Beyond Scaling

Paper • 2605.20613 • Published 27 days ago • 315

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Paper • 2605.22791 • Published 26 days ago • 31

upvoted a paper 25 days ago

Generative Recursive Reasoning

Paper • 2605.19376 • Published 27 days ago • 30

liked a model about 2 months ago

khtsly/luau-coder-preview-28B-A3B-noft

Text Generation • 28B • Updated Apr 26 • 78 • 2

published a model about 2 months ago

khtsly/luau-coder-preview-28B-A3B-noft

Text Generation • 28B • Updated Apr 26 • 78 • 2

updated a model about 2 months ago

khtsly/luau-coder-preview-28B-A3B-noft

Text Generation • 28B • Updated Apr 26 • 78 • 2
