- Nuclear Norm Regularization for Deep Learning
  Paper • 2405.14544 • Published • 1
- Token embeddings violate the manifold hypothesis
  Paper • 2504.01002 • Published • 1
- Approximate Nullspace Augmented Finetuning for Robust Vision Transformers
  Paper • 2403.10476 • Published • 1
- ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning
  Paper • 2504.00254 • Published • 1
Collections
Collections including paper arxiv:2512.24695
- Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
  Paper • 2512.24271 • Published • 62
- Nested Learning: The Illusion of Deep Learning Architectures
  Paper • 2512.24695 • Published • 42
- Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
  Paper • 2512.20578 • Published • 85
- Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
  Paper • 2512.23959 • Published • 112

- Seedream 4.0: Toward Next-generation Multimodal Image Generation
  Paper • 2509.20427 • Published • 82
- Tree Search for LLM Agent Reinforcement Learning
  Paper • 2509.21240 • Published • 92
- SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models
  Paper • 2510.06917 • Published • 35
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
  Paper • 2510.04618 • Published • 129

- Distillation Scaling Laws
  Paper • 2502.08606 • Published • 47
- I-Con: A Unifying Framework for Representation Learning
  Paper • 2504.16929 • Published • 30
- Chain-of-Model Learning for Language Model
  Paper • 2505.11820 • Published • 121
- Nested Learning: The Illusion of Deep Learning Architectures
  Paper • 2512.24695 • Published • 42

- Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
  Paper • 2512.24617 • Published • 64
- Recursive Language Models
  Paper • 2512.24601 • Published • 81
- Nested Learning: The Illusion of Deep Learning Architectures
  Paper • 2512.24695 • Published • 42
- DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
  Paper • 2512.02556 • Published • 255

- Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
  Paper • 2512.24618 • Published • 147
- Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
  Paper • 2512.24873 • Published • 104
- AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents
  Paper • 2512.23343 • Published • 28
- Scaling Open-Ended Reasoning to Predict the Future
  Paper • 2512.25070 • Published • 17

- SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
  Paper • 2509.02479 • Published • 84
- WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
  Paper • 2509.06501 • Published • 80
- UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
  Paper • 2509.02544 • Published • 125
- Baichuan-M2: Scaling Medical Capability with Large Verifier System
  Paper • 2509.02208 • Published • 43

- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 52
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 60
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 24
- BlackMamba: Mixture of Experts for State-Space Models
  Paper • 2402.01771 • Published • 25