Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2502.19613

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26, 2025 • 82
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

Paper • 2506.14245 • Published Jun 17, 2025 • 45

Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published Mar 3, 2025 • 86
When an LLM is apprehensive about its answers -- and when its uncertainty is justified

Paper • 2503.01688 • Published Mar 3, 2025 • 22
Predictive Data Selection: The Data That Predicts Is the Data That Teaches

Paper • 2503.00808 • Published Mar 2, 2025 • 57
Chain of Draft: Thinking Faster by Writing Less

Paper • 2502.18600 • Published Feb 25, 2025 • 50

收集的感兴趣的AI

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20, 2025 • 195
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20, 2025 • 110
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Paper • 2502.14502 • Published Feb 20, 2025 • 92
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

Paper • 2502.14282 • Published Feb 20, 2025 • 29

Llms and reasoning

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Paper • 2501.09686 • Published Jan 16, 2025 • 41
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 448
Chain-of-Retrieval Augmented Generation

Paper • 2501.14342 • Published Jan 24, 2025 • 58
RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24, 2025 • 28

HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

Paper • 2411.02959 • Published Nov 5, 2024 • 71
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization

Paper • 2411.02355 • Published Nov 4, 2024 • 52
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation

Paper • 2410.23090 • Published Oct 30, 2024 • 55
RARe: Retrieval Augmented Retrieval with In-Context Examples

Paper • 2410.20088 • Published Oct 26, 2024 • 4

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26, 2025 • 82

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26, 2025 • 82
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving

Paper • 2502.20238 • Published Feb 27, 2025 • 23

This is the collection of the online-DPO-R1 project.

RLHFlow/Qwen2.5-7B-PPO-Zero

8B • Updated Feb 17, 2025 • 29 • 3
RLHFlow/Qwen2.5-7B-DPO-Zero

8B • Updated Feb 17, 2025 • 10
RLHFlow/Qwen2.5-7B-DPO-NLL-Zero

8B • Updated Feb 17, 2025 • 2
RLHFlow/Qwen2.5-7B-RAFT-Zero

8B • Updated Feb 17, 2025 • 8

Reasoning, Thinking, RL and Test-Time Scaling

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
Efficiently Serving LLM Reasoning Programs with Certaindex

Paper • 2412.20993 • Published Dec 30, 2024 • 36
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 47

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 52
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16, 2024 • 101
Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 133
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 51

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26, 2025 • 82
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

Paper • 2506.14245 • Published Jun 17, 2025 • 45

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26, 2025 • 82

Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published Mar 3, 2025 • 86
When an LLM is apprehensive about its answers -- and when its uncertainty is justified

Paper • 2503.01688 • Published Mar 3, 2025 • 22
Predictive Data Selection: The Data That Predicts Is the Data That Teaches

Paper • 2503.00808 • Published Mar 2, 2025 • 57
Chain of Draft: Thinking Faster by Writing Less

Paper • 2502.18600 • Published Feb 25, 2025 • 50

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26, 2025 • 82
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving

Paper • 2502.20238 • Published Feb 27, 2025 • 23

收集的感兴趣的AI

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20, 2025 • 195
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20, 2025 • 110
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Paper • 2502.14502 • Published Feb 20, 2025 • 92
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

Paper • 2502.14282 • Published Feb 20, 2025 • 29

This is the collection of the online-DPO-R1 project.

RLHFlow/Qwen2.5-7B-PPO-Zero

8B • Updated Feb 17, 2025 • 29 • 3
RLHFlow/Qwen2.5-7B-DPO-Zero

8B • Updated Feb 17, 2025 • 10
RLHFlow/Qwen2.5-7B-DPO-NLL-Zero

8B • Updated Feb 17, 2025 • 2
RLHFlow/Qwen2.5-7B-RAFT-Zero

8B • Updated Feb 17, 2025 • 8

Llms and reasoning

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Paper • 2501.09686 • Published Jan 16, 2025 • 41
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 448
Chain-of-Retrieval Augmented Generation

Paper • 2501.14342 • Published Jan 24, 2025 • 58
RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24, 2025 • 28

Reasoning, Thinking, RL and Test-Time Scaling

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
Efficiently Serving LLM Reasoning Programs with Certaindex

Paper • 2412.20993 • Published Dec 30, 2024 • 36
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 47

HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

Paper • 2411.02959 • Published Nov 5, 2024 • 71
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization

Paper • 2411.02355 • Published Nov 4, 2024 • 52
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation

Paper • 2410.23090 • Published Oct 30, 2024 • 55
RARe: Retrieval Augmented Retrieval with In-Context Examples

Paper • 2410.20088 • Published Oct 26, 2024 • 4

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 52
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16, 2024 • 101
Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 133
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 51

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs