Song
Hwanjun
1 follower · 1 following
AI & ML interests
None yet
Recent Activity
upvoted a paper 3 days ago
Reasoning over Video: Evaluating How MLLMs Extract, Integrate, and Reconstruct Spatiotemporal Evidence
reacted to Kseniase's post with 👍 11 months ago
16 new studies on inference-time scaling:

Over the last couple of weeks, a large number of studies on inference-time scaling have emerged. And it's so cool, because each new paper adds a trick to the toolbox, making LLMs more capable without needing to scale the models' parameter count. So here are 13 new methods + 3 comprehensive studies on test-time scaling:

1. https://huggingface.co/papers/2504.02495 Probably the most popular study. It proposes to boost inference-time scalability by improving reward modeling. To enhance performance, DeepSeek-GRM uses adaptive critiques, parallel sampling, pointwise generative RM, and Self-Principled Critique Tuning (SPCT).
2. https://huggingface.co/papers/2504.04718 Allows small models to use external tools, like code interpreters and calculators, to enhance self-verification.
3. https://huggingface.co/papers/2504.00810 Proposes to train LLMs on code-based reasoning paths to make test-time scaling more efficient, limiting unnecessary tokens with a special dataset and a Shifted Thinking Window.
4. https://huggingface.co/papers/2504.00891 Introduces GenPRM, a generative PRM that uses CoT reasoning and code verification for step-by-step judgment. With only 23K training examples, GenPRM outperforms prior PRMs and larger models.
5. https://huggingface.co/papers/2503.24320 The SWIFT test-time scaling framework improves World Models' performance without retraining, using strategies like fast tokenization, Top-K pruning, and efficient beam search.
6. https://huggingface.co/papers/2504.07104 Proposes REBEL for scaling RAG systems, which uses multi-criteria optimization with CoT prompting for better performance-speed tradeoffs as inference compute increases.
7. https://huggingface.co/papers/2503.13288 Proposes a φ-Decoding strategy that uses foresight sampling, clustering, and adaptive pruning to estimate and select optimal reasoning steps.

Read further below 👇

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe
upvoted a paper 11 months ago
Inference-Time Scaling for Generalist Reward Modeling
Organizations
Hwanjun's activity
liked 2 models about 1 year ago
DISLab/Gen-8B-R2 • Question Answering • 8B • Updated Mar 19, 2025 • 7 • 2
DISLab/Ext2Gen-8B-R2 • Question Answering • 8B • Updated Mar 19, 2025 • 17 • 4
liked a dataset about 1 year ago
DISLab/FeedSum • Viewer • Updated Jan 25, 2025 • 127k • 142 • 3
liked 6 models over 1 year ago
DevQuasar-4/DISLab.SummLlama3.2-3B-GGUF • Text Generation • 3B • Updated Feb 1, 2025 • 6 • 3
DISLab/SummLlama3-70B • Summarization • 71B • Updated Nov 13, 2024 • 3 • 7
DISLab/SummLlama3.1-8B • Summarization • 8B • Updated Nov 13, 2024 • 315 • 12
DISLab/SummLlama3.1-70B • Summarization • 71B • Updated Nov 13, 2024 • 4 • 7
DISLab/SummLlama3.2-3B • Summarization • 3B • Updated Dec 10, 2024 • 63 • 37
DISLab/SummLlama3-8B • Summarization • 8B • Updated Nov 13, 2024 • 13 • 14
liked a model almost 2 years ago
allenai/led-base-16384 • Updated Jan 24, 2023 • 26.8k • 50