
Zuhao Yang

mwxely


https://mwxely.github.io/

AI & ML interests

Large Multimodal Models

Recent Activity

authored a paper about 6 hours ago

MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

authored a paper about 6 hours ago

ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

upvoted a paper about 15 hours ago

Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation



authored 2 papers about 6 hours ago

MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

Paper • 2603.15726 • Published Mar 16 • 186

ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

Paper • 2605.20342 • Published 3 days ago

authored a paper 9 days ago

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL

Paper • 2604.28123 • Published 21 days ago • 48

authored a paper 10 days ago

WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors

Paper • 2605.10434 • Published 11 days ago • 30

authored a paper 18 days ago

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

Paper • 2604.28185 • Published 22 days ago • 90

authored a paper 5 months ago

A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models

Paper • 2511.15098 • Published Nov 19, 2025

authored 7 papers 6 months ago

FACE: Evaluating Natural Language Generation with Fourier Analysis of Cross-Entropy

Paper • 2305.10307 • Published May 17, 2023

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Paper • 2511.20785 • Published Nov 25, 2025 • 188

TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding

Paper • 2508.01699 • Published Aug 3, 2025

AI-Generated Images as Data Source: The Dawn of Synthetic Era

Paper • 2310.01830 • Published Oct 3, 2023

ToDRE: Visual Token Pruning via Diversity and Task Awareness for Efficient Large Vision-Language Models

Paper • 2505.18757 • Published May 24, 2025

Versatile Transition Generation with Image-to-Video Diffusion

Paper • 2508.01698 • Published Aug 3, 2025

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 96