MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE

Ruijie Zhu1,2, Jiahao Lu3, Wenbo Hu2, Xiaoguang Han4
Jianfei Cai5, Ying Shan2, Chuanxia Zheng1

1 NTU   2 ARC Lab, Tencent PCG   3 HKUST   4 CUHK(SZ)   5 Monash University

πŸ“„ Paper | 🌐 Project Page | πŸ’» Code | πŸ“œ License

Model Description

MotionCrafter is a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense object motion from monocular videos. It predicts dense point maps and scene flow for each frame within a shared world coordinate system, without requiring post-optimization.
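To make the output format concrete, here is a toy sketch of what dense point maps and scene flow represent. All shapes and names below are illustrative assumptions for a synthetic scene, not MotionCrafter's documented API: a point map assigns every pixel of every frame a 3D point in the shared world coordinate system, and scene flow is the per-point 3D displacement between consecutive frames.

```python
import numpy as np

# Illustrative only: shapes and values are assumptions, not MotionCrafter's API.
T, H, W = 4, 8, 8  # frames, height, width (tiny toy sizes)

# Toy point maps: a flat plane that translates 0.5 units along x per frame.
xs, ys = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float), indexing="xy")
base = np.stack([xs, ys, np.zeros((H, W))], axis=-1)  # (H, W, 3) world points
point_maps = np.stack(
    [base + np.array([0.5 * t, 0.0, 0.0]) for t in range(T)]
)  # (T, H, W, 3)

# With perfect temporal correspondence, scene flow from frame t to t+1 is
# just the difference of consecutive point maps.
scene_flow = point_maps[1:] - point_maps[:-1]  # (T-1, H, W, 3)

print(scene_flow.shape)     # (3, 8, 8, 3)
print(scene_flow[0, 0, 0])  # [0.5 0.  0. ]
```

In this toy scene every point moves identically; a real prediction would give each pixel its own motion vector while static background points get (near-)zero flow.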

Intended Use

  • Research on 4D reconstruction and motion estimation from monocular videos
  • Academic evaluation and benchmarking of dense point map and scene flow prediction

Not intended for safety-critical or real-time production use.

Limitations

  • Performance can degrade with extreme motion blur or severe occlusion.
  • Output quality is sensitive to input resolution and video quality.
  • Generalization may be limited for out-of-domain scenes.

Training Data

Training data and preprocessing details are described in the paper and the main repository; for dataset specifics, please refer to the project page and the paper.

Evaluation

Please refer to the paper for evaluation datasets, metrics, and results.

How to Use

import torch
from motioncrafter import (
    MotionCrafterDiffPipeline,
    MotionCrafterDetermPipeline,
    UnifyAutoencoderKL,
    UNetSpatioTemporalConditionModelVid2vid,
)

unet_path = "TencentARC/MotionCrafter"
vae_path = "TencentARC/MotionCrafter"
model_type = "determ"  # or "diff" for the diffusion variant
cache_dir = "./pretrained_models"

# Load the UNet in fp16, picking the subfolder that matches the chosen variant.
unet = UNetSpatioTemporalConditionModelVid2vid.from_pretrained(
    unet_path,
    subfolder="unet_diff" if model_type == "diff" else "unet_determ",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    cache_dir=cache_dir,
).requires_grad_(False).to("cuda", dtype=torch.float16)

# Load the 4D VAE for joint geometry and motion representation; kept in fp32.
geometry_motion_vae = UnifyAutoencoderKL.from_pretrained(
    vae_path,
    subfolder="geometry_motion_vae",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float32,
    cache_dir=cache_dir,
).requires_grad_(False).to("cuda", dtype=torch.float32)

# Both variants build on the Stable Video Diffusion backbone; only the
# pipeline class differs.
pipeline_cls = (
    MotionCrafterDiffPipeline
    if model_type == "diff"
    else MotionCrafterDetermPipeline
)
pipe = pipeline_cls.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    unet=unet,
    torch_dtype=torch.float16,
    variant="fp16",
    cache_dir=cache_dir,
).to("cuda")
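The pipeline expects the input video as a batch of frames. The exact preprocessing MotionCrafter uses is defined in the main repository; the sketch below is a generic, hedged example of the common video-diffusion convention (the helper name `frames_to_tensor` and the [-1, 1] normalization are assumptions, so verify them against the repository before use).

```python
import numpy as np
import torch

def frames_to_tensor(frames, device="cpu", dtype=torch.float32):
    """Convert a list of HxWx3 uint8 frames to a (1, T, 3, H, W) tensor in [-1, 1].

    ASSUMPTION: the [-1, 1] range follows common video-diffusion preprocessing;
    check MotionCrafter's repository for the exact convention it expects.
    """
    video = np.stack(frames).astype(np.float32) / 255.0   # (T, H, W, 3) in [0, 1]
    video = video * 2.0 - 1.0                             # rescale to [-1, 1]
    tensor = torch.from_numpy(video).permute(0, 3, 1, 2)  # (T, 3, H, W)
    return tensor.unsqueeze(0).to(device=device, dtype=dtype)

# Toy usage with random frames standing in for a decoded video clip.
frames = [np.random.randint(0, 256, (64, 96, 3), dtype=np.uint8) for _ in range(8)]
batch = frames_to_tensor(frames)
print(batch.shape)  # torch.Size([1, 8, 3, 64, 96])
```

A tensor shaped like this (batch, time, channels, height, width) can then be moved to the GPU and passed to the loaded pipeline.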

Model Weights

  • geometry_motion_vae/: 4D VAE for joint geometry and motion representation
  • unet_determ/: deterministic UNet for motion prediction

Model Variants

  • Deterministic (unet_determ): fast inference with fixed predictions per input
  • Diffusion (unet_diff): probabilistic predictions with diverse outputs

Citation

@article{zhu2025motioncrafter,
  title={MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE},
  author={Zhu, Ruijie and Lu, Jiahao and Hu, Wenbo and Han, Xiaoguang and Cai, Jianfei and Shan, Ying and Zheng, Chuanxia},
  journal={arXiv preprint arXiv:2602.08961},
  year={2026}
}

License

This model is provided under the Tencent License. See LICENSE.txt for details.

Acknowledgments

This work builds upon GeometryCrafter. We thank the authors for their excellent contributions.
