Pixal3D: Pixel-Aligned 3D Generation from Images

SIGGRAPH 2026

Dong-Yang Li¹ · Wang Zhao²* · Yuxin Chen² · Wenbo Hu² · Meng-Hao Guo¹ · Fang-Lue Zhang³ · Ying Shan² · Shi-Min Hu¹✉

¹Tsinghua University (BNRist)    ²Tencent ARC Lab    ³Victoria University of Wellington

*Project lead    ✉Corresponding author

Pixal3D generates high-fidelity 3D assets from a single image. Unlike previous methods that loosely inject image features via attention, Pixal3D explicitly lifts pixel features into 3D through back-projection, establishing direct pixel-to-3D correspondences. This enables near-reconstruction-level fidelity with detailed geometry and PBR textures.
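To make the idea of back-projection concrete, here is a minimal sketch. It is not the Pixal3D implementation: it assumes a simple pinhole camera at the origin looking along +z, uses a nearest-pixel lookup, and the names `project_point` and `lift_features` are hypothetical. Each 3D point is projected onto the image plane and takes the feature of the pixel it lands on, which is the kind of direct pixel-to-3D correspondence described above.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Feature = List[float]


def project_point(p: Point3D, focal: float, cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Project a camera-space point to the (col, row) of the pixel containing it.

    Assumes a pinhole camera at the origin looking along +z (an illustrative
    convention, not taken from the Pixal3D code).
    """
    x, y, z = p
    if z <= 0:  # behind the camera: no valid projection
        return None
    u = focal * x / z + cx
    v = focal * y / z + cy
    return math.floor(u), math.floor(v)


def lift_features(points: Sequence[Point3D],
                  feature_map: Sequence[Sequence[Feature]],
                  focal: float, cx: float, cy: float) -> List[Optional[Feature]]:
    """For every 3D point, fetch the feature of the pixel it projects to.

    feature_map[row][col] is a feature vector. Points that project outside
    the image, or lie behind the camera, receive None.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    lifted: List[Optional[Feature]] = []
    for p in points:
        pix = project_point(p, focal, cx, cy)
        if pix is None:
            lifted.append(None)
            continue
        col, row = pix
        inside = 0 <= row < height and 0 <= col < width
        lifted.append(feature_map[row][col] if inside else None)
    return lifted


# A 2x2 feature map with one-dimensional features, for illustration only.
features = [[[0.0], [1.0]],
            [[2.0], [3.0]]]
points = [(-0.5, -0.5, 1.0), (0.5, 0.5, 1.0), (0.0, 0.0, -1.0)]
lifted = lift_features(points, features, focal=1.0, cx=1.0, cy=1.0)
```

A production system would typically interpolate between neighbouring pixels and handle occlusion; the sketch leaves both out to keep the correspondence itself visible.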


✨ News

  • May 2026: Release training code and data preparation toolkit. 🔧
  • May 2026: Release the improved version based on Trellis.2 backbone. 💪
  • May 2026: Release inference code and online demo. 🤗
  • Apr 2026: Our paper is accepted to SIGGRAPH 2026! 🎉

📌 Branches

| Branch | Description |
| --- | --- |
| main | Latest version: improved implementation based on the Trellis.2 backbone, with better performance. |
| paper | Paper version: original implementation based on Direct3D-S2, corresponding to the results reported in our SIGGRAPH 2026 paper. |

If you want to reproduce the results in our paper, please switch to the paper branch.

🎮 Try It Online

You can try Pixal3D directly in your browser without any installation via our Hugging Face Gradio demo:

👉 Launch Demo

🚀 Getting Started

Installation

Step 1: Follow TRELLIS.2 Installation

Please first follow the installation guide of TRELLIS.2 to set up the base environment.

Step 2: Install Additional Dependencies

pip install -r requirements.txt

Step 3: Install natten

NATTEN_CUDA_ARCH="xx" NATTEN_N_WORKERS=xx pip install natten==0.21.0 --no-build-isolation

Please replace xx with the CUDA architecture and the number of build workers suitable for your machine.
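If you are unsure what to substitute for xx, the sketch below derives candidate values on the current machine. It assumes PyTorch is already installed (from the TRELLIS.2 environment) and that NATTEN accepts the dotted compute-capability form such as 8.6. The helper names and the choice of one worker per CPU core are illustrative, so check the NATTEN installation notes if the build rejects the value.

```python
import os
from typing import Optional


def format_cuda_arch(major: int, minor: int) -> str:
    """Format a compute-capability pair as a dotted string, e.g. (8, 6) -> "8.6"."""
    return f"{major}.{minor}"


def suggest_workers(cpu_count: Optional[int] = None) -> int:
    """Suggest a build worker count; falls back to 1 if the CPU count is unknown."""
    count = cpu_count if cpu_count is not None else os.cpu_count()
    return max(1, count or 1)


def detect_cuda_arch() -> str:
    """Query the first visible GPU through PyTorch (imported lazily)."""
    import torch  # assumed to be installed by the TRELLIS.2 setup

    major, minor = torch.cuda.get_device_capability(0)
    return format_cuda_arch(major, minor)
```

On a GPU machine, `detect_cuda_arch()` and `suggest_workers()` give values to paste into the install command. Compiling with one worker per core can use a lot of memory, so treat the suggested count as an upper bound and lower it if the build runs out of RAM.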

Step 4: Install utils3d

pip install https://github.com/LDYang694/Storages/releases/download/20260430/utils3d-0.0.2-py3-none-any.whl

Note: requirements-hfdemo.txt is for the Hugging Face Spaces demo (H-series GPU architecture) and may not be compatible with other architectures.

Usage

Inference

Generate a GLB mesh from a single image:

python inference.py --image assets/images/0_img.png --output ./output.glb

Low-VRAM mode (reduces peak VRAM by loading models on demand):

python inference.py --image assets/images/0_img.png --output ./output.glb --low_vram

By default, the pipeline resolution is 1536 (standard mode) or 1024 (low-VRAM mode). You can override this with --resolution:

# Force 1536 even in low-VRAM mode
python inference.py --image assets/images/0_img.png --output ./output.glb --low_vram --resolution 1536

# Force 1024 in standard mode
python inference.py --image assets/images/0_img.png --output ./output.glb --resolution 1024
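The default-versus-override rule can be summarised in a few lines. This is a hedged sketch of the behaviour described above, not the code in inference.py; the function name `resolve_resolution` and the two constants are hypothetical.

```python
from typing import Optional

STANDARD_RESOLUTION = 1536  # default in standard mode, as stated above
LOW_VRAM_RESOLUTION = 1024  # default in low-VRAM mode, as stated above


def resolve_resolution(low_vram: bool, override: Optional[int] = None) -> int:
    """Return the pipeline resolution: an explicit override wins, else the mode default."""
    if override is not None:
        return override
    return LOW_VRAM_RESOLUTION if low_vram else STANDARD_RESOLUTION
```

For example, `resolve_resolution(low_vram=True, override=1536)` corresponds to the first command above, and `resolve_resolution(low_vram=False, override=1024)` to the second.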

Tip: If you don't have flash_attn installed, you can use PyTorch's built-in SDPA backend instead:

ATTN_BACKEND=sdpa python inference.py --image assets/images/0_img.png --output ./output.glb --low_vram
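To run several images with the same settings, the commands above can be assembled programmatically. The sketch below uses only the flags and the ATTN_BACKEND variable shown in this section; the helper names `build_inference_cmd`, `build_env`, and `run_inference` are hypothetical, and launching assumes the working directory is the repository root.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def build_inference_cmd(image: str, output: str, low_vram: bool = False,
                        resolution: Optional[int] = None) -> List[str]:
    """Build the argument list for inference.py from the documented flags."""
    cmd = [sys.executable, "inference.py", "--image", image, "--output", output]
    if low_vram:
        cmd.append("--low_vram")
    if resolution is not None:
        cmd += ["--resolution", str(resolution)]
    return cmd


def build_env(attn_backend: Optional[str] = None) -> Dict[str, str]:
    """Copy the current environment, optionally setting ATTN_BACKEND (e.g. "sdpa")."""
    env = dict(os.environ)
    if attn_backend is not None:
        env["ATTN_BACKEND"] = attn_backend
    return env


def run_inference(cmd: List[str], attn_backend: Optional[str] = None) -> None:
    """Launch the command; requires the Pixal3D environment to be set up."""
    subprocess.run(cmd, env=build_env(attn_backend), check=True)


cmd = build_inference_cmd("assets/images/0_img.png", "./output.glb",
                          low_vram=True, resolution=1536)
```

Calling `run_inference(cmd, attn_backend="sdpa")` in a loop over image paths would then process a folder one image at a time.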

Web Demo

We provide a Gradio web demo for Pixal3D, which allows you to generate 3D meshes from images interactively.

python app.py 

Low-VRAM mode is also available for the web demo. The frontend default resolution will automatically switch to 1024 in low-VRAM mode (1536 otherwise), but can be changed manually in the UI.

python app.py --low_vram
# or via environment variable:
LOW_VRAM=1 python app.py
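One common way to honour both a flag and an environment variable is shown below. This is an illustrative sketch, not the contents of app.py: the `is_low_vram` helper and the rule that only the value 1 enables the mode through the environment are assumptions.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def is_low_vram(argv: Optional[Sequence[str]] = None,
                env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if --low_vram is passed or LOW_VRAM=1 is set.

    argv defaults to the process arguments and env to os.environ, so both
    can be replaced in tests.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--low_vram", action="store_true")
    # parse_known_args ignores any other options the real app may define.
    args, _unknown = parser.parse_known_args(argv)
    env = os.environ if env is None else env
    return args.low_vram or env.get("LOW_VRAM") == "1"
```

Letting the flag and the variable act as alternatives keeps container or Spaces deployments, where editing the launch command is awkward, on the same code path as local runs.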

🔧 Training

We provide the full training codebase for reproducing Pixal3D from scratch.

Data Preparation

Prepare view-aligned O-Voxel data and rendered condition images by following the data toolkit instructions:

📂 data_toolkit/README.md

Overview

Pixal3D is trained as a three-stage cascade, with each stage progressively increasing the resolution:

| Stage | Model | Resolutions | Config Prefix |
| --- | --- | --- | --- |
| 1 | Sparse Structure | 32 → 64 | ss_flow_img_dit_*_proj_finetune |
| 2 | Shape | 256 → 512 → 1024 | slat_flow_img2shape_*_proj_finetune |
| 3 | Texture | 256 → 512 → 1024 | slat_flow_imgshape2tex_*_proj_finetune |

All stages use pixel-aligned projection conditioning and view-aligned latents (2 views by default). Within each stage, start from the lowest resolution and progressively fine-tune to higher resolutions by setting finetune_ckpt in the config.
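The resolution schedule in the table can be written down as a small data structure, which makes the fine-tuning order explicit. This sketch is illustrative only: `CASCADE` and `training_steps` are hypothetical names, and it encodes nothing beyond the stages and resolutions listed above.

```python
from typing import Dict, List, Optional, Tuple

# Stage name -> resolutions in training order (taken from the table above).
CASCADE: Dict[str, List[int]] = {
    "Sparse Structure": [32, 64],
    "Shape": [256, 512, 1024],
    "Texture": [256, 512, 1024],
}


def training_steps(cascade: Dict[str, List[int]]) -> List[Tuple[str, int, Optional[int]]]:
    """List (stage, resolution, previous_resolution) in training order.

    previous_resolution is None for the first step of a stage; otherwise it
    is the resolution whose checkpoint finetune_ckpt should point to.
    """
    steps: List[Tuple[str, int, Optional[int]]] = []
    for stage, resolutions in cascade.items():
        previous: Optional[int] = None
        for res in resolutions:
            steps.append((stage, res, previous))
            previous = res
    return steps


for stage, res, prev in training_steps(CASCADE):
    note = "first step in this stage" if prev is None else f"fine-tune from the {prev} checkpoint"
    print(f"{stage} @ {res}: {note}")
```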

Quick Start

python train.py \
  --config <CONFIG_JSON> \
  --output_dir <OUTPUT_DIR> \
  --data_dir '<DATA_DIR_JSON>'

--data_dir is a JSON string describing the dataset layout. Different stages require different keys:

| Stage | Required keys |
| --- | --- |
| Sparse Structure | base, ss_latent, render_cond |
| Shape | base, shape_latent, render_cond |
| Texture | base, shape_latent, pbr_latent, render_cond |
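Because --data_dir is a JSON string passed on the command line, quoting mistakes and missing keys are easy to make. The sketch below builds the string with `json.dumps` and checks the required keys from the table first. The stage labels used as dictionary keys (`sparse_structure`, `shape`, `texture`) and the function name are hypothetical. The outer dataset name follows the ObjaverseXL examples in this README.

```python
import json
import shlex
from typing import Dict, Tuple

# Required keys per stage, copied from the table above.
REQUIRED_KEYS: Dict[str, Tuple[str, ...]] = {
    "sparse_structure": ("base", "ss_latent", "render_cond"),
    "shape": ("base", "shape_latent", "render_cond"),
    "texture": ("base", "shape_latent", "pbr_latent", "render_cond"),
}


def build_data_dir(stage: str, datasets: Dict[str, Dict[str, str]]) -> str:
    """Validate every dataset entry for the given stage and return the JSON string."""
    required = REQUIRED_KEYS[stage]
    for name, paths in datasets.items():
        missing = [key for key in required if key not in paths]
        if missing:
            raise ValueError(f"dataset {name!r} is missing keys for {stage}: {missing}")
    return json.dumps(datasets)


root = "datasets/ObjaverseXL_sketchfab"
data_dir = build_data_dir("shape", {
    "ObjaverseXL_sketchfab": {
        "base": root,
        "shape_latent": f"{root}/shape_latents/shape_enc_next_dc_f16c32_fp16_256_view",
        "render_cond": f"{root}/renders_cond",
    }
})
# shlex.quote wraps the JSON in single quotes so the shell passes it through intact.
print("--data_dir", shlex.quote(data_dir))
```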

Example: Training All Three Stages

Below we show the full training sequence using ObjaverseXL as an example. Each higher-resolution step requires updating finetune_ckpt in its config JSON to point to the previous checkpoint.

Stage 1: Sparse Structure (32 → 64)
# Resolution 32
python train.py \
  --config configs/gen/ss_flow_img_dit_1_3B_32_bf16_proj_finetune.json \
  --output_dir results/ss_32 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "ss_latent": "datasets/ObjaverseXL_sketchfab/ss_latents/ss_enc_conv3d_16l8_fp16_64_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'

# Resolution 64 (set finetune_ckpt → results/ss_32 checkpoint)
python train.py \
  --config configs/gen/ss_flow_img_dit_1_3B_32_bf16_proj_finetune_ft64.json \
  --output_dir results/ss_ft64 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "ss_latent": "datasets/ObjaverseXL_sketchfab/ss_latents/ss_enc_conv3d_16l8_fp16_64_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'
Stage 2: Shape (256 → 512 → 1024)
# Resolution 256
python train.py \
  --config configs/gen/slat_flow_img2shape_dit_1_3B_256_bf16_proj_finetune.json \
  --output_dir results/shape_256 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_256_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'

# Resolution 512
python train.py \
  --config configs/gen/slat_flow_img2shape_dit_1_3B_256_bf16_proj_finetune_ft512.json \
  --output_dir results/shape_ft512 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_512_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'

# Resolution 1024
python train.py \
  --config configs/gen/slat_flow_img2shape_dit_1_3B_512_bf16_proj_finetune_ft1024.json \
  --output_dir results/shape_ft1024 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_1024_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'
Stage 3: Texture (256 → 512 → 1024)
# Resolution 256
python train.py \
  --config configs/gen/slat_flow_imgshape2tex_dit_1_3B_256_bf16_proj_finetune.json \
  --output_dir results/tex_256 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_256_view", "pbr_latent": "datasets/ObjaverseXL_sketchfab/pbr_latents/tex_enc_next_dc_f16c32_fp16_256_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'

# Resolution 512
python train.py \
  --config configs/gen/slat_flow_imgshape2tex_dit_1_3B_512_bf16_proj_finetune.json \
  --output_dir results/tex_512 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_512_view", "pbr_latent": "datasets/ObjaverseXL_sketchfab/pbr_latents/tex_enc_next_dc_f16c32_fp16_512_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'

# Resolution 1024
python train.py \
  --config configs/gen/slat_flow_imgshape2tex_dit_1_3B_512_bf16_proj_finetune_ft1024.json \
  --output_dir results/tex_ft1024 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_1024_view", "pbr_latent": "datasets/ObjaverseXL_sketchfab/pbr_latents/tex_enc_next_dc_f16c32_fp16_1024_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'
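Editing finetune_ckpt by hand before each higher-resolution step is easy to forget, so it could be scripted. The sketch below makes no assumption about where the key sits in the config: it searches the parsed JSON for existing finetune_ckpt entries and overwrites them. The function names are hypothetical, and the value you pass should mirror the form the config already uses (a path or a mapping), since that format is defined by the training code and not by this example.

```python
import json
from typing import Any


def set_finetune_ckpt(config: Any, value: Any) -> int:
    """Overwrite every existing "finetune_ckpt" entry in a parsed config.

    Walks nested dicts and lists and returns how many entries were updated.
    """
    updated = 0
    if isinstance(config, dict):
        for key in config:
            if key == "finetune_ckpt":
                config[key] = value
                updated += 1
            else:
                updated += set_finetune_ckpt(config[key], value)
    elif isinstance(config, list):
        for item in config:
            updated += set_finetune_ckpt(item, value)
    return updated


def update_config_file(src_path: str, dst_path: str, value: Any) -> None:
    """Read a config JSON, update finetune_ckpt, and write the result to dst_path."""
    with open(src_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    if set_finetune_ckpt(config, value) == 0:
        raise KeyError(f"no finetune_ckpt entry found in {src_path}")
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
        f.write("\n")
```

Writing to a separate `dst_path` keeps the shipped config untouched, which helps when the same base config is reused across experiments.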

Additional Options

All command-line arguments

| Argument | Description | Default |
| --- | --- | --- |
| --config | Config JSON path | required |
| --output_dir | Output directory | required |
| --data_dir | Dataset JSON string | ./data/ |
| --load_dir | Checkpoint load directory | output_dir |
| --ckpt | Resume from step | latest |
| --auto_retry | Retries on failure | 3 |
| --tryrun | Dry run | false |
| --profile | Profiling | false |
| --num_nodes | Number of nodes | 1 |
| --node_rank | Current node rank | 0 |
| --num_gpus | GPUs per node | all |
| --master_addr | Master address | localhost |
| --master_port | Master port | 12666 |
| --use_wandb | Enable W&B logging | false |
| --wandb_project | W&B project | trellis2-training |
| --wandb_name | W&B run name | basename of output_dir |
| --wandb_id | W&B run ID (resume) | none |
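For multi-node training, the arguments in the table suggest the usual pattern in which every node runs the same command with its own --node_rank. The sketch below prints one command per node using only arguments listed in the table. The helper name and the example address 10.0.0.1 are illustrative, the angle-bracket placeholders must be replaced with your own values, and the launch pattern itself is an assumption to verify against train.py.

```python
import shlex
from typing import List


def node_commands(config: str, output_dir: str, data_dir: str, num_nodes: int,
                  master_addr: str, master_port: int = 12666) -> List[str]:
    """Return one shell-quoted train.py command per node rank."""
    commands = []
    for rank in range(num_nodes):
        argv = [
            "python", "train.py",
            "--config", config,
            "--output_dir", output_dir,
            "--data_dir", data_dir,
            "--num_nodes", str(num_nodes),
            "--node_rank", str(rank),
            "--master_addr", master_addr,
            "--master_port", str(master_port),
        ]
        # shlex.join quotes each argument, which protects the JSON in --data_dir.
        commands.append(shlex.join(argv))
    return commands


for line in node_commands("<CONFIG_JSON>", "<OUTPUT_DIR>", "<DATA_DIR_JSON>",
                          num_nodes=2, master_addr="10.0.0.1"):
    print(line)
```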

๐ŸŒ Community Projects

We thank the community for building extensions and deployment guides for Pixal3D!

  • Pixal3D-ComfyUI: ComfyUI integration with deployment guides for Windows, WSL, and more.

🤗 Acknowledgements

This project is built heavily upon Trellis.2 and Direct3D-S2. We sincerely thank the authors for their outstanding work on scalable 3D generation, which serves as the foundation of our codebase and model architecture.

We also thank the following repos for their great contributions:

📄 Citation

If you find this work useful, please consider citing:

@article{li2026pixal3d,
    title={Pixal3D: Pixel-Aligned 3D Generation from Images},
    author={Li, Dong-Yang and Zhao, Wang and Chen, Yuxin and Hu, Wenbo and Guo, Meng-Hao and Zhang, Fang-Lue and Shan, Ying and Hu, Shi-Min},
    journal={arXiv preprint arXiv:2605.10922},
    year={2026}
}

📜 License

This project is released under the MIT License. The third-party components included in this project remain licensed under their respective original terms; see NOTICE for the full list of dependencies and their licenses.
