Pixal3D: Pixel-Aligned 3D Generation from Images

SIGGRAPH 2026

Dong-Yang Li¹ · Wang Zhao²* · Yuxin Chen² · Wenbo Hu² · Meng-Hao Guo¹ · Fang-Lue Zhang³ · Ying Shan² · Shi-Min Hu¹✉

¹Tsinghua University (BNRist) ²Tencent ARC Lab ³Victoria University of Wellington

*Project lead ✉Corresponding author

Pixal3D generates high-fidelity 3D assets from a single image. Unlike previous methods that loosely inject image features via attention, Pixal3D explicitly lifts pixel features into 3D through back-projection, establishing direct pixel-to-3D correspondences. This enables near-reconstruction-level fidelity with detailed geometry and PBR textures.

✨ News

May 2026: Release training code and data preparation toolkit. 🔧
May 2026: Release the improved version based on Trellis.2 backbone. 💪
May 2026: Release inference code and online demo. 🤗
Apr 2026: Our paper is accepted to SIGGRAPH 2026! 🎉

📌 Branches

Branch	Description
`main`	Latest version — improved implementation based on Trellis.2 backbone with better performance.
`paper`	Paper version — original implementation based on Direct3D-S2, corresponding to results reported in our SIGGRAPH 2026 paper.

If you want to reproduce the results in our paper, please switch to the paper branch.

🎮 Try It Online

You can try Pixal3D directly in your browser without any installation via our Hugging Face Gradio demo:

👉 Launch Demo

🚀 Getting Started

Installation

Step 1: Follow TRELLIS.2 Installation

Please first follow the installation guide of TRELLIS.2 to set up the base environment.

Step 2: Install Additional Dependencies

pip install -r requirements.txt

Step 3: Install natten

NATTEN_CUDA_ARCH="xx" NATTEN_N_WORKERS=xx pip install natten==0.21.0 --no-build-isolation

Please replace xx with the CUDA architecture and the number of build workers suitable for your machine.

Step 4: Install utils3d

pip install https://github.com/LDYang694/Storages/releases/download/20260430/utils3d-0.0.2-py3-none-any.whl

Note: requirements-hfdemo.txt is for the Hugging Face Spaces demo (H-series GPU architecture) and may not be compatible with other architectures.

Usage

Inference

Generate a GLB mesh from a single image:

python inference.py --image assets/images/0_img.png --output ./output.glb

Low-VRAM mode (reduces peak VRAM by loading models on-demand):

python inference.py --image assets/images/0_img.png --output ./output.glb --low_vram

By default, the pipeline resolution is 1536 (standard mode) or 1024 (low-VRAM mode). You can override this with --resolution:

# Force 1536 even in low-VRAM mode
python inference.py --image assets/images/0_img.png --output ./output.glb --low_vram --resolution 1536

# Force 1024 in standard mode
python inference.py --image assets/images/0_img.png --output ./output.glb --resolution 1024

Tip: If you don't have flash_attn installed, you can use PyTorch's built-in SDPA backend instead:

ATTN_BACKEND=sdpa python inference.py --image assets/images/0_img.png --output ./output.glb --low_vram

Web Demo

We provide a Gradio web demo for Pixal3D, which allows you to generate 3D meshes from images interactively.

python app.py

Low-VRAM mode is also available for the web demo. The frontend default resolution will automatically switch to 1024 in low-VRAM mode (1536 otherwise), but can be changed manually in the UI.

python app.py --low_vram
# or via environment variable:
LOW_VRAM=1 python app.py

🔧 Training

We provide the full training codebase for reproducing Pixal3D from scratch.

Data Preparation

Prepare view-aligned O-Voxel data and rendered condition images by following the data toolkit instructions:

📂 data_toolkit/README.md

Overview

Pixal3D is trained as a three-stage cascade, each progressively increasing resolution:

Stage	Model	Resolutions	Config Prefix
1	Sparse Structure	32 → 64	`ss_flow_img_dit_*_proj_finetune`
2	Shape	256 → 512 → 1024	`slat_flow_img2shape_*_proj_finetune`
3	Texture	256 → 512 → 1024	`slat_flow_imgshape2tex_*_proj_finetune`

All stages use pixel-aligned projection conditioning and view-aligned latents (2 views by default). Within each stage, start from the lowest resolution and progressively fine-tune to higher resolutions by setting finetune_ckpt in the config.

Quick Start

python train.py \
  --config <CONFIG_JSON> \
  --output_dir <OUTPUT_DIR> \
  --data_dir '<DATA_DIR_JSON>'

--data_dir is a JSON string describing the dataset layout. Different stages require different keys:

Stage	Required keys
Sparse Structure	`base`, `ss_latent`, `render_cond`
Shape	`base`, `shape_latent`, `render_cond`
Texture	`base`, `shape_latent`, `pbr_latent`, `render_cond`

Example: Training All Three Stages

Below we show the full training sequence using ObjaverseXL as an example. Each higher-resolution step requires updating finetune_ckpt in its config JSON to point to the previous checkpoint.

Stage 1: Sparse Structure (32 → 64)

# Resolution 32
python train.py \
  --config configs/gen/ss_flow_img_dit_1_3B_32_bf16_proj_finetune.json \
  --output_dir results/ss_32 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "ss_latent": "datasets/ObjaverseXL_sketchfab/ss_latents/ss_enc_conv3d_16l8_fp16_64_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'

# Resolution 64 (set finetune_ckpt → results/ss_32 checkpoint)
python train.py \
  --config configs/gen/ss_flow_img_dit_1_3B_32_bf16_proj_finetune_ft64.json \
  --output_dir results/ss_ft64 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "ss_latent": "datasets/ObjaverseXL_sketchfab/ss_latents/ss_enc_conv3d_16l8_fp16_64_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'

Stage 2: Shape (256 → 512 → 1024)

# Resolution 256
python train.py \
  --config configs/gen/slat_flow_img2shape_dit_1_3B_256_bf16_proj_finetune.json \
  --output_dir results/shape_256 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_256_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'

# Resolution 512
python train.py \
  --config configs/gen/slat_flow_img2shape_dit_1_3B_256_bf16_proj_finetune_ft512.json \
  --output_dir results/shape_ft512 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_512_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'

# Resolution 1024
python train.py \
  --config configs/gen/slat_flow_img2shape_dit_1_3B_512_bf16_proj_finetune_ft1024.json \
  --output_dir results/shape_ft1024 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_1024_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'

Stage 3: Texture (256 → 512 → 1024)

# Resolution 256
python train.py \
  --config configs/gen/slat_flow_imgshape2tex_dit_1_3B_256_bf16_proj_finetune.json \
  --output_dir results/tex_256 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_256_view", "pbr_latent": "datasets/ObjaverseXL_sketchfab/pbr_latents/tex_enc_next_dc_f16c32_fp16_256_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'

# Resolution 512
python train.py \
  --config configs/gen/slat_flow_imgshape2tex_dit_1_3B_512_bf16_proj_finetune.json \
  --output_dir results/tex_512 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_512_view", "pbr_latent": "datasets/ObjaverseXL_sketchfab/pbr_latents/tex_enc_next_dc_f16c32_fp16_512_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'

# Resolution 1024
python train.py \
  --config configs/gen/slat_flow_imgshape2tex_dit_1_3B_512_bf16_proj_finetune_ft1024.json \
  --output_dir results/tex_ft1024 \
  --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_1024_view", "pbr_latent": "datasets/ObjaverseXL_sketchfab/pbr_latents/tex_enc_next_dc_f16c32_fp16_1024_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'

Additional Options

All command-line arguments

Argument	Description	Default
`--config`	Config JSON path	required
`--output_dir`	Output directory	required
`--data_dir`	Dataset JSON string	`./data/`
`--load_dir`	Checkpoint load directory	`output_dir`
`--ckpt`	Resume from step	`latest`
`--auto_retry`	Retries on failure	`3`
`--tryrun`	Dry run	`false`
`--profile`	Profiling	`false`
`--num_nodes`	Number of nodes	`1`
`--node_rank`	Current node rank	`0`
`--num_gpus`	GPUs per node	all
`--master_addr`	Master address	`localhost`
`--master_port`	Master port	`12666`
`--use_wandb`	Enable W&B logging	`false`
`--wandb_project`	W&B project	`trellis2-training`
`--wandb_name`	W&B run name	basename of `output_dir`
`--wandb_id`	W&B run ID (resume)	—

🌐 Community Projects

We thank the community for building extensions and deployment guides for Pixal3D!

Pixal3D-ComfyUI — ComfyUI integration with deployment guides for Windows, WSL, and more.

🤗 Acknowledgements

This project is heavily built upon Trellis.2 and Direct3D-S2. We sincerely thank the authors for their outstanding work on scalable 3D generation , which serves as the foundation of our codebase and model architecture.

We also thank the following repos for their great contributions:

📄 Citation

If you find this work useful, please consider citing:

@article{li2026pixal3d,
    title={Pixal3D: Pixel-Aligned 3D Generation from Images},
    author={Li, Dong-Yang and Zhao, Wang and Chen, Yuxin and Hu, Wenbo and Guo, Meng-Hao and Zhang, Fang-Lue and Shan, Ying and Hu, Shi-Min},
    journal={arXiv preprint arXiv:2605.10922},
    year={2026}
}

📜 License

This project is released under the MIT License. The third-party components included in this project remain licensed under their respective original terms; see NOTICE for the full list of dependencies and their licenses.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

Image-to-3D

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Paper for logonod/Pixal3D

Pixal3D: Pixel-Aligned 3D Generation from Images

Paper • 2605.10922 • Published 17 days ago • 32