Instant-NuRec | Model Card

Description:

Instant-NuRec is a model that takes a series of images as input and outputs Gaussian Splats. The model uses an alternate-attention Vision Transformer encoder following the Depth-Anything-v3 (DAv3) design and is initialized from the DAv3 ViT-Base checkpoint (DINOv2-based) before being finetuned on NVIDIA AV data. Instant-NuRec allows users to generate Gaussian Splats in less than 2 minutes. This model was trained to take up to 90 input images (5 views x 18 frames) with a resolution of 504x280.

This model is ready for commercial/non-commercial use.

License/Terms of Use:

Governing Terms: Use of this model system is governed by the NVIDIA Open Model License Agreement.

Deployment Geography: Global

Release Management:

Instant-NuRec is published as a standalone GitHub repository for code, with model weights distributed via Hugging Face.

Release date: June 2026
GitHub code: https://github.com/NVIDIA/instant-nurec
Hugging Face model and weights: https://huggingface.co/nvidia/instant-nurec
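
As a rough illustration of fetching the weights from the Hugging Face repository listed above, the sketch below uses the `huggingface_hub` client. It assumes `huggingface_hub` is installed and that any access terms on the repository have been accepted. The `download_weights` helper and the default target directory are hypothetical, and the layout of the downloaded files is not described in this card.

```python
REPO_ID = "nvidia/instant-nurec"  # taken from the Hugging Face URL in this card


def download_weights(local_dir: str = "instant-nurec") -> str:
    """Download a snapshot of the model repository and return its local path.

    Hypothetical helper: the directory name is an arbitrary choice, and the
    repository's file layout is not documented in this card.
    """
    # Imported lazily so this module can be loaded without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_weights()` requires network access; the inference code itself lives in the GitHub repository listed above.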

Use Case:

Instant-NuRec is intended for Physical AI developers who want to create 3D automotive scenes for closed-loop simulation or Synthetic Data Generation (SDG).

Known Technical Limitations:

The model is not guaranteed to perform well on scenes outside its training distribution. It was not trained on extreme weather conditions, and night scenes are sparsely represented in the training data.

Known Risk(s):

AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. When generation is unsuccessful, the output may not accurately represent the real-world scene and should not be relied upon in safety-critical simulations.

Model Architecture:

Instant-NuRec is built on a Vision Transformer encoder that follows the alternate-attention design of Depth-Anything-v3. The encoder is paired with several lightweight DPT-style decoder heads that predict the sky cubemap, camera ISP parameters, depth and context, motion, and Gaussian Splatting attributes. Together, these heads produce the per-pixel attributes consumed by the 3D Gaussian representation.

Architecture Type: Transformer

Network Architecture: Other Not Listed - alternate-attention Vision Transformer (ViT-Base, DAv3 design) with DPT-style decoder heads.

This model was developed based on Depth-Anything-v3 ViT-Base, which is itself initialized from DINOv2.

Number of model parameters: 202M

Model Input:

Input Type(s): NCoreV4 file

Input Format: Red, Green, Blue (RGB)

Input Parameters: Two-Dimensional (2D)

Other Properties Related to Input:

The NCoreV4 file packages, per scene:

Up to 90 RGB images (5 views x 18 frames at 2-4 Hz) at a resolution of 504x280
Camera 6-DoF pose (orientation and translation) for each image
Camera intrinsics / field of view for each image
Optional cuboid tracks of dynamic actors in the scene, represented as sequential 3D bounding box trajectories with fixed spatial size
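
The input limits listed above can be expressed as a small pre-flight check. This is a minimal sketch, not the NCoreV4 API: `SceneInputSpec` and `check_scene` are hypothetical names introduced for illustration, and only the numeric limits (5 views x 18 frames, 504x280, 2-4 Hz) come from this card. It also assumes that 504x280 means width x height.

```python
from dataclasses import dataclass

# Limits taken from this model card; everything else here is illustrative.
MAX_VIEWS = 5
MAX_FRAMES = 18
MAX_IMAGES = MAX_VIEWS * MAX_FRAMES  # 5 views x 18 frames = 90 images
EXPECTED_RESOLUTION = (504, 280)     # assumption: (width, height)
FRAME_RATE_RANGE_HZ = (2.0, 4.0)


@dataclass(frozen=True)
class SceneInputSpec:
    """Hypothetical summary of one scene's inputs (not the NCoreV4 schema)."""
    num_views: int
    num_frames: int
    width: int
    height: int
    frame_rate_hz: float


def check_scene(spec: SceneInputSpec) -> list[str]:
    """Return human-readable problems; an empty list means the spec fits."""
    problems = []
    # Assumption: only the total image count is checked, because the card
    # does not say whether other view/frame splits are supported.
    total_images = spec.num_views * spec.num_frames
    if total_images > MAX_IMAGES:
        problems.append(f"{total_images} images exceeds the limit of {MAX_IMAGES}")
    if (spec.width, spec.height) != EXPECTED_RESOLUTION:
        problems.append(f"resolution {spec.width}x{spec.height} is not 504x280")
    low, high = FRAME_RATE_RANGE_HZ
    if not low <= spec.frame_rate_hz <= high:
        problems.append(f"frame rate {spec.frame_rate_hz} Hz is outside {low}-{high} Hz")
    return problems
```

For example, `check_scene(SceneInputSpec(5, 18, 504, 280, 2.0))` returns an empty list, while a scene with 19 frames per view reports that the image count is over the limit.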

Model Output:

Output Type(s): One or more PLY files containing 3D Gaussian particles

Output Format: Polygon File Format (PLY)

Output Parameters: Three-Dimensional (3D)

Other Properties Related to Output:

A PLY file (Polygon File Format) contains 3D model data with the following specific components:

Header: Defines the file structure, including format (ASCII or binary), the vertex element, its properties (x, y, z coordinates plus Gaussian attributes), and data types such as float and int.
Vertex Data: One entry per Gaussian. Each entry stores the Gaussian's world-space position (x, y, z).
Custom Data: Additional per-vertex properties that hold the Gaussian attributes, such as scale, rotation, color, opacity, and a semantic label indicating whether a Gaussian belongs to the road, background, or foreground.

3D Gaussian Splatting PLYs do not contain face data. The scene is represented purely as a collection of Gaussian primitives stored as vertex entries.
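
To make the header description concrete, the sketch below parses a PLY header with the Python standard library and lists the vertex properties. It relies only on the generic PLY header grammar (`format`, `element`, `property`, `end_header`). The example header is made up for illustration, and the `opacity` property name in it is an assumption, since the exact attribute names written by Instant-NuRec are not given in this card.

```python
def parse_ply_header(data: bytes) -> dict:
    """Parse the ASCII header at the start of a PLY file held in memory."""
    end = data.find(b"end_header")
    if not data.startswith(b"ply") or end == -1:
        raise ValueError("not a PLY file")
    lines = data[:end].decode("ascii").splitlines()

    info = {"format": None, "vertex_count": 0, "properties": []}
    current_element = None
    for line in lines[1:]:  # skip the leading "ply" magic line
        parts = line.split()
        if not parts or parts[0] == "comment":
            continue
        if parts[0] == "format":
            info["format"] = parts[1]
        elif parts[0] == "element":
            current_element = parts[1]
            if current_element == "vertex":
                info["vertex_count"] = int(parts[2])
        elif parts[0] == "property" and current_element == "vertex":
            # "property <type> <name>": store (name, type) for scalar properties.
            info["properties"].append((parts[-1], parts[1]))
    return info


# Illustrative header only; real files will declare more attributes.
example = (
    b"ply\n"
    b"format binary_little_endian 1.0\n"
    b"element vertex 2\n"
    b"property float x\n"
    b"property float y\n"
    b"property float z\n"
    b"property float opacity\n"
    b"end_header\n"
)
header = parse_ply_header(example)
print(header["vertex_count"], [name for name, _ in header["properties"]])
```

The vertex count corresponds to the number of Gaussians, and the absence of a `face` element matches the note above that these files carry no face data.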

Software Integration:

Runtime Engine(s):

PyTorch-based inference, distributed via standalone GitHub repository

Hardware Compatibility:

Supported Hardware Microarchitecture Compatibility:

NVIDIA Ampere
NVIDIA Blackwell
NVIDIA Hopper
NVIDIA Lovelace

Preferred/Supported Operating Systems: Linux

Hardware Specific Requirements:

The model can run on a single NVIDIA GPU with CUDA Compute Capability greater than or equal to 8.0. The following is required:

GPU performance >= 300 TFLOPS
GPU memory size >= 30 GB for inference / 80 GB for training
GPU memory bandwidth >= 768 GB/s
System RAM >= 32 GB
System disk storage >= 100 GB
CPU >= 16 threads x 3 GHz
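
Two of the requirements above, compute capability and GPU memory, can be queried directly from PyTorch. The sketch below shows one possible check under stated assumptions: the function names are hypothetical, "30 GB" is read as 30 x 10^9 bytes, and the TFLOPS and bandwidth figures are not checked because PyTorch does not report them.

```python
MIN_COMPUTE_CAPABILITY = (8, 0)           # from this card
MIN_INFERENCE_MEMORY_BYTES = 30 * 10**9   # assumption: decimal gigabytes


def meets_inference_requirements(capability: tuple[int, int],
                                 total_memory_bytes: int) -> bool:
    """Pure check of compute capability and memory against the card's limits."""
    return (tuple(capability) >= MIN_COMPUTE_CAPABILITY
            and total_memory_bytes >= MIN_INFERENCE_MEMORY_BYTES)


def check_local_gpu(device_index: int = 0) -> bool:
    """Query the local GPU through PyTorch and apply the check above."""
    # Imported lazily so the pure check works on machines without PyTorch.
    import torch

    if not torch.cuda.is_available():
        return False
    capability = torch.cuda.get_device_capability(device_index)
    total_memory = torch.cuda.get_device_properties(device_index).total_memory
    return meets_inference_requirements(capability, total_memory)
```

Keeping the comparison separate from the PyTorch query makes it possible to evaluate a candidate GPU from its datasheet values without having the hardware at hand.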

NVIDIA AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA hardware and software frameworks, the model can achieve faster training and inference times compared to CPU-only solutions.

Model Version:

Instant_NuRec_v1

Inference:

Engine: PyTorch

Test Hardware:

NVIDIA H100 (Hopper, datacenter - primary training/inference)
NVIDIA A100 (Ampere, datacenter)
NVIDIA RTX 5090 (Blackwell, consumer - validated for local single-GPU inference)

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy subcards below.

Please make sure you have proper rights and permissions for all input image and video content. If image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.

Please report model quality, risk, security vulnerabilities, or NVIDIA AI Concerns here.

Model Card++

Bias

Field	Response
Participation considerations from adversely impacted groups protected classes in model design and testing:	None
Measures taken to mitigate against unwanted bias:	None

Explainability

Field	Response
Intended Task/Domain:	Advanced Driver Assistance Systems
Model Type:	Image-to-3D Gaussians
Intended Users:	Autonomous vehicle developers enhancing and improving Neural Reconstruction pipelines.
Output:	3D Gaussian Splats as one or more PLY files.
Describe how the model works:	The model takes a series of input images and outputs a Gaussian Splatting scene.
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of:	Not Applicable
Technical Limitations & Mitigation:	The model is not guaranteed to perform well on scenes outside its training distribution. It was not trained on extreme weather conditions, and night scenes are sparsely represented in the training data.
Verified to have met prescribed NVIDIA quality standards:	Yes
Performance Metrics:	PSNR (Peak Signal-to-Noise Ratio)
Potential Known Risks:	AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. When generation is unsuccessful, the output may not accurately represent the real-world scene and should not be relied upon in safety-critical simulations.
Licensing:	Use of this model system is governed by the NVIDIA Open Model License Agreement.
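
For reference, the PSNR metric named in the table above is conventionally defined as 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between a reference image and its reconstruction. The sketch below is a generic, standard-library implementation over flat pixel sequences. It is not the evaluation code used for this model, and the card does not state which views or value range the reported metric uses.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(reconstruction) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a closer match; identical inputs give infinity, and the value drops as the rendered image departs from the reference.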

Privacy

Field	Response
Generatable or reverse engineerable personal data?	No
Personal data used to create this model?	Yes
Was consent obtained for any personal data used?	No
Is a mechanism in place to honor data subject right of access or deletion of personal data?	Yes
If personal data was collected for the development of the model, was it collected directly by NVIDIA?	Yes
If personal data was collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects?	Yes
If personal data was collected for the development of this AI model, was it minimized to only what was required?	Yes
How often is the dataset reviewed?	Before release
Is there provenance for all datasets used in training?	Yes
Does data labeling (annotation, metadata) comply with privacy laws?	Yes
Is data compliant with data subject requests for data correction or removal, if such a request was made?	Yes
Was data from user interactions with the AI model, such as user input and prompts, used to train the model?	No
Applicable Privacy Policy	https://www.nvidia.com/en-us/about-nvidia/privacy-policy/

Safety & Security

Field	Response
Model Application Field(s):	3D Asset Generation
Describe the life critical impact.	Not Applicable. The model is not intended for direct life-critical decision-making, and outputs should not be used as the sole basis for autonomous vehicle perception, robotics control, or operational safety decisions. Additional validation and testing should be incorporated prior to deployment in real-world production.
Use Case Restrictions:	Abide by NVIDIA Open Model License
Model and dataset restrictions:	The principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to.
