Embodied-R1-3B-v1

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation (ICLR 2026)

[🌐 Project Website] [📄 Paper] [🏆 ICLR2026 Version] [🎯 Dataset] [📦 Code]


Model Details

Model Description

Embodied-R1 is a 3B-parameter vision-language model (VLM) for general robotic manipulation. It introduces a Pointing mechanism and uses Reinforced Fine-tuning (RFT) to bridge perception and action, achieving strong zero-shot generalization on embodied tasks.

Figure: Embodied-R1 framework, performance overview, and zero-shot manipulation demos.


Intended Uses

Direct Use

This model is intended for research and benchmarking in embodied reasoning and robotic manipulation tasks, including:

  • Visual target grounding (VTG)
  • Referring region grounding (RRG/REG-style tasks)
  • Open-form grounding (OFG)

Out-of-Scope Use

  • Safety-critical real-world deployment without additional safeguards and validation
  • Decision-making in high-risk domains
  • Any use requiring guaranteed robustness under distribution shift

How to Use

Setup

git clone https://github.com/pickxiguapi/Embodied-R1.git
cd Embodied-R1

conda create -n embodied_r1 python=3.11 -y
conda activate embodied_r1

pip install transformers==4.51.3 accelerate
pip install "qwen-vl-utils[decord]"

Inference

python inference_example.py
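For reference, the sketch below shows how a single image-plus-instruction query might be assembled before generation. It assumes the model follows the Qwen2.5-VL chat-message format (consistent with the `transformers`/`qwen-vl-utils` dependencies above); the exact prompt template should be checked against `inference_example.py` in the repo.

```python
# Minimal sketch of preparing a pointing query for Embodied-R1.
# Assumption: the model uses the Qwen2.5-VL chat format; verify against
# inference_example.py in the repo before relying on this structure.

def build_pointing_messages(image_path: str, instruction: str) -> list:
    """Build a Qwen-VL-style chat message for one image and one instruction."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]

messages = build_pointing_messages(
    "demo.jpg", "put the red block on top of the yellow block"
)
```

With the model loaded via `transformers`, this `messages` list would then be passed through the processor's chat template and on to `generate`, as in the repo's inference script.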

Example Tasks

  • VTG: put the red block on top of the yellow block
  • RRG: put pepper in pan
  • REG: bring me the camel model
  • OFG: loosening stuck bolts

(Visualization examples are available in the project repo: assets/)
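Because the model answers grounding queries with 2-D points, downstream code needs to recover pixel coordinates from the generated text. The helper below is a hedged sketch that assumes coordinates appear as `(x, y)` integer pairs in the answer; the actual output schema should be confirmed against the repo's examples.

```python
import re

def parse_points(answer: str) -> list:
    """Extract integer (x, y) pixel pairs written as "(x, y)" from model output.

    The "(x, y)" textual format is an assumption; check the repo's
    inference examples for the exact output schema.
    """
    pairs = re.findall(r"\((\d+),\s*(\d+)\)", answer)
    return [(int(x), int(y)) for x, y in pairs]

# Example: two grounded points extracted from a model answer.
print(parse_points("The target is at (132, 245) and (300, 41)."))
# -> [(132, 245), (300, 41)]
```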


Evaluation

cd eval
python hf_inference_where2place.py
python hf_inference_vabench_point.py
...
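Pointing benchmarks such as Where2Place typically score whether a predicted point lands inside a ground-truth region mask. The snippet below is a minimal sketch of that check under the assumption that masks are boolean H×W arrays and points are given in (x, y) pixel order; the authoritative metric is whatever the eval scripts implement.

```python
import numpy as np

def point_hit(mask: np.ndarray, point: tuple) -> bool:
    """True if an (x, y) point lies inside the ground-truth region mask.

    mask: boolean array of shape (H, W); point: (x, y) in pixels.
    The (x, y) ordering is an assumption; verify against the eval scripts.
    """
    x, y = point
    h, w = mask.shape
    return 0 <= x < w and 0 <= y < h and bool(mask[y, x])

# Toy example: a rectangular target region.
mask = np.zeros((100, 100), dtype=bool)
mask[40:60, 20:30] = True  # region spans x in [20, 30), y in [40, 60)
print(point_hit(mask, (25, 50)))  # inside the region
print(point_hit(mask, (5, 5)))    # outside the region
```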



Training

Training scripts are available at: https://github.com/pickxiguapi/Embodied-R1/tree/main/scripts

# Stage 1 training
bash scripts/stage_1_embodied_r1.sh

# Stage 2 training
bash scripts/stage_2_embodied_r1.sh

Key files:

  • scripts/config_stage1.yaml
  • scripts/config_stage2.yaml
  • scripts/stage_1_embodied_r1.sh
  • scripts/stage_2_embodied_r1.sh
  • scripts/model_merger.py (checkpoint merging + HF export)

Limitations

  • Performance may vary across environments, camera viewpoints, and unseen object domains.
  • Outputs are generated from visual-language reasoning and may include localization/action errors.
  • Additional system-level constraints (calibration, motion planning, safety checks) are required for real robot deployment.

Citation

@inproceedings{yuan2026embodied,
  title={Embodied-{R1}: Reinforced Embodied Reasoning for General Robotic Manipulation},
  author={Yuan, Yifu and Cui, Haiqin and Huang, Yaoting and Chen, Yibin and Ni, Fei and Dong, Zibin and Li, Pengyi and Zheng, Yan and Tang, Hongyao and Hao, Jianye},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}

@inproceedings{yuan2026seeing,
  title={From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation},
  author={Yuan, Yifu and Cui, Haiqin and Chen, Yibin and Dong, Zibin and Ni, Fei and Kou, Longxin and Liu, Jinyi and Li, Pengyi and Zheng, Yan and Hao, Jianye},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}

Acknowledgements

If this model or its resources are useful for your research, please consider citing our work and starring the repository.
