Instructions for using internlm/Intern-S2-Preview with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use internlm/Intern-S2-Preview with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="internlm/Intern-S2-Preview", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("internlm/Intern-S2-Preview", trust_remote_code=True, dtype="auto")
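If you need more control than the pipeline offers, a lower-level sketch with AutoProcessor and generate is shown below. This assumes a recent transformers release whose processors accept chat messages via apply_chat_template; max_new_tokens and device_map are illustrative choices, not values from this repo.

from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "internlm/Intern-S2-Preview"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, trust_remote_code=True, dtype="auto", device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]

# Tokenize the chat, run generation, and decode only the newly generated tokens.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))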
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use internlm/Intern-S2-Preview with vLLM:
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "internlm/Intern-S2-Preview"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "internlm/Intern-S2-Preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/internlm/Intern-S2-Preview
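Because the server speaks the OpenAI chat-completions protocol, you can also call it from Python with the official openai client instead of curl. A minimal sketch, assuming the default port from the serve command above (the api_key value is a placeholder; vLLM ignores it unless you configure one):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="internlm/Intern-S2-Preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
    }]
)
print(response.choices[0].message.content)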
- SGLang
How to use internlm/Intern-S2-Preview with SGLang:
Install from pip and serve the model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "internlm/Intern-S2-Preview" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "internlm/Intern-S2-Preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "internlm/Intern-S2-Preview" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "internlm/Intern-S2-Preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
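The SGLang server exposes the same OpenAI-compatible API, so the Python client sketch from the vLLM section works here unchanged apart from the base URL (assuming the port 30000 used in the commands above):

from openai import OpenAI

# Only the base URL differs from the vLLM sketch; the request shape is identical.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")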
- Docker Model Runner
How to use internlm/Intern-S2-Preview with Docker Model Runner:
docker model run hf.co/internlm/Intern-S2-Preview
Intern-S2-Preview Deployment Guide
The Intern-S2-Preview release is a 35B-A3B model with weights stored in bfloat16. This guide provides deployment examples for the following configurations:
- MTP speculative decoding (Recommended)
- Basic serving without MTP
- Long-context inference with YaRN RoPE configuration
NOTE: The commands below are reference configurations. Inference frameworks are under active development, so use the latest framework documentation and your local validation results when tuning production deployments.
LMDeploy
Use the latest LMDeploy (>=0.13.0) with Intern-S2-Preview support.
- Serving With MTP (Recommended)
lmdeploy serve api_server \
internlm/Intern-S2-Preview \
--trust-remote-code \
--backend pytorch \
--tp 2 \
--reasoning-parser default \
--tool-call-parser interns2-preview \
--speculative-algorithm qwen3_5_mtp \
--speculative-num-draft-tokens 4 \
--max-batch-size 256
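Once the server is up, it exposes an OpenAI-compatible endpoint you can smoke-test from Python. A minimal sketch, assuming LMDeploy's default server port of 23333 (set --server-port if yours differs; the api_key value is a placeholder):

from openai import OpenAI

# Point the client at the LMDeploy api_server started above.
client = OpenAI(base_url="http://localhost:23333/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="internlm/Intern-S2-Preview",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)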
- Basic Serving Without MTP
lmdeploy serve api_server \
internlm/Intern-S2-Preview \
--trust-remote-code \
--backend pytorch \
--tp 2 \
--reasoning-parser default \
--tool-call-parser interns2-preview
- Long-Context Serving
For long-context inference, configure both --session-len and YaRN RoPE parameters. The following example uses a 512k context length:
lmdeploy serve api_server \
internlm/Intern-S2-Preview \
--trust-remote-code \
--tp 2 \
--backend pytorch \
--reasoning-parser default \
--tool-call-parser interns2-preview \
--session-len 512000 \
--max-batch-size 64 \
--hf-overrides '{"text_config": {"rope_parameters": {"mrope_interleaved": true, "mrope_section": [11, 11, 10], "rope_type": "yarn", "rope_theta": 10000000, "partial_rotary_factor": 0.25, "factor": 4.0, "original_max_position_embeddings": 262144}}}'
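As a sanity check on these numbers: under the usual YaRN convention, factor scales original_max_position_embeddings, so the override above implies an extended window of roughly 1M positions. Plain arithmetic, not a framework call:

factor = 4.0
original_max_position_embeddings = 262144
extended_window = int(factor * original_max_position_embeddings)
print(extended_window)  # 1048576
assert 512000 <= extended_window  # the 512k session length fits inside the extended window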
vLLM
Use the latest vLLM Docker image or source build with Intern-S2-Preview support.
- Serving With MTP (Recommended)
vllm serve internlm/Intern-S2-Preview \
--trust-remote-code \
--tensor-parallel-size 2 \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--speculative-config '{"method":"mtp","num_speculative_tokens":4}'
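Because the server is launched with --enable-auto-tool-choice and a tool-call parser, tool use can be exercised through the standard OpenAI tools field. A hedged sketch with the openai client; the get_weather tool is hypothetical and only illustrates the request shape:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool definition, used purely for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="internlm/Intern-S2-Preview",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)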
- Basic Serving Without MTP
vllm serve internlm/Intern-S2-Preview \
--trust-remote-code \
--tensor-parallel-size 2 \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder
SGLang
Use the latest SGLang Docker image or source build with Intern-S2-Preview support.
- Serving With MTP (Recommended)
SGLANG_ENABLE_SPEC_V2=1 \
python3 -m sglang.launch_server \
--model-path internlm/Intern-S2-Preview \
--trust-remote-code \
--tp-size 2 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--mamba-scheduler-strategy extra_buffer \
--speculative-algo 'NEXTN' \
--speculative-eagle-topk 1 \
--speculative-num-steps 3 \
--speculative-num-draft-tokens 4
- Basic Serving Without MTP
python3 -m sglang.launch_server \
--model-path internlm/Intern-S2-Preview \
--trust-remote-code \
--tp-size 2 \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder