Text Generation
Safetensors
Model Optimizer
llama
nvidia
ModelOpt
Qwen3
quantized
FP4
fp4
conversational
Model repository: nm-testing/convert_ct_dequant-e2e
{
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_length": 2048,
  "pad_token_id": 0,
  "transformers_version": "4.45.0.dev0"
}
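The values above look like a generation configuration: the special token IDs plus a default length cap. The page does not show the filename, so this is an assumption. The sketch below uses only the Python standard library to parse the same values and separate the generation defaults from the version metadata. The names `GENERATION_CONFIG_JSON` and `split_generation_config` are hypothetical, introduced for illustration. With the `transformers` library installed, `GenerationConfig.from_pretrained` would be the more usual way to load such a file.

```python
import json

# The configuration values exactly as listed above (copied inline so the
# example is self-contained; in practice they would be read from the file).
GENERATION_CONFIG_JSON = """
{
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_length": 2048,
  "pad_token_id": 0,
  "transformers_version": "4.45.0.dev0"
}
"""


def split_generation_config(text):
    """Parse the JSON and split it into (generation defaults, metadata).

    Hypothetical helper: "transformers_version" only records which library
    version wrote the file, so it is kept apart from the generation values.
    """
    raw = json.loads(text)
    metadata = {"transformers_version": raw.get("transformers_version")}
    defaults = {k: v for k, v in raw.items() if k != "transformers_version"}
    return defaults, metadata


defaults, metadata = split_generation_config(GENERATION_CONFIG_JSON)

# Report each generation default, sorted by key for stable output.
for key in sorted(defaults):
    print(f"{key} = {defaults[key]}")
print("written by transformers", metadata["transformers_version"])
```

Keeping the version string separate means the remaining dictionary holds only values that affect generation: the beginning-of-sequence, end-of-sequence, and padding token IDs, and the maximum length.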