# Maincoder 1B – ONNX (Quantized, WebGPU)
This is a quantized ONNX version of Maincode/Maincoder-1B, optimized for in-browser inference with Transformers.js and WebGPU.
## Quantization
- Format: ONNX with int4 (MatMulNBits) quantization
- Original model size: ~5 GB (fp32)
- Quantized model size: ~1.5 GB (q4)
- Quantization method: `MatMulNBitsQuantizer` from `onnxruntime`, with `block_size=32` and symmetric quantization (see the sketch below)
All tensor data is embedded in a single `.onnx` file (no external data files) for browser compatibility.
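
For reference, the conversion can be reproduced along these lines. This is a minimal sketch, not the exact script used: it assumes a recent `onnxruntime` release that exposes `MatMulNBitsQuantizer` (under `onnxruntime.quantization.matmul_nbits_quantizer`), and the file names are illustrative.

```python
import onnx
from onnxruntime.quantization.matmul_nbits_quantizer import MatMulNBitsQuantizer

# Load the fp32 ONNX export of the base model (file name is illustrative).
model = onnx.load("model.onnx")

# int4 weight-only quantization of MatMul nodes, matching the settings above:
# 32-element blocks with symmetric scales.
quantizer = MatMulNBitsQuantizer(model, block_size=32, is_symmetric=True)
quantizer.process()

# Write a single self-contained .onnx file (no external data files),
# which is what in-browser loading requires.
quantizer.model.save_model_to_file("model_q4.onnx", use_external_data_format=False)
```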
## Usage with Transformers.js
```js
import { AutoModelForCausalLM, AutoTokenizer } from "@huggingface/transformers";

// Load the q4 weights and run on WebGPU.
const model = await AutoModelForCausalLM.from_pretrained(
  "shreyask/Maincoder-1B-ONNX-web",
  { dtype: "q4", device: "webgpu" }
);
const tokenizer = await AutoTokenizer.from_pretrained(
  "shreyask/Maincoder-1B-ONNX-web"
);

const messages = [
  { role: "system", content: "You are Maincoder, an expert code generation assistant." },
  { role: "user", content: "Write a binary search function in Python" },
];

// Build model inputs (input_ids, attention_mask) from the chat template.
const input = tokenizer.apply_chat_template(messages, {
  add_generation_prompt: true,
  return_dict: true,
});

const output = await model.generate({
  ...input,
  max_new_tokens: 1024,
  eos_token_id: [151643, 151645], // stop on end-of-text / end-of-turn
});

// Decode the generated sequence (includes the prompt tokens).
const decoded = tokenizer.batch_decode(output, { skip_special_tokens: true });
console.log(decoded[0]);
```
## Base Model
This is a quantized conversion of Maincode/Maincoder-1B. See the base model card for training details, benchmarks, and intended use.