dpo-qwen-y-v35

A DPO fine-tuned version of Qwen/Qwen3-4B-Instruct-2507. The weights are fully merged at 16-bit precision, so no adapter loading is required.

Training Configuration

  • Method: DPO
  • Epochs: 1
  • Learning rate: 1e-07
  • Beta: 0.1
  • Max sequence length: 1024
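As a generic illustration of what the DPO objective with Beta = 0.1 optimizes (a sketch of the standard DPO loss, not this model's actual training code), the per-preference-pair loss can be written as:

```python
import math

def dpo_loss(policy_chosen_logp, ref_chosen_logp,
             policy_rejected_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed log-probability of a full response under
    either the trained policy or the frozen reference model.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy has not yet moved from the reference, both margins are
# zero and the loss starts at log(2).
print(round(dpo_loss(-10.0, -10.0, -12.0, -12.0), 4))
```

The loss falls below log(2) as soon as the policy raises the chosen response's log-probability relative to the rejected one, with beta scaling how strongly the policy is pulled away from the reference.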
Model Details

  • Format: Safetensors
  • Model size: 4B params
  • Tensor type: BF16

Model Tree

  • yamaTK/dpo-qwen-y-v35-v3, finetuned from Qwen/Qwen3-4B-Instruct-2507
