Text Generation
• 0.6B • Updated • 138
• 1
Text Generation
• 0.8B • Updated • 38
AIPlans/Qwen3-0.6B-ORPO-Crosscoder-MixedDataset
Updated
AIPlans/Qwen3-0.6B-GRPO-Crosscoder-MixedDataset
Updated
AIPlans/Qwen3-0.6B-KTO-Crosscoder-MixedDataset
Updated
AIPlans/Qwen3-0.6B-IPO-Crosscoder-MixedDataset
Updated
Reinforcement Learning
• 0.6B • Updated • 4
• 2
AIPlans/Qwen3-0.6B-GRPO-RM_NVIDIA
Text Generation
• 0.6B • Updated • 1
AIPlans/Qwen3-0.6B-GRPO_Epoch2
Text Generation
• 0.6B • Updated • 1
AIPlans/Qwen3-0.6B-GRPO_Epoch1
Text Generation
• 0.6B • Updated • 2
Reinforcement Learning
• 0.6B • Updated • 15
• 1
AIPlans/qwen3-0.6b-base-PPO-hs2
Updated
AIPlans/Qwen3-0.6B-DPO_Epoch_1
Text Generation
• 0.6B • Updated • 1
AIPlans/Qwen3-0.6B-SFT-hs2
Text Generation
• 0.6B • Updated • 5
AIPlans/Qwen3-0.6B-RM-hs2
Text Classification
• 0.6B • Updated • 7
• 1
Text Generation
• Updated • 13
AIPlans/Qwen3-0.6B-DPO_NOTLORA
Text Generation
• 0.6B • Updated • 8
Text Generation
• Updated • 4
• 1
Text Generation
• Updated • 8
AIPlans/qwen3-0.6b-hh-rlhf-sft
0.6B • Updated • 4
AIPlans/Qwen3-0.6B-KTO_trial
Text Generation
• 0.6B • Updated • 3
• 1
AIPlans/qwen3-0.6b-sft-hh-rlhf-lora
Updated
AIPlans/qwen3-0.6b-base-PPO-PM
AIPlans/qwen3-0.6b-base-hl-RM
Text Classification
• 0.6B • Updated • 2
0.6B • Updated
AIPlans/qwen3-0.6b-dpo-lora
Text Generation
• 0.6B • Updated • 2
• 1
AIPlans/qwen3-0.6B-reward-hh-rlhf
Text Generation
• 0.6B • Updated • 1