inference-optimization/test_tencentbac_fastmtp
Updated • 38
inference-optimization/test_qwen3_next_mtp
Updated • 42
inference-optimization/Qwen3-Next-80B-A3B-Instruct_mtp_speculator
Text Generation • 2B • Updated • 57
inference-optimization/Qwen3-Next-80B-A3B-Instruct-MTP-ultrachat-epoch3
2B • Updated • 18
Inference Optimization
community
AI & ML interests
None defined yet.
FP8-block, FP8-dynamic, NVFP4, w4a16, and w8a8 quantized versions of ibm-granite/granite-4.0-h-small and ibm-granite/granite-4.0-h-tiny
models 212
inference-optimization/Qwen3-30B-A3B-Instruct-2507-quant-test-7-bits-heuristic
26B • Updated • 8
inference-optimization/Qwen3-30B-A3B-Instruct-2507-quant-test-6-bits-heuristic
23B • Updated • 11
inference-optimization/Qwen3-30B-A3B-Instruct-2507-quant-test-6.5-bits-heuristic
25B • Updated • 10
inference-optimization/Qwen3-30B-A3B-Instruct-2507-quant-test-5-bits-heuristic
20B • Updated • 11
inference-optimization/Qwen3-30B-A3B-Instruct-2507-quant-test-5.5-bits-heuristic
22B • Updated • 11
inference-optimization/gpt-oss-120b-from-self-ckpt5-speculator.eagle3
0.9B • Updated • 67
inference-optimization/gpt-oss-120b-from-self-ckpt3-speculator.eagle3
0.9B • Updated • 51
inference-optimization/gpt-oss-120b-from-self-ckpt4-speculator.eagle3
0.9B • Updated • 51
inference-optimization/gpt-oss-120b-from-self-ckpt2-speculator.eagle3
0.9B • Updated • 54
inference-optimization/gpt-oss-120b-from-self-ckpt1-speculator.eagle3
0.9B • Updated • 49