Kkk (Michalea)

AI & ML interests: None yet

Recent Activity:
- New activity 5 days ago in BLR2/Qwen3.5-9B-Eagle3-ShareGPT: "MTP vs Eagle3"
- New activity 13 days ago in GadflyII/GLM-4.7-Flash-MTP-NVFP4: "SGLang and MTP"

Organizations: None yet
MTP vs Eagle3 · 1 · #1 opened 5 days ago by Michalea
SGLang and MTP · 1 · #2 opened 13 days ago by Michalea
RoPE theta 5M instead of 1M · #15 opened 22 days ago by Michalea
The comparison with the original MTP · 👍 1 · 1 · #2 opened 30 days ago by Michalea
Context length / number of generated tokens during training · #4 opened about 1 month ago by Michalea
Datasets used to create this head · #5 opened about 1 month ago by Michalea
A low number of evaluation benchmarks · 2 · #5 opened about 2 months ago by Michalea
Context length and regeneration · 4 · #1 opened about 1 month ago by Michalea
MTP quality, 47 layers · 3 · #7 opened about 2 months ago by Michalea
Efficiency of NVFP4 vs FP16/8 · ➕ 3 · #4 opened about 2 months ago by Michalea
Description of version 3.0 · 2 · #1 opened 2 months ago by jacek2024
Evaluation · 👍 4 · #3 opened about 2 months ago by Michalea
Severe looping/repetitive output when using --kv-cache-dtype fp8 with GLM-4.7-Flash-FP8-Dynamic on vLLM · 4 · #2 opened about 2 months ago by ShelterW
Inconsistent description with the evaluation results · 1 · #3 opened 2 months ago by Michalea
Data used to train the EAGLE head · #5 opened 2 months ago by Michalea
Training details: question about reasoning | /think · 1 · #1 opened 2 months ago by Michalea
Does the tokenizer need to be updated for this model? · 1 · #5 opened 9 months ago by electroglyph