MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
Paper • 2601.07832 • Published Jan 12 • 52
MiniMaxAI/MiniMax-M2.1
Text Generation • 229B • Updated about 1 month ago • 53.2k • 1.27k