Missing SWA implementation?
#3
by hell0ks - opened
Hello,
I'm currently implementing Trillion architecture support for llama.cpp.
However, during testing I found the model is unstable at long context. Through trial and error, it appears the model was trained with SWA (sliding window attention) at a window size of 4096, as the model card says, but the corresponding implementation is missing from the transformers modeling code.
Can you confirm this is correct? Thanks.
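For context, SWA limits each query token to attending over only the most recent `window` key positions instead of the full causal prefix. A minimal sketch of what the boolean attention mask would look like (hypothetical helper for illustration, not the model's actual code):

```python
def sliding_window_causal_mask(seq_len, window=4096):
    # mask[i][j] is True when query position i may attend to key position j:
    # the usual causal constraint (j <= i), plus the sliding-window
    # constraint (i - j < window) that drops keys older than `window` tokens.
    return [[j <= i and i - j < window for j in range(seq_len)]
            for i in range(seq_len)]
```

Without this extra `i - j < window` constraint at inference, positions beyond the trained window see key/value pairs the model never learned to handle, which would match the long-context instability described above.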
hell0ks changed discussion status to closed