Context length and regeneration
What was the context length used to train this head?
Did you regenerate the data first?
We used context length 4096 and regenerated the sequences using the large model.
Thank you for the information.
I think there is a risk that after the first 4k tokens of reasoning, the acceptance ratio will drop significantly, since the issue is the training data distribution rather than RoPE/YaRN.
So this head is fine for easier reasoning tasks, but not for AIME, GPQA, etc., where we sometimes need to generate more than 30k tokens before reaching the answer.
What do you think?
Sure, that may be an issue. We haven't tested very long-context tasks, but please let us know how it goes if you do!
Hello, I've run into similar problems here. When testing AIME with this EAGLE head, the acceptance rate is 50-60% with k=3 when the generated sequence is short, but it drops significantly to under 10% once the sequences get longer.
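To verify that the drop correlates with the training context length, it may help to bucket per-token acceptance by generation position. This is just a sketch, not code from the repo: `acceptance_by_bucket` and its `(position, accepted)` record format are hypothetical names; you'd collect the real records from your speculative-decoding loop.

```python
# Hedged sketch: bucket per-token draft-acceptance records by generation
# position to check whether the rate degrades past the 4096-token
# training context length. `records` is a hypothetical log format.
from collections import defaultdict

def acceptance_by_bucket(records, bucket_size=4096):
    """records: iterable of (position, accepted) pairs, where
    `accepted` is True if the draft token was accepted."""
    hits, totals = defaultdict(int), defaultdict(int)
    for pos, accepted in records:
        b = pos // bucket_size
        totals[b] += 1
        hits[b] += int(accepted)
    return {b: hits[b] / totals[b] for b in sorted(totals)}

# Synthetic example: ~60% acceptance below 4096 tokens, ~10% above.
records = [(p, p % 10 < 6) for p in range(4096)] + \
          [(p, p % 10 < 1) for p in range(4096, 8192)]
print(acceptance_by_bucket(records))  # bucket 0 ≈ 0.6, bucket 1 ≈ 0.1
```

If the real curve shows a sharp cliff right at the 4k boundary, that would support the data-distribution explanation over a positional-encoding one.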