position ids for kv-cache #71
Conversation
How does this differ from @ariG23498's PR (#69)? Did you test it with the benchmark scripts to see if there is a noticeable speedup?
Shouldn't we put it in the same PR then? Or is it also a general fix? I am a bit confused about what exactly it does.
I didn't have permissions, so I made a PR on the PR... sorry about the confusion. It adds the position_ids so that the RoPE position embeddings are applied correctly, and then we sample new tokens sequentially using the kv-cache, up to max_new_tokens, in generate.
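Roughly, the idea is something like the sketch below (not the PR's actual code; names like `rope_angles`, `prompt_len`, and `cache_len` are illustrative): during cached decoding only the new token is fed in, so its position_id has to be the current cache length rather than 0, otherwise the RoPE rotation is computed for the wrong position.

```python
# Minimal, self-contained sketch (assumed names, not the repo's API) showing why
# explicit position_ids matter when decoding with a kv-cache.
import torch

def rope_angles(position_ids, head_dim, base=10000.0):
    # Standard RoPE frequencies; returns cos/sin tables for the given positions.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    freqs = position_ids.float()[:, None] * inv_freq[None, :]   # (seq, head_dim/2)
    emb = torch.cat((freqs, freqs), dim=-1)                     # (seq, head_dim)
    return emb.cos(), emb.sin()

head_dim = 64

# Prefill: the prompt tokens are rotated with positions 0..prompt_len-1.
prompt_len = 5
cos_prefill, sin_prefill = rope_angles(torch.arange(prompt_len), head_dim)

# Decode step: only the single new token goes through the model, so its
# position_id must be the running sequence length (here 5), not 0.
cache_len = prompt_len
new_position_ids = torch.tensor([cache_len])
cos_new, sin_new = rope_angles(new_position_ids, head_dim)

# Without passing position_ids, the new token would be rotated as if it sat at
# position 0, and its attention scores against the cached keys would be wrong.
print(cos_new.shape)  # torch.Size([1, 64])
```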
@ariG23498 will test it out in the morning, I believe.
Ah okay, I see. I added you to the repo though, you should have permission! |
Loved the implementation.
Commits:
* position ids for rope
* cleanup
* no need for mask
* no mask
* more cleanup
* add back filtering
* more cleanup
* revert the signature of llm's generate and forward
* use self.decoder.lm_use_tokens
* use torch inference_mode
* add back comment
* fix bug
* add back comments
* add back comments
No description provided.