bugfix kv cache quantization with ignored layers #1312
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
Would this be better if `self.k_observers` was of type `dict`?
Force-pushed from 9d7b26e to 8bc85a9
@kylesayrs I originally started with that, but since this is always incremented, a list ends up being a smaller footprint and we still get easy access.
This illustrates that we should really replace these list data structures with dicts, but this is a good change to fix the issue!
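For context, here is a minimal sketch of the dict-keyed alternative discussed above; the `k_observers` name comes from this thread, while the `Observer` placeholder and the lookup helper are assumptions for illustration, not code from this PR:

```python
# Hypothetical sketch: keying per-layer observers by layer_idx instead of
# relying on list position. Gaps left by ignored layers need no padding.

class Observer:
    """Stand-in for the real observer class used during calibration."""

k_observers: dict[int, Observer] = {}

def get_k_observer(layer_idx: int) -> Observer:
    # Lazily create the observer the first time a layer index is seen.
    return k_observers.setdefault(layer_idx, Observer())

obs = get_k_observer(5)          # works even if layers 0-4 are ignored
assert get_k_observer(5) is obs  # later lookups return the same object
```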
Signed-off-by: Brian Dellabetta <[email protected]>
Signed-off-by: Brian Dellabetta <[email protected]>
Force-pushed from 8bc85a9 to aad85a3
SUMMARY:
Users may ignore some layers in their configuration, which means list indices may no longer correspond to layer indices when values are appended to a list for each layer. We must account for that case by padding the lists so that a list index always corresponds to its layer_idx.
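A minimal sketch of the misalignment described above and of the padding fix; the names and values here are illustrative, not the exact code in this PR:

```python
# Suppose layers 0 and 1 are ignored in the recipe, so calibration only
# produces values for layers 2 and 3.
calibrated_layers = (2, 3)

# Naive append: list position no longer matches layer_idx.
scales = []
for layer_idx in calibrated_layers:
    scales.append(f"scale_{layer_idx}")
assert scales[0] == "scale_2"   # layer 2's value lands at index 0
# scales[2] would raise IndexError here, showing the index mismatch

# Padding fix: grow the list with placeholders so that index == layer_idx.
padded: list = []
for layer_idx in calibrated_layers:
    while len(padded) <= layer_idx:
        padded.append(None)
    padded[layer_idx] = f"scale_{layer_idx}"
assert padded == [None, None, "scale_2", "scale_3"]
```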
Resolves #1295. I can confirm I see the error the original poster reported, using their recipe with meta-llama/Meta-Llama-3-8B-Instruct, and that this PR resolves it. I am able to run `lm_eval` on the quantized model. I will move this to ready for review to get feedback.
TEST PLAN:
Added doctests to unit test the new helper function; not sure if these actually get run, though.
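On the doctest question: pytest only collects doctests when invoked with `--doctest-modules` (or when that flag is included in `addopts` in the project's pytest configuration), so they may need to be wired in explicitly. A doctest on a hypothetical padding helper, not necessarily the one added in this PR, would look roughly like this:

```python
def pad_to_index(values: list, idx: int, value, pad=None) -> None:
    """Grow ``values`` with ``pad`` entries so ``idx`` is valid, then set it.

    >>> scales = []
    >>> pad_to_index(scales, 2, "scale_2")
    >>> scales
    [None, None, 'scale_2']
    """
    # Append placeholders until idx is a valid position, then store value.
    while len(values) <= idx:
        values.append(pad)
    values[idx] = value
```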