[Frontend] Tool calling parser for Granite 3.0 models #9027
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Thanks @maxdebayser! This model supports all the cases in our unit tests.
I had to rebase this due to DCO problems in several commits that have now been merged in main.
Force-pushed from 77ee365 to 51e2d3a
Thanks @maxdebayser! I did a first pass and left some comments.
cc @K-Mistele in case you'd like to take a look too
Will take another look!
Thanks again @maxdebayser. I'll hold off for any comments from @K-Mistele before merging.
Giving it a final once-over :) Out of curiosity, it looks like the model's default context is 4096; is there any way to scale this if I want to test longer-context tool calls? No worries if there's not a good way to do this with vLLM.
Looks great to me! Seems pretty robust, and passing tests on my machine :)
Looks like the only failing tests were AMD-related, which was expected last I checked. cc @mgoin @DarkLight1337
The test is not failing on main branch, so I think it's introduced by this PR. Do tell me if I'm mistaken though. |
Oops, missed that. I just remembered that the last time I did one of these there were some tests expected to fail, but I'm sure you're right :) It seems related to the FP8 Granite 20B quantization (https://buildkite.com/vllm/ci-aws/builds/10840#01930511-9576-45cb-aebb-e237d5f07c9b/974-5501). AMD supports FP8 only in ROCm > 6.2. It could also be an OOM issue, since this is a 20B model that we had trouble fitting into other CI GPUs, or it could be related to CPU offloading. Whatever the case, it seems more like a model+configuration+hardware compatibility issue than a code issue. Is there any way to disable this particular model for the AMD tests, and would that be an acceptable solution?
I think this is alright. @njhill any thoughts?
I've pushed a change to skip the affected test.
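The skip discussed above could be sketched roughly as follows. This is not the actual vLLM test code; the model identifier, the helper name, and the set of unsupported models are illustrative assumptions.

```python
# Sketch: deciding whether to skip a tool-calling test for a given model on
# ROCm CI. The model name below is a hypothetical identifier, not a real one.
ROCM_UNSUPPORTED_MODELS = {"granite-20b-fp8"}

def should_skip_on_rocm(model: str, on_rocm: bool) -> bool:
    """Return True when the test for `model` should be skipped on ROCm CI."""
    return on_rocm and model in ROCM_UNSUPPORTED_MODELS

# In a pytest test one would then do something like:
#   if should_skip_on_rocm(model, on_rocm=True):
#       pytest.skip("FP8 checkpoint not supported on this ROCm version")
```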
I'm not sure, but as we add more models and tests, we might have to get the max sequence length for each model and skip the tests that require longer ones. |
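The per-model length check suggested above could look something like this sketch. The table of context limits is an assumption for illustration (the 4096 figure comes from the discussion earlier in this thread), not an API vLLM provides.

```python
# Sketch: skip test cases whose prompts need more context than a model
# supports. The max sequence lengths here are illustrative placeholders.
MODEL_MAX_SEQ_LEN = {
    "ibm-granite/granite-3.0-8b-instruct": 4096,
}

def requires_too_much_context(model: str, required_tokens: int) -> bool:
    """True when a test's prompt would exceed the model's context window."""
    max_len = MODEL_MAX_SEQ_LEN.get(model)
    return max_len is not None and required_tokens > max_len
```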
Thanks again @maxdebayser @K-Mistele @DarkLight1337!
This PR adds a tool calling parser for ibm-granite/granite-3.0-8b-instruct. The smaller models in the Granite 3.0 Language Models collection pass some of the tests.
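For illustration, the core of such a parser might look like the minimal sketch below. This is NOT the parser implemented in this PR; the `<tool_call>` marker and the JSON-list format of `{"name", "arguments"}` objects are assumptions made here for the example, and the real Granite chat template may use a different marker.

```python
# Minimal, illustrative sketch of extracting tool calls from model output,
# assuming (hypothetically) that the model emits a JSON list of
# {"name": ..., "arguments": ...} objects after a "<tool_call>" marker.
import json
from typing import Optional

TOOL_CALL_MARKER = "<tool_call>"  # hypothetical marker, not Granite's actual one

def parse_tool_calls(text: str) -> Optional[list]:
    """Return the list of tool calls if the output contains one, else None."""
    idx = text.find(TOOL_CALL_MARKER)
    if idx == -1:
        return None  # plain text response, no tool call
    payload = text[idx + len(TOOL_CALL_MARKER):].strip()
    try:
        calls = json.loads(payload)
    except json.JSONDecodeError:
        return None  # malformed tool-call payload
    return calls if isinstance(calls, list) else None
```

A real parser (like the one in this PR) also has to handle streaming output, where the marker and JSON arrive token by token, which this sketch ignores.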
cc: @njhill