
[Frontend] Tool calling parser for Granite 3.0 models #9027

Merged: 10 commits into vllm-project:main on Nov 7, 2024

Conversation

maxdebayser (Contributor) commented Oct 2, 2024

This PR adds a tool calling parser for ibm-granite/granite-3.0-8b-instruct. The smaller models in the Granite 3.0 Language Models collection pass some of the tests.

cc: @njhill
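For illustration, here is a minimal sketch of exercising the parser end to end through the OpenAI-compatible server. It assumes the parser is registered under the tool-parser name granite and that the server is launched with vLLM's --enable-auto-tool-choice / --tool-call-parser flags; the weather tool and prompt are made up:

```python
# Sketch only: calling the tool-calling API against a vLLM server that was
# started with something like:
#   vllm serve ibm-granite/granite-3.0-8b-instruct \
#       --enable-auto-tool-choice --tool-call-parser granite
# (the parser name "granite" is an assumption based on this PR's description)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A purely illustrative tool definition.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="ibm-granite/granite-3.0-8b-instruct",
    messages=[{"role": "user", "content": "What's the weather in Dublin?"}],
    tools=tools,
    tool_choice="auto",
)

# If the parser recognizes the model's tool-call output, it comes back as
# structured tool_calls rather than plain text.
print(response.choices[0].message.tool_calls)
```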

github-actions bot commented Oct 2, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

njhill changed the title from "[Frontend] Tool calling parser for granite-8b-instruct" to "[Frontend] Tool calling parser for Granite 3.0 models" on Oct 24, 2024
njhill (Member) left a comment

Thanks @maxdebayser!

(5 resolved review comments on docs/source/serving/openai_compatible_server.md, now outdated)
This model supports all the cases in our unit tests.

I had to rebase this due to DCO problems in several commits that have since been merged into main.

Signed-off-by: Max de Bayser <[email protected]>
maxdebayser marked this pull request as ready for review on October 31, 2024, 18:50
maxdebayser (Contributor, Author)

This is ready for review.

cc: @njhill @wseaton

njhill (Member) left a comment

Thanks @maxdebayser! I did a first pass and left some comments.

njhill (Member) commented Nov 1, 2024

cc @K-Mistele in case you'd like to take a look too

Signed-off-by: Max de Bayser <[email protected]>
K-Mistele (Contributor)

will take another look!

njhill added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Nov 6, 2024
njhill (Member) left a comment

Thanks again @maxdebayser. I'll hold off for any comments from @K-Mistele before merging.

K-Mistele (Contributor)

Giving it a final once-over :)

Out of curiosity, it looks like the model's default context is 4096; is there any way to scale this if I want to test longer-context tool calls? No worries if there's not a good way to do this with vLLM.

K-Mistele (Contributor)

Looks great to me! Seems pretty robust, and passing tests on my machine :)

K-Mistele (Contributor)

Looks like the only failing tests were AMD-related, which was expected last I checked. cc @mgoin @DarkLight1337

DarkLight1337 (Member)

> Looks like the only failing tests were AMD-related, which was expected last I checked. cc @mgoin @DarkLight1337

The test is not failing on main branch, so I think it's introduced by this PR. Do tell me if I'm mistaken though.

K-Mistele (Contributor)

> Looks like the only failing tests were AMD-related, which was expected last I checked. cc @mgoin @DarkLight1337
>
> The test is not failing on main branch, so I think it's introduced by this PR. Do tell me if I'm mistaken though.

Oops, missed that. I just remembered the last time I did one there were some expected to fail, but I'm sure you're right :)

Seems related to the FP8 granite 20B quantization (mbayser/granite-20b-functioncalling-FP8-KV) that's being used:

https://buildkite.com/vllm/ci-aws/builds/10840#01930511-9576-45cb-aebb-e237d5f07c9b/974-5501

AMD supports FP8 in ROCm > 6.2. It could also be an OOM issue, since this is a 20B model that we had trouble fitting on other CI GPUs; or it could be related to CPU offloading.

Whatever the case, seems more like a model+configuration+hardware compatibility issue rather than a code issue. Is there any way to disable this particular model for AMD tests, and would that be an acceptable solution?

DarkLight1337 (Member)

I think this is alright. @njhill any thoughts?

maxdebayser (Contributor, Author)

I've pushed a change to skip the granite-20b-functioncalling test on AMD. Let's see if it passes now 🤞
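(For illustration only, not necessarily the exact change that was pushed: a skip like that is often expressed in pytest roughly as follows, assuming ROCm is detected via torch.version.hip; the test name here is hypothetical.)

```python
# Hypothetical sketch of skipping one model's test on AMD/ROCm hardware.
import pytest
import torch

# torch.version.hip is a version string on ROCm builds of PyTorch, None otherwise.
IS_ROCM = torch.version.hip is not None


@pytest.mark.skipif(
    IS_ROCM,
    reason="granite-20b-functioncalling FP8 model does not run reliably on the AMD CI runner",
)
def test_granite_20b_functioncalling_tool_calls():
    ...  # exercise mbayser/granite-20b-functioncalling-FP8-KV here
```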

maxdebayser (Contributor, Author)

> Out of curiosity, it looks like the model's default context is 4096; is there any way to scale this if I want to test longer-context tool calls? No worries if there's not a good way to do this with vLLM.

I'm not sure, but as we add more models and tests, we might have to get the maximum sequence length for each model and skip the tests that require longer contexts.
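For illustration, such a per-model guard might look roughly like this (the helper, threshold, and test names are hypothetical, and it assumes the maximum length can be read from the model's Hugging Face config):

```python
# Hypothetical sketch of a per-model length guard for long-context tool tests.
import pytest
from transformers import AutoConfig


def get_max_model_len(model_name: str) -> int:
    """Best-effort read of a model's maximum sequence length from its HF config."""
    config = AutoConfig.from_pretrained(model_name)
    # Different model families expose this under different attribute names.
    for attr in ("max_position_embeddings", "n_positions", "seq_length"):
        value = getattr(config, attr, None)
        if value is not None:
            return value
    return 2048  # conservative fallback if no length field is found


LONG_CONTEXT_TOKENS = 8192  # tokens the hypothetical long-context test needs


@pytest.mark.parametrize("model", ["ibm-granite/granite-3.0-8b-instruct"])
def test_long_context_tool_call(model):
    if get_max_model_len(model) < LONG_CONTEXT_TOKENS:
        pytest.skip(f"{model} cannot fit a {LONG_CONTEXT_TOKENS}-token prompt")
    ...  # run the long-context tool-calling scenario here
```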

Signed-off-by: Max de Bayser <[email protected]>
njhill (Member) commented Nov 7, 2024

Thanks again @maxdebayser @K-Mistele @DarkLight1337!

njhill merged commit ae62fd1 into vllm-project:main on Nov 7, 2024 (47 checks passed)
Isotr0py pushed a commit to Isotr0py/vllm that referenced this pull request Nov 8, 2024
omer-dayan pushed a commit to omer-dayan/vllm that referenced this pull request Nov 10, 2024
JC1DA pushed a commit to JC1DA/vllm that referenced this pull request Nov 11, 2024
sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request Nov 14, 2024
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024
mfournioux pushed a commit to mfournioux/vllm that referenced this pull request Nov 20, 2024
Labels: ci/build, documentation, frontend, ready
Projects: None yet
4 participants