[CI] Optimize entrypoints API server tests by csahithi · Pull Request #23896 · vllm-project/vllm

csahithi · 2025-08-29T03:49:37Z

Purpose

Reduce CI time for entrypoint tests by creating a shared server for each group of tests
Removed v0 references in entrypoint tests
Replaced large models with smaller ones - hmellor/tiny-random-LlamaForCausalLM, microsoft/DialoGPT-small
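A minimal sketch of the shared-server idea, not the PR's actual fixtures. `FakeServer`, `running_server`, `run_per_module`, and `run_shared` are hypothetical names, and the server is simulated so that startups can be counted. In pytest, the shared variant would usually be expressed as a fixture with a wider scope (for example package or session scope rather than module scope).

```python
from contextlib import contextmanager


class FakeServer:
    """Hypothetical stand-in for an API server process that is slow to start."""

    def __init__(self, model: str, startup_log: list):
        self.model = model
        startup_log.append(model)  # record one (expensive) startup

    def shutdown(self) -> None:
        pass  # a real server would terminate its process here


@contextmanager
def running_server(model: str, startup_log: list):
    """Start a server, hand it to the caller, and always shut it down."""
    server = FakeServer(model, startup_log)
    try:
        yield server
    finally:
        server.shutdown()


def run_per_module(modules: dict, model: str) -> int:
    """Baseline: each test module starts its own server. Returns startup count."""
    startup_log = []
    for checks in modules.values():
        with running_server(model, startup_log) as server:
            for check in checks:
                check(server)
    return len(startup_log)


def run_shared(modules: dict, model: str) -> int:
    """Grouped: every module in the group reuses one server. Returns startup count."""
    startup_log = []
    with running_server(model, startup_log) as server:
        for checks in modules.values():
            for check in checks:
                check(server)
    return len(startup_log)
```

With three modules in a group, `run_per_module` starts three servers and `run_shared` starts one. The saving grows with the number of modules that can agree on a single model and set of server arguments.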

Test Plan

Test Result

robertgshaw2-redhat · 2025-08-29T03:56:31Z

wow! great job!

tests/entrypoints/openai/embedding_tests/test_encoder_decoder.py

tests/entrypoints/openai/embedding_tests/test_optional_middleware.py

tests/entrypoints/openai/individual_tests/test_metrics.py

tests/entrypoints/openai/test_chat_with_tool_reasoning.py

tests/entrypoints/openai/multimodal_tests/conftest.py

mergify · 2025-08-29T16:04:41Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @csahithi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

njhill

Thanks @csahithi this is great!!

> Replaced large models with smaller ones - hmellor/tiny-random-LlamaForCausalLM, microsoft/DialoGPT-small

~~Is the reason for the latter that the former doesn't have a chat template?~~
~~If so we can just ask @hmellor to add the llama 3.2 chat template and replace them all with that.~~

Oh sorry I see that it does already have a chat template. Then I'm curious what's the reason for using microsoft/DialoGPT-small too?

I know you have ideas for possible further streamlining but in the interests of incremental improvement could we get this merged first?

Could you fix the merge conflicts? Then we can also see what the new CI timings look like.

hmellor · 2025-08-30T15:31:36Z

If anything needs changing about hmellor/tiny-random-LlamaForCausalLM to make it more useful for our tests do let me know, vLLM testing is what I made it for and it's easy to update!

njhill · 2025-09-05T20:06:16Z

CI failures look related

mergify · 2025-09-07T16:39:48Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @csahithi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

csahithi · 2025-10-01T17:14:00Z

> @csahithi re the fix to test_chat_echo.py, do you know why this wasn't failing before this PR? (with "top_logprobs": -1)

I'm not sure what the cause is; it seems to have started failing after the rebase.
I see that the top_logprobs: -1 change was added recently by PR #25031.

Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>

njhill · 2025-10-01T20:33:18Z

> > @csahithi re the fix to test_chat_echo.py, do you know why this wasn't failing before this PR? (with "top_logprobs": -1)
>
> I'm not sure what the cause is, seems like it started failing after the rebase. I see this change top_logprobs: -1 has been added with this PR #25031 recently

It looks like this is because that test used to run the server with --max-logprobs arg set, as it was specifically intended to test the -1 case. So we may want to have this one keep the module-scoped server.

Not sure if there are others which had test-specific args that we should look out for?
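One way to look for such cases is to group test modules by the server arguments they need: modules with identical arguments can share a server, and any module left alone in its group needs its own. The sketch below is only an illustration. `group_by_server_args` is a hypothetical helper, the module names other than test_chat_echo.py are made up, and `"<N>"` is a placeholder because the thread does not state the value passed to --max-logprobs.

```python
from collections import defaultdict


def group_by_server_args(required_args: dict) -> dict:
    """Map each distinct server-argument tuple to the modules that need it."""
    groups = defaultdict(list)
    for module, args in required_args.items():
        # Keep the argument order: flags and their values must stay paired.
        groups[tuple(args)].append(module)
    return dict(groups)


# Illustrative input only; "<N>" is a placeholder, not a real value.
required_args = {
    "test_a.py": [],
    "test_b.py": [],
    "test_chat_echo.py": ["--max-logprobs", "<N>"],
}

groups = group_by_server_args(required_args)
shared = groups[()]  # modules that can use the default shared server
dedicated = {args: mods for args, mods in groups.items() if args}
```

Here `shared` holds the two modules without special arguments, while `dedicated` shows that test_chat_echo.py still needs a server started with its own flags.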

njhill · 2025-10-01T20:55:23Z

Interestingly it seems that top_logprobs: -1 actually results in 0 top logprobs returned, seems like a bug; the test doesn't actually check for this (setting it to 50 does result in top 50 logprobs returned).

We can separately follow up on that but for now it would be good to have this test at least work as it did before, i.e. exercise the -1 case even though it's not properly checking the top_logprob count of the output. We can still change it to use hmellor/tiny-random-LlamaForCausalLM.

I've opened #26194 for this issue.
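For reference, a count check of the kind the test currently lacks could be sketched as below. It assumes the OpenAI-style chat completion response shape, where each entry in `choices[0].logprobs.content` carries a `top_logprobs` list. `top_logprob_counts` and `assert_top_logprobs` are hypothetical helpers, not code from this PR, and the expected count for the -1 case is left to the caller since it depends on the server's --max-logprobs setting.

```python
def top_logprob_counts(response: dict) -> list:
    """Return the number of top_logprobs entries for each returned token."""
    content = response["choices"][0]["logprobs"]["content"]
    return [len(token["top_logprobs"]) for token in content]


def assert_top_logprobs(response: dict, expected: int) -> None:
    """Fail unless every token carries exactly `expected` top logprobs."""
    counts = top_logprob_counts(response)
    assert counts, "response contains no token logprobs"
    assert all(count == expected for count in counts), (
        f"expected {expected} top logprobs per token, got {counts}"
    )
```

A test that only verifies the request succeeds would pass even when zero top logprobs come back. Calling `assert_top_logprobs(response, expected)` on the parsed JSON body would make that case fail.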

Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>

…mize

Signed-off-by: Nick Hill <nhill@redhat.com>

…nt-tests-optimize

mergify · 2025-10-06T20:18:10Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @csahithi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

chaunceyjiang · 2025-10-09T07:42:03Z

> It looks like this is because that test used to run the server with --max-logprobs arg set, as it was specifically intended to test the -1 case. So we may want to have this one keep the module-scoped server.
>
> Not sure if there are others which had test-specific args that we should look out for?

Yes, top_logprobs: -1 depends on the --max-logprobs parameter.

> Interestingly it seems that top_logprobs: -1 actually results in 0 top logprobs returned, seems like a bug; the test doesn't actually check for this (setting it to 50 does result in top 50 logprobs returned).

I've submitted a PR to fix this issue. Sorry, the original PR didn't check the length of top_logprobs.

@csahithi @njhill #26470

hmellor · 2025-10-09T15:39:45Z

These conflicts are caused by our migration to ruff. Please see https://vllm-dev.slack.com/archives/C07R5Q1Q2BB/p1759663228844749 which contains detailed instructions to make updating your branch as painless as possible.

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

hmellor · 2025-11-04T17:37:47Z

I've got the merge past ruff, but it wasn't a clean process: the procedure we came up with doesn't handle deleted files well, so I had to do it manually.

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

njhill · 2025-11-04T17:57:38Z

Thanks @hmellor!

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

mergify · 2025-11-07T19:29:23Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @csahithi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

DarkLight1337 · 2025-11-27T15:12:42Z

Any update on this?

hmellor · 2025-12-23T18:08:39Z

Closing as it's now too stale. #31228 does a small amount of organising but not on the same scale as this PR did.

mergify bot added ci/build tool-calling labels Aug 29, 2025

github-project-automation bot added this to Tool Calling Aug 29, 2025