Skip to content

[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test#37616

Open
AndreasKaratzas wants to merge 2 commits intovllm-project:mainfrom
ROCm:akaratza_fixcohere_openai
Open

[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test#37616
AndreasKaratzas wants to merge 2 commits intovllm-project:mainfrom
ROCm:akaratza_fixcohere_openai

Conversation

@AndreasKaratzas
Copy link
Collaborator

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@mergify mergify bot added the rocm Related to AMD ROCm label Mar 19, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Mar 19, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to fix a flaky test for Cohere/OpenAI embedding parity on ROCm by adding ROCM_EXTRA_ARGS to the test server's configuration. This introduces arguments to disable prefix caching and limit the maximum number of sequences to one on ROCm platforms. While this change successfully stabilizes the test, I have a concern that limiting sequences to one effectively disables batch processing, which undermines the purpose of the test_batch_parity test. My review includes a suggestion to handle this more explicitly to maintain test integrity.

@AndreasKaratzas
Copy link
Collaborator Author

Testing MI325 to see if issue is resolved (added rocm and ready labels).

@AndreasKaratzas AndreasKaratzas added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 20, 2026
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@mergify mergify bot added the frontend label Mar 20, 2026
@AndreasKaratzas AndreasKaratzas marked this pull request as ready for review March 20, 2026 17:25
@AndreasKaratzas AndreasKaratzas requested a review from noooop as a code owner March 20, 2026 17:25
@AndreasKaratzas
Copy link
Collaborator Author

await self._prepare_generators(ctx)
await self._collect_batch(ctx)
try:
await self._collect_batch(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? We now use app level error handlers to convert error responses

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DarkLight1337 The app-level Exception handler at api_server.py:270 handles dimensions=-1 correctly (returns 400), but for the immediately following dimensions=16 request on the same connection, the same ValueError from pooling_params.verify() escapes the Starlette ExceptionMiddleware and crashes the ASGI app. The client gets APIConnectionError instead of BadRequestError.

https://buildkite.com/vllm/amd-ci/builds/6711/steps/canvas?sid=019d088b-f229-4de8-923d-b4c48a62c6fb&tab=output

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

frontend ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

3 participants