[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test#37616
AndreasKaratzas wants to merge 2 commits into vllm-project:main
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Code Review
This pull request aims to fix a flaky test for Cohere/OpenAI embedding parity on ROCm by adding ROCM_EXTRA_ARGS to the test server's configuration. This introduces arguments to disable prefix caching and limit the maximum number of sequences to one on ROCm platforms. While this change successfully stabilizes the test, I have a concern that limiting sequences to one effectively disables batch processing, which undermines the purpose of the test_batch_parity test. My review includes a suggestion to handle this more explicitly to maintain test integrity.
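As a rough illustration of the pattern described in the review, the sketch below appends platform-specific flags to a test server's argument list. Only the name `ROCM_EXTRA_ARGS` comes from the review; the flag spellings are assumed from vLLM's usual CLI conventions and should be checked against the actual diff. `build_server_args`, `batching_is_exercised`, and the `is_rocm` parameter are hypothetical names introduced for this example.

```python
from typing import List

# Flags matching the review summary: disable prefix caching and cap
# concurrent sequences at one. The exact spellings are an assumption,
# not copied from the PR diff.
ROCM_EXTRA_ARGS: List[str] = ["--no-enable-prefix-caching", "--max-num-seqs", "1"]


def build_server_args(base_args: List[str], is_rocm: bool) -> List[str]:
    """Return server CLI args, appending the ROCm-only flags when needed."""
    args = list(base_args)  # copy so the caller's list is not mutated
    if is_rocm:
        args.extend(ROCM_EXTRA_ARGS)
    return args


def batching_is_exercised(args: List[str]) -> bool:
    """Hypothetical check: False when --max-num-seqs is pinned to 1."""
    if "--max-num-seqs" in args:
        idx = args.index("--max-num-seqs")
        return int(args[idx + 1]) > 1
    return True
```

A check like `batching_is_exercised` is one way to make the reviewer's concern explicit: a batch parity test could skip, or be marked as reduced coverage, when the server is restricted to a single sequence, rather than passing without actually batching.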
Testing MI325 to see if issue is resolved (added
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Test has been confirmed green: https://buildkite.com/vllm/amd-ci/builds/6732/steps/canvas?sid=019d0bd9-2a24-4529-a7c3-4c16a3f66397&tab=output
await self._prepare_generators(ctx)
await self._collect_batch(ctx)
try:
    await self._collect_batch(ctx)
Why is this needed? We now use app-level error handlers to convert error responses.
@DarkLight1337 The app-level Exception handler at api_server.py:270 handles dimensions=-1 correctly (returns 400), but for the immediately following dimensions=16 request on the same connection, the same ValueError from pooling_params.verify() escapes the Starlette ExceptionMiddleware and crashes the ASGI app. The client gets APIConnectionError instead of BadRequestError.
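The local handling argued for in this reply can be sketched as follows, assuming the intent is to catch the `ValueError` next to the call and turn it into a 400-style response instead of relying on middleware. This is not the PR's code: `ErrorResponse`, `collect_batch`, and `handle_request` are hypothetical stand-ins, and the validation rule is invented purely so the example runs.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class ErrorResponse:
    """Hypothetical stand-in for the server's error payload."""
    status_code: int
    message: str


async def collect_batch(dimensions: Optional[int]) -> List[List[float]]:
    """Hypothetical stand-in for the batch step; rejects any explicit dimensions."""
    if dimensions is not None:
        # Mimics a verify() step raising ValueError for unsupported input.
        raise ValueError(f"unsupported dimensions: {dimensions}")
    return [[0.0]]


async def handle_request(
    dimensions: Optional[int],
) -> Union[List[List[float]], ErrorResponse]:
    """Catch the validation error locally so every request gets a 400."""
    try:
        return await collect_batch(dimensions)
    except ValueError as exc:
        return ErrorResponse(status_code=400, message=str(exc))
```

With this shape, two consecutive invalid requests (for example `dimensions=-1` followed by `dimensions=16`) each receive an error response, independent of how the surrounding ASGI middleware behaves on a reused connection.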
Follow-up for:
Stabilizes the Cohere test that was failing due to batch invariance issues on ROCm. Addresses the failure in mi325_1: Entrypoints Integration (Pooling).
Motivation: https://buildkite.com/vllm/amd-ci/builds/6701/steps/canvas?sid=019d07a7-1a2e-4d29-91e7-9eb765bc4904&tab=output
Related:
cc @kenroche