
[CI] Stabilize multinode DP internal LB completion tests#36356

Merged
njhill merged 3 commits into vllm-project:main from ROCm:akaratza_stabilize_distributed
Mar 16, 2026

Conversation


@AndreasKaratzas (Collaborator) commented Mar 7, 2026

Fixes flaky test_api_only_multinode_dp_completion and test_multinode_dp_completion which intermittently fail with empty model responses during concurrent load balancer testing.

Motivation

> assert len(choice.text) >= 1
E AssertionError: assert 0 >= 1
E  +  where 0 = len('')

These tests intentionally use temperature=1.0 to produce diverse outputs across 200 concurrent requests for realistic load-balancer distribution testing. However, at temperature=1.0 the model can legitimately emit a stop token as its very first token, producing text='' with finish_reason='stop'. Over 400 requests (two bursts of 200), the probability of at least one empty response is high. Rather than switching to temperature=0.0 (which would undermine the test's intent of exercising load balancing with diverse requests), the fix tolerates the valid edge case: when finish_reason='stop', empty text is accepted. The non-empty-text assertion is enforced only when finish_reason='length'.
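The tolerant check can be sketched as follows (hypothetical helper name and messages; the actual test assertions may differ):

```python
def validate_choice(text: str, finish_reason: str) -> None:
    """Sketch of the tolerant assertion: empty text is valid only when
    the model emitted a stop token first (finish_reason='stop')."""
    if finish_reason == "stop":
        # A stop token sampled as the very first token legitimately
        # yields text='' at temperature=1.0, so empty text is accepted.
        return
    # For finish_reason='length' the output was truncated by max_tokens,
    # so the model must have produced at least one character.
    assert len(text) >= 1, (
        f"empty text with finish_reason={finish_reason!r}"
    )
```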

  • _make_completion_request helper: Extracted the duplicated make_request() closure from both non-streaming completion tests into a shared module-level function with diagnostic assertion messages that print actual values on failure.

  • _run_request_bursts helper: Extracted the duplicated two-burst loop pattern (create tasks -> gather -> validate -> sleep) shared by both non-streaming tests.

  • Streaming tests unchanged: They already use temperature=0.0 and have adequate assertions.

cc @kenroche

…ponses at temperature 1

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@mergify mergify bot added the v1 label Mar 7, 2026

@gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses a flaky test issue by correctly handling empty model responses when finish_reason is 'stop'. The refactoring into _make_completion_request and _run_request_bursts helper functions significantly improves code readability and maintainability by removing duplication. The changes are well-justified and implemented correctly. I have one suggestion to further improve the robustness of the new test helper.

Comment on lines +92 to +98

```python
results = await asyncio.gather(*all_tasks)
assert len(results) == num_requests, (
    f"Burst {burst}: expected {num_requests} results, got {len(results)}"
)
assert all(completion is not None for completion in results), (
    f"Burst {burst}: some completions were None"
)
```


Severity: high

Using asyncio.gather without return_exceptions=True can lead to unhandled exceptions and resource leaks if one of the tasks fails. When a task in gather raises an exception, gather propagates that exception immediately, and other tasks might not be cancelled, potentially continuing to run in the background. This can affect the stability of subsequent tests in the suite.

By setting return_exceptions=True, gather will wait for all tasks to complete and return exceptions as results. You can then explicitly check for and handle any exceptions, ensuring a cleaner test shutdown. This improves test robustness.

```python
results = await asyncio.gather(*all_tasks, return_exceptions=True)
assert len(results) == num_requests, (
    f"Burst {burst}: expected {num_requests} results, got {len(results)}"
)

# Raise any exceptions that were caught
for result in results:
    if isinstance(result, BaseException):
        raise result

assert all(completion is not None for completion in results), (
    f"Burst {burst}: some completions were None"
)
```

@AndreasKaratzas (Collaborator, author) replied:

Using return_exceptions=True and re-raising to ensure clean task shutdown. Done :)

…ponses at temperature 1

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@AndreasKaratzas added the ready (ONLY add when PR is ready to merge/full CI is needed) and rocm (Related to AMD ROCm) labels Mar 7, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Mar 7, 2026
@njhill (Member) left a comment


@njhill njhill merged commit 4f9b14c into vllm-project:main Mar 16, 2026
17 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in AMD Mar 16, 2026
@AndreasKaratzas AndreasKaratzas deleted the akaratza_stabilize_distributed branch March 16, 2026 23:09
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026
andylolu2 pushed a commit to andylolu2/vllm that referenced this pull request Mar 18, 2026
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
Labels

ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm), v1

Projects

Status: Done
