
[CI][BugFix] Fix failure CI "amd-v1-sample-plus-logits-mi300-1"#42106

Closed
SoluMilken wants to merge 2 commits into vllm-project:main from SoluMilken:fix/rocm-logitproc-entrypoint-tests


@SoluMilken (Contributor) commented May 8, 2026

Purpose

Fix the failing CI job amd-v1-sample-plus-logits-mi300-1, as seen in these builds:
https://buildkite.com/vllm/ci/builds/65147/canvas?jid=019e06e7-5882-42b3-8c44-6723201294ea&tab=output
https://buildkite.com/vllm/ci/builds/65109/canvas?jid=019e0656-76cd-445e-8324-cfe4b042f924&tab=output
https://buildkite.com/vllm/ci/builds/65102/canvas?jid=019e0637-d718-43e0-b285-eaab513fd44e&tab=output

Root Cause

  • The entrypoint test relied on fork semantics: it monkeypatched importlib.metadata.entry_points in the test process and expected the patch to be visible in the worker processes.
  • ROCm CI starts workers with spawn instead of fork. A spawned worker is a fresh interpreter, so it does not inherit the in-memory monkeypatch.
  • As a result, the dummy logits processor entrypoint was not discovered in the EngineCore/worker processes.
  • The model therefore generated its normal tokens instead of the forced target token sequence.
  • The issue already existed; a recent change that propagates subprocess failures made it visible.
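The mechanism described in the root cause can be reproduced outside vLLM. This is a minimal sketch, not the PR's code: it assumes Python 3.10+ (for entry_points(group=...)), uses a hypothetical entry-point group name example.logits_processors, and uses a fresh interpreter started with subprocess as a stand-in for a spawned EngineCore/worker. The monkeypatch is visible in the parent but not in the child.

```python
import importlib.metadata
import subprocess
import sys

# Hypothetical group and entry-point value, for illustration only.
GROUP = "example.logits_processors"
FAKE_EP = importlib.metadata.EntryPoint(
    name="dummy", value="dummy_module:DummyLogitsProcessor", group=GROUP
)

# Program run in the child: look up the same group in a brand-new interpreter.
CHILD_CODE = (
    "import importlib.metadata as md; "
    f"print(','.join(ep.name for ep in md.entry_points(group={GROUP!r})))"
)


def names_in_parent() -> list[str]:
    # Resolves importlib.metadata.entry_points at call time, so it sees the patch.
    return [ep.name for ep in importlib.metadata.entry_points(group=GROUP)]


def names_in_fresh_interpreter() -> list[str]:
    # Stand-in for a spawned worker: nothing in this process's memory is inherited.
    out = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True, text=True, check=True,
    ).stdout
    return [name for name in out.strip().split(",") if name]


original = importlib.metadata.entry_points


def patched_entry_points(**kwargs):
    # In-memory monkeypatch: fake result for our group, real lookup otherwise.
    if kwargs.get("group") == GROUP:
        return [FAKE_EP]
    return original(**kwargs)


importlib.metadata.entry_points = patched_entry_points
try:
    parent = names_in_parent()
    child = names_in_fresh_interpreter()
finally:
    importlib.metadata.entry_points = original  # always undo the patch

print("parent:", parent)  # parent: ['dummy']
print("child:", child)    # child: []
```

Under fork, the child would start from a copy of the parent's memory and see the patched function; under spawn (like the subprocess here), importlib.metadata is imported from scratch and only real installed metadata is found.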

Test Plan

python -m pytest \
"tests/v1/logits_processors/test_custom_offline.py::test_custom_logitsprocs[CustomLogitprocSource.LOGITPROC_SOURCE_ENTRYPOINT]" \
"tests/v1/logits_processors/test_custom_online.py::test_custom_logitsprocs[server0-facebook/opt-125m]" \
-v \
-p no:warnings

Test Result

[Screenshot of the test results]

@mergify bot added the v1 and bug labels May 8, 2026

@gemini-code-assist bot left a comment


Code Review

This pull request refactors the mocking of custom logits processor entry points in tests by replacing in-memory patching of importlib.metadata with a more robust mechanism that creates temporary .dist-info files and updates the PYTHONPATH. This ensures that mocked entry points are correctly discovered by spawned worker processes. Feedback was provided regarding the construction of the PYTHONPATH environment variable, as the current implementation could result in a trailing separator that inadvertently adds the current working directory to the search path, potentially leading to non-hermetic tests.

Comment thread tests/v1/logits_processors/utils.py
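The review's PYTHONPATH concern can be avoided by joining only non-empty components. A minimal sketch, assuming a hypothetical helper prepend_to_pythonpath (not the function in tests/v1/logits_processors/utils.py): a naive join such as new_dir + os.pathsep + (existing or "") leaves a trailing separator when PYTHONPATH is unset, and that empty component is what the review says can put the current working directory on the search path.

```python
import os
from typing import Optional


def prepend_to_pythonpath(new_dir: str, existing: Optional[str]) -> str:
    """Return a PYTHONPATH value with new_dir first and no empty components.

    Hypothetical helper for illustration. Empty components (from an unset or
    empty PYTHONPATH, or a trailing/doubled separator) are dropped so the
    child's search path does not depend on where the test was launched.
    """
    parts = [new_dir]
    if existing:
        parts.extend(p for p in existing.split(os.pathsep) if p)
    return os.pathsep.join(parts)


# Usage: build the child's PYTHONPATH from whatever the parent already has.
child_pythonpath = prepend_to_pythonpath("/tmp/fake-dist", os.environ.get("PYTHONPATH"))
```

Dropping empty components from an existing PYTHONPATH is deliberate here: for a hermetic test it is preferable to lose an accidental "current directory" entry than to inherit it.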
SoluMilken added 2 commits May 9, 2026 02:42
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
@SoluMilken SoluMilken force-pushed the fix/rocm-logitproc-entrypoint-tests branch from b8f6833 to 287b48e Compare May 8, 2026 18:42
@SoluMilken changed the title from [CI][BugFix] to [CI][BugFix] Fix failure CI "amd-v1-sample-plus-logits-mi300-1" May 8, 2026
@SoluMilken SoluMilken marked this pull request as ready for review May 8, 2026 18:43

@claude bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify bot added the rocm label May 8, 2026
@github-project-automation bot moved this to Todo in AMD May 8, 2026
@SoluMilken (Contributor, Author) commented:

Hi @afeldman-nm @dzhengAP, could you please take a look?

This fixes the ROCm logits processor entrypoint test failures by replacing the in-process importlib.metadata.entry_points monkeypatch with a temporary real .dist-info/entry_points.txt. The old test relied on fork semantics, but ROCm runs these paths under spawn, so EngineCore/worker processes did not inherit the monkeypatch and failed to discover the dummy logits processor.

Thanks.
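The approach in the comment above (a real .dist-info/entry_points.txt on disk instead of an in-memory patch) can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: it assumes Python 3.10+, and the group name, distribution name, and entry-point value are hypothetical placeholders rather than the names vLLM's test uses. A fresh interpreter started with subprocess stands in for a spawned worker; it finds the entry point because the temporary directory reaches it through the PYTHONPATH environment variable, which spawned processes inherit.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical names, for illustration only.
GROUP = "example.logits_processors"
DIST_NAME = "dummy_logitproc"
VERSION = "0.0.1"


def write_fake_dist_info(root: Path) -> Path:
    """Create <root>/<name>-<version>.dist-info with METADATA and entry_points.txt."""
    dist_info = root / f"{DIST_NAME}-{VERSION}.dist-info"
    dist_info.mkdir(parents=True)
    (dist_info / "METADATA").write_text(
        f"Metadata-Version: 2.1\nName: {DIST_NAME}\nVersion: {VERSION}\n"
    )
    (dist_info / "entry_points.txt").write_text(
        f"[{GROUP}]\ndummy = dummy_module:DummyLogitsProcessor\n"
    )
    return dist_info


# Program run in the child: list entry-point names registered under GROUP.
CHILD_CODE = (
    "import importlib.metadata as md; "
    f"print(','.join(ep.name for ep in md.entry_points(group={GROUP!r})))"
)


def discover_in_subprocess(search_dir: Path) -> list[str]:
    # The child sees the temp dir only via PYTHONPATH. For brevity this
    # replaces any existing PYTHONPATH instead of prepending to it.
    env = {**os.environ, "PYTHONPATH": str(search_dir)}
    out = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    return [name for name in out.strip().split(",") if name]


with tempfile.TemporaryDirectory() as tmp:
    write_fake_dist_info(Path(tmp))
    found = discover_in_subprocess(Path(tmp))

print(found)  # ['dummy']
```

Because discovery now goes through importlib.metadata's normal search of sys.path, it works the same under fork and spawn; the only state the worker needs is the environment variable.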

@dzhengAP (Contributor) commented May 8, 2026

@SoluMilken thanks! There is an earlier PR that discusses a fix for this fork issue: #42040. Are they similar?

@SoluMilken (Contributor, Author) commented:

Hi @dzhengAP, they are almost the same, so I'll close this PR.

@SoluMilken SoluMilken closed this May 9, 2026
@github-project-automation bot moved this from Todo to Done in AMD May 9, 2026

Labels: bug, rocm, v1

Projects

Status: Done
