
[Bugfix] Fix LOGITPROC_SOURCE_ENTRYPOINT test to use spawn-compatible dist-info registration for XPU/ROCm #42040

Merged
tjtanaa merged 5 commits into vllm-project:main from dzhengAP:bugfix/fix-entrypoint-spawn-compatible
May 9, 2026

Conversation

@dzhengAP (Contributor) commented May 8, 2026

Follow-up to #41423, also discussed in #41895.

Problem

test_custom_logitsprocs[LOGITPROC_SOURCE_ENTRYPOINT] and
test_rejects_custom_logitsprocs[LOGITPROC_SOURCE_ENTRYPOINT] relied on
fork-based monkey-patching of importlib.metadata.entry_points to inject a
fake logitproc entrypoint.

That works with VLLM_WORKER_MULTIPROC_METHOD=fork, but it is not compatible
with XPU/ROCm platforms where the tests need to run with spawn-based
multiprocessing. With spawn, the monkey-patched entrypoint state is not
inherited by worker subprocesses, so the fake custom logits processor entrypoint
cannot be discovered.

Fix

Replace the spawn path’s in-memory monkey-patch with a real temporary
.dist-info package written to disk and exposed through PYTHONPATH.

Since importlib.metadata discovers entrypoints from installed package metadata
on disk, spawned subprocesses can discover the fake logitproc entrypoint without
requiring fork.
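For reference, the `entry_points.txt` written into the dist-info uses INI-style group sections. The group and names below are illustrative, not necessarily the ones the vLLM tests use:

```ini
[vllm.logits_processors]
dummy_logitproc = my_pkg.module:DummyLogitsProcessor
```

Stock `importlib.metadata` parses this file from any `*.dist-info` directory found on the interpreter's search path, which is why a freshly spawned worker can discover the fake entry point without any patching.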

This PR adds/updates the shared fake-entrypoint setup in
tests/v1/logits_processors/utils.py to:

  1. Create a temporary .dist-info directory with METADATA and
    entry_points.txt.
  2. Add the temporary package directory to PYTHONPATH so spawned subprocesses
    can discover the entrypoint.
  3. Prepend the same directory to sys.path so the current driver process can
    discover the entrypoint as well.
  4. Use spawn-compatible registration when spawn multiprocessing is required.
  5. Keep the existing monkey-patched importlib.metadata.entry_points behavior
    for fork-based test execution.

The follow-up commits also apply this setup consistently across the custom
offline and online logits processor tests.

This makes the custom logits processor entrypoint tests compatible with
spawn-based multiprocessing and fixes the XPU/ROCm CI failures.

… dist-info registration

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>

@claude (Bot) left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify (Bot) added the rocm (Related to AMD ROCm), intel-gpu (Related to Intel GPU), v1, and bug (Something isn't working) labels May 8, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD May 8, 2026
@gemini-code-assist (Bot) left a comment

Code Review

This pull request refactors the custom logits processor tests to support spawned subprocesses by replacing manual monkey-patching of importlib.metadata.entry_points with a disk-based dist-info registration. A new utility function, register_fake_entrypoint, creates a temporary package and updates PYTHONPATH. Feedback indicates that sys.path should also be updated for the current process to ensure the driver process can successfully discover the entry point.

Review thread on tests/v1/logits_processors/utils.py:
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: David Zheng <153074367+dzhengAP@users.noreply.github.com>
@zhenwei-intel (Contributor) commented May 8, 2026

tests/v1/logits_processors/test_custom_online.py
Could you please also handle this test?

@AndreasKaratzas (Collaborator) commented May 8, 2026

CI is blocked, so I could not wait for the author. Opened a second PR here with their commits as well to honor their contributions:

UPDATE: Author is back and people can officially call me impatient.

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
(cherry picked from commit a093d02)

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
(cherry picked from commit 82f2f93)
@dzhengAP (Contributor, Author) commented May 8, 2026

A good sign: the fix this PR focuses on passes Intel CI. The only failure is LoRA, which has already been discussed; it can be waived given the current XPU limitations in LoRA support. #41895 (comment)

Deeper insight: this Qwen3.5 dense model path uses a GDN/Mamba-style layer where LoRA projections are not supported on XPU. The correct fix is to skip this test on XPU, not to try to make it pass. @zhenwei-intel @jikunshang

@jikunshang (Collaborator) replied:

We disabled some LoRA cases on main; please rebase and check whether they pass.

@AndreasKaratzas (Collaborator) commented:

@dzhengAP could you rebase? I think the AMD docker build is having some very temporary issues.

@jikunshang (Collaborator) commented:

Rebased. Let's see what CI says.

@dzhengAP (Contributor, Author) commented May 9, 2026

Intel CI has all passed, but AMD CI is still running after 3 hours. Do we have any experience or an estimate of the typical AMD CI running time? @AndreasKaratzas @jikunshang

@AndreasKaratzas (Collaborator) commented:

@dzhengAP Yep, but AMD CI is like that (and it is not blocking). I was only interested in the blocking test group, and it is passing now. I am going to ping people in Slack.

@tjtanaa (Collaborator) left a comment:

LGTM

@tjtanaa tjtanaa merged commit df2636a into vllm-project:main May 9, 2026
17 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD May 9, 2026

Labels

bug: Something isn't working
intel-gpu: Related to Intel GPU
ready: ONLY add when PR is ready to merge/full CI is needed
rocm: Related to AMD ROCm
v1

Projects

Status: Done

Development


5 participants