[Bugfix] Fix LOGITPROC_SOURCE_ENTRYPOINT test to use spawn-compatible dist-info registration for XPU/ROCm #42040
… dist-info registration Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
Code Review
This pull request refactors the custom logits processor tests to support spawned subprocesses by replacing manual monkey-patching of importlib.metadata.entry_points with a disk-based dist-info registration. A new utility function, register_fake_entrypoint, creates a temporary package and updates PYTHONPATH. Feedback indicates that sys.path should also be updated for the current process to ensure the driver process can successfully discover the entry point.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: David Zheng <153074367+dzhengAP@users.noreply.github.com>
CI is blocked, so I could not wait for the author. Opened a second PR here with their commits as well to honor their contributions. UPDATE: Author is back and people can officially call me impatient.
The good sign is that the fix this PR focuses on passes in Intel CI. The only failure is LoRA, which has already been discussed here. It can be waived because of the current limited LoRA support on XPU. #41895 (comment) Deeper insight: this Qwen3.5 dense model path uses a GDN/Mamba-style layer where LoRA projections are not supported on XPU. The correct fix is to skip this test on XPU, not to try to make it pass. @zhenwei-intel @jikunshang
We disabled some LoRA cases on main; please rebase and check whether it passes.
@dzhengAP could you rebase? I think AMD docker build is having some very temporary issues.
Rebased. Let's see what CI says.
Intel CI all passed, but AMD CI is still running after 3 hours. Do we have any experience with, or an estimate of, the typical AMD CI running time? @AndreasKaratzas @jikunshang
@dzhengAP Yep, but AMD CI is like that (and it is not blocking). I was only interested in the blocking test group, and it is passing now. I am going to ping people in Slack.
Follow-up to #41423, also discussed in #41895.
Problem
test_custom_logitsprocs[LOGITPROC_SOURCE_ENTRYPOINT] and test_rejects_custom_logitsprocs[LOGITPROC_SOURCE_ENTRYPOINT] relied on fork-based monkey-patching of importlib.metadata.entry_points to inject a fake logitproc entrypoint.
That works with VLLM_WORKER_MULTIPROC_METHOD=fork, but it is not compatible with XPU/ROCm platforms where the tests need to run with spawn-based multiprocessing. With spawn, the monkey-patched entrypoint state is not inherited by worker subprocesses, so the fake custom logits processor entrypoint cannot be discovered.
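A minimal sketch of why the in-memory patch is lost, assuming Python 3.10+ (for `entry_points(group=...)`). A fresh interpreter started with `subprocess` stands in for a spawned worker here, and the group name is hypothetical, not the one vLLM uses:

```python
import importlib.metadata
import subprocess
import sys

# Hypothetical group name used only for this illustration.
GROUP = "demo.hypothetical_logitprocs"


def fresh_interpreter_count(group: str) -> int:
    """Count entry points in `group` as seen by a brand-new interpreter."""
    code = (
        "import importlib.metadata as m\n"
        f"print(len(m.entry_points(group={group!r})))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


original = importlib.metadata.entry_points
# In-memory patch: only this process's module object is changed.
importlib.metadata.entry_points = lambda **kwargs: ["fake-entry"]
try:
    # The patching process sees the fake entry point...
    parent_count = len(importlib.metadata.entry_points(group=GROUP))
    # ...but a fresh interpreter re-imports importlib.metadata unpatched.
    child_count = fresh_interpreter_count(GROUP)
finally:
    importlib.metadata.entry_points = original
```

With fork, the child would start from a copy of the parent's memory and keep the patched function; a fresh interpreter has nothing to copy from.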
Fix
Replace the spawn path's in-memory monkey-patch with a real temporary .dist-info package written to disk and exposed through PYTHONPATH.
Since importlib.metadata discovers entrypoints from installed package metadata on disk, spawned subprocesses can discover the fake logitproc entrypoint without requiring fork.
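As a rough illustration of the on-disk layout, the sketch below writes a minimal `.dist-info` directory and reads it back with `importlib.metadata.distributions(path=...)`. The helper name `write_dist_info`, the package name, the group name, and the target are all hypothetical and are not taken from the PR:

```python
import tempfile
from importlib.metadata import distributions
from pathlib import Path


def write_dist_info(root: Path, name: str, group: str, ep_name: str, target: str) -> Path:
    """Write a minimal <name>-0.0.0.dist-info directory under `root`."""
    dist_info = root / f"{name}-0.0.0.dist-info"
    dist_info.mkdir(parents=True)
    # METADATA needs at least a name and version to describe a distribution.
    (dist_info / "METADATA").write_text(
        f"Metadata-Version: 2.1\nName: {name}\nVersion: 0.0.0\n"
    )
    # entry_points.txt is an INI-style file: [group] then "name = module:attr".
    (dist_info / "entry_points.txt").write_text(f"[{group}]\n{ep_name} = {target}\n")
    return dist_info


with tempfile.TemporaryDirectory() as tmp:
    write_dist_info(
        Path(tmp),
        name="fake_logitproc_pkg",          # hypothetical package name
        group="demo.logits_processors",     # hypothetical group name
        ep_name="dummy",
        target="fake_mod:DummyProc",        # hypothetical module:attr target
    )
    # Scan only the temporary directory, not the whole of sys.path.
    found = [
        (ep.group, ep.name, ep.value)
        for dist in distributions(path=[tmp])
        for ep in dist.entry_points
    ]
```

Discovery only reads the metadata files; the target module is not imported until the entry point is loaded.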
This PR adds/updates the shared fake-entrypoint setup in tests/v1/logits_processors/utils.py to:
- create a .dist-info directory with METADATA and entry_points.txt.
- update PYTHONPATH so spawned subprocesses can discover the entrypoint.
- update sys.path so the current driver process can discover the entrypoint as well.
- preserve the importlib.metadata.entry_points behavior for fork-based test execution.
The follow-up commits also apply this setup consistently across the custom
offline and online logits processor tests.
This makes the custom logits processor entrypoint tests compatible with
spawn-based multiprocessing and fixes the XPU/ROCm CI failures.