Skip to content

[Tests] Shutdown test RemoteVLLMServer cleanly#36950

Merged
markmc merged 1 commit intovllm-project:mainfrom
njhill:clean-test-shutdown
Mar 13, 2026
Merged

[Tests] Shutdown test RemoteVLLMServer cleanly#36950
markmc merged 1 commit intovllm-project:mainfrom
njhill:clean-test-shutdown

Conversation

@njhill
Copy link
Member

@njhill njhill commented Mar 13, 2026

Recent PR #33949 changed the teardown logic of the RemoteVLLMServer test utility class to send SIGTERM to all vllm (sub)processes at once, which breaks the clean/coordinated shutdown logic that assumes only the top-level process will receive a signal (for example when running in a container that's shut down).

This caused a bunch of errors and stacktraces in some test logs, even though those tests still pass. We should still attempt a normal shutdown and only kill other procs if they are still running after a few seconds.

Example of test containing the shutdown mess: https://buildkite.com/vllm/ci/builds/55850#019cdeea-4528-4cbd-b7e7-277742a3e3e9 (in particular tests/v1/distributed/test_external_lb_dp.py::test_external_lb_completion_streaming)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly modifies the shutdown logic for the RemoteVLLMServer test utility to ensure a graceful shutdown. Instead of sending SIGTERM to the entire process group, it now sends it only to the root process, which aligns with best practices for coordinated shutdown. I've identified a minor issue with a hardcoded class name in a log message that could be improved for better clarity and consistency, especially when subclasses are used.

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 13, 2026
Copy link
Member

@markmc markmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, that was a pretty impactful change buried and unmentioned in a seemingly unrelated PR! Nice find, Nick

@markmc markmc merged commit b373b51 into vllm-project:main Mar 13, 2026
14 checks passed
whycoming pushed a commit to whycoming/vllm that referenced this pull request Mar 13, 2026
Recent PR vllm-project#33949 changed the teardown logic of the RemoteVLLMServer test utility class to
send SIGTERM to all vllm (sub)processes at once, which breaks the clean/coordinated
shutdown logic that assumes only the top-level process will receive a signal (for example
when running in a container that's shut down).

This caused a bunch of errors and stacktraces in some test logs, even though those tests
still pass. We should still attempt a normal shutdown and only kill other procs if they are
still running after a few seconds.

Example: tests/v1/distributed/test_external_lb_dp.py::test_external_lb_completion_streaming

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: whycoming <120623296@qq.com>
@njhill njhill deleted the clean-test-shutdown branch March 13, 2026 16:15
athrael-soju pushed a commit to athrael-soju/vllm that referenced this pull request Mar 16, 2026
Recent PR vllm-project#33949 changed the teardown logic of the RemoteVLLMServer test utility class to
send SIGTERM to all vllm (sub)processes at once, which breaks the clean/coordinated
shutdown logic that assumes only the top-level process will receive a signal (for example
when running in a container that's shut down).

This caused a bunch of errors and stacktraces in some test logs, even though those tests
still pass. We should still attempt a normal shutdown and only kill other procs if they are
still running after a few seconds.

Example: tests/v1/distributed/test_external_lb_dp.py::test_external_lb_completion_streaming

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026
Recent PR vllm-project#33949 changed the teardown logic of the RemoteVLLMServer test utility class to
send SIGTERM to all vllm (sub)processes at once, which breaks the clean/coordinated
shutdown logic that assumes only the top-level process will receive a signal (for example
when running in a container that's shut down).

This caused a bunch of errors and stacktraces in some test logs, even though those tests
still pass. We should still attempt a normal shutdown and only kill other procs if they are
still running after a few seconds.

Example: tests/v1/distributed/test_external_lb_dp.py::test_external_lb_completion_streaming

Signed-off-by: Nick Hill <nickhill123@gmail.com>
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
Recent PR vllm-project#33949 changed the teardown logic of the RemoteVLLMServer test utility class to
send SIGTERM to all vllm (sub)processes at once, which breaks the clean/coordinated
shutdown logic that assumes only the top-level process will receive a signal (for example
when running in a container that's shut down).

This caused a bunch of errors and stacktraces in some test logs, even though those tests
still pass. We should still attempt a normal shutdown and only kill other procs if they are
still running after a few seconds.

Example: tests/v1/distributed/test_external_lb_dp.py::test_external_lb_completion_streaming

Signed-off-by: Nick Hill <nickhill123@gmail.com>
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
Recent PR vllm-project#33949 changed the teardown logic of the RemoteVLLMServer test utility class to
send SIGTERM to all vllm (sub)processes at once, which breaks the clean/coordinated
shutdown logic that assumes only the top-level process will receive a signal (for example
when running in a container that's shut down).

This caused a bunch of errors and stacktraces in some test logs, even though those tests
still pass. We should still attempt a normal shutdown and only kill other procs if they are
still running after a few seconds.

Example: tests/v1/distributed/test_external_lb_dp.py::test_external_lb_completion_streaming

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants