[Bugfix] Add missing auto_create_handle_loop to communicator methods #19610
Merged
Kangyan-Zhou merged 1 commit into sgl-project:main on Mar 1, 2026
The handle_loop asyncio task in TokenizerManager is responsible for receiving responses from schedulers via ZMQ and dispatching them to the appropriate _Communicator. However, handle_loop is lazily started by auto_create_handle_loop(), and several communicator methods were missing this call.

This caused /server_info (and other endpoints like /flush_cache, /get_loads) to hang indefinitely when called on freshly-started servers that had not yet processed any inference request: because no inference request had triggered auto_create_handle_loop() yet, the scheduler responses were never received.

This is particularly critical for PD disaggregation setups where the sglang router's service discovery calls /server_info as the very first interaction with worker pods during the discover_metadata step, before any generate request is sent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kangyan-Zhou added a commit to Kangyan-Zhou/sglang that referenced this pull request, Mar 4, 2026: …gl-project#19610) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request, Mar 9, 2026: …gl-project#19610) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request, Mar 21, 2026: …gl-project#19610) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Several communicator methods in TokenizerCommunicatorMixin (e.g., get_internal_state, flush_cache, get_load) were missing the self.auto_create_handle_loop() call that ensures the ZMQ response-receiving loop is running.
- Without that call, the handle_loop asyncio task is never started and scheduler responses are never received, which shows up as confusing hangs.

Motivation
In PD disaggregation mode, the router calls /server_info (which invokes get_internal_state()) immediately after a worker pod starts, before any inference request arrives. Since auto_create_handle_loop() was only reached through generate_request() and a few other methods, and not through get_internal_state(), the handle_loop that receives scheduler responses via ZMQ is never created, causing get_internal_state() to wait forever.

This also affects flush_cache, get_load, get_loads, set_internal_state, dumper_control, and the HiCache management endpoints: any communicator method called on an idle server will hang.

Fix
Add self.auto_create_handle_loop() to the 10 communicator methods that were missing it, matching the pattern already used by generate_request(), slow_down(), and other working methods.

The call is idempotent: it returns immediately if the handle_loop is already running.
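
To make the failure mode and the fix concrete, here is a minimal, self-contained sketch of the lazy-start pattern. It is a toy model, not sglang's code: ToyTokenizerManager, the asyncio.Queue standing in for the ZMQ socket, the loops_created counter, and the _buggy/_fixed method names are all assumptions made for illustration. Only the names auto_create_handle_loop, handle_loop, and get_internal_state come from this PR.

```python
import asyncio
from typing import Optional


class ToyTokenizerManager:
    """Toy stand-in for the pattern described above (hypothetical, not sglang code)."""

    def __init__(self) -> None:
        # An asyncio.Queue plays the role of the ZMQ socket from the scheduler.
        self.from_scheduler: asyncio.Queue = asyncio.Queue()
        self.pending: Optional[asyncio.Future] = None
        self.loop_task: Optional[asyncio.Task] = None
        self.loops_created = 0  # illustration only: counts real loop starts

    def auto_create_handle_loop(self) -> None:
        # Idempotent: return immediately if the loop task already exists.
        if self.loop_task is not None:
            return
        self.loops_created += 1
        self.loop_task = asyncio.get_running_loop().create_task(self.handle_loop())

    async def handle_loop(self) -> None:
        # Receive responses and hand each one to the waiting caller.
        while True:
            response = await self.from_scheduler.get()
            if self.pending is not None and not self.pending.done():
                self.pending.set_result(response)

    async def _request(self) -> str:
        self.pending = asyncio.get_running_loop().create_future()
        # The "scheduler" answers right away, but nothing reads the queue
        # unless handle_loop is running.
        self.from_scheduler.put_nowait("internal-state")
        return await self.pending

    async def get_internal_state_buggy(self) -> str:
        # Missing auto_create_handle_loop(): hangs on an idle server.
        return await self._request()

    async def get_internal_state_fixed(self) -> str:
        self.auto_create_handle_loop()
        return await self._request()


async def demo() -> dict:
    results = {}

    buggy = ToyTokenizerManager()
    try:
        await asyncio.wait_for(buggy.get_internal_state_buggy(), timeout=0.1)
        results["buggy"] = "responded"
    except asyncio.TimeoutError:
        results["buggy"] = "timed out"

    fixed = ToyTokenizerManager()
    results["fixed"] = await asyncio.wait_for(
        fixed.get_internal_state_fixed(), timeout=1.0
    )
    fixed.auto_create_handle_loop()  # second call is a no-op
    results["loops_created"] = fixed.loops_created
    fixed.loop_task.cancel()
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the buggy path only returns because the caller wraps it in a timeout; without one it would wait forever, which mirrors the /server_info hang on an idle server. Putting the guard inside auto_create_handle_loop is what makes it safe to call from every communicator method.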
Test plan
- /server_info hangs before the fix
- Sending an inference request (which triggers auto_create_handle_loop) unblocks the pending /server_info
- After the fix, /server_info responds immediately on freshly started pods without any prior inference traffic

🤖 Generated with Claude Code