[Bugfix] Lazy tokenizer init to prevent semaphore leak in multiprocess mode by kitaekatt · Pull Request #33847 · vllm-project/vllm

kitaekatt · 2026-02-05T00:21:23Z

Summary

Supersedes #30409 (rebased and rewritten against current main).

Defers tokenizer initialization in StructuredOutputManager from __init__ to first access via a @property. This prevents semaphore exhaustion when GGUF models are loaded in multiprocess mode.

Problem

When StructuredOutputManager.__init__ eagerly calls cached_tokenizer_from_config(), the tokenizer builds BPE merges using multiprocessing primitives. In forked subprocesses (multiprocess model loading for GGUF), these primitives leak POSIX semaphores that aren't cleaned up, eventually exhausting the system limit (/proc/sys/kernel/sem) and causing the server to hang.

Changes

Replace eager self.tokenizer = cached_tokenizer_from_config(...) with a @property that initializes on first access
Reasoning parser init is also deferred (depends on tokenizer)
ThreadPoolExecutor for grammar compilation is still created eagerly (no multiprocessing primitives)
skip_tokenizer_init check moved to the property, raising a clear error if structured output is used without a tokenizer
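
The lazy-initialization pattern listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual diff: `LazyTokenizerManager` and the `load_tokenizer` factory are hypothetical stand-ins for `StructuredOutputManager` and `cached_tokenizer_from_config(...)`, and the error type and message are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


class LazyTokenizerManager:
    """Hypothetical stand-in for StructuredOutputManager (not vLLM code)."""

    def __init__(self, load_tokenizer, skip_tokenizer_init=False):
        # load_tokenizer: hypothetical zero-argument factory standing in for
        # cached_tokenizer_from_config(...). It is NOT called here.
        self._load_tokenizer = load_tokenizer
        self._skip_tokenizer_init = skip_tokenizer_init
        self._tokenizer = None
        # The thread pool is still created eagerly, as the PR describes.
        self.executor = ThreadPoolExecutor(max_workers=1)

    @property
    def tokenizer(self):
        # The skip_tokenizer_init check lives in the property, so the error
        # surfaces only when structured output actually needs a tokenizer.
        if self._skip_tokenizer_init:
            raise ValueError(
                "Structured output requires a tokenizer, "
                "but skip_tokenizer_init is set."
            )
        # Build the tokenizer on first access and reuse it afterwards.
        if self._tokenizer is None:
            self._tokenizer = self._load_tokenizer()
        return self._tokenizer
```

Constructing the manager does no tokenizer work; the factory runs once, on the first read of `manager.tokenizer`. The sketch is not thread-safe: if several threads could hit the property at the same time, a lock around the initialization would be needed.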

Testing

Tested with repeated GGUF model load/unload cycles in multiprocess mode — no semaphore exhaustion.

gemini-code-assist

Code Review

This pull request correctly addresses a semaphore leak that occurs during GGUF model loading in multiprocess mode. By deferring the tokenizer initialization in StructuredOutputManager to a lazily-loaded @property, the eager creation of multiprocessing primitives within __init__ is avoided, which was the source of the leak. The reasoning parser's initialization is also correctly deferred, as it depends on the tokenizer. The implementation is clean, follows standard Python patterns for lazy initialization, and includes appropriate error handling for cases where structured output is attempted without a tokenizer. The changes are well-contained and effectively resolve the described issue.

…ess mode

Defer tokenizer initialization in StructuredOutputManager from __init__ to first access via a property. When GGUF models are loaded in multiprocess mode, eager tokenizer init builds BPE merges using multiprocessing primitives that leak semaphores in forked subprocesses, eventually exhausting the system limit and causing server hangs.

Supersedes vllm-project#30409.

Signed-off-by: Christina <truffle@gmail.com>

kitaekatt · 2026-04-18T16:50:53Z

Closing — not pursuing this approach. The lazy tokenizer init idea isn't the right shape for fixing the semaphore leak; will explore alternatives separately if needed.

mergify Bot added the structured-output, v1, and bug labels Feb 5, 2026

github-project-automation Bot added this to Structured Output Feb 5, 2026

gemini-code-assist Bot reviewed Feb 5, 2026

kitaekatt force-pushed the fix/lazy-tokenizer-structured-output branch from 40d34cb to dcf8c8c March 25, 2026 17:28

kitaekatt closed this Apr 18, 2026

github-project-automation Bot moved this to Done in Structured Output Apr 18, 2026

kitaekatt deleted the fix/lazy-tokenizer-structured-output branch April 18, 2026 16:51

kitaekatt wants to merge 1 commit into vllm-project:main from kitaekatt:fix/lazy-tokenizer-structured-output
