[Bugfix] Fix RuntimeError: Already borrowed by enforcing one tokenizer per thread and one thread in threadpool (#41047)
Conversation
Code Review
This pull request removes the renderer_num_workers configuration, restricting the renderer thread pool to a single worker to address thread-safety concerns with the tokenizer and multimodal processor cache. It introduces a deep-copied executor_tokenizer for use within the thread pool. Review feedback highlights that using the get_executor_tokenizer() helper during multimodal processor initialization will cause errors when skip_tokenizer_init=True is enabled, as the processor should handle a missing tokenizer gracefully. Furthermore, the removal of renderer_num_workers from EngineArgs and the CLI constitutes a breaking change for the public API and deployment scripts, suggesting that the parameter should be retained for backward compatibility.
    MultiModalConfig.mm_encoder_fp8_scale_save_margin
)
io_processor_plugin: str | None = None
renderer_num_workers: int = 1
Removing renderer_num_workers from EngineArgs is a breaking change for the public Python API. Any user code that constructs EngineArgs with this parameter will now fail with a TypeError.
Additionally, removing the corresponding CLI argument (previously at line 836) breaks existing deployment scripts. While forcing a single worker fixes the borrowing issue, consider keeping the argument for backward compatibility (potentially ignoring it with a warning) and to allow for future scaling improvements (e.g., using a pool of tokenizers).
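One way to keep the flag for backward compatibility while still forcing a single worker, as suggested above, would be a deprecated argument that warns and is ignored. The function names below are hypothetical, not vLLM's actual CLI plumbing; this is only a sketch of the suggestion:

```python
import argparse
import warnings


def add_renderer_args(parser: argparse.ArgumentParser) -> None:
    # Keep the flag so existing deployment scripts don't break with a
    # TypeError / argparse error when they pass it.
    parser.add_argument("--renderer-num-workers", type=int, default=1)


def resolve_renderer_num_workers(args: argparse.Namespace) -> int:
    # Ignore any value other than 1 with a deprecation warning instead
    # of spawning extra workers.
    if args.renderer_num_workers != 1:
        warnings.warn(
            "--renderer-num-workers is deprecated and ignored; the "
            "renderer executor is single-threaded to avoid 'Already "
            "borrowed' errors (see #40949).",
            DeprecationWarning,
        )
    return 1  # always force a single worker
```

This also leaves room for future scaling improvements (e.g. a tokenizer pool) to re-enable the flag without another CLI change.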
Good point. This is a draft for illustrative purposes, will update before readying
…zer per thread Signed-off-by: Yifan <yzong@redhat.com>
Force-pushed from 1e687a9 to cd7560d
Signed-off-by: Yifan <yzong@redhat.com>
self._executor = ThreadPoolExecutor(max_workers=1)
# Tokenizer to be used in the executor thread
# Deep copy to avoid sharing the tokenizer leading to
# "already borrowed" errors (see #36557).
self.executor_tokenizer = copy.deepcopy(tokenizer)
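The pattern in this diff can be sketched in isolation: a single-worker executor plus a deep-copied tokenizer means the main thread and the executor thread never touch the same tokenizer instance, so the Rust-backed tokenizer is never borrowed concurrently. `Renderer` below is a minimal stand-in, not the actual vLLM class:

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class Renderer:
    """Minimal sketch of the one-tokenizer-per-thread pattern."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer  # used on the main thread only
        self._executor = ThreadPoolExecutor(max_workers=1)
        # Deep copy so the executor thread never shares an instance
        # with the main thread ("already borrowed", see #36557).
        self.executor_tokenizer = copy.deepcopy(tokenizer)

    def encode_async(self, text):
        # All executor work uses the copy, never self.tokenizer.
        return self._executor.submit(self.executor_tokenizer.encode, text)
```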
Since we can use a deep copy to avoid sharing the tokenizer and prevent "already borrowed" errors (see #36557), we should find a way to make all tokenizers use a deep-copied instance (perhaps a tokenizer pool), rather than enforcing one tokenizer per thread.
We use multithreading in many places, and enforcing one tokenizer per thread here does not completely solve all the problems.
> we should find a way to make all tokenizers use a deep-copied instance, rather than enforcing one tokenizer per thread.
You're saying we want to create a deep copy of the tokenizer for each thread in the thread pool, rather than limiting the thread pool to one thread, right?
I agree that this is the most flexible solution. I was referring to this in #40949, under "Keep allowing --renderer-num-workers > 1 but use thread-local tokenizer". I'm working on a draft PR implementing this.
> We use multithreading in many places, and enforcing one tokenizer per thread here does not completely solve all the problems.
I'm not sure I understand this. IIUC we use multithreading in many places (for mm processor, pooling io, applying chat template, etc.) but they all execute on the same underlying thread pool in renderer._executor. If we ensure a unique tokenizer for each thread in the thread pool (either by having many copies of the tokenizer or by enforcing thread pool contains only 1 thread), won't we avoid all the problems?
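The per-thread-copy alternative discussed here could be sketched with a lazily populated `threading.local`: each pool thread gets its own deep copy of the base tokenizer on first use, so no two threads ever borrow the same instance. This is a hypothetical illustration, not code from this PR or the follow-up draft:

```python
import copy
import threading


class ThreadLocalTokenizer:
    """Sketch of 'one deep-copied tokenizer per pool thread'."""

    def __init__(self, base_tokenizer):
        self._base = base_tokenizer
        self._local = threading.local()

    def get(self):
        # Lazily deep-copy the base tokenizer the first time each
        # thread asks for one; later calls on the same thread reuse it.
        tok = getattr(self._local, "tok", None)
        if tok is None:
            tok = copy.deepcopy(self._base)
            self._local.tok = tok
        return tok
```

With this, a `--renderer-num-workers > 1` pool could submit `lambda: tl.get().encode(text)` and every worker thread would transparently use its own copy.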
> You're saying we want to create a deep-copy of the tokenizer for each thread in the threadpool rather than limiting threadpool to one thread right?
This is the most flexible solution. +1
> I'm not sure I understand this. IIUC we use multithreading in many places (for mm processor, pooling io, applying chat template, etc.) but they all execute on the same underlying thread pool in renderer._executor. If we ensure a unique tokenizer for each thread in the thread pool (either by having many copies of the tokenizer or by enforcing thread pool contains only 1 thread), won't we avoid all the problems?
We are refactoring the preprocessing part so that we can place the thread pool and tokenizer pool in one place. However, we don't yet know how many tokenizers are being used at large and have gone unnoticed.
#41181: Adds thread-safety wrapper. Tried implementation with mutex and thread-local copy. Not a huge perf difference in my admittedly limited testing
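The mutex variant mentioned for #41181 could look roughly like the wrapper below: serialize all access to one shared tokenizer with a lock instead of copying it per thread. This is a hypothetical sketch, not the actual #41181 implementation; it trades concurrency (encodes run one at a time) for memory and simplicity:

```python
import threading


class LockingTokenizer:
    """Serialize access to a single shared tokenizer with a mutex."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer
        self._lock = threading.Lock()

    def encode(self, text):
        # Only one thread may borrow the underlying tokenizer at a
        # time, so concurrent callers can never trigger
        # "Already borrowed".
        with self._lock:
            return self._tokenizer.encode(text)
```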
Changed the title from "RuntimeError: Already borrowed by enforcing one tokenizer per thread" to "RuntimeError: Already borrowed by enforcing one tokenizer per thread and one thread in threadpool"
Signed-off-by: Yifan <yzong@redhat.com>
mm_tokenizer = copy.deepcopy(tokenizer)
# Cannot use self.executor_tokenizer because the mm processor might
# mutate the tokenizer, corrupting the shared tokenizer.
self.mm_tokenizer = copy.deepcopy(tokenizer)
Unfortunately, we can't use self.executor_tokenizer here because the mm processor may mutate its tokenizer. These processors were not designed with thread-safety in mind.
For example, DeepseekVLV2Processor (vllm/transformers_utils/processors/deepseek_vl2.py, lines 100 to 115 at a608836)
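The hazard can be shown with a toy example: if a processor mutates the tokenizer it is handed at init (the way DeepseekVLV2Processor is cited as doing), every other user of a shared tokenizer sees that mutation. `FakeTokenizer` and `MutatingProcessor` below are illustrative stand-ins, not the real classes:

```python
import copy


class FakeTokenizer:
    """Stand-in for a HF tokenizer (illustrative only)."""

    def __init__(self):
        self.special_tokens = []

    def add_special_tokens(self, tokens):
        self.special_tokens.extend(tokens)


class MutatingProcessor:
    """Toy processor that mutates its tokenizer at init, the way the
    real mm processors cited above can."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        # Mutation: any other user of this tokenizer now sees new tokens.
        tokenizer.add_special_tokens(["<image>"])


shared = FakeTokenizer()
# Handing the processor a deep copy keeps the shared tokenizer pristine,
# which is why the diff gives the mm processor its own mm_tokenizer.
MutatingProcessor(copy.deepcopy(shared))
assert shared.special_tokens == []
```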
Purpose
Fixes #40949 by enforcing one tokenizer per thread: one tokenizer remains on the main thread, and a separate tokenizer copy is used by the renderer executor. This PR also forces the renderer executor to stay single-threaded by removing --renderer-num-workers, avoiding RuntimeError: Already borrowed from concurrent Hugging Face tokenizer access.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.