Fix multi-threaded dataloader for Qwen3/Mistral text encoders by BitcrushedHeart · Pull Request #1346 · Nerogar/OneTrainer

BitcrushedHeart · 2026-02-26T20:14:08Z

Description

Enables dataloader_threads > 1 for Z-Image and Flux2.Klein models by working around a thread-safety bug in the transformers library's check_model_inputs decorator (huggingface/transformers#42673).

Closes #1291

Problem

The check_model_inputs decorator in transformers v4 monkey-patches child module .forward() methods on every call to capture output_hidden_states, then restores them after. When two dataloader threads call the same text encoder concurrently, they race on patching/restoring these methods, causing hidden states from different threads to bleed into each other.
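The failure mode can be replayed deterministically without threads. The sketch below is a toy model, not the transformers implementation: `Layer`, `ToyEncoder`, and `CapturingCall` are hypothetical names, and the patch/restore logic only imitates the pattern described above. It interleaves two calls by hand, in an order that an unlucky thread schedule could produce.

```python
class Layer:
    def forward(self, x):
        return x + 1


class ToyEncoder:
    def __init__(self, num_layers=3):
        self.layers = [Layer() for _ in range(num_layers)]

    def run(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x


class CapturingCall:
    """One in-flight call: patch child forwards, run, then restore."""

    def __init__(self, model):
        self.model = model
        self.captured = []  # hidden states this call believes are its own
        self.saved = []     # what each layer.forward was before patching

    def patch(self):
        for layer in self.model.layers:
            previous = layer.forward  # may already be another call's wrapper
            self.saved.append(previous)

            def wrapper(x, _previous=previous):
                out = _previous(x)
                self.captured.append(out)
                return out

            layer.forward = wrapper

    def restore(self):
        for layer, previous in zip(self.model.layers, self.saved):
            layer.forward = previous


def sequential_call():
    """A single call with no overlap: patch, run, restore."""
    model = ToyEncoder()
    call = CapturingCall(model)
    call.patch()
    model.run(0)
    call.restore()
    return len(call.captured)


def interleaved_calls():
    """Two overlapping calls, interleaved by hand."""
    model = ToyEncoder()
    a, b = CapturingCall(model), CapturingCall(model)
    a.patch()
    b.patch()      # wraps A's wrappers instead of the original forwards
    model.run(0)   # A's forward pass
    model.run(0)   # B's forward pass
    a.restore()
    b.restore()    # reinstalls A's wrappers, which B saved as "original"
    return len(a.captured), len(b.captured)
```

In this toy, `sequential_call()` records 3 states, while `interleaved_calls()` records 6 for each call because every layer output lands in both lists. The out-of-order restore also leaves A's wrappers installed on the model.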

Fix

Wraps the text encoder's .forward() in a per-instance threading.Lock so that concurrent calls are serialized, which prevents the race. The wrapper is applied only when dataloader_threads > 1 and is idempotent (applying it more than once is safe).
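A minimal sketch of this approach, assuming a model object with a callable `forward` attribute. `make_forward_thread_safe`, the `_forward_lock` marker attribute, `SlowEncoder`, and `run_concurrently` are illustrative names, and the actual `apply_thread_safe_forward()` in this PR may differ:

```python
import functools
import threading
import time


def make_forward_thread_safe(model):
    """Serialize calls to model.forward with a lock owned by this instance."""
    if getattr(model, "_forward_lock", None) is not None:
        return model  # already wrapped; a second call is a no-op

    lock = threading.Lock()
    original_forward = model.forward

    @functools.wraps(original_forward)
    def locked_forward(*args, **kwargs):
        with lock:
            return original_forward(*args, **kwargs)

    model._forward_lock = lock
    model.forward = locked_forward
    return model


class SlowEncoder:
    """Toy stand-in that records how many forward calls overlap."""

    def __init__(self):
        self.in_flight = 0
        self.max_in_flight = 0

    def forward(self, x):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        time.sleep(0.001)  # give other threads a chance to enter
        self.in_flight -= 1
        return x * 2


def run_concurrently(model, threads=4, iterations=10):
    """Call model.forward from several threads and collect the results."""
    results = []

    def worker():
        for i in range(iterations):
            results.append(model.forward(i))

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return results
```

After `make_forward_thread_safe(SlowEncoder())`, running `run_concurrently` on the wrapped model should never observe more than one forward call in flight. The lock lives on the instance rather than the class, so two separate encoders would not block each other.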

Performance impact is negligible since GPU computation is already serialized on a single device. The benefit of multiple dataloader threads (pipelining CPU image loading/preprocessing against GPU encoding) is preserved.

Also proactively applies the same fix to the Flux2.Dev (Mistral) path, which has the same underlying vulnerability via MistralModel.forward().

The upstream fix (huggingface/transformers#43765) shipped in transformers v5 only. This workaround can be removed when upgrading to v5+.

Changes

modules/util/thread_safety.py: New utility — apply_thread_safe_forward() wraps a model's forward with a per-instance lock
modules/dataLoader/ZImageBaseDataLoader.py: Replace NotImplementedError with thread-safe forward patch
modules/dataLoader/Flux2BaseDataLoader.py: Replace NotImplementedError (Klein) and add proactive fix (Dev)

Testing Notes

Verified the bug and fix with a tiny Qwen3ForCausalLM (CPU, random weights, 280K params):

Without lock: 4 threads x 100 iterations, hidden states are corrupted immediately (expected 3 layers, got 6-9)
With lock: 4 threads x 100 iterations, 400/400 calls correct, zero errors
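A stress test of that shape can be sketched as follows. This is not the script used for the numbers above: `RacyEncoder` is a hypothetical stand-in for the text encoder with a deliberately shared collector, and `stress` and `locked` are illustrative helpers.

```python
import threading
import time


class RacyEncoder:
    """Toy model whose calls share one mutable collector slot."""

    def __init__(self, num_layers=3):
        self.num_layers = num_layers
        self.collector = []

    def hidden_state_count(self):
        mine = []
        self.collector = mine  # "patch": install this call's collector
        for _ in range(self.num_layers):
            time.sleep(0)  # yield so other threads can interleave
            self.collector.append(1)  # writes to whichever collector is installed
        return len(mine)


def stress(call, expected, threads=4, iterations=100):
    """Run call() from several threads; return (correct, total)."""
    wrong = []

    def worker():
        for _ in range(iterations):
            result = call()
            if result != expected:
                wrong.append(result)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    total = threads * iterations
    return total - len(wrong), total


def locked(call):
    """Serialize call() behind a lock, mirroring the fix."""
    lock = threading.Lock()

    def wrapper():
        with lock:
            return call()

    return wrapper
```

Calling `stress(locked(RacyEncoder().hidden_state_count), expected=3)` should report every call as correct. The unlocked variant may report mismatches, depending on how the threads happen to be scheduled.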

Tested on Windows 11, Python 3.10.11.

Will run a full test on Z-Image either today or tomorrow.

BitcrushedHeart · 2026-02-26T22:48:43Z

Tested on Z-Image. With 12 threads, caching 110k files dropped from 3.5 hours to 50 minutes.

fix: enable multi-threaded dataloader for Qwen3/Mistral text encoders

99c9e3d

BitcrushedHeart mentioned this pull request Feb 26, 2026

strip padding from text latent cache for variable length models #1345

Merged

dxqb added the merging last steps before merge label Feb 27, 2026

dxqb mentioned this pull request Feb 27, 2026

Upgrade transformers to 5.x and other dependencies #1285

Draft

dxqb changed the base branch from master to merge March 1, 2026 07:32

dxqb mentioned this pull request Mar 1, 2026

Qwen3ForCausalLM leaks VRAM if used in multiple dataloader threads huggingface/transformers#42673

Closed

4 tasks

dxqb merged commit 4d7ec96 into Nerogar:merge Mar 1, 2026
1 check passed

BitcrushedHeart deleted the fix/thread-safe-qwen3-forward branch March 1, 2026 10:12

dxqb merged 1 commit into Nerogar:merge from BitcrushedHeart:fix/thread-safe-qwen3-forward
