Fix multi-threaded dataloader for Qwen3/Mistral text encoders #1346

Merged
dxqb merged 1 commit into Nerogar:merge from BitcrushedHeart:fix/thread-safe-qwen3-forward on Mar 1, 2026

@BitcrushedHeart (Contributor) commented Feb 26, 2026

Description

Enables dataloader_threads > 1 for Z-Image and Flux2.Klein models by working around a thread-safety bug in the transformers library's check_model_inputs decorator (huggingface/transformers#42673).

Closes #1291

Problem

The check_model_inputs decorator in transformers v4 monkey-patches child module .forward() methods on every call to capture output_hidden_states, then restores them after. When two dataloader threads call the same text encoder concurrently, they race on patching/restoring these methods, causing hidden states from different threads to bleed into each other.
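The interleaving can be reproduced deterministically with a toy model. The sketch below is not the transformers implementation: `Layer`, `patch`, `restore`, `run`, `run_interleaved`, and `run_serialized` are hypothetical names, and the two "threads" are simulated by ordering the patch/run/restore steps by hand. It only illustrates why patching a shared `.forward()` per call lets one call's recorded states land in another call's collection.

```python
class Layer:
    """Stand-in for a child module whose forward gets patched on each call."""

    def forward(self, x):
        return x + 1


def patch(layer, collected):
    """Wrap layer.forward so outputs are recorded; return what was there before."""
    previous = layer.forward

    def recording_forward(x):
        out = previous(x)
        collected.append(out)
        return out

    layer.forward = recording_forward
    return previous


def restore(layer, previous):
    layer.forward = previous


def run(layers, x=0):
    for layer in layers:
        x = layer.forward(x)
    return x


def run_interleaved():
    """Call B patches before call A has restored, as two racing threads might."""
    layers = [Layer() for _ in range(3)]
    a_states, b_states = [], []
    a_prev = [patch(layer, a_states) for layer in layers]  # call A patches
    b_prev = [patch(layer, b_states) for layer in layers]  # call B patches on top of A
    run(layers)  # call B runs: every output is recorded for A as well
    run(layers)  # call A runs: every output is recorded for B as well
    for layer, prev in zip(layers, b_prev):
        restore(layer, prev)
    for layer, prev in zip(layers, a_prev):
        restore(layer, prev)
    return len(a_states), len(b_states)


def run_serialized():
    """Each call patches, runs, and restores before the next one starts."""
    layers = [Layer() for _ in range(3)]
    counts = []
    for _ in range(2):
        states = []
        prev = [patch(layer, states) for layer in layers]
        run(layers)
        for layer, p in zip(layers, prev):
            restore(layer, p)
        counts.append(len(states))
    return tuple(counts)
```

In this toy, `run_interleaved()` leaves each call with six recorded states for three layers, while `run_serialized()` leaves each call with three.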

Fix

Wraps the text encoder's .forward() with a per-instance threading.Lock to serialize concurrent calls, preventing the race condition. The lock is applied conditionally only when dataloader_threads > 1 and is idempotent (safe if called multiple times).
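A minimal sketch of what such a wrapper could look like follows. The function name `apply_thread_safe_forward` comes from the change list in this PR, but the body here is an assumption rather than the merged code; the `_forward_lock` marker attribute, `ToyEncoder`, and `run_threads` are hypothetical and exist only to make the example self-contained.

```python
import functools
import threading
import time


def apply_thread_safe_forward(model):
    """Serialize calls to model.forward with a lock owned by this instance."""
    if getattr(model, "_forward_lock", None) is not None:
        return model  # already wrapped; a second call is a no-op

    lock = threading.Lock()
    original_forward = model.forward

    @functools.wraps(original_forward)
    def locked_forward(*args, **kwargs):
        with lock:  # only one thread at a time enters the original forward
            return original_forward(*args, **kwargs)

    model.forward = locked_forward
    model._forward_lock = lock  # hypothetical marker used for idempotency
    return model


class ToyEncoder:
    """Hypothetical stand-in that tracks how many calls overlap in forward."""

    def __init__(self):
        self.active = 0
        self.max_active = 0

    def forward(self, x):
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        time.sleep(0.001)  # widen the window in which calls could overlap
        self.active -= 1
        return x * 2


def run_threads(model, threads=4, iterations=25):
    """Call model.forward from several threads; return the peak overlap seen."""

    def worker():
        for i in range(iterations):
            model.forward(i)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return model.max_active
```

With the wrapper applied, `run_threads` should report a peak overlap of 1, since the lock admits one caller at a time. A caller would apply it only when `dataloader_threads > 1`, as described above.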

Performance impact is negligible since GPU computation is already serialized on a single device. The benefit of multiple dataloader threads (pipelining CPU image loading/preprocessing against GPU encoding) is preserved.

Also proactively applies the same fix to the Flux2.Dev (Mistral) path, which has the same underlying vulnerability via MistralModel.forward().

The upstream fix (huggingface/transformers#43765) shipped in transformers v5 only. This workaround can be removed when upgrading to v5+.
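One way to make the workaround easy to retire is to gate it on the installed major version. This is only a sketch under the assumption stated above (the fix is present from v5 onward); `needs_forward_lock` is a hypothetical helper, not part of this PR.

```python
def needs_forward_lock(transformers_version: str) -> bool:
    """Return True for versions before 5, where the upstream fix is absent."""
    major = int(transformers_version.split(".")[0])
    return major < 5
```

The version string could be obtained with `importlib.metadata.version("transformers")` from the standard library.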

Changes

  • modules/util/thread_safety.py: New utility — apply_thread_safe_forward() wraps a model's forward with a per-instance lock
  • modules/dataLoader/ZImageBaseDataLoader.py: Replace NotImplementedError with thread-safe forward patch
  • modules/dataLoader/Flux2BaseDataLoader.py: Replace NotImplementedError (Klein) and add proactive fix (Dev)

Testing Notes

Verified the bug and fix with a tiny Qwen3ForCausalLM (CPU, random weights, 280K params):

  • Without lock: 4 threads x 100 iterations, hidden states are corrupted immediately (expected 3 layers, got 6-9)
  • With lock: 4 threads x 100 iterations, 400/400 calls correct, zero errors
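The shape of such a check can be sketched as a small harness. This is not the script used for the numbers above; `stress`, `call`, and `is_correct` are hypothetical names, and the model call is abstracted into a plain callable so the example runs without transformers installed.

```python
from concurrent.futures import ThreadPoolExecutor


def stress(call, is_correct, threads=4, iterations=100):
    """Run call() from several threads; return (correct results, total calls)."""

    def worker(_):
        # Each worker performs its own batch of calls and counts the good ones.
        return sum(1 for _ in range(iterations) if is_correct(call()))

    with ThreadPoolExecutor(max_workers=threads) as pool:
        correct = sum(pool.map(worker, range(threads)))
    return correct, threads * iterations
```

In a real check, `call` would run the text encoder with `output_hidden_states` enabled and `is_correct` would compare the number of returned hidden states against the expected count.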

Tested on Windows 11, Python 3.10.11.

Will run a full test on Z-Image either today or tomorrow.

@BitcrushedHeart (Contributor, Author)

Tested on Z-Image. Using 12 threads cut the time to cache 110k files from 3.5 hours to 50 minutes.

@dxqb added the "merging" label (last steps before merge) on Feb 27, 2026
@dxqb changed the base branch from master to merge on March 1, 2026 07:32
@dxqb merged commit 4d7ec96 into Nerogar:merge on Mar 1, 2026
1 check passed
@BitcrushedHeart deleted the fix/thread-safe-qwen3-forward branch on March 1, 2026 10:12
Development

Successfully merging this pull request may close these issues.

[Feat]: Support > 1 dataloader threads for Z-Image and Flux2.Klein
