Fix for Windows file locking (os error 1224) by PantelisAndrianakis · Pull Request #589 · unslothai/unsloth-zoo

PantelisAndrianakis · 2026-04-10T12:42:40Z

Implement robust Windows file locking handling in model export

The export process was failing with os error 1224 ("file with a user-mapped section open") when trying to save merged LoRA weights on Windows. This occurred because safetensors was memory-mapping files that couldn't be replaced due to locks held by the kernel or other processes.

Changes:

Add retry logic with exponential backoff (10 attempts, up to ~1.6s wait)
Force garbage collection and CUDA cache cleanup before each write attempt
Attempt to delete the original locked file before writing
Write directly to target location instead of temp file (simpler atomic ops)
Improved error messages with clear remediation steps

This allows GGUF exports to succeed on Windows systems with aggressive file locking (antivirus, indexing, etc.). Users can now export Gemma4 models directly in Unsloth Studio without workarounds.

Tested with: Gemma4 model, Q4_K_M quantization, Windows 11 Pro

Implement robust Windows file locking handling in model export The export process was failing with os error 1224 ("file with a user-mapped section open") when trying to save merged LoRA weights on Windows. This occurred because safetensors was memory-mapping files that couldn't be replaced due to locks held by the kernel or other processes. Changes: - Add retry logic with exponential backoff (10 attempts, up to ~1.6s wait) - Force garbage collection and CUDA cache cleanup before each write attempt - Attempt to delete the original locked file before writing - Write directly to target location instead of temp file (simpler atomic ops) - Improved error messages with clear remediation steps This allows GGUF exports to succeed on Windows systems with aggressive file locking (antivirus, indexing, etc.). Users can now export Gemma4 models directly in Unsloth Studio without workarounds. Tested with: Gemma4 model, Q4_K_M quantization, Windows 11 Pro

gemini-code-assist

Code Review

This pull request introduces a retry mechanism with exponential backoff to resolve Windows file locking issues (OS error 1224) when saving tensors. The implementation includes aggressive cleanup steps such as manual garbage collection and CUDA cache clearing to help release file handles. Review feedback correctly identifies redundant local imports and inconsistent module aliasing that should be addressed to maintain code quality and consistency with the rest of the project.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f205ac82a5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 41997c33b3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Applied the safe atomic write fix to the new version. The key differences: - Write to temp file first (original stays intact) - Delete original only after successful write (no data loss) - Atomic move replaces with temp file - Cleanup on failure

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 06249dd53d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@PantelisAndrianakis

…#595) * Fix Windows file lock on resized shard rewrite with atomic os.replace The vocab-resize rewrite branch in _merge_and_overwrite_lora re-opened filename_original via safe_open before calling save_file on the same path. On Windows the Rust safetensors MapViewOfFile release can lag after the context manager exits, so the follow-up write hit WinError 1224 ("cannot be performed on a file with a user-mapped section open") during Unsloth Studio GGUF exports. Write to a sibling temp file and swap it in with os.replace, which maps to MoveFileExW(MOVEFILE_REPLACE_EXISTING) on Windows and rename(2) on POSIX. The original shard is never absent if the replacement fails, so a transient AV or indexer lock cannot cause data loss. A single gc.collect before the first attempt drops the lingering mmap, which is the load-bearing step for the 1224 case. Short retry loop with errno/winerror-based lock detection handles AV rescan jitter. Supersedes #589 with an atomic-replace implementation that closes a delete-then-move data-loss window present in that PR. Credit to @PantelisAndrianakis for identifying and reporting the issue. * Fix review findings: gate POSIX false positives, hoist save_file, preserve permissions - Remove errno-based lock detection that false-positives on POSIX (EIO=5, EACCES=13, EPIPE=32); keep winerror + Windows-specific string checks only - Move save_file() before retry loop so multi-GB shards are not re-serialized on each os.replace retry - Preserve original file permissions via os.chmod before os.replace (mkstemp creates 0o600, but originals are typically 0o644) - Gate gc.collect() + torch.cuda.empty_cache() behind os.name == 'nt' since the mmap lag is Windows-only - Use module-level safetensors.SafetensorError instead of in-function import * Fix review findings: POSIX-only direct save, tighten lock classifier, cleanup 1. Gate temp-file swap to Windows only (os.name == 'nt'). On POSIX, save_file writes directly to the original path, preserving symlinks, hardlinks, and avoiding the extra disk copy. 2. Tighten the winerror=5 lock classifier. Bare ACCESS_DENIED no longer triggers retry -- only when accompanied by "user-mapped", "being used by another process", or "sharing violation" in the error message. 3. Remove dead SafetensorError from except clause. os.replace only raises OSError; SafetensorError is unreachable. 4. Release mmap-backed tensor refs (del tensors + gc.collect) before the os.replace retry loop, not after. This drops the lingering mmap on the source shard before attempting the replace. 5. Preserve exception cause chain in outer except. RuntimeError now passes through directly; other exceptions are wrapped with 'from e' so __cause__ is not stripped. * Shorten inline comments

gemini-code-assist Bot reviewed Apr 10, 2026

View reviewed changes

Comment thread unsloth_zoo/saving_utils.py Outdated

Comment thread unsloth_zoo/saving_utils.py Outdated

chatgpt-codex-connector Bot reviewed Apr 10, 2026

View reviewed changes

Comment thread unsloth_zoo/saving_utils.py Outdated

PantelisAndrianakis and others added 2 commits April 10, 2026 15:48

Removed unsloth_zoo/saving_utils.py redundant imports.

4fa3009

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Removed redundant unsloth_zoo/saving_utils.py local import of gc

41997c3

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

chatgpt-codex-connector Bot reviewed Apr 10, 2026

View reviewed changes

Comment thread unsloth_zoo/saving_utils.py Outdated

chatgpt-codex-connector Bot reviewed Apr 10, 2026

View reviewed changes

Comment thread unsloth_zoo/saving_utils.py

Import safetensors exception to catch wrapped Windows errors

74d9ada

danielhanchen mentioned this pull request Apr 15, 2026

Fix Windows file lock on resized shard rewrite with atomic os.replace #595

Merged

8 tasks

danielhanchen closed this in #595 Apr 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix for Windows file locking (os error 1224)#589

Fix for Windows file locking (os error 1224)#589
PantelisAndrianakis wants to merge 5 commits into
unslothai:mainfrom
PantelisAndrianakis:main

PantelisAndrianakis commented Apr 10, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

PantelisAndrianakis commented Apr 10, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant