fix: handle Windows special filenames in cache clean #17241

Closed
blueberrycongee wants to merge 7 commits into astral-sh:main from blueberrycongee:fix/windows-special-filename-cache-clean

@blueberrycongee

Summary

Fix uv cache clean failing on Windows when a cached sdist contains files with Windows-incompatible filenames (e.g., a file whose name ends with a period, such as logging.).

Problem:
On Windows, when running uv add uwsgi, the build fails (expected, as uwsgi doesn't support Windows). However, when subsequently running uv cache clean, the command fails with:

error: Failed to clear cache at: AppData\Local\uv\cache
  Caused by: failed to remove file `...\logging.`: The system cannot find the file specified. (os error 2)

This happens because Windows' standard Win32 API automatically normalizes paths, stripping trailing dots from filenames. When trying to delete such files, the normalized path doesn't match the actual filename on disk.

Solution:
Use the extended-length path prefix (\\?\) to bypass Win32 path normalization when deletion fails with NotFound or InvalidInput errors. This is the standard Windows approach for handling:

  • Filenames with special characters (trailing dots, spaces)
  • Paths exceeding MAX_PATH limit

The fix is applied to remove_file, remove_dir, and remove_dir_all functions in crates/uv-cache/src/removal.rs.

Fixes #16586

Test Plan

  • Added unit tests for to_extended_path() function covering:

    • Absolute paths → extended path conversion
    • Already extended paths → returned as-is
    • UNC paths → extended UNC format conversion
    • Non-Windows platforms → no-op behavior
  • Added unit tests for file/directory removal:

    • Normal file removal
    • Readonly file removal (verifies set_not_readonly logic)
    • Empty directory removal
    • Directory with contents removal (remove_dir_all)
  • Added tests for rm_rf() function:

    • Single file removal with statistics verification
    • Directory tree removal with statistics verification
    • Nonexistent path handling (should succeed with zero counts)
  • Added test for Removal struct's AddAssign implementation

  • Manual verification: Reproduced the original issue with uwsgi package and confirmed uv cache clean now succeeds after the fix.

@blueberrycongee force-pushed the fix/windows-special-filename-cache-clean branch from 60c6ab8 to d6de313 on December 26, 2025 19:06
Comment on lines +227 to +230
let path_str = path.to_string_lossy();
if path_str.starts_with(r"\\?") {
    return std::borrow::Cow::Borrowed(path);
}
Collaborator

I think an explicit match to Prefix::Verbatim, Prefix::VerbatimDisk, Prefix::VerbatimUNC would be better here. I assume \\? is a typo and you meant \\?\? Nonetheless, the std::path::Prefix enum would help with this.
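
A minimal sketch of the kind of prefix check suggested above, using only std::path. The helper name is_verbatim is hypothetical and does not come from the PR; on non-Windows targets no path has a Prefix component, so the function returns false there.

```rust
use std::path::{Component, Path, Prefix};

/// Returns true if `path` already starts with a verbatim (`\\?\`) prefix.
/// `is_verbatim` is a hypothetical helper name, not a function from the PR.
fn is_verbatim(path: &Path) -> bool {
    match path.components().next() {
        Some(Component::Prefix(prefix)) => matches!(
            prefix.kind(),
            Prefix::Verbatim(_) | Prefix::VerbatimDisk(_) | Prefix::VerbatimUNC(_, _)
        ),
        // Relative paths have no prefix component, and prefixes are only
        // parsed on Windows targets.
        _ => false,
    }
}
```

Matching on the parsed prefix also covers the `\\?` versus `\\?\` question, because the standard library does the parsing rather than a string comparison.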

///
/// On non-Windows systems, this is a no-op that returns the original path.
#[cfg(windows)]
fn to_extended_path(path: &Path) -> std::borrow::Cow<'_, Path> {
Collaborator

to_verbatim_path would be more appropriate naming wise

Comment on lines +236 to +240
match std::env::current_dir() {
    Ok(cwd) => cwd.join(path),
    // If current_dir() fails, we can't create a valid extended path
    Err(_) => return std::borrow::Cow::Borrowed(path),
}
Collaborator

I'm not convinced we should try to guess what the absolute path should be here as it depends on the context.

Comment on lines +245 to +250
// Handle UNC paths: \\server\share -> \\?\UNC\server\share
let extended = if let Some(stripped) = abs_str.strip_prefix(r"\\") {
    PathBuf::from(format!(r"\\?\UNC\{stripped}"))
} else {
    PathBuf::from(format!(r"\\?\{}", abs_path.display()))
};
Collaborator

I'd rather check Prefix::UNC first before handling \\

Comment on lines +255 to +259
#[cfg(not(windows))]
#[allow(dead_code)]
fn to_extended_path(path: &Path) -> std::borrow::Cow<'_, Path> {
    std::borrow::Cow::Borrowed(path)
}
Collaborator

Is this needed? All the changed code I see is behind #[cfg(windows)]

Comment on lines +373 to +374
#[cfg(test)]
mod tests {
Collaborator

I'd add a test in crates/uv/tests/it/cache_clean.rs for the OP scenario instead.

@blueberrycongee force-pushed the fix/windows-special-filename-cache-clean branch from d3fdada to d92821d on December 29, 2025 08:42
@blueberrycongee
Author

The CI failure is unrelated to this PR - it's a GitHub LFS service timeout:

batch response: Fatal error: We couldn't respond to your request in time.

The failing tests (tool_install_git_lfs, tool_run_git_lfs) are flaky due to external service issues. Could you please re-run the failed jobs?

@blueberrycongee
Author

@samypr100

@samypr100 added the windows (Specific to the Windows platform) label on Dec 31, 2025
Comment on lines +221 to +254
fn to_verbatim_path(path: &Path) -> std::borrow::Cow<'_, Path> {
    use std::path::{Component, PathBuf, Prefix};

    if let Some(Component::Prefix(prefix)) = path.components().next() {
        match prefix.kind() {
            // Already a verbatim path, return as-is
            Prefix::Verbatim(_) | Prefix::VerbatimDisk(_) | Prefix::VerbatimUNC(_, _) => {
                return std::borrow::Cow::Borrowed(path);
            }
            // UNC path: \\server\share\... -> \\?\UNC\server\share\...
            Prefix::UNC(server, share) => {
                let suffix: PathBuf = path.components().skip(1).collect();
                let verbatim = PathBuf::from(format!(
                    r"\\?\UNC\{}\{}",
                    server.to_string_lossy(),
                    share.to_string_lossy()
                ))
                .join(suffix);
                return std::borrow::Cow::Owned(verbatim);
            }
            // Disk path: C:\... -> \\?\C:\...
            Prefix::Disk(_) => {
                return std::borrow::Cow::Owned(PathBuf::from(format!(r"\\?\{}", path.display())));
            }
            // DeviceNS path: \\.\device -> not typically used, return as-is
            Prefix::DeviceNS(_) => {
                return std::borrow::Cow::Borrowed(path);
            }
        }
    }

    // Relative path or no prefix, return unchanged
    std::borrow::Cow::Borrowed(path)
}
Collaborator

@samypr100 samypr100 Dec 31, 2025

Thanks for applying the changes.

I'm thinking this fits in uv-fs, but I'd like to hear from others first.

CC @konstin

Member

Yes this should be a generic util in uv-fs

use std::path::{Component, PathBuf, Prefix};

if let Some(Component::Prefix(prefix)) = path.components().next() {
match prefix.kind() {
Collaborator

nit: you can return the match directly, and avoid all the other inner returns

Comment on lines +268 to +270
#[cfg(windows)]
Err(err)
if err.kind() == io::ErrorKind::NotFound
Collaborator

@samypr100 samypr100 Dec 31, 2025

We should avoid the #[cfg(windows)] here by adding if cfg!(windows). That will help with the possible dead code warnings and make sure the compiler always checks this code on all platforms (since there should be no issues).
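
As a small illustration of the difference, here is a sketch in which the platform check is an ordinary boolean expression rather than an attribute. The helper name should_retry_verbatim is hypothetical and is not a function from the PR.

```rust
use std::io;

/// Decide whether a failed removal is worth retrying with a verbatim path.
/// `should_retry_verbatim` is a hypothetical helper, not a function from the PR.
fn should_retry_verbatim(err: &io::Error) -> bool {
    // `cfg!(windows)` expands to a compile-time `true` or `false`, so the rest
    // of the expression is still type-checked on every platform.
    cfg!(windows)
        && matches!(
            err.kind(),
            io::ErrorKind::NotFound | io::ErrorKind::InvalidInput
        )
}
```

With the attribute form, the guarded match arm is removed entirely on other platforms, which is how helpers used only inside it end up flagged as dead code.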

Comment on lines +276 to +289
let verbatim_path = to_verbatim_path(path);
if verbatim_path.as_ref() != path {
    match fs_err::remove_file(verbatim_path.as_ref()) {
        Ok(()) => return Ok(()),
        // Handle the case where file has special chars AND is readonly
        Err(e)
            if e.kind() == io::ErrorKind::PermissionDenied
                && set_not_readonly(verbatim_path.as_ref()).unwrap_or(false) =>
        {
            return fs_err::remove_file(verbatim_path.as_ref()).or(Err(err));
        }
        Err(_) => {}
    }
}
Collaborator

@samypr100 samypr100 Jan 2, 2026

I'm also thinking calling remove_file (recurse once) with a good exit condition here would be better than repeating all the code twice. Same for remove_dir/remove_dir_all.
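
One possible shape for that suggestion, sketched with std::fs instead of fs_err and without the readonly handling. The function name remove_file_with_fallback and its fallback parameter are assumptions made for illustration; in the PR the conversion would be to_verbatim_path.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Remove `path`; on NotFound or InvalidInput, retry exactly once with the
/// path produced by `fallback` (for example, a verbatim-path conversion).
/// The name and the `fallback` parameter are hypothetical.
fn remove_file_with_fallback(
    path: &Path,
    fallback: Option<&dyn Fn(&Path) -> Option<PathBuf>>,
) -> io::Result<()> {
    match fs::remove_file(path) {
        Err(err)
            if matches!(
                err.kind(),
                io::ErrorKind::NotFound | io::ErrorKind::InvalidInput
            ) =>
        {
            match fallback.and_then(|convert| convert(path)) {
                // Exit condition: the recursive call passes `None`, so it can
                // never retry a second time.
                Some(alternative) if alternative.as_path() != path => {
                    // If the retry also fails, report the original error.
                    remove_file_with_fallback(&alternative, None).or(Err(err))
                }
                _ => Err(err),
            }
        }
        result => result,
    }
}
```

The same pattern would apply to remove_dir and remove_dir_all, since only the underlying removal call differs.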

@blueberrycongee
Author

@samypr100

@konstin
Member

konstin commented Jan 5, 2026

I'm not sure about the general approach here, can we do a directory traversal with UNC paths directly instead, to avoid an error occurring in the first place?

Comment on lines +238 to +240
r"\\?\UNC\{}\{}",
server.to_string_lossy(),
share.to_string_lossy()
Collaborator

We'll want to avoid lossy conversion here

@blueberrycongee
Author

@samypr100 - Fixed! The UNC path handling now uses PathBuf::push() directly with the &OsStr values instead of to_string_lossy().

@konstin - The current approach falls back to verbatim paths only when needed (on NotFound/InvalidInput errors). This avoids the overhead of always using verbatim paths for directory traversal while still handling edge cases with special characters like trailing dots.
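
For the lossy-conversion point above, a minimal sketch of building the verbatim UNC prefix from &OsStr values without going through a string. It uses OsString::push for concatenation, which is an assumption for this sketch and may differ from the PathBuf::push approach the author describes; verbatim_unc_prefix is a hypothetical name.

```rust
use std::ffi::{OsStr, OsString};

/// Build `\\?\UNC\server\share` from the two parts of a `Prefix::UNC`
/// without converting through a lossy string.
/// `verbatim_unc_prefix` is a hypothetical helper name.
fn verbatim_unc_prefix(server: &OsStr, share: &OsStr) -> OsString {
    let mut prefix = OsString::from(r"\\?\UNC\");
    prefix.push(server);
    prefix.push(r"\");
    prefix.push(share);
    prefix
}
```

Because OsString::push appends the raw OS string, any non-Unicode data in the server or share name is preserved rather than replaced.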

@konstin
Member

konstin commented Jan 7, 2026

This avoids the overhead of always using verbatim paths for directory traversal

What is the overhead for doing that?

@blueberrycongee
Author

@konstin Looking at the code, the overhead would be:

Memory allocation: to_verbatim_path() creates new PathBuf/OsString for paths that need conversion (UNC and Disk paths), while already-verbatim paths return Cow::Borrowed with no allocation.

Path component processing: For UNC paths, it does path.components().skip(1).collect() which iterates through the remaining path components after the UNC prefix.

Directory traversal: The current code uses walkdir::WalkDir::new(path) which wouldn't automatically benefit from verbatim paths anyway - we'd need to convert every path during traversal.

The current approach only pays this cost when we hit specific errors (NotFound/InvalidInput), rather than for every path operation.

@konstin
Member

konstin commented Jan 7, 2026

Did you use an LLM for that or did you check the options doing a walkdir with verbatim paths yourself?

@blueberrycongee
Author

@konstin You're right, I didn't actually test the verbatim-path-first approach before answering.

I just tested it now:

  • WalkDir::new("\\?\C:...") works correctly
  • entry.path() automatically inherits the \\?\ prefix
  • Deletion with entry.path() succeeds without any fallback logic

So converting to a verbatim path once at the rm_rf entry point is cleaner. I can refactor to that approach if you'd prefer.

@samypr100
Collaborator

@konstin You're right, I didn't actually test the verbatim-path-first approach before answering.

I just tested it now:

  • WalkDir::new("\\?\C:...") works correctly
  • entry.path() automatically inherits the \\?\ prefix
  • Deletion with entry.path() succeeds without any fallback logic

So converting to verbatim path once at the rm_rf entry point is cleaner. I can refactor to that approach if you'd prefer.

Yes, please do so

@blueberrycongee
Author

Update summary:

  • On Windows, rm_rf now converts to a verbatim path (\\?\) at the entry point, so traversal and deletion use the verbatim path throughout and avoid Win32 normalization issues with special filenames (e.g., trailing dots).
  • Removed the retry/fallback logic inside remove_file/remove_dir/remove_dir_all since the verbatim conversion is handled once up front.
  • Added a Windows-only integration test in cache_clean that creates a trailing-dot filename and verifies uv cache clean succeeds (matches issue 16586).

If you’d prefer to move the helper into uv-fs or adjust the approach, let me know and I can follow up.
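
The convert-once approach described in this update can be sketched as follows. This is not the PR's code: it uses std::fs::read_dir instead of walkdir, returns a plain (files, dirs) tuple instead of the Removal struct, and uses std::fs::canonicalize as a stand-in for to_verbatim_path. The standard library documents that canonicalize yields extended-length paths on Windows, but it also resolves symlinks, which a real implementation may not want. Readonly files and Windows directory symlinks are not handled.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively delete `root`, returning (files removed, directories removed).
/// A missing root succeeds with zero counts.
fn rm_rf(root: &Path) -> io::Result<(u64, u64)> {
    // Convert once, up front. ASSUMPTION: `canonicalize` stands in for the
    // PR's `to_verbatim_path`; on Windows it returns a `\\?\` path.
    let root = match fs::canonicalize(root) {
        Ok(root) => root,
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok((0, 0)),
        Err(err) => return Err(err),
    };
    remove_tree(&root)
}

fn remove_tree(path: &Path) -> io::Result<(u64, u64)> {
    // `symlink_metadata` does not follow symlinks, so links are removed
    // rather than traversed.
    if !fs::symlink_metadata(path)?.is_dir() {
        fs::remove_file(path)?;
        return Ok((1, 0));
    }
    let (mut files, mut dirs) = (0, 0);
    for entry in fs::read_dir(path)? {
        // `entry.path()` is `path` joined with the entry's file name, so every
        // child path keeps whatever prefix the root was given.
        let (f, d) = remove_tree(&entry?.path())?;
        files += f;
        dirs += d;
    }
    fs::remove_dir(path)?;
    Ok((files, dirs + 1))
}
```

Since every child path is derived from the converted root, no per-file retry logic is needed in this shape.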

On Windows, files with special characters (like trailing dots) cannot be deleted using standard Win32 API paths due to path normalization. This fix uses the extended-length path prefix (\\?\) to bypass normalization when the initial deletion fails with NotFound or InvalidInput errors.

Fixes astral-sh#16586
On Windows, files with special characters (like trailing dots) fail to
delete due to Win32 path normalization. This causes uv cache clean
to fail when cached sdists contain such files (e.g., uwsgi's logging.).

Changes:
- Rename to_extended_path to to_verbatim_path for clarity
- Use Prefix enum instead of string matching for path detection
- Add verbatim path fallback (\\?\) for NotFound/InvalidInput errors
- Remove current_dir() logic: relative paths return unchanged
- Remove non-Windows version of the function

Tests:
- Add clean_trailing_dot_filename integration test
- Add unit tests for verbatim path conversion

Fixes astral-sh#15569
- Simplify to_verbatim_path by returning match directly instead of inner returns
- Extract _impl functions to avoid code duplication when retrying with verbatim paths
@blueberrycongee force-pushed the fix/windows-special-filename-cache-clean branch from ec9b189 to 0b1918d on February 19, 2026 15:24
@konstin
Member

konstin commented Feb 24, 2026

@blueberrycongee LLMs can't really do those path transformations well, and they also can't tell, for example, what we need unit tests for and which tests are unnecessary. For this PR to succeed, it needs to have a human-driven, non-LLM implementation. Are you interested in doing that? Otherwise, I don't think we can land this PR.

@blueberrycongee
Author

@konstin Hi, thank you for the honest feedback.

I apologize for the quality of this PR. You're right—I over-relied on LLMs to generate the code
without fully understanding the implementation details myself. This was a mistake and reflects
my lack of experience with proper code collaboration practices.

I'm a beginner and still learning, but I'm genuinely interested in fixing this bug the right
way. If this issue isn't urgent, I'd appreciate some time to study the codebase more carefully
and attempt a proper, human-driven fix.

I'm also very open to any specific guidance you could offer on how to approach this problem
correctly. I understand if you'd prefer to reassign this issue, but I'd welcome the opportunity
to learn and contribute properly.

Again, I'm sorry for the noise this PR created. I'll be more careful in the future.

@konstin konstin closed this Mar 3, 2026
Labels

windows Specific to the Windows platform

Development

Successfully merging this pull request may close these issues.

uv cache clean fails on Windows when sdist contains special filenames

3 participants