fix: handle Windows special filenames in cache clean#17241
fix: handle Windows special filenames in cache clean#17241blueberrycongee wants to merge 7 commits intoastral-sh:mainfrom
Conversation
60c6ab8 to
d6de313
Compare
crates/uv-cache/src/removal.rs
Outdated
| let path_str = path.to_string_lossy(); | ||
| if path_str.starts_with(r"\\?") { | ||
| return std::borrow::Cow::Borrowed(path); | ||
| } |
There was a problem hiding this comment.
I think an explicit match to Prefix::Verbatim, Prefix::VerbatimDisk, Prefix::VerbatimUNC would be better here. I assume \\? is a typo and you meant \\?\? Nonetheless, the std::path::Prefix enum would help with this.
crates/uv-cache/src/removal.rs
Outdated
| /// | ||
| /// On non-Windows systems, this is a no-op that returns the original path. | ||
| #[cfg(windows)] | ||
| fn to_extended_path(path: &Path) -> std::borrow::Cow<'_, Path> { |
There was a problem hiding this comment.
to_verbatim_path would be more appropriate naming wise
crates/uv-cache/src/removal.rs
Outdated
| match std::env::current_dir() { | ||
| Ok(cwd) => cwd.join(path), | ||
| // If current_dir() fails, we can't create a valid extended path | ||
| Err(_) => return std::borrow::Cow::Borrowed(path), | ||
| } |
There was a problem hiding this comment.
I'm not convinced we should try to guess what the absolute path should be here as it depends on the context.
crates/uv-cache/src/removal.rs
Outdated
| // Handle UNC paths: \\server\share -> \\?\UNC\server\share | ||
| let extended = if let Some(stripped) = abs_str.strip_prefix(r"\\") { | ||
| PathBuf::from(format!(r"\\?\UNC\{stripped}")) | ||
| } else { | ||
| PathBuf::from(format!(r"\\?\{}", abs_path.display())) | ||
| }; |
There was a problem hiding this comment.
I'd rather check Prefix::UNC first before handling \\
crates/uv-cache/src/removal.rs
Outdated
| #[cfg(not(windows))] | ||
| #[allow(dead_code)] | ||
| fn to_extended_path(path: &Path) -> std::borrow::Cow<'_, Path> { | ||
| std::borrow::Cow::Borrowed(path) | ||
| } |
There was a problem hiding this comment.
Is this needed? All the changed code I see is behind #[cfg(windows)]
| #[cfg(test)] | ||
| mod tests { |
There was a problem hiding this comment.
I'd add a test in crates/uv/tests/it/cache_clean.rs for the OP scenario instead.
d3fdada to
d92821d
Compare
|
The CI failure is unrelated to this PR - it's a GitHub LFS service timeout:
The failing tests ( |
| fn to_verbatim_path(path: &Path) -> std::borrow::Cow<'_, Path> { | ||
| use std::path::{Component, PathBuf, Prefix}; | ||
|
|
||
| if let Some(Component::Prefix(prefix)) = path.components().next() { | ||
| match prefix.kind() { | ||
| // Already a verbatim path, return as-is | ||
| Prefix::Verbatim(_) | Prefix::VerbatimDisk(_) | Prefix::VerbatimUNC(_, _) => { | ||
| return std::borrow::Cow::Borrowed(path); | ||
| } | ||
| // UNC path: \\server\share\... -> \\?\UNC\server\share\... | ||
| Prefix::UNC(server, share) => { | ||
| let suffix: PathBuf = path.components().skip(1).collect(); | ||
| let verbatim = PathBuf::from(format!( | ||
| r"\\?\UNC\{}\{}", | ||
| server.to_string_lossy(), | ||
| share.to_string_lossy() | ||
| )) | ||
| .join(suffix); | ||
| return std::borrow::Cow::Owned(verbatim); | ||
| } | ||
| // Disk path: C:\... -> \\?\C:\... | ||
| Prefix::Disk(_) => { | ||
| return std::borrow::Cow::Owned(PathBuf::from(format!(r"\\?\{}", path.display()))); | ||
| } | ||
| // DeviceNS path: \\.\device -> not typically used, return as-is | ||
| Prefix::DeviceNS(_) => { | ||
| return std::borrow::Cow::Borrowed(path); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // Relative path or no prefix, return unchanged | ||
| std::borrow::Cow::Borrowed(path) | ||
| } |
There was a problem hiding this comment.
Thanks for the applying the changes.
I'm thinking this fits in uv-fs, but I'd like to hear from others first.
CC @konstin
There was a problem hiding this comment.
Yes this should be a generic util in uv-fs
crates/uv-cache/src/removal.rs
Outdated
| use std::path::{Component, PathBuf, Prefix}; | ||
|
|
||
| if let Some(Component::Prefix(prefix)) = path.components().next() { | ||
| match prefix.kind() { |
There was a problem hiding this comment.
nit: you can return the match directly, and avoid all the other inner returns
crates/uv-cache/src/removal.rs
Outdated
| #[cfg(windows)] | ||
| Err(err) | ||
| if err.kind() == io::ErrorKind::NotFound |
There was a problem hiding this comment.
We should avoid the #cfg(windows) here by adding if cfg!(windows). That will help with the possible dead code warnings and make sure we always exercise the compiler on this for all platforms (since there should be no issues)
crates/uv-cache/src/removal.rs
Outdated
| let verbatim_path = to_verbatim_path(path); | ||
| if verbatim_path.as_ref() != path { | ||
| match fs_err::remove_file(verbatim_path.as_ref()) { | ||
| Ok(()) => return Ok(()), | ||
| // Handle the case where file has special chars AND is readonly | ||
| Err(e) | ||
| if e.kind() == io::ErrorKind::PermissionDenied | ||
| && set_not_readonly(verbatim_path.as_ref()).unwrap_or(false) => | ||
| { | ||
| return fs_err::remove_file(verbatim_path.as_ref()).or(Err(err)); | ||
| } | ||
| Err(_) => {} | ||
| } | ||
| } |
There was a problem hiding this comment.
I'm also thinking calling remove_file (recurse once) with a good exit condition here would be better than repeating all the code twice. Same for remove_dir/remove_dir_all.
|
I'm not sure about the general approach here, can we do a directory traversal with UNC paths directly instead, to avoid an error occurring in the first place? |
crates/uv-cache/src/removal.rs
Outdated
| r"\\?\UNC\{}\{}", | ||
| server.to_string_lossy(), | ||
| share.to_string_lossy() |
There was a problem hiding this comment.
We'll want to avoid lossy conversion here
|
@samypr100 - Fixed! The UNC path handling now uses PathBuf::push() directly with the &OsStr values instead of to_string_lossy(). @konstin - The current approach falls back to verbatim paths only when needed (on NotFound/InvalidInput errors). This avoids the overhead of always using verbatim paths for directory traversal while still handling edge cases with special characters like trailing dots. |
What is the overhead for doing that? |
|
@konstin Looking at the code, the overhead would be: Memory allocation: to_verbatim_path() creates new PathBuf/OsString for paths that need conversion (UNC and Disk paths), while already-verbatim paths return Cow::Borrowed with no allocation. Path component processing: For UNC paths, it does path.components().skip(1).collect() which iterates through the remaining path components after the UNC prefix. Directory traversal: The current code uses walkdir::WalkDir::new(path) which wouldn't automatically benefit from verbatim paths anyway - we'd need to convert every path during traversal. The current approach only pays this cost when we hit specific errors (NotFound/InvalidInput), rather than for every path operation. |
|
Did you use an LLM for that or did you check the options doing a walkdir with verbatim paths yourself? |
|
@konstin You're right, I didn't actually test the verbatim-path-first approach before answering. I just tested it now: WalkDir::new("\?\C:...") works correctly |
Yes, please do so |
|
Update summary:
If you’d prefer to move the helper into |
On Windows, files with special characters (like trailing dots) cannot be deleted using standard Win32 API paths due to path normalization. This fix uses the extended-length path prefix (\\?\) to bypass normalization when the initial deletion fails with NotFound or InvalidInput errors. Fixes astral-sh#16586
On Windows, files with special characters (like trailing dots) cannot be deleted using standard Win32 API paths due to path normalization. This fix uses the extended-length path prefix (\\?\) to bypass normalization when the initial deletion fails with NotFound or InvalidInput errors. Fixes astral-sh#16586
On Windows, files with special characters (like trailing dots) fail to delete due to Win32 path normalization. This causes \uv cache clean\ to fail when cached sdists contain such files (e.g., uwsgi's \logging.\). Changes: - Rename \ o_extended_path\ to \ o_verbatim_path\ for clarity - Use \Prefix\ enum instead of string matching for path detection - Add verbatim path fallback (\\\\?\\) for NotFound/InvalidInput errors - Remove \current_dir()\ logic: relative paths return unchanged - Remove non-Windows version of the function Tests: - Add \clean_trailing_dot_filename\ integration test - Add unit tests for verbatim path conversion Fixes astral-sh#15569
- Simplify to_verbatim_path by returning match directly instead of inner returns - Extract _impl functions to avoid code duplication when retrying with verbatim paths
ec9b189 to
0b1918d
Compare
|
@blueberrycongee LLMs can't really do those path transformations well, and they also e.g. can't tell what we need unit tests for and what's unnecessary tests. For this PR to succeed, it needs to have a humna-driven, non-LLM implementation. Are you interested doing that? Otherwise, I don't think we can't land this PR. |
|
@konstin Hi, thank you for the honest feedback. I apologize for the quality of this PR. You're right—I over-relied on LLMs to generate the code I'm a beginner and still learning, but I'm genuinely interested in fixing this bug the right I'm also very open to any specific guidance you could offer on how to approach this problem Again, I'm sorry for the noise this PR created. I'll be more careful in the future. |
Summary
Fix
uv cache cleanfailing on Windows when cached sdist contains files with Windows-incompatible filenames (e.g., files ending with a period likelogging.).Problem:
On Windows, when running
uv add uwsgi, the build fails (expected, as uwsgi doesn't support Windows). However, when subsequently runninguv cache clean, the command fails with:This happens because Windows' standard Win32 API automatically normalizes paths, stripping trailing dots from filenames. When trying to delete such files, the normalized path doesn't match the actual filename on disk.
Solution:
Use the extended-length path prefix (
\\?\) to bypass Win32 path normalization when deletion fails withNotFoundorInvalidInputerrors. This is the standard Windows approach for handling:The fix is applied to
remove_file,remove_dir, andremove_dir_allfunctions incrates/uv-cache/src/removal.rs.Fixes #16586
Test Plan
Added unit tests for
to_extended_path()function covering:Added unit tests for file/directory removal:
set_not_readonlylogic)remove_dir_all)Added tests for
rm_rf()function:Added test for
Removalstruct'sAddAssignimplementationManual verification: Reproduced the original issue with
uwsgipackage and confirmeduv cache cleannow succeeds after the fix.