-
Notifications
You must be signed in to change notification settings - Fork 662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid calling normalize_path
with relative paths that extend beyond the current directory
#3013
Conversation
|
||
let cache_entry = cache.entry( | ||
CacheBucket::Interpreter, | ||
"", | ||
format!("{}.msgpack", digest(&executable_bytes)), | ||
format!("{}.msgpack", digest(&canonical)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@konstin - I think it's okay to cache under the canonical path here rather than the executable path... since we already use the canonical path for the timestamp. It could cause issues with executables that are symlinks (that's a common source of bugs here), but I think it's still fine since we're not using the canonical path to query the executable, which is what tends to cause issues.
crates/uv-fs/src/path.rs
Outdated
/// Normalize a path, removing things like `.` and `..`. | ||
/// | ||
/// Assumes that the path is already absolute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we want to remove those? Should we add a debug assertion that they're not present at the beginning of the path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we want to remove ..
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that what you're asking?
2bff12c
to
460a029
Compare
460a029
to
1ff35cf
Compare
@@ -488,16 +488,15 @@ impl InterpreterInfo { | |||
/// unless the Python executable changes, so we use the executable's last modified | |||
/// time as a cache key. | |||
pub(crate) fn query_cached(executable: &Path, cache: &Cache) -> Result<Self, Error> { | |||
let executable_bytes = executable.as_os_str().as_encoded_bytes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\cc @BurntSushi - I changed this to just hash the path directly. Not sure if it's equivalent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to grok why you need the hashes to be the same here. I'm not understanding that park. Like, even if this generates a different hash, that seems okay, it just means you'll get a cache miss? Or is there something deeper here that I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't need to be the same!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was mostly a Chesterton's Fence of trying to understand why we did this in the first place.
@@ -36,6 +36,8 @@ impl VerbatimUrl { | |||
} | |||
|
|||
/// Create a [`VerbatimUrl`] from a file path. | |||
/// | |||
/// Assumes that the path is absolute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we panic if it isn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah.
pub fn normalize_path(path: impl AsRef<Path>) -> PathBuf { | ||
let mut components = path.as_ref().components().peekable(); | ||
/// | ||
/// CAUTION: Assumes that the path is already absolute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Can we panic if it isn't absolute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah.
@@ -488,16 +488,15 @@ impl InterpreterInfo { | |||
/// unless the Python executable changes, so we use the executable's last modified | |||
/// time as a cache key. | |||
pub(crate) fn query_cached(executable: &Path, cache: &Cache) -> Result<Self, Error> { | |||
let executable_bytes = executable.as_os_str().as_encoded_bytes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -488,16 +488,15 @@ impl InterpreterInfo { | |||
/// unless the Python executable changes, so we use the executable's last modified | |||
/// time as a cache key. | |||
pub(crate) fn query_cached(executable: &Path, cache: &Cache) -> Result<Self, Error> { | |||
let executable_bytes = executable.as_os_str().as_encoded_bytes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to grok why you need the hashes to be the same here. I'm not understanding that park. Like, even if this generates a different hash, that seems okay, it just means you'll get a cache miss? Or is there something deeper here that I'm missing?
I guess |
normalize_path
on possibly-relative pathsnormalize_path
with relative paths that extend beyond the current directory
Okay, could use another review here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Summary
It turns out that
normalize_path
(sourced from Cargo) has a subtle bug. If you pass it a relative path that traverses beyond the root, it silently drops components. So, e.g., passing../foo/bar
, it will just drop the leading..
and returnfoo/bar
.This PR encodes that behavior as a
Result
and avoids using it in such cases.Closes #3012.