-
-
Notifications
You must be signed in to change notification settings - Fork 31.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-99726: Add 'fast' argument to os.[l]stat for faster calculation #99727
Conversation
When passed as True, only st_mode's type bits, st_size and st_mtime[_nsec] are guaranteed to be set. Other fields may also be set for a given Python version on a given platform version, but may change without warning (in the case of OS changes - Python will try to keep them stable). This first implementation uses a new Windows API that is significantly faster, provided the volume identifier is not required. Other optimizations may be added later.
I'm not yet sold on the feature. We already have a faster stat combined with directory scanning. |
|
||
Get the status of the file descriptor *fd*. Return a :class:`stat_result` | ||
object. | ||
|
||
As of Python 3.3, this is equivalent to ``os.stat(fd)``. | ||
As of Python 3.3, this is equivalent to ``os.stat(fd, fast=fast)``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of Python 3.3, this is equivalent to ``os.stat(fd, fast=fast)``. | |
As of Python 3.3, this is equivalent to ``os.stat(fd, fast=False)``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😆 whoops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, no, this is correct. There's a fast
argument to this function, which gets passed through (it just doesn't happen to make any difference right now, but that's why fast
doesn't guarantee it'll leave out any information. Sometimes we have to take the slow path regardless, and that's the case here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, right, I missed that :) Never mind then
I'm wondering if something in the spirit of |
if (PathCchSkipRoot(path, &rootEnd) || | ||
wcsncpy_s(rootBuffer, MAX_PATH, path, (rootEnd - path))) { | ||
/* No root for the path, so let it use the current volume */ | ||
root = NULL; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is problematic. Instead, open a handle via CreateFileW()
and call GetVolumeInformationByHandleW()
. It's no more expensive than calling GetVolumeInformationW()
, which also has to open a path, and it's reliable. That said, we're only here because it isn't a 'fast' stat, so we may as well take the slow path from the start.
PathCchSkipRoot()
only supports backslash as the path separator. That's an irrelevant problem, however, considering the fundamental problem that the path may contain mount points, symlinks, and other reparse points. In this case the drive/device of the opened path is probably irrelevant to the file.
GetVolumeInformationW()
will open any path that has a trailing slash. However, after opening the path it queries the FileNameInformation
to verify that the final path ends in a backslash, i.e. that a filesystem root directory was opened. The problem is finding the relevant root directory in the path without resolving the final path. GetVolumePathNameW()
is supposed to handle this, but I find that it's buggy with substitute drives, for which it returns ridiculous results. It also returns the wrong result for a symlink to a UNC path. And it's more work than simply opening the path.
Actually, the requirement for a root directory is irrelevant to the underlying system call that queries the FileFsVolumeInformation (volume name and serial number) and FileFsAttributeInformation (filesystem name, flags, and max component length). Any open file on the volume is sufficient. Hence, just open the path and call GetVolumeInformationByHandleW()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know all of this. I'm still having a private discussion with the Windows team about optimising GetVolumeInformation
or adding VSN into the stat structure, either of which would let us bypass this extra step.
Right now, if someone passes fast=False
, we could always just go back to the original slow path. But in the future, this slow path could get drastically faster without us having to change anything, which means automatic "backports". Right now it's about the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know all of this.
Okay, I just felt it was worth discussing since the result of win32_xstat_get_st_dev()
as written would often be incorrect.
optimising GetVolumeInformation
They really should fix GetVolumeInformationW()
to work consistently with GetVolumeInformationByHandleW()
. All of the loops it jumps through to require a root directory are pointless and frustrating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an irrelevant problem, however, considering the fundamental problem that the path may contain mount points, symlinks, and other reparse points.
I think we can probably define this as an edge case: "lstat()
on Windows when passed a path including junctions or non-symlink reparse points that direct to another drive are ignored when getting st_dev
. The device matching the start of the path will be returned, even if the reparse points mean the file is on another drive."
It's also possible that if the VSN gets added into the stat data before Windows releases, that we won't need this path at all.
Co-authored-by: Eryk Sun <[email protected]>
Yeah, this is the reason I went with that API first (should be in a previous commit). I found it felt clunky compared to Guido is right that we already have a fast stat (for |
Ah, now we're on to something. A new function that returns a custom struct seems better than a flag to |
When passed as True, only st_mode's type bits, st_size and st_mtime[_nsec] are guaranteed to be set. Other fields may also be set for a given Python version on a given platform version, but may change without warning (in the case of OS changes - Python will try to keep them stable).
This first implementation uses a new Windows API that is significantly faster, provided the volume identifier is not required. Other optimizations may be added later.