pathlib.Path.glob does not follow symlinks #77609
Given a For example given the following:
Notice how the contents of Subfolder are present in the I would expect |
Is this specific to Windows? I can't replicate it on Mac, and it would make sense given Windows' method of implementing "symlinks" as junctions. |
Windows does not implement symlinks as junctions. Windows has hardlinks, symlinks and junctions, which all behave differently. I don't doubt that this is a Windows-specific issue, although I have not tested other platforms. Path.glob and .rglob do work for junctions and hardlinks, but glob.glob works consistently for all three. |
I can reproduce the bug with Linux and Python 3.7.5:
Python 3.7.5 (default, Apr 19 2020, 20:18:17)
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pathlib import Path
>>> Path('a/b').mkdir(parents=True)
>>> Path('c/d').mkdir(parents=True)
>>> Path('a/c').symlink_to('../c')
>>> Path('e').symlink_to('c')
>>> list(Path('.').rglob('*'))
[PosixPath('e'), PosixPath('c'), PosixPath('a'), PosixPath('c/d'), PosixPath('a/c'), PosixPath('a/b')]
Expected result:
|
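The session above can be turned into a self-contained script. The following is only a sketch: the helper names (`build_tree`, `pathlib_rglob`, `glob_recursive`) are made up for illustration, the tree is built in a temporary directory, and creating symlinks may need extra privileges on Windows. It contrasts `Path.rglob('*')` with `glob.glob(..., recursive=True)`, which, as noted earlier in the thread, does descend into directory symlinks.

```python
import glob
import os
import tempfile
from pathlib import Path


def build_tree(root):
    """Recreate the layout from the report: a/b, c/d, a/c -> ../c, e -> c."""
    root = Path(root)
    (root / "a" / "b").mkdir(parents=True)
    (root / "c" / "d").mkdir(parents=True)
    (root / "a" / "c").symlink_to("../c", target_is_directory=True)
    (root / "e").symlink_to("c", target_is_directory=True)
    return root


def pathlib_rglob(root):
    """Relative POSIX-style paths found by Path.rglob('*')."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


def glob_recursive(root):
    """Relative POSIX-style paths found by glob.glob with recursive=True."""
    root = Path(root)
    pattern = os.path.join(glob.escape(str(root)), "**")
    matches = glob.glob(pattern, recursive=True)
    # '**' also matches the root itself; drop that entry.
    return sorted(
        Path(m).relative_to(root).as_posix() for m in matches if Path(m) != root
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tree = build_tree(tmp)
        print("pathlib:", pathlib_rglob(tree))
        print("glob:   ", glob_recursive(tree))
```

On the interpreters discussed in this thread, the `pathlib` listing should stop at the symlinks `a/c` and `e`, while the `glob` listing also contains `a/c/d` and `e/d`.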
Following symlinks was disabled on purpose as a fix for #70200: Line 319 in 69f6cc7
To re-enable it, we'd have to come up with a different mitigation for the symlink loops problem. Or possibly, if hidden behind a |
On Ubuntu 22 I have the same problem |
I realised this because I've been trying to implement an iterative version of |
zsh has a |
Having thought about this some more, I'd like to propose that we add follow_symlinks arguments to
Note that the default behaviour would therefore change: a pattern like With that in place, we could implement Thoughts? |
Is it possible to follow symlinks when matching non-wildcard segments? I'm not familiar with the implementation, though if you're planning to put it atop It might have to be a different mode, but if it preserves the current behaviour, it may be worth it. I can't decide whether it meets my expectations better or not, but it does seem we can avoid the harm of cycles in that case. |
With the The current In a possible future implementation, Lines 193 to 202 in f87f6e2
However, this only works if we treat symlinks the same no matter whether we've hit a |
Symlinks really complicate everything don't they? :) Another option: swallow ELOOP errors, like |
Ah I see, yeah, there's not really any alternative for handling Refusing to traverse on I feel like So given all that, could we default to |
Q: how do you suggest we detect cycles? Calling |
If you have a ton of directory symlinks, sure, but who does that? And I'm pretty sure that's the only generic option (we could try to be clever and remember e.g. the inode of the first file in each directory, but I doubt that's as portable). We only have to check on the way into the symlink - not for every file inside of it (unless it's another directory symlink). Maybe we can remember devs+inodes of the symlinks themselves and skip the check if it's a non-zero value that we haven't seen before? And now I'm thinking about this, how do we want to handle multiple non-cyclic symlinks into the same directory? If our search root has 100 symlinks to the same directory, none of them are a cycle, but we'll return the same files 100 times. That's probably the right thing to do.
|
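To make the device/inode idea concrete, here is a minimal sketch of a walker that follows directory symlinks but refuses to re-enter a directory that is already an ancestor of the current path. It is an illustration, not the pathlib implementation: `walk_following_symlinks` is a hypothetical name, and for simplicity it calls `os.stat()` once for every directory it is about to enter rather than only for symlinks. It assumes `st_dev` and `st_ino` identify a directory uniquely on the platform in use.

```python
import os
from pathlib import Path


def walk_following_symlinks(root):
    """Yield every path below root, following directory symlinks.

    A directory is skipped if its (st_dev, st_ino) pair matches one of
    its own ancestors, which is what a symlink cycle looks like.
    """
    root = Path(root)
    root_stat = root.stat()
    # Identities of the directories on the current descent path only,
    # so the same target reached via unrelated symlinks is still walked.
    ancestors = {(root_stat.st_dev, root_stat.st_ino)}

    def _walk(directory):
        try:
            with os.scandir(directory) as it:
                entries = sorted(it, key=lambda e: e.name)
        except OSError:
            return
        for entry in entries:
            yield Path(entry.path)
            try:
                if not entry.is_dir():  # follows symlinks by default
                    continue
                st = os.stat(entry.path)
            except OSError:
                continue
            key = (st.st_dev, st.st_ino)
            if key in ancestors:
                continue  # entering would loop back to an ancestor
            ancestors.add(key)
            try:
                yield from _walk(entry.path)
            finally:
                ancestors.discard(key)

    yield from _walk(root)
```

Because only ancestors are remembered, several non-cyclic symlinks to the same directory each produce their own copy of its contents, which matches the "100 symlinks, 100 results" behaviour described above.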
I rather like |
Fair. I imagine enough stuff breaks down with symlink cycles that people try to avoid them anyway. |
Add a keyword-only *follow_symlinks* parameter to `pathlib.Path.glob()` and `rglob()`, defaulting to false. When set to true, symlinks to directories are followed as if they were directories. Previously these methods followed symlinks except when evaluating "`**`" wildcards; on Windows they returned paths in filesystem casing except when evaluating non-wildcard tokens. Both these problems are solved here. This will allow us to address pythonGH-102613 and pythonGH-81079 in future commits.
PR available: #102616 |
We do :) Details don't matter really here, but we have a common directory for data, let's call it
We'd very much like to do stuff like |
Great scenario, appreciate you showing us. Will the support for |
yes, having to specify an option is no issue.
Not really, it's all under our control, so as long as there is the option to follow symlinks, at least we would be fine. |
…02616) Add a keyword-only *follow_symlinks* parameter to `pathlib.Path.glob()` and `rglob()`. When *follow_symlinks* is `None` (the default), these methods follow symlinks except when evaluating "`**`" wildcards. When set to true or false, symlinks are always or never followed, respectively.
Let's change the new People reading code that explicitly specifies This is good even if we never decide to work towards a default change. |
Replace tri-state `follow_symlinks` with boolean `recurse_symlinks` argument. The new argument controls whether symlinks are followed when expanding recursive `**` wildcards. The arguments correspond as follows:
follow_symlinks  recurse_symlinks
===============  ================
False            N/A
None             False
True             True
…7311) Replace tri-state `follow_symlinks` with boolean `recurse_symlinks` argument. The new argument controls whether symlinks are followed when expanding recursive `**` wildcards. The possible argument values correspond as follows:
follow_symlinks  recurse_symlinks
===============  ================
False            N/A
None             False
True             True
We therefore drop support for not following symlinks when expanding non-recursive pattern parts; it wasn't requested in the original issue, and it's a feature not found in any shells. This makes the API easier to grok by eliminating `None` as an option. No news blurb as `follow_symlinks` was new in 3.13.
Re-resolving - the argument is now |
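A small usage sketch of the final API, under the assumption that `recurse_symlinks` is available from Python 3.13 onward (the commit above says `follow_symlinks` was new in 3.13). The wrapper name `rglob_through_symlinks` is hypothetical. On older interpreters it falls back to `glob.glob(..., recursive=True)`, which is not an exact equivalent: for example, `glob.glob` skips dotfiles by default.

```python
import glob
import os
import sys
from pathlib import Path


def rglob_through_symlinks(root, pattern="*"):
    """Recursively match pattern under root, descending into directory symlinks."""
    root = Path(root)
    if sys.version_info >= (3, 13):
        # Keyword-only argument; assumed to exist on 3.13 and later.
        return sorted(root.rglob(pattern, recurse_symlinks=True))
    # Approximate fallback for older versions: glob.glob follows symlinks in '**'.
    full_pattern = os.path.join(glob.escape(str(root)), "**", pattern)
    return sorted(Path(m) for m in glob.glob(full_pattern, recursive=True))
```

Neither branch guards against symlink cycles itself, so this is best used on trees that are known not to contain them.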
Linked PRs: #102616, #104176, #117311