dirwalk ignores non-regular files #1727
…ned instead. That way, algorithms relying on dirwalking can still see them if they want to, but would have a hard time using them (accidentally). Note that this replaces the `From` implementation with `entry::Kind::try_from_file_type()`, which makes this a breaking change.
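To illustrate why moving from an infallible `From` to a fallible conversion is a breaking change, here is a minimal sketch. The `Kind` enum below is a hypothetical stand-in with only three variants, not the actual `gix-dir` type, and the `Option` return type is an assumption made for illustration. The point is only that an infallible conversion has to map every file type to some variant, while a fallible one can decline entries such as FIFOs.

```rust
use std::fs::FileType;

/// Hypothetical stand-in for an entry kind; NOT the actual `gix-dir` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    File,
    Directory,
    Symlink,
}

impl Kind {
    /// Fallible conversion: yields `None` for anything that is not a regular
    /// file, directory, or symlink (for example FIFOs, sockets, device nodes).
    pub fn try_from_file_type(file_type: FileType) -> Option<Self> {
        if file_type.is_file() {
            Some(Kind::File)
        } else if file_type.is_dir() {
            Some(Kind::Directory)
        } else if file_type.is_symlink() {
            Some(Kind::Symlink)
        } else {
            None
        }
    }
}
```

Callers that previously wrote `Kind::from(file_type)` would have to handle the `None` case explicitly, which is what makes such a change visible in the API.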
This looks good, and gix clean benefits for sure.
In particular, as shown in this gist, this makes it so that when gix clean directly sees a FIFO (named pipe) on a Unix-like system, it disregards it as git clean does, rather than deleting it.
Before this change:
ek in 🌐 catenary in gitoxide on  main is 📦 v0.39.0 via 🦀 v1.83.0
❯ cargo run --bin=gix -- clean -n
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.77s
     Running `target/debug/gix clean -n`
WOULD remove pipe
Skipped 1 expendable entry - show with -x
After this change:
ek in 🌐 catenary in gitoxide on  main is 📦 v0.39.0 via 🦀 v1.83.0 took 14s
❯ cargo run --bin=gix -- clean -n
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.73s
     Running `target/debug/gix clean -n`
Nothing to clean (Skipped 1 expendable entry - show with -x)
There are some things relevant to #1629 and conceptually relevant to this PR and its goals that are not covered here, but that may be out of scope here. Some may even be outside the broader originally intended scope of #1629 itself. (In the following, I always mean the same thing by "FIFO" and "named pipe," leaning toward using the latter when talking about Windows only because it is the term used in the Microsoft documentation.)
- While the bug this fixes is not specific to the way `cargo` uses `gix`, the behavior of `cargo publish` was the stated motivation for #1629, as well as possibly being the main motivation for this fix, going by how #1695 classifies it. But to fix the behavior of `cargo publish` in the presence of FIFOs, it may be necessary for `cargo` to make more changes than just using a newer `gix-dir`:
  - It is possible to publish a crate from a directory that is not a Git repository, and when that is done, the bug still occurs, because the separate directory walking logic `cargo` implements also opens non-regular files including FIFOs.
  - In general, `cargo publish` seems to attempt to archive non-regular files, other than directories, as regular files, by opening them for read. This includes even symlinks, which it seems to dereference. (This also includes symlinks that resolve to non-regular files including FIFOs--when I do a dry-run publish of a crate that has a symlink to a FIFO, it blocks reading from the FIFO.)
- Both Git and gitoxide attempt to read from `.gitignore` even when it is a FIFO, in at least some situations. I'm not totally sure what the preferred behavior is for that, since it's a very strange situation and maybe there are use cases for `.gitignore` files to be kinds of filesystem entries that `git` cannot track. But it seems to me that blocking indefinitely should be avoided even here, and that this is a bug in Git and gitoxide.

  `git status` and `git clean -n` block in a repository where `.gitignore` is a FIFO.

  `gix status` and `gix clean -n` blocked in such a repository prior to the fix in this PR. Now, when there are no untracked files, `gix status` no longer blocks (which surprised me), but `gix clean -n` still does. But when there is a new unstaged regular file in addition to the FIFO, both `gix status` and `gix clean -n` still block.

  Both before and after this PR, when a `gix` command blocks, it blocks twice, i.e., it tries to open the `.gitignore` FIFO for read twice instead of just once, while in contrast `git` commands block just once. Or, to state it more carefully, I have to press Ctrl+C only once with `git` but twice with `gix`, which I think, but do not know for sure, is due to the entry being opened again.
- The tests here are rightly skipped on Windows. The reason this is the right thing to do, at least as they are currently written, is that they are all testing with FIFOs, and they use a fixture whose creation script uses the `mkfifo` command.

  Windows does have named pipes, and a `mkfifo` command does exist in Git Bash and other MSYS2 environments. But the MSYS2 `mkfifo` command does not actually create named pipes! Instead, it creates `.lnk` shortcut files that MSYS2 programs--programs that link to `msys-2.0.dll` and use it for POSIX system call emulation--treat as named pipes. I think Cygwin programs--programs that use `cygwin1.dll`--can also use them. Native Windows programs just see them as `.lnk` shortcut files--which, at least typically, cannot be dereferenced and, in any case, are not named pipes.

  Although Git for Windows ships a MSYS2 environment, it is a native Windows program. More precisely:

  - Although `git` somewhat misleadingly places its Windows-specific implementations of various functions that are natively available on Unix-like systems in a `mingw.c` file, this is really an implementation detail of a native Windows program.
  - All directly Git-related binaries shipped in Git for Windows--the `git.exe` wrapper in `cmd/` as well as the more important `git.exe` in a directory usually called `mingw64/bin/` or `mingw32/bin/` that it wraps, as well as all binaries inside the `git-core` directory whose location `git --exec-path` gives--are native Windows programs rather than MSYS2 programs.
  - Interpreters such as `bash` and `perl` are MSYS2 programs, however, so parts of Git that are implemented as scripts for them are exceptions in that they are effectively non-native. (Likewise, scripts provided by the user, such as hooks, if they are shell scripts, will also run in interpreters that perform POSIX functions using `msys-2.0.dll`.)

  The `git` command on Windows therefore--and I very much believe intentionally and rightly--gives no special semantics to any `.lnk` files, even if they happen to be treated specially by MSYS2. If one is to create named pipes on Windows for tests, it cannot be done with `mkfifo` in Git Bash or another MSYS2 or (as far as I know) any other Cygwin-like environment.
- At least with respect to FIFOs (named pipes), it is possible that the issue this is fixing should be considered not to affect Windows. My view is that probably we should consider Windows as affected and eventually try to test for the much weirder and harder to produce situation with named pipes that appear in a repository on Windows. But this is not obvious, and in order to decide, it may be necessary to consider this alongside other kinds of unusual filesystem entries on Windows, some of which are entirely Windows-specific.

  Unlike on Unix-like systems where they can be created just about anywhere (which the MSYS2 `mkfifo` situation with specially treated `.lnk` is meant to emulate), on Windows named pipes are only allowed to exist in a named pipe filesystem. On Windows, named pipes live under `\\.\pipe\` or, more broadly, under paths of the form `\\ServerName\pipe\`. They are not directly present elsewhere.

  Such paths resist being exposed in a way that allows a Git repository to include them. For example, a directory symlink can point to `\\.\pipe\` and a file symlink can point to any particular named pipe, and these symlinks are valid and dereferenceable. But symlinks in repository working trees--regardless of whether named pipes are involved and regardless of whether the symlinks point outside the volume that holds them--do not cause their target entries to be part of a Git repository.

  When an NTFS junction (see #1354 (comment)), which can usually be viewed as similar to a recursive bind mount on Unix-like systems that support them (`mount --rbind`), is present in a working tree, Git does dereference it and regard the contents of the junctioned-to location as part of the repository. However, junctions do not work across filesystems; one cannot make a junction from an ordinary volume to a path like `\\.\pipe\` (nor to an individual pipe, though that is also because they can only point to directory-like entries).

  It is nonetheless possible to make this elusive scenario occur, so named pipes (not named in a way that makes them special to `git`--this is independent of the `.gitignore` behavior described above) are seen as part of a repository, and so `git` commands even try to read from them and block. The way I have found to make this happen is to place a junction in a repository working tree that points to a directory symlink on the same volume, where the directory symlink itself points to `\\.\pipe\`. I would guess that there are other ways.

  By the way, if performing this experiment, it is probably best to use a virtual machine that is not being used for anything else, since attempts are made to read from a bunch of named pipes on the system--anything under `\\.\pipe\` may be read from!--including some where weird things happen. However, this is an area where I did not follow my own advice. The main effect I noticed, and reproduced, was that all consoles that were started before the experiment failed to be recognized as consoles by native Windows programs afterwards that check device type--for example, I could not run `vim` except in a new console--even though they seemed otherwise to continue working.

  One reason I mention the risk is to clarify what I mean when I say the situation is hard to produce. The experiment, as I performed it, is not too hard to set up, having figured out how. But it might be tricky to find an approach that wouldn't risk subtle havoc when run in automated tests. (I found that the behavior of MSYS2 programs when real named pipes are exposed to them using this technique is also very interesting, but I omit it here for, uh, brevity?)
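As a small illustration of the path shapes mentioned in the last point, the following sketch checks whether a string has the textual form `\\<server>\pipe\...`, with `.` as the server for the local machine. The function name is hypothetical, and comparing the `pipe` component case-insensitively is an assumption on my part. A purely textual check like this would not catch the junction-to-symlink scenario described above, because the path as seen inside the working tree looks like an ordinary one.

```rust
/// Hypothetical helper: does `path` look like `\\<server>\pipe\...`?
/// This only inspects the text of the path and never touches the filesystem.
pub fn looks_like_named_pipe_path(path: &str) -> bool {
    // Require the leading `\\` of a UNC-style path.
    let rest = match path.strip_prefix(r"\\") {
        Some(rest) => rest,
        None => return false,
    };
    let mut components = rest.split('\\');
    let server = components.next().unwrap_or("");
    let share = components.next().unwrap_or("");
    // Assumption: the `pipe` component is matched case-insensitively.
    !server.is_empty() && share.eq_ignore_ascii_case("pipe")
}
```

Because the interesting case is exactly the one where the path does not look special, a check on the kind of the opened or stat'ed entry would probably be needed in practice.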
There are a few more gix clean related things I plan to try. I'll open issues for any bugs I do find (if any), and also probably for the situation when .gitignore is a FIFO.
Edit: I've opened #1729 about a new gix clean panic.
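For the blocking cases above (a directory walk that opens whatever it finds, or a `.gitignore` that is a FIFO), one way to avoid blocking is to look at the entry's type before opening it. This is a minimal sketch with a hypothetical helper name, not what `gix` or `cargo` currently do. It assumes that following symlinks is wanted, so it uses `fs::metadata`, which means a symlink that resolves to a FIFO is rejected too.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: read `path` only if it resolves to a regular file.
/// Returns `Ok(None)` for directories, FIFOs, sockets, and other non-regular
/// entries, and `Err` if the entry cannot be stat'ed at all.
pub fn read_if_regular_file(path: &Path) -> io::Result<Option<Vec<u8>>> {
    // `fs::metadata` follows symlinks and does not open the file,
    // so it cannot block on a FIFO that has no writer.
    let meta = fs::metadata(path)?;
    if meta.is_file() {
        fs::read(path).map(Some)
    } else {
        Ok(None)
    }
}
```

There is a window between the check and the read in which the entry could be replaced, so this reduces accidental blocking rather than ruling it out.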
| Thanks so much for sharing! 
 That's true! There are non-git codepaths and probably they would fall for named pipes, even though I didn't validate it. Interestingly I am not aware of an existing issue in Cargo even, so it seems to be rare. 
 What happens is that  
 That was a very interesting read, even though I kept thinking that Windows seems to be quite broken if side effects like these are even possible. My feeling is that something in the console might have started reading, and blocking on, the named pipe, and now isn't able to respond to events anymore that help classify it as console. It's a denial of service for the console in question, probably a primitive for further exploitation. 
 😁 
 Wonderful - this actually tells me that the natural behaviour of ignoring these named pipe entries that I just accepted for my convenience isn't actually correct, and that they should be officially pruned after all. Then all other tools should handle them correctly as well. | 
Supersedes #1629.
Let's try to keep the change minimal, but consider what's in the superseded PR.
Tasks