Return first file found and terminate #472

matthew-piziak · 2019-08-20T20:52:02Z

Does fd support the equivalent of find PATH -name NAME -print -quit, which finds the first match, prints the result, and terminates?
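For reference, the find behaviour being asked about amounts to a depth-first search that returns as soon as one entry matches. Below is a minimal Rust sketch under simplifying assumptions: `find_first` is a hypothetical name, matching is by exact file name rather than find's glob patterns, symlinks are not followed, and any I/O error aborts the search where find would report it and continue.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: depth-first search under `root` that returns the
/// first entry whose file name equals `name` and skips the rest of the tree.
/// Roughly `find ROOT -name NAME -print -quit`, but with exact-name matching
/// only (no globs); any I/O error aborts the whole search.
fn find_first(root: &Path, name: &str) -> io::Result<Option<PathBuf>> {
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let path = entry.path();
        if entry.file_name() == name {
            // Stop immediately: no further directories are read.
            return Ok(Some(path));
        }
        // file_type() does not follow symlinks, so symlinked dirs are skipped.
        if entry.file_type()?.is_dir() {
            if let Some(found) = find_first(&path, name)? {
                return Ok(Some(found));
            }
        }
    }
    Ok(None)
}
```

Because the function returns on the first hit, the remaining directories are never read. Which match counts as "first" depends on the order in which `read_dir` yields entries, and that order is unspecified.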


matthew-piziak · 2019-08-21T03:38:23Z

I looked into some closed issues and found fd --max-buffer-time=0 NAME PATH | head -n 1, which takes 0.9s real time compared to 0.5s with find PATH -name NAME -print -quit. Am I missing something?

sharkdp · 2019-09-13T19:31:45Z

Thank you for your feedback.

I managed to find a similar example on my filesystem where I could reproduce your results. I think the problem is that piping into head -n 1 doesn't necessarily shut down the process immediately.

As a demonstration, let's look at find first. I am using hyperfine for running the benchmarks:

```shell
hyperfine --warmup 3 \
  'find -iname "*flow.yaml"' \
  'find -iname "*flow.yaml" | head -n1' \
  'find -iname "*flow.yaml" -print -quit'
```

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `find -iname "*flow.yaml"` | 2.558 ± 0.023 | 2.523 | 2.597 | 21.7 |
| `find -iname "*flow.yaml" \| head -n1` | 2.576 ± 0.043 | 2.542 | 2.684 | 21.9 |
| `find -iname "*flow.yaml" -print -quit` | 0.118 ± 0.002 | 0.114 | 0.122 | 1.0 |

Notice how the variant with | head -n 1 takes just as long. Apparently, find keeps on running even though the pipe is broken (head closes its STDIN as soon as it has read the requested number of lines).

With fd, the results look slightly different (note that these are milliseconds, not seconds like above):

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `fd --max-buffer-time=0 flow.yaml` | 256.8 ± 2.8 | 253.9 | 263.0 | 1.3 |
| `fd --max-buffer-time=0 flow.yaml \| head -n 1` | 191.2 ± 3.5 | 184.4 | 196.6 | 1.0 |

The variant with head -n 1 is slightly faster. However, when I run fd interactively, I can clearly see that it outputs the first result very quickly and only quits when it is about to print the second result(!). The reason is that this is the first time fd notices that its STDOUT pipe (= head's STDIN) has been closed.

We can demonstrate a similar behavior by running:

```shell
(echo first; sleep 1; echo second; sleep 100; echo third) | head -n 1
```

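This command runs for one second instead of quitting immediately: head exits right after reading "first", but the subshell only notices the closed pipe (and is terminated by SIGPIPE) when echo second tries to write.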
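The same effect can be reproduced from Rust. The sketch below assumes a Unix-like system with head on the PATH, and `first_failed_write` is a hypothetical helper. It writes one line at a time into head -n 1. The first write succeeds and head exits after reading it, but the writer only learns this when its next write fails with `BrokenPipe`. Because Rust ignores SIGPIPE by default, the failure arrives as an `io::Error` instead of killing the process.

```rust
use std::io::{ErrorKind, Write};
use std::process::{Command, Stdio};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: feeds `lines` to `head -n 1`, sleeping `pause`
/// after each write, and returns the index of the first write that failed
/// because head had already exited (None if every write succeeded).
fn first_failed_write(lines: &[&str], pause: Duration) -> Option<usize> {
    let mut child = Command::new("head")
        .args(["-n", "1"])
        .stdin(Stdio::piped())
        .stdout(Stdio::null())
        .spawn()
        .expect("failed to spawn head");
    let mut stdin = child.stdin.take().expect("stdin is piped");

    let mut failed_at = None;
    for (i, line) in lines.iter().enumerate() {
        // Rust ignores SIGPIPE by default, so writing to a pipe without a
        // reader returns an error instead of terminating the process.
        match writeln!(stdin, "{}", line) {
            Ok(()) => {}
            Err(e) if e.kind() == ErrorKind::BrokenPipe => {
                failed_at = Some(i);
                break;
            }
            Err(e) => panic!("unexpected write error: {}", e),
        }
        // Nothing tells us that head is gone until we write again.
        thread::sleep(pause);
    }
    drop(stdin);
    child.wait().expect("failed to wait for head");
    failed_at
}
```

Called with `["first", "second", "third"]` and a pause of a few hundred milliseconds, this should return `Some(1)`: the failure surfaces on the second line, just like echo second above.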

To make sure that this is the actual problem with fd as well, I quickly changed the print_entry_uncolorized function to print an additional newline:

```diff
--- a/src/output.rs
+++ b/src/output.rs
@@ -90,5 +90,6 @@ fn print_entry_uncolorized(
     let separator = if config.null_separator { "\0" } else { "\n" };
 
     let path_str = path.to_string_lossy();
-    write!(stdout, "{}{}", path_str, separator)
+    write!(stdout, "{}{}", path_str, separator)?;
+    writeln!(stdout)
 }
```

With this small modification, fd is suddenly blazing fast: about 10 times faster than find -print -quit (11.3 ms vs. 118 ms) instead of about 1.6 times slower (191.2 ms vs. 118 ms).

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `fd --max-buffer-time=0 flow.yaml \| head -n1` | 11.3 ± 1.0 | 8.7 | 15.0 | 1.0 |

Now, this is obviously not something we want to implement in this way. If anybody has any good suggestions on how to "fix" this, please let us know. One potential way could be to test (however that works) if STDOUT has been closed after printing each result. However, it should be checked if this has any performance impact when not piping to head.

If there is no great solution, we should actually think about implementing a --max-results <count> option (see also #476).

tavianator · 2019-09-13T20:45:11Z

> One potential way could be to test (however that works) if STDOUT has been closed after printing each result.

I believe you can attempt to write 0 bytes to stdout, and you'll get EPIPE back if the pipe is closed (and you're ignoring SIGPIPE like Rust does by default). It's probably not a good idea to do two write syscalls every time you print something, though. And I think it's still racy: if head hasn't finished reading the first line by the time you do the second write, the write won't fail.

So maybe have a timer such that if the main thread hasn't received any files to print in a while, it writes 0 bytes to stdout and exits if that fails. Alternatively, don't bother, since no other tool seems to.
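A minimal sketch of this timer idea, assuming a printer thread that receives paths over an `mpsc` channel. The names `print_results` and `output_closed` are hypothetical, and this is not fd's code. The closed-pipe check only runs after `idle` has passed without a new result, so the common case pays nothing extra. The check itself is passed in as a closure, because how to perform it reliably is exactly the open question here.

```rust
use std::io::Write;
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical printer loop: prints every path received from the walker
/// threads. When no result has arrived for `idle`, it asks `output_closed`
/// whether the consumer has gone away and stops early if so.
/// Returns the number of paths printed.
fn print_results<W: Write, F: Fn() -> bool>(
    rx: &Receiver<String>,
    out: &mut W,
    idle: Duration,
    output_closed: F,
) -> std::io::Result<usize> {
    let mut printed = 0;
    loop {
        match rx.recv_timeout(idle) {
            Ok(path) => {
                writeln!(out, "{}", path)?;
                printed += 1;
            }
            // Idle for a while: only now pay for the extra check.
            Err(RecvTimeoutError::Timeout) => {
                if output_closed() {
                    return Ok(printed);
                }
            }
            // All senders dropped: the traversal has finished.
            Err(RecvTimeoutError::Disconnected) => return Ok(printed),
        }
    }
}
```

The trade-off is latency: after the last useful result, the process lingers for up to `idle` before it checks and exits, but it no longer has to wait for a next match that may never come.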

tavianator · 2019-09-13T20:55:32Z

Correction: despite what StackOverflow said, writing 0 bytes to a closed pipe does not trigger EPIPE. I'm not sure there's a non-destructive way to find out if the other end of a pipe is closed.

tavianator · 2019-09-16T14:58:39Z

There is a way, at least on Linux: https://stackoverflow.com/a/57959507/502399

On Windows, apparently the write-zero-bytes thing works.
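One Linux-specific way to do such a non-destructive check is sketched below. It is an assumption, not a summary of the linked answer: Linux's poll(2) reports POLLERR on the write end of a pipe once no process holds the read end, and polling with a zero timeout writes nothing. The binding to `poll` is hand-written to avoid external crates (the `libc` crate provides the same declaration), and `pipe_reader_gone` is a hypothetical helper.

```rust
use std::os::raw::c_ulong;

// Hand-written binding to poll(2) so the sketch needs no external crates.
// Layout matches `struct pollfd` on Linux; POLLERR is 0x008 there.
#[repr(C)]
struct PollFd {
    fd: i32,
    events: i16,
    revents: i16,
}

const POLLERR: i16 = 0x008;

extern "C" {
    // nfds_t is `unsigned long` on Linux.
    fn poll(fds: *mut PollFd, nfds: c_ulong, timeout: i32) -> i32;
}

/// Hypothetical, Linux-only: returns true if `fd` is the write end of a pipe
/// whose read end has been closed by every reader. No data is written.
fn pipe_reader_gone(fd: i32) -> bool {
    // events = 0: POLLERR is always reported and never needs requesting.
    let mut pfd = PollFd { fd, events: 0, revents: 0 };
    // A timeout of 0 returns immediately with the current state.
    let ready = unsafe { poll(&mut pfd, 1, 0) };
    ready > 0 && (pfd.revents & POLLERR) != 0
}
```

Polling the stdin handle of a spawned head -n 1 should give `false` while head is still waiting for input and `true` once it has read a line and exited. Other Unix systems may not report POLLERR for pipes, so this is not portable.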

This new option (--max-results=<count>) can be used instead of piping to `head -n <count>` for improved performance:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `fd --max-buffer-time=0 flow.yaml` | 153.9 ± 2.5 | 151.3 | 170.3 | 4.21 ± 5.86 |
| `fd --max-buffer-time=0 flow.yaml \| head -n 1` | 145.3 ± 17.4 | 111.0 | 180.2 | 3.98 ± 5.55 |
| `fd --max-results=1 flow.yaml` | 36.5 ± 50.8 | 7.2 | 145.7 | 1.00 |

Note: there is a large standard deviation on the last result due to the non-deterministic file system traversal. With `--max-results`, we don't have to traverse the whole filesystem tree, so it's all about luck.

closes #472
closes #476

sharkdp · 2020-04-02T17:01:02Z

@tavianator Thank you very much for your analysis. I opted to implement --max-results=<count> because that seemed like a much cleaner way of solving this use case.

Please see #555 for benchmark results.
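A rough sketch of why a result limit sidesteps the broken-pipe problem. It assumes an architecture where walker threads send matches to a single consumer over a channel; `first_results` is hypothetical, each inner `Vec` stands in for one walker's matches, and this is not fd's actual code. The consumer stops on its own after `max_results` items, so nothing depends on detecting a closed pipe. Dropping the receiver makes further `send` calls fail, which the walkers treat as a signal to stop.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical sketch of a `--max-results`-style limit: keeps the first
/// `max_results` matches sent by the walker threads, then drops the
/// receiver so that each walker's next `send` fails and it stops early.
fn first_results(batches: Vec<Vec<String>>, max_results: usize) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let mut walkers = Vec::new();
    for batch in batches {
        let tx = tx.clone();
        walkers.push(thread::spawn(move || {
            for found in batch {
                if tx.send(found).is_err() {
                    break; // receiver gone: the limit was reached
                }
            }
        }));
    }
    // Drop our own sender so the channel closes once all walkers finish.
    drop(tx);

    // take() stops reading after `max_results` items instead of draining
    // the channel until every walker is done.
    let results: Vec<String> = rx.iter().take(max_results).collect();
    drop(rx);

    for walker in walkers {
        walker.join().expect("walker thread panicked");
    }
    results
}
```

With more than one walker, which matches arrive first depends on thread scheduling. This is consistent with the benchmark note about non-deterministic traversal: how quickly the limit is reached is partly luck.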


sharkdp · 2020-04-16T08:46:46Z

This has now been released in fd v8.0. We also have -1 as an alias for --max-results=1.

sharkdp added the help wanted, performance, and question labels Sep 13, 2019

sharkdp mentioned this issue Sep 13, 2019

Feature request: limit the number of find result #476

Closed

sharkdp mentioned this issue Oct 30, 2019

add an option to omit any output and exit with an status code #504

Closed

sharkdp mentioned this issue Apr 2, 2020

Add --max-results=<count> option #555

Merged

sharkdp closed this as completed in #555 Apr 2, 2020

sharkdp added this to the v8.0 milestone Apr 8, 2020

sharkdp mentioned this issue Oct 27, 2021

--max-results=1 does not actually quit after the first result #867

Closed

tavianator mentioned this issue Jan 18, 2024

[BUG] Full-path search and globbing leads to fd not exit on pipeline closing #1479

Open


tavianator mentioned this issue Mar 17, 2024

Support max-results tavianator/bfs#133

Closed
