improve stdin detection on Windows #643

Closed
stej opened this issue Oct 19, 2017 · 19 comments · Fixed by #1017
Labels
bug A bug. help wanted Others are encouraged to work on this issue.

Comments

@stej

stej commented Oct 19, 2017

Using version 0.6.0

I tried to parse some text and call rg for each parsed term. Something like this (newlines added for readability):

rg -F ".EmailHtml.HandleAttribute" -g 2017-10-18* -B 3 |
rg ".*? machine (\d+) Back.*" -r "$1" |
powershell -command "$input | % { rg -g 2017-10-18* \"machine $_ Action \d+\" }"

However, this doesn't work. I was able to reduce it to a much simpler example:

echo a|powershell -command "d:\prgs\rg.exe -g 2017-10-18* test " doesn't output anything
powershell -command "d:\prgs\rg.exe -g 2017-10-18* test " - searches for "test" as expected (note that only the echo a| is missing there)

Does anybody here have any idea why rg doesn't work when it is called from PowerShell and something is piped into PowerShell?

@BurntSushi
Owner

I don't know PowerShell, but from my point of view, your command doesn't really make sense. It seems like you're trying to pipe text into ripgrep to search, but you're still using the -g flag to filter files to search. But if you're searching stdin, it doesn't make sense to filter files.

If ripgrep is having a problem detecting that it should search stdin, you might consider trying rg test -, where the - forces ripgrep to search stdin.

Otherwise, you might need to wait for someone who understands PowerShell to respond to this. Sorry.

@BurntSushi BurntSushi added the question An issue that is lacking clarity on one or more points. label Oct 19, 2017
@stej
Author

stej commented Oct 19, 2017

You are right, the first sample doesn't make sense. I didn't notice because I quickly got sidetracked by the "pipe" problem.

This is really a question for anyone with deep insight into how this works in PowerShell. It's not a problem with ripgrep; I'm just asking here.

@BurntSushi
Owner

@stej Did you try using -?

@stej
Author

stej commented Oct 19, 2017

@BurntSushi just -? In what command?

@stej
Author

stej commented Oct 19, 2017

Btw. this is for comparison when I tried Process Monitor to check the command line itself

image

and now without the pipe. Output in the console suggests that rg really started searching

image

The command line is the same in both cases. PowerShell doesn't change the way rg is called.

@BurntSushi
Owner

@stej Instead of echo foo | rg test, try echo foo | rg test -. Please also try running ripgrep with the --debug flag.

I otherwise don't know how to read your screenshots. Can you interpret them for me?

@stej
Author

stej commented Oct 20, 2017

After I reread your comments, I realized that I had simplified the examples too much. It's probably necessary to invoke rg in an inner pipeline.
So I rewrote them and added the --debug switch.

Environment:
Directory c:\temp with 3 log files. One of them contains the wanted string "1778071"

Commands and results:

rg 1778071 --debug - found the lines

C:\temp>rg 1778071 --debug
DEBUG:grep::search: regex ast:
Literal {
    chars: [
        '1',
        '7',
        '7',
        '8',
        '0',
        '7',
        '1'
    ],
    casei: false
}
DEBUG:grep::literals: literal prefixes detected: Literals { lits: [Complete(1778071)], limit_size: 250, limit_class: 10 }
2017-10-18-09.txt
61107:2017-10-18 09:36:50.353 DEBUG VM1 1778071 Action 1064692 set Running.
61108:2017-10-18 09:36:50.769 ERROR VM1 1778071 System.ArgumentException: Illegal characters in path.

powershell "rg 1778071 --debug" - found the line

C:\temp>powershell "rg 1778071 --debug"
DEBUG:grep::search: regex ast:
Literal {
    chars: [
        '1',
        '7',
        '7',
        '8',
        '0',
        '7',
        '1'
    ],
    casei: false
}
DEBUG:grep::literals: literal prefixes detected: Literals { lits: [Complete(1778071)], limit_size: 250, limit_class: 10 }
2017-10-18-09.txt
61107:2017-10-18 09:36:50.353 DEBUG VM1 1778071 Action 1064692 set Running.
61108:2017-10-18 09:36:50.769 ERROR VM1 1778071 System.ArgumentException: Illegal characters in path.

echo 1778071| powershell "$input | % { rg $_ --debug }" - didn't find the line

C:\temp>echo 1778071| powershell "$input | % { rg $_ --debug }"
DEBUG:grep::search: regex ast:
Literal {
    chars: [
        '1',
        '7',
        '7',
        '8',
        '0',
        '7',
        '1'
    ],
    casei: false
}
DEBUG:grep::literals: literal prefixes detected: Literals { lits: [Complete(1778071)], limit_size: 250, limit_class: 10 }

Note that the string really does go into the pipeline and is a standard string, as expected.

C:\temp>echo 1778071| powershell "$input | % { write-host $_ $_.gettype() }"
1778071 System.String                                                <= written by the write-host

I tried adding - at the end, like this: echo 1778071| powershell "$input | % { rg $_ -}", but the result was still incorrect. By the way, I couldn't find the meaning of that character in the command-line help. Does it mean something like "this is the end of the command"?

@stej
Author

stej commented Oct 20, 2017

Ah, this is pretty interesting. It's probably the pipe in front of the powershell command.

C:\temp>echo 1778071| powershell "rg 17 --debug"
DEBUG:grep::search: regex ast:
Literal {
    chars: [
        '1',
        '7'
    ],
    casei: false
}
DEBUG:grep::literals: literal prefixes detected: Literals { lits: [Complete(17)], limit_size: 250, limit_class: 10 }
1778071
C:\temp>rg 17 --debug
DEBUG:grep::search: regex ast:
Literal {
    chars: [
        '1',
        '7'
    ],
    casei: false
}
DEBUG:grep::literals: literal prefixes detected: Literals { lits: [Complete(17)], limit_size: 250, limit_class: 10 }
....... a lot of found lines from the log file here

@BurntSushi
Owner

BurntSushi commented Oct 20, 2017

I'm still finding it very hard to interpret your comments. I can't tell which commands are running as expected for you and which you think are wrong. For example, echo 1778071| powershell "rg 17 --debug" seems to output what I would expect (1778071), but the rest of your comment suggests you think that's wrong? In your future comments, could you please be very explicit about what the expected output and the actual output are of each command?

To answer your question about -, it is a standard convention in UNIX command line tools to interpret - as "read input from stdin." For example, consider the following commands:

$ rg foo
$ rg foo C:\temp\some-file
$ rg foo -

The first command recursively searches the entire current directory for the pattern foo. The second command searches only C:\temp\some-file for foo. The third command searches stdin for foo, but since there is nothing feeding input into rg's stdin, it should hang forever.

Now observe these commands:

$ echo barfoobaz | rg foo
$ echo barfoobaz | rg foo -

Ideally, they are equivalent, in that they both search the stdin content barfoobaz for the pattern foo. But notice that the first command, rg foo, is the same as the first command above. Above, it recursively searched the current directory. In this context, it searches stdin. How does ripgrep know to do that? Because it's supposed to be able to check whether something is on stdin or not, and it will modify its behavior appropriately.

This behavior works reliably on UNIX, but there have been issues on Windows. This is why I've suggested that you use -. However, your bug report is still so unclear to me that this is a shot in the dark; I don't actually know that this is the problem you're experiencing.
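The convention described above can be sketched in a few lines of Rust. This is only an illustration, not ripgrep's actual code: the `Target` enum and the `targets` and `targets_for_process` functions are hypothetical names, and the only real API used is the standard library's `std::io::IsTerminal`. An explicit `-` always means stdin, an explicit path always means that path, and only when no path is given does the tool guess from the state of stdin.

```rust
use std::io::IsTerminal;
use std::path::PathBuf;

/// What a single search target refers to (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Target {
    Stdin,
    Path(PathBuf),
}

/// Interpret explicit path arguments, treating `-` as "read from stdin".
/// With no explicit paths, guess: a terminal on stdin means "search the
/// current directory", anything else means "search stdin".
fn targets(args: &[&str], stdin_is_terminal: bool) -> Vec<Target> {
    if args.is_empty() {
        return if stdin_is_terminal {
            vec![Target::Path(PathBuf::from("./"))]
        } else {
            vec![Target::Stdin]
        };
    }
    args.iter()
        .map(|a| {
            if *a == "-" {
                Target::Stdin
            } else {
                Target::Path(PathBuf::from(*a))
            }
        })
        .collect()
}

/// Convenience wrapper that probes the real stdin of the current process.
#[allow(dead_code)]
fn targets_for_process(args: &[&str]) -> Vec<Target> {
    targets(args, std::io::stdin().is_terminal())
}
```

Under this sketch, passing `./` or `-` explicitly removes the guess entirely, which is why both are useful when the automatic detection picks the wrong source.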

@stej
Author

stej commented Oct 20, 2017

I'm sorry for lack of detail. I'll try to describe the previous commands in more detail.

The command rg 17 --debug is pretty clear. We can agree that it starts searching in current directory and tries to find any line that matches 17.

On the other hand, the command echo 1778071| powershell "rg 17 --debug" is more complex.

At the beginning it's good to note that usually people use PowerShell in interactive mode.
But it's possible to run just some commands in PowerShell and then exit it and continue in previous shell e.g. in cmd.exe.
So PowerShell "somecommand" runs PowerShell and the interpreter in PowerShell runs the command.

Reading pipeline input in PowerShell:
Example:

c:\>powershell "1;2;3" | powershell "[string]$input"
1 2 3

The first command outputs 3 numbers to stdout. The second command captures all of the stdin input, converts it to a string, and sends it to output.

I've shown how to consume pipeline input in PowerShell -- it's via the $input variable (or at least I don't know any other way).

A more complex command:

powershell "1..10 | % { start-sleep -sec 1; $_ }" | powershell "$input | % { write-host $_ -foreground Green }"

prints a number every second in green. It is artificial, of course, because it could all be done in one command.

So the first command, powershell "1..10 | % { start-sleep -sec 1; $_ }", sends one number to output every second. The % is an alias for the ForEach-Object command. The { ... } is a lambda (script block) that is run for each item in the pipeline.

And the second command reads the numbers from input (via $input variable) and writes to stdout.


Now back to the example with echo and rg

echo 1778071| powershell "rg 17 --debug"

Echo sends something to pipe. It is sent to PowerShell, which should execute command rg 17. So in other words it ignores the pipeline input and simply executes rg 17 which means "look for 17 in local directory".
Anyway, it seems that rg somehow interprets the command line and reads from pipe? Because command echo 123|powershell "rg 2" really finds 123.
Which is surprising to me.

@BurntSushi
Owner

Anyway, it seems that rg somehow interprets the command line and reads from pipe? Because command echo 123|powershell "rg 2" really finds 123. Which is surprising to me.

Why is that surprising? That sounds correct to me. Why are you piping 123 into rg 2 at all if you don't want rg to search 123? It might help to describe the actual problem you're trying to solve.

In any case, if you want to force ripgrep to search the current directory, then just tell it to do that. echo ... | rg 2 ./.

@dieggsy

dieggsy commented Oct 20, 2017

@stej can you give a simple example of the command you want to run, and what you want it to output? It's kind of hard to understand what you're trying to do and where you think the problem is.

for example,
command:
echo 'a' | rg 'foo'
expected output:
[whatever you want]
actual output:
[whatever you're getting]

EDIT: guess you've given a couple, but the intent or point is still unclear I guess.

@stej
Author

stej commented Oct 23, 2017

Example: We have some log files that contain a bunch of errors. Some files have almost no errors, some have a large number (>= 5). We need to find all logged-in users in the files with a large number of errors.

Example file where I don't want to search in:

INFO [email protected] logged in
ERROR some processing failed

Example file that I need to inspect for all logged in users:

INFO [email protected] logged in
INFO [email protected] logged in
INFO [email protected] logged in
ERROR some processing failed
ERROR some processing failed
ERROR some processing failed
ERROR some processing failed
ERROR some processing failed

The command I would use (split for readability):

rg error -i -c |                      << parse just counts
powershell "                          
  foreach($fc in $input) {            << iterate over items in pipeline, contains large.txt:5, normal.txt:1
    $file,[int]$count=$fc -split ':'  << interpret the count number ($fc contains e.g. 'large.txt:11')
    if ($count -ge 5) {               << decide whether running rg is needed
      rg logged -g $file              << run rg on the concrete file (e.g. 'large.txt')
    }
  }
"

On one line:

rg error -i -c | powershell "foreach($fc in $input) { $file,[int]$count=$fc -split ':'; if ($count -ge 10) { rg logged -g $file } }"

Result:
this command doesn't find anything.

Workaround:
Use .\ as @BurntSushi suggested. Then it works and searches in the given file. The final command that works is

rg error -i -c | powershell "foreach($fc in $input) { $file,[int]$count=$fc -split ':'; if ($count -ge 2) { rg logged -g $file .\ } }"

and the command outputs

large.txt
1:INFO [email protected] logged in
2:INFO [email protected] logged in
3:INFO [email protected] logged in
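For comparison, the "parse the counts, then filter by threshold" step of the pipeline above can be sketched in Rust. This is an illustrative sketch only: `parse_count_line` and `files_over_threshold` are hypothetical helper names, and the input is assumed to be lines of the form path:count as shown in the comments above (e.g. large.txt:5). It splits on the last colon rather than the first, so that a path containing a drive letter such as C:\temp\large.txt is not cut in the wrong place.

```rust
/// Parse one line of count output ("path:count") into its parts.
/// Splitting on the LAST ':' keeps Windows drive letters inside the path.
fn parse_count_line(line: &str) -> Option<(&str, usize)> {
    let (path, count) = line.trim_end().rsplit_once(':')?;
    let count = count.parse::<usize>().ok()?;
    Some((path, count))
}

/// Keep only the paths whose match count is at least `threshold`.
/// Lines that do not look like "path:count" are silently skipped.
fn files_over_threshold<'a>(lines: &[&'a str], threshold: usize) -> Vec<&'a str> {
    lines
        .iter()
        .copied()
        .filter_map(|l| parse_count_line(l))
        .filter(|(_, n)| *n >= threshold)
        .map(|(path, _)| path)
        .collect()
}
```

Each surviving path could then be handed to a second search as an explicit path argument, which avoids relying on stdin detection at all.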

@BurntSushi
Owner

BurntSushi commented Oct 23, 2017

@stej Thanks, that's much better. I agree with you that this is a bug. However, I don't know how to fix it. Someone with more Windows knowledge needs to either teach me or submit a PR themselves. I can outline the problem though.

ripgrep uses a fair bit of logic to determine what things it should search. This method collects all of the file paths (including -, which is interpreted as stdin):

ripgrep/src/args.rs

Lines 367 to 386 in 1aec4b1

/// Return all file paths that ripgrep should search.
fn paths(&self) -> Vec<PathBuf> {
    let mut paths: Vec<PathBuf> = match self.values_of_os("path") {
        None => vec![],
        Some(vals) => vals.map(|p| Path::new(p).to_path_buf()).collect(),
    };
    // If --file, --files or --regexp is given, then the first path is
    // always in `pattern`.
    if self.is_present("file")
        || self.is_present("files")
        || self.is_present("regexp") {
        if let Some(path) = self.value_of_os("PATTERN") {
            paths.insert(0, Path::new(path).to_path_buf());
        }
    }
    if paths.is_empty() {
        paths.push(self.default_path());
    }
    paths
}

Since no explicit file paths are given, ripgrep falls back to grabbing a collection of "default" paths. Mostly, this is a matter of determining whether to search the current working directory, or stdin:

ripgrep/src/args.rs

Lines 388 to 404 in 1aec4b1

/// Return the default path that ripgrep should search.
fn default_path(&self) -> PathBuf {
    let file_is_stdin =
        self.values_of_os("file").map_or(false, |mut files| {
            files.any(|f| f == "-")
        });
    let search_cwd = atty::is(atty::Stream::Stdin)
        || !stdin_is_readable()
        || (self.is_present("file") && file_is_stdin)
        || self.is_present("files")
        || self.is_present("type-list");
    if search_cwd {
        Path::new("./").to_path_buf()
    } else {
        Path::new("-").to_path_buf()
    }
}

This logic is necessary to differentiate commands like echo foo | rg wat (where ripgrep should be searching stdin) and rg wat (where ripgrep should be searching the current working directory).

Since our set of file paths is empty, we can distill this logic down to two checks. Is there a tty on stdin? Is stdin readable? In your case, my guess is that ripgrep correctly determines that there is no tty on stdin (if there was, it would then search the CWD). Which means it falls down to stdin_is_readable, which on Unix is implemented like so:

ripgrep/src/args.rs

Lines 993 to 1004 in 1aec4b1

/// Returns true if and only if stdin is deemed searchable.
#[cfg(unix)]
fn stdin_is_readable() -> bool {
    use std::os::unix::fs::FileTypeExt;
    use same_file::Handle;

    let ft = match Handle::stdin().and_then(|h| h.as_file().metadata()) {
        Err(_) => return false,
        Ok(md) => md.file_type(),
    };
    ft.is_file() || ft.is_fifo()
}

But on Windows is implemented like this:

ripgrep/src/args.rs

Lines 1006 to 1012 in 1aec4b1

/// Returns true if and only if stdin is deemed searchable.
#[cfg(windows)]
fn stdin_is_readable() -> bool {
    // On Windows, it's not clear what the possibilities are to me, so just
    // always return true.
    true
}

We default to true here to basically ignore this property on Windows because I don't know how to do it. If we defaulted to false, then ripgrep would always search the CWD, even in cases like echo foo | rg wat, which is undesirable. That means the only way to fix this bug is to improve detection on Windows.
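The two checks described above can be modeled as a small pure function, which makes the consequence of hard-coding true easier to see. This is a simplified sketch, not the code in args.rs: `StdinState`, `DefaultPath`, and this standalone `default_path` are hypothetical names, and the --file, --files and --type-list conditions are left out.

```rust
/// The two facts about stdin that the default-path decision depends on
/// (hypothetical type for illustration).
#[derive(Clone, Copy)]
struct StdinState {
    /// Is stdin attached to a terminal/console?
    is_tty: bool,
    /// Does stdin look like something worth searching (file or pipe)?
    is_readable: bool,
}

/// Where to search when no explicit path was given.
#[derive(Debug, PartialEq)]
enum DefaultPath {
    Cwd,
    Stdin,
}

/// Search the current directory if stdin is a tty or is not readable;
/// otherwise search stdin.
fn default_path(stdin: StdinState) -> DefaultPath {
    if stdin.is_tty || !stdin.is_readable {
        DefaultPath::Cwd
    } else {
        DefaultPath::Stdin
    }
}
```

With `is_readable` fixed to true, the outcome depends on `is_tty` alone: any process whose stdin is not a console is treated as having searchable input on stdin, whether or not anything will ever arrive there. On Unix the second check can still veto that guess, for example when stdin is a character device such as /dev/null.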

@BurntSushi BurntSushi added bug A bug. help wanted Others are encouraged to work on this issue. and removed question An issue that is lacking clarity on one or more points. labels Oct 23, 2017
@parkovski

The Windows equivalent of that Unix code uses GetFileType, is_file becomes FILE_TYPE_DISK, is_fifo becomes FILE_TYPE_PIPE, and regular console stdin would be FILE_TYPE_CHAR.

DWORD is typedef'd as unsigned long, which is always 32 bits on Windows, so u32, and I haven't looked at same_file, but if it gives you the return value of either GetStdHandle(STD_INPUT_HANDLE) or CreateFile("CONIN$", ...) then just pass that to GetFileType.
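A rough sketch of that suggestion, under stated assumptions: it declares the two Win32 functions by hand instead of going through same_file or a bindings crate, the module name `win` and the helper `file_type_is_searchable` are hypothetical, and it has not been tested against ripgrep. The numeric constants are the documented GetFileType return values. The mapping is kept as a separate pure function so it can be checked on any platform, while the part that actually calls into kernel32 only compiles on Windows.

```rust
// Documented return values of the Win32 GetFileType function.
const FILE_TYPE_DISK: u32 = 0x0001; // regular file, like is_file() on Unix
#[allow(dead_code)]
const FILE_TYPE_CHAR: u32 = 0x0002; // character device, e.g. a console
const FILE_TYPE_PIPE: u32 = 0x0003; // pipe or socket, like is_fifo() on Unix

/// Mirror of the Unix rule `ft.is_file() || ft.is_fifo()`:
/// stdin is searchable only if it is a disk file or a pipe.
fn file_type_is_searchable(file_type: u32) -> bool {
    matches!(file_type, FILE_TYPE_DISK | FILE_TYPE_PIPE)
}

#[cfg(windows)]
#[allow(dead_code)]
mod win {
    use std::ffi::c_void;

    // STD_INPUT_HANDLE is defined as ((DWORD)-10).
    const STD_INPUT_HANDLE: u32 = -10i32 as u32;

    #[link(name = "kernel32")]
    extern "system" {
        fn GetStdHandle(std_handle: u32) -> *mut c_void;
        fn GetFileType(file: *mut c_void) -> u32;
    }

    /// Assumed Windows counterpart of the Unix `stdin_is_readable`.
    pub fn stdin_is_readable() -> bool {
        // An invalid handle makes GetFileType report FILE_TYPE_UNKNOWN (0),
        // which the mapping below treats as "not searchable".
        let file_type = unsafe { GetFileType(GetStdHandle(STD_INPUT_HANDLE)) };
        super::file_type_is_searchable(file_type)
    }
}
```

Keeping the FFI call separate from the decision also leaves room to change the policy later (for instance, how to treat FILE_TYPE_CHAR) without touching the unsafe code.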

@mqudsi

mqudsi commented Dec 23, 2017

Note that WSL's tty is notably not implemented as a console, so functions like WriteConsoleOutput would be broken. GetStdHandle should be fine.

@powercode

$input in PowerShell is an IEnumerable that is only available once all of stdin is read. PowerShell will then enumerate each item of the input and write it to its pipeline, as objects of type System.String.
The pipeline in PowerShell is not the same as stdin.

[string]$input is a type conversion in powershell, that instructs PowerShell to convert the content of the IEnumerable $input into a System.String, which powershell does by joining $input with the content of $ofs, which by default is <space>.

A somewhat idiomatic powershell solution would look something like this:

class LineMatch{
	[int] $Line
	[string] $Text

	[string] ToString() { return "{0}: {1}" -f $this.Line, $this.Text}
}

class FileMatch{
	[string] $Path
	[System.Collections.Generic.List[LineMatch]] $Lines = [System.Collections.Generic.List[LineMatch]]::new()
	[string] ToString() {return $this.Path}
 }

 function Get-RGMatch {
	 [OutputType([FileMatch])]
	 param(
		 [Parameter(Mandatory)]
		 [string] $Pattern
	 )
	$rgoutput = rg --heading --line-number $Pattern
	switch -regex($rgoutput){
		'^[^\d]\w' {
				$p = [FileMatch] @{
					Path=$_
				}
				continue
			}
		'^(\d+):(.*)' {
				$line = [LineMatch] @{
					Line=[int]$matches.1
					Text=$matches.2
				}
				$p.Lines.Add($line)
				continue
			}
		'^\s*$' {
				$p
				$p=$null
				continue
			}
		default {Write-Warning "$_"}
	}
	$p
}

Get-RGMatch 'class' | where Count -gt 5 | sort Path | format-table -AutoSize

It basically parses the ripgrep output and turns it into objects.
It then filters on the Count property of the FileMatch objects, dropping every object with 5 or fewer matching lines, and outputs the remaining ones sorted by Path:

Path                                                            Count Lines
----                                                            ----- -----
docs\cmdlet-example\command-line-simple-example.md                  7 {7: Because the binary module's assembly will be created as a .NET Standard 2.0 class library,, 46: 1. Use the `dotnet` CLI to create a starter `classlib` project based on .NET Sta...
src\libpsl-native\test\googletest\include\gtest\gtest.h           111 {68: // Depending on the platform, different string classes are available., 70: // class ::string, which has the same interface as ::std::string, but, 77: // If the user's ::std::s...
src\libpsl-native\test\googletest\include\gtest\gtest-message.h     6 {34: // This header file defines the Message class., 59: // The Message class works like an ostream repeater., 83: // class hides this difference by treating a NULL char pointer as...

@powercode

#848 is a feature request to make that parsing more straightforward and cheaper.

@BurntSushi
Owner

@powercode Could you please state more clearly what you're trying to say? It sounds like you're trying to help someone write a better PowerShell script, but I don't see how that fixes the bug here?

@BurntSushi BurntSushi changed the title from "Ripgrep used in PowerShell inside pipe" to "improve stdin detection on Windows" Apr 24, 2018
BurntSushi added a commit that referenced this issue Aug 19, 2018
This commit updates the CHANGELOG to reflect all the work done to make
libripgrep a reality.

* Closes #162 (libripgrep)
* Closes #176 (multiline search)
* Closes #188 (opt-in PCRE2 support)
* Closes #244 (JSON output)
* Closes #416 (Windows CRLF support)
* Closes #917 (trim prefix whitespace)
* Closes #993 (add --null-data flag)
* Closes #997 (--passthru works with --replace)

* Fixes #2 (memory maps and context handling work)
* Fixes #200 (ripgrep stops when pipe is closed)
* Fixes #389 (more intuitive `-w/--word-regexp`)
* Fixes #643 (detection of stdin on Windows is better)
* Fixes #441, Fixes #690, Fixes #980 (empty matching lines are weird)
* Fixes #764 (coalesce color escapes)
* Fixes #922 (memory maps failing is no big deal)
* Fixes #937 (color escapes no longer used for empty matches)
* Fixes #940 (--passthru does not impact exit status)
* Fixes #1013 (show runtime CPU features in --version output)
BurntSushi added a commit that referenced this issue Aug 20, 2018
6 participants