Hello, I often use ripgrep to search a few hundred million files that rarely change. Getting the list of files to search is a very expensive operation in my environment and the plocate tool lets me amortize that cost to overnight index generation.
The problem is that I want to run a command like the following:
rg foo $(locate srv pool ubuntu dsc)
This builds a command line that is far too long for my operating system or execution environment:
$ rg 'debian.*changelog' $(locate .dsc srv mirror ubuntu pool)
-bash: /home/sarnold/bin/rg: Argument list too long
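To get a rough sense of how far over the limit the expansion is, the byte size of the locate output can be compared with `getconf ARG_MAX`. This is only an estimate: the environment also counts against the limit, and recent Linux kernels also charge per-argument pointer overhead, so a list that passes this check can still fail. `argv_fits` is a hypothetical helper name.

```shell
# Rough check of whether a newline-separated path list read from stdin
# would fit on one command line. Each path costs its bytes plus a NUL
# terminator, approximated here by the newline that wc -c also counts.
# The environment and per-argument overhead are ignored, so treat the
# result as a lower bound on what exec actually needs.
argv_fits() {
    needed=$(wc -c | tr -d ' ')
    limit=$(getconf ARG_MAX)
    printf 'list bytes: %s, ARG_MAX: %s\n' "$needed" "$limit"
    # Exit status 0 only if the list is smaller than ARG_MAX.
    [ "$needed" -lt "$limit" ]
}
```

Usage: `locate .dsc srv mirror ubuntu pool | argv_fits`.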
This is a small number of files and the locate command executes very quickly:
$ time locate .dsc srv mirror ubuntu pool | wc -l
142375
real 0m0.578s
user 0m0.322s
sys 0m0.278s
I ran a very similar ripgrep command; it has been executing for ten minutes now and has reported only a hundred or so candidate files to search.
So, I'd like some way to avoid this costly part of my search directly in ripgrep. I want to give ripgrep a path to a file or a socket that will provide the filenames or directory names to search. (Accepting an fd with the filenames might also work and could save mandatory trips to the filesystem, but working with non-stdin file descriptors at the shell is a bit annoying; the only tool I know of that does this is gpg, for password input.) (Accepting a command to run to generate the list of filenames might also work, but it's an odd inversion of typical Unix control flow.)
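As a sketch of the interface described above (this is not an existing rg option), the hypothetical shell function `search_paths_from_file` below reads one path per line from a named file and uses `grep -F` as a stand-in for the real search. Because the list arrives through a file instead of argv, it never hits the argument limit. It also suggests that a single filename parameter might cover the fd and command cases: in bash, process substitution or `/dev/stdin` hands the program an ordinary path.

```shell
# Hypothetical stand-in for a "read paths from a file" option.
# $1: fixed string to look for; $2: file containing one path per line.
# Prints each listed regular file that contains the string. Paths are
# read one at a time, so the list never has to fit in a single argv.
search_paths_from_file() {
    pattern=$1
    list=$2
    while IFS= read -r path; do
        # Skip entries that are not regular files (e.g. stale index hits).
        [ -f "$path" ] || continue
        if grep -qF -- "$pattern" "$path"; then
            printf '%s\n' "$path"
        fi
    done < "$list"
}
```

Usage in bash: `search_paths_from_file changelog <(locate .dsc srv mirror ubuntu pool)`, or from any POSIX shell: `locate .dsc srv mirror ubuntu pool | search_paths_from_file changelog /dev/stdin`. Spawning one grep per file is of course slow; the point here is only the shape of the interface.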
My current workaround is to use xargs, which feels suboptimal:
I can't start searching until the entire list is generated, which leaves some of my disks idle longer than necessary
Repeated execution of rg with very large argvs is going to be less efficient than one execution of rg with a very short argv
At the end of each xargs-spawned rg run, there are going to be some moments when the process is searching exactly one file
Manual management of the list of files or directories to search is ever so slightly tedious
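To see how many separate rg processes the xargs workaround actually spawns (the second and third points above), one can substitute a stand-in command that just reports its argument count. This is a sketch; `count_xargs_batches` is a hypothetical name, and the batch size depends on the local xargs implementation and its default command-line buffer.

```shell
# Count how many separate processes xargs spawns for a NUL-separated path
# list on stdin. The stand-in `sh -c 'echo "$#"' sh` runs once per batch
# and prints that batch's argument count, so the number of output lines
# equals the number of batches, i.e. the number of separate rg runs.
count_xargs_batches() {
    xargs -0 sh -c 'echo "$#"' sh | wc -l | tr -d ' '
}
```

Usage: `locate -0 .dsc srv mirror ubuntu pool | count_xargs_batches`, using locate's `-0`/`--null` output. The corresponding real workaround is `locate -0 .dsc srv mirror ubuntu pool | xargs -0 rg 'debian.*changelog'`. GNU and BSD xargs also accept `-P N` to run several batches concurrently, at the cost of output from different batches possibly interleaving.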
I don't have good suggestions for a command line parameter name. I've thought of the following:
--paths-to-search <filename>
--files-to-search <filename>
--files-and-directories-to-search <filename>
--files-to-search-from-file <filename>
--files-and-directories-to-search-from-file <filename>
--files-to-search-from-fd <fd>
--files-to-search-from-command <command>
Thanks