hyperlink: rejigger how hyperlinks work
This essentially takes the work done in #2483 and does a bit of a
facelift. A brief summary:

* We reduce the hyperlink API we expose to just the format, a
  configuration and an environment.
* We move buffer management into a hyperlink-specific interpolator.
* We expand the documentation on --hyperlink-format.
* We rewrite the hyperlink format parser to be a simple state machine
  with support for escaping '{{' and '}}'.
* We remove the 'gethostname' dependency and instead require the
  caller to provide the hostname. (So grep-printer doesn't look it up
  itself, but the application does.) The same goes for the WSL prefix.
* Probably some other things.

Overall, the general structure of #2483 was kept. The biggest change is
probably requiring the caller to pass in things like a hostname instead
of having the crate do it. I did this for a couple of reasons:

1. I feel uncomfortable with code deep inside the printing logic
   reaching out into the environment to assume responsibility for
   retrieving the hostname. This feels more like an application-level
   responsibility. Arguably, path canonicalization falls into this same
   bucket, but it is more difficult to rip that out. (And we can do it
   in the future in a backwards compatible fashion I think.)
2. I wanted to permit end users to tell ripgrep about their system's
   hostname in their own way, e.g., by running a custom executable. I
   want this because I know at least for my own use cases, I sometimes
   log into systems using an SSH hostname that is distinct from the
   system's actual hostname (usually because the system is shared in
   some way or changing its hostname is not allowed/practical).

I think that's about it.

Closes #665, Closes #2483
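
To illustrate the "simple state machine with support for escaping '{{' and '}}'" mentioned in the summary, here is a minimal sketch of such a parser. It is not the code from this commit: the names `Part`, `State` and `parse_format` are hypothetical, and unlike the real parser it does not check that variable names are known or that `{path}` is present.

```rust
/// One piece of a parsed format string (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Part {
    /// Literal text copied to the output unchanged.
    Text(String),
    /// A `{name}` variable to be substituted later.
    Var(String),
}

/// Parser states: plain text, just saw `{`, just saw `}`, inside a variable.
#[derive(Clone, Copy)]
enum State {
    Text,
    OpenBrace,
    CloseBrace,
    Var,
}

fn parse_format(s: &str) -> Result<Vec<Part>, String> {
    let mut parts = Vec::new();
    let mut text = String::new();
    let mut name = String::new();
    let mut state = State::Text;
    for ch in s.chars() {
        state = match (state, ch) {
            (State::Text, '{') => State::OpenBrace,
            (State::Text, '}') => State::CloseBrace,
            (State::Text, c) => {
                text.push(c);
                State::Text
            }
            // `{{` is an escaped literal `{`.
            (State::OpenBrace, '{') => {
                text.push('{');
                State::Text
            }
            (State::OpenBrace, '}') => {
                return Err("empty variable name".to_string())
            }
            // Any other character starts a variable name, so flush the
            // literal text collected so far.
            (State::OpenBrace, c) => {
                if !text.is_empty() {
                    parts.push(Part::Text(std::mem::take(&mut text)));
                }
                name.push(c);
                State::Var
            }
            // `}}` is an escaped literal `}`.
            (State::CloseBrace, '}') => {
                text.push('}');
                State::Text
            }
            (State::CloseBrace, _) => {
                return Err("unescaped '}'".to_string())
            }
            (State::Var, '}') => {
                parts.push(Part::Var(std::mem::take(&mut name)));
                State::Text
            }
            (State::Var, '{') => {
                return Err("'{' inside variable name".to_string())
            }
            (State::Var, c) => {
                name.push(c);
                State::Var
            }
        };
    }
    // The input must end outside of any brace construct.
    match state {
        State::Text => {}
        State::OpenBrace | State::Var => {
            return Err("unclosed variable".to_string())
        }
        State::CloseBrace => return Err("unescaped '}'".to_string()),
    }
    if !text.is_empty() {
        parts.push(Part::Text(text));
    }
    Ok(parts)
}
```

With this sketch, `parse_format("file://{host}{path}")` yields a literal `file://` followed by the variables `host` and `path`, while `parse_format("{{path}}")` yields only the literal text `{path}`.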
BurntSushi committed Sep 25, 2023
1 parent 23e2113 commit f608d4d
Showing 12 changed files with 1,267 additions and 764 deletions.
69 changes: 1 addition & 68 deletions Cargo.lock

1 change: 1 addition & 0 deletions complete/_rg
@@ -305,6 +305,7 @@ _rg() {
'--debug[show debug messages]'
'--field-context-separator[set string to delimit fields in context lines]'
'--field-match-separator[set string to delimit fields in matching lines]'
'--hostname-bin=[executable for getting system hostname]:hostname executable:_command_names -e'
'--hyperlink-format=[specify pattern for hyperlinks]:pattern'
'--trace[show more verbose debug messages]'
'--dfa-size-limit=[specify upper size limit of generated DFA]:DFA size (bytes)'
89 changes: 82 additions & 7 deletions crates/core/app.rs
@@ -580,6 +580,7 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
flag_glob_case_insensitive(&mut args);
flag_heading(&mut args);
flag_hidden(&mut args);
flag_hostname_bin(&mut args);
flag_hyperlink_format(&mut args);
flag_iglob(&mut args);
flag_ignore_case(&mut args);
@@ -1495,19 +1496,93 @@ This flag can be disabled with --no-hidden.
args.push(arg);
}

fn flag_hostname_bin(args: &mut Vec<RGArg>) {
const SHORT: &str = "Run a program to get this system's hostname.";
const LONG: &str = long!(
"\
This flag controls how ripgrep determines this system's hostname. The flag's
value should correspond to an executable (either a path or something that can
be found via your system's *PATH* environment variable). When set, ripgrep will
run this executable, with no arguments, and treat its output (with leading and
trailing whitespace stripped) as your system's hostname.
When not set (the default, or the empty string), ripgrep will try to
automatically detect your system's hostname. On Unix, this corresponds
to calling *gethostname*. On Windows, this corresponds to calling
*GetComputerNameExW* to fetch the system's \"physical DNS hostname.\"
ripgrep uses your system's hostname for producing hyperlinks.
"
);
let arg =
RGArg::flag("hostname-bin", "COMMAND").help(SHORT).long_help(LONG);
args.push(arg);
}

fn flag_hyperlink_format(args: &mut Vec<RGArg>) {
const SHORT: &str = "Set the format of hyperlinks to match results.";
const LONG: &str = long!(
"\
-Set the format of hyperlinks to match results. This defines a pattern which
-can contain the following placeholders: {file}, {line}, {column}, and {host}.
-An empty pattern or 'none' disables hyperlinks.
Set the format of hyperlinks to match results. Hyperlinks make certain elements
of ripgrep's output, such as file paths, clickable. This generally only works
in terminal emulators that support OSC-8 hyperlinks. For example, the format
*file://{host}{path}* will emit an RFC 8089 hyperlink.
The following variables are available in the format string:
*{path}*: Required. This is replaced with a path to a matching file. The
path is guaranteed to be absolute and percent encoded such that it is valid to
put into a URI. Note that a path is guaranteed to start with a */*.
*{host}*: Optional. This is replaced with your system's hostname. On Unix,
this corresponds to calling *gethostname*. On Windows, this corresponds to
calling *GetComputerNameExW* to fetch the system's \"physical DNS hostname.\"
Alternatively, if --hostname-bin was provided, then the hostname returned from
the output of that program will be used. If no hostname could be found,
then this variable is replaced with the empty string.
*{line}*: Optional. If appropriate, this is replaced with the line number of
a match. If no line number is available (for example, if --no-line-number was
given), then it is automatically replaced with the value *1*.
*{column}*: Optional, but requires the presence of **{line}**. If appropriate,
this is replaced with the column number of a match. If no column number is
available (for example, if --no-column was given), then it is automatically
replaced with the value *1*.
*{wslprefix}*: Optional. This is a special value that is set to
*wsl$/WSL_DISTRO_NAME*, where *WSL_DISTRO_NAME* corresponds to the value of
the equivalent environment variable. If the system is not Unix or if the
*WSL_DISTRO_NAME* environment variable is not set, then this is replaced with
the empty string.
Alternatively, a format string may correspond to one of the following
aliases: default, file, grep+, kitty, macvim, none, subl, textmate, vscode,
vscode-insiders, vscodium.
A format string may be empty. An empty format string is equivalent to the
*none* alias. In this case, hyperlinks will be disabled.
At present, the default format when ripgrep detects a tty on stdout on all systems
is *default*. This is an alias that expands to *file://{host}{path}* on Unix
and *file://{path}* on Windows. When stdout is not a tty, then the default
format behaves as if it were *none*. That is, hyperlinks are disabled.
Note that hyperlinks are only written when colors are enabled. To write
hyperlinks without colors, you'll need to configure ripgrep to not colorize
anything without actually disabling all ANSI escape codes completely:
--colors 'path:none' --colors 'line:none' --colors 'column:none' --colors 'match:none'
-The {file} placeholder is required, and will be replaced with the absolute
-file path with a few adjustments: The leading '/' on Unix is removed,
-and '\\' is replaced with '/' on Windows.
ripgrep works this way because it treats the *--color=(never|always|auto)* flag
as a proxy for whether ANSI escape codes should be used at all. This means
that environment variables like *NO_COLOR=1* and *TERM=dumb* not only disable
colors, but hyperlinks as well. Similarly, colors and hyperlinks are disabled
when ripgrep is not writing to a tty. (Unless one forces the issue by setting
*--color=always*.)
-As an example, the default pattern on Unix systems is: 'file://{host}/{file}'
For more information on hyperlinks in terminal emulators, see:
https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda
"
);
let arg =
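
The variable rules described in the --hyperlink-format help text above can be sketched as follows. This is a simplified illustration, not the interpolator added by this commit: `Values`, `interpolate` and `osc8` are hypothetical names, the substitution is a naive chain of string replacements that ignores the '{{' and '}}' escapes, and the path is assumed to be already absolute and percent encoded. The escape sequence in `osc8` follows the OSC-8 convention described in the gist linked from the help text.

```rust
/// Values available to a format string (hypothetical type for this sketch).
struct Values<'a> {
    /// The system's hostname, if one could be found.
    host: Option<&'a str>,
    /// For example `wsl$/Ubuntu`, if running under WSL.
    wsl_prefix: Option<&'a str>,
    /// Assumed to be absolute and already percent encoded.
    path: &'a str,
    line: Option<u64>,
    column: Option<u64>,
}

/// Naively substitutes each variable. A missing host or WSL prefix becomes
/// the empty string, and a missing line or column number becomes `1`.
fn interpolate(format: &str, v: &Values) -> String {
    format
        .replace("{host}", v.host.unwrap_or(""))
        .replace("{wslprefix}", v.wsl_prefix.unwrap_or(""))
        .replace("{path}", v.path)
        .replace("{line}", &v.line.unwrap_or(1).to_string())
        .replace("{column}", &v.column.unwrap_or(1).to_string())
}

/// Wraps `text` in an OSC-8 hyperlink pointing at `uri`:
/// ESC ] 8 ; ; URI ESC \ TEXT ESC ] 8 ; ; ESC \
fn osc8(uri: &str, text: &str) -> String {
    format!("\x1b]8;;{uri}\x1b\\{text}\x1b]8;;\x1b\\")
}
```

For instance, with a host of `myhost` and a path of `/home/u/a.rs`, the format `file://{host}{path}` becomes `file://myhost/home/u/a.rs`. A format ending in `:{line}:{column}` with a line number of 7 and no column number ends in `:7:1`.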
133 changes: 122 additions & 11 deletions crates/core/args.rs
@@ -18,9 +18,9 @@ use grep::pcre2::{
RegexMatcherBuilder as PCRE2RegexMatcherBuilder,
};
use grep::printer::{
-default_color_specs, ColorSpecs, HyperlinkPattern, JSONBuilder,
-PathPrinter, PathPrinterBuilder, Standard, StandardBuilder, Stats,
-Summary, SummaryBuilder, SummaryKind, JSON,
default_color_specs, ColorSpecs, HyperlinkConfig, HyperlinkEnvironment,
HyperlinkFormat, JSONBuilder, PathPrinter, PathPrinterBuilder, Standard,
StandardBuilder, Stats, Summary, SummaryBuilder, SummaryKind, JSON,
};
use grep::regex::{
RegexMatcher as RustRegexMatcher,
@@ -236,7 +236,7 @@ impl Args {
let mut builder = PathPrinterBuilder::new();
builder
.color_specs(self.matches().color_specs()?)
-.hyperlink_pattern(self.matches().hyperlink_pattern()?)
.hyperlink(self.matches().hyperlink_config()?)
.separator(self.matches().path_separator()?)
.terminator(self.matches().path_terminator().unwrap_or(b'\n'));
Ok(builder.build(wtr))
@@ -774,7 +774,7 @@ impl ArgMatches {
let mut builder = StandardBuilder::new();
builder
.color_specs(self.color_specs()?)
-.hyperlink_pattern(self.hyperlink_pattern()?)
.hyperlink(self.hyperlink_config()?)
.stats(self.stats())
.heading(self.heading())
.path(self.with_filename(paths))
@@ -814,7 +814,7 @@ impl ArgMatches {
builder
.kind(self.summary_kind().expect("summary format"))
.color_specs(self.color_specs()?)
-.hyperlink_pattern(self.hyperlink_pattern()?)
.hyperlink(self.hyperlink_config()?)
.stats(self.stats())
.path(self.with_filename(paths))
.max_matches(self.max_count()?)
@@ -1126,11 +1126,21 @@ impl ArgMatches {
/// for the current system is used if the value is not set.
///
/// If an invalid pattern is provided, then an error is returned.
-fn hyperlink_pattern(&self) -> Result<HyperlinkPattern> {
-Ok(match self.value_of_lossy("hyperlink-format") {
-Some(pattern) => HyperlinkPattern::from_str(&pattern)?,
-None => HyperlinkPattern::default_file_scheme(),
-})
fn hyperlink_config(&self) -> Result<HyperlinkConfig> {
let mut env = HyperlinkEnvironment::new();
env.host(hostname(self.value_of_os("hostname-bin")))
.wsl_prefix(wsl_prefix());
let fmt = match self.value_of_lossy("hyperlink-format") {
None => HyperlinkFormat::from_str("default").unwrap(),
Some(format) => match HyperlinkFormat::from_str(&format) {
Ok(format) => format,
Err(err) => {
let msg = format!("invalid hyperlink format: {err}");
return Err(msg.into());
}
},
};
Ok(HyperlinkConfig::new(env, fmt))
}

/// Returns true if ignore files should be processed case insensitively.
@@ -1838,6 +1848,107 @@ fn current_dir() -> Result<PathBuf> {
.into())
}

/// Retrieves the hostname that ripgrep should use wherever a hostname is
/// required. Currently, that's just in the hyperlink format.
///
/// This works by first running the given binary program (if present and with
/// no arguments) to get the hostname after trimming leading and trailing
/// whitespace. If that fails for any reason, then it falls back to getting
/// the hostname via platform specific means (e.g., `gethostname` on Unix).
///
/// The purpose of `bin` is to make it possible for end users to override how
/// ripgrep determines the hostname.
fn hostname(bin: Option<&OsStr>) -> Option<String> {
let Some(bin) = bin else { return platform_hostname() };
let bin = match grep::cli::resolve_binary(bin) {
Ok(bin) => bin,
Err(err) => {
log::debug!(
"failed to run command '{bin:?}' to get hostname \
(falling back to platform hostname): {err}",
);
return platform_hostname();
}
};
let mut cmd = process::Command::new(&bin);
cmd.stdin(process::Stdio::null());
let rdr = match grep::cli::CommandReader::new(&mut cmd) {
Ok(rdr) => rdr,
Err(err) => {
log::debug!(
"failed to spawn command '{bin:?}' to get \
hostname (falling back to platform hostname): {err}",
);
return platform_hostname();
}
};
let out = match io::read_to_string(rdr) {
Ok(out) => out,
Err(err) => {
log::debug!(
"failed to read output from command '{bin:?}' to get \
hostname (falling back to platform hostname): {err}",
);
return platform_hostname();
}
};
let hostname = out.trim();
if hostname.is_empty() {
log::debug!(
"output from command '{bin:?}' is empty after trimming \
leading and trailing whitespace (falling back to \
platform hostname)",
);
return platform_hostname();
}
Some(hostname.to_string())
}

/// Attempts to get the hostname by using platform specific routines. For
/// example, this will do `gethostname` on Unix and `GetComputerNameExW` on
/// Windows.
fn platform_hostname() -> Option<String> {
let hostname_os = match grep::cli::hostname() {
Ok(x) => x,
Err(err) => {
log::debug!("could not get hostname: {}", err);
return None;
}
};
let Some(hostname) = hostname_os.to_str() else {
log::debug!(
"got hostname {:?}, but it's not valid UTF-8",
hostname_os
);
return None;
};
Some(hostname.to_string())
}

/// Returns a value that is meant to fill in the `{wslprefix}` variable for
/// a user given hyperlink format. A WSL prefix is a share/network like thing
/// that is meant to permit Windows applications to open files stored within
/// a WSL drive.
///
/// If a WSL distro name is unavailable, not valid UTF-8 or this isn't running
/// in a Unix environment, then this returns None.
///
/// See: <https://learn.microsoft.com/en-us/windows/wsl/filesystems>
fn wsl_prefix() -> Option<String> {
if !cfg!(unix) {
return None;
}
let distro_os = env::var_os("WSL_DISTRO_NAME")?;
let Some(distro) = distro_os.to_str() else {
log::debug!(
"found WSL_DISTRO_NAME={:?}, but value is not UTF-8",
distro_os
);
return None;
};
Some(format!("wsl$/{distro}"))
}

/// Tries to assign a timestamp to every `Subject` in the vector to help with
/// sorting Subjects by time.
fn load_timestamps<G>(
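
The fallback behavior of `hostname` in the hunk above (run the user's command, trim its output, and fall back to the platform lookup on any failure) can be sketched with the standard library alone. This is an approximation under stated assumptions: `hostname_from_command` and `resolve_hostname` are hypothetical names, the real code goes through `grep::cli::resolve_binary` and `grep::cli::CommandReader` and logs each failure, and treating a non-zero exit status as a failure is a choice made for this sketch.

```rust
use std::process::{Command, Stdio};

/// Runs `bin` with no arguments and returns its stdout with leading and
/// trailing whitespace trimmed. Returns `None` if the command cannot be
/// run, exits unsuccessfully, prints invalid UTF-8, or prints nothing.
fn hostname_from_command(bin: &str) -> Option<String> {
    let output = Command::new(bin).stdin(Stdio::null()).output().ok()?;
    if !output.status.success() {
        return None;
    }
    let out = String::from_utf8(output.stdout).ok()?;
    let trimmed = out.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// Prefers the user's command when one is given and falls back to the
/// `platform` lookup (for example, a `gethostname` wrapper) otherwise.
fn resolve_hostname(
    bin: Option<&str>,
    platform: impl FnOnce() -> Option<String>,
) -> Option<String> {
    bin.and_then(hostname_from_command).or_else(platform)
}
```

Passing the platform lookup in as a closure mirrors the commit's goal of keeping environment access at the application level, and it makes the fallback path easy to exercise without depending on the machine's real hostname.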
2 changes: 1 addition & 1 deletion crates/printer/Cargo.toml
@@ -21,9 +21,9 @@ serde = ["dep:base64", "dep:serde", "dep:serde_json"]
[dependencies]
base64 = { version = "0.21.4", optional = true }
bstr = "1.6.2"
-gethostname = "0.4.3"
grep-matcher = { version = "0.1.6", path = "../matcher" }
grep-searcher = { version = "0.1.11", path = "../searcher" }
log = "0.4.5"
termcolor = "1.3.0"
serde = { version = "1.0.188", optional = true, features = ["derive"] }
serde_json = { version = "1.0.107", optional = true }