
feat: propagate input loading/collecting errors to top level #1864

Merged
thomas-zahner merged 66 commits into lycheeverse:master from rina-forks:propagate-early-errors
Nov 17, 2025

@katrinafyi
Member

@katrinafyi katrinafyi commented Oct 4, 2025

previously, errors which happened while collecting links (e.g., due to invalid base join) would print a terse message to the console and then they would be forgotten. this looked like:

[WARN] Error creating request: InvalidPathToUri("/relative-link/")
🔍 0 Total (in 0s) ✅ 0 OK 🚫 0 Errors

this is unhelpful because it only shows the internal enum name and it's not reported in the final error count, so it's very easy to miss.

now, these "early" errors get propagated into the final error list and displayed alongside the HTTP failures.

[a.md]:
   [ERROR] error:// | Error building URL for "/relative-link/" (Attribute: Some("href")): Cannot convert path '/relative-link/' to a URI: To resolve relative links in local files, provide a root dir

🔍 1 Total (in 0s) ✅ 0 OK 🚫 1 Error

this makes them more obvious and lets us make use of the existing functions for displaying error details and suggested fixes.

this is implemented by changing the request-constructing functions to return a Result<_, RequestError> rather than Result<_, ErrorKind>. RequestError is a new error type which contains information about the URL or input source which caused the error. it also contains the underlying ErrorKind.

/// An error which occurs while trying to construct a [`Request`] object.
/// That is, an error which happens while trying to load links from an input
/// source.
#[derive(Error, Debug, PartialEq, Eq, Hash)]
pub enum RequestError {
    /// Unable to construct a URL for a link appearing within the given source.
    #[error("Error building URL for {0}: {2}")]
    CreateRequestItem(RawUri, ResolvedInputSource, #[source] Box<ErrorKind>),
    /// Unable to load the content of an input source.
    #[error("Error reading input '{0}': {1}")]
    GetInputContent(InputSource, #[source] Box<ErrorKind>),
}

i think it's nice to have it as a new error type because it makes it clear to the user that they only have to handle these two errors, and it avoids making the ErrorKind type recursive.
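A minimal, self-contained sketch of what "only two errors to handle" means for a caller. `RawUri`, `ResolvedInputSource`, `InputSource` and `ErrorKind` here are hypothetical string-backed stand-ins (the real lychee types are richer), and the `Display` impl is written by hand in place of the `thiserror` derive above. `group_label` is an invented helper that picks the input an error is reported under.

```rust
use std::fmt;

// Hypothetical stand-ins for lychee's types, reduced to strings so the
// sketch compiles on its own.
#[derive(Debug)]
struct RawUri(String);
#[derive(Debug)]
struct ResolvedInputSource(String);
#[derive(Debug)]
struct InputSource(String);

#[derive(Debug)]
enum ErrorKind {
    InvalidPathToUri(String),
}

impl fmt::Display for ErrorKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ErrorKind::InvalidPathToUri(p) => write!(f, "Cannot convert path '{p}' to a URI"),
        }
    }
}

// Same shape as the PR's enum. ErrorKind is boxed inside RequestError, so
// ErrorKind itself never needs a variant that contains an ErrorKind.
#[derive(Debug)]
enum RequestError {
    CreateRequestItem(RawUri, ResolvedInputSource, Box<ErrorKind>),
    GetInputContent(InputSource, Box<ErrorKind>),
}

// Hand-written equivalent of the #[error(...)] messages.
impl fmt::Display for RequestError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RequestError::CreateRequestItem(raw, _, kind) => {
                write!(f, "Error building URL for {:?}: {kind}", raw.0)
            }
            RequestError::GetInputContent(input, kind) => {
                write!(f, "Error reading input '{}': {kind}", input.0)
            }
        }
    }
}

// Callers match exactly two cases; the compiler rejects a missing arm.
// Returns the input the error should be grouped under in the report.
fn group_label(err: &RequestError) -> &str {
    match err {
        RequestError::CreateRequestItem(_, source, _) => source.0.as_str(),
        RequestError::GetInputContent(input, _) => input.0.as_str(),
    }
}

fn main() {
    let err = RequestError::CreateRequestItem(
        RawUri("/relative-link/".to_string()),
        ResolvedInputSource("a.md".to_string()),
        Box::new(ErrorKind::InvalidPathToUri("/relative-link/".to_string())),
    );
    println!("[{}]: {}", group_label(&err), err);
}
```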

this makes it easier to handle the returned error. previously, if an Err did occur in the collected request stream, this would lead to an unexpected early exit and a tokio error. so, i think that changing this to Result<_, RequestError> is no great burden because lychee-bin did not have very good handling of the old ErrorKind.
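The difference between aborting on the first `Err` and reporting it can be sketched with plain iterators, assuming simplified stand-in types (lychee actually works on an async stream, and its `Request`/`Response` types carry far more data). `collect_all_or_nothing`, `into_responses` and `sample` are hypothetical names.

```rust
// Hypothetical, simplified stand-ins for lychee's Request, Response and
// RequestError.
#[derive(Debug, PartialEq)]
struct Request {
    url: String,
}

#[derive(Debug, PartialEq)]
struct RequestError {
    input: String,
    message: String,
}

#[derive(Debug, PartialEq)]
enum Response {
    // Stand-in for the result of actually checking the URL.
    Checked { url: String },
    // An early error, reported like any other failed link.
    Failed { input: String, message: String },
}

// Old shape: collecting into a Result stops at the first Err, so every
// other item is discarded and the caller sees a single error.
fn collect_all_or_nothing(
    items: Vec<Result<Request, RequestError>>,
) -> Result<Vec<Request>, RequestError> {
    items.into_iter().collect()
}

// New shape: each Err becomes a failed Response in the same list as the
// checked links, so it is counted in the final summary.
fn into_responses(items: Vec<Result<Request, RequestError>>) -> Vec<Response> {
    items
        .into_iter()
        .map(|item| match item {
            Ok(req) => Response::Checked { url: req.url },
            Err(err) => Response::Failed {
                input: err.input,
                message: err.message,
            },
        })
        .collect()
}

fn sample() -> Vec<Result<Request, RequestError>> {
    vec![
        Ok(Request { url: "https://example.com/".to_string() }),
        Err(RequestError {
            input: "noasdjfi/[".to_string(),
            message: "UNIX glob pattern is invalid".to_string(),
        }),
        Ok(Request { url: "https://google.com/".to_string() }),
    ]
}

fn main() {
    assert!(collect_all_or_nothing(sample()).is_err());
    let responses = into_responses(sample());
    let errors = responses
        .iter()
        .filter(|r| matches!(r, Response::Failed { .. }))
        .count();
    println!("{} Total, {} Errors", responses.len(), errors);
}
```

With the sample input, the old shape yields one bare error and nothing else, while the new shape yields three responses, one of which is a failure attributed to its input.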

some more examples of the new error messages:

$ cargo run -- https://google.com 'noasdjfi/[' non-existing-fjdaifdsjai.com no-perms https://example.com 
[http://non-existing-fjdaifdsjai.com/]:
   [ERROR] error:// | Error reading input 'http://non-existing-fjdaifdsjai.com/': Network error: Connection failed. Check network connectivity and firewall settings

[https://google.com/]:
     [404] https://google.com/images/branding/googlelogo/1x/googlelogo_white_background_color_272x92dp.png | Rejected status code (this depends on your "accept" configuration): Not Found

[no-perms]:
   [ERROR] error:// | Error reading input 'no-perms': Cannot traverse input directory: no-perms: IO error for operation on no-perms: Permission denied (os error 13): Directory traversal failed: no-perms: IO error for operation on no-perms: Permission denied (os error 13). Check directory permissions

[noasdjfi/[]:
   [ERROR] error:// | Error reading input 'noasdjfi/[': UNIX glob pattern is invalid: Invalid glob pattern: Pattern syntax error near position 9: invalid range pattern. Check pattern syntax

🔍 33 Total (in 4s) ✅ 5 OK 🚫 4 Errors 🔀 24 Redirects

compared to before (below). also note that the old behaviour is nondeterministic. any one of the errors could be the one which is printed, and it aborts the program with no other output.

$ cargo run -- https://google.com 'noasdjfi/[' non-existing-fjdaifdsjai.com no-perms 
thread 'tokio-runtime-worker' panicked at lychee-bin/src/commands/check.rs:256:18:
cannot send response to queue: SendError { .. }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: UNIX glob pattern is invalid

Caused by:
    Pattern syntax error near position 9: invalid range pattern
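One plausible reading of the old panic (an assumption, not confirmed in this thread): the side receiving responses had already returned with the first error and dropped its end of the channel, so a worker's `send` failed and the `.expect(...)` on it panicked. The sketch reproduces that mechanism with `std::sync::mpsc` standing in for the async channel lychee-bin uses; `send_after_receiver_dropped` is a hypothetical helper.

```rust
use std::sync::mpsc;
use std::thread;

// The receiver is dropped first, as if the consuming side had already
// returned with the first error it saw.
fn send_after_receiver_dropped(msg: &str) -> Result<(), mpsc::SendError<String>> {
    let (tx, rx) = mpsc::channel::<String>();
    drop(rx);
    let msg = msg.to_string();
    // The producer keeps running and tries to hand over its result.
    let worker = thread::spawn(move || tx.send(msg));
    worker.join().expect("worker thread panicked")
}

fn main() {
    // send() fails because nobody is listening any more; calling
    // .expect("cannot send response to queue") on it would panic instead.
    match send_after_receiver_dropped("response") {
        Ok(()) => println!("delivered"),
        Err(err) => println!("send failed: {err}"),
    }
}
```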

this implements step (2) of the plan here #1624 (comment)

outdated commentary (i have since changed it to a new RequestError type):

however, i don't like the way this is implemented. it has to "smuggle" the early errors through a new error case,

ErrorKind::CreateRequestItem(RawUri, ResolvedInputSource, Box<ErrorKind>)

and all of the commands have to deliberately handle this case by bypassing the check logic and directly constructing a failed Response. it would be very easy for a user to forget to do this. originally, i wanted to make this new error separate from the usual ErrorKind, but the lychee_lib::Result<T> type with ErrorKind is so pervasive and it would've needed extensive changes.

maybe, instead, i could embed this into the Request type by making its Uri field into a Result<Uri, CreateRequestError>, but that doesn't seem great either.

another downside of the current approach is it uses the fake error:// URL to display these messages. this is because the ResponseBody needs a "valid" Uri.

TODO:

  • tests. fix old and add new.
  • move check handling into handle function

also, along the way, this changes the ErrorKind::details cases for invalid path and invalid base join. these details previously just repeated the error message and would look like "Cannot convert path '/../../' to a URI: Cannot convert path to URI: '/../../'. Check path format".
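A minimal sketch of the intended split, assuming a one-variant stand-in for `ErrorKind` and a `details()` that returns `Option<String>` (the real signature may differ): the message says what went wrong and the details add guidance instead of restating it, which reproduces the wording of the `[a.md]` output at the top of this description.

```rust
use std::fmt;

// Hypothetical one-variant stand-in for lychee's ErrorKind.
#[derive(Debug)]
enum ErrorKind {
    InvalidPathToUri(String),
}

impl fmt::Display for ErrorKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ErrorKind::InvalidPathToUri(p) => write!(f, "Cannot convert path '{p}' to a URI"),
        }
    }
}

impl ErrorKind {
    // Assumed signature: details() adds guidance the message lacks,
    // rather than repeating the message.
    fn details(&self) -> Option<String> {
        match self {
            ErrorKind::InvalidPathToUri(_) => Some(
                "To resolve relative links in local files, provide a root dir".to_string(),
            ),
        }
    }
}

// Combine message and details into one line, as in the report output.
fn render(err: &ErrorKind) -> String {
    match err.details() {
        Some(details) => format!("{err}: {details}"),
        None => err.to_string(),
    }
}

fn main() {
    let err = ErrorKind::InvalidPathToUri("/relative-link/".to_string());
    println!("{}", render(&err));
}
```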

related to #1265

@katrinafyi katrinafyi changed the title feat: propagate URL parsing/resolving errors to top level feat: propagate URL parsing/joining errors to top level Oct 4, 2025
@katrinafyi
Member Author

I see that this PR contradicts an existing test case which says the current behaviour is intentional.

lychee/lychee-bin/tests/cli.rs

Lines 1458 to 1462 in 54e425c

/// If `base-dir` is not set, don't throw an error in case we encounter
/// an absolute local link (e.g. `/about`) within a file.
/// Instead, simply ignore the link.
#[test]
fn test_ignore_absolute_local_links_without_base() -> Result<()> {

Maybe there should be an option for whether to treat absolute links as an error when root-dir is not specified? At the moment, it would not be possible to get the old "ignoring" behaviour after this PR. Or we could just change the test.
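The suggested option could look roughly like this. Everything here is hypothetical: `AbsoluteLinkPolicy` and `resolve_absolute_link` are invented for illustration and are not lychee settings or functions. `Ignore` corresponds to the behaviour the test above expects, and `Error` to this PR's behaviour.

```rust
use std::path::{Path, PathBuf};

// Hypothetical option, invented for this sketch.
#[derive(Clone, Copy, Debug)]
enum AbsoluteLinkPolicy {
    Error,
    Ignore,
}

// Resolve a root-relative link like "/about" found in a local file.
// Ok(None) means "skip this link" (the old behaviour).
fn resolve_absolute_link(
    link: &str,
    root_dir: Option<&Path>,
    policy: AbsoluteLinkPolicy,
) -> Result<Option<PathBuf>, String> {
    match (root_dir, policy) {
        (Some(root), _) => Ok(Some(root.join(link.trim_start_matches('/')))),
        (None, AbsoluteLinkPolicy::Ignore) => Ok(None),
        (None, AbsoluteLinkPolicy::Error) => Err(format!(
            "Cannot convert path '{link}' to a URI: To resolve relative links in local files, provide a root dir"
        )),
    }
}

fn main() {
    for policy in [AbsoluteLinkPolicy::Ignore, AbsoluteLinkPolicy::Error] {
        println!("{:?}: {:?}", policy, resolve_absolute_link("/about", None, policy));
    }
}
```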

@katrinafyi katrinafyi changed the title feat: propagate URL parsing/joining errors to top level feat: propagate input loading/collecting errors to top level Oct 7, 2025
katrinafyi and others added 2 commits October 7, 2025 16:53
this shouldn't be needed anymore since we no longer use a recursive case in ErrorKind. instead, we use the new RequestError type.
@thomas-zahner
Member

> We have a new problem! Preprocessor errors are being turned into request errors and displayed like this:

@katrinafyi Honestly, I'm okay with this. Conceptually, it's not wrong to treat preprocessing as part of link extraction and therefore of the link check process. This PR then also has the benefit that if preprocessing fails on only one or some files, lychee doesn't abort, which is an improvement.

Before

cargo run Cargo.* --preprocess aa
Error: Preprocessor command 'aa' failed: could not start: No such file or directory (os error 2)

PR

cargo run Cargo.* --preprocess aa
  2/2 ━━━━━━━━━━━━━━━━━━━━ Finished extracting links
Issues found in 2 inputs. Find details below.

[Cargo.lock]:
   [ERROR] error:// | Error reading input 'Cargo.lock': Preprocessor command 'aa' failed: could not start: No such file or directory (os error 2): Command 'aa' failed could not start: No such file or directory (os error 2). Check value of the pre option

[Cargo.toml]:
   [ERROR] error:// | Error reading input 'Cargo.toml': Preprocessor command 'aa' failed: could not start: No such file or directory (os error 2): Command 'aa' failed could not start: No such file or directory (os error 2). Check value of the pre option

🔍 2 Total (in 0s) ✅ 0 OK 🚫 2 Errors

We could think about a simple prevalidation step, as you mentioned, which checks if the value of --preprocess itself is valid (i.e., the program exists in PATH), but this is not very important. For reference, ripgrep doesn't do this either, and this PR makes lychee behave more like ripgrep, which I like. As you mention, it is impossible to validate whether the preprocess commands exit successfully without running them first. This would be another reason why it makes sense to conceptually move preprocessing into the link check process.

rg 'some text' Cargo.* --pre aa
rg: Cargo.toml: preprocessor command could not start: '"aa" "Cargo.toml"': No such file or directory (os error 2)
rg: Cargo.lock: preprocessor command could not start: '"aa" "Cargo.lock"': No such file or directory (os error 2)
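A rough sketch of the PATH prevalidation discussed above, using only `std::env::var_os` and `std::env::split_paths`. `find_in_path` is a hypothetical helper. It only checks that a file exists, so it would catch the `aa` case, but it cannot tell whether the command will later exit successfully; that still requires running it per input.

```rust
use std::env;
use std::path::{Path, PathBuf};

// Does `program` name an existing file, either directly (when it contains
// a path component) or inside one of the PATH directories? Only existence
// is checked: not the executable bit and not Windows PATHEXT extensions.
fn find_in_path(program: &str) -> Option<PathBuf> {
    let candidate = Path::new(program);
    if candidate.components().count() > 1 {
        return candidate.is_file().then(|| candidate.to_path_buf());
    }
    let path_var = env::var_os("PATH")?;
    env::split_paths(&path_var)
        .map(|dir| dir.join(program))
        .find(|full| full.is_file())
}

fn main() {
    match find_in_path("aa") {
        Some(path) => println!("found {}", path.display()),
        None => println!("Preprocessor command 'aa' not found in PATH"),
    }
}
```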

Just two things I noticed:

  • In what cases do we use the error:// dummy value? Doesn't it make sense to make it optional and not display any dummy value?
  • The error message with preprocess is repetitive and then mentions the pre option which was the name before I renamed it to preprocess. Definitely my mistake 😅

@katrinafyi
Member Author

@thomas-zahner Thanks for your reply. Yeah, I can see the logic in that approach. I'm happy to leave preprocess errors as link check errors.

Re the error://, it appears for every error arising before an HTTP request is made. I also don't like it very much, but it's passed through the stats and response formatter logic, so avoiding it would need a lot of changes there. But maybe this is something I should dig deeper into. The error:// URL could also be confusing.

@katrinafyi
Member Author

I don't want to pursue making the error URL optional in this PR. I think it will be a simple but extensive change. You just have to allow format_response to take an Option<Uri> instead of a ResponseBody (that must contain Uri). If you're already looking at changing these functions in #1914, maybe that is a better place to investigate it.

In the meantime, it could be changed to error: instead of error://. That might be less weird for a user to see.
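A minimal sketch of the `Option<Uri>` idea, assuming a simplified `format_response` that takes plain strings (lychee's real function takes a `ResponseBody` and has a different signature): when there is no URI, no placeholder such as `error://` is printed at all.

```rust
// Hypothetical, simplified formatter: the URI is optional, so errors raised
// before any HTTP request is made need no placeholder URL.
fn format_response(status: &str, uri: Option<&str>, message: &str) -> String {
    match uri {
        Some(uri) => format!("[{status}] {uri} | {message}"),
        None => format!("[{status}] {message}"),
    }
}

fn main() {
    println!("{}", format_response("404", Some("https://google.com/"), "Not Found"));
    println!("{}", format_response("ERROR", None, "Error reading input 'no-perms': Permission denied"));
}
```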

@thomas-zahner thomas-zahner merged commit 694ffc2 into lycheeverse:master Nov 17, 2025
7 checks passed
@mre mre mentioned this pull request Nov 17, 2025
@thomas-zahner
Member

Many thanks for this PR!
It's quite an improvement I think. In the long run hiding the error: dummy value would be nice but that's not urgent of course.

@mre mre mentioned this pull request Dec 5, 2025
AfoHT added a commit to AfoHT/rtic that referenced this pull request Jan 14, 2026
Exclude /$version links that are not yet populated
during this building stage
Add root-dir to handle relative links

Newline added to the file

The change in lychee introduced in
lycheeverse/lychee#1864 (comment)
github-merge-queue bot pushed a commit to rtic-rs/rtic that referenced this pull request Jan 15, 2026