feat: propagate input loading/collecting errors to top level by katrinafyi · Pull Request #1864 · lycheeverse/lychee

katrinafyi · 2025-10-04T10:09:16Z

previously, errors which happened while collecting links (e.g., due to invalid base join) would print a terse message to the console and then they would be forgotten. this looked like:

[WARN] Error creating request: InvalidPathToUri("/relative-link/")
🔍 0 Total (in 0s) ✅ 0 OK 🚫 0 Errors

this is unhelpful because it only shows the internal enum name and it's not reported in the final error count, so it's very easy to miss.

now, these "early" errors get propagated into the final error list and displayed alongside the HTTP failures.

[a.md]:
   [ERROR] error:// | Error building URL for "/relative-link/" (Attribute: Some("href")): Cannot convert path '/relative-link/' to a URI: To resolve relative links in local files, provide a root dir

🔍 1 Total (in 0s) ✅ 0 OK 🚫 1 Error

this makes them more obvious and lets us make use of the existing functions for displaying error details and suggested fixes.

this is implemented by changing the request-constructing functions to return a Result<_, RequestError> rather than Result<_, ErrorKind>. RequestError is a new error type which contains information about the URL or input source which caused the error. it also contains the underlying ErrorKind.

lychee/lychee-lib/src/types/request.rs

Lines 7 to 19 in 976904a

    /// An error which occurs while trying to construct a [`Request`] object.
    /// That is, an error which happens while trying to load links from an input
    /// source.
    #[derive(Error, Debug, PartialEq, Eq, Hash)]
    pub enum RequestError {
        /// Unable to construct a URL for a link appearing within the given source.
        #[error("Error building URL for {0}: {2}")]
        CreateRequestItem(RawUri, ResolvedInputSource, #[source] Box<ErrorKind>),

        /// Unable to load the content of an input source.
        #[error("Error reading input '{0}': {1}")]
        GetInputContent(InputSource, #[source] Box<ErrorKind>),
    }

i think it's nice to have it as a new error type because it makes it clear to the user that they only have to handle these two errors, and it avoids making the ErrorKind type recursive.
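For readers without the crate at hand, here is a dependency-free sketch of the same shape. It is an approximation, not the real definition: thiserror's #[error]/#[source] attributes are replaced by hand-written Display and Error::source impls, ErrorKind is a hypothetical two-case stand-in, the input sources are plain Strings, and the RawUri display format is assumed from the a.md example output above.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for lychee's `ErrorKind` (the real enum has many more cases).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum ErrorKind {
    InvalidPathToUri(String),
    Io(String),
}

impl fmt::Display for ErrorKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ErrorKind::InvalidPathToUri(p) => write!(f, "Cannot convert path '{p}' to a URI"),
            ErrorKind::Io(msg) => write!(f, "IO error: {msg}"),
        }
    }
}

impl Error for ErrorKind {}

/// Simplified stand-in for `RawUri`: the link text plus the attribute it came from.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct RawUri {
    pub text: String,
    pub attribute: Option<String>,
}

impl fmt::Display for RawUri {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Assumed format, e.g. `"/relative-link/" (Attribute: Some("href"))`.
        write!(f, "{:?} (Attribute: {:?})", self.text, self.attribute)
    }
}

/// Sketch of the two-case error: it records *where* the failure happened
/// and keeps the underlying `ErrorKind` as its source.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum RequestError {
    /// A link inside the given source (here just a `String`) could not become a URL.
    CreateRequestItem(RawUri, String, Box<ErrorKind>),
    /// The content of the given input (here just a `String`) could not be loaded.
    GetInputContent(String, Box<ErrorKind>),
}

impl fmt::Display for RequestError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RequestError::CreateRequestItem(raw, _source, err) => {
                write!(f, "Error building URL for {raw}: {err}")
            }
            RequestError::GetInputContent(input, err) => {
                write!(f, "Error reading input '{input}': {err}")
            }
        }
    }
}

impl Error for RequestError {
    // Expose the wrapped `ErrorKind` so existing detail/suggestion logic can inspect it.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            RequestError::CreateRequestItem(_, _, err) | RequestError::GetInputContent(_, err) => {
                Some(&**err)
            }
        }
    }
}
```

Because the inner error lives in a separate enum, ErrorKind never has to contain itself, which is the non-recursion point made above.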

this makes it easier to handle the returned error. previously, if an Err did occur in the collected request stream, this would lead to an unexpected early exit and a tokio error. so, i think that changing this to Result<_, RequestError> is no great burden because lychee-bin did not have very good handling of the old ErrorKind.
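A minimal sketch of the handling this enables, loosely modelled on the Status::RequestError case and the now-infallible handle mentioned in the commit list. All names here (Request, Response, Status, check, handle, summarize) are hypothetical simplifications, not lychee's actual API, and the error is a plain String message rather than a RequestError. Each item of the request stream is a Result; an Err bypasses the check and becomes a failed Response instead of ending the run, so it is counted in the summary like any other failure.

```rust
/// Hypothetical simplified types; lychee's real ones carry much more information.
#[derive(Debug, Clone, PartialEq)]
pub struct Request {
    pub uri: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Ok(u16),
    Error(u16),
    /// The request could not even be constructed (bad base join, unreadable input, ...).
    RequestError(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Response {
    pub source: String,
    pub status: Status,
}

/// Stand-in for the real network check; here every link "succeeds" with 200.
fn check(_req: &Request) -> Status {
    Status::Ok(200)
}

/// Infallible handler: a failed request construction skips `check`
/// and is turned directly into a failed `Response`.
pub fn handle(source: &str, item: Result<Request, String>) -> Response {
    let status = match item {
        Ok(req) => check(&req),
        Err(e) => Status::RequestError(e),
    };
    Response { source: source.to_string(), status }
}

/// Returns (total, ok, errors); anything that is not `Ok` counts as an error here.
pub fn summarize(responses: &[Response]) -> (usize, usize, usize) {
    let ok = responses
        .iter()
        .filter(|r| matches!(r.status, Status::Ok(_)))
        .count();
    (responses.len(), ok, responses.len() - ok)
}
```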

some more examples of the new error messages:

$ cargo run -- https://google.com 'noasdjfi/[' non-existing-fjdaifdsjai.com no-perms https://example.com 
[http://non-existing-fjdaifdsjai.com/]:
   [ERROR] error:// | Error reading input 'http://non-existing-fjdaifdsjai.com/': Network error: Connection failed. Check network connectivity and firewall settings

[https://google.com/]:
     [404] https://google.com/images/branding/googlelogo/1x/googlelogo_white_background_color_272x92dp.png | Rejected status code (this depends on your "accept" configuration): Not Found

[no-perms]:
   [ERROR] error:// | Error reading input 'no-perms': Cannot traverse input directory: no-perms: IO error for operation on no-perms: Permission denied (os error 13): Directory traversal failed: no-perms: IO error for operation on no-perms: Permission denied (os error 13). Check directory permissions

[noasdjfi/[]:
   [ERROR] error:// | Error reading input 'noasdjfi/[': UNIX glob pattern is invalid: Invalid glob pattern: Pattern syntax error near position 9: invalid range pattern. Check pattern syntax

🔍 33 Total (in 4s) ✅ 5 OK 🚫 4 Errors 🔀 24 Redirects

compared to before (below). also note that the old behaviour is nondeterministic. any one of the errors could be the one which is printed, and it aborts the program with no other output.
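The difference can be illustrated with plain Result iterators (a generic sketch, not lychee code). Collecting into Result<Vec<_>, _> short-circuits on the first Err, so at most one error ever surfaces, and which one depends on the order in which items arrive; with concurrent input loading that order is not fixed. Keeping every Result instead reports all failures.

```rust
/// Old style: stops at the first `Err`; any later errors are never seen.
pub fn collect_fail_fast(items: Vec<Result<u32, String>>) -> Result<Vec<u32>, String> {
    items.into_iter().collect()
}

/// New style: keep every item, then split successes from failures.
pub fn collect_all(items: Vec<Result<u32, String>>) -> (Vec<u32>, Vec<String>) {
    let mut oks = Vec::new();
    let mut errs = Vec::new();
    for item in items {
        match item {
            Ok(v) => oks.push(v),
            Err(e) => errs.push(e),
        }
    }
    (oks, errs)
}
```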

$ cargo run -- https://google.com 'noasdjfi/[' non-existing-fjdaifdsjai.com no-perms 
thread 'tokio-runtime-worker' panicked at lychee-bin/src/commands/check.rs:256:18:
cannot send response to queue: SendError { .. }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: UNIX glob pattern is invalid

Caused by:
    Pattern syntax error near position 9: invalid range pattern

this implements step (2) of the plan here #1624 (comment)

outdated commentary (i have since changed this to a new RequestError type):

however, i don't like the way this is implemented. it has to "smuggle" the early errors through a new error case,

ErrorKind::CreateRequestItem(RawUri, ResolvedInputSource, Box<ErrorKind>)

and all of the commands have to deliberately handle this case by bypassing the check logic and directly constructing a failed Response. it would be very easy for a user to forget to do this. originally, i wanted to make this new error separate from the usual ErrorKind, but the lychee_lib::Result<T> type with ErrorKind is so pervasive and it would've needed extensive changes.

maybe, instead, i could embed this into the Request type by making its Uri field into a Result<Uri, CreateRequestError>, but that doesn't seem great either.

another downside of the current approach is it uses the fake error:// URL to display these messages. this is because the ResponseBody needs a "valid" Uri.

TODO:

tests. fix old and add new.
move check handling into handle function

also, along the way, this changes the ErrorKind::details cases for invalid path and invalid base join. these details previously just repeated the error message and would look like "Cannot convert path '/../../' to a URI: Cannot convert path to URI: '/../../'. Check path format".

related to #1265


katrinafyi · 2025-10-04T10:50:19Z

I see that this PR contradicts an existing test case which says the current behaviour is intentional.

lychee/lychee-bin/tests/cli.rs

Lines 1458 to 1462 in 54e425c

    /// If `base-dir` is not set, don't throw an error in case we encounter
    /// an absolute local link (e.g. `/about`) within a file.
    /// Instead, simply ignore the link.
    #[test]
    fn test_ignore_absolute_local_links_without_base() -> Result<()> {

Maybe there should be an option controlling whether absolute links are treated as an error when root-dir is not specified? At the moment, it would not be possible to get the old "ignoring" behaviour after this PR. Or we could just change the test.


thomas-zahner · 2025-11-13T08:31:54Z

We have a new problem! Preprocessor errors are being turned into request errors and display like this:

@katrinafyi Honestly, I'm okay with this. Conceptually, it's not wrong to treat preprocessing as part of link extraction and therefore of the link check process. This PR also has the benefit that if preprocessing fails on only one or some files, lychee doesn't abort, which is an improvement.

Before

cargo run Cargo.* --preprocess aa
Error: Preprocessor command 'aa' failed: could not start: No such file or directory (os error 2)

PR

cargo run Cargo.* --preprocess aa
  2/2 ━━━━━━━━━━━━━━━━━━━━ Finished extracting links
Issues found in 2 inputs. Find details below.

[Cargo.lock]:
   [ERROR] error:// | Error reading input 'Cargo.lock': Preprocessor command 'aa' failed: could not start: No such file or directory (os error 2): Command 'aa' failed could not start: No such file or directory (os error 2). Check value of the pre option

[Cargo.toml]:
   [ERROR] error:// | Error reading input 'Cargo.toml': Preprocessor command 'aa' failed: could not start: No such file or directory (os error 2): Command 'aa' failed could not start: No such file or directory (os error 2). Check value of the pre option

🔍 2 Total (in 0s) ✅ 0 OK 🚫 2 Errors

We could think about a simple prevalidation step, as you mentioned, which checks whether the value of --preprocess itself is valid (i.e. the program exists in PATH), but this is not very important. For reference, ripgrep doesn't do this either, and this PR makes lychee behave more like ripgrep, which I like. As you mention, it is impossible to validate whether the preprocess commands exit successfully without running them first. This would be another reason why it makes sense to conceptually move preprocessing into the link check process.
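A rough sketch of such a prevalidation step, written as a hypothetical helper command_exists (lychee does not ship this; like ripgrep, it simply reports the failure per input). It only checks that the program can be found, either as an explicit path or inside one of the PATH directories; as noted above, it cannot tell whether the command will succeed. For brevity it also ignores the executable bit and Windows PATHEXT extensions.

```rust
use std::env;
use std::path::Path;

/// Hypothetical check: does `program` resolve to an existing file?
/// Only existence is tested, not whether the file is executable.
pub fn command_exists(program: &str) -> bool {
    let path = Path::new(program);
    // A name with more than one path component (e.g. `./x`, `/usr/bin/x`)
    // is used as-is, like a shell would, instead of searching PATH.
    if path.components().count() > 1 {
        return path.is_file();
    }
    match env::var_os("PATH") {
        Some(paths) => env::split_paths(&paths).any(|dir| dir.join(program).is_file()),
        None => false,
    }
}
```

With this in place, a value like --preprocess aa could be rejected once at startup instead of once per input file.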

rg 'some text' Cargo.* --pre aa
rg: Cargo.toml: preprocessor command could not start: '"aa" "Cargo.toml"': No such file or directory (os error 2)
rg: Cargo.lock: preprocessor command could not start: '"aa" "Cargo.lock"': No such file or directory (os error 2)

Just two things I noticed:

In what cases do we use the error:// dummy value? Doesn't it make sense to make it optional and not display any dummy value?
The error message with preprocess is repetitive and then mentions the pre option, which was its name before I renamed it to preprocess. Definitely my mistake 😅

katrinafyi · 2025-11-13T13:50:32Z

@thomas-zahner Thanks for your reply. Yeah, I can see the logic in that approach. I'm happy to leave preprocess errors as link check errors.

Re the error://, it appears for every error arising before a HTTP request is made. I also don't like it very much, but it's passed through the stats and response formatter logic, so avoiding it would need a lot of changes there. But maybe this is something I should dig deeper into. The error:// url could also be confusing.


katrinafyi · 2025-11-16T06:09:40Z

I don't want to pursue making the error URL optional in this PR. I think it will be a simple but extensive change. You just have to allow format_response to take an Option<Uri> instead of a ResponseBody (that must contain Uri). If you're already looking at changing these functions in #1914, maybe that is a better place to investigate it.

In the meantime, it could be changed to error: instead of error://. That might be less weird for a user to see.
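To illustrate the Option<Uri> idea, a hypothetical format_line helper (not lychee's format_response, which takes a ResponseBody with a mandatory Uri) could simply omit the URI column when none exists, instead of printing a placeholder such as error:. The line layout is modelled loosely on the example output earlier in this thread.

```rust
/// Hypothetical formatter: `uri` is optional, so early errors need no dummy URL.
pub fn format_line(status: &str, uri: Option<&str>, message: &str) -> String {
    match uri {
        Some(uri) => format!("[{status}] {uri} | {message}"),
        // No URI could be built (e.g. unreadable input), so print only the message.
        None => format!("[{status}] {message}"),
    }
}
```

The trade-off discussed above is that every caller and the stats logic would need to accept the optional URI, which is what makes the change extensive.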


thomas-zahner · 2025-11-17T12:29:02Z

Many thanks for this PR!
It's quite an improvement I think. In the long run hiding the error: dummy value would be nice but that's not urgent of course.

Exclude /$version links that are not yet populated during this building stage. Add root-dir to handle relative links. Newline added to the file. The change in lychee introduced in lycheeverse/lychee#1864 (comment)

katrinafyi added 10 commits October 4, 2025 18:06

stash

f7488dd

stash very broken

a9fe9ad

Revert "stash very broken"

cef997e

This reverts commit a9fe9ad.

Revert "stash"

25f130b

This reverts commit f7488dd.

feat: add CreateRequestItem error kind

d272a2b

wrap

f5fd8df

smuggle request errors through CreateRequestItem

e4c5482

stash Result with RawUri

963b276

Revert "stash Result with RawUri"

9461325

This reverts commit 963b276.

add help

c1bd7ca

katrinafyi changed the title ~~feat: propagate URL parsing/resolving errors to top level~~ feat: propagate URL parsing/joining errors to top level Oct 4, 2025

katrinafyi added 2 commits October 4, 2025 20:19

add CollectResult to be less dubious

65afcdd

fix lints

891255d

katrinafyi added 13 commits October 4, 2025 20:56

touch

ff87745

introduce separate RequestError type. but...

024772a

we still need to feed it back to Status and that requires an old ErrorKind. maybe we add a new case to Status.

add Status::RequestError

3ee280e

mark as error

1ad2020

remove ErrorKind::CreateRequestItem case

25d32ce

blah

c7536d1

uncomment

d10190a

restore old panic behaviour for input-source errors

4f74a77

docs

3bca6c8

remove unused imports

49c550e

fix example

6a007c2

clippy. includes boxing errorkind because it's big

d7ebbc0

propagate input loading errors too

17fad2c

this slightly winds back the ResolvedInputSource usage and changes Response back to an ordinary InputSource

katrinafyi changed the title ~~feat: propagate URL parsing/joining errors to top level~~ feat: propagate input loading/collecting errors to top level Oct 7, 2025

katrinafyi and others added 2 commits October 7, 2025 16:53

handle is no longer fallible. add helper function

976904a

Delete clippy.toml

f0eda83

this shouldn't be needed anymore since we no longer use a recursive case in ErrorKind. instead, we use the new RequestError type.

katrinafyi added 2 commits November 13, 2025 11:13

add light test case test_create_request_from_relative_file_path_errors

918e216

comment2

acfa23c

katrinafyi added 10 commits November 16, 2025 15:12

RequestBatch

759b691

Revert "RequestBatch"

bbbd94b

This reverts commit 759b691.

use Vec<Result<..>>

b7bb174

add LycheeResult to disambiguate from std Result

438a2d3

manually deduplicate valid requests but not errors.

50e91d1

change preprocessor tests to expect link checking errors

8a6bc3f

clippy

9a34dac

Merge remote-tracking branch 'upstream/master' into propagate-early-errors

64be9a3

merge main_command removal

6db8db4

change to error: and add into_response helper function

25b1290

katrinafyi added 2 commits November 16, 2025 16:26

move request_error into separate file.

70f793b

revert

a6ba2a1

thomas-zahner requested changes Nov 17, 2025

lychee-lib/src/types/request_error.rs (outdated, resolved)

lychee-lib/src/utils/request.rs (outdated, resolved)

lychee-lib/src/types/request_error.rs (outdated, resolved)

katrinafyi added 3 commits November 17, 2025 21:20

remove box (thanks to Thomas Zahner)

ec943db

use lazylock

fdbf533

inline errs_iter and reqs_iter with parentheses

5aff2e6

thomas-zahner approved these changes Nov 17, 2025

lychee-lib/src/types/request_error.rs (resolved)

thomas-zahner merged commit 694ffc2 into lycheeverse:master Nov 17, 2025
7 checks passed

mre mentioned this pull request Nov 17, 2025

chore: release v0.22.0 #1894

Merged

mre mentioned this pull request Dec 5, 2025

chore: release v0.23.0 #1946

Closed

jstirnaman mentioned this pull request Dec 10, 2025

chore(ci): update link-checker to v1.2.5 influxdata/docs-v2#6602

Merged


katrinafyi mentioned this pull request Jan 21, 2026

root-relative links ignored with default config #1265

Closed

thomas-zahner merged 66 commits into lycheeverse:master from rina-forks:propagate-early-errors
