@katrinafyi
Member

@katrinafyi katrinafyi commented Oct 23, 2025

previously, there was a lot of duplicated code between
InputResolver::resolve_input and Input::get_sources, where
resolve_input is used for main link checking and get_sources is used for
dumping inputs.

after this PR, Input::get_sources is implemented in terms of
InputResolver::resolve. however, in doing so, we have to loosen the
lifetime constraints on the &Input argument of resolve, so there's a
lot of changes in resolve_input to effectively transpose the logic.
that is, previously it matched on the input source within a single
try_stream!, and now it matches on the input source first then chooses
between multiple try_streams. however, because try_stream constructs a
new anonymous type every use, the match arms no longer share a single type,
so we then need to change the return type to Pin<Box<dyn Stream<...> + Send>>.
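
As a rough illustration of the transposition, here is a minimal std-only sketch. It uses `Iterator` and `std::iter::from_fn` as stand-ins for `Stream` and `try_stream!`, and the `Source` enum and both function names are hypothetical, not lychee's actual types.

```rust
// Hypothetical stand-in for an input source; variant names are illustrative.
enum Source {
    Url(String),
    Paths(Vec<String>),
}

// "Before": a single generator-like closure that matches inside its body.
// There is only one anonymous type, so `impl Iterator` works, but the
// returned iterator borrows `source` for the whole of 'a.
fn resolve_before<'a>(source: &'a Source) -> impl Iterator<Item = String> + 'a {
    let mut index = 0;
    std::iter::from_fn(move || {
        let item = match source {
            Source::Url(url) if index == 0 => Some(url.clone()),
            Source::Url(_) => None,
            Source::Paths(paths) => paths.get(index).cloned(),
        };
        index += 1;
        item
    })
}

// "After": match first, then build one iterator per arm. Each arm has a
// different concrete type, so the arms are unified behind a boxed trait
// object. Cloning the data means the result no longer borrows `source`.
fn resolve_after(source: &Source) -> Box<dyn Iterator<Item = String> + Send> {
    match source {
        Source::Url(url) => Box::new(std::iter::once(url.clone())),
        Source::Paths(paths) => Box::new(paths.clone().into_iter()),
    }
}
```

In this sketch the boxed result can outlive the `Source` it was built from, which the borrowing version cannot do.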

i don't know why this is needed but it seems to work. all in all, it's a
surprisingly difficult refactor to get rid of some unpleasant code
duplication.

idrc if this gets merged or not, but the transposing of resolve_input
does open the possibility of returning a Result<Box<dyn Stream ...>>
to represent input source failures, as desired in
https://www.github.com/lycheeverse/lychee/pull/1864#issuecomment-3389278177
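
A possible shape for that idea, sketched with std only. `Iterator` stands in for `Stream` again, and `SourceError`, `ItemStream` and this `resolve` are hypothetical names for illustration, not the signature proposed in the linked comment.

```rust
// Hypothetical error type for illustration only.
#[derive(Debug, PartialEq)]
struct SourceError(String);

// Boxed iterator of per-item results (stand-in for a boxed stream).
type ItemStream = Box<dyn Iterator<Item = Result<String, SourceError>> + Send>;

// Outer Result: the input source itself could not be used.
// Inner Result: an individual item failed while the source was readable.
fn resolve(source: &str) -> Result<ItemStream, SourceError> {
    if source.is_empty() {
        return Err(SourceError("empty input source".to_string()));
    }
    let items: Vec<Result<String, SourceError>> = source
        .split(',')
        .map(|entry| {
            let entry = entry.trim();
            if entry.is_empty() {
                Err(SourceError("blank entry".to_string()))
            } else {
                Ok(entry.to_string())
            }
        })
        .collect();
    Ok(Box::new(items.into_iter()))
}
```

The caller can then tell "this source is unusable" apart from "one entry in this source failed" without inspecting the first item.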

TODO

  • double check comments are still accurate?

@thomas-zahner
Member

Thank you very much for this amazing PR 🚀 It was something that bothered me as well but I didn't end up digging so deep, so really appreciate it. I would really like to merge. What does your TODO refer to?

@katrinafyi
Member Author

katrinafyi commented Nov 9, 2025

Thanks for the kind words! The todo comment was about checking the comments and moving them to where the code is implemented. I've done that now, so it should be ready to review :)

Edit: I can't see the test failure locally - flaky?

Conflicts:
lychee-lib/src/types/input/input.rs
lychee-lib/src/types/input/resolver.rs
@mre
Member

mre commented Nov 11, 2025

Nice work!

> i don't know why this is needed but it seems to work.

Before, resolve_input could borrow input: &'a Input and return impl Stream + 'a because the entire stream was constructed inside a single try_stream! macro call. The lifetime 'a was tied to the input reference throughout.
In your new version, you need to match on input.source first, then construct different streams based on the variant. But each branch wants to move or clone data from input.source, and you can't borrow input for 'a while also moving/cloning from it in the same expression.

The change from input: &'a Input to input: &'_ Input means each match arm constructs a stream with potentially different lifetime requirements. Now, each try_stream! creates a unique anonymous type. You can't return impl Stream when each arm produces a different concrete type.

However, by cloning the data (url.clone(), path.clone(), etc.), each stream becomes independent of the original input borrow. The streams live as long as needed without holding onto &'a Input. Finally, Pin<Box<dyn Stream>> erases the concrete types, allowing all arms to return the same type. It basically means "I don't care about the concrete type, just give me something that implements Stream and lives long enough."

Pin is necessary because Stream is a stateful, multi-value async iterator, and Stream::poll_next takes Pin<&mut Self>. The state machine behind the stream must stay at the same memory location while it is polled repeatedly, so that its iteration state remains valid. Pin prevents moves between polling calls.
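
To make the pinning point concrete with only the standard library, here is a small sketch that uses `Future` instead of `Stream`, since `Future::poll` takes `Pin<&mut Self>` in the same way. `NoopWake`, `pick` and `poll_once` are hypothetical helpers, not part of lychee.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing; enough to poll futures that never wait.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Each `async` block is its own anonymous type, so the two branches are
// unified as a pinned, boxed trait object. Note the order: Pin wraps the Box.
fn pick(remote: bool) -> Pin<Box<dyn Future<Output = String> + Send>> {
    if remote {
        Box::pin(async { "remote".to_string() })
    } else {
        Box::pin(async { "local".to_string() })
    }
}

// Polling goes through `Pin<&mut _>`, obtained here with `as_mut()`.
fn poll_once(mut fut: Pin<Box<dyn Future<Output = String> + Send>>) -> Poll<String> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.as_mut().poll(&mut cx)
}
```

Because the value lives on the heap behind the `Box`, moving the `Pin<Box<...>>` handle around does not move the state machine itself.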

As a side note, if we wanted to avoid the dynamic dispatch, the typical alternative is to write out all variants as an enum:

enum ResolveStream<'a> {
    RemoteUrl(...),
    FsGlob(...),
    FsPath(...),
    // ...
}

But honestly, Pin<Box<dyn Stream>> is probably the most pragmatic choice here.
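
For comparison, a minimal std-only sketch of the enum approach, using `Iterator` in place of `Stream`. The `ResolveIter` enum, its two variants and this `resolve` are hypothetical. With `try_stream!` the per-arm types cannot be named directly, so a real version would likely need generic parameters or hand-written stream types.

```rust
// One variant per source kind, each wrapping a concrete, nameable iterator.
enum ResolveIter {
    RemoteUrl(std::iter::Once<String>),
    FsPath(std::vec::IntoIter<String>),
}

// Static dispatch: forward `next` to whichever variant is active.
impl Iterator for ResolveIter {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        match self {
            ResolveIter::RemoteUrl(inner) => inner.next(),
            ResolveIter::FsPath(inner) => inner.next(),
        }
    }
}

// Illustrative only: pretend a non-URL source expands to two files.
fn resolve(source: &str) -> ResolveIter {
    if source.starts_with("http") {
        ResolveIter::RemoteUrl(std::iter::once(source.to_string()))
    } else {
        let files = vec![format!("{source}/a.md"), format!("{source}/b.md")];
        ResolveIter::FsPath(files.into_iter())
    }
}
```

This avoids the allocation and the virtual call, at the cost of one forwarding arm per variant.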

Member

@mre mre left a comment

Just two minor comments. Apart from that looks good to go!

/// Note: Individual glob match failures are logged to stderr but don't terminate the stream.
/// However, directory traversal errors will stop processing and return the error immediately.
/// Returns an error if [`InputResolver::resolve`] returns an error.
pub fn get_sources(
Member

You added #[must_use] to resolve() but not to get_sources(). Since get_sources() also returns a Stream, you might want consistency:

Suggested change
- pub fn get_sources(
+ #[must_use]
+ pub fn get_sources(

Member Author

I just tried that but clippy didn't like it aha. I guess must_use is already implicitly propagated? Honestly, I only add must_use as clippy directs and am not an expert.

Member

You're probably right. What was the clippy message?

Member Author

It was this message here https://github.com/lycheeverse/lychee/actions/runs/19263234742/job/55072805923?pr=1880. I could've kept it and added a comment message, but I didn't think of one.

error: this function has a `#[must_use]` attribute with no message, but returns a type already marked as `#[must_use]`
   --> lychee-lib/src/types/input/input.rs:188:5
    |
188 | /     pub fn get_sources(
189 | |         self,
190 | |         file_extensions: FileExtensions,
191 | |         skip_hidden: bool,
192 | |         skip_ignored: bool,
193 | |         excluded_paths: &PathExcludes,
194 | |     ) -> impl Stream<Item = Result<String>> {
    | |___________________________________________^
    |
    = help: either add some descriptive message or remove the attribute
    = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.91.0/index.html#double_must_use
    = note: `-D clippy::double-must-use` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(clippy::double_must_use)]`
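
For reference, the first option in the help text ("add some descriptive message") would look roughly like the sketch below. It returns `impl Iterator` rather than `impl Stream` to stay std-only, and both the simplified signature and the message wording are assumptions, not what the PR uses.

```rust
// `Iterator` is itself marked `#[must_use]`, much like `Stream`, so a bare
// `#[must_use]` would add nothing. A message gives the attribute a purpose.
#[must_use = "the returned iterator is lazy and yields nothing until consumed"]
pub fn get_sources(paths: Vec<String>) -> impl Iterator<Item = String> {
    // Hypothetical body: skip empty entries, pass everything else through.
    paths.into_iter().filter(|p| !p.is_empty())
}
```

Removing the attribute, as was done here, is the other option the lint suggests.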

#[must_use]
pub fn resolve<'a>(
- input: &'a Input,
+ input: &'_ Input,
Member

The &'_ is fine but I think &Input should work, too, and would be more conventional since you're explicitly not tying it to 'a.

Member Author

Huh, idk why but i had the idea that that wouldn't work. Thanks!
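
A small std-only sketch of why both spellings behave the same. The `Input` struct, the `excluded` parameter and both function names are hypothetical, and a boxed `Iterator` stands in for the stream. Only `excluded` is tied to `'a`; the `input` reference gets its own fresh lifetime whether it is written `&'_ Input` or `&Input`.

```rust
struct Input {
    source: String,
}

// Explicit anonymous lifetime on `input`.
fn resolve_anon<'a>(
    input: &'_ Input,
    excluded: &'a [String],
) -> Box<dyn Iterator<Item = String> + 'a> {
    // Clone so the result does not borrow from `input`.
    let source = input.source.clone();
    Box::new(std::iter::once(source).filter(move |s| !excluded.contains(s)))
}

// Same signature with the lifetime elided; it accepts the same callers.
fn resolve_elided<'a>(
    input: &Input,
    excluded: &'a [String],
) -> Box<dyn Iterator<Item = String> + 'a> {
    resolve_anon(input, excluded)
}
```

In both versions the returned value may outlive the `Input` it was created from, as long as `excluded` is still alive.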

@mre mre merged commit 10c87a2 into lycheeverse:master Nov 12, 2025
7 checks passed
@mre
Member

mre commented Nov 12, 2025

Thanks for making lychee's code a little cleaner.
