Skip to content

Conversation

@katrinafyi
Copy link
Member

now, when providing an input with glob characters but an invalid glob pattern, this will cause a "cannot parse inputs" error rather than erroring out later during link checking. this makes use of the existing ErrorKind::InvalidGlobPattern error case.

for example, a glob with unbalanced brackets is invalid and now looks like this:

$ cargo run -- 'noasdjfi/['
Error: Cannot parse inputs from arguments

Caused by:
    0: UNIX glob pattern is invalid
    1: Pattern syntax error near position 9: invalid range pattern

i've also added some TODO notes which would make the logic neater. i didn't do those refactors in this PR because i am mindful of another currently-open (and much bigger) PR which touches these lines (https://www.github.com/lycheeverse/lychee/pull/1837)

i wonder if the reason FsGlob stores strings rather than a Pattern is for flexibility to switch to different glob packages.

this PR was prompted by findings in #1864 (comment)

now, when providing an input with glob characters but an invalid glob
pattern, this will cause a "cannot parse inputs" error rather than
erroring out later during link checking. this makes use of the
existing ErrorKind::InvalidGlobPattern error case.

for example, a glob with unbalanced brackets is invalid and now looks
like this:
```console
$ cargo run -- 'noasdjfi/['
Error: Cannot parse inputs from arguments

Caused by:
    0: UNIX glob pattern is invalid
    1: Pattern syntax error near position 9: invalid range pattern
```

i've also added some TODO notes which would make the logic neater.
i didn't do those refactors in this PR because i am mindful of another
currently-open (and much bigger) PR which touches these lines
(https://www.github.com/lycheeverse/lychee/pull/1837)

i wonder if the reason FsGlob stores strings rather than a Pattern is
for flexibility to switch to different glob packages.

this PR was prompted by findings in
lycheeverse#1864 (comment)
@thomas-zahner
Copy link
Member

i wonder if the reason FsGlob stores strings rather than a Pattern is for flexibility to switch to different glob packages.

Good question, I wouldn't think so but then I'm not the author. @mre might know the answer to that.
Have you tried to change the type to Pattern? I would much prefer that if it is possible without too much effort.

Otherwise I have no objections to this PR.

@thomas-zahner thomas-zahner force-pushed the master branch 2 times, most recently from fcdf77c to e0912ab Compare October 21, 2025 12:53
@katrinafyi
Copy link
Member Author

Have you tried to change the type to Pattern?

If you're asking me, I haven't tried. I think the change would be pretty easy. But it's the kind of thing which is so obvious that if it hasn't been done already, then there's probably a logical reason why it hasn't. That's my reasoning at least.

If you're asking @ mre, then we have to wait for him :)

@mre
Copy link
Member

mre commented Nov 7, 2025

Sorry for the late reply.

If you're asking me, I haven't tried. I think the change would be pretty easy. But it's the kind of thing which is so obvious that if it hasn't been done already, then there's probably a logical reason why it hasn't. That's my reasoning at least.

That's a sensible approach in general. In this specific case however, I have to disappoint you. It might just have been an oversight on my side. 😆 Well, perhaps one issue might be serializing/deserializing those values when writing/reading the config. The glob crate does not support this. We'd have to find a workaround.

@mre
Copy link
Member

mre commented Nov 7, 2025

i've also added some TODO notes which would make the logic neater. i didn't do those refactors in this PR because i am mindful of another currently-open (and much bigger) PR which touches these lines (github.com/lycheeverse/lychee/pull/1837)

Don't worry about this. My PR is stale anyway, so I'd have to rebase it. So feel free to make any changes you like.

unfortunately, the `glob_with` function (which walks the dirs and
performs the globbing) does _not_ support taking a Pattern as input, so
we have to convert it back to a string at that point. this is a bit
weird, but i think it's still a bit nicer.
ths parsing logic is also changed to use early return so there is less
nesting
Conflicts:
lychee-bin/tests/cli.rs
lychee-lib/src/collector.rs
lychee-lib/src/types/input/input.rs
@katrinafyi
Copy link
Member Author

Thanks for your comments! I've done the two extra TODOs within this PR. The InputSource parsing is now in InputSource::new and the FsGlob field is a Pattern. As @mre foreshadowed, we have to do some custom serialisation/deserialisation, but it's not too much trouble.

Copy link
Member

@thomas-zahner thomas-zahner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@katrinafyi Thanks, I really like it. This makes the code quite a bit cleaner and usability is also improved. I have just a few small remarks.

fn test_pattern_serialization_is_original_pattern() {
let pat = "asd[f]*";
assert_eq!(
to_json(&InputSource::FsGlob {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then unwrap both results here when we expect two Oks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's done ~

katrinafyi and others added 2 commits November 10, 2025 19:48
@thomas-zahner thomas-zahner merged commit 100caf6 into lycheeverse:master Nov 10, 2025
7 checks passed
@mre mre mentioned this pull request Dec 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants