Fix local directory checking #1771

Merged
thomas-zahner merged 4 commits into lycheeverse:master from thomas-zahner:fix-local-directory-checking
Jul 26, 2025

thomas-zahner (Member) commented Jul 18, 2025

81f2605 introduced a new bug. The bug is now described in test_local_directories. I've reverted the few lines in file.rs that introduced the bug.

So previously we had:

[ERROR] file:///home/thomas/Projects/lychee/fixtures/fragments/empty_dir | Cannot find file

now we're back at:

[200] file:///home/thomas/Projects/lychee/fixtures/fragments/empty_dir

This bug was introduced in #1756. The tests from that PR make sense and can be left unchanged apart from the one difference mentioned above (which accounts for the 30/31 OK and 12/11 Errors difference).

@ocavue @mre @katrinafyi Can you explain the reasoning behind removing the else if? If there really is a reason behind skipping that check we could introduce a new feature flag for that but I don't understand the idea yet.

return self.check_file(&file_path.unwrap(), uri).await;
}
// If path is a directory, and we cannot find an index file inside it,
// and we don't have a fragment, just return success.
Member Author

Original comment:

// If path is a directory, and we cannot find an index file inside it,
// and we don't have a fragment, just return success. This is for
// backward compatibility.

I do not agree with the original statement "This is for backward compatibility." lychee should not mark directories that exist with ErrorKind::InvalidFilePath.
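
The rule stated in that code comment can be sketched as a small decision function. This is only an illustration under stated assumptions, not lychee's actual code: `Outcome` and `check_directory` are hypothetical names, and the last branch merely marks the case the comment does not describe.

```rust
/// Hypothetical outcome type for this sketch; not lychee's real status type.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// An index file exists inside the directory and is checked like any file.
    CheckIndexFile,
    /// The directory exists and there is nothing further to verify.
    Success,
    /// A case the quoted comment does not cover (a fragment on a directory
    /// without an index file); assumed to be handled elsewhere.
    Unspecified,
}

/// Decide what to do for a link whose path is an existing directory.
fn check_directory(has_index_file: bool, has_fragment: bool) -> Outcome {
    if has_index_file {
        // The index file takes over, with or without a fragment.
        Outcome::CheckIndexFile
    } else if !has_fragment {
        // No index file and no fragment: the directory itself is the target.
        Outcome::Success
    } else {
        Outcome::Unspecified
    }
}
```

With this sketch, a plain link to an empty directory such as fixtures/fragments/empty_dir corresponds to `check_directory(false, false)` and yields `Outcome::Success`, matching the [200] result shown above.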

katrinafyi (Member) commented Jul 18, 2025 via email

ocavue (Contributor) commented Jul 19, 2025

Directories without index.html should be treated as 404

I agree.

I think the PR that introduced index.html resolution, #1752, recognised this inconsistency, which is why they wrote the comment "This is for backwards compatibility".

Yes. In my original PR #1752, I wrote the comment "This is for backwards compatibility" because I wanted to change the behavior in another PR. Please see the thread below for the context.

https://github.com/lycheeverse/lychee/pull/1752/files#r2174138374

Nice! I'll keep this PR as it is to limit its scope. I'll open another PR to make empty_dir#fragment an error.

katrinafyi (Member) commented

I'm sorry, I think my first comment was too harsh.

You're right that it is confusing to see an error message "cannot find file" next to a path that exists. This should be improved, maybe by differentiating URLs and their resolved paths on disk. Also, the inconsistencies which I pointed out are not really the fault of this PR. Instead, it's just an interaction with the earlier #1752 which changed the behaviour of directory links.

I still think the different use cases need to be clarified and supported, but I'm sorry for the general tone of the first message. Thanks for tagging me and asking for my input.

mre (Member) left a comment

Not sure if I'm missing anything, but to me this looks ready to get merged.

thomas-zahner (Member, Author) commented Jul 26, 2025

Thanks for the comments. We now seem to agree that this is indeed a bug, which is why I'm merging this now. Currently, lychee on master marks any link to a directory as broken, no matter whether the directory is empty or contains something, which is definitely not intended behaviour (see the test in the PR).

lychee's default behaviour is, and should be, that it checks files without "special" assumptions about GitHub and index files. We should try to keep things simple and verify linked URIs without too many assumptions and opinions. I'm even beginning to question whether the index.{ext} fallback should be enabled by default at all.

At the same time I definitely understand that you want to cover your use cases. But we should do this by configuration/feature flags, not by changing the default behaviour which is already well tested and accepted by users.

Directories without index.html should be treated as 404

I definitely do not agree with this statement, as explained above. If a link points to a directory, lychee tells you whether the linked directory exists or not, at least by default. But we could introduce a new feature flag to make lychee reject any directory that does not contain an index.html file.

In fact, I realised that this is already possible with --remap:

--remap '/([^./]+)$ /$1/index.html'

This rewrites directories to an inner index.html file, which might already be sufficient for your use case. Note that this regex assumes that a directory is any path whose last segment contains no dots, which is not quite true but most probably sufficient. We could still introduce a new flag which is either an alias for this remap flag or which "properly" checks for the index files in Rust code.
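
To make the effect of that pattern concrete, here is a minimal hand-written sketch of the same rewrite using only the Rust standard library. It is not how lychee applies --remap internally; `remap_to_index` is a hypothetical helper, written on the assumption that the pattern `/([^./]+)$` is applied to the whole URI string and replaced with `/$1/index.html`.

```rust
/// Hand-written approximation of `--remap '/([^./]+)$ /$1/index.html'`.
/// Hypothetical helper for illustration; lychee itself uses a regex here.
fn remap_to_index(uri: &str) -> String {
    match uri.rsplit_once('/') {
        // The pattern only matches when the last path segment is non-empty
        // and contains no dot, i.e. when it "looks like" a directory.
        Some((_, last)) if !last.is_empty() && !last.contains('.') => {
            format!("{uri}/index.html")
        }
        // Anything else (a trailing slash, a dotted name, no slash at all)
        // is left untouched.
        _ => uri.to_string(),
    }
}
```

Under this sketch, file:///home/user/docs becomes file:///home/user/docs/index.html, while file:///home/user/docs/page.html stays as it is. The caveat mentioned above cuts both ways: a directory named v1.2 is not rewritten, and an extension-less file such as LICENSE would be.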

thomas-zahner merged commit ea415c8 into lycheeverse:master on Jul 26, 2025
6 checks passed