Fix local directory checking#1771
Conversation
| return self.check_file(&file_path.unwrap(), uri).await; | ||
| } | ||
| // If path is a directory, and we cannot find an index file inside it, | ||
| // and we don't have a fragment, just return success. |
There was a problem hiding this comment.
Original comment:
// If path is a directory, and we cannot find an index file inside it,
// and we don't have a fragment, just return success. This is for
// backward compatibility.
I do not agree with the original statement This is for backward compatibility. lychee should not mark directories that exist with ErrorKind::InvalidFilePath.
|
I thought that I had made myself clear in #1765 and that you understood my reasoning, but I can re-state it more directly.
Directories without index.html should be treated as 404 to check files that will be uploaded to GitHub pages / other static hosts. I believe this is a use case which Lychee should support (but correct me if I'm wrong). In these static site hosts, navigating to a directory with no index.html will return 404.
Separate from any miscommunications, this PR introduces inconsistencies which I think are problematic:
* The presence/absence of a fragment now determines whether a directory link can get resolved to index.html or not. Why should a URL fragment affect this?
* This puts us in the bizarre situation where, in order to guarantee using index.html, I'd need to put a fragment on all of my links. However, the empty fragment is excluded from the logic, so this is actually impossible.
* For links without fragments, directory link resolution now silently tries two paths (index.html and the directory itself). This is surprising. I can link to a directory with index.html and Lychee will detect that index.html. Then, I can remove that index.html file and the link does not break. Why? URLs should resolve to a single path.
* I think the PR that introduced index.html resolution, #1752, recognised this inconsistency which is why they wrote the comment "This is for backwards compatibility".
All that is to say, it needs consistency. There are two distinct use-cases here and they need to be handled differently. If you want to look at files as they literally appear on disk (e.g., you have a markdown notes app), then Lychee should accept directories without index.html, and it should never try to access index.html. For users (like me) who want to check files for Github pages, Lychee should always check index.html and never check the directory simply exists.
In my opinion, both use-cases should be supported and they should be kept separate, and so a flag is warranted
|
I agree.
Yes. In my original PR #1752, I wrote down the comment "This is for backwards compatibility" because I want to change the behavior in another PR. Please see the thread below for the context.
|
|
I'm sorry, I think my first comment was too harsh. You're right that it is confusing to see an error message "cannot find file" next to a path that exists. This should be improved, maybe by differentiating URLs and their resolved paths on disk. Also, the inconsistencies which I pointed out are not really the fault of this PR. Instead, it's just an interaction with the earlier #1752 which changed the behaviour of directory links. I still think the different use-cases need to be clarified and supported, but I'm sorry the general tone in the first message. Thanks for tagging me and asking for my input. |
mre
left a comment
There was a problem hiding this comment.
Not sure if I'm missing anything, but to me this looks ready to get merged.
|
Thanks for the comments. We now seem to agree that this is indeed a bug which is why I'm merging this now. Currently, lychee on master marks any links to directories (no matter whether directory is empty or contains something) as broken which is definitely not intended behaviour. (see the test in the PR) lychee's default behaviour is and should be that it checks files without "special" assumptions about GitHub and index files. We should try to keep things simple and verify linked URIs without too much assumptions and opinions. I'm even beginning to question whether the At the same time I definitely understand that you want to cover your use cases. But we should do this by configuration/feature flags, not by changing the default behaviour which is already well tested and accepted by users.
I definitely do not agree with this statement as explained above. If a link points to a directory lychee tells you whether this linked directory exists or not, at least by default. But we could introduce a new feature flag to make lychee reject any directories that do not contain an index.html file. In fact, I realised that this is already possible with This rewrites directories to a inner index.html file. This might already be sufficient for your use case. (note that this regex has the assumption that directories are files which don't contain dots, which is not quite true but most probably sufficient) We could now still introduce a new flag which either is an alias to this remap flag or which "properly" checks for the index files in Rust code. |
81f2605 introduced a new bug. The bug is now described in
test_local_directories. I've reverted the few lines in file.rs that introduced the bug.So previously we had:
[ERROR] file:///home/thomas/Projects/lychee/fixtures/fragments/empty_dir | Cannot find filenow we're back at:
[200] file:///home/thomas/Projects/lychee/fixtures/fragments/empty_dirThis bug was introduced in #1756. The tests form this PR make sense and can be left unchanged apart from this one difference mentioned above. (which results in the 30/31 OK and 12/11 Errors difference)
@ocavue @mre @katrinafyi Can you explain the reasoning behind removing the
else if? If there really is a reason behind skipping that check we could introduce a new feature flag for that but I don't understand the idea yet.