Merged
7 changes: 3 additions & 4 deletions fixtures/fragments/file1.md
@@ -22,8 +22,7 @@ This is a test file for the fragment loader.

## HTML Fragments

Explicit fragment links are currently not supported.
Therefore we put the test into a code block for now to prevent false positives.
Explicit fragment links are also supported.

<a id="explicit-fragment"></a>

@@ -83,8 +82,8 @@ A link to the non-existing fragment: [try](https://github.com/lycheeverse/lychee
- Bad: [With trailing slash](sub_dir_non_existing_1/)
- Bad: [Without trailing slash](sub_dir_non_existing_2)
- Link to a empty directory
- Bad: [With trailing slash](empty_dir/)
- Bad: [Without trailing slash](empty_dir)
- Good: [With trailing slash](empty_dir/)
- Good: [Without trailing slash](empty_dir)
- Link to a fragment in a non-existing sub directory
- Bad: [With trailing slash](empty_dir/#non-existing-fragment-3)
- Bad: [Without trailing slash](empty_dir#non-existing-fragment-4)
132 changes: 82 additions & 50 deletions lychee-bin/tests/cli.rs
@@ -1877,60 +1877,92 @@ mod cli {
let mut cmd = main_command();
let input = fixtures_path().join("fragments");

cmd.arg("--verbose")
let mut result = cmd
.arg("--include-fragments")
.arg("--verbose")
.arg(input)
.assert()
.failure()
.stderr(contains("fixtures/fragments/file1.md#fragment-1"))
.stderr(contains("fixtures/fragments/file1.md#fragment-2"))
.stderr(contains("fixtures/fragments/file1.md#code-heading"))
.stderr(contains("fixtures/fragments/file2.md#custom-id"))
.stderr(contains("fixtures/fragments/file1.md#missing-fragment"))
.stderr(contains("fixtures/fragments/file2.md#fragment-1"))
.stderr(contains("fixtures/fragments/file1.md#kebab-case-fragment"))
.stderr(contains(
"fixtures/fragments/file1.md#lets-wear-a-hat-%C3%AAtre",
))
.stderr(contains("fixtures/fragments/file2.md#missing-fragment"))
.stderr(contains("fixtures/fragments/empty_file#fragment"))
.stderr(contains("fixtures/fragments/file.html#a-word"))
.stderr(contains("fixtures/fragments/file.html#in-the-beginning"))
.stderr(contains("fixtures/fragments/file.html#in-the-end"))
.stderr(contains(
"fixtures/fragments/file1.md#kebab-case-fragment-1",
))
.stderr(contains("fixtures/fragments/file.html#top"))
.stderr(contains("fixtures/fragments/file2.md#top"))
.stderr(contains(
"https://github.com/lycheeverse/lychee#table-of-contents",
))
.stderr(contains(
"https://github.com/lycheeverse/lychee#non-existent-anchor",
))
.stderr(contains("fixtures/fragments/sub_dir#non-existing-fragment-1"))
.stderr(contains("fixtures/fragments/sub_dir#non-existing-fragment-2"))
.stderr(contains("fixtures/fragments/sub_dir_non_existing_1"))
.stderr(contains("fixtures/fragments/sub_dir_non_existing_2"))
.stderr(contains("fixtures/fragments/empty_dir"))
.stderr(contains("fixtures/fragments/empty_dir#non-existing-fragment-3"))
.stderr(contains("fixtures/fragments/empty_dir#non-existing-fragment-4"))
.stderr(contains("fixtures/fragments/zero.bin"))
.stderr(contains("fixtures/fragments/zero.bin#"))
.stderr(contains(
"https://raw.githubusercontent.com/lycheeverse/lychee/master/fixtures/fragments/zero.bin",
))
.stderr(contains(
"https://raw.githubusercontent.com/lycheeverse/lychee/master/fixtures/fragments/zero.bin#",
))
.stderr(contains("fixtures/fragments/zero.bin#fragment"))
.stderr(contains(
"https://raw.githubusercontent.com/lycheeverse/lychee/master/fixtures/fragments/zero.bin#fragment",
))
.stdout(contains("42 Total"))
.stdout(contains("31 OK"))
.failure();

let expected_successes = vec![
"fixtures/fragments/empty_dir",
"fixtures/fragments/empty_file#fragment", // XXX: is this a bug? a fragment in an empty file is being treated as valid
Member

You're right to question this. I think we should treat fragment links to empty files as invalid.

Our job is to identify links that won't provide a meaningful user experience. A fragment reference in an empty file is functionally broken: it promises to take the user to specific content that doesn't exist. While the file itself may be reachable, the fragment reference is semantically meaningless, making this a legitimate case to flag as invalid.

I would welcome a pull request that changes this.

Member Author

I do agree in principle, but I couldn't think of a nice way to implement it. Fragments in empty files are treated as existing because empty files are detected as plaintext. But I think it would be too heavy-handed to reject all fragments in plaintext files, especially since plaintext is the fallback file type for unknown files.

I think the info message is okay for now, and maybe someone more experienced can look at it :) Maybe the first step would be to differentiate plaintext files and unknown file types, so you could handle them differently.

Member

Oh, that's a really good idea. I guess in the long run, we should consider deeper file inspection anyway. This gives us more flexibility and takes away some of the guesswork.
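The idea from this thread, separating plaintext from unknown file types so each can be handled differently, could look roughly like the sketch below. It is only an illustration under stated assumptions: `SketchFileType`, `FragmentPolicy`, and `fragment_policy` are hypothetical names, not part of lychee's API, and rejecting fragments in known-plaintext and empty files is one possible policy rather than a decision made in this PR.

```rust
/// Hypothetical file classification (NOT lychee's actual `FileType`).
/// The only difference from the current behaviour discussed above is that
/// "unknown" is no longer folded into "plaintext".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchFileType {
    Markdown,
    Html,
    Plaintext,
    Unknown,
}

/// Hypothetical outcome of deciding how to treat a fragment link.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FragmentPolicy {
    /// Parse the file and look for the fragment.
    Extract,
    /// The file cannot contain anchors, so the fragment is reported as missing.
    Reject,
    /// The format is not understood; accept the link without checking.
    Skip,
}

/// Assumed policy: an empty file can never contain the fragment, known
/// plaintext has no anchors, and unknown formats keep today's lenient
/// "skip" behaviour.
fn fragment_policy(file_type: SketchFileType, is_empty: bool) -> FragmentPolicy {
    if is_empty {
        return FragmentPolicy::Reject;
    }
    match file_type {
        SketchFileType::Markdown | SketchFileType::Html => FragmentPolicy::Extract,
        SketchFileType::Plaintext => FragmentPolicy::Reject,
        SketchFileType::Unknown => FragmentPolicy::Skip,
    }
}
```

With this split, a link such as `empty_file#fragment` would be rejected, while a fragment on an unrecognised binary file would still be skipped rather than flagged.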

"fixtures/fragments/file1.md#code-heading",
"fixtures/fragments/file1.md#explicit-fragment",
"fixtures/fragments/file1.md#f%C3%BCnf-s%C3%9C%C3%9Fe-%C3%84pfel",
"fixtures/fragments/file1.md#f%C3%BCnf-s%C3%BC%C3%9Fe-%C3%A4pfel",
"fixtures/fragments/file1.md#fragment-1",
"fixtures/fragments/file1.md#fragment-2",
"fixtures/fragments/file1.md#IGNORE-CASING",
"fixtures/fragments/file1.md#kebab-case-fragment",
"fixtures/fragments/file1.md#kebab-case-fragment-1",
"fixtures/fragments/file1.md#lets-wear-a-hat-%C3%AAtre",
"fixtures/fragments/file2.md#",
"fixtures/fragments/file2.md#custom-id",
"fixtures/fragments/file2.md#fragment-1",
"fixtures/fragments/file2.md#top",
"fixtures/fragments/file.html#",
"fixtures/fragments/file.html#a-word",
"fixtures/fragments/file.html#in-the-beginning",
"fixtures/fragments/file.html#tangent%3A-kustomize",
"fixtures/fragments/file.html#top",
"fixtures/fragments/file.html#Upper-%C3%84%C3%96%C3%B6",
"fixtures/fragments/sub_dir",
"fixtures/fragments/sub_dir#a-link-inside-index-html-inside-sub-dir",
"fixtures/fragments/zero.bin",
"fixtures/fragments/zero.bin#",
"fixtures/fragments/zero.bin#fragment",
"https://github.com/lycheeverse/lychee#table-of-contents",
"https://raw.githubusercontent.com/lycheeverse/lychee/master/fixtures/fragments/zero.bin",
"https://raw.githubusercontent.com/lycheeverse/lychee/master/fixtures/fragments/zero.bin#",
// zero.bin#fragment succeeds because fragment checking is skipped for this URL
"https://raw.githubusercontent.com/lycheeverse/lychee/master/fixtures/fragments/zero.bin#fragment",
];

let expected_failures = vec![
"fixtures/fragments/sub_dir_non_existing_1",
"fixtures/fragments/sub_dir#non-existing-fragment-2",
"fixtures/fragments/empty_dir#non-existing-fragment-3",
"fixtures/fragments/file2.md#missing-fragment",
"fixtures/fragments/sub_dir#non-existing-fragment-1",
"fixtures/fragments/sub_dir_non_existing_2",
"fixtures/fragments/file1.md#missing-fragment",
"fixtures/fragments/empty_dir#non-existing-fragment-4",
"fixtures/fragments/file.html#in-the-end",
"fixtures/fragments/file.html#in-THE-begiNNing",
"https://github.com/lycheeverse/lychee#non-existent-anchor",
];

// the stdout/stderr format looks like this:
//
// [ERROR] https://github.com/lycheeverse/lychee#non-existent-anchor | Cannot find fragment
// [200] file:///home/rina/progs/lychee/fixtures/fragments/file.html#a-word
//
// errors are printed to both, but 200s are printed to stderr only.
// we take advantage of this to ensure that good URLs do not appear
// in stdout, and bad URLs do appear in stdout.
//
// also, a space or newline is appended to the URL to prevent
// incorrect matches where one URL is a prefix of another.
for good_url in &expected_successes {
// additionally checks that URL is within stderr to ensure that
// the URL is detected by lychee.
result = result
.stdout(contains(format!("{good_url} ")).not())
.stderr(contains(format!("{good_url}\n")));
}
for bad_url in &expected_failures {
result = result.stdout(contains(format!("{bad_url} ")));
}

let ok_num = expected_successes.len();
let err_num = expected_failures.len();
let total_num = ok_num + err_num;
result
.stdout(contains(format!("{ok_num} OK")))
// Failures because of missing fragments or failed binary body scan
.stdout(contains("11 Errors"));
.stdout(contains(format!("{err_num} Errors")))
.stdout(contains(format!("{total_num} Total")));
}
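
The test above appends a space or newline to each URL before matching so that one URL cannot match as a prefix of another. A minimal, standard-library-only sketch of why that matters is shown here; `contains_exact` and the sample paths in the usage note are hypothetical and are not part of the test suite.

```rust
/// Returns true if `url` occurs in `output` immediately followed by
/// `terminator`. A plain substring check would also accept any longer URL
/// that merely starts with `url`.
fn contains_exact(output: &str, url: &str, terminator: char) -> bool {
    let needle = format!("{url}{terminator}");
    output.contains(needle.as_str())
}
```

For example, given an output containing the line `[ERROR] a/file.md#frag-1 | Cannot find fragment`, a plain `contains("a/file.md#frag")` matches even though only `#frag-1` failed, whereas `contains_exact(output, "a/file.md#frag", ' ')` does not. The check only guards the end of the URL; text preceding the URL is not constrained.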

#[test]
6 changes: 5 additions & 1 deletion lychee-lib/src/utils/fragment_checker.rs
@@ -1,3 +1,4 @@
use log::info;
use std::{
borrow::Cow,
collections::{HashMap, HashSet, hash_map::Entry},
@@ -130,7 +131,10 @@ impl FragmentChecker {
let extractor = match file_type {
FileType::Markdown => extract_markdown_fragments,
FileType::Html => extract_html_fragments,
FileType::Plaintext => return Ok(true),
FileType::Plaintext => {
info!("Skipping fragment check for {url} within a plaintext file");
return Ok(true);
}
};

let fragment_candidates = FragmentBuilder::new(fragment, url, file_type)?;