fix: skip fragment check if website URL doesn't contain fragment #1733
mre merged 5 commits into lycheeverse:master from MichaIng:skip-fragment-check-by-url
Conversation
Signed-off-by: MichaIng <micha@dietpi.com>
Force-pushed from 5a12f2d to aa7b905
Pull Request Overview
This PR addresses issue #1709 by modifying the website fragment handling logic to skip the fragment check when the URL does not contain a fragment.
- The conditional check has been updated to include a verification of the URL fragment.
- The change ensures that HTML fragment checks are only performed when the URL contains a fragment.
Comments suppressed due to low confidence (1)
lychee-lib/src/checker/website.rs:106
- Consider adding an inline comment explaining that the additional check 'response.url().fragment().is_some()' ensures that HTML fragment processing is only performed when a fragment is present, improving code clarity for future maintainers.
if self.include_fragments
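For context, here is a minimal, self-contained sketch of the condition the review comment refers to, assuming the `url` crate's `Url::fragment()`; the helper name `needs_fragment_check` is made up for illustration and is not the actual code in `website.rs`:

```rust
use url::Url;

/// Hypothetical helper mirroring the new guard: only run the fragment check
/// when fragment checking is enabled *and* the URL actually carries a fragment.
fn needs_fragment_check(include_fragments: bool, url: &Url) -> bool {
    include_fragments && url.fragment().is_some()
}

fn main() {
    let with_fragment = Url::parse("https://example.com/page.html#section").unwrap();
    let without_fragment = Url::parse("https://example.com/page.html").unwrap();

    assert!(needs_fragment_check(true, &with_fragment));
    assert!(!needs_fragment_check(true, &without_fragment));
    // With `include_fragments` disabled, the check is always skipped.
    assert!(!needs_fragment_check(false, &with_fragment));
}
```

The effect is that the page body is only fetched and parsed for fragments when there is actually a fragment to look for.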
Seems to make sense. But yeah, maybe we should add tests before merging, will take a closer look in the coming days.

@mre What's up with that Copilot comment 😅 Did you enable this or was this enabled automatically by GitHub? I'd really prefer to disable this. It just reiterates the obvious with a good chance of hallucinating and lying about things.
Yeah, seems to be a new feature. I'll try to disable it.
Force-pushed from c3ce5ed to 6bf642a
The Copilot "review" was triggered by me. I was curious to see that I can add it even for PRs on remote repos. Potentially okay to get a summary for larger PRs. But so far I saw it doing only generic best practice recommendations, like here to add a comment to the code, in other cases to do sanity checks and such. Sorry for the spam 😉. Okay I added tests, but they do not behave as expected: [200] file:///home/runner/work/lychee/lychee/fixtures/fragments/zero.bin#fragment
[ERROR] https://raw.githubusercontent.com/MichaIng/lychee/skip-fragment-check-by-url/fixtures/fragments/zero.bin#fragment | Cannot find fragment
EDIT: Okay
Force-pushed from 6bf642a to 1bf8dd7
Signed-off-by: MichaIng <micha@dietpi.com>
Force-pushed from 1bf8dd7 to 04e2450
`is_some()` is true as well if the fragment is given but empty, i.e. a bare `#`. While it is an edge case, skip the fragment checker in the case of an empty fragment as well.

Signed-off-by: MichaIng <micha@dietpi.com>
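To illustrate the edge case from the commit message above — a minimal sketch, again assuming the `url` crate; the helper name is made up and this is not the exact code of the PR:

```rust
use url::Url;

/// Hypothetical variant of the guard that also treats an empty fragment
/// (a bare trailing `#`) as "nothing to check".
fn has_non_empty_fragment(url: &Url) -> bool {
    url.fragment().is_some_and(|fragment| !fragment.is_empty())
}

fn main() {
    let empty = Url::parse("https://example.com/page.html#").unwrap();
    let named = Url::parse("https://example.com/page.html#section").unwrap();

    assert_eq!(empty.fragment(), Some("")); // `is_some()` alone would be true here
    assert!(!has_non_empty_fragment(&empty));
    assert!(has_non_empty_fragment(&named));
}
```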
Force-pushed from fb85e63 to ab0da6f
Interesting: Pretty inconsistent behavior between file and website checker:
It is an edge case, and I do not want to pull all exceptions/early positive returns from the fragment checker into the parent file and website checkers. But generally, the fragment checker adds additional processing, and the parent checkers pull and pass the whole content to it. So it makes sense to avoid invoking the fragment checker in the first place if there is no fragment to check, or no way it can succeed. Strictly seen this includes ..., but better to do that in a new PR. This one is good enough; the last commit was probably already too much.

The test file description is not 100% accurate, since the remote binary URL does not fail with a read error (for such a small file) like the local one does, but the result is the same.

Next I'll look into
Signed-off-by: MichaIng <micha@dietpi.com>
Okay, I tested new builds and they work as expected. To preserve the succeeding tests, before the merge I adjusted the URLs to point to master, so they will only pass once this PR is merged. Ready to be merged from my end.
Signed-off-by: MichaIng <micha@dietpi.com>
Ah I see, no problem. I'm just really skeptical that it will be of any use to us, which is why I'd like to disable the feature. In my opinion there is something wrong with the PR, documentation or commit history if we need generative AI to summarise a PR. And code reviews should be done by humans. We don't want to end up like this 😅

Thanks for adding the tests. The PR looks good to me. Did I understand correctly that the CI tests now only fail because you've updated the URLs to point to master? So they will pass after merging?
I agree. It probably makes sense to create a new issue for that.
True, but I do see PRs too big/complex to understand, even when knowing the code base well. If AI fails to create a good summary, then it is a good hint that it needs a split 😅, and maybe AI does similar suggestions already, before a human reviewer needs to do a review iteration 😉.

Also true, it cannot be a replacement, but it can be a potential time saver as an initial reviewer, like CI can be. It always needs to be taken with care, but I think it is worth experimenting with it, seeing what it can do, and how it develops over the next years. In the examples you linked, someone always explicitly assigned Copilot to an issue to (attempt to) solve it. So it does not look like AI is driving employees insane, but rather employees are experimenting themselves 😄: dotnet/runtime#115649 (comment)

Back to topic:

Exactly. Two of the new tests point to URLs on master, so they will only pass after the merge. If there is a clever, non-complex way to test HTTP URLs not on a hard-coded repo but on the current one, give me a hint. Like if Rust tools or
@MichaIng True, it could be that Copilot gets better in the coming years, but as of now I think it's much more distracting than helpful. The dotnet/runtime PRs you linked are, I think, a perfect example of that. I just had a look at them. Updating the search with

So I really don't think generative AI fits into a developer platform like GitHub. If people want to test or use such tools, they can do so locally. If they end up with anything good or useful, they can still submit a PR. I can also recommend this little tale: https://endler.dev/2025/paolo/

You can probably use
Generally I think it fits pretty well; just the quality is not yet ready for production, or at least not beyond chore tasks. As said, to me it looks like Microsoft developers are explicitly experimenting with it. Their decision, and it makes sense, since Microsoft owns and invests in it. No one forces them to merge bad PRs, but letting it do its stuff and letting Copilot (meta) developers review and use the results to make it better totally makes sense, IMO.

I do understand the skepticism also expressed in the tale, and stories about vibe coding underline it pretty well. But I think it is a mistake to take it as "AI is bad in general" or "AI should never do any coding". As with every new technology, it must be used with brain and care, and requires learning. Especially, we must not let it do things we do not understand. I mean, there are already a lot of black boxes for most people in IT, like the fact that the two of us surely have no idea what exactly the Rust compiler is doing in detail to generate the binary from our code. But AI lifts this to another level, as even AI (meta) developers have only a limited idea of what their AI might come up with, due to its non-programmed nature.

But its potential is large, it will be used more and more, it will become more and more useful, and I am sure it will/does change how we (need to) work, whether we like the change or not. So learning to use it, experimenting with it and its limits, and checking out how far it can be useful for one's own projects, now or in the future, is no waste of time, at least as long as it's only the click of a button or typing a question into an AI chat. So far I would never think about letting it solve tasks in my project, where a change in function A may have unrecognized impact elsewhere. I hope no one is annoyed by us/me bloating this PR with AI discussion 😅.
I checked it out a bit, and it really is for mocking requests essentially, not for serving local content. So one would need to define manually that it shall serve the content of file A when a request of type X with path A is done, and that it shall return code 200, which then somewhat defeats the purpose. For this particular binary file it would be okay; I guess it could be set up to serve certain or random binary data on the test request, eliminating the need for the test file. But it would be great to have an actual small static webserver, so that all or most tests done on files in `fixtures` could also be run via HTTP.

Maybe easiest (least code for good common webserver behavior) is using warp. For now, within the particular test, it could be something like this?

```rust
let input = fixtures_path().join("fragments");

// Spin up a webserver to serve local files via HTTP
let file_server = tokio::spawn(async move {
    warp::serve(warp::fs::dir(input))
        .run(([127, 0, 0, 1], 1337))
        .await;
});

// ... run the actual checks against http://127.0.0.1:1337/ ...

// Stop the webserver and wait for it to finish
file_server.abort();
let _ = file_server.await;
```

Generally it would be better to start it once before/for all tests, but currently I'm not sure how to best achieve this. Maybe a function which checks whether the server is running already, and starts it only if not. Or is there some kind of test init feature?
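For the "start it only once" question, one option might be a lazily initialized shared server bound to an ephemeral port — a rough sketch, assuming warp and tokio; `FILE_SERVER` and `fixture_server` are made-up names, not existing lychee test helpers:

```rust
use std::net::SocketAddr;
use std::path::PathBuf;

use tokio::sync::OnceCell;

// Hypothetical shared handle: the server is started at most once per test binary.
static FILE_SERVER: OnceCell<SocketAddr> = OnceCell::const_new();

/// Start (if not yet running) a warp file server for the given fixtures
/// directory and return the address it listens on.
async fn fixture_server(dir: PathBuf) -> SocketAddr {
    *FILE_SERVER
        .get_or_init(|| async move {
            // Bind to port 0 so the OS picks a free port; avoids clashes
            // between parallel test runs.
            let (addr, server) =
                warp::serve(warp::fs::dir(dir)).bind_ephemeral(([127, 0, 0, 1], 0));
            tokio::spawn(server);
            addr
        })
        .await
}
```

A test could then call something like `let addr = fixture_server(fixtures_path().join("fragments")).await;` and build its URLs as `format!("http://{addr}/zero.bin#fragment")`. Note that with a single shared cell, the first caller decides the served directory, so it would probably have to point at the fixtures root. This is just a sketch and has not been wired into the lychee test suite.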
mre left a comment
From my side, we could merge this. The tests should pass after the merge to master as far as I understood.
Yeah, there is/was another test checking
I agree, this can be merged.
@MichaIng, can you check out https://github.com/lycheeverse/lychee/actions/runs/15782696285/job/44492011076?
Oops, I adjusted the repo in the test file, but not in the assertion 😄. Will open a PR to fix it. At least that one will then be green right from the start.
Fixes #1709
Part 2, since the same issue happens with remote URLs as well.
EDIT: Tested those builds, and works as expected. I am too lazy to add respective tests to the code 😛.