This can be used in the future to add line numbers and columns in output. Note that the html5ever extractor does not output column information, since currently there is no way to retrieve it.
@Akida31 as you can see in CI … So the line numbers and columns for … So in summary the newer html5ever version produces a single …
I think the problem is that the line number html5ever gives is the number of the last line of the multi-line tendril. The following patch should fix that by giving the span provider the line number of the first line of the tendril instead. I didn't test this patch, so I don't know if it fixes the issue.

```diff
diff --git a/lychee-lib/src/extract/html/html5ever.rs b/lychee-lib/src/extract/html/html5ever.rs
index d7fa67bbf1..2c70d00865 100644
--- a/lychee-lib/src/extract/html/html5ever.rs
+++ b/lychee-lib/src/extract/html/html5ever.rs
@@ -58,6 +58,14 @@
                     return TokenSinkResult::Continue;
                 }
                 if self.include_verbatim {
+                    // offset line number by line breaks included in the raw text
+                    let line_number = line_number.saturating_sub(
+                        raw.chars()
+                            .filter(|c| *c == '\n')
+                            .count()
+                            .try_into()
+                            .unwrap(),
+                    );
                     self.links
                         .borrow_mut()
                         .extend(extract_raw_uri_from_plaintext(
```

If you have questions about how my larger changes work, I'm happy to answer them. Sadly, documenting code isn't a strength of mine currently.
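A minimal Rust sketch of the idea behind the patch, assuming (as the comment suggests but does not confirm) that html5ever reports the line on which a character token *ends*. `first_line_of_token` and `line_and_column` are hypothetical helpers, not part of lychee or html5ever. The second one shows how a URL found inside the token could then be given a line and column relative to that first line.

```rust
/// Hypothetical helper (not part of lychee or html5ever).
/// Assumes `reported_line` is the line on which the character token ends,
/// and returns the line on which its raw text starts.
fn first_line_of_token(reported_line: u64, raw: &str) -> u64 {
    // Every '\n' inside the token pushes the reported line one further
    // past the token's first line.
    let newlines = raw.matches('\n').count() as u64;
    // saturating_sub (as in the patch) avoids underflow if the assumption
    // about the reported line does not hold.
    reported_line.saturating_sub(newlines)
}

/// Hypothetical helper: 1-based (line, column) of `byte_offset` inside `raw`,
/// given the line on which the token starts. `byte_offset` must lie on a
/// char boundary (offsets returned by `str::find` always do).
fn line_and_column(first_line: u64, raw: &str, byte_offset: usize) -> (u64, usize) {
    let before = &raw[..byte_offset];
    // Line: token start line plus the newlines preceding the offset.
    let line = first_line + before.matches('\n').count() as u64;
    // Column: chars since the last newline inside the token. On the token's
    // first line this counts from the token start, not from the start of
    // the source line, because html5ever provides no column information.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    (line, column)
}

fn main() {
    // A tendril starting on line 4 and ending (after its last '\n') on line 7.
    let raw = "see\nhttps://example.com\nend\n";
    let first = first_line_of_token(7, raw);
    let offset = raw.find("https://").unwrap();
    let (line, col) = line_and_column(first, raw, offset);
    println!("first line {first}, URL at {line}:{col}");
}
```

Counting newlines on the raw text, like the patch does, keeps the fix local to the token sink. The column on a token's first line remains only relative, which matches the note that the html5ever extractor cannot report columns.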
0fe6217 to 114c94d
Replaces #1806. Rebased on master and tidied up a few small things. This moves us one big step closer to completing #1304.
Thank you very much @Akida31