feat: add 'user-content-' prefix to support github markdown fragment#1750
feat: add 'user-content-' prefix to support github markdown fragment#1750mre merged 3 commits intolycheeverse:masterfrom
Conversation
Signed-off-by: Keming <kemingy94@gmail.com>
|
The We could introduce a new struct and move your logic there: struct FragmentCandidates {
candidates: Vec<Cow<'static, str>>,
}
impl FragmentCandidates {
fn new(fragment: &str, url: &Url, file_type: FileType) -> Result<Self, std::str::Utf8Error> {
let mut fragment_candidates = vec![Cow::Owned(fragment.to_string())];
// For GitHub links, add "user-content-" prefix
if url.host_str().is_some_and(|host| host.ends_with("github.com")) {
fragment_candidates.push(format!("user-content-{fragment}").into());
}
let mut all_fragments = fragment_candidates.clone();
for fragment in &fragment_candidates {
let mut fragment_decoded = percent_decode_str(fragment).decode_utf8()?;
if file_type == FileType::Markdown {
fragment_decoded = fragment_decoded.to_lowercase().into();
}
all_fragments.push(fragment_decoded);
}
Ok(Self { candidates: all_fragments })
}
fn any_matches(&self, file_fragments: &HashSet<String>) -> bool {
self.candidates.iter().any(|frag| file_fragments.contains(frag.as_ref()))
}
}And the pub(crate) async fn check(&self, input: FragmentInput, url: &Url) -> Result<bool> {
let Some(fragment) = url.fragment() else {
return Ok(true);
};
if fragment.is_empty() || fragment.eq_ignore_ascii_case("top") {
return Ok(true);
}
let FragmentInput { content, file_type } = input;
if file_type == FileType::Plaintext {
return Ok(true);
}
let fragment_candidates = FragmentCandidates::new(fragment, url, file_type)?;
let url_without_frag = Self::remove_fragment(url.clone());
let extractor = match file_type {
FileType::Markdown => extract_markdown_fragments,
FileType::Html => extract_html_fragments,
FileType::Plaintext => return Ok(true),
};
match self.cache.lock().await.entry(url_without_frag) {
Entry::Vacant(entry) => {
let file_frags = extractor(&content);
let contains_fragment = fragment_candidates.any_matches(&file_frags);
entry.insert(file_frags);
Ok(contains_fragment)
}
Entry::Occupied(entry) => {
Ok(fragment_candidates.any_matches(entry.get()))
}
}
}There might be things I'm missing, so it might not compile, but you get the idea. This makes testing way easier. We can then easily add some unit test to |
Signed-off-by: Keming <kemingy94@gmail.com>
|
Hi @mre, I have extracted the fragment candidates as a standalone struct. I didn't use Instead, I use |
|
Thanks, @kemingy, it turned out great! |
What can be handled:
user-content-prefix)What cannot be handled:
blob/master/README.mdhttps://github.com/PyO3/pyo3/issues/1800#issuecomment-906786649We have to simulate with a headless browser library to support these cases.