Ignore Links in Markdown Link Text #1831

Merged: mre merged 2 commits into master from do-not-treat-markdown-link-text-as-url on Sep 3, 2025

mre (Member) commented Aug 29, 2025

At the moment, URLs inside Markdown link text get extracted and checked, just like link destinations.

For example:

[https://dummyexample.gov/notexist (archive.org link)](https://web.archive.org/web/20241129184733/http://example.com)

This would extract and check https://dummyexample.gov/notexist, even though it is only part of the link text.

This PR changes that: URLs in link text are now ignored unless the user explicitly opts in by setting --include-verbatim.

This is similar to the handling of HTML files. See #1399.
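A minimal sketch of the rule described above, assuming a simplified, hypothetical event stream loosely modeled on a Markdown parser's link events. The `Event` enum, `extract_urls`, and the whitespace-based URL detection are illustrative assumptions, not lychee's actual code; only the `inside_link_block` and `include_verbatim` names come from this PR.

```rust
// Hypothetical, simplified event stream loosely modeled on a Markdown
// parser's link events. These types are illustrative, not lychee's API.
#[derive(Debug, Clone, Copy)]
enum Event<'a> {
    StartLink { dest: &'a str },
    EndLink,
    Text(&'a str),
}

/// Collects URLs from `events`. Link destinations are always collected;
/// URLs that appear in link text are collected only if `include_verbatim`
/// is set.
fn extract_urls<'a>(events: &[Event<'a>], include_verbatim: bool) -> Vec<&'a str> {
    let mut urls = Vec::new();
    let mut inside_link_block = false;

    for &event in events {
        match event {
            Event::StartLink { dest } => {
                // The destination is the real link target: always keep it.
                inside_link_block = true;
                urls.push(dest);
            }
            Event::EndLink => inside_link_block = false,
            Event::Text(text) => {
                // Check the link-text condition first, before any URL
                // detection, so link text is skipped by default.
                if inside_link_block && !include_verbatim {
                    continue;
                }
                // Naive URL detection for illustration only: keep
                // whitespace-separated tokens with an http(s) scheme.
                urls.extend(
                    text.split_whitespace()
                        .filter(|t| t.starts_with("http://") || t.starts_with("https://")),
                );
            }
        }
    }
    urls
}

fn main() {
    // Events for the example link from the PR description.
    let events = [
        Event::StartLink {
            dest: "https://web.archive.org/web/20241129184733/http://example.com",
        },
        Event::Text("https://dummyexample.gov/notexist (archive.org link)"),
        Event::EndLink,
    ];

    // Default: only the destination is collected.
    println!("{:?}", extract_urls(&events, false));
    // With include_verbatim: the URL in the link text is collected too.
    println!("{:?}", extract_urls(&events, true));
}
```

Checking the link-text condition before any URL detection keeps the default cheap and makes the opt-in a single branch.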

Wikilinks behaved slightly differently: when --include-verbatim was enabled, they were extracted twice, once from the WikiLink tag itself (which contains the destination URL) and once from the text content inside the WikiLink. This caused duplicate URL entries in the output, which is why I introduced a separate tracking flag for wikilinks.
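A minimal sketch of why a separate flag avoids the duplicates, using a simplified, hypothetical event stream (not lychee's actual types). It assumes that for an unpiped wikilink such as `[[https://example.com]]` the text and the destination are the same URL, and that the separate flag suppresses wikilink text regardless of `include_verbatim`. The name `inside_wikilink_block` is hypothetical.

```rust
// Hypothetical, simplified event stream; `wikilink` marks [[...]] links.
// Not lychee's actual types.
#[derive(Debug, Clone, Copy)]
enum Event<'a> {
    StartLink { dest: &'a str, wikilink: bool },
    EndLink { wikilink: bool },
    Text(&'a str),
}

// Two separate flags: one for regular links, one for wikilinks.
#[derive(Default)]
struct LinkState {
    inside_link_block: bool,
    inside_wikilink_block: bool, // hypothetical name
}

fn extract_urls<'a>(events: &[Event<'a>], include_verbatim: bool) -> Vec<&'a str> {
    let mut state = LinkState::default();
    let mut urls = Vec::new();

    for &event in events {
        match event {
            Event::StartLink { dest, wikilink } => {
                if wikilink {
                    state.inside_wikilink_block = true;
                } else {
                    state.inside_link_block = true;
                }
                // The destination is collected once, from the tag itself.
                urls.push(dest);
            }
            Event::EndLink { wikilink } => {
                if wikilink {
                    state.inside_wikilink_block = false;
                } else {
                    state.inside_link_block = false;
                }
            }
            Event::Text(text) => {
                // Assumption of this sketch: wikilink text repeats the
                // destination, so it is skipped even with include_verbatim.
                if state.inside_wikilink_block {
                    continue;
                }
                // Regular link text is skipped unless include_verbatim is set.
                if state.inside_link_block && !include_verbatim {
                    continue;
                }
                urls.extend(
                    text.split_whitespace()
                        .filter(|t| t.starts_with("http://") || t.starts_with("https://")),
                );
            }
        }
    }
    urls
}

fn main() {
    // Hypothetical events for a wikilink such as [[https://example.com]].
    let events = [
        Event::StartLink { dest: "https://example.com", wikilink: true },
        Event::Text("https://example.com"),
        Event::EndLink { wikilink: true },
    ];
    // The URL appears once, even with include_verbatim enabled.
    println!("{:?}", extract_urls(&events, true));
}
```

Keeping the wikilink state apart from `inside_link_block` means enabling `include_verbatim` changes regular link text handling without re-extracting a wikilink's destination from its text.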

Fixes: #1579

Details

  • Respect include_verbatim flag when processing link text
  • Fix WikiLink double extraction by using separate tracking
  • Add tests for link text extraction behavior

mre added 2 commits August 29, 2025 17:14
- Set inside_link_block for Inline and Reference link types to prevent text extraction
- Prioritize inside_link_block check to skip text extraction when inside any link
- Add comprehensive test test_link_text_not_checked() to verify fix

Fixes issue where [broken-url](good-url) would check both broken-url and good-url
instead of just checking the destination good-url.

Fixes: #1579
- Respect `include_verbatim` flag when processing link text
- Fix WikiLink double extraction by using separate tracking
- Add tests for link text extraction behavior

mre (Member, Author) commented Aug 29, 2025

@dbr, it's been a very long time. So sorry for the delay. I'm not sure if you're still interested in the fix.

mre requested a review from thomas-zahner on August 29, 2025 17:23

thomas-zahner (Member) left a comment

Looks all good to me 👍

mre merged commit dab8952 into master on Sep 3, 2025
6 checks passed
mre deleted the do-not-treat-markdown-link-text-as-url branch on September 3, 2025 10:14
mre mentioned this pull request on Sep 1, 2025

dbr commented Sep 6, 2025

Great, thanks for this!

This was referenced Oct 21, 2025

Development

Successfully merging this pull request may close these issues.

Treats markdown link text as URL

3 participants