Fix HTML tags appearing in wiki table of contents #36284
wxiaoguang merged 8 commits into go-gitea:main
|
Could you add some test? |
f0b0b36 to
d737929
Compare
|
Whether any |
|
Good question! I went with stripping all HTML tags rather than just The heading itself still renders with the HTML in the document body, so anchor links like I think adding an option would be overkill for this - can't think of a case where someone would actually want raw HTML showing up in their ToC. But happy to discuss if you see it differently! |
|
|
So, I think 'raw HTML' is useful when it is only accidentally HTML. |
d737929 to
a1c7525
Compare
|
Good edge case to think about! I tested this and the fix handles it correctly: ToC shows: Click and Bold The HTML tags get stripped but the text content inside them is preserved - which is exactly what we want for a readable ToC. I've added test cases covering this scenario. Also verified that code spans like |
e211fab to
0b84de7
Compare
0b84de7 to
e211fab
Compare
|
I fixed the tests; they need to clearly assert what we want. And we can see that the result doesn't look good. By the way: there is no need to rebase or force-push, see the contribution guidelines: https://github.com/go-gitea/gitea/blob/main/CONTRIBUTING.md#maintaining-open-prs
|
|
Roger that, I'll step back. |
|
@silverwind this PR still has problems, see #36284 (comment), AI's review won't really help. |
|
Sorry, I was just going by "ready" PRs. |
|
Well, it seems that this PR is unlikely to make progress. So it becomes my work, again. |
wxiaoguang
left a comment
Now, the result is what it should be.
Pull request overview
This pull request fixes a bug where HTML tags in wiki headings were appearing verbatim in the table of contents instead of being stripped out. The fix refactors the ToC generation from an AST-based approach to an HTML-based approach, extracting plain text from heading nodes and properly escaping it during ToC rendering.
Changes:
- Refactored ToC generation to work with HTML nodes instead of Markdown AST nodes
- Added proper text extraction from heading elements that strips HTML tags while preserving text content
- Implemented HTML-safe ToC rendering using new `htmlutil` helper functions
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| web_src/js/markup/anchors.ts | Added FIXME comment expressing architectural opinion (non-functional) |
| routers/web/repo/wiki.go | Updated to use new ToC rendering approach with proper HTML handling |
| modules/markup/render.go | Added new types TocShowInSectionType and TocHeadingItem for ToC data structure |
| modules/markup/mdstripper/mdstripper.go | Removed WithAutoHeadingID() parser option as heading IDs are now handled in HTML post-processing |
| modules/markup/markdown/transform_heading.go | Deleted (replaced by HTML-based heading processing) |
| modules/markup/markdown/toc.go | Deleted (replaced by RenderTocHeadingItems in html.go) |
| modules/markup/markdown/prefixed_id.go | Deleted (ID prefixing now handled in HTML post-processing) |
| modules/markup/markdown/markdown.go | Removed custom prefix ID generator and AutoHeadingID parser option |
| modules/markup/markdown/goldmark.go | Updated to set ToC mode flags instead of generating AST nodes |
| modules/markup/html_toc_test.go | Added comprehensive test verifying HTML tags are stripped from ToC |
| modules/markup/html_node.go | Enhanced to extract heading text and populate ToC items during HTML post-processing |
| modules/markup/html.go | Added RenderTocHeadingItems function with proper HTML escaping |
| modules/markup/common/footnote.go | Updated footnote IDs to include "user-content-" prefix in renderer |
| modules/htmlutil/html.go | Added HTMLPrintf, HTMLPrint, and HTMLPrintTag helper functions for safe HTML output |
1b6e34c to
560ff13
Compare
It is ready now. And maybe you can take a look at this FIXME later (in follow-up PRs): https://github.com/go-gitea/gitea/pull/36284/files#diff-a1ea66825d0703d3cd1da7b9428ad1efecce97e52fb828246cc2b7729283faa0R3
|
The reason for these prefixes is so that markdown content is not in direct control of the It would be better if backend sets these prefixes. |
|
The removal on |
It matches. If these prefixes eventually don't exist when users visit the page, why should the backend add them?
|
because it's usually best to alter HTML in backend, but I see this mechanism with the removal requires JS to work, so it can not work with JS disabled. While could move the addition of the prefix to frontend but then we give markdown documents access to the full I see no better way. The need for prefixes is to prevent markdown from altering the full |
Sorry, I don't understand. If the backend doesn't add these prefixes, isn't it the same as the current approach for end users? The same as GitHub? So I don't see why "even vulnerabilities" (#36284 (comment)). I don't see what the "need for prefixes" is, or what the full "id" namespace is. Can you show a real example? |
|
For example, repo homepage has |
But you also always remove these prefixes, right? The same as GitHub? Do you mean :
|
No,
No, Github leaves the
Both Github and Gitea only remove it from |
Hmm ... thanks, I think I can understand more, so the comment can be updated to clarify the purpose of that part of code |
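To make the prefix discussion concrete, here is a minimal sketch of the backend side only. It assumes the `user-content-` prefix mentioned in the file summary; `prefixedID` is a hypothetical helper, not a function from the Gitea codebase. The point is that an id chosen in a Markdown document ends up in its own namespace and cannot collide with ids used by the surrounding page.

```go
package main

import (
	"fmt"
	"strings"
)

// userContentPrefix is the prefix named in this thread.
const userContentPrefix = "user-content-"

// prefixedID is a hypothetical helper: it namespaces an id that comes
// from rendered user content and leaves already-prefixed ids untouched.
func prefixedID(id string) string {
	if id == "" || strings.HasPrefix(id, userContentPrefix) {
		return id
	}
	return userContentPrefix + id
}

func main() {
	// A heading id chosen by a Markdown document no longer matches a
	// page-level element that happens to use the same bare id.
	fmt.Println(prefixedID("readme"))
}
```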
|
So in summary:
|
Update: just found #36443 |
|
I find my description in #36443 better, let's continue there. |
See discussion in #36284.
---------
Signed-off-by: silverwind <me@silverwind.io>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* giteaofficial/main:
  - Normalize guessed languages for code highlighting (go-gitea#36450)
  - Add `knip` linter (go-gitea#36442)
  - Fix various bugs (go-gitea#36446)
  - Update tool dependencies (go-gitea#36445)
  - Update JS dependencies, adjust webpack config, misc fixes (go-gitea#36431)
  - fix: Improve image captcha contrast for dark mode (go-gitea#36265)
  - Refactor template render (go-gitea#36438)
  - Add documentation for markdown anchor post-processing (go-gitea#36443)
  - Fix markup heading parsing, fix emphasis parsing (go-gitea#36284)
  - Front port changelog for 1.25.4 (go-gitea#36432)
  - Bugfix: Potential incorrect runID in run status update (go-gitea#36437)
  - Restrict branch naming when new change matches with protection rules (go-gitea#36405)



Fixes #36106
When wiki headings contain HTML elements (like `<a name="anchor"></a>`), the raw HTML code was appearing verbatim in the table of contents instead of being stripped out.
Before: ToC displays `<a name="asdf"></a> has strange html`
After: ToC displays `has strange html`
Also fix #17958 by the way