Headers in html_block Tokens aren't given slugs #105
Comments
Oh, that's a tricky one. Currently markdown-it-anchor works by identifying the Markdown headings in the markdown-it token stream (the […]).

Adding anchors inside plain HTML would mean parsing that HTML in order to apply the anchor logic to it, which would mean writing equivalent HTML DOM renderers for each of the existing "markdown-it token stream" permalinks, and ensuring they behave consistently. Or, to avoid duplicating the logic at that point, changing the way markdown-it-anchor works so that it's no longer a markdown-it plugin but an HTML post-processor, e.g. you could do […]

I never looked for such a thing, and after a very quick search I didn't find an existing library that does this on the backend (it's all in-browser JS solutions...). I would encourage you to write such a library since, in retrospect, adding permalinks is probably more of an HTML post-processor job than a Markdown parser plugin job.

That being said, a quick solution would be to make a markdown-it plugin that's inserted before markdown-it-anchor and that identifies such headings in […]
What I've done as a stop-gap is to use […] If using cheerio isn't something you're on board with (and I wouldn't blame you), could a hook function in options be implemented for […]
I don't want to force the dependency on cheerio, but […]

But that's the tip of the iceberg. markdown-it-anchor is meant to deal with a markdown-it token stream both as input and output. We can use cheerio to identify matching headings in […]

    const tokens = [
      { type: 'heading_open', attributes: [...], level: ... },
      { type: 'inline', children: [...] },
      { type: 'heading_close' }
    ]

Then calling the anchor renderer like […]

Or alternatively, finding a way to reliably turn […]

Then this could be done as a pre-markdown-it-anchor transform, which would be better as it wouldn't require any code change in the core of markdown-it-anchor, but you still have the same challenge about building the […]

And I'm not sure cheerio would support splitting the HTML that way, but it's something to look into.

While the happy path seems easy to implement, I can see many ways this would break or give unexpected results, and I don't want to include a just-the-happy-path solution in markdown-it-anchor. And I genuinely think there's less work in making an HTML post-processor that adds permalinks the same way markdown-it-anchor does than in trying to make markdown-it-anchor's not-at-all-designed-for-HTML code work with HTML (without essentially rewriting the core of the plugin as well as the existing permalink renderers).

I'll leave this issue open in case anybody wants to tackle this or can think of better solutions than those. :)
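To make the three-token structure described above concrete, here is a minimal, dependency-free sketch of building a `heading_open` / `inline` / `heading_close` triple from an HTML heading's tag name and text. `SketchToken`, `makeToken` and `headingTokens` are hypothetical names used only for illustration, and the token shape is a simplified stand-in for markdown-it's real `Token` class, not a replacement for it.

```typescript
// Hypothetical, simplified token shape; the real markdown-it Token class has more fields.
interface SketchToken {
  type: string;
  tag: string;
  nesting: 1 | 0 | -1;
  markup: string;
  content: string;
  attrs: Array<[string, string]>;
  children: SketchToken[];
}

function makeToken(type: string, tag: string, nesting: 1 | 0 | -1): SketchToken {
  return { type, tag, nesting, markup: '', content: '', attrs: [], children: [] };
}

// Build the heading_open / inline / heading_close triple for one HTML heading.
function headingTokens(
  tagName: string,
  text: string,
  attrs: Array<[string, string]> = []
): SketchToken[] {
  const match = /^h([1-6])$/i.exec(tagName);
  if (!match) {
    throw new Error(`Not a heading tag: ${tagName}`);
  }
  const tag = tagName.toLowerCase();
  // Mirror the Markdown markup a heading of this level would have had ("##" for h2).
  const markup = '#'.repeat(parseInt(match[1], 10));

  const openToken = makeToken('heading_open', tag, 1);
  openToken.markup = markup;
  openToken.attrs = attrs.map(([key, value]): [string, string] => [key, value]);

  const textToken = makeToken('text', '', 0);
  textToken.content = text;

  // The inline token carries the heading text and wraps the text token as its child.
  const inlineToken = makeToken('inline', '', 0);
  inlineToken.content = text;
  inlineToken.children = [textToken];

  const closeToken = makeToken('heading_close', tag, -1);
  closeToken.markup = markup;

  return [openToken, inlineToken, closeToken];
}

const triple = headingTokens('h2', 'Getting started', [['class', 'title']]);
console.log(triple.map((t) => t.type).join(' -> '));
```

This only covers the happy path of a heading with plain text content. Nested inline HTML inside the heading, and splitting the surrounding HTML around it, are the harder parts discussed above and are not addressed here.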
All that makes total sense, thanks for taking the time on this. I'll give this some more thought today. It's a need for us, as we're parsing a lot of markdown from GitHub, so I'll definitely stay on top of it.
Awesome, I'm curious to see what you come up with. FWIW if you need extra help on this issue, I have up to 20 hours I can dedicate to contracting next week or in the second half of September. Feel free to reach out to my personal email for this :)
I've got a working POC for converting HTML headings into heading_* tokens, but it's not pretty. Going to wait on some guidance from the markdown-it maintainers. This may be out of scope for this plugin, and I'm starting to think anything that processes additional HTML into tokens should probably be at the top of the stack. Will update when I have more to share.
OK so got some feedback from […]

    import cheerio from 'cheerio';
    import MarkdownIt from 'markdown-it';
    import Token from 'markdown-it/lib/token';

    export default function htmlHeaders(md: MarkdownIt) {
      // Register a core rule that runs after the 'inline' rule.
      md.core.ruler.after('inline', 'html-headers', (state) => {
        state.tokens.forEach((blockToken) => {
          // Only raw HTML blocks are of interest here.
          if (blockToken.type !== 'html_block') {
            return;
          }
          const $ = cheerio.load(`${blockToken.content}`, { xmlMode: true });
          const headings = $('h1,h2,h3,h4,h5,h6');
          if (!headings.length) {
            return;
          }
          const { map } = blockToken;
          headings.each((_, e) => {
            const { tagName } = e;
            const level = parseInt(tagName.substring(1), 10);
            // Equivalent Markdown markup for this level, e.g. '##' for h2.
            const markup = ''.padStart(level, '#');
            const element = $(e);

            const open = new Token('heading_open', tagName, 1);
            open.markup = markup;
            open.map = map;
            // Carry the HTML attributes over to the heading_open token.
            Object.entries(e.attribs).forEach(([key, value]) => {
              open.attrSet(key, value);
            });

            const content = new Token('text', '', 0);
            content.map = map;
            content.content = element.text() || '';

            const body = new Token('inline', '', 0);
            body.content = content.content;
            body.map = map;
            body.children = [content];

            const close = new Token('heading_close', tagName, -1);
            close.markup = markup;

            // Insert the heading tokens immediately before the html_block token,
            // then drop the heading from the parsed HTML.
            const position = state.tokens.indexOf(blockToken);
            state.tokens.splice(position, 0, open, body, close);
            element.remove();
          });
          // Whatever HTML is left stays in the original html_block token.
          // eslint-disable-next-line no-param-reassign
          blockToken.content = $.html();
        });
        return false;
      });
    }

That has compatibility with this plugin as well as the […]
Wow, that's pretty cool! I just found out on your other issue that npm was using markdown-it and a similar approach to parse HTML headers; I had no idea. At that point I would just reuse npm's implementation, e.g. […]

I'll add a line to document that in the readme because it's good to know, but I wouldn't add that code to markdown-it-anchor unless it actually works with edge cases too. But at that point I think it would be easier to add the anchors on the HTML rather than during the Markdown parsing phase, e.g.:

    const { parse } = require('node-html-parser')

    const root = parse(html)

    for (const h of root.querySelectorAll('h1,h2,h3,h4,h5,h6')) {
      const slug = h.getAttribute('id') || slugify(h.textContent)
      h.setAttribute('id', slug)
      h.insertAdjacentHTML('afterbegin', `<a class="header-anchor" href="#${slug}">#</a> `)
    }

    console.log(root.toString())
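The snippet above calls `slugify` without defining it, and it does not guard against two headings producing the same slug. As a rough sketch only, a helper along these lines could fill that gap. `slugify` and `createUniqueSlugger` here are hypothetical illustrations (trim, lowercase, hyphenate whitespace, URI-encode, then add a numeric suffix on collision), not necessarily what markdown-it-anchor or GitHub actually do.

```typescript
// Hypothetical slugify: trim, lowercase, turn whitespace runs into hyphens, then URI-encode.
function slugify(text: string): string {
  return encodeURIComponent(text.trim().toLowerCase().replace(/\s+/g, '-'));
}

// Returns a function that hands out slugs and appends -1, -2, ... until the slug is unused.
function createUniqueSlugger(): (text: string) => string {
  const used = new Set<string>();
  return (text: string): string => {
    const base = slugify(text);
    let candidate = base;
    let suffix = 1;
    while (used.has(candidate)) {
      candidate = `${base}-${suffix}`;
      suffix += 1;
    }
    used.add(candidate);
    return candidate;
  };
}

const nextSlug = createUniqueSlugger();
console.log(nextSlug('Getting Started')); // getting-started
console.log(nextSlug('Getting Started')); // getting-started-1
```

One slugger instance would be created per document, so that uniqueness is tracked across all headings in that document and reset for the next one.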
Yeah, I think we can close this one. I do want to note that npm's code doesn't pick up headers with attributes, as their regex is pretty constrained. I'm left wondering if they use this code at all anymore. See the Vue rendered readme here: https://www.npmjs.com/package/vue. I ran that readme through marky-markdown and that […]
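To illustrate the kind of limitation described above, here is a small sketch comparing a heading regex that only accepts bare tags with one that also tolerates attributes. Both patterns are hypothetical and written for this example; they are not npm's actual code, and regex-based HTML matching remains fragile compared with a real parser.

```typescript
// Hypothetical "constrained" pattern: only matches bare tags such as <h2>...</h2>.
const bareHeading = /<h([1-6])>([\s\S]*?)<\/h\1>/gi;
// Hypothetical looser pattern: also allows attributes after the tag name.
const headingWithAttrs = /<h([1-6])(\s[^>]*)?>([\s\S]*?)<\/h\1>/gi;

// Count how many headings a global pattern finds in an HTML string.
function countMatches(pattern: RegExp, html: string): number {
  const matches = html.match(pattern);
  return matches ? matches.length : 0;
}

const sampleHtml = [
  '<h2>Plain heading</h2>',
  '<h2 align="center">Centered heading</h2>',
  '<h3 id="custom">Heading with id</h3>',
].join('\n');

console.log(countMatches(bareHeading, sampleHtml));      // 1
console.log(countMatches(headingWithAttrs, sampleHtml)); // 3
```

READMEs that center or style their headings through attributes would lose anchors under the first pattern, which matches the behavior reported here.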
Oh wow, right. The more I think about it, the more I feel like all of this (anchors, TOC) should be done at the HTML level instead of Markdown, which is likely how GitHub has anchors and a TOC not only on Markdown files, but also on RST and Org mode.
That's a valid point.
When headers (<h2 .../> etc.) are within tokens with .type === 'html_block', the plugin doesn't assign IDs or create slugs. I've been using Vue's README.md as a baseline for that: https://github.com/vuejs/vue/blob/dev/README.md