When a math fence immediately follows text, the resulting tokens are multiplied #1023

Closed
schanzer opened this issue May 2, 2024 · 4 comments

schanzer commented May 2, 2024

Consider the following markdown:

---
# AAA

BBB
$$$ math
1
$$$

I would expect it to produce an hr token, a heading containing the text "AAA", a paragraph containing the text "BBB", and a single generated_image token with the $$$ markup and 1 as its content.

Instead, I'm seeing three identical generated_image tokens, two of them ahead of the paragraph:

[
    {
        "type": "hr",
        "tag": "hr",
        "attrs": null,
        "map": [
            1,
            2
        ],
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "",
        "markup": "---",
        "info": "",
        "meta": null,
        "block": true,
        "hidden": false
    },
    {
        "type": "heading_open",
        "tag": "h1",
        "attrs": null,
        "map": [
            2,
            3
        ],
        "nesting": 1,
        "level": 0,
        "children": null,
        "content": "",
        "markup": "#",
        "info": "",
        "meta": null,
        "block": true,
        "hidden": false
    },
    {
        "type": "inline",
        "tag": "",
        "attrs": null,
        "map": [
            2,
            3
        ],
        "nesting": 0,
        "level": 1,
        "children": [
            {
                "type": "text",
                "tag": "",
                "attrs": null,
                "map": null,
                "nesting": 0,
                "level": 0,
                "children": null,
                "content": "AAA",
                "markup": "",
                "info": "",
                "meta": null,
                "block": false,
                "hidden": false
            }
        ],
        "content": "AAA",
        "markup": "",
        "info": "",
        "meta": null,
        "block": true,
        "hidden": false
    },
    {
        "type": "heading_close",
        "tag": "h1",
        "attrs": null,
        "map": null,
        "nesting": -1,
        "level": 0,
        "children": null,
        "content": "",
        "markup": "#",
        "info": "",
        "meta": null,
        "block": true,
        "hidden": false
    },
    {
        "type": "generated_image",
        "tag": "div",
        "attrs": null,
        "map": [
            5,
            8
        ],
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "1\n",
        "markup": "$$$",
        "info": " math",
        "meta": null,
        "block": true,
        "hidden": false
    },
    {
        "type": "generated_image",
        "tag": "div",
        "attrs": null,
        "map": [
            5,
            8
        ],
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "1\n",
        "markup": "$$$",
        "info": " math",
        "meta": null,
        "block": true,
        "hidden": false
    },
    {
        "type": "paragraph_open",
        "tag": "p",
        "attrs": null,
        "map": [
            4,
            5
        ],
        "nesting": 1,
        "level": 0,
        "children": null,
        "content": "",
        "markup": "",
        "info": "",
        "meta": null,
        "block": true,
        "hidden": false
    },
    {
        "type": "inline",
        "tag": "",
        "attrs": null,
        "map": [
            4,
            5
        ],
        "nesting": 0,
        "level": 1,
        "children": [
            {
                "type": "text",
                "tag": "",
                "attrs": null,
                "map": null,
                "nesting": 0,
                "level": 0,
                "children": null,
                "content": "BBB",
                "markup": "",
                "info": "",
                "meta": null,
                "block": false,
                "hidden": false
            }
        ],
        "content": "BBB",
        "markup": "",
        "info": "",
        "meta": null,
        "block": true,
        "hidden": false
    },
    {
        "type": "paragraph_close",
        "tag": "p",
        "attrs": null,
        "map": null,
        "nesting": -1,
        "level": 0,
        "children": null,
        "content": "",
        "markup": "",
        "info": "",
        "meta": null,
        "block": true,
        "hidden": false
    },
    {
        "type": "generated_image",
        "tag": "div",
        "attrs": null,
        "map": [
            5,
            8
        ],
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "1\n",
        "markup": "$$$",
        "info": " math",
        "meta": null,
        "block": true,
        "hidden": false
    }
]

I ran it through Dingus as well, to make sure the reference implementation wasn't also generating multiple images!

Author

schanzer commented May 2, 2024

Sorry, Alex, this may just be my own ignorance about how the markdown-it parser is used. Here's the code that's generating the token stream:

const parser = markdownIt(mdOptions)
  .use(attrs)
  .use(lazyHeaders)
  .use(emoji, {shortcuts: {}})
  .use(expandTabs, {tabWidth: 4})
  .use(generatedImage)
  .use(video, {youtube: {width: 640, height: 390}});

// Parse the markdown into markdown-it's flat token stream and log it for inspection.
function parseMarkdown(markdown: string): Token[] {
  const parseTree = parser.parse(markdown, {});
  console.log(JSON.stringify(parseTree, null, 4));
  return parseTree;
}

The link you shared certainly produces the right HTML after tokenizing, but I've got a custom renderer consuming the token stream and I'm trying to make sense of the tokens I'm seeing. So I guess this boils down to two related questions:

  1. If you examine the output of your tokenizer on this input, do you expect to see three different image tokens? If so, then this is clearly not a bug!
  2. If it's not a bug, and three different image tokens are expected for a single fence, how are you collapsing those tokens back to a single fence in your rendered HTML output, so that I can do the same in my renderer?
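
As a stopgap for a custom renderer, repeated block tokens could be collapsed before rendering. The sketch below is only an illustration: `TokenLike` and `dropRepeatedBlockTokens` are hypothetical names, not part of markdown-it, and the approach assumes that duplicates share the same type, map, markup, and content, as the three generated_image tokens in the dump above do.

```typescript
// Minimal token shape for this sketch; markdown-it's real Token has more fields.
interface TokenLike {
  type: string;
  map: [number, number] | null;
  content: string;
  markup: string;
}

// Hypothetical helper: keep the first token for each (type, map, markup, content)
// key and drop later repeats. Tokens without a source map are always kept,
// because closing tokens such as paragraph_close legitimately have map === null.
function dropRepeatedBlockTokens<T extends TokenLike>(tokens: T[]): T[] {
  const seen = new Set<string>();
  return tokens.filter((tok) => {
    if (tok.map === null) return true;
    const key = JSON.stringify([tok.type, tok.map, tok.markup, tok.content]);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Keeping the first occurrence leaves the image ahead of the BBB paragraph for the stream above, so this only hides the symptom; the duplication itself would still need to be fixed where the tokens are produced.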

Author

schanzer commented May 2, 2024

Aha! I see now that it's one of the plugins that seems to be wreaking havoc (probably the generatedImage one!). My apologies, Alex.
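
One way to confirm which plugin is responsible is to parse the same input with a growing prefix of the plugin chain and watch where the token count jumps. The sketch below is a minimal, dependency-free illustration: `NamedPlugin`, `countTokens`, `countByPluginPrefix`, and the `parseWith` callback are all hypothetical names, and `parseWith` is assumed to build a fresh parser with only the given plugins and return its tokens.

```typescript
// Minimal shapes for this sketch; these names are hypothetical, not markdown-it API.
interface TokenLike {
  type: string;
}
interface NamedPlugin<P> {
  name: string;
  plugin: P;
}

// Count tokens of one type in a flat block-level token stream.
function countTokens(tokens: TokenLike[], type: string): number {
  return tokens.filter((t) => t.type === type).length;
}

// Parse with plugins[0..i] enabled for each i and record how many tokens of
// `type` come out. The step where the count first exceeds the expected value
// points at the plugin that was just added (or its interaction with earlier ones).
function countByPluginPrefix<P>(
  plugins: NamedPlugin<P>[],
  parseWith: (enabled: P[]) => TokenLike[],
  type: string,
): { added: string; count: number }[] {
  return plugins.map((p, i) => ({
    added: p.name,
    count: countTokens(parseWith(plugins.slice(0, i + 1).map((x) => x.plugin)), type),
  }));
}
```

With markdown-it, `parseWith` could create a new instance, call `.use(...)` for each enabled plugin, and return `.parse(markdown, {})`; building a fresh instance per run matters because plugins modify the parser they are attached to.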

schanzer closed this as completed May 3, 2024

Member

rlidwka commented May 3, 2024

No problem.

Yes, the generated_image token is produced by one of the plugins (markdown-it does not support MathJax by itself), so you should look into that plugin first.
