
Fix the Markdown language #3950

Merged

DmitrySharabin merged 2 commits into `simplify` from `fix-markdown` on May 29, 2025


DmitrySharabin (Member) commented on May 29, 2025

  • Add the missing `inside` token and fix the existing one
  • Fix the regex that is now too greedy: when several code blocks are next to each other, every block from the second one onward is interpreted as part of the first
  • Ignore spaces before the language code
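To illustrate the greedy-regex problem and the leading-space fix, here is a minimal sketch with hypothetical patterns (`greedyFence`, `lazyFence` and `languageAfterFence` are illustrative names, not the regexes actually changed in this PR). A greedy `[\s\S]*` body runs to the last closing fence in the input, so two adjacent code blocks collapse into one match, while a lazy `[\s\S]*?` body stops at the first closing fence.

```ts
// Hypothetical patterns for illustration only; not the exact regexes from this PR.
// Greedy body: `[\s\S]*` extends to the LAST line that is exactly ``` in the input.
const greedyFence = /^```[\s\S]*^```$/m;
// Lazy body: `[\s\S]*?` stops at the FIRST line that is exactly ```.
const lazyFence = /^```[\s\S]*?^```$/m;
// Language after the opening fence, skipping any spaces or tabs before it.
const languageAfterFence = /^```[ \t]*(\w+)/m;

// Two fenced code blocks next to each other.
const input = ["```js", "a();", "```", "", "```css", "b {}", "```"].join("\n");

const greedyMatch = greedyFence.exec(input)?.[0] ?? "";
const lazyMatch = lazyFence.exec(input)?.[0] ?? "";
const language = languageAfterFence.exec("```   python\nx = 1\n```")?.[1];

console.log(/css/.test(greedyMatch)); // true: the second block is swallowed by the first
console.log(/css/.test(lazyMatch)); // false: only the first block is matched
console.log(language); // python
```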

However, to support it fully, we first need to support functional tokens. For now, we don't: we treat a functional token as if it were a regex or an object with a `pattern` property, and in both cases `matchPattern()` (in `tokenize/match.ts`) throws.
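A minimal sketch of why both interpretations fail, assuming (hypothetically) that a functional token is simply a JavaScript function stored in the grammar. `PatternObject`, `FunctionalToken`, `asRegExp`, `asPatternObject` and `errorName` are illustrative names, not code from `tokenize/match.ts`.

```ts
// Hypothetical token shapes; the real types in the simplify branch may differ.
type PatternObject = { pattern: RegExp; lookbehind?: boolean };
type FunctionalToken = () => unknown;

// Stand-in for a functional token: a plain function, not a RegExp.
const functionalToken: FunctionalToken = () => /```/;

// Interpretation 1: treat the token as if it were a RegExp and call exec().
function asRegExp(token: unknown, text: string): RegExpExecArray | null {
  return (token as RegExp).exec(text); // functions have no exec() -> TypeError
}

// Interpretation 2: treat the token as an object with a `pattern` property.
function asPatternObject(token: unknown, text: string): RegExpExecArray | null {
  return (token as PatternObject).pattern.exec(text); // .pattern is undefined -> TypeError
}

// Runs the callback and reports the name of the error it throws, if any.
function errorName(run: () => unknown): string {
  try {
    run();
    return "no error";
  } catch (e) {
    return e instanceof Error ? e.name : String(e);
  }
}

console.log(errorName(() => asRegExp(functionalToken, "```js"))); // TypeError
console.log(errorName(() => asPatternObject(functionalToken, "```js"))); // TypeError
console.log(errorName(() => asRegExp(/```/, "```js"))); // no error
```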

DmitrySharabin requested a review from LeaVerou on May 29, 2025 10:49
github-actions bot commented on May 29, 2025

No JS Changes

Generated by 🚫 dangerJS against aae926d

LeaVerou (Member) left a comment


I thought I added code in `tokenize()` to support functional tokens?

DmitrySharabin (Member, Author) commented on May 29, 2025

> I thought I added code in `tokenize()` to support functional tokens?

You did. It covers a functional `inside` (without arguments) and cases where every part of the string being tokenized belongs to a named group inside the corresponding pattern.
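The distinction can be sketched as follows, assuming (as the comment implies) that named capture groups in a token's pattern map to nested tokens. `coveredByGroups`, `fullyGrouped` and `partlyGrouped` are hypothetical names, not part of `tokenize()`: the helper reports whether the named groups account for the entire match, or whether some text (such as a code fence) falls outside every group and has to be tokenized separately.

```ts
// Hypothetical helper, not Prism's tokenize(): do the named groups cover the whole match?
// Assumes the groups are consecutive, non-overlapping, and listed in source order.
function coveredByGroups(re: RegExp, text: string): boolean {
  const m = re.exec(text);
  if (!m || !m.groups) return false;
  // Unmatched optional groups are undefined; join() turns them into "".
  return Object.values(m.groups).join("") === m[0];
}

// Every part of the match sits inside a named group.
const fullyGrouped = /(?<fence>```)(?<language>\w+)/;
// The fences and newlines are outside any group, like "```" in a Markdown code block.
const partlyGrouped = /```(?<language>\w+)\n(?<code>[\s\S]*?)\n```/;

console.log(coveredByGroups(fullyGrouped, "```js")); // true
console.log(coveredByGroups(partlyGrouped, "```js\na();\n```")); // false
```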

If, on the other hand, some parts are not in a group (like the `` ``` `` fence of a Markdown code block), we tokenize them by passing the same grammar, functional tokens included:

```ts
content = tokenize.call(prism, content, localInsideGrammar);
```

While tokenizing, we iterate over the grammar’s tokens:

```ts
for (const token in grammar) {
	// … (loop body omitted)
}
```

These functional tokens are not regular expressions, so when we call `exec()` on them (inside `matchPattern()`), our code throws:

```ts
match = matchPattern(pattern, 0, str, lookbehind);
```

```ts
// inside matchPattern() (tokenize/match.ts)
const match = pattern.exec(text);
```
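The failing path can be reproduced in isolation with a simplified, hypothetical grammar. `Grammar`, `failingTokens` and the `skipFunctional` flag below are illustrative only; the guard is one possible workaround, not what this PR or `matchPattern()` actually does.

```ts
// Hypothetical, simplified shapes; not the real types from the simplify branch.
type PatternToken = RegExp | { pattern: RegExp };
type FunctionalToken = () => unknown;
type Grammar = Record<string, PatternToken | FunctionalToken>;

const grammar: Grammar = {
  punctuation: /```/,
  // Stand-in for a functional token: a function rather than a pattern.
  "code-block": () => ({ punctuation: /```/ }),
};

// Mirrors the loop quoted above: visit each token and call exec() on its pattern.
// Returns the names of the tokens whose matching threw.
function failingTokens(g: Grammar, text: string, skipFunctional: boolean): string[] {
  const failed: string[] = [];
  for (const name in g) {
    const token = g[name];
    // Hypothetical guard: leave functional tokens to code that knows how to call them.
    if (skipFunctional && typeof token === "function") continue;
    try {
      const pattern = token instanceof RegExp ? token : (token as { pattern: RegExp }).pattern;
      pattern.exec(text); // a function has no `pattern`, so this throws a TypeError
    } catch {
      failed.push(name);
    }
  }
  return failed;
}

console.log(JSON.stringify(failingTokens(grammar, "```js", false))); // ["code-block"]
console.log(JSON.stringify(failingTokens(grammar, "```js", true))); // []
```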

DmitrySharabin merged commit 830fdff into `simplify` on May 29, 2025
2 checks passed
DmitrySharabin deleted the `fix-markdown` branch on May 29, 2025 19:28