fix: Simplify unicode punctuation #2841

calculuschild · 2023-06-07T17:42:31Z

Marked version: 5.0.5

Markdown flavor: GitHub Flavored Markdown

Description

Cleans up the Unicode punctuation handling from #2811 by using \p{P} instead of a long list of Unicode characters. A handful of characters, $+<=>`^|~, are not included in that set (Unicode classifies them as symbols, \p{S}, rather than punctuation), so they are still specified explicitly here. Includes the accompanying tweaks to a couple of other regexes so the property escape is applied correctly. This also lets us cover a slightly larger punctuation set: my understanding is that \uXXXX escapes in JS only reach \uFFFF, but there are a few more punctuation characters beyond that.
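To make the description above concrete, here is a minimal sketch of the idea, not the code from this PR. The names `punctuation`, `extraSymbols`, `missedByP`, and `astral` are made up for illustration. It shows that \p{P} alone skips the nine listed characters and that a punctuation character above \uFFFF is still matched once the `u` flag is on.

```javascript
// Sketch only: these names are illustrative, not taken from Marked's source.

// \p{P} requires the `u` flag; the extra ASCII symbols are listed by hand.
const punctuation = /[\p{P}$+<=>`^|~]/u;

// Characters named in the description that \p{P} alone does not match.
const extraSymbols = [...'$+<=>`^|~'];
const missedByP = extraSymbols.filter((ch) => !/\p{P}/u.test(ch));

// U+10100 (AEGEAN WORD SEPARATOR LINE) is punctuation above \uFFFF,
// so a JS string stores it as two UTF-16 code units.
const astral = '\u{10100}';

console.log(missedByP.join(''));       // all nine listed characters
console.log(astral.length);            // 2
console.log(punctuation.test(astral)); // true
console.log(punctuation.test('a'));    // false
```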

Also includes a tiny, unrelated logic simplification in the emStrong tokenizer.

My only question is whether there is a better way to exclude single characters from \p{P}. For instance, in emStrong we don't include the current delimiter (* or _) in the punctuation checks. I get around this now with an additional negative lookahead, something like (?!_)\p{P}, which excludes _ from the \p{P} group. I'm OK with this, but if there is some secret "subtraction from a Unicode set" syntax, I would like to know.
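A small sketch of the lookahead workaround described above, under the assumption that each delimiter gets its own pattern. The names `punctExceptUnderscore` and `punctExceptAsterisk` are hypothetical and do not come from Marked's source.

```javascript
// Sketch only: hypothetical names, not Marked's actual regexes.

// The negative lookahead rejects the delimiter before \p{P} gets to match it.
const punctExceptUnderscore = /(?!_)\p{P}/u;
const punctExceptAsterisk = /(?!\*)\p{P}/u;

console.log(punctExceptUnderscore.test('_')); // false: excluded by the lookahead
console.log(punctExceptUnderscore.test('*')); // true: * is still punctuation
console.log(punctExceptAsterisk.test('*'));   // false
console.log(punctExceptAsterisk.test('_'));   // true

// The first match skips the excluded delimiter and lands on the next punctuation.
const firstMatch = 'a_!'.match(punctExceptUnderscore)[0];
console.log(firstMatch); // !
```

As for a subtraction syntax: engines that support the `v` (unicodeSets) flag accept a set-difference form such as `/[\p{P}--[_]]/v`. Whether that is usable here depends on which runtimes Marked needs to support, so the lookahead may remain the more portable choice.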

I didn't add tests, but could potentially look up some of the characters that were missing previously and add them to the existing unicode test.

Contributor

Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
no tests required for this PR.
If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.

CI is green (no forced merge required).
Squash and Merge PR following conventional commit guidelines.

UziTech

LGTM. I think the current tests are enough to tell if this is working as expected.

# [5.1.0](v5.0.5...v5.1.0) (2023-06-10)

### Bug Fixes

* Simplify unicode punctuation ([#2841](#2841)) ([f19fe76](f19fe76))

### Features

* add Marked instance ([#2831](#2831)) ([353e13b](353e13b))

calculuschild added 2 commits June 7, 2023 11:48

Replace list of unicode punct with \p{P}

d21c5d4

cleanup

17607fb

calculuschild added the RR - refactor & re-engineer Results in an improvement to developers using Marked, or end-users, or both. label Jun 7, 2023

vercel bot deployed to Preview June 7, 2023 17:42 View deployment

UziTech approved these changes Jun 8, 2023

UziTech requested review from styfle, joshbruce and davisjam June 9, 2023 02:31

styfle approved these changes Jun 9, 2023

UziTech changed the title ~~Simplify unicode punctuation~~ fix: Simplify unicode punctuation Jun 10, 2023

UziTech merged commit f19fe76 into markedjs:master Jun 10, 2023

UziTech mentioned this pull request Jul 3, 2023

Improper emoji rendering with v5.1.0 #2865

Closed

UziTech mentioned this pull request Jul 17, 2023

revert 2841 #2879

Closed

5 tasks

calculuschild mentioned this pull request Aug 14, 2023

Fix unicode Regex miscounting emoji length #2942

Merged

5 tasks

X-oss-byte mentioned this pull request May 24, 2024

[Snyk] Upgrade marked from 5.0.2 to 5.1.2 X-oss-byte/Appwrite#53

Open

X-oss-byte mentioned this pull request Sep 7, 2024

[Snyk] Upgrade: , marked, marked-gfm-heading-id X-oss-byte/Appwrite#66

Open
