@wooorm wooorm commented Oct 1, 2020

This is a giant change for remark. It replaces the 5+ year old internals with a new low-level parser: https://github.com/micromark/micromark. The old internals have served billions of users well over the years, but markdown has changed over that time. micromark comes with 100% CommonMark (and GFM as an extension) compliance, and (WIP) docs on parsing rules for how to tokenize markdown with a state machine: https://github.com/micromark/common-markup-state-machine. micromark, and micromark in remark, is a good base for the future.

remark-parse

remark-parse now defers its work to micromark and mdast-util-from-markdown. micromark is a new, small, complete, and CommonMark compliant low-level markdown parser. from-markdown turns its tokens into the previously (and still) used syntax tree: mdast. Extensions to remark-parse work differently: they’re a two-part act. See for example micromark-extension-footnote and mdast-util-footnote.

  • change: commonmark is no longer an option — it’s the default
  • move: gfm is no longer an option — moved to remark-gfm
  • remove: pedantic is no longer an option — this legacy and buggy flavor of markdown is no longer widely used
  • remove: blocks is no longer an option — it’s no longer suggested to change the internal list of HTML “block” tag names
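
To make the "two-part act" more concrete, here is a minimal sketch of the general idea: one step tokenizes text with a small state machine, a second step folds the flat tokens into an mdast-like tree. This is a toy under stated assumptions, not the real API of micromark or mdast-util-from-markdown: the function names `tokenize` and `fromTokens` and the token names are hypothetical, and it only understands `*` as an emphasis marker on a single line.

```javascript
// Toy illustration only: NOT the real micromark / mdast-util-from-markdown API.
// All names here (tokenize, fromTokens, the token types) are hypothetical.

// Part 1: a tiny state machine that turns a string into flat tokens.
function tokenize(input) {
  const tokens = []
  let state = 'text' // Either 'text' or 'emphasis'.
  let buffer = ''

  for (const char of input) {
    if (char === '*') {
      // Flush buffered characters before switching state.
      if (buffer) tokens.push({type: 'data', value: buffer})
      buffer = ''
      tokens.push({type: state === 'text' ? 'emphasisEnter' : 'emphasisExit'})
      state = state === 'text' ? 'emphasis' : 'text'
    } else {
      buffer += char
    }
  }

  if (buffer) tokens.push({type: 'data', value: buffer})
  return tokens
}

// Part 2: fold the flat tokens into an mdast-like tree using a stack.
function fromTokens(tokens) {
  const root = {type: 'paragraph', children: []}
  const stack = [root]

  for (const token of tokens) {
    const parent = stack[stack.length - 1]

    if (token.type === 'emphasisEnter') {
      const node = {type: 'emphasis', children: []}
      parent.children.push(node)
      stack.push(node)
    } else if (token.type === 'emphasisExit') {
      stack.pop()
    } else {
      parent.children.push({type: 'text', value: token.value})
    }
  }

  return root
}

console.log(JSON.stringify(fromTokens(tokenize('a *b* c')), null, 2))
```

The split mirrors why an extension needs two parts: one part teaches the tokenizer new syntax, the other teaches the tree builder what node to create for the new tokens. The toy does not follow CommonMark's actual rules for emphasis (for example, an unclosed `*` still opens an emphasis node here).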

remark-stringify

remark-stringify now defers its work to mdast-util-to-markdown. It’s a new and better serializer with powerful features to ensure serialized markdown represents the syntax tree (mdast), no matter what plugins do. Extensions to it work differently: see for example mdast-util-footnote.

options
  • change: commonmark is no longer an option, it’s the default
  • change: emphasis now defaults to *
  • change: bullet now defaults to *
  • move: gfm is no longer an option — moved to remark-gfm
  • move: tableCellPadding — moved to remark-gfm
  • move: tablePipeAlign — moved to remark-gfm
  • move: stringLength — moved to remark-gfm
  • remove: pedantic is no longer an option — this legacy and buggy flavor of markdown is no longer widely used
  • remove: entities is no longer an option — with CommonMark there is almost never a need to use character references, as character escapes are preferred
  • new: quote — you can now prefer single quotes (') over double quotes (") in titles
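
As a rough illustration of what the `emphasis`, `bullet`, and `quote` options control, the sketch below serializes a small mdast-like tree with those three settings. It is a hypothetical toy serializer, not mdast-util-to-markdown: the `serialize` function is an assumption for illustration, it handles only a handful of node types, and it does no escaping. Only the option names and the `*` defaults come from the list above; the `"` default for `quote` is assumed.

```javascript
// Toy serializer: shows what the options mean, NOT the real implementation.
// `serialize` is a hypothetical helper; it performs no escaping at all.
function serialize(node, options = {}) {
  // `emphasis` and `bullet` default to `*` as listed above;
  // the `"` default for `quote` is an assumption of this sketch.
  const settings = {emphasis: '*', bullet: '*', quote: '"', ...options}
  const all = (parent) =>
    parent.children.map((child) => serialize(child, settings)).join('')

  switch (node.type) {
    case 'text':
      return node.value
    case 'paragraph':
      return all(node)
    case 'emphasis':
      return settings.emphasis + all(node) + settings.emphasis
    case 'link': {
      // The title is wrapped in the preferred quote character.
      const title = node.title
        ? ' ' + settings.quote + node.title + settings.quote
        : ''
      return '[' + all(node) + '](' + node.url + title + ')'
    }
    case 'listItem':
      return settings.bullet + ' ' + all(node)
    case 'list':
      return node.children.map((item) => serialize(item, settings)).join('\n')
    default:
      throw new Error('Unhandled node type: ' + node.type)
  }
}

const sampleLink = {
  type: 'link',
  url: 'https://example.com',
  title: 'Home',
  children: [{type: 'text', value: 'home'}]
}

console.log(serialize(sampleLink))
console.log(serialize(sampleLink, {quote: "'"}))
```

Passing `{bullet: '-', emphasis: '_'}` to the same function would request the other markers for lists and emphasis.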

Changes to output / the tree

All of these are for CommonMark compatibility. They’re all fixes. Most of them are inconsequential to most folks.

  • notable: references (as in, links [text][id] and images ![alt][id]) are no longer present as such in the syntax tree if they don’t have a corresponding definition ([id]: example.com). The reason for this is that CommonMark requires [text *emphasis start][undefined] emphasis end* to be emphasis.
  • notable: it is no longer possible to use two blank lines between two lists or a list and indented code. CommonMark prohibits it. For a solution, use an empty comment to end lists (<!---->)
  • inconsequential: whitespace at the start and end of lines in paragraphs is now ignored
  • inconsequential: autolinks such as <mailto:foobarbaz> are now correctly parsed, and the scheme is part of the tree
  • inconsequential: indented code can now follow a block quote w/o blank line
  • inconsequential: trailing indented blank lines after indented code are no longer part of that code
  • inconsequential: character references and escapes are no longer present as separate text nodes
  • inconsequential: character references which HTML allows but CommonMark doesn’t, such as &copy w/o the semicolon, are no longer recognized
  • inconsequential: the indent field is no longer available on position
  • fix: multiline setext headings
  • fix: lazy lists
  • fix: attention (emphasis, strong)
  • fix: tabs
  • fix: empty alt on images is now present as an empty string
  • …plus a ton of other minor previous differences from CommonMark
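
The first notable change can be sketched as follows: a reference only becomes a reference node when a matching definition exists, and otherwise stays plain text. This is a simplified illustration, not how the parser is implemented: `normalize` and `resolveReference` are hypothetical helpers, the node shape is reduced to a few fields, and the label matching (trim, collapse whitespace, lowercase) only approximates CommonMark's rules.

```javascript
// Sketch only: hypothetical helpers, not part of remark or micromark.

// Approximate label matching: trim, collapse whitespace, lowercase.
// (CommonMark specifies Unicode case folding; lowercasing is a simplification.)
function normalize(label) {
  return label.trim().replace(/\s+/g, ' ').toLowerCase()
}

// `definitions` is a Set of normalized identifiers found in the document.
function resolveReference(text, label, definitions) {
  const identifier = normalize(label)

  if (definitions.has(identifier)) {
    // A definition exists: the reference is present in the tree.
    return {
      type: 'linkReference',
      identifier,
      children: [{type: 'text', value: text}]
    }
  }

  // No definition: the source stays as plain text.
  return {type: 'text', value: '[' + text + '][' + label + ']'}
}

const definitions = new Set([normalize('id')])

console.log(resolveReference('text', 'ID', definitions).type)
console.log(resolveReference('text', 'undefined', definitions).type)
```

In the real parser the unresolved brackets are not treated as one opaque string: their contents are still parsed as regular inline content, which is what allows the emphasis in the example above to work.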

For now

  • get folks to use this and report problems!

Up next

  • make remark-gfm
  • start making next branches for plugins
  • get types into {from,to}-markdown and use them here

Closes

Closes GH-218.
Closes GH-306.
Closes GH-315.
Closes GH-324.
Closes GH-398.
Closes GH-402.
Closes GH-407.
Closes GH-439.
Closes GH-450.
Closes GH-459.
Closes GH-493.
Closes GH-494.
Closes GH-497.
Closes GH-504.
Closes GH-517.
Closes GH-521.
Closes GH-523.

Closes remarkjs/remark-lint#111.

Thanks

Thanks to Salesforce, Gatsby, Vercel, and Netlify, and our other backers for sponsoring the work on micromark!
To support our continued work, back us on OpenCollective!

@wooorm wooorm added 🐛 type/bug This is a problem 🦋 type/enhancement This is great to have 🧑 semver/major This is a change 🗄 area/interface This affects the public interface 🙆 yes/confirmed This is confirmed and ready to be worked on 📣 type/announcement This is meta 💬 type/discussion This is a request for comments labels Oct 1, 2020
@wooorm wooorm self-assigned this Oct 1, 2020
@BarryThePenguin BarryThePenguin left a comment

🚀

wooorm commented Oct 7, 2020

Update on the ecosystem

I checked with the community (see the referenced issues above). Most plugins are fine. I’m in contact with authors of stuff that isn’t.

Here is a breakdown of the stuff maintained in the remarkjs org.

Changes

These plugins have new versions which work with the new parser/compiler, but don’t with remark@prev.

  • remark-frontmatter
  • remark-footnotes
  • remark-gfm (new)
  • remarkjs/remark-heading-gap
  • remarkjs/remark-yaml-config
  • remarkjs/remark-comment-config
  • remark-github
  • remark-breaks
  • remarkjs/remark-gemoji
  • remarkjs/remark-math
  • remarkjs/remark-lint (depends, 8% of the tests failed so most is fine, but some subplugins received updates)

Tiny changes

These plugins received a tiny update to match CommonMark, but otherwise work the same with remark@next and remark@prev.

  • remarkjs/remark-html
  • remarkjs/remark-rehype
  • remarkjs/remark-external-links
  • remarkjs/remark-inline-links
  • remarkjs/remark-strip-badges

No changes

These plugins did not need any update at all for remark@next

  • rehypejs/rehype-remark
  • remarkjs/remark-slug
  • remarkjs/remark-squeeze-paragraphs
  • remarkjs/remark-retext
  • remarkjs/remark-validate-links
  • remarkjs/remark-autolink-headings
  • remarkjs/strip-markdown
  • remarkjs/remark-react
  • remarkjs/remark-message-control
  • remarkjs/remark-reference-links
  • remarkjs/remark-images
  • remarkjs/remark-highlight.js
  • remarkjs/remark-unwrap-images
  • remarkjs/remark-textr
  • remarkjs/remark-license
  • remarkjs/remark-normalize-headings
  • remarkjs/remark-unlink
  • remarkjs/remark-usage
  • remarkjs/remark-defsplit
  • remarkjs/remark-embed-images
  • remarkjs/remark-midas
  • remarkjs/remark-contributors
  • remarkjs/remark-git-contributors
  • remarkjs/remark-man
  • remarkjs/remark-vdom

Archived

Not used a lot, too much time in updating:

  • remarkjs/remark-bookmarks (archived)

@wooorm wooorm merged commit 48b1278 into main Oct 13, 2020
@wooorm wooorm deleted the next branch October 13, 2020 16:30
fisker added a commit to fisker/prettier that referenced this pull request Oct 14, 2020
@wooorm wooorm added ⛵️ status/released and removed 🙆 yes/confirmed This is confirmed and ready to be worked on labels Oct 14, 2020
wooorm commented Oct 14, 2020

This is now released in remark@13.0.0

Martii added a commit to Martii/OpenUserJS.org that referenced this pull request Oct 19, 2020
* Please read their CHANGELOGs
* *remark*, *remark-strip-html*, and *strip-markdown* are on hold since they are interdependent and need in-depth retesting. See craftzdog/remark-strip-html#2, remarkjs/remark#536, and remarkjs/strip-markdown@0ceb371#diff-5a831ea67cf5cf8703b0de46901ab25bd191f56b320053be9332d9a3b0d01d15
* *sanitize-html* CHANGELOG at https://github.com/apostrophecms/sanitize-html/blob/main/CHANGELOG.md#200-2020-09-23 . We don't DOM insert , pro *node* is acceptable, and we override `allowedTags` to usually match GH.
* *spdx-license-ids* is going to take some time as a bunch of new ones have been added and need to be cross-checked/restricted. On hold.
* *moment* is in "maintenance mode" and deprecated. Will address this much later.
* Delete op retested
Martii added a commit to OpenUserJS/OpenUserJS.org that referenced this pull request Oct 19, 2020

@wooorm wooorm added the 💪 phase/solved Post is done label Aug 4, 2021
@wooorm wooorm mentioned this pull request Nov 18, 2021