Nested list item becomes single paragraph when indenting >=4 spaces #523

lishid · 2020-08-19T10:36:30Z

Subject of the issue

It seems that when a list item indented by 4 spaces is followed by an unindented text block, the parser merges all 3 lines into a single paragraph within the first list item. This does not happen when no unindented text block follows, nor with 2 or 3 spaces of indentation. With CommonMark, the same behavior seems to occur only when indenting by 6 or more spaces.

Your environment

OS: Windows 10
Packages: remark-parse 8.0.3
Env: Chrome 84.0.4147.125

Steps to reproduce

Sample (note that the second list item is indented by exactly 4 spaces; the same happens with a tab character):

- Test
    - Test
Test

remark

CommonMark dingus

Observation

Interestingly, when the indentation is 2 or 3 spaces, the parser correctly recognizes the list item. Only with 4 or more spaces does remark fail to parse the indented list item.

With CommonMark, it seems like the required number of spaces to produce a similar result is 6 spaces instead of 4.
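The 6-space threshold seen in the CommonMark dingus can be reasoned about with a small model. The sketch below is a simplification written for this thread, not code from remark or micromark: it assumes spaces only (no tabs), bullet markers only, and a parent item whose paragraph is still open. The function names `contentOffset` and `classifyNextLine` are hypothetical. The idea is that indentation is measured relative to where the parent item's content starts (column 2 for `- Test`), and a relative indent of 4 or more can no longer start a list item, so the line is absorbed into the open paragraph.

```javascript
// Simplified model of how CommonMark treats the line after "- Test".
// Assumptions: spaces only, bullet markers only, parent paragraph still open.

// Column where the parent item's content starts, e.g. "- Test" -> 2.
function contentOffset(parentLine) {
  const match = /^( {0,3})([-+*])( {1,4})(?=\S)/.exec(parentLine);
  if (!match) throw new Error('not a bullet list item: ' + parentLine);
  return match[0].length;
}

function classifyNextLine(parentLine, nextLine) {
  const offset = contentOffset(parentLine);
  const indent = /^ */.exec(nextLine)[0].length;
  const relative = indent - offset; // indent as seen from inside the item
  const looksLikeBullet = /^[-+*] /.test(nextLine.slice(indent));

  if (relative < 0) {
    // Not indented far enough to belong to the item by indentation alone.
    return looksLikeBullet ? 'sibling list item' : 'lazy paragraph continuation';
  }
  if (relative >= 4) {
    // Would be indented code, which cannot interrupt a paragraph.
    return 'paragraph continuation';
  }
  return looksLikeBullet ? 'nested list item' : 'paragraph continuation';
}

console.log(classifyNextLine('- Test', '    - Test'));   // nested list item
console.log(classifyNextLine('- Test', '      - Test')); // paragraph continuation
console.log(classifyNextLine('- Test', 'Test'));         // lazy paragraph continuation
```

Under this model, 4 spaces is a relative indent of 2 and still starts a nested item, while 6 spaces is a relative indent of 4 and does not.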

Debugging

I've put in a few hours stepping through the code. It's a bit tedious and difficult to understand, but it would seem that the `normalListItem` function consumes the `-` at the beginning of the first line, which makes the difference in indentation 2 characters greater and may be causing the symptom observed.
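One way to picture this hypothesis, and the 4 versus 6 gap between remark and CommonMark, is the arithmetic below. This is purely illustrative and is not remark's source: `markerWidth` and `mergeThreshold` are made-up names, and the assumption is simply that a 4-space cutoff gets applied either to the raw indentation or to the indentation measured after the parent's `- ` marker.

```javascript
// Hypothetical illustration only; this is not how normalListItem is written.
const INDENTED_CODE_WIDTH = 4; // relative indent at which a line can no longer start a list item

// Width of the bullet marker plus the spaces after it, e.g. "- Test" -> 2.
function markerWidth(parentLine) {
  const match = /^[-+*] +/.exec(parentLine);
  return match ? match[0].length : 0;
}

// Smallest number of leading spaces at which the nested line gets merged
// into the parent's paragraph, depending on where indentation is measured from.
function mergeThreshold(parentLine, measureFromContent) {
  const base = measureFromContent ? markerWidth(parentLine) : 0;
  return base + INDENTED_CODE_WIDTH;
}

console.log(mergeThreshold('- Test', true));  // 6 (measured from the item's content)
console.log(mergeThreshold('- Test', false)); // 4 (measured from column 0)
```

The 2-character difference between the two results equals the width of `- `, which is consistent with the observation above but does not prove where the bug actually sits.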

Unfortunately, with my limited experience, I am unsure how to continue debugging or come up with a fix.

Additional Information

This behavior is similar to #315 so they might be related.

wooorm · 2020-08-22T14:58:43Z

Heya, just wanted to give an update about micromark, it’s sort-of a new motor that we’ll soon use in remark to parse markdown. It’s not yet 100% ready but will be relatively soon. The good news is, it fixes this issue! (P.S. see this twitter thread for some more info!)

lishid · 2020-08-22T19:10:51Z

Awesome, thanks for the update!

wooorm · 2020-10-01T15:18:27Z

Sorry for the wait! I just wanted to share that there’s now a PR that solves this issue: #536.

This is a giant change for remark. It replaces the 5+ year old internals with a new low-level parser: <https://github.com/micromark/micromark>

The old internals have served billions of users well over the years, but markdown has changed over that time. micromark comes with 100% CommonMark (and GFM as an extension) compliance, and (WIP) docs on parsing rules for how to tokenize markdown with a state machine: <https://github.com/micromark/common-markup-state-machine>. micromark, and micromark in remark, is a good base for the future.

`remark-parse` now defers its work to [`micromark`][micromark] and [`mdast-util-from-markdown`][from-markdown]. `micromark` is a new, small, complete, and CommonMark compliant low-level markdown parser. `from-markdown` turns its tokens into the previously (and still) used syntax tree: [mdast][]. Extensions to `remark-parse` work differently: they’re a two-part act. See for example [`micromark-extension-footnote`][micromark-footnote] and [`mdast-util-footnote`][from-markdown-footnote].

* change: `commonmark` is no longer an option, it’s the default
* move: `gfm` is no longer an option (moved to `remark-gfm`)
* remove: `pedantic` is no longer an option; this legacy and buggy flavor of markdown is no longer widely used
* remove: `blocks` is no longer an option; it’s no longer suggested to change the internal list of HTML “block” tag names

`remark-stringify` now defers its work to [`mdast-util-to-markdown`][to-markdown]. It’s a new and better serializer with powerful features to ensure serialized markdown represents the syntax tree (mdast), no matter what plugins do. Extensions to it work differently: see for example [`mdast-util-footnote`][to-markdown-footnote].

* change: `commonmark` is no longer an option, it’s the default
* change: `emphasis` now defaults to `*`
* change: `bullet` now defaults to `*`
* move: `gfm` is no longer an option (moved to `remark-gfm`)
* move: `tableCellPadding` moved to `remark-gfm`
* move: `tablePipeAlign` moved to `remark-gfm`
* move: `stringLength` moved to `remark-gfm`
* remove: `pedantic` is no longer an option; this legacy and buggy flavor of markdown is no longer widely used
* remove: `entities` is no longer an option; with CommonMark there is almost never a need to use character references, as character escapes are preferred
* new: `quote`, so you can now prefer single quotes (`'`) over double quotes (`"`) in titles

All of these are for CommonMark compatibility. Most of them are inconsequential.

* **notable**: references (as in, links `[text][id]` and images `![alt][id]`) are no longer present as such in the syntax tree if they don’t have a corresponding definition (`[id]: example.com`). The reason for this is that CommonMark requires `[text *emphasis start][undefined] emphasis end*` to be emphasis.
* **notable**: it is no longer possible to use two blank lines between two lists or a list and indented code. CommonMark prohibits it. For a solution, use an empty comment to end lists (`<!---->`)
* inconsequential: whitespace at the start and end of lines in paragraphs is now ignored
* inconsequential: `<mailto:foobarbaz>` are now correctly parsed, and the scheme is part of the tree
* inconsequential: indented code can now follow a block quote w/o blank line
* inconsequential: trailing indented blank lines after indented code are no longer part of that code
* inconsequential: character references and escapes are no longer present as separate text nodes
* inconsequential: character references which HTML allows but CommonMark doesn’t, such as `&copy` w/o the semicolon, are no longer recognized
* inconsequential: the `indent` field is no longer available on `position`
* fix: multiline setext headings
* fix: lazy lists
* fix: attention (emphasis, strong)
* fix: tabs
* fix: empty alt on images is now present as an empty string
* …plus a ton of other minor previous differences from CommonMark

To do:

* get folks to use this and report problems!
* make `remark-gfm`
* start making next branches for plugins
* get types into {from,to}-markdown and use them here

Closes GH-218. Closes GH-306. Closes GH-315. Closes GH-324. Closes GH-398. Closes GH-402. Closes GH-407. Closes GH-439. Closes GH-450. Closes GH-459. Closes GH-493. Closes GH-494. Closes GH-497. Closes GH-504. Closes GH-517. Closes GH-521. Closes GH-523. Closes remarkjs/remark-lint#111.

[micromark]: https://github.com/micromark/micromark
[from-markdown]: https://github.com/syntax-tree/mdast-util-from-markdown
[to-markdown]: https://github.com/syntax-tree/mdast-util-to-markdown
[micromark-footnote]: https://github.com/micromark/micromark-extension-footnote/blob/main/index.js
[to-markdown-footnote]: https://github.com/syntax-tree/mdast-util-footnote/blob/main/to-markdown.js
[from-markdown-footnote]: https://github.com/syntax-tree/mdast-util-footnote/blob/main/from-markdown.js
[mdast]: https://github.com/syntax-tree/mdast

wooorm · 2020-10-14T08:53:38Z

This is now released.

lishid · 2020-10-25T14:20:04Z

This is great, thank you!

lishid added the 🐛 type/bug and 🙉 open/needs-info labels on Aug 19, 2020

wooorm added the remark-parse, 🐛 type/bug, and 🙆 yes/confirmed labels and removed the 🐛 type/bug and 🙉 open/needs-info labels on Aug 22, 2020

wooorm mentioned this issue Oct 1, 2020

Change to use micromark #536

Merged

wooorm closed this as completed in #536 Oct 13, 2020

wooorm added the ⛵️ status/released label and removed the 🙆 yes/confirmed label on Oct 14, 2020

pastak mentioned this issue Nov 26, 2020

md2sb: nested list pastak/scrapbox-converter#51

Closed

wooorm added the 💪 phase/solved label on Aug 4, 2021

cadamini mentioned this issue Feb 3, 2023

Markdown: paragraphs in list items aren't being rendered as expected mdn/yari#5042

Closed
