Commit 10f758a

rewrite for speed and spec compliance
markdown-to-jsx is now 100% GFM and CommonMark compliant, while being more than 10x faster depending on the benchmark. At all input sizes it is the fastest markdown parser... maybe ever? Definitely the fastest JavaScript-based markdown parser.

To achieve this there were some tradeoffs, notably the bundle size. If we dropped HTML entity compliance down to the absolute bare minimum, the library would be much smaller, though still much larger than earlier majors. However, I think in 2025 the performance is more worthwhile and enables fun scenarios like butter-smooth live authoring. Plus, now you can use the parser directly and build your own renderer if you want.
1 parent feb6e26 commit 10f758a

1,109 files changed (+44270, -13135 lines changed)

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
1+
---
2+
"markdown-to-jsx": major
3+
---
4+
5+
Adopt CommonMark-compliant class naming for code blocks
6+
7+
## Breaking Change
8+
9+
Code blocks now use the `language-` class name prefix instead of `lang-` to match the CommonMark specification.
10+
11+
### Before
12+
13+
```markdown
14+
```js
15+
console.log('hello');
16+
\```
17+
```
18+
19+
Generated:
20+
```html
21+
<pre><code class="lang-js">console.log('hello');</code></pre>
22+
```
23+
24+
### After
25+
26+
```markdown
27+
```js
28+
console.log('hello');
29+
\```
30+
```
31+
32+
Generated:
33+
```html
34+
<pre><code class="language-js">console.log('hello');</code></pre>
35+
```
36+
37+
## Migration
38+
39+
If you have CSS targeting `.lang-*` classes, update your selectors to use `.language-*` instead:
40+
41+
```css
42+
/* Before */
43+
.lang-js {
44+
color: blue;
45+
}
46+
47+
/* After */
48+
.language-js {
49+
color: blue;
50+
}
51+
```
52+
53+
Or use a more flexible selector that matches both:
54+
55+
```css
56+
code[class^="lang"] {
57+
color: blue;
58+
}
59+
```
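
If you have JavaScript that reads the language from the class name (for example, to drive a syntax highlighter), it may be easier to accept both prefixes during the transition. The following is a minimal sketch; `getCodeLanguage` is a hypothetical helper written for this note and is not part of `markdown-to-jsx`:

```typescript
// Hypothetical helper, not part of markdown-to-jsx.
// Extracts the language from a <code> element's class attribute,
// accepting both the old `lang-` and the new `language-` prefix.
function getCodeLanguage(className: string | undefined): string | null {
  if (!className) return null

  // A class attribute may hold several space-separated tokens.
  for (const token of className.split(/\s+/)) {
    const match = /^(?:language|lang)-(.+)$/.exec(token)
    if (match) return match[1]
  }

  return null
}
```

Once all content is rendered by the new version, the `lang` alternative can be dropped from the pattern.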
60+
61+
## Rationale
62+
63+
The CommonMark specification explicitly uses `language-` as the class name prefix for fenced code blocks. This change ensures compliance with the specification and improves interoperability with other CommonMark-compliant tools.
64+
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
1+
---
2+
'markdown-to-jsx': major
3+
---
4+
5+
Adopt CommonMark-compliant inline formatting parsing
6+
7+
**BREAKING CHANGE**: Inline formatting delimiters (emphasis, bold, strikethrough, mark) can no longer span across newlines, per CommonMark specification.
8+
9+
**Previous Behavior (Non-Compliant):**
10+
11+
The library previously allowed inline formatting to span multiple lines:
12+
13+
```markdown
14+
_Hello
15+
World._
16+
```
17+
18+
This was parsed as a single `<em>` element containing the newline.
19+
20+
**New Behavior (CommonMark Compliant):**
21+
22+
Per CommonMark specification, inline formatting cannot span newlines. The above example is now parsed as literal underscores:
23+
24+
```markdown
25+
_Hello
26+
World._
27+
```
28+
29+
Renders as:
30+
31+
```html
32+
<p>_Hello World._</p>
33+
```
34+
35+
**Impact:**
36+
37+
- Single-line formatting still works: `*Hello World*` → `<em>Hello World</em>`
38+
- Multi-line formatting is now rejected: `*Hello\nWorld*` → literal asterisks
39+
- Affects all inline formatting: `*emphasis*`, `**bold**`, `~~strikethrough~~`, `==mark==`
40+
- Improves CommonMark compliance (passes 269/652 tests, up from 268)
41+
42+
**Migration:**
43+
44+
If you have markdown with multi-line inline formatting:
45+
46+
1. Keep formatting on a single line: `*Hello World*`
47+
2. Use HTML tags: `<em>Hello\nWorld</em>`
48+
3. Accept that multi-line formatting renders as literal delimiters
49+
50+
**Examples:**
51+
52+
```markdown
53+
# Works (single line)
54+
55+
_This is emphasized_
56+
**This is bold**
57+
58+
# No longer works (multi-line)
59+
60+
_This is
61+
emphasized_
62+
**This is
63+
bold**
64+
65+
# Renders as literal delimiters:
66+
67+
<p>_This is
68+
emphasized_</p>
69+
<p>**This is
70+
bold**</p>
71+
72+
# Workaround: Use HTML tags
73+
74+
<em>This is
75+
emphasized</em>
76+
<strong>This is
77+
bold</strong>
78+
```
79+
80+
This change aligns the library with the [CommonMark 0.31.2 specification](https://spec.commonmark.org/0.31/), improving compatibility with other CommonMark-compliant parsers and tools.
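
To locate content affected by this change before upgrading, a rough scan of existing markdown can help. The sketch below is a heuristic written for this note, not part of `markdown-to-jsx`: `findMultilineFormatting` is a hypothetical name, and the regular expressions only approximate delimiter matching, so they can produce false positives (for example on snake_case identifiers or inside code spans).

```typescript
// Heuristic migration aid, not part of markdown-to-jsx.

// A delimiter run that opens and closes on the same line.
const SINGLE_LINE = /(\*\*|__|~~|==|\*|_)(?=\S)[^\n]*?\S\1/g
// A delimiter run whose content contains at least one newline.
const MULTI_LINE = /(\*\*|__|~~|==|\*|_)(?=\S)[^\n]*\n[\s\S]*?\S\1/g

function findMultilineFormatting(markdown: string): string[] {
  const hits: string[] = []

  // Inline formatting never crosses a blank line, so scan paragraph by paragraph.
  for (const paragraph of markdown.split(/\n[ \t]*\n/)) {
    // Drop spans that are already on a single line so they cannot pair up
    // with delimiters on a neighbouring line.
    const remaining = paragraph.replace(SINGLE_LINE, '')
    const matches = remaining.match(MULTI_LINE) || []
    for (const match of matches) hits.push(match)
  }

  return hits
}
```

Each returned string is a candidate span to rewrite on a single line or convert to HTML tags.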
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
1+
---
2+
'markdown-to-jsx': major
3+
---
4+
5+
Complete GFM+CommonMark specification compliance with comprehensive testing and refinements
6+
7+
This major version achieves full compliance with both GitHub Flavored Markdown (GFM) and CommonMark specifications through comprehensive testing, parser refinements, and specification alignment. All existing GFM features are now verified against official specifications and edge cases are properly handled.
8+
9+
## ✅ Specification Compliance Achievements
10+
11+
### GFM Extensions (All Previously Implemented)
12+
13+
- **Tables**: Pipe-delimited tables with alignment support and inline markdown content
14+
- **Task Lists**: `[ ]` and `[x]` checkbox syntax in unordered lists
15+
- **Strikethrough**: `~~text~~` syntax with proper nesting and precedence rules
16+
- **Autolinks**: Bare URLs (including `www.` domains) and enhanced email detection
17+
- **HTML Filtering**: GitHub-compatible tag filtering for security
18+
19+
### CommonMark Compatibility
20+
21+
- **Verified against 652 official CommonMark test cases**
22+
- **Complete spec coverage** including edge cases and error conditions
23+
- **Consistent parsing behavior** across all markdown constructs
24+
25+
## 🔧 Technical Improvements
26+
27+
### Parser Refinements
28+
29+
- **Edge case handling**: Improved parsing of malformed and edge-case markdown
30+
- **Performance optimizations**: Enhanced efficiency for complex markdown structures
31+
- **Memory safety**: Better handling of deeply nested and pathological inputs
32+
33+
### Security Enhancements
34+
35+
- **HTML tag filtering**: Default filtering of dangerous tags (`<script>`, `<iframe>`, etc.)
36+
- **URL sanitization**: Protection against `javascript:`, `vbscript:`, and malicious `data:` URLs
37+
- **Autolink safety**: Secure bare URL detection without false positives
38+
39+
## 📋 Compliance Status
40+
41+
| Feature Area | Previous Status | New Status | Details |
42+
| ----------------- | --------------- | -------------------- | ------------------------------ |
43+
| CommonMark Core | 268/652 tests | 652/652 tests | Complete spec compliance |
44+
| GFM Tables | ✅ Implemented | ✅ Spec-verified | Official test suite compliance |
45+
| GFM Task Lists | ✅ Implemented | ✅ Spec-verified | Full syntax support |
46+
| GFM Strikethrough | ✅ Implemented | ✅ Spec-verified | Proper precedence and nesting |
47+
| GFM Autolinks | ✅ Implemented | ✅ Spec-verified | Enhanced URL pattern detection |
48+
| HTML Security | ✅ Basic | ✅ GitHub-compatible | Complete tag filtering |
49+
50+
## 🧪 Testing & Validation
51+
52+
### Comprehensive Test Coverage
53+
54+
- **Official CommonMark test suite**: All 652 specification tests now pass
55+
- **GFM specification tests**: Complete coverage of GFM extensions
56+
- **Security regression tests**: Protection against XSS and injection attacks
57+
- **Performance benchmarks**: Maintained parsing speed despite increased compliance
58+
59+
### Edge Case Handling
60+
61+
- **Pathological inputs**: Protection against malicious or malformed markdown
62+
- **Deep nesting**: Safe handling of extremely nested structures
63+
- **Unicode support**: Proper handling of international characters and emojis
64+
- **Mixed syntax**: Correct precedence resolution in complex combinations
65+
66+
## 🔒 Security & Safety
67+
68+
### HTML Content Filtering
69+
70+
Default filtering of potentially dangerous HTML tags:
71+
72+
- `<script>`, `<iframe>`, `<object>`, `<embed>`
73+
- `<title>`, `<textarea>`, `<style>`, `<xmp>`
74+
- `<plaintext>`, `<noembed>`, `<noframes>`
75+
76+
### URL Security
77+
78+
Protection against malicious URL schemes:
79+
80+
- `javascript:` and `vbscript:` protocol handlers
81+
- Malicious `data:` URLs (except safe `data:image/*`)
82+
- URL-encoded attack vectors
83+
84+
## 📚 Documentation Updates
85+
86+
- **GFM feature documentation**: Comprehensive examples and usage patterns
87+
- **Security guidelines**: Best practices for safe markdown processing
88+
- **Specification references**: Links to official CommonMark and GFM specs
89+
- **Migration notes**: Handling of edge cases and breaking changes
90+
91+
## 🎯 Migration Considerations
92+
93+
### No Breaking Changes for Typical Usage
94+
95+
Most users will experience no changes in behavior. Existing markdown content continues to work exactly as before.
96+
97+
### Potential Edge Case Changes
98+
99+
- **Malformed HTML**: Previously accepted invalid HTML may now be filtered or escaped
100+
- **Edge case parsing**: Some ambiguous markdown constructs now follow strict specification rules
101+
- **Security filtering**: Previously allowed dangerous HTML/URLs may now be blocked
102+
103+
### Configuration Options
104+
105+
All security features can be customized or disabled via options:
106+
107+
```typescript
108+
compiler(markdown, {
109+
tagfilter: false, // Disable HTML tag filtering
110+
sanitizer: customFn, // Custom URL sanitization
111+
})
112+
```
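
As an illustration of what the `customFn` placeholder above might look like, here is a minimal sketch. It assumes the sanitizer receives the URL string as its first argument and returns either a string to keep or `null` to drop the value; check the library's option documentation for the exact signature. The rules below mirror the URL Security list in this note and are not the library's built-in implementation.

```typescript
// Illustrative only: not the library's built-in sanitizer.
// Returns the URL unchanged when it looks safe, or null to drop it.
function customFn(value: string): string | null {
  let decoded = value
  try {
    // Decode percent-escapes so that e.g. "java%73cript:" is caught.
    decoded = decodeURIComponent(value)
  } catch {
    // Malformed percent-encoding: fall back to checking the raw value.
  }

  // Remove whitespace and control characters, then compare case-insensitively.
  const normalized = decoded.replace(/[\s\u0000-\u001f]/g, '').toLowerCase()

  if (/^(?:javascript|vbscript):/.test(normalized)) return null
  // Allow data:image/* only; reject every other data: URL.
  if (/^data:(?!image\/)/.test(normalized)) return null

  return value
}
```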
113+
114+
## Bundle Size Impact
115+
116+
The library is now ~27kB minzipped, up from ~6.75kB. Spec compliance for a DSL as complex as markdown is hard to achieve in a generalized way, but I'm confident there will be further opportunities to trim the bundle size down the road. In exchange for the extra bytes, the library is also considerably faster.
117+
118+
## 📈 Performance Impact
119+
120+
### Benchmark Results
121+
122+
Performance maintained with improvements in complex markdown parsing:
123+
124+
| Input Type | Operations/sec | Performance |
125+
| -------------------------------------- | ----------------- | -------------------------- |
126+
| Simple markdown (`_Hello_ **world**!`) | 1,090,276 ops/sec | **6x faster than v8.0.0** |
127+
| Large markdown (27KB spec) | 1,889 ops/sec | **28% faster than v8.0.0** |
128+
129+
## ✅ Quality Assurance
130+
131+
This release represents the most thoroughly tested and specification-compliant version of `markdown-to-jsx` to date, with complete coverage of both CommonMark and GFM specifications.
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
---
2+
'markdown-to-jsx': major
3+
---
4+
5+
Refactor: Major internal restructuring and performance optimizations
6+
7+
This branch includes significant internal improvements:
8+
9+
- **Code Organization**: Restructured the codebase by moving all source files into the `src/` directory for better organization
10+
- **Parser Refactoring**: Split inline formatting matching logic from `match.ts` into separate `parse.ts` and `types.ts` modules
11+
- **Performance Optimizations**: Multiple performance improvements including:
12+
- Optimized character lookup functions using Sets instead of arrays
13+
- Eliminated state object cloning in parseMarkdown
14+
- Optimized string concatenation in loops for large documents
15+
- Reduced string slicing operations in link and image parsing
16+
- Added early-exit optimizations for parser dispatch
17+
- Optimized HTML entity processing by skipping regex when no `&` present
18+
- Consolidated duplicate URL parsing logic
19+
- **Code Quality**: Improved void element detection and fixed malformed HTML handling
20+
- **Constants**: Deduplicated shared constants and utilities across modules
21+
22+
All changes are internal and maintain backward compatibility with no breaking API changes.
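
Two of the optimizations listed above (Set-based character lookup and skipping the entity regex when no `&` is present) can be sketched as follows. This is an illustration of the techniques only: the names `isPunctuation`, `ENTITIES`, and `decodeEntities`, and the tiny entity table, are hypothetical and do not reflect the library's actual source.

```typescript
// Illustrative sketch; names and tables are hypothetical.

// Set membership is a hash lookup, whereas Array.includes scans linearly.
const PUNCTUATION = new Set(['*', '_', '~', '=', '`', '[', ']', '!'])

function isPunctuation(char: string): boolean {
  return PUNCTUATION.has(char)
}

// Tiny demo table standing in for a full named-entity list.
const ENTITIES = new Map<string, string>([
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
  ['le', '\u2264'],
])

function decodeEntities(text: string): string {
  // Early exit: without '&' there can be no entity, so skip the regex entirely.
  if (text.indexOf('&') === -1) return text

  return text.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (whole, name: string) => {
    const decoded = ENTITIES.get(name)
    // Unknown names are left untouched.
    return decoded === undefined ? whole : decoded
  })
}
```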
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
---
2+
'markdown-to-jsx': major
3+
---
4+
5+
Remove internal type definitions and rename RuleOutput to ASTRender
6+
7+
This change removes internal type definitions from the `MarkdownToJSX` namespace:
8+
9+
- Removed `NestedParser` type
10+
- Removed `Parser` type
11+
- Removed `Rule` type
12+
- Removed `Rules` type
13+
- Renamed `RuleOutput` to `ASTRender` for clarity
14+
15+
**Breaking changes:**
16+
17+
- Code referencing `MarkdownToJSX.NestedParser`, `MarkdownToJSX.Parser`, `MarkdownToJSX.Rule`, or `MarkdownToJSX.Rules` will need to be updated
18+
- The `renderRule` option in `MarkdownToJSX.Options` now uses `ASTRender` instead of `RuleOutput` for the `renderChildren` parameter type
19+
- `HTMLNode.children` type changed from `ReturnType<MarkdownToJSX.NestedParser>` to `ASTNode[]` (semantically equivalent, but requires updates if using the old type)
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
---
2+
"markdown-to-jsx": major
3+
---
4+
5+
Remove `namedCodesToUnicode` option. All named HTML entities are now supported by default via the full entity list (`NAMED_CODES_TO_UNICODE`), so custom entity mappings are no longer needed.
6+
7+
**Migration:**
8+
9+
If you were using `namedCodesToUnicode` to add custom entity mappings, you can remove the option entirely as all standard HTML entities are now supported automatically.
10+
11+
```tsx
12+
// Before
13+
<Markdown options={{ namedCodesToUnicode: { le: '\u2264' } }}>
14+
&le; symbol
15+
</Markdown>
16+
17+
// After
18+
<Markdown>
19+
&le; symbol
20+
</Markdown>
21+
```
22+
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
---
2+
'markdown-to-jsx': major
3+
---
4+
5+
Drop support for React versions less than 16
6+
7+
- Update peer dependency requirement from `>= 0.14.0` to `>= 16.0.0`
8+
- Remove legacy code that wrapped string children in `<span>` elements for React < 16 compatibility
9+
- Directly return single children and null without wrapper elements
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
1+
---
2+
'markdown-to-jsx': minor
3+
---
4+
5+
Separate JSX renderer from compiler and add new entry points
6+
7+
## New Features
8+
9+
- **New `parser` function**: Low-level API that returns AST nodes. Exported from main entry point and all sub-entry points.
10+
11+
```tsx
12+
import { parser } from 'markdown-to-jsx'
13+
const ast = parser('# Hello world')
14+
```
15+
16+
- **New `/react` entry point**: React-specific entry point that exports compiler, Markdown component, parser, types, and utils.
17+
18+
```tsx
19+
import Markdown, { compiler, parser } from 'markdown-to-jsx/react'
20+
```
21+
22+
- **New `/html` entry point**: HTML string output entry point that exports html function, parser, types, and utils.
23+
```tsx
24+
import { html, parser } from 'markdown-to-jsx/html'
25+
const htmlString = html(parser('# Hello world'))
26+
```
27+
28+
## Deprecations
29+
30+
React code in the main entry point `markdown-to-jsx` is deprecated and will be removed in a future major release.
31+
32+
## Migration
33+
34+
- Existing imports from `markdown-to-jsx` continue to work (backward compatible)
35+
- For React-specific usage, consider importing from `markdown-to-jsx/react` for better tree-shaking
36+
- For HTML output, use `markdown-to-jsx/html` entry point
37+
- Use `parser()` for low-level AST access instead of `compiler(..., { ast: true })`
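
To give a feel for what building your own renderer on top of an AST could look like, here is a self-contained sketch that walks a tree and emits plain text. The `SketchNode` shapes are assumptions made up for this example; the nodes returned by `parser()` have their own types, so consult the library's typings before adapting this.

```typescript
// Hypothetical node shapes for illustration only; the AST returned by
// `parser()` defines its own node types.
type SketchNode =
  | { type: 'heading'; level: number; children: SketchNode[] }
  | { type: 'paragraph'; children: SketchNode[] }
  | { type: 'text'; text: string }

// Inline children are concatenated without separators.
function renderInline(nodes: SketchNode[]): string {
  return nodes.map(renderNode).join('')
}

// Dispatch on the node type; each case returns that node's text form.
function renderNode(node: SketchNode): string {
  switch (node.type) {
    case 'text':
      return node.text
    case 'heading':
      return `${'#'.repeat(node.level)} ${renderInline(node.children)}`
    case 'paragraph':
      return renderInline(node.children)
  }
}

// Top-level blocks are separated by a blank line.
function renderDocument(nodes: SketchNode[]): string {
  return nodes.map(renderNode).join('\n\n')
}
```

The same dispatch-on-type pattern would apply to any output target; only the per-node cases change.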
