
test(markdown-parser): add criterion benchmarks and CI workflow #9657

Merged
ematipico merged 8 commits into biomejs:main from jfmcdowell:bench/markdown-parser
Mar 29, 2026

Conversation

@jfmcdowell
Contributor

@jfmcdowell jfmcdowell commented Mar 29, 2026

Note

AI Assistance Disclosure: This PR was developed with assistance from Codex.

Summary

Add Criterion benchmarks for biome_markdown_parser and a dedicated CodSpeed workflow for tracking parser performance regressions.

  • Follows the existing parser benchmark pattern used elsewhere in the repository
  • Adds 14 fixtures across real-world Markdown documents, CommonMark spec examples, and synthetic stress cases
  • Covers both cached and uncached parse modes
  • Adds CI coverage for benchmark regression tracking
  • Triggers the markdown benchmark workflow on benchmark fixture changes, so fixture-only updates also rerun CodSpeed
  • Excludes the benchmark fixtures from markdownlint in the workflow because they are test data for the parser, not docs, and they intentionally include Markdown that would fail style linting

Test Plan

  • just f
  • just l
  • cargo bench -p biome_markdown_parser

Docs

N/A — internal benchmarking and CI coverage only, with no user-facing documentation or changeset required.

Add parser benchmarks for biome_markdown_parser following the HTML parser
benchmark pattern. Includes 14 fixtures across three categories (real-world
documents, CommonMark spec examples, synthetic stress tests) with both
cached and uncached parse modes. Adds CodSpeed CI workflow for automated
regression detection.
@changeset-bot

changeset-bot bot commented Mar 29, 2026

⚠️ No Changeset found

Latest commit: 8cfeaaf

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions github-actions bot added the A-Parser (Area: parser) and L-Markdown (Language: Markdown) labels Mar 29, 2026
@codspeed-hq

codspeed-hq bot commented Mar 29, 2026

Merging this PR will not alter performance

✅ 58 untouched benchmarks
⏩ 168 skipped benchmarks [1]


Comparing jfmcdowell:bench/markdown-parser (8cfeaaf) with main (1dd4a56)


Footnotes

  1. 168 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@jfmcdowell jfmcdowell force-pushed the bench/markdown-parser branch from 17369f8 to 28accbb Compare March 29, 2026 02:10
@jfmcdowell jfmcdowell marked this pull request as ready for review March 29, 2026 03:16
@coderabbitai
Contributor

coderabbitai bot commented Mar 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a938c6ac-a124-46d6-814c-95b0d66ac387

📥 Commits

Reviewing files that changed from the base of the PR and between 15a48ac and 8cfeaaf.

📒 Files selected for processing (2)
  • .github/workflows/benchmark_markdown.yml
  • crates/biome_markdown_parser/benches/markdown_parser.rs

Walkthrough

Adds Markdown benchmarking: a new GitHub Actions workflow .github/workflows/benchmark_markdown.yml that runs CodSpeed in simulation mode on dispatched/PR/merge_group/push events; a Criterion-based benchmark binary benches/markdown_parser.rs that loads .md fixtures and measures cached vs uncached parsing; a [[bench]] target and codspeed/criterion dev-dependencies in crates/biome_markdown_parser/Cargo.toml; many new CommonMark/spec/synthetic Markdown fixtures under crates/biome_markdown_parser/benches/fixtures; and a small markdownlint exclusion tweak.

Possibly related PRs

Suggested labels

A-Tooling

Suggested reviewers

  • ematipico
  • dyc3
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title clearly and concisely summarises the main changeset: adding Criterion benchmarks and a CI workflow for the Markdown parser, matching the actual additions throughout the PR.
Description check | ✅ Passed | The description is well-structured and directly relevant to the changeset, detailing the benchmarks added, fixture coverage, CI workflow integration, and rationale for excluding fixtures from linting.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/biome_markdown_parser/benches/markdown_parser.rs (1)

61-63: Optional: sort fixtures for deterministic benchmark order

read_dir order is filesystem-dependent. Sorting keeps output stable and easier to compare run-to-run.

Patch suggestion
     visit(&fixtures_root, &fixtures_root, &mut cases);
+    cases.sort_unstable_by(|a, b| a.0.cmp(&b.0).then_with(|| a.1.cmp(&b.1)));
     cases
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/benches/markdown_parser.rs` around lines 61 -
63, The benchmark collects fixtures into the mutable Vec cases via
visit(&fixtures_root, &fixtures_root, &mut cases) but relies on filesystem
read_dir order, causing non-deterministic ordering; after populating cases
(before returning from the function that contains visit and cases), sort the
cases vector deterministically (e.g., using cases.sort_unstable() or
sort_by(|a,b| a.path.cmp(&b.path)) based on the fixture path/string) so the
benchmark runs in a stable, reproducible order.
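
A minimal, standard-library-only sketch of the ordering fix suggested above. It assumes the `(group, name, content)` tuple shape used by the PR's loader; the `Case` alias and the `sort_cases` helper are hypothetical names introduced only for illustration.

```rust
/// One benchmark case in the shape the loader builds: (group, name, content).
/// The alias name is an assumption made for this sketch.
type Case = (String, String, String);

/// Sort cases by group, then by file name, so the benchmark order no longer
/// depends on whatever order `read_dir` happens to return.
fn sort_cases(cases: &mut [Case]) {
    cases.sort_unstable_by(|a, b| a.0.cmp(&b.0).then_with(|| a.1.cmp(&b.1)));
}
```

An unstable sort is enough here as long as no two fixtures share both group and name, because the relative order of equal elements is the only thing it does not guarantee.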
.github/workflows/benchmark_markdown.yml (1)

10-27: Expand paths to include benchmark manifest/workflow edits

Right now, benchmark-relevant edits in crates/biome_markdown_parser/Cargo.toml (or this workflow file itself) won’t trigger the job when they’re the only changes.

Patch suggestion
   pull_request:
@@
     paths:
       - 'Cargo.lock'
+      - '.github/workflows/benchmark_markdown.yml'
+      - 'crates/biome_markdown_parser/Cargo.toml'
       - 'crates/biome_markdown_parser/benches/**/*.md'
       - 'crates/biome_markdown_parser/**/*.rs'
@@
   push:
@@
     paths:
       - 'Cargo.lock'
+      - '.github/workflows/benchmark_markdown.yml'
+      - 'crates/biome_markdown_parser/Cargo.toml'
       - 'crates/biome_markdown_parser/benches/**/*.md'
       - 'crates/biome_markdown_parser/**/*.rs'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/benchmark_markdown.yml around lines 10 - 27, The
workflow's paths filter omits package manifests and the workflow file itself so
changes to crates/biome_markdown_parser/Cargo.toml or this workflow won't
trigger the job; update the 'paths' arrays (both the top-level and the 'push'
block) to include the Cargo.toml for the markdown crate (e.g.,
'crates/biome_markdown_parser/Cargo.toml') and the workflow file (e.g.,
'.github/workflows/benchmark_markdown.yml') so edits to those files will trigger
the workflow—modify the existing paths entries in this file to add these two
paths alongside the existing entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/benches/markdown_parser.rs`:
- Around line 31-33: The fixture loader currently ignores read errors and can
yield zero benchmarks; update the visit function to propagate or fail on
read_dir errors instead of silently skipping (change if let Ok(entries) =
fs::read_dir(dir) to using fs::read_dir(dir).expect/unwrap or return a Result
and propagate the error) and after collecting fixtures assert or panic if the
cases Vec is empty so the benchmark fails fast; apply the same change to the
other similar loader block around the 61-67 region to ensure I/O failures or
missing fixtures do not produce a “successful” empty run.
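
One way the fail-fast behaviour described in this comment could look, as a sketch rather than the PR's actual code. `markdown_files_or_panic` is a hypothetical helper; it uses only `std::fs` and panics on I/O errors or when no `.md` file is found, so a wrong fixture path cannot yield a "successful" empty run.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Collect the `.md` files directly inside `dir`.
/// Panics on any I/O error and on an empty result (hypothetical helper).
fn markdown_files_or_panic(dir: &Path) -> Vec<PathBuf> {
    let entries = fs::read_dir(dir)
        .unwrap_or_else(|err| panic!("failed to read fixtures directory {:?}: {}", dir, err));

    let mut files: Vec<PathBuf> = entries
        .map(|entry| {
            entry
                .unwrap_or_else(|err| panic!("failed to read entry in {:?}: {}", dir, err))
                .path()
        })
        .filter(|path| {
            path.is_file() && path.extension().and_then(|e| e.to_str()) == Some("md")
        })
        .collect();

    // Fail fast instead of silently benchmarking nothing.
    assert!(!files.is_empty(), "no .md fixtures found in {:?}", dir);
    files.sort();
    files
}
```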


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 817ee070-6319-40f1-baa6-33d3fe17c289

📥 Commits

Reviewing files that changed from the base of the PR and between c17e08e and 15b03e5.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock and included by **
📒 Files selected for processing (18)
  • .github/workflows/benchmark_markdown.yml
  • .github/workflows/pull_request_markdown.yml
  • crates/biome_markdown_parser/Cargo.toml
  • crates/biome_markdown_parser/benches/fixtures/real/blog-post.md
  • crates/biome_markdown_parser/benches/fixtures/real/readme-style.md
  • crates/biome_markdown_parser/benches/fixtures/spec/autolinks.md
  • crates/biome_markdown_parser/benches/fixtures/spec/blockquotes.md
  • crates/biome_markdown_parser/benches/fixtures/spec/emphasis.md
  • crates/biome_markdown_parser/benches/fixtures/spec/inline-html.md
  • crates/biome_markdown_parser/benches/fixtures/spec/links.md
  • crates/biome_markdown_parser/benches/fixtures/spec/lists.md
  • crates/biome_markdown_parser/benches/fixtures/synthetic/blockquotes-nested.md
  • crates/biome_markdown_parser/benches/fixtures/synthetic/emphasis-heavy.md
  • crates/biome_markdown_parser/benches/fixtures/synthetic/inline-html.md
  • crates/biome_markdown_parser/benches/fixtures/synthetic/links-and-images.md
  • crates/biome_markdown_parser/benches/fixtures/synthetic/long-paragraphs.md
  • crates/biome_markdown_parser/benches/fixtures/synthetic/nested-lists.md
  • crates/biome_markdown_parser/benches/markdown_parser.rs

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
crates/biome_markdown_parser/benches/markdown_parser.rs (2)

94-105: Consider adding a steady-state cached benchmark variant.

Current iter_batched mode recreates cache per sample, so this mainly captures a “warmed once, parsed once” path. A second variant reusing one cache across many iter calls would give a useful long-run cache signal too.
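
The difference between the two benchmark shapes can be illustrated without Criterion. The sketch below is a toy model only: `ToyCache` and `toy_parse` are hypothetical stand-ins, not `NodeCache` or the real parser, and it assumes the per-sample setup warms the cache with one unmeasured parse. It records how many parses the cache had already served when each measured parse runs.

```rust
/// Hypothetical stand-in for a parser cache: it only counts how many parses
/// it has served, which is enough to show how long the cache lives.
#[derive(Default)]
struct ToyCache {
    parses_seen: usize,
}

/// Toy "parse with cache" routine. Returns how many parses the cache had
/// already served before this call.
fn toy_parse(input: &str, cache: &mut ToyCache) -> usize {
    let _ = input.len(); // placeholder for real parsing work
    let before = cache.parses_seen;
    cache.parses_seen += 1;
    before
}

/// Per-sample setup: every sample builds a cache, warms it once, then runs
/// the measured parse against it.
fn warmed_once(input: &str, samples: usize) -> Vec<usize> {
    (0..samples)
        .map(|_| {
            let mut cache = ToyCache::default();
            toy_parse(input, &mut cache); // warm-up, not measured
            toy_parse(input, &mut cache) // measured parse
        })
        .collect()
}

/// Steady state: one cache outlives every sample.
fn steady_state(input: &str, samples: usize) -> Vec<usize> {
    let mut cache = ToyCache::default();
    toy_parse(input, &mut cache); // warm-up, not measured
    (0..samples).map(|_| toy_parse(input, &mut cache)).collect()
}
```

In the first shape every measured parse sees a cache of the same age; in the second the cache keeps ageing, which is the long-run signal the comment asks for.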

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/benches/markdown_parser.rs` around lines 94 -
105, The cached benchmark currently recreates NodeCache per sample using
b.iter_batched, which measures a "warmed once" path; add a steady-state variant
that reuses a single NodeCache across many iterations to measure long-run cache
behavior by creating a mutable NodeCache once (outside the per-iteration
closure) and calling parse_markdown_with_cache(code, &mut cache,
options.clone()) inside b.iter or group.bench_with_input's closure, referencing
the same cache for each iteration (use a mutable binding or a small
synchronization wrapper if necessary), and name the BenchmarkId something like
"cached_steady" so both cached-warm and cached-steady cases are reported.

47-58: Use relative fixture path for benchmark identity to avoid collisions.

Because discovery is recursive, (first_dir, basename) is not unique. Two nested files can collapse to the same benchmark id later. Prefer storing the full relative path as name.

♻️ Suggested refactor
-                let name = path
-                    .file_name()
-                    .and_then(|s| s.to_str())
-                    .unwrap_or_default()
-                    .to_string();
+                let name = rel.to_string_lossy().replace('\\', "/");
@@
-        let id = format!("{}/{}", group_name, name);
+        let id = name.clone();

Also applies to: 72-73, 86-86

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/benches/markdown_parser.rs` around lines 47 -
58, The benchmark identity currently uses only the file basename (`name`) which
can collide for nested files; change the code that sets `name` to use the full
relative path string instead of `path.file_name()` (e.g., replace the `name`
assignment with something like `let name = rel.to_string_lossy().to_string()`),
keep `group` as the first directory component from `rel`, and apply the same
change to the other occurrences where `name` is derived (the other `name`
assignments referenced in the comment).
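
A small sketch of the relative-path identity suggested here, using only `std::path`. `fixture_id` is a hypothetical helper name, and the nested paths in the usage note are made up for illustration.

```rust
use std::path::Path;

/// Build a benchmark id from a fixture path relative to the fixtures root,
/// normalising Windows separators so ids look the same on every platform.
/// Falls back to the full path if it is not under `root`.
fn fixture_id(root: &Path, path: &Path) -> String {
    let rel = path.strip_prefix(root).unwrap_or(path);
    rel.to_string_lossy().replace('\\', "/")
}
```

With a `(first_dir, basename)` identity, hypothetical files `spec/a/case.md` and `spec/b/case.md` would both map to `spec/case.md`; with the full relative path they stay distinct.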

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8910058c-e346-440e-a283-bc2d70156d47

📥 Commits

Reviewing files that changed from the base of the PR and between ecd068c and 15a48ac.

📒 Files selected for processing (1)
  • crates/biome_markdown_parser/benches/markdown_parser.rs

@ematipico
Member

It seems that the benchmarks weren't added to codspeed

Comment on lines +27 to +65
fn load_fixtures() -> Vec<(String, String, String)> {
    let fixtures_root = Path::new(env!("CARGO_MANIFEST_DIR")).join("benches/fixtures");
    let mut cases = Vec::new();

    fn visit(dir: &Path, root: &Path, cases: &mut Vec<(String, String, String)>) {
        let entries = fs::read_dir(dir).unwrap_or_else(|err| {
            panic!("failed to read benchmark fixtures directory {dir:?}: {err}")
        });

        for entry in entries {
            let entry = entry.unwrap_or_else(|err| {
                panic!("failed to read benchmark fixture entry in {dir:?}: {err}")
            });
            let path = entry.path();
            if path.is_dir() {
                visit(&path, root, cases);
            } else if path.is_file() {
                if !matches!(path.extension().and_then(|e| e.to_str()), Some("md")) {
                    continue;
                }
                let rel = path.strip_prefix(root).unwrap_or(&path);
                let group = rel
                    .iter()
                    .next()
                    .and_then(|s| s.to_str())
                    .unwrap_or("root")
                    .to_string();
                let name = path
                    .file_name()
                    .and_then(|s| s.to_str())
                    .unwrap_or_default()
                    .to_string();
                let content = fs::read_to_string(&path).unwrap_or_else(|err| {
                    panic!("failed to read benchmark fixture {path:?}: {err}")
                });
                cases.push((group, name, content));
            }
        }
    }
Member


Let's keep it simple, please. No need for recursion or fancy things. Let's plainly load the files we have. If a new file is added, we add it here too. That's what we tend to do in the other benchmarks

Contributor Author


Agreed, that was unnecessary. I replaced the recursive helper with a simple loop over the fixture directories.
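
The merged code is not shown in this thread, so the following is only a guess at what "a simple loop over the fixture directories" might look like. It assumes the three group directories from the PR's file list (`real`, `spec`, `synthetic`) and takes the fixtures root as a parameter; the `GROUPS` constant and this version of `load_fixtures` are illustrative, not the author's implementation.

```rust
use std::fs;
use std::path::Path;

/// Fixture groups named in the PR's file list.
const GROUPS: [&str; 3] = ["real", "spec", "synthetic"];

/// Load every `.md` file that sits directly inside one of the group
/// directories, without recursion. Returns (group, file name, content),
/// sorted so the order is stable across filesystems.
fn load_fixtures(root: &Path) -> Vec<(String, String, String)> {
    let mut cases = Vec::new();
    for &group in GROUPS.iter() {
        let dir = root.join(group);
        let entries = fs::read_dir(&dir)
            .unwrap_or_else(|err| panic!("failed to read {:?}: {}", dir, err));
        for entry in entries {
            let path = entry
                .unwrap_or_else(|err| panic!("failed to read entry in {:?}: {}", dir, err))
                .path();
            // Skip subdirectories and anything that is not a Markdown file.
            if !path.is_file() || path.extension().and_then(|e| e.to_str()) != Some("md") {
                continue;
            }
            let name = path.file_name().unwrap().to_string_lossy().into_owned();
            let content = fs::read_to_string(&path)
                .unwrap_or_else(|err| panic!("failed to read {:?}: {}", path, err));
            cases.push((group.to_string(), name, content));
        }
    }
    cases.sort();
    cases
}
```

In a bench target the root would typically be `Path::new(env!("CARGO_MANIFEST_DIR")).join("benches/fixtures")`, as in the snippet under review.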

@Conaclos Conaclos added and then removed the M-Likely Agent label (This was likely an automated PR without a human in the loop) Mar 29, 2026
@jfmcdowell
Contributor Author

> It seems that the benchmarks weren't added to codspeed

@ematipico that's fixed now.

@ematipico ematipico merged commit 63fcbb3 into biomejs:main Mar 29, 2026
35 checks passed
@jfmcdowell jfmcdowell deleted the bench/markdown-parser branch March 29, 2026 17:41

3 participants