test(markdown-parser): add criterion benchmarks and CI workflow #9657
ematipico merged 8 commits into biomejs:main
Conversation
Add parser benchmarks for biome_markdown_parser following the HTML parser benchmark pattern. Includes 14 fixtures across three categories (real-world documents, CommonMark spec examples, synthetic stress tests) with both cached and uncached parse modes. Adds CodSpeed CI workflow for automated regression detection.
Merging this PR will not alter performance
17369f8 to 28accbb
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (2)
Walkthrough
Adds Markdown benchmarking: a new GitHub Actions workflow.

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks: ✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
crates/biome_markdown_parser/benches/markdown_parser.rs (1)
61-63: Optional: sort fixtures for deterministic benchmark order
`read_dir` order is filesystem-dependent. Sorting keeps output stable and easier to compare run-to-run.

Patch suggestion

```diff
  visit(&fixtures_root, &fixtures_root, &mut cases);
+ cases.sort_unstable_by(|a, b| a.0.cmp(&b.0).then_with(|| a.1.cmp(&b.1)));
  cases
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `crates/biome_markdown_parser/benches/markdown_parser.rs` around lines 61-63: the benchmark collects fixtures into the mutable Vec `cases` via `visit(&fixtures_root, &fixtures_root, &mut cases)` but relies on filesystem `read_dir` order, causing non-deterministic ordering. After populating `cases` (before returning from the function that contains `visit` and `cases`), sort the vector deterministically (e.g., `cases.sort_unstable()` or `sort_by(|a, b| a.path.cmp(&b.path))` based on the fixture path/string) so the benchmark runs in a stable, reproducible order.

.github/workflows/benchmark_markdown.yml (1)
10-27: Expand `paths` to include benchmark manifest/workflow edits

Right now, benchmark-relevant edits in `crates/biome_markdown_parser/Cargo.toml` (or this workflow file itself) won't trigger the job when they're the only changes.

Patch suggestion

```diff
 pull_request:
@@
   paths:
     - 'Cargo.lock'
+    - '.github/workflows/benchmark_markdown.yml'
+    - 'crates/biome_markdown_parser/Cargo.toml'
     - 'crates/biome_markdown_parser/benches/**/*.md'
     - 'crates/biome_markdown_parser/**/*.rs'
@@
 push:
@@
   paths:
     - 'Cargo.lock'
+    - '.github/workflows/benchmark_markdown.yml'
+    - 'crates/biome_markdown_parser/Cargo.toml'
     - 'crates/biome_markdown_parser/benches/**/*.md'
     - 'crates/biome_markdown_parser/**/*.rs'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `.github/workflows/benchmark_markdown.yml` around lines 10-27: the workflow's `paths` filter omits package manifests and the workflow file itself, so changes to `crates/biome_markdown_parser/Cargo.toml` or this workflow won't trigger the job. Update the `paths` arrays (in both the `pull_request` and `push` blocks) to include the markdown crate's `Cargo.toml` and the workflow file itself alongside the existing entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_markdown_parser/benches/markdown_parser.rs`:
- Around line 31-33: The fixture loader currently ignores read errors and can
yield zero benchmarks; update the visit function to propagate or fail on
read_dir errors instead of silently skipping (change if let Ok(entries) =
fs::read_dir(dir) to using fs::read_dir(dir).expect/unwrap or return a Result
and propagate the error) and after collecting fixtures assert or panic if the
cases Vec is empty so the benchmark fails fast; apply the same change to the
other similar loader block around the 61-67 region to ensure I/O failures or
missing fixtures do not produce a “successful” empty run.
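The fail-fast behavior this prompt asks for can be sketched with a plain-std loader. The function name `load_fixtures_strict` and the returned tuple shape are illustrative, not the PR's exact code:

```rust
use std::fs;
use std::path::Path;

// Sketch of a fail-fast fixture loader: any I/O error panics and an empty
// result is rejected, so a broken fixtures directory can never produce a
// "successful" empty benchmark run.
fn load_fixtures_strict(root: &Path) -> Vec<(String, String)> {
    let entries = fs::read_dir(root)
        .unwrap_or_else(|err| panic!("failed to read fixtures directory {root:?}: {err}"));
    let mut cases = Vec::new();
    for entry in entries {
        let entry = entry.unwrap_or_else(|err| panic!("bad entry in {root:?}: {err}"));
        let path = entry.path();
        if path.extension().and_then(|e| e.to_str()) != Some("md") {
            continue;
        }
        let name = path
            .file_name()
            .and_then(|s| s.to_str())
            .unwrap_or_default()
            .to_string();
        let content = fs::read_to_string(&path)
            .unwrap_or_else(|err| panic!("failed to read fixture {path:?}: {err}"));
        cases.push((name, content));
    }
    // Fail fast instead of silently benchmarking nothing.
    assert!(!cases.is_empty(), "no .md fixtures found under {root:?}");
    // Deterministic order regardless of filesystem enumeration.
    cases.sort_unstable();
    cases
}
```

This also folds in the sorting nitpick, since both fixes touch the same function.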
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 817ee070-6319-40f1-baa6-33d3fe17c289
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock` and included by `**`
📒 Files selected for processing (18)
.github/workflows/benchmark_markdown.yml
.github/workflows/pull_request_markdown.yml
crates/biome_markdown_parser/Cargo.toml
crates/biome_markdown_parser/benches/fixtures/real/blog-post.md
crates/biome_markdown_parser/benches/fixtures/real/readme-style.md
crates/biome_markdown_parser/benches/fixtures/spec/autolinks.md
crates/biome_markdown_parser/benches/fixtures/spec/blockquotes.md
crates/biome_markdown_parser/benches/fixtures/spec/emphasis.md
crates/biome_markdown_parser/benches/fixtures/spec/inline-html.md
crates/biome_markdown_parser/benches/fixtures/spec/links.md
crates/biome_markdown_parser/benches/fixtures/spec/lists.md
crates/biome_markdown_parser/benches/fixtures/synthetic/blockquotes-nested.md
crates/biome_markdown_parser/benches/fixtures/synthetic/emphasis-heavy.md
crates/biome_markdown_parser/benches/fixtures/synthetic/inline-html.md
crates/biome_markdown_parser/benches/fixtures/synthetic/links-and-images.md
crates/biome_markdown_parser/benches/fixtures/synthetic/long-paragraphs.md
crates/biome_markdown_parser/benches/fixtures/synthetic/nested-lists.md
crates/biome_markdown_parser/benches/markdown_parser.rs
🧹 Nitpick comments (2)
crates/biome_markdown_parser/benches/markdown_parser.rs (2)
94-105: Consider adding a steady-state cached benchmark variant.

Current `iter_batched` mode recreates the cache per sample, so this mainly captures a "warmed once, parsed once" path. A second variant reusing one cache across many `iter` calls would give a useful long-run cache signal too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `crates/biome_markdown_parser/benches/markdown_parser.rs` around lines 94-105: the cached benchmark currently recreates `NodeCache` per sample using `b.iter_batched`, which measures a "warmed once" path. Add a steady-state variant that reuses a single `NodeCache` across many iterations: create a mutable `NodeCache` once (outside the per-iteration closure) and call `parse_markdown_with_cache(code, &mut cache, options.clone())` inside `b.iter` or `group.bench_with_input`'s closure, referencing the same cache each iteration, and name the `BenchmarkId` something like "cached_steady" so both cached-warm and cached-steady cases are reported.
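The per-sample versus steady-state distinction the comment describes can be illustrated outside Criterion with a toy stand-in cache. A `HashMap` plays the role of `NodeCache` here purely for illustration; the real benchmark would call `parse_markdown_with_cache` as named above:

```rust
use std::collections::HashMap;

// Toy stand-in for a parse cache. `misses` counts how often the underlying
// "parse" really runs versus being served from the cache.
fn parse_with_cache(code: &str, cache: &mut HashMap<String, usize>, misses: &mut usize) -> usize {
    if let Some(&len) = cache.get(code) {
        return len; // cache hit: served without re-parsing
    }
    *misses += 1; // cache miss: do the actual work
    let len = code.len();
    cache.insert(code.to_string(), len);
    len
}

// iter_batched style: a fresh cache per sample pays the miss every time.
fn batched_misses(code: &str, samples: usize) -> usize {
    let mut misses = 0;
    for _ in 0..samples {
        let mut cache = HashMap::new();
        parse_with_cache(code, &mut cache, &mut misses);
    }
    misses
}

// Steady state: one cache reused across all iterations pays one miss total.
fn steady_misses(code: &str, samples: usize) -> usize {
    let mut misses = 0;
    let mut cache = HashMap::new();
    for _ in 0..samples {
        parse_with_cache(code, &mut cache, &mut misses);
    }
    misses
}
```

With 5 samples the batched variant does the full work 5 times while the steady variant does it once, which is exactly why the two benchmark shapes measure different things.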
47-58: Use relative fixture path for benchmark identity to avoid collisions.

Because discovery is recursive, `(first_dir, basename)` is not unique: two nested files can collapse to the same benchmark id later. Prefer storing the full relative path as `name`.

♻️ Suggested refactor

```diff
-        let name = path
-            .file_name()
-            .and_then(|s| s.to_str())
-            .unwrap_or_default()
-            .to_string();
+        let name = rel.to_string_lossy().replace('\\', "/");
@@
-        let id = format!("{}/{}", group_name, name);
+        let id = name.clone();
```

Also applies to: 72-73, 86
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `crates/biome_markdown_parser/benches/markdown_parser.rs` around lines 47-58: the benchmark identity currently uses only the file basename (`name`), which can collide for nested files. Change the code that sets `name` to use the full relative path string instead of `path.file_name()` (e.g., `let name = rel.to_string_lossy().to_string()`), keep `group` as the first directory component from `rel`, and apply the same change to the other places where `name` is derived.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8910058c-e346-440e-a283-bc2d70156d47
📒 Files selected for processing (1)
crates/biome_markdown_parser/benches/markdown_parser.rs
It seems that the benchmarks weren't added to CodSpeed.
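For reference, a CodSpeed-enabled benchmark job typically looks roughly like the sketch below. The action version, token name, and `cargo codspeed` commands are assumptions drawn from CodSpeed's usual setup, not taken from this PR's workflow:

```yaml
# Hypothetical sketch only; not the workflow added in this PR.
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build benchmarks
        run: cargo codspeed build -p biome_markdown_parser
      - name: Run benchmarks under CodSpeed
        uses: CodSpeedHQ/action@v3
        with:
          run: cargo codspeed run -p biome_markdown_parser
          token: ${{ secrets.CODSPEED_TOKEN }}
```

If the benchmark target is never built and run through the CodSpeed action, results won't appear on the dashboard even when `cargo bench` passes locally.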
```rust
fn load_fixtures() -> Vec<(String, String, String)> {
    let fixtures_root = Path::new(env!("CARGO_MANIFEST_DIR")).join("benches/fixtures");
    let mut cases = Vec::new();

    fn visit(dir: &Path, root: &Path, cases: &mut Vec<(String, String, String)>) {
        let entries = fs::read_dir(dir).unwrap_or_else(|err| {
            panic!("failed to read benchmark fixtures directory {dir:?}: {err}")
        });

        for entry in entries {
            let entry = entry.unwrap_or_else(|err| {
                panic!("failed to read benchmark fixture entry in {dir:?}: {err}")
            });
            let path = entry.path();
            if path.is_dir() {
                visit(&path, root, cases);
            } else if path.is_file() {
                if !matches!(path.extension().and_then(|e| e.to_str()), Some("md")) {
                    continue;
                }
                let rel = path.strip_prefix(root).unwrap_or(&path);
                let group = rel
                    .iter()
                    .next()
                    .and_then(|s| s.to_str())
                    .unwrap_or("root")
                    .to_string();
                let name = path
                    .file_name()
                    .and_then(|s| s.to_str())
                    .unwrap_or_default()
                    .to_string();
                let content = fs::read_to_string(&path).unwrap_or_else(|err| {
                    panic!("failed to read benchmark fixture {path:?}: {err}")
                });
                cases.push((group, name, content));
            }
        }
    }
```
Let's keep it simple, please. No need for recursion or fancy things. Let's plainly load the files we have. If a new file is added, we add it here too. That's what we tend to do in the other benchmarks
Agreed, that was unnecessary. I replaced the recursive helper with a simple loop over the fixture directories.

@ematipico that's fixed now.
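The simplified loop could look roughly like this. The directory names match the PR's fixture layout (`real`, `spec`, `synthetic`); the function name and tuple shape are illustrative, not the merged code:

```rust
use std::fs;
use std::path::Path;

// Minimal non-recursive loader over the three known fixture directories.
// Adding a new fixture directory means adding it to this list too.
fn load_fixtures_flat(root: &Path) -> Vec<(String, String, String)> {
    let mut cases = Vec::new();
    for group in ["real", "spec", "synthetic"] {
        let dir = root.join(group);
        let entries = fs::read_dir(&dir)
            .unwrap_or_else(|err| panic!("failed to read {dir:?}: {err}"));
        for entry in entries {
            let path = entry.expect("readable directory entry").path();
            if path.extension().and_then(|e| e.to_str()) != Some("md") {
                continue;
            }
            let name = path
                .file_name()
                .and_then(|s| s.to_str())
                .unwrap_or_default()
                .to_string();
            let content = fs::read_to_string(&path)
                .unwrap_or_else(|err| panic!("failed to read {path:?}: {err}"));
            cases.push((group.to_string(), name, content));
        }
    }
    // Sort for deterministic benchmark ordering.
    cases.sort_unstable();
    cases
}
```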
Note
AI Assistance Disclosure: This PR was developed with assistance from Codex.
Summary
Add Criterion benchmarks for `biome_markdown_parser` and a dedicated CodSpeed workflow for tracking parser performance regressions.

Test Plan

- just f
- just l
- cargo bench -p biome_markdown_parser

Docs
N/A — internal benchmarking and CI coverage only, with no user-facing documentation or changeset required.