
chore(bench/html): add real test fixtures to biome_html_analyze benches #9578

Merged
dyc3 merged 1 commit into main from dyc3/html-analyze-benches
Mar 22, 2026

Conversation

@dyc3
Contributor

@dyc3 dyc3 commented Mar 21, 2026

Summary

I chose not to include the synthetic HTML benchmark fixtures because they are likely to be low-signal for the analyzer benchmarks.

generated by gpt 5.4

Test Plan

new benchmarks should show up in codspeed

Docs

@changeset-bot

changeset-bot bot commented Mar 21, 2026

⚠️ No Changeset found

Latest commit: 20826ae

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions github-actions bot added the A-Linter (Area: linter) and L-HTML (Language: HTML and super languages) labels on Mar 21, 2026
@coderabbitai
Contributor

coderabbitai bot commented Mar 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8fbed8ce-3716-4ac3-abcb-49ec5027f9c3

📥 Commits

Reviewing files that changed from the base of the PR and between 72584be and 20826ae.

📒 Files selected for processing (2)
  • crates/biome_html_analyze/benches/fixtures/real
  • crates/biome_html_analyze/benches/html_analyzer.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/biome_html_analyze/benches/fixtures/real
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/biome_html_analyze/benches/html_analyzer.rs

Walkthrough

Benchmarks in crates/biome_html_analyze now load filesystem fixtures and run fixture-driven benchmarks alongside existing library-based benches. A fixture entry was added at crates/biome_html_analyze/benches/fixtures/real referencing ../../../biome_html_parser/benches/fixtures/real. The bench harness now recursively collects non-.md fixtures, derives HtmlFileSource from file path/extension, pre-parses inputs once per case, reports throughput in bytes, and registers dynamic benchmark IDs per fixture.
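The recursive, non-.md fixture collection described in the walkthrough could look roughly like the sketch below. It uses only the standard library. The helper names `collect_fixtures` and `sorted_fixtures` are hypothetical and are not taken from the PR, and the sketch leaves out the parsing, throughput, and benchmark-ID steps.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not from the PR): recursively collect every file
/// under `dir`, skipping Markdown files such as READMEs.
fn collect_fixtures(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Descend into nested fixture directories.
            collect_fixtures(&path, out)?;
        } else if path.extension().and_then(|ext| ext.to_str()) != Some("md") {
            out.push(path);
        }
    }
    Ok(())
}

/// Hypothetical helper: return the fixtures in sorted order so that the
/// benchmark registration order does not depend on directory iteration order.
fn sorted_fixtures(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut out = Vec::new();
    collect_fixtures(root, &mut out)?;
    out.sort();
    Ok(out)
}
```

A harness would presumably call `sorted_fixtures` once before registering benchmarks, so the filesystem walk stays outside the measured loop.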

Possibly related PRs

Suggested reviewers

  • ematipico
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: adding real test fixtures to HTML analyser benchmarks. |
| Description check | ✅ Passed | The description explains the motivation (excluding synthetic fixtures due to low signal) and testing approach, though brief. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

Rust Clippy can be used to improve the quality of Rust code reviews.

Clippy is the official Rust linter. It provides lints to catch common mistakes and improve your Rust code.

To configure Clippy, add a clippy.toml file to your project root.

See Clippy Documentation for more details.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
crates/biome_html_analyze/benches/html_analyzer.rs (1)

45-56: Keep the full relative path in the benchmark ID.

group only keeps the first path segment and name only the basename, so nested fixtures like real/foo/index.html and real/bar/index.html both collapse to real/index.html. Using rel as the parameter would keep Codspeed labels distinct and less cryptic.

Also applies to: 131-135
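The collision described in this nitpick can be reproduced with a small standard-library sketch. `collapsed_label` is a hypothetical reconstruction of the group-plus-basename scheme, not the code from the PR, and `full_label` is one possible way to keep the whole relative path, with the `"root"` fallback mentioned in the agent prompt that follows.

```rust
use std::path::Path;

/// Hypothetical reconstruction of the scheme the review describes:
/// first path segment as the group, basename as the name.
fn collapsed_label(rel: &Path) -> String {
    let group = rel
        .components()
        .next()
        .map(|c| c.as_os_str().to_string_lossy().into_owned())
        .unwrap_or_else(|| "root".to_string());
    let name = rel
        .file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_default();
    format!("{group}/{name}")
}

/// Keeps the full relative path as the label, falling back to "root"
/// when the path is empty.
fn full_label(rel: &Path) -> String {
    let label = rel.to_string_lossy();
    if label.is_empty() {
        "root".to_string()
    } else {
        label.into_owned()
    }
}
```

With these helpers, `real/foo/index.html` and `real/bar/index.html` both map to `real/index.html` under `collapsed_label`, while `full_label` keeps them distinct.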

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_html_analyze/benches/html_analyzer.rs` around lines 45 - 56, The
benchmark ID currently collapses nested fixtures by building `group` from only
the first path segment and `name` from the basename; instead use the full
relative path `rel` as the label to keep IDs distinct. Replace the
`group`/`name` construction with a single string created from `rel` (e.g. `let
rel_label = rel.to_string_lossy().to_string()` and use `rel_label` where the
benchmark ID/label is passed), falling back to `"root"` if `rel` is empty; apply
the same replacement for the other occurrence around the `rel` handling at the
second location (the block currently at lines 131-135).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_html_analyze/benches/html_analyzer.rs`:
- Around line 85-86: The code currently silently falls back to a default
HtmlFileSource when HtmlFileSource::try_from(test_case.path()) fails (uses
unwrap_or_default), which masks bad paths/extensions; change this to propagate
or handle the error explicitly by removing unwrap_or_default and returning or
panicking with a clear message (e.g., use expect or map_err+? to surface the
failure) so failures in HtmlFileSource::try_from are loud; apply the same change
to the similar usage around parse_html and HtmlParserOptions::from (the block at
lines ~128-130) so both lookup failures fail fast instead of downgrading to
default HTML.
- Around line 29-30: The fixture discovery is currently swallowing IO errors;
change load_fixtures to propagate failures instead of returning an empty result:
modify the signature of load_fixtures to return
Result<Vec<(String,String,String)>, std::io::Error>, use the ? operator on
fs::read_dir and fs::read_to_string (and any iterator item.file_name()/metadata
calls) to bubble up errors, and update the bench entry point that calls
load_fixtures to .expect("failed to load fixtures") (or otherwise handle the
Result) so the run aborts with a clear error message; apply the same pattern to
other similar blocks that use read_dir/read_to_string in this file.
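As a rough illustration of the second inline comment, the sketch below contrasts a loader that swallows IO errors with one that propagates them. Both function names are hypothetical, the tuple is simplified to `(path, contents)` rather than the three-string tuple named in the comment, and neither function is the PR's actual `load_fixtures`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Swallows IO errors: a missing or unreadable directory looks the same
/// as "no fixtures", so the benchmark run silently measures nothing.
fn load_fixtures_lenient(dir: &Path) -> Vec<(String, String)> {
    let Ok(entries) = fs::read_dir(dir) else {
        return Vec::new();
    };
    entries
        .flatten()
        .filter_map(|entry| {
            let path = entry.path();
            let code = fs::read_to_string(&path).ok()?;
            Some((path.display().to_string(), code))
        })
        .collect()
}

/// Propagates IO errors with `?` so the caller can abort with a clear message.
fn load_fixtures_strict(dir: &Path) -> io::Result<Vec<(String, String)>> {
    let mut fixtures = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() {
            let code = fs::read_to_string(&path)?;
            fixtures.push((path.display().to_string(), code));
        }
    }
    Ok(fixtures)
}
```

A bench entry point could then call `load_fixtures_strict(dir).expect("failed to load fixtures")`, which matches the fail-fast behaviour the comment asks for.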


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a46cdc5f-3f32-4814-85ed-19b63026fd85

📥 Commits

Reviewing files that changed from the base of the PR and between 9359fc5 and f4f2a0a.

📒 Files selected for processing (2)
  • crates/biome_html_analyze/benches/fixtures/real
  • crates/biome_html_analyze/benches/html_analyzer.rs

Comment thread crates/biome_html_analyze/benches/html_analyzer.rs Outdated
Comment thread crates/biome_html_analyze/benches/html_analyzer.rs Outdated
@codspeed-hq

codspeed-hq bot commented Mar 21, 2026

Merging this PR will not alter performance

✅ 64 untouched benchmarks
🆕 3 new benchmarks
⏩ 152 skipped benchmarks (see footnote 1)

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| 🆕 html_analyzer[real/wikipedia-JavaScript.html] | N/A | 161.9 ms | N/A |
| 🆕 html_analyzer[real/wikipedia-Unix.html] | N/A | 138.1 ms | N/A |
| 🆕 html_analyzer[real/wikipedia-fr-Guerre_de_Canudos.html] | N/A | 422.2 ms | N/A |

Comparing dyc3/html-analyze-benches (20826ae) with main (97b80a8)


Footnotes

  1. 152 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

@dyc3 dyc3 force-pushed the dyc3/html-analyze-benches branch from f4f2a0a to 0b64642 on March 21, 2026 20:06
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
crates/biome_html_analyze/benches/html_analyzer.rs (2)

133-133: ⚠️ Potential issue | 🟠 Major

Same silent downgrade concern here.

try_from_extension(ext).unwrap_or_default() will silently treat unknown extensions as plain HTML. Worth failing fast to catch fixture misconfigurations.

Suggested fix
-        let file_source = HtmlFileSource::try_from_extension(ext).unwrap_or_default();
+        let file_source = HtmlFileSource::try_from_extension(ext)
+            .unwrap_or_else(|_| panic!("unsupported HTML benchmark fixture extension: {ext}"));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_html_analyze/benches/html_analyzer.rs` at line 133, The code
silently downgrades unknown extensions by using
HtmlFileSource::try_from_extension(ext).unwrap_or_default(); change this to fail
fast so fixture misconfigurations are caught: replace the unwrap_or_default call
on HtmlFileSource::try_from_extension(ext) (used when assigning file_source)
with an explicit failure path (e.g., use .expect or .unwrap_or_else to panic
with a clear message including ext) or propagate the Result so unknown
extensions return Err instead of defaulting to plain HTML.

89-90: ⚠️ Potential issue | 🟠 Major

Don't silently downgrade unknown sources to default HTML.

This still uses unwrap_or_default(), which masks bad paths or unexpected extensions. A failing benchmark is better than a quietly misleading one.

Suggested fix
-                let file_source = HtmlFileSource::try_from(test_case.path()).unwrap_or_default();
+                let file_source = HtmlFileSource::try_from(test_case.path())
+                    .unwrap_or_else(|_| panic!("unsupported HTML bench case: {:?}", test_case.path()));
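To make the difference between the two behaviours concrete, here is a toy sketch. `Source` and `source_from_extension` are hypothetical stand-ins invented for illustration; they are not Biome's `HtmlFileSource` or its API.

```rust
/// Hypothetical stand-in for a file-source type; NOT Biome's `HtmlFileSource`.
#[derive(Debug, Default, PartialEq)]
enum Source {
    #[default]
    Html,
    Vue,
}

/// Hypothetical lookup that rejects extensions it does not know.
fn source_from_extension(ext: &str) -> Result<Source, String> {
    match ext {
        "html" => Ok(Source::Html),
        "vue" => Ok(Source::Vue),
        other => Err(format!("unsupported extension: {other}")),
    }
}

/// Silent downgrade: any unknown extension quietly becomes `Source::Html`.
fn lenient(ext: &str) -> Source {
    source_from_extension(ext).unwrap_or_default()
}

/// Fail fast: an unknown extension aborts with a message naming the problem.
fn strict(ext: &str) -> Source {
    source_from_extension(ext).unwrap_or_else(|err| panic!("bad benchmark fixture: {err}"))
}
```

In this sketch, `lenient("txt")` returns `Source::Html` and the misconfiguration goes unnoticed, whereas `strict("txt")` panics before any timing is recorded.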
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_html_analyze/benches/html_analyzer.rs` around lines 89 - 90, The
code silently downgrades unknown sources by calling
HtmlFileSource::try_from(test_case.path()).unwrap_or_default() which masks
invalid paths; change this to handle the Result explicitly — either propagate
the error or fail fast with a clear message (e.g., replace unwrap_or_default()
with .expect("Invalid HtmlFileSource for test_case.path(): <path>") or match the
Result and panic/log), then pass the valid HtmlFileSource into
HtmlParserOptions::from and parse_html(code,
HtmlParserOptions::from(&file_source)); ensure you reference
HtmlFileSource::try_from, HtmlFileSource (the variable file_source),
HtmlParserOptions::from and parse_html when making the change so unknown sources
are not silently defaulted.
🧹 Nitpick comments (1)
crates/biome_html_analyze/benches/html_analyzer.rs (1)

139-163: Consider extracting shared benchmark logic.

The filter/options construction and analyze call are duplicated between the library and fixture loops. A small helper would reduce this repetition.

That said, for benchmark code, explicitness has its merits—so entirely optional.

Example helper
fn run_analysis(parse: &HtmlParse, file_source: HtmlFileSource) {
    let filter = AnalysisFilter {
        categories: RuleCategoriesBuilder::default()
            .with_syntax()
            .with_lint()
            .with_assist()
            .build(),
        ..AnalysisFilter::default()
    };
    let options = AnalyzerOptions::default().with_configuration(
        AnalyzerConfiguration::default().with_jsx_runtime(JsxRuntime::default()),
    );
    biome_html_analyze::analyze(&parse.tree(), filter, &options, file_source, |event| {
        black_box(event.diagnostic());
        black_box(event.actions());
        ControlFlow::<Never>::Continue(())
    });
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_html_analyze/benches/html_analyzer.rs` around lines 139 - 163,
The benchmark duplicates AnalysisFilter/AnalyzerOptions construction and the
biome_html_analyze::analyze call; extract that shared logic into a helper
function (e.g., fn run_analysis(parse: &HtmlParse, file_source: HtmlFileSource))
that builds the AnalysisFilter with RuleCategoriesBuilder, creates
AnalyzerOptions with
AnalyzerConfiguration::with_jsx_runtime(JsxRuntime::default()), and invokes
biome_html_analyze::analyze(&parse.tree(), filter, &options, file_source,
|event| { black_box(event.diagnostic()); black_box(event.actions());
ControlFlow::<Never>::Continue(()) }); then replace the duplicated blocks in
both the library and fixture benchmark loops with a call to run_analysis.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 61360687-aad1-414b-b776-3cbcaef699ef

📥 Commits

Reviewing files that changed from the base of the PR and between f4f2a0a and 0b64642.

📒 Files selected for processing (2)
  • crates/biome_html_analyze/benches/fixtures/real
  • crates/biome_html_analyze/benches/html_analyzer.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/biome_html_analyze/benches/fixtures/real

@dyc3 dyc3 force-pushed the dyc3/html-analyze-benches branch from 0b64642 to 72584be on March 21, 2026 20:21
Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
crates/biome_html_analyze/benches/html_analyzer.rs (1)

99-110: Consider extracting the common filter/options setup.

The AnalysisFilter and AnalyzerOptions construction is duplicated between the library benchmarks and fixture benchmarks. A small helper would reduce this repetition.

♻️ Possible refactor
+fn bench_filter() -> AnalysisFilter<'static> {
+    AnalysisFilter {
+        categories: RuleCategoriesBuilder::default()
+            .with_syntax()
+            .with_lint()
+            .with_assist()
+            .build(),
+        ..AnalysisFilter::default()
+    }
+}
+
+fn bench_options() -> AnalyzerOptions {
+    AnalyzerOptions::default()
+        .with_configuration(AnalyzerConfiguration::default().with_jsx_runtime(JsxRuntime::default()))
+}

Then use let filter = bench_filter(); and let options = bench_options(); in both benchmark loops.

Also applies to: 143-153

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_html_analyze/benches/html_analyzer.rs` around lines 99 - 110,
Extract the duplicated construction into two helpers (e.g., bench_filter() ->
AnalysisFilter and bench_options() -> AnalyzerOptions): implement bench_filter()
to return AnalysisFilter built with
RuleCategoriesBuilder::default().with_syntax().with_lint().with_assist().build()
and default for the rest, and implement bench_options() to return
AnalyzerOptions::default().with_configuration(AnalyzerConfiguration::default().with_jsx_runtime(JsxRuntime::default()));
then replace the duplicated inline constructions in the library and fixture
benchmarks with let filter = bench_filter(); and let options = bench_options();
respectively, updating references to AnalysisFilter, RuleCategoriesBuilder,
AnalyzerOptions, AnalyzerConfiguration, and JsxRuntime as needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c7e6608e-f9c4-42aa-852a-21dbb379f6ad

📥 Commits

Reviewing files that changed from the base of the PR and between 0b64642 and 72584be.

📒 Files selected for processing (2)
  • crates/biome_html_analyze/benches/fixtures/real
  • crates/biome_html_analyze/benches/html_analyzer.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/biome_html_analyze/benches/fixtures/real

@dyc3 dyc3 requested review from a team March 21, 2026 20:31
Comment thread crates/biome_html_analyze/benches/fixtures/real
Comment thread crates/biome_html_analyze/benches/html_analyzer.rs
@dyc3 dyc3 force-pushed the dyc3/html-analyze-benches branch from 72584be to 20826ae on March 22, 2026 13:00
@dyc3 dyc3 merged commit 2088ec5 into main Mar 22, 2026
17 checks passed
@dyc3 dyc3 deleted the dyc3/html-analyze-benches branch March 22, 2026 13:23

Labels

A-Linter (Area: linter), L-HTML (Language: HTML and super languages)


2 participants