fix(codegen): remove duplicate preamble in generated analyzer files #8993

Merged
ematipico merged 5 commits into biomejs:main from mldangelo:fix/codegen-duplicate-preamble
Feb 9, 2026

Conversation

@mldangelo
Contributor

@mldangelo mldangelo commented Feb 7, 2026

Summary

The analyzer codegen had five near-identical generate_*_analyzer functions, and the reformat() pattern was applied inconsistently across them — four called it twice (producing a duplicate //! Generated file preamble), one called it once correctly. The copy-paste duplication made the inconsistency invisible.

This PR removes the redundant reformat() calls, deduplicates the five functions into a single generate_analyzer_crate helper so the pattern is defined once, adds an assertion in prepend_generated_preamble() to catch any future double-call, and fixes 4 orphaned files that had the same duplicate from the since-removed generate_options function (#5543).
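A minimal sketch of the failure mode and the guard described above. The function names mirror those in this PR, but the signatures, the preamble constant, and the bodies are simplified assumptions rather than the actual xtask code; in particular, the real `reformat()` also runs a formatter, which is not modeled here.

```rust
/// Assumed preamble text, modeled on the header quoted in this PR.
const PREAMBLE: &str = "//! Generated file, do not edit by hand, see xtask/codegen";

/// Prepends the preamble exactly once.
/// Panics if the content already contains it, which is what surfaces
/// an accidental second call during codegen.
fn prepend_generated_preamble(content: &str) -> String {
    assert!(
        !content.contains(PREAMBLE),
        "content already contains the generated-file preamble"
    );
    format!("{PREAMBLE}\n\n{content}")
}

/// Simplified stand-in for `reformat()`: it only models the fact that
/// every call prepends the preamble.
fn reformat(content: &str) -> String {
    prepend_generated_preamble(content.trim())
}
```

With the assertion in place, `reformat(&reformat(src))` panics instead of silently writing a file with two headers, so the mistake shows up when codegen runs rather than in a later review of the generated files.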

Test plan

  • cargo check and cargo clippy pass
  • Zero files with duplicate preambles (was 45)
  • Unit tests for the preamble assertion (cargo test -p xtask_glue)

AI assistance disclosure

This PR was written primarily by Claude Code.

mldangelo and others added 3 commits February 7, 2026 17:30
The codegen for analyzer crates was calling `reformat()` twice on the
same content — once to format the `quote!{}` output, then again before
writing. Since `reformat()` always prepends the generated file preamble,
this produced a duplicate `//! Generated file` header in 41 files.

Remove the redundant second `reformat()` call from `generate_category`,
`generate_group`, `update_json_registry_builder`, and
`update_css_registry_builder`, making them consistent with the other
registry builder functions that already used a single call. Also fix 4
orphaned files (options.rs/assists.rs) that had the same duplicate from
a since-removed `generate_options` function.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The codegen for analyzer crates was calling `reformat()` twice on the
same content — once to format the `quote!{}` output, then again before
writing. Since `reformat()` always prepends the generated file preamble,
this produced a duplicate `//! Generated file` header in 41 files.

Remove the redundant second `reformat()` call from `generate_category`,
`generate_group`, `update_json_registry_builder`, and
`update_css_registry_builder`, making them consistent with the other
registry builder functions that already used a single call. Also fix 4
orphaned files (options.rs/assists.rs) that had the same duplicate from
a since-removed `generate_options` function.

Add an assertion in `prepend_generated_preamble()` to prevent regression
— it now panics if content already contains the preamble.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace five near-identical `generate_*_analyzer` functions with a
shared `generate_analyzer_crate` helper that takes the crate name,
category list, and registry update function. Also remove dead
`_category_comment` code and a pointless `let key = name` alias.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@changeset-bot

changeset-bot bot commented Feb 7, 2026

⚠️ No Changeset found

Latest commit: 6aca566

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions github-actions bot added the A-Linter (Area: linter), A-Tooling (Area: internal tools), L-JavaScript (Language: JavaScript and super languages), L-CSS (Language: CSS), L-JSON (Language: JSON and super languages), and L-HTML (Language: HTML and super languages) labels Feb 7, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 7, 2026

Walkthrough

Removed duplicated top-of-file "Generated file, do not edit by hand, see xtask/codegen" preamble lines from many generated analyzer and support files across biome_css_analyze, biome_graphql_analyze, biome_html_analyze, biome_js_analyze and biome_json_analyze. Refactored xtask/codegen/src/generate_analyzer.rs to consolidate per-crate generators into a generalized flow via generate_analyzer_crate and a single generate_analyzer entry point, and simplified build-script/category handling. Added a defensive conversion, runtime assertion and unit tests in xtask/glue/src/lib.rs to detect and panic on duplicate preambles.

Possibly related PRs

Suggested labels

A-Project

Suggested reviewers

  • dyc3
  • ematipico
  • arendjr
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and accurately summarises the main change: removing duplicate preambles in generated analyser files. |
| Description check | ✅ Passed | The PR description clearly explains the root cause (duplicate reformat() calls), the solution (consolidation and assertion), and provides verification steps. |


Contributor

@dyc3 dyc3 left a comment


I don't mind the premise behind this change, but the actual diff on the codegen side is a bit large for what I would expect for this fix.

@mldangelo
Contributor Author

I don't mind the premise behind this change, but the actual diff on the codegen side is a bit large for what I would expect for this fix.

Good point! The actual fix is:

  • Removing 4 redundant reformat() calls (the root cause)
  • Fixing 4 orphaned files that had the same duplicate from the old generate_options function
  • Adding an assertion in prepend_generated_preamble() to prevent regression

Re the codegen side: the bug existed because of the duplication. There were five near-identical functions, and the reformat() pattern was applied inconsistently across them: some had double calls, some didn't. The copy-paste made the inconsistency invisible. Deduplicating into a single function means the pattern is defined once, so it can't drift between copies again. This is in the third commit; I can remove it if you would prefer to keep the diff smaller.
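To illustrate the deduplication argument, here is a rough sketch of a shared helper that takes the crate name, the category list, and the registry update function, as the third commit describes. The return type, the category names, and the string-based bodies are assumptions made for illustration; the real codegen emits Rust source through `quote!` and writes files to disk.

```rust
/// Assumed preamble text, modeled on the header quoted in this thread.
const PREAMBLE: &str = "//! Generated file, do not edit by hand, see xtask/codegen";

/// Simplified stand-in for `reformat()`: only models "prepend the preamble".
fn reformat(content: &str) -> String {
    format!("{PREAMBLE}\n\n{content}")
}

/// Hypothetical registry updaters; the bodies are placeholders.
fn update_js_registry_builder(crate_name: &str) -> String {
    format!("// registry for {crate_name}")
}

fn update_css_registry_builder(crate_name: &str) -> String {
    format!("// registry for {crate_name}")
}

/// The one place where the "build content, then reformat once" pattern lives.
/// Returns the generated file contents for a single analyzer crate.
fn generate_analyzer_crate(
    crate_name: &str,
    categories: &[&str],
    update_registry: fn(&str) -> String,
) -> Vec<String> {
    let mut files: Vec<String> = categories
        .iter()
        .map(|category| reformat(&format!("pub mod {category};")))
        .collect();
    files.push(reformat(&update_registry(crate_name)));
    files
}

/// Entry point: per-crate differences are passed as data, not copied code.
/// The category lists below are illustrative, not taken from the PR.
fn generate_analyzer() -> Vec<String> {
    let mut files = generate_analyzer_crate(
        "biome_js_analyze",
        &["lint", "assist", "syntax"],
        update_js_registry_builder,
    );
    files.extend(generate_analyzer_crate(
        "biome_css_analyze",
        &["lint", "assist"],
        update_css_registry_builder,
    ));
    files
}
```

Because each crate goes through the same body, a stray second `reformat` call would affect every crate at once and be obvious, instead of hiding in one of five copies.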

Verify that prepend_generated_preamble adds exactly one preamble and
panics when called on content that already contains one.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Member

@ematipico ematipico left a comment


If the PR description had contained the contents of your second comment, we wouldn't have needed to work out why there were "so many changes". The description emitted by the AI is full of slop and misses the very reason for the bug.

Food for thought.

@codspeed-hq

codspeed-hq bot commented Feb 8, 2026

Merging this PR will not alter performance

✅ 58 untouched benchmarks
⏩ 95 skipped benchmarks [1]

Comparing mldangelo:fix/codegen-duplicate-preamble (6aca566) with main (2cba2b3) [2]

Footnotes

  1. 95 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on main (16fd71d) during the generation of this report, so 2cba2b3 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Inline generate_categories into generate_analyzer_crate to avoid
the complex return type that triggered clippy::type_complexity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
xtask/codegen/src/generate_analyzer.rs (1)

26-28: ⚠️ Potential issue | 🟡 Minor

Pre-existing nit: #crate_name in doc comment renders literally.

Line 27's //! Build script for #crate_name. won't interpolate — quote! doesn't substitute inside doc comments. The generated build.rs files will literally say #crate_name. Since generate_build_script doesn't even receive a crate name parameter, this was presumably always the case. Not introduced by this PR, but worth a quick fix while you're in the neighbourhood.

🧹 Nitpick comments (1)
xtask/codegen/src/generate_analyzer.rs (1)

246-339: Optional: the five update_*_registry_builder functions are still near-identical.

These differ only in the crate path and the Language type. Since the language type is a generic parameter in RegistryVisitor<L>, you could potentially collapse these into a single function that takes the crate name and a quote!-generated language token. Not urgent — the current state is perfectly readable and the PR already did the heavy lifting on dedup. Just flagging for a potential follow-up.

@mldangelo
Contributor Author

If the PR description had contained the contents of your second comment, we wouldn't have needed to work out why there were "so many changes". The description emitted by the AI is full of slop and misses the very reason for the bug.

Food for thought.

@ematipico Thank you and you're right. I updated the description. I'll be more careful going forward.

@ematipico
Member

@mldangelo just FYI, the command you want to use for linting is `just lint`

@ematipico ematipico merged commit a368d9f into biomejs:main Feb 9, 2026
28 checks passed
@mldangelo mldangelo deleted the fix/codegen-duplicate-preamble branch February 9, 2026 09:26