feat(linter/plugins): handle BOMs #18376
Closes #12526.

Handle BOM on start of files in the same way that ESLint does - do not include it in the source text on JS side, but `context.sourceCode.hasBOM` evaluates to `true`.

Method:

* Alter `program.source_text` to trim off the BOM before passing AST to JS side.
* Add a `has_bom` flag to `RawTransferMetadata`.
* Add ability to add an offset in the conversion from UTF-8 to UTF-16 spans.

The result is that the file as it's seen on JS side is as if the BOM didn't exist (except for the `hasBOM` flag). Spans are converted accordingly in JS-side AST, and converted back when passing diagnostics back to Rust.
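To make the round trip concrete, here is a minimal sketch of the idea, not the code from this PR. The helper names `strip_bom` and `utf16_to_utf8_offset` are hypothetical, and the linear scan is only for illustration (the PR itself extends the existing `Utf8ToUtf16` converter). It shows the BOM being trimmed from the text handed to JS, and a UTF-16 offset reported by a JS plugin being mapped back to a UTF-8 byte offset in the original, BOM-prefixed source.

```rust
/// U+FEFF, the byte order mark. It occupies 3 bytes in UTF-8.
const BOM: &str = "\u{FEFF}";

/// Hypothetical helper: split a leading BOM off the source text.
/// Returns the text as JS would see it, and whether a BOM was present.
fn strip_bom(source: &str) -> (&str, bool) {
    match source.strip_prefix(BOM) {
        Some(rest) => (rest, true),
        None => (source, false),
    }
}

/// Hypothetical helper: map a UTF-16 offset in the BOM-less text (JS side)
/// back to a UTF-8 byte offset in the full source (Rust side).
/// Assumes `full_source` starts with a BOM exactly when `has_bom` is true.
fn utf16_to_utf8_offset(full_source: &str, utf16_offset: usize, has_bom: bool) -> usize {
    // Number of leading bytes that JS never sees.
    let skip = if has_bom { BOM.len() } else { 0 };
    let mut utf16_pos = 0;
    for (byte_idx, ch) in full_source[skip..].char_indices() {
        if utf16_pos >= utf16_offset {
            return skip + byte_idx;
        }
        utf16_pos += ch.len_utf16();
    }
    // Offset is at (or past) the end of the text.
    full_source.len()
}
```

For a file containing a BOM followed by `let s = "é";`, JS sees a 12-unit string with `hasBOM` set, and UTF-16 offset 10 (just after `é`) maps back to byte 14 on the Rust side: 3 bytes of BOM, 9 ASCII bytes, and 2 bytes for `é`.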
#18375) Previously, raw transfer required that the source text start exactly at the start of the buffer. In the linter, relax this restriction and handle the case where the source text is stored elsewhere in the buffer. This is necessary for stripping the BOM from source (#18376), as the source text used on JS side then starts 3 bytes after the start of the buffer.
Pull request overview
This PR teaches the raw-transfer pipeline and JS plugin system to correctly handle Unicode BOM at the start of files, matching ESLint semantics: JS-side source text excludes the BOM, context.sourceCode.hasBOM is accurate, and span conversions round-trip correctly between Rust and JS.
Changes:
- Extend `RawTransferMetadata` (parser and linter) and the raw-transfer generators to carry a `has_bom` flag and expose its byte offset to JS via new `HAS_BOM_FLAG_POS` constants.
- Enhance UTF-8→UTF-16 span conversion with an optional leading-byte offset (`Utf8ToUtf16::new_with_offset` / `build_translations`), and apply BOM trimming + offset-aware conversion in both the JS-plugin linter path and the `parse_raw` path.
- Add BOM-focused JS fixtures, a JS plugin to assert `hasBOM`, and updated conformance snapshots reflecting now-fully-passing `unicode-bom` and `no-irregular-whitespace` behavior.
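The offset-aware conversion in the second bullet can be pictured with a small translation table. The following is a simplified sketch under stated assumptions, not the actual `oxc_ast_visit` implementation: the `Translation` struct and the functions `sketch_build_translations` and `convert_offset` are illustrative names, and the real code may differ in structure. The point it shows is that starting the running UTF-8/UTF-16 difference at the number of excluded leading bytes (3 for a BOM) shifts every converted offset so that the BOM-less text starts at 0.

```rust
/// Illustrative table entry: for UTF-8 offsets at or after `utf8_offset`,
/// subtract `utf16_difference` to get the UTF-16 offset.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Translation {
    utf8_offset: u32,
    utf16_difference: u32,
}

/// Build the table for `text` (the source with leading bytes already removed).
/// `offset` is the number of leading bytes excluded, e.g. 3 for a BOM, 0 otherwise.
fn sketch_build_translations(text: &str, offset: u32) -> Vec<Translation> {
    // The first entry carries the initial difference, so it always matches.
    let mut translations = vec![Translation { utf8_offset: 0, utf16_difference: offset }];
    let mut difference = offset;
    for (byte_idx, ch) in text.char_indices() {
        // Extra UTF-8 bytes this char takes compared to its UTF-16 length.
        let extra = (ch.len_utf8() - ch.len_utf16()) as u32;
        if extra > 0 {
            difference += extra;
            // The new difference applies from the end of this char,
            // measured in the original (BOM-including) buffer.
            let utf8_offset = offset + (byte_idx + ch.len_utf8()) as u32;
            translations.push(Translation { utf8_offset, utf16_difference: difference });
        }
    }
    translations
}

/// Convert a UTF-8 offset in the original buffer to a UTF-16 offset in `text`.
/// Offsets inside the excluded leading bytes clamp to 0.
fn convert_offset(translations: &[Translation], utf8_offset: u32) -> u32 {
    // Index of the first entry that starts after `utf8_offset`; always >= 1.
    let idx = translations.partition_point(|t| t.utf8_offset <= utf8_offset);
    utf8_offset.saturating_sub(translations[idx - 1].utf16_difference)
}
```

With `offset = 0` this degenerates to a plain UTF-8 to UTF-16 conversion, which is why an optional offset can be added without changing the non-BOM path.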
Reviewed changes
Copilot reviewed 16 out of 21 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `tasks/ast_tools/src/generators/raw_transfer.rs` | Adds `has_bom` field discovery in raw metadata and generates `HAS_BOM_FLAG_POS` so JS can read the BOM flag from the transfer buffer. |
| `oxfmtrc.jsonc` | Excludes BOM fixture files from formatting to preserve literal BOM bytes in test fixtures. |
| `napi/parser/src/raw_transfer_types.rs` | Extends `RawTransferMetadata` with a `has_bom: bool` field (initialized to `false` in parser-only usage). |
| `napi/parser/src/generated/assert_layouts.rs` | Updates layout assertions for `RawTransferMetadata` including the new `has_bom` field and adjusted padding count. |
| `napi/parser/src-js/generated/constants.js` | Regenerates JS constants, adding `HAS_BOM_FLAG_POS` aligned with the new metadata layout. |
| `crates/oxc_linter/src/lib.rs` | Trims a leading BOM from `program.source_text`, uses `Utf8ToUtf16::new_with_offset` when BOM is present, and writes `has_bom` into `RawTransferMetadata` for JS plugins. |
| `crates/oxc_linter/src/generated/assert_layouts.rs` | Updates layout checks for `RawTransferMetadata2` to include `has_bom` and correct padding size. |
| `crates/oxc_ast_visit/src/utf8_to_utf16/translation.rs` | Allows `build_translations` to start with a non-zero UTF-8/UTF-16 difference offset, enabling leading-byte exclusion (e.g., BOM). |
| `crates/oxc_ast_visit/src/utf8_to_utf16/mod.rs` | Adds `Utf8ToUtf16::new_with_offset`, wiring the new offset parameter into translation generation and documenting its BOM use case. |
| `crates/oxc_ast_macros/src/generated/structs.rs` | Updates PHF struct field order metadata for `RawTransferMetadata`/`RawTransferMetadata2` to account for the new `has_bom` field. |
| `apps/oxlint/test/fixtures/bom/plugin.ts` | Introduces a JS plugin used in tests to assert `context.sourceCode.hasBOM`, source text, and spans with and without BOM. |
| `apps/oxlint/test/fixtures/bom/output.snap.md` | Adds snapshot output verifying correct BOM handling in messages, source text, and spans across BOM / non-BOM and unicode / non-unicode files. |
| `apps/oxlint/test/fixtures/bom/files/no_bom_unicode.js` | Test input: non-BOM JS file including Unicode characters to compare against BOM behavior. |
| `apps/oxlint/test/fixtures/bom/files/no_bom.js` | Test input: simple non-BOM JS file for baseline spans and `hasBOM` behavior. |
| `apps/oxlint/test/fixtures/bom/files/bom_unicode.js` | Test input: BOM-prefixed JS file with Unicode characters to validate combined BOM and multi-byte handling. |
| `apps/oxlint/test/fixtures/bom/files/bom.js` | Test input: basic BOM-prefixed JS file to validate BOM stripping and span conversion. |
| `apps/oxlint/test/fixtures/bom/.oxlintrc.json` | Local linter config enabling the BOM test plugin for the new BOM fixtures. |
| `apps/oxlint/src/js_plugins/parse.rs` | Mirrors the linter path: trims BOM, uses `Utf8ToUtf16::new_with_offset`, and propagates `has_bom` into `RawTransferMetadata` for raw parse buffers. |
| `apps/oxlint/src-js/plugins/lint.ts` | Reads `has_bom` via `HAS_BOM_FLAG_POS` and passes it into `setupSourceForFile` so `context.sourceCode.hasBOM` is accurate for JS plugins. |
| `apps/oxlint/src-js/generated/constants.ts` | Regenerates TS constants with `HAS_BOM_FLAG_POS` matching the Rust-side metadata layout. |
| `apps/oxlint/conformance/snapshot.md` | Updates conformance statistics, marking `no-irregular-whitespace` and `unicode-bom` as fully passing in light of the new BOM handling. |
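Several rows above rely on the same mechanism: the Rust side writes a metadata struct at a fixed place in the transfer buffer, and a generated constant tells the JS side which byte holds the flag. The sketch below illustrates that idea only. The `Metadata` layout, its field names other than `has_bom`, the 64-byte `BUFFER_SIZE`, and the helper functions are all assumptions made for the example, not the real `RawTransferMetadata` or the generated constants.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical, simplified stand-in for the metadata struct.
/// `#[repr(C)]` fixes the field order so byte positions are predictable.
#[repr(C)]
struct Metadata {
    data_offset: u32,
    is_ts: bool,
    has_bom: bool,
    _padding: [u8; 2],
}

/// Assumed buffer size for this example; metadata sits at the very end.
const BUFFER_SIZE: usize = 64;
const METADATA_POS: usize = BUFFER_SIZE - size_of::<Metadata>();
/// Byte position a reader (e.g. the JS side) would use to find the flag.
const HAS_BOM_FLAG_POS: usize = METADATA_POS + offset_of!(Metadata, has_bom);

/// Write the metadata fields into the tail of the buffer, byte by byte.
fn write_metadata(buffer: &mut [u8; BUFFER_SIZE], meta: &Metadata) {
    let base = METADATA_POS;
    buffer[base + offset_of!(Metadata, data_offset)..][..4]
        .copy_from_slice(&meta.data_offset.to_le_bytes());
    buffer[base + offset_of!(Metadata, is_ts)] = u8::from(meta.is_ts);
    buffer[base + offset_of!(Metadata, has_bom)] = u8::from(meta.has_bom);
}

/// Read the flag back using only the precomputed byte position,
/// which is all the reader needs to know about the layout.
fn read_has_bom(buffer: &[u8; BUFFER_SIZE]) -> bool {
    buffer[HAS_BOM_FLAG_POS] == 1
}
```

Because the reader depends on a raw byte position, adding a field changes the layout on both sides at once, which is presumably why the layout assertions and the generated constants are updated together in this PR.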
### 💥 BREAKING CHANGES

- 22dec6a semantic: [**BREAKING**] Remove `Scoping::scope_build_child_ids` and all related APIs (#18362) (Dunqing)
- 30a4899 oxc: [**BREAKING**] Remove `CompilerInterface::semantic_child_scope_ids` (#18361) (Dunqing)
- 777fc40 ast: [**BREAKING**] Add `Ident` type (#18354) (Boshen)
- af0ca46 span: [**BREAKING**] Use `ModuleKind::CommonJS` for `SourceType::cjs()` (#18276) (sapphi-red)

### 🚀 Features

- 0a02026 semantic: Add TS1499 code to diagnostic (#18557) (camc314)
- 8b4618f parser: Add TS1500 code to diagnostic (#18547) (camc314)
- 866b6b3 parser: Add TS1048 code to diagnostic (#18546) (camc314)
- 1117c44 parser: Add TS1054 code to diagnostic (#18541) (camc314)
- e4fcdde semantic: Add TS1053 code to diagnostic (#18539) (camc314)
- bcbf396 semantic: Add TS1052 code to diagnostic (#18538) (camc314)
- 8155edf semantic: Add TS1049 code to diagnostic (#18535) (camc314)
- 51d3b3f parser: Add TS1502 code to diagnostic (#18534) (camc314)
- 00854e8 semantic: Add TS2337 error code to super call diagnostic (#18531) (camc314)
- 993fd2b parser: Parse unambiguous await with better error messages (#18480) (Boshen)
- 8db0e78 linter/plugins: Handle BOMs (#18376) (overlookmotel)
- 6ac09e2 linter/plugins: Support source text not being at start of buffer (#18375) (overlookmotel)
- 2ef5647 ast: Add escape_raw parameter to template_element builders (#18121) (Boshen)

### 🐛 Bug Fixes

- 74d0998 semantic: Update error msg for multiple `default` cases in switch stmt (#18526) (camc314)
- c205b0d ast: Remove `ThisExpression` from `TSModuleReference` (#18489) (Boshen)
- aed3669 parser: Parse HTML-like comments in unambiguous mode (#18442) (Boshen)
- c4132fb parser: Validate accessor parameters in interface method signatures (#18391) (Boshen)
- b0cd74d semantic: Allow `var` and `function` with same name in static blocks (#18358) (Boshen)
- 6037995 semantic: Allow `new.target` in class field initializers (#18349) (Boshen)
- 9a15c6a semantic: Do not rely on spans for node comparison in `Function::bind` (#18296) (overlookmotel)

### ⚡ Performance

- 6b600c4 semantic: Skip parent lookup for function declarations in `Function::bind` (#18293) (overlookmotel)
- c27ad2d semantic: Move check for function declaration out of `is_function_part_of_if_statement` (#18292) (overlookmotel)
- 63eb89e semantic: Skip checking redeclarations for function expressions (#18291) (overlookmotel)
- 7c12743 semantic: Skip checking unresolved exports in CommonJS files (#18250) (overlookmotel)
- 2349031 allocator: Increase initial chunk size from 512B to 16KB (#18234) (Boshen)

### 📚 Documentation

- 8ccd853 npm: Update package homepage URLs and add keywords (#18509) (Boshen)
- 9b3165f napi/parser: Clarify when to use `parseAsync` vs `parseSync` (#18486) (Boshen)
- 1b59f63 napi/parser: Correct typo in README (#18251) (overlookmotel)
- 00ff75f mangler: Fix `top_level` option in example (#18233) (overlookmotel)
- 2ddc073 semantic: Fix typo in comment (#18238) (overlookmotel)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>

# Oxlint

### 💥 BREAKING CHANGES

- 777fc40 ast: [**BREAKING**] Add `Ident` type (#18354) (Boshen)

### 🚀 Features

- 34c3ec3 linter/prefer-logical-operator-over-ternary: Implement fixer (#18545) (camc314)
- 019e0aa linter/valid-typeof: Add suggestions if type is misspelled (#18543) (camchenry)
- 704c8eb linter/use-isnan: Add more specific error message for equality/inequality (#18542) (camchenry)
- 1e99ace linter/use-isnan: Support more `indexOf` cases and improve diagnostic messages (#18537) (camchenry)
- bffd134 linter/text-encoding-identifier-case: Add `withDash` option (#18533) (camc314)
- 993fd2b parser: Parse unambiguous await with better error messages (#18480) (Boshen)
- b4b6247 linter/plugins: `RuleTester` support settings (#18445) (overlookmotel)
- 15d69dc linter: Implement react/display-name rule (#18426) (camchenry)
- 2fbceae linter: Implement rule docs and config support for rules with tuple config options. (#18372) (connorshea)
- 8db0e78 linter/plugins: Handle BOMs (#18376) (overlookmotel)
- 6ac09e2 linter/plugins: Support source text not being at start of buffer (#18375) (overlookmotel)
- fc3c86b linter: Update 125 rules to raise errors when provided with invalid config options. (#18104) (connorshea)
- 2cc6ad2 linter/plugins: Add `ecmaFeatures` to `parserOptions` (#18313) (overlookmotel)

### 🐛 Bug Fixes

- 2acf568 linter/plugins: Keep `Infinity` in rule default options (#18550) (overlookmotel)
- 332d2ef linter/plugins: Add `jsx` property to `parserOptions.ecmaFeatures` (#18549) (overlookmotel)
- 7d9bb1b linter: Update `eslint/func-names` to error on invalid rule config options, improve docs. (#18510) (connorshea)
- 9c67974 linter: Improve the jsx-a11y/no-noninteractive-tabindex rule to match original rule logic better (#17848) (connorshea)
- 75e7163 vscode: Support json5 for oxfmt (#18502) (Sysix)
- c205b0d ast: Remove `ThisExpression` from `TSModuleReference` (#18489) (Boshen)
- c51339a oxlint/lsp: Respect code action `source.fixAll` as an alias for `source.fixAll.oxc` (#18366) (Sysix)
- 3c0e9b9 oxlint/lsp: Skip dangerous fixes/suggestions for "fix all" code action and command (#18364) (Sysix)
- c44c093 linter: Fix behavior of unicorn/catch-error-name to match original rule (#18209) (connorshea)
- 9c65aff linter/jsx-a11y: Change `no-autofocus` autofix to suggestion (#18155) (Ben Lowery)
- 235c820 linter/unicorn: Fix `prefer-array-some` autofix for `.filter().length` pattern (#18153) (Ben Lowery)
- a9925dc linter: Mark fixes in `unicorn/no-null` rule as dangerous. (#18436) (connorshea)
- cee29b4 linter: Remove confusing scope from `react/only-export-components` rule diagnostics. (#18434) (connorshea)
- aed3669 parser: Parse HTML-like comments in unambiguous mode (#18442) (Boshen)
- b8a371d linter: Fix the path used in the gitlab format output (#18165) (connorshea)
- e046ea6 linter: `vue/no-lifecycle-after-await` skip looking into arrow functions (#18302) (Sysix)
- a9bfbcf linter: Compatibility issue with `DiagnosticData` type in ESLint (#18396) (루밀LuMir)
- 10ab424 linter: `react/no_array_index_key` continue search for other attributes (#18409) (Lonami)
- 9d776d4 linter: Update `import/no-cycle` rule to error on invalid config options. (#18330) (connorshea)
- c163231 linter: Update eslint/sort-imports to validate options. (#18378) (connorshea)
- 79bbcff linter: Update `eslint/func-style` to error on invalid configuration options. (#18390) (connorshea)
- b871235 linter/plugins: Fix identifying "use strict" directives in scope analysis (#18402) (overlookmotel)
- 5985141 linter: Update `jest/prefer-lowercase-title` rule to error on invalid config options. (#18332) (connorshea)
- faca4b5 linter/plugins: Tokenize `let`, `static` and `yield` as `Keyword`s (#18368) (overlookmotel)
- a3914fd linter/plugins: Allow line number passed to `report` to be 1 over line count (#18341) (overlookmotel)
- 88e0896 linter: Update `typescript/no-restricted-types` rule to error on invalid config options. (#18329) (connorshea)
- 9eec600 linter: Update `react/jsx-fragments` rule to raise an error on invalid configuration options (#18111) (connorshea)
- 0fa969d linter: Update `react/no-will-update-set-state` to error on invalid config options (#18112) (connorshea)
- 70e7be4 linter: Update `import/no-unassigned-import` to raise an error when passed invalid config options. (#18108) (connorshea)
- 496cac7 linter: Update `unicorn/explicit-length-check` to raise an error when passed invalid config options. (#18107) (connorshea)
- 080b1ec linter: Update 5 more rules to error on invalid config options. (#18113) (connorshea)
- c5d05dd linter: Update 11 rules to raise an error on invalid config options. (#18109) (connorshea)
- 9e359d4 linter/plugins: Set all properties on global vars objects (#18317) (overlookmotel)
- 39c7f32 linter/plugins: Set `writeable` flag on variables where defined as globals (#18316) (overlookmotel)
- a570693 linter/plugins: Fix `CatchClause` scopes (#18312) (overlookmotel)
- 8c98e69 linter: `vitest/prefer-describe-function-title`: Check earlier to avoid false positive (#18177) (Jovi De Croock)
- 44be0eb linter/plugins: Set scope analyse settings based on source type (#18306) (overlookmotel)
- b9a14fd vscode: Update package.json to restrict a few more config options. (#18270) (Connor Shea)
- c1260cb vscode: Update version info formatting. (#18274) (connorshea)
- 2f68dc6 vscode: Update notification for client restart to specify tool. (#18273) (connorshea)

### ⚡ Performance

- dc931ba linter/no-inner-declarations: Skip scope flags lookup in modules (#18249) (overlookmotel)
- 07618a7 linter: Turn off `scope_build_child_ids` for SemanticBuilder (#18360) (Dunqing)
- 1aac079 linter/exhaustive-deps: Simplify the logic of checking if the identifier it is a dependency of hook (#18350) (Dunqing)
- 591d522 linter/block-scoped-var: Avoid `iter_all_scope_child_ids` by walking references/redeclarations scope ancestors (#18335) (Dunqing)
- 2eefd6d linter/plugins: Remove branch from token parsing (#18369) (overlookmotel)

### 📚 Documentation

- 698c21d linter: Modernize docs for various React rules (#18559) (connorshea)
- 314a47c linter: Clarify the `no-find-dom-node` rule with a note that the method was removed in React 19. (#18556) (connorshea)
- 5eff704 linter: Update `no-inner-declarations` to fix config option docs (#18511) (connorshea)
- dd5d2f6 linter: Improve diagnostic message in `valid_typeof` rule. (#18507) (connorshea)
- 8ccd853 npm: Update package homepage URLs and add keywords (#18509) (Boshen)
- 4958233 linter: Add missing "What it does" section in prefer-reflect-apply rule. (#18475) (connorshea)
- 2fa83a4 linter: Improve the docs for import/unambiguous. (#18474) (connorshea)
- 7b1505c linter: Improve docs for `oxc/only-used-in-recursion` rule. (#18473) (connorshea)
- ab506d6 linter/plugins: Correct comment (#18456) (overlookmotel)
- 4565c73 linter: `react/display-name`: add docs for config options (#18430) (camchenry)
- b95a89f linter: Fix docs for the curly rule. (#18374) (connorshea)
- f675eb4 linter: Fix the `react/only-export-components` rule docs. (#18319) (connorshea)
- 704db95 linter: "no-unused-vars" extend ignored files section for svelte and astro files (#18304) (Sysix)
- 3af4a88 linter: Add "Examples" headers to rules missing them (#18266) (connorshea)

# Oxfmt

### 💥 BREAKING CHANGES

- 777fc40 ast: [**BREAKING**] Add `Ident` type (#18354) (Boshen)

### 🚀 Features

- d71c15d oxfmt: Enable tailwind sort inside xxx-in-js (#18417) (leaysgur)
- 52b5003 formatter,oxfmt: Support Angular `@Component({ template, styles })` (#18324) (leaysgur)

### 🐛 Bug Fixes

- 224140c oxfmt: Canonicalize `..` component in config path (#18570) (leaysgur)
- 30b467e formatter: Preserve trailing comments before the semicolon in class methods without a body (#18446) (Dunqing)
- c205b0d ast: Remove `ThisExpression` from `TSModuleReference` (#18489) (Boshen)
- 164bbd7 formatter: Preserve trailing comments inside ternary alternate branch (#18433) (Dunqing)
- 1c50800 formatter: Use HTML entity escaping for JSX attribute strings (#18385) (Boshen)
- 4e156d2 formatter: Preserve parentheses for `in` expressions in arrow function block bodies (#18352) (Boshen)
- 7e6c15b oxfmt: Increase Tailwind CSS test timeout for Windows CI (#18339) (Boshen)
- 29966eb formatter/dead-code-removal: Handle tailwind sorting (#18321) (leaysgur)
- 29f41be formatter: Only expand mapped types when newline immediately follows opening brace (#18087) (Boshen)
- 2194552 formatter: Relocate leading comments for single-element union/intersection types (#18083) (Boshen)

### ⚡ Performance

- 85ab400 formatter: Store `AstNodes` itself instead of `&'a AstNodes` as the `parent` field of `AstNode` (#18428) (Dunqing)
- 194d384 formatter: Reduce AstNode size by 8 bytes using following_span_start (#18347) (Dunqing)
- b2df8fb oxfmt: Enable tailwind plugin only for relevant parser (#18418) (leaysgur)

### 📚 Documentation

- 8ccd853 npm: Update package homepage URLs and add keywords (#18509) (Boshen)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
Previously we had to store source text at start of buffers sent to JS via raw transfer.

#18376 made changes to how the raw transfer deserializer handles strings, in order to support files containing a BOM. Building on that, we're now able to remove the requirement that source text be at start of the buffer entirely.

This PR changes the deserializer used in Oxlint JS plugins to accept source text being anywhere in the buffer, *as long as no other strings are after it*. In practice this just means that the source text must be allocated before anything else, which is easy to satisfy. Now the source text can be allocated with just the usual safe `allocator.alloc_str(source_text)` method.

This change removes a ton of dodgy workarounds and unsafe code we used previously to get source text at the start of buffer. It makes the code less labyrinthine, and makes it far less likely that a slip-up inadvertently introduces UB.

Note: In `napi/parser`, source text still *is* at start of the buffer, as that's simpler and more efficient when the source text is written into the buffer on JS side. This change only affects Oxlint.
