
feat(linter/plugins): handle BOMs #18376

Merged
graphite-app[bot] merged 1 commit into main from om/01-22-feat_linter_plugins_handle_boms on Jan 22, 2026


@overlookmotel overlookmotel commented Jan 22, 2026

Closes #12526.

Handle a BOM at the start of a file the same way ESLint does: exclude it from the source text on the JS side, while `context.sourceCode.hasBOM` evaluates to `true`.

Method:

  • Alter `program.source_text` to trim off the BOM before passing the AST to the JS side.
  • Add a `has_bom` flag to `RawTransferMetadata`.
  • Add the ability to apply an offset in the conversion from UTF-8 to UTF-16 spans.

As a result, the JS side sees the file as if the BOM didn't exist (except for the `hasBOM` flag). Spans are converted accordingly in the JS-side AST, and converted back when passing diagnostics back to Rust.
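To make the first two steps concrete, here is a minimal sketch of trimming a leading BOM while remembering that it was there. The helper name `strip_bom` is hypothetical and is not taken from the PR; it only illustrates the idea of pairing the BOM-free text with a `has_bom` flag.

```rust
/// The Unicode byte order mark, U+FEFF. It occupies 3 bytes in UTF-8
/// (EF BB BF) and 1 code unit in UTF-16.
const BOM: char = '\u{FEFF}';

/// Hypothetical helper (not the PR's code): split off a leading BOM.
/// Returns the source text as the JS side would see it, plus a flag
/// recording whether a BOM was present.
fn strip_bom(source_text: &str) -> (&str, bool) {
    match source_text.strip_prefix(BOM) {
        // BOM found: hand back the remainder and set the flag.
        Some(rest) => (rest, true),
        // No BOM: the text is passed through unchanged.
        None => (source_text, false),
    }
}
```

With this shape, `strip_bom("\u{FEFF}let x;")` yields the text `let x;` with the flag set, and a file without a BOM is returned untouched with the flag cleared.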

@github-actions github-actions bot added A-linter Area - Linter A-parser Area - Parser A-cli Area - CLI A-ast-tools Area - AST tools A-linter-plugins Area - Linter JS plugins C-enhancement Category - New feature or request labels Jan 22, 2026

@codspeed-hq

codspeed-hq bot commented Jan 22, 2026

CodSpeed Performance Report

Merging this PR will not alter performance

Comparing om/01-22-feat_linter_plugins_handle_boms (1ce13f5) with main (b95a89f) [1]

Summary

✅ 42 untouched benchmarks
⏩ 3 skipped benchmarks [2]

Footnotes

  1. No successful run was found on om/01-21-feat_linter_plugins_support_source_text_not_being_at_start_of_buffer (26d0f46) during the generation of this report, so main (b95a89f) was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

  2. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

@overlookmotel overlookmotel marked this pull request as ready for review January 22, 2026 02:13
Copilot AI review requested due to automatic review settings January 22, 2026 02:13
@overlookmotel overlookmotel self-assigned this Jan 22, 2026
@overlookmotel overlookmotel added the 0-merge Merge with Graphite Merge Queue label Jan 22, 2026
graphite-app bot pushed a commit that referenced this pull request Jan 22, 2026
#18375)

Previously, raw transfer required that the source text start exactly at the start of the buffer. In the linter, relax this restriction and handle the case where the source text is stored elsewhere in the buffer.

This is necessary for stripping BOM from source (#18376), as then the source text used on JS side starts 3 bytes after the start of the buffer.
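The "3 bytes" figure comes from the UTF-8 encoding of U+FEFF. A small sketch of the resulting offset, assuming a buffer that holds the raw file bytes from position 0 (the function name `source_start` is hypothetical and only for illustration):

```rust
/// UTF-8 encoding of the byte order mark U+FEFF.
const BOM_BYTES: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Hypothetical helper (not the PR's code): byte offset within `buffer`
/// at which the BOM-free source text begins. This assumes the raw file
/// contents start at index 0 of the buffer.
fn source_start(buffer: &[u8]) -> usize {
    if buffer.starts_with(&BOM_BYTES) {
        // Skip the BOM: the text seen by JS starts 3 bytes in.
        BOM_BYTES.len()
    } else {
        0
    }
}
```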
@graphite-app graphite-app bot force-pushed the om/01-21-feat_linter_plugins_support_source_text_not_being_at_start_of_buffer branch from 26d0f46 to 6ac09e2 Compare January 22, 2026 02:16
@graphite-app graphite-app bot force-pushed the om/01-22-feat_linter_plugins_handle_boms branch from 1ce13f5 to 8db0e78 Compare January 22, 2026 02:17
Base automatically changed from om/01-21-feat_linter_plugins_support_source_text_not_being_at_start_of_buffer to main January 22, 2026 02:23
@graphite-app graphite-app bot merged commit 8db0e78 into main Jan 22, 2026
22 checks passed
@graphite-app graphite-app bot deleted the om/01-22-feat_linter_plugins_handle_boms branch January 22, 2026 02:23
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Jan 22, 2026

Copilot AI left a comment


Pull request overview

This PR teaches the raw-transfer pipeline and JS plugin system to correctly handle a Unicode BOM at the start of files, matching ESLint semantics: JS-side source text excludes the BOM, `context.sourceCode.hasBOM` is accurate, and span conversions round-trip correctly between Rust and JS.

Changes:

  • Extend `RawTransferMetadata` (parser and linter) and the raw-transfer generators to carry a `has_bom` flag and expose its byte offset to JS via new `HAS_BOM_FLAG_POS` constants.
  • Enhance UTF-8→UTF-16 span conversion with an optional leading-byte offset (`Utf8ToUtf16::new_with_offset` / `build_translations`), and apply BOM trimming + offset-aware conversion in both the JS-plugin linter path and the `parse_raw` path.
  • Add BOM-focused JS fixtures, a JS plugin to assert `hasBOM`, and updated conformance snapshots reflecting now-fully-passing `unicode-bom` and `no-irregular-whitespace` behavior.
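The offset-aware span conversion can be illustrated with a linear-scan sketch. This is not how `Utf8ToUtf16` is implemented (the summary above indicates it builds a table of translations); the two functions below are hypothetical and only show what an offset of skipped leading bytes means for a round trip between Rust-side UTF-8 offsets and JS-side UTF-16 offsets.

```rust
/// Hypothetical sketch: convert a UTF-8 byte offset into the full text
/// (BOM included) to a UTF-16 code unit offset into the text that remains
/// after skipping `skip_bytes` leading bytes.
/// Panics if either offset does not fall on a char boundary.
fn utf8_to_utf16_offset(full_text: &str, utf8_offset: usize, skip_bytes: usize) -> u32 {
    full_text[skip_bytes..utf8_offset].encode_utf16().count() as u32
}

/// Hypothetical sketch of the reverse direction: map a UTF-16 offset in the
/// BOM-free text back to a UTF-8 byte offset in the full text.
fn utf16_to_utf8_offset(full_text: &str, utf16_offset: u32, skip_bytes: usize) -> usize {
    let mut units = 0u32;
    for (byte_index, ch) in full_text[skip_bytes..].char_indices() {
        if units >= utf16_offset {
            // Re-add the skipped bytes so the result indexes the full text.
            return skip_bytes + byte_index;
        }
        units += ch.len_utf16() as u32;
    }
    // An offset at (or past) the end maps to the end of the text.
    full_text.len()
}
```

For the text `"\u{FEFF}é = 1;"` with `skip_bytes = 3`, the `=` sits at UTF-8 offset 6 on the Rust side and at UTF-16 offset 2 on the JS side, and each function maps one to the other.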

Reviewed changes

Copilot reviewed 16 out of 21 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `tasks/ast_tools/src/generators/raw_transfer.rs` | Adds `has_bom` field discovery in raw metadata and generates `HAS_BOM_FLAG_POS` so JS can read the BOM flag from the transfer buffer. |
| `oxfmtrc.jsonc` | Excludes BOM fixture files from formatting to preserve literal BOM bytes in test fixtures. |
| `napi/parser/src/raw_transfer_types.rs` | Extends `RawTransferMetadata` with a `has_bom: bool` field (initialized to `false` in parser-only usage). |
| `napi/parser/src/generated/assert_layouts.rs` | Updates layout assertions for `RawTransferMetadata` including the new `has_bom` field and adjusted padding count. |
| `napi/parser/src-js/generated/constants.js` | Regenerates JS constants, adding `HAS_BOM_FLAG_POS` aligned with the new metadata layout. |
| `crates/oxc_linter/src/lib.rs` | Trims a leading BOM from `program.source_text`, uses `Utf8ToUtf16::new_with_offset` when BOM is present, and writes `has_bom` into `RawTransferMetadata` for JS plugins. |
| `crates/oxc_linter/src/generated/assert_layouts.rs` | Updates layout checks for `RawTransferMetadata2` to include `has_bom` and correct padding size. |
| `crates/oxc_ast_visit/src/utf8_to_utf16/translation.rs` | Allows `build_translations` to start with a non-zero UTF-8/UTF-16 difference offset, enabling leading-byte exclusion (e.g., BOM). |
| `crates/oxc_ast_visit/src/utf8_to_utf16/mod.rs` | Adds `Utf8ToUtf16::new_with_offset`, wiring the new offset parameter into translation generation and documenting its BOM use case. |
| `crates/oxc_ast_macros/src/generated/structs.rs` | Updates PHF struct field order metadata for `RawTransferMetadata`/`RawTransferMetadata2` to account for the new `has_bom` field. |
| `apps/oxlint/test/fixtures/bom/plugin.ts` | Introduces a JS plugin used in tests to assert `context.sourceCode.hasBOM`, source text, and spans with and without BOM. |
| `apps/oxlint/test/fixtures/bom/output.snap.md` | Adds snapshot output verifying correct BOM handling in messages, source text, and spans across BOM / non-BOM and unicode / non-unicode files. |
| `apps/oxlint/test/fixtures/bom/files/no_bom_unicode.js` | Test input: non-BOM JS file including Unicode characters to compare against BOM behavior. |
| `apps/oxlint/test/fixtures/bom/files/no_bom.js` | Test input: simple non-BOM JS file for baseline spans and `hasBOM` behavior. |
| `apps/oxlint/test/fixtures/bom/files/bom_unicode.js` | Test input: BOM-prefixed JS file with Unicode characters to validate combined BOM and multi-byte handling. |
| `apps/oxlint/test/fixtures/bom/files/bom.js` | Test input: basic BOM-prefixed JS file to validate BOM stripping and span conversion. |
| `apps/oxlint/test/fixtures/bom/.oxlintrc.json` | Local linter config enabling the BOM test plugin for the new BOM fixtures. |
| `apps/oxlint/src/js_plugins/parse.rs` | Mirrors the linter path: trims BOM, uses `Utf8ToUtf16::new_with_offset`, and propagates `has_bom` into `RawTransferMetadata` for raw parse buffers. |
| `apps/oxlint/src-js/plugins/lint.ts` | Reads `has_bom` via `HAS_BOM_FLAG_POS` and passes it into `setupSourceForFile` so `context.sourceCode.hasBOM` is accurate for JS plugins. |
| `apps/oxlint/src-js/generated/constants.ts` | Regenerates TS constants with `HAS_BOM_FLAG_POS` matching the Rust-side metadata layout. |
| `apps/oxlint/conformance/snapshot.md` | Updates conformance statistics, marking `no-irregular-whitespace` and `unicode-bom` as fully passing in light of the new BOM handling. |
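Several of the files above revolve around a single flag byte at a known position in the transfer buffer's metadata. The sketch below shows that general pattern only. The block size and flag position are invented for illustration (the real `HAS_BOM_FLAG_POS` value is generated from the struct layout and is not reproduced here), and the function names are hypothetical.

```rust
/// Hypothetical size of a metadata block, in bytes (illustrative only).
const EXAMPLE_METADATA_SIZE: usize = 16;
/// Hypothetical position of the flag byte within that block (illustrative
/// only; the real constant is generated from the Rust struct layout).
const EXAMPLE_HAS_BOM_FLAG_POS: usize = 9;

/// Writer side: record the flag as a single byte, 0 or 1.
fn write_has_bom(metadata: &mut [u8], has_bom: bool) {
    metadata[EXAMPLE_HAS_BOM_FLAG_POS] = u8::from(has_bom);
}

/// Reader side: recover the flag by reading the byte at the agreed position.
fn read_has_bom(metadata: &[u8]) -> bool {
    metadata[EXAMPLE_HAS_BOM_FLAG_POS] != 0
}
```

The point of generating the position as a constant on both sides is that the reader and writer cannot drift apart when the struct layout changes, which is presumably why the layout assertions are updated alongside it.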


overlookmotel pushed a commit that referenced this pull request Jan 26, 2026
### 💥 BREAKING CHANGES

- 22dec6a semantic: [**BREAKING**] Remove
`Scoping::scope_build_child_ids` and all related APIs (#18362) (Dunqing)
- 30a4899 oxc: [**BREAKING**] Remove
`CompilerInterface::semantic_child_scope_ids` (#18361) (Dunqing)
- 777fc40 ast: [**BREAKING**] Add `Ident` type (#18354) (Boshen)
- af0ca46 span: [**BREAKING**] Use `ModuleKind::CommonJS` for
`SourceType::cjs()` (#18276) (sapphi-red)

### 🚀 Features

- 0a02026 semantic: Add TS1499 code to diagnostic (#18557) (camc314)
- 8b4618f parser: Add TS1500 code to diagnostic (#18547) (camc314)
- 866b6b3 parser: Add TS1048 code to diagnostic (#18546) (camc314)
- 1117c44 parser: Add TS1054 code to diagnostic (#18541) (camc314)
- e4fcdde semantic: Add TS1053 code to diagnostic (#18539) (camc314)
- bcbf396 semantic: Add TS1052 code to diagnostic (#18538) (camc314)
- 8155edf semantic: Add TS1049 code to diagnostic (#18535) (camc314)
- 51d3b3f parser: Add TS1502 code to diagnostic (#18534) (camc314)
- 00854e8 semantic: Add TS2337 error code to super call diagnostic
(#18531) (camc314)
- 993fd2b parser: Parse unambiguous await with better error messages
(#18480) (Boshen)
- 8db0e78 linter/plugins: Handle BOMs (#18376) (overlookmotel)
- 6ac09e2 linter/plugins: Support source text not being at start of
buffer (#18375) (overlookmotel)
- 2ef5647 ast: Add escape_raw parameter to template_element builders
(#18121) (Boshen)

### 🐛 Bug Fixes

- 74d0998 semantic: Update error msg for multiple `default` cases in
switch stmt (#18526) (camc314)
- c205b0d ast: Remove `ThisExpression` from `TSModuleReference` (#18489)
(Boshen)
- aed3669 parser: Parse HTML-like comments in unambiguous mode (#18442)
(Boshen)
- c4132fb parser: Validate accessor parameters in interface method
signatures (#18391) (Boshen)
- b0cd74d semantic: Allow `var` and `function` with same name in static
blocks (#18358) (Boshen)
- 6037995 semantic: Allow `new.target` in class field initializers
(#18349) (Boshen)
- 9a15c6a semantic: Do not rely on spans for node comparison in
`Function::bind` (#18296) (overlookmotel)

### ⚡ Performance

- 6b600c4 semantic: Skip parent lookup for function declarations in
`Function::bind` (#18293) (overlookmotel)
- c27ad2d semantic: Move check for function declaration out of
`is_function_part_of_if_statement` (#18292) (overlookmotel)
- 63eb89e semantic: Skip checking redeclarations for function
expressions (#18291) (overlookmotel)
- 7c12743 semantic: Skip checking unresolved exports in CommonJS files
(#18250) (overlookmotel)
- 2349031 allocator: Increase initial chunk size from 512B to 16KB
(#18234) (Boshen)

### 📚 Documentation

- 8ccd853 npm: Update package homepage URLs and add keywords (#18509)
(Boshen)
- 9b3165f napi/parser: Clarify when to use `parseAsync` vs `parseSync`
(#18486) (Boshen)
- 1b59f63 napi/parser: Correct typo in README (#18251) (overlookmotel)
- 00ff75f mangler: Fix `top_level` option in example (#18233)
(overlookmotel)
- 2ddc073 semantic: Fix typo in comment (#18238) (overlookmotel)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
overlookmotel pushed a commit that referenced this pull request Jan 26, 2026
# Oxlint
### 💥 BREAKING CHANGES

- 777fc40 ast: [**BREAKING**] Add `Ident` type (#18354) (Boshen)

### 🚀 Features

- 34c3ec3 linter/prefer-logical-operator-over-ternary: Implement fixer
(#18545) (camc314)
- 019e0aa linter/valid-typeof: Add suggestions if type is misspelled
(#18543) (camchenry)
- 704c8eb linter/use-isnan: Add more specific error message for
equality/inequality (#18542) (camchenry)
- 1e99ace linter/use-isnan: Support more `indexOf` cases and improve
diagnostic messages (#18537) (camchenry)
- bffd134 linter/text-encoding-identifier-case: Add `withDash` option
(#18533) (camc314)
- 993fd2b parser: Parse unambiguous await with better error messages
(#18480) (Boshen)
- b4b6247 linter/plugins: `RuleTester` support settings (#18445)
(overlookmotel)
- 15d69dc linter: Implement react/display-name rule (#18426) (camchenry)
- 2fbceae linter: Implement rule docs and config support for rules with
tuple config options. (#18372) (connorshea)
- 8db0e78 linter/plugins: Handle BOMs (#18376) (overlookmotel)
- 6ac09e2 linter/plugins: Support source text not being at start of
buffer (#18375) (overlookmotel)
- fc3c86b linter: Update 125 rules to raise errors when provided with
invalid config options. (#18104) (connorshea)
- 2cc6ad2 linter/plugins: Add `ecmaFeatures` to `parserOptions` (#18313)
(overlookmotel)

### 🐛 Bug Fixes

- 2acf568 linter/plugins: Keep `Infinity` in rule default options
(#18550) (overlookmotel)
- 332d2ef linter/plugins: Add `jsx` property to
`parserOptions.ecmaFeatures` (#18549) (overlookmotel)
- 7d9bb1b linter: Update `eslint/func-names` to error on invalid rule
config options, improve docs. (#18510) (connorshea)
- 9c67974 linter: Improve the jsx-a11y/no-noninteractive-tabindex rule
to match original rule logic better (#17848) (connorshea)
- 75e7163 vscode: Support json5 for oxfmt (#18502) (Sysix)
- c205b0d ast: Remove `ThisExpression` from `TSModuleReference` (#18489)
(Boshen)
- c51339a oxlint/lsp: Respect code action `source.fixAll` as an alias
for `source.fixAll.oxc` (#18366) (Sysix)
- 3c0e9b9 oxlint/lsp: Skip dangerous fixes/suggestions for "fix all"
code action and command (#18364) (Sysix)
- c44c093 linter: Fix behavior of unicorn/catch-error-name to match
original rule (#18209) (connorshea)
- 9c65aff linter/jsx-a11y: Change `no-autofocus` autofix to suggestion
(#18155) (Ben Lowery)
- 235c820 linter/unicorn: Fix `prefer-array-some` autofix for
`.filter().length` pattern (#18153) (Ben Lowery)
- a9925dc linter: Mark fixes in `unicorn/no-null` rule as dangerous.
(#18436) (connorshea)
- cee29b4 linter: Remove confusing scope from
`react/only-export-components` rule diagnostics. (#18434) (connorshea)
- aed3669 parser: Parse HTML-like comments in unambiguous mode (#18442)
(Boshen)
- b8a371d linter: Fix the path used in the gitlab format output (#18165)
(connorshea)
- e046ea6 linter: `vue/no-lifecycle-after-await` skip looking into arrow
functions (#18302) (Sysix)
- a9bfbcf linter: Compatibility issue with `DiagnosticData` type in
ESLint (#18396) (루밀LuMir)
- 10ab424 linter: `react/no_array_index_key` continue search for other
attributes (#18409) (Lonami)
- 9d776d4 linter: Update `import/no-cycle` rule to error on invalid
config options. (#18330) (connorshea)
- c163231 linter: Update eslint/sort-imports to validate options.
(#18378) (connorshea)
- 79bbcff linter: Update `eslint/func-style` to error on invalid
configuration options. (#18390) (connorshea)
- b871235 linter/plugins: Fix identifying "use strict" directives in
scope analysis (#18402) (overlookmotel)
- 5985141 linter: Update `jest/prefer-lowercase-title` rule to error on
invalid config options. (#18332) (connorshea)
- faca4b5 linter/plugins: Tokenize `let`, `static` and `yield` as
`Keyword`s (#18368) (overlookmotel)
- a3914fd linter/plugins: Allow line number passed to `report` to be 1
over line count (#18341) (overlookmotel)
- 88e0896 linter: Update `typescript/no-restricted-types` rule to error
on invalid config options. (#18329) (connorshea)
- 9eec600 linter: Update `react/jsx-fragments` rule to raise an error on
invalid configuration options (#18111) (connorshea)
- 0fa969d linter: Update `react/no-will-update-set-state` to error on
invalid config options (#18112) (connorshea)
- 70e7be4 linter: Update `import/no-unassigned-import` to raise an error
when passed invalid config options. (#18108) (connorshea)
- 496cac7 linter: Update `unicorn/explicit-length-check` to raise an
error when passed invalid config options. (#18107) (connorshea)
- 080b1ec linter: Update 5 more rules to error on invalid config
options. (#18113) (connorshea)
- c5d05dd linter: Update 11 rules to raise an error on invalid config
options. (#18109) (connorshea)
- 9e359d4 linter/plugins: Set all properties on global vars objects
(#18317) (overlookmotel)
- 39c7f32 linter/plugins: Set `writeable` flag on variables where
defined as globals (#18316) (overlookmotel)
- a570693 linter/plugins: Fix `CatchClause` scopes (#18312)
(overlookmotel)
- 8c98e69 linter: `vitest/prefer-describe-function-title`: Check earlier
to avoid false positive (#18177) (Jovi De Croock)
- 44be0eb linter/plugins: Set scope analyse settings based on source
type (#18306) (overlookmotel)
- b9a14fd vscode: Update package.json to restrict a few more config
options. (#18270) (Connor Shea)
- c1260cb vscode: Update version info formatting. (#18274) (connorshea)
- 2f68dc6 vscode: Update notification for client restart to specify
tool. (#18273) (connorshea)

### ⚡ Performance

- dc931ba linter/no-inner-declarations: Skip scope flags lookup in
modules (#18249) (overlookmotel)
- 07618a7 linter: Turn off `scope_build_child_ids` for SemanticBuilder
(#18360) (Dunqing)
- 1aac079 linter/exhaustive-deps: Simplify the logic of checking if the
identifier it is a dependency of hook (#18350) (Dunqing)
- 591d522 linter/block-scoped-var: Avoid `iter_all_scope_child_ids` by
walking references/redeclarations scope ancestors (#18335) (Dunqing)
- 2eefd6d linter/plugins: Remove branch from token parsing (#18369)
(overlookmotel)

### 📚 Documentation

- 698c21d linter: Modernize docs for various React rules (#18559)
(connorshea)
- 314a47c linter: Clarify the `no-find-dom-node` rule with a note that
the method was removed in React 19. (#18556) (connorshea)
- 5eff704 linter: Update `no-inner-declarations` to fix config option
docs (#18511) (connorshea)
- dd5d2f6 linter: Improve diagnostic message in `valid_typeof` rule.
(#18507) (connorshea)
- 8ccd853 npm: Update package homepage URLs and add keywords (#18509)
(Boshen)
- 4958233 linter: Add missing "What it does" section in
prefer-reflect-apply rule. (#18475) (connorshea)
- 2fa83a4 linter: Improve the docs for import/unambiguous. (#18474)
(connorshea)
- 7b1505c linter: Improve docs for `oxc/only-used-in-recursion` rule.
(#18473) (connorshea)
- ab506d6 linter/plugins: Correct comment (#18456) (overlookmotel)
- 4565c73 linter: `react/display-name`: add docs for config options
(#18430) (camchenry)
- b95a89f linter: Fix docs for the curly rule. (#18374) (connorshea)
- f675eb4 linter: Fix the `react/only-export-components` rule docs.
(#18319) (connorshea)
- 704db95 linter: "no-unused-vars" extend ignored files section for
svelte and astro files (#18304) (Sysix)
- 3af4a88 linter: Add "Examples" headers to rules missing them (#18266)
(connorshea)
# Oxfmt
### 💥 BREAKING CHANGES

- 777fc40 ast: [**BREAKING**] Add `Ident` type (#18354) (Boshen)

### 🚀 Features

- d71c15d oxfmt: Enable tailwind sort inside xxx-in-js (#18417)
(leaysgur)
- 52b5003 formatter,oxfmt: Support Angular `@Component({ template,
styles })` (#18324) (leaysgur)

### 🐛 Bug Fixes

- 224140c oxfmt: Canonicalize `..` component in config path (#18570)
(leaysgur)
- 30b467e formatter: Preserve trailing comments before the semicolon in
class methods without a body (#18446) (Dunqing)
- c205b0d ast: Remove `ThisExpression` from `TSModuleReference` (#18489)
(Boshen)
- 164bbd7 formatter: Preserve trailing comments inside ternary alternate
branch (#18433) (Dunqing)
- 1c50800 formatter: Use HTML entity escaping for JSX attribute strings
(#18385) (Boshen)
- 4e156d2 formatter: Preserve parentheses for `in` expressions in arrow
function block bodies (#18352) (Boshen)
- 7e6c15b oxfmt: Increase Tailwind CSS test timeout for Windows CI
(#18339) (Boshen)
- 29966eb formatter/dead-code-removal: Handle tailwind sorting (#18321)
(leaysgur)
- 29f41be formatter: Only expand mapped types when newline immediately
follows opening brace (#18087) (Boshen)
- 2194552 formatter: Relocate leading comments for single-element
union/intersection types (#18083) (Boshen)

### ⚡ Performance

- 85ab400 formatter: Store `AstNodes` itself instead of `&'a AstNodes`
as the `parent` field of `AstNode` (#18428) (Dunqing)
- 194d384 formatter: Reduce AstNode size by 8 bytes using
following_span_start (#18347) (Dunqing)
- b2df8fb oxfmt: Enable tailwind plugin only for relevant parser
(#18418) (leaysgur)

### 📚 Documentation

- 8ccd853 npm: Update package homepage URLs and add keywords (#18509)
(Boshen)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
graphite-app bot pushed a commit that referenced this pull request Jan 30, 2026
Previously we had to store source text at start of buffers sent to JS via raw transfer.

#18376 made changes to how raw transfer deserializer handles strings, in order to support files containing a BOM. Building on that, we're now able to remove the requirement that source text be at start of the buffer entirely.

This PR changes the deserializer used in Oxlint JS plugins to accept source text being anywhere in the buffer, *as long as no other strings are after it*. In practice this just means that the source text must be allocated before anything else, which is easy to satisfy.

Now the source text can be allocated with just the usual safe `allocator.alloc_str(source_text)` method.

This change removes a ton of dodgy workarounds and unsafe code we previously used to get the source text at the start of the buffer. It makes the code less labyrinthine and makes it far less likely that a slip-up inadvertently introduces UB.

Note: In `napi/parser`, source text still *is* at start of the buffer, as that's simpler and more efficient when the source text is written into the buffer on JS side. This change only affects Oxlint.

Labels

A-ast-tools Area - AST tools A-cli Area - CLI A-linter Area - Linter A-linter-plugins Area - Linter JS plugins A-parser Area - Parser C-enhancement Category - New feature or request


Development

Successfully merging this pull request may close these issues.

Linter plugins: Handle BOM

2 participants