perf(cli): lock-free storage, and reduce system calls by ematipico · Pull Request #9391 · biomejs/biome

ematipico · 2026-03-08T11:56:06Z

Summary

This PR attempts to improve the speed of the CLI with two changes:

Use papaya::HashSet instead of BTreeSet for the evaluated_paths. Sorting is used for predictable reporting; it's useless during crawling, as it only slows us down. So now we use a lock-free data structure for collecting the paths, and we sort them only at the end. There's only one small hiccup, where we need to do the collection twice, but I believe the cost here is less than keeping all paths ordered.
Reduce system calls. BiomePath now keeps the information of what kind of path is (file/dir/symbolic link), so we can re-use this information in hot paths such as is_ignored. Slow paths such as indexing of dependencies stay the same, so we don't lose anything with this refactor.

Test Plan

Current tests should pass.

I'll create a preview release and test it in some projects.

Docs

N/A

changeset-bot · 2026-03-08T11:56:11Z

🦋 Changeset detected

Latest commit: 5c4fd9a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 13 packages

Name	Type
@biomejs/biome	Patch
@biomejs/cli-win32-x64	Patch
@biomejs/cli-win32-arm64	Patch
@biomejs/cli-darwin-x64	Patch
@biomejs/cli-darwin-arm64	Patch
@biomejs/cli-linux-x64	Patch
@biomejs/cli-linux-arm64	Patch
@biomejs/cli-linux-x64-musl	Patch
@biomejs/cli-linux-arm64-musl	Patch
@biomejs/wasm-web	Patch
@biomejs/wasm-bundler	Patch
@biomejs/wasm-nodejs	Patch
@biomejs/backend-jsonrpc	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

ematipico · 2026-03-08T11:57:15Z

If I see some concrete result, I will add a changeset. Otherwise, no, but I would like to keep the change nonetheless.

coderabbitai · 2026-03-08T11:59:19Z

Walkthrough

This PR replaces many usages of BTreeSet<BiomePath> with Vec<BiomePath> across reporter and traversal APIs, migrates traversal storage from RwLock<BTreeSet<...>> to papaya::HashSet with pin-based HashSetRef access, adds PathKind (including is_symlink) to BiomePath with new constructors/accessors, and extends ignore APIs to accept explicit path_kind / is_dir parameters. Several method and struct signatures were updated to propagate these types.

Possibly related PRs

refactor(cli): make commands more generic #8485 — Modifies reporter/runner interfaces and evaluated_paths types (directly overlaps API changes to ReporterVisitor and Crawler).
fix(cli): files in ignored folders are now correctly ignored #7275 — Adjusts ignore/traversal decision APIs (related changes to PathIsIgnoredParams / is_dir propagation).
feat(core): add force-ignore syntax to files.includes #7359 — Changes ignore-handling surfaces and path-kind plumbing (method signature and bridge updates that align with this PR).

Suggested reviewers

dyc3
siketyan

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarises the main changes: replacing BTreeSet with a lock-free HashSet for better performance and reducing system calls by caching path kind information.
Description check	✅ Passed	The description directly addresses the changeset by explaining both performance improvements: lock-free storage using papaya::HashSet and reducing system calls via cached path kind information.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch perf/lock-free-collection

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codspeed-hq · 2026-03-08T11:59:40Z

Merging this PR will not alter performance

✅ 58 untouched benchmarks
⏩ 156 skipped benchmarks¹

_{Comparing perf/lock-free-collection (5c4fd9a) with main (f4d2073)²}

156 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
No successful run was found on main (71c7df6) during the generation of this report, so f4d2073 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

crates/biome_fs/src/interner.rs (1)

32-42: Redundant .to_path_buf() call.

path is already a Utf8PathBuf, so path.to_path_buf() is effectively a clone. You can use path directly:

♻️ Proposed simplification

     pub fn intern_path(&self, path: Utf8PathBuf) -> bool {
         let guard = self.storage.pin();
         if guard.contains(&path) {
             return false;
         }
-        let owned = path.to_path_buf();
-        let result = guard.insert(owned.clone());
+        let result = guard.insert(path.clone());
         if result {
-            self.handler.send(owned).ok();
+            self.handler.send(path).ok();
         }
         result
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/biome_fs/src/interner.rs` around lines 32 - 42, The call to
path.to_path_buf() is redundant because path is already a Utf8PathBuf; replace
the allocation with a direct clone (e.g., use path.clone()) or otherwise use
path directly when inserting/sending to avoid an unnecessary conversion. Update
the block around self.storage.pin(), guard.contains(&path), guard.insert(…), and
self.handler.send(…) to create owned via path.clone() (or use path directly) and
then insert/send that owned value so behavior remains identical but without the
redundant to_path_buf() call.

crates/biome_service/src/scanner.rs (1)

430-453: Keep the cached PathKind here.

Line 430 rebuilds each BiomePath from the raw path, so Line 453 has to symlink_path_kind() everything again. Cloning the stored BiomePath keeps the cached kind and lets this optimisation pay off in watch mode too.

♻️ Small refactor

-        let mut evaluated_paths: Vec<_> = ctx
-            .evaluated_paths()
-            .iter()
-            .map(|p| BiomePath::new(p.as_path()))
-            .collect();
+        let mut evaluated_paths: Vec<_> = ctx.evaluated_paths().iter().cloned().collect();

-            } else if ctx.watch && fs.symlink_path_kind(&path).is_ok_and(PathKind::is_dir) {
+            } else if ctx.watch && path.path_kind().is_dir() {

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/biome_service/src/scanner.rs` around lines 430 - 453, The loop is
rebuilding BiomePath instances later which loses the cached PathKind; instead
construct and keep BiomePath values up front (use evaluated_paths:
Vec<BiomePath> via BiomePath::new) and then clone those BiomePath instances
where needed in the processing loop so the cached kind is preserved; update uses
around evaluated_paths, for path in evaluated_paths, and any place that
previously reconstructed a BiomePath so they consume or clone the stored
BiomePath (references: evaluated_paths, BiomePath::new, ctx.watch,
fs.symlink_path_kind, PathKind::is_dir, handleable_paths).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_cli/src/runner/crawler.rs`:
- Around line 147-150: increment_changed currently calls
evaluated_paths.pin().insert(path.to_written()) but because BiomePath's
Hash/PartialEq omit the was_written flag the insert won't replace the existing
entry; locate the increment_changed function and either (A) remove the existing
key first and then insert the to_written variant (e.g., lookup/remove by the
equivalent key then insert path.to_written()), or (B) change BiomePath's
equality/hash to include was_written so insert will replace the entry; update
the code around evaluated_paths.pin().insert and ensure
push_message(Message::Stats(MessageStat::Changed)) remains unchanged.

---

Nitpick comments:
In `@crates/biome_fs/src/interner.rs`:
- Around line 32-42: The call to path.to_path_buf() is redundant because path is
already a Utf8PathBuf; replace the allocation with a direct clone (e.g., use
path.clone()) or otherwise use path directly when inserting/sending to avoid an
unnecessary conversion. Update the block around self.storage.pin(),
guard.contains(&path), guard.insert(…), and self.handler.send(…) to create owned
via path.clone() (or use path directly) and then insert/send that owned value so
behavior remains identical but without the redundant to_path_buf() call.

In `@crates/biome_service/src/scanner.rs`:
- Around line 430-453: The loop is rebuilding BiomePath instances later which
loses the cached PathKind; instead construct and keep BiomePath values up front
(use evaluated_paths: Vec<BiomePath> via BiomePath::new) and then clone those
BiomePath instances where needed in the processing loop so the cached kind is
preserved; update uses around evaluated_paths, for path in evaluated_paths, and
any place that previously reconstructed a BiomePath so they consume or clone the
stored BiomePath (references: evaluated_paths, BiomePath::new, ctx.watch,
fs.symlink_path_kind, PathKind::is_dir, handleable_paths).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3b34e396-6e86-42b0-b700-d92e6e167276

📥 Commits

Reviewing files that changed from the base of the PR and between 1bb9edc and 533cd86.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock and included by **

📒 Files selected for processing (17)

crates/biome_cli/Cargo.toml
crates/biome_cli/src/reporter/mod.rs
crates/biome_cli/src/reporter/summary.rs
crates/biome_cli/src/reporter/terminal.rs
crates/biome_cli/src/runner/crawler.rs
crates/biome_cli/src/runner/handler.rs
crates/biome_cli/src/runner/impls/commands/traversal.rs
crates/biome_cli/src/runner/impls/crawlers/default.rs
crates/biome_cli/src/runner/impls/finalizers/default.rs
crates/biome_fs/src/fs.rs
crates/biome_fs/src/fs/os.rs
crates/biome_fs/src/interner.rs
crates/biome_fs/src/path.rs
crates/biome_service/src/scanner.rs
crates/biome_service/src/scanner/watcher.rs
crates/biome_service/src/scanner/workspace_bridges.rs
crates/biome_service/src/workspace/server.rs

crates/biome_cli/src/runner/crawler.rs

dyc3 · 2026-03-08T14:15:59Z

On my relatively small work repo, I got from ~830 ms -> ~795ms. I also tried this on the signoz repo, which takes ~2s for me on 2.4.6, and biome did not exit. It got stuck on something before it could start analyzing.

ematipico · 2026-03-08T14:29:23Z

Oh good to know. Can you share the repo so I can test it?

dyc3 · 2026-03-08T14:47:11Z

Specifically this branch: SigNoz/signoz#10333

cd frontend
yarn install
run biome

https://github.com/elastic/kibana is another good repo to test on

ematipico · 2026-03-08T15:20:46Z

I also tried this on the signoz repo

I tried too, and it works as expected. This branch is slightly faster, from ~5s to ~4s

ematipico · 2026-03-08T15:39:03Z

Oh, there's one thing to note. Preview releases enable recommended nursery rules because we don't provide BIOME_VERSION. So if there's some regressions, it's because more rules are executed

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

crates/biome_service/src/scanner/test_utils.rs (1)

88-94: Don't let path_kind become decorative in the mock.

Line 93 drops the new signal entirely, so these watcher tests still stay green if a caller passes None or the wrong kind. Worth teaching the mock to assert or model the expected kind in the ignore-path cases, otherwise this API change is barely exercised here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_service/src/scanner/test_utils.rs` around lines 88 - 94, The
mock implementation of is_ignored currently ignores the new path_kind parameter
(named _path_kind) so tests don’t exercise the API change; update the mock to
accept and validate path_kind (remove the leading underscore), and in the
ignore-path branches assert/expect that path_kind matches the expected PathKind
(or explicitly handle None when applicable), returning an error or failing the
test if it doesn’t match; update any test callers of is_ignored to pass the
correct PathKind where needed so the mock actually verifies the signal.

crates/biome_cli/src/runner/crawler.rs (1)

91-95: Skip the second full snapshot when nothing changed.

On read-only runs, increment_changed() never mutates the set, so this extra collect + sort is just an O(n log n) scenic route. Reusing handle_paths when changed == 0 keeps the fast path fast.

⚡ Possible trim

-        // Re-collect after handle phase to capture was_written mutations
-        let mut evaluated_paths: Vec<_> = ctx.evaluated_paths().into_iter().cloned().collect();
-        evaluated_paths.sort_unstable();
+        // Re-collect only when handle mutated the written bit.
+        let evaluated_paths = if ctx.changed.load(Ordering::Relaxed) == 0 {
+            handle_paths
+        } else {
+            let mut evaluated_paths: Vec<_> = ctx.evaluated_paths().into_iter().cloned().collect();
+            evaluated_paths.sort_unstable();
+            evaluated_paths
+        };
 
         (start.elapsed(), evaluated_paths)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/biome_cli/src/runner/crawler.rs` around lines 91 - 95, The code
currently always re-collects and sorts ctx.evaluated_paths() into
evaluated_paths even when nothing changed; modify the end of the function to
check the mutation count (use the existing changed variable incremented by
increment_changed()) and when changed == 0 return the previously computed
handle_paths (already sorted/collected) instead of re-collecting; keep the
existing tuple return shape (start.elapsed(), paths) and only perform the
ctx.evaluated_paths().into_iter().cloned().collect()/sort_unstable() path when
changed > 0 so read-only runs avoid the extra O(n log n) work.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_cli/src/runner/crawler.rs`:
- Around line 86-87: The handle pass is dropping cached BiomePath.kind by
converting to PathBuf in the loop that calls TraversalScope::handle; update the
API so the handler receives the full BiomePath with its evaluated kind: add a
new TraversalScope::handle_path(&self, ctx, biome_path: BiomePath) (or overload
TraversalScope::handle to accept BiomePath) and change the loop that iterates
handle_paths to call that new method with the original BiomePath (or clone it)
instead of path.to_path_buf(); alternatively, if you prefer not to change the
API, propagate the evaluated path_kind alongside the PathBuf and reconstruct
BiomePath::with_kind(path, path_kind) before calling TraversalScope::handle so
directories and symlinks retain their metadata when Handler::handle inspects
biome_path.path_kind().

---

Nitpick comments:
In `@crates/biome_cli/src/runner/crawler.rs`:
- Around line 91-95: The code currently always re-collects and sorts
ctx.evaluated_paths() into evaluated_paths even when nothing changed; modify the
end of the function to check the mutation count (use the existing changed
variable incremented by increment_changed()) and when changed == 0 return the
previously computed handle_paths (already sorted/collected) instead of
re-collecting; keep the existing tuple return shape (start.elapsed(), paths) and
only perform the
ctx.evaluated_paths().into_iter().cloned().collect()/sort_unstable() path when
changed > 0 so read-only runs avoid the extra O(n log n) work.

In `@crates/biome_service/src/scanner/test_utils.rs`:
- Around line 88-94: The mock implementation of is_ignored currently ignores the
new path_kind parameter (named _path_kind) so tests don’t exercise the API
change; update the mock to accept and validate path_kind (remove the leading
underscore), and in the ignore-path branches assert/expect that path_kind
matches the expected PathKind (or explicitly handle None when applicable),
returning an error or failing the test if it doesn’t match; update any test
callers of is_ignored to pass the correct PathKind where needed so the mock
actually verifies the signal.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0d48dff3-57f3-4b59-a266-4908c633b2f9

📥 Commits

Reviewing files that changed from the base of the PR and between 0cec9f3 and e8701f1.

⛔ Files ignored due to path filters (1)

packages/@biomejs/backend-jsonrpc/src/workspace.ts is excluded by !**/backend-jsonrpc/src/workspace.ts and included by **

📒 Files selected for processing (3)

crates/biome_cli/src/runner/crawler.rs
crates/biome_fs/src/fs/memory.rs
crates/biome_service/src/scanner/test_utils.rs

crates/biome_cli/src/runner/crawler.rs

arendjr

Love it!

coderabbitai

🧹 Nitpick comments (1)

.changeset/wet-grapes-cross.md (1)
5-5: Polish the release note wording.

This reads a bit mechanical, and ~2K files feels more like an internal benchmark than an end-user note. Something like Improved CLI performance in large projects. would be cleaner. As per coding guidelines, "Changeset descriptions must be written for end users (explain impact and what changed), not developers (avoid implementation details) in 1-3 sentences".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.changeset/wet-grapes-cross.md at line 5, Update the changeset description
in .changeset/wet-grapes-cross.md to be end-user facing and concise (1–3
sentences); replace the current line "Slightly increased the performance of the
CLI in projects that have more than ~2K files." with a polished note such as
"Improved CLI performance in large projects, reducing time spent scanning many
files." to remove internal benchmarking phrasing and focus on user impact.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.changeset/wet-grapes-cross.md:
- Line 5: Update the changeset description in .changeset/wet-grapes-cross.md to
be end-user facing and concise (1–3 sentences); replace the current line
"Slightly increased the performance of the CLI in projects that have more than
~2K files." with a polished note such as "Improved CLI performance in large
projects, reducing time spent scanning many files." to remove internal
benchmarking phrasing and focus on user impact.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d3aec368-02ef-46e3-9b85-b9440d422801

📥 Commits

Reviewing files that changed from the base of the PR and between e8701f1 and 5c4fd9a.

📒 Files selected for processing (1)

.changeset/wet-grapes-cross.md

perf(cli): lock-free evaluation paths collection

ec5c4dd

perf(cli): reduce sys call for is_ignored by using path kind instead

533cd86

ematipico force-pushed the perf/lock-free-collection branch from f452450 to 533cd86 Compare March 8, 2026 11:56

ematipico requested review from a team March 8, 2026 11:56

github-actions bot added A-CLI Area: CLI A-Core Area: core A-Project Area: project labels Mar 8, 2026

coderabbitai bot reviewed Mar 8, 2026

View reviewed changes

crates/biome_cli/src/runner/crawler.rs Show resolved Hide resolved

linting and more

0cec9f3

github-actions bot added the A-LSP Area: language server protocol label Mar 8, 2026

[autofix.ci] apply automated fixes

6f1805b

more linting and address feedback

e8701f1

coderabbitai bot reviewed Mar 8, 2026

View reviewed changes

crates/biome_cli/src/runner/crawler.rs Show resolved Hide resolved

arendjr approved these changes Mar 11, 2026

View reviewed changes

chore: add a small changeset

5c4fd9a

coderabbitai bot reviewed Mar 11, 2026

View reviewed changes

ematipico merged commit 4bffb66 into main Mar 11, 2026
34 checks passed

ematipico deleted the perf/lock-free-collection branch March 11, 2026 16:18

github-actions bot mentioned this pull request Mar 11, 2026

ci: release #9346

Merged

Uh oh!

Conversation

ematipico commented Mar 8, 2026

Summary

Test Plan

Docs

Uh oh!

changeset-bot bot commented Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

ematipico commented Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Possibly related PRs

Suggested reviewers

Uh oh!

codspeed-hq bot commented Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Footnotes

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dyc3 commented Mar 8, 2026

Uh oh!

ematipico commented Mar 8, 2026

Uh oh!

dyc3 commented Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ematipico commented Mar 8, 2026

Uh oh!

ematipico commented Mar 8, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

arendjr left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

changeset-bot bot commented Mar 8, 2026 •

edited

Loading

ematipico commented Mar 8, 2026 •

edited

Loading

coderabbitai bot commented Mar 8, 2026 •

edited

Loading

codspeed-hq bot commented Mar 8, 2026 •

edited

Loading

dyc3 commented Mar 8, 2026 •

edited

Loading