Skip to content

perf(cli): lock-free storage, and reduce system calls#9391

Merged
ematipico merged 6 commits intomainfrom
perf/lock-free-collection
Mar 11, 2026
Merged

perf(cli): lock-free storage, and reduce system calls#9391
ematipico merged 6 commits intomainfrom
perf/lock-free-collection

Conversation

@ematipico
Copy link
Member

Summary

This PR attempts to improve the speed of the CLI with two changes:

  • Use papaya::HashSet instead of BTreeSet for the evaluated_paths. Sorting is used for predictable reporting; it's useless during crawling, as it only slows us down. So now we use a lock-free data structure for collecting the paths, and we sort them only at the end. There's only one small hiccup, where we need to do the collection twice, but I believe the cost here is less than keeping all paths ordered.
  • Reduce system calls. BiomePath now keeps the information of what kind of path is (file/dir/symbolic link), so we can re-use this information in hot paths such as is_ignored. Slow paths such as indexing of dependencies stay the same, so we don't lose anything with this refactor.

Test Plan

Current tests should pass.

I'll create a preview release and test it in some projects.

Docs

N/A

@changeset-bot
Copy link

changeset-bot bot commented Mar 8, 2026

🦋 Changeset detected

Latest commit: 5c4fd9a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 13 packages
Name Type
@biomejs/biome Patch
@biomejs/cli-win32-x64 Patch
@biomejs/cli-win32-arm64 Patch
@biomejs/cli-darwin-x64 Patch
@biomejs/cli-darwin-arm64 Patch
@biomejs/cli-linux-x64 Patch
@biomejs/cli-linux-arm64 Patch
@biomejs/cli-linux-x64-musl Patch
@biomejs/cli-linux-arm64-musl Patch
@biomejs/wasm-web Patch
@biomejs/wasm-bundler Patch
@biomejs/wasm-nodejs Patch
@biomejs/backend-jsonrpc Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@ematipico ematipico force-pushed the perf/lock-free-collection branch from f452450 to 533cd86 Compare March 8, 2026 11:56
@ematipico ematipico requested review from a team March 8, 2026 11:56
@github-actions github-actions bot added A-CLI Area: CLI A-Core Area: core A-Project Area: project labels Mar 8, 2026
@ematipico
Copy link
Member Author

ematipico commented Mar 8, 2026

If I see some concrete result, I will add a changeset. Otherwise, no, but I would like to keep the change nonetheless.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 8, 2026

Walkthrough

This PR replaces many usages of BTreeSet<BiomePath> with Vec<BiomePath> across reporter and traversal APIs, migrates traversal storage from RwLock<BTreeSet<...>> to papaya::HashSet with pin-based HashSetRef access, adds PathKind (including is_symlink) to BiomePath with new constructors/accessors, and extends ignore APIs to accept explicit path_kind / is_dir parameters. Several method and struct signatures were updated to propagate these types.

Possibly related PRs

Suggested reviewers

  • dyc3
  • siketyan
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarises the main changes: replacing BTreeSet with a lock-free HashSet for better performance and reducing system calls by caching path kind information.
Description check ✅ Passed The description directly addresses the changeset by explaining both performance improvements: lock-free storage using papaya::HashSet and reducing system calls via cached path kind information.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch perf/lock-free-collection

Comment @coderabbitai help to get the list of available commands and usage tips.

@codspeed-hq
Copy link

codspeed-hq bot commented Mar 8, 2026

Merging this PR will not alter performance

✅ 58 untouched benchmarks
⏩ 156 skipped benchmarks1


Comparing perf/lock-free-collection (5c4fd9a) with main (f4d2073)2

Open in CodSpeed

Footnotes

  1. 156 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on main (71c7df6) during the generation of this report, so f4d2073 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/biome_fs/src/interner.rs (1)

32-42: Redundant .to_path_buf() call.

path is already a Utf8PathBuf, so path.to_path_buf() is effectively a clone. You can use path directly:

♻️ Proposed simplification
     pub fn intern_path(&self, path: Utf8PathBuf) -> bool {
         let guard = self.storage.pin();
         if guard.contains(&path) {
             return false;
         }
-        let owned = path.to_path_buf();
-        let result = guard.insert(owned.clone());
+        let result = guard.insert(path.clone());
         if result {
-            self.handler.send(owned).ok();
+            self.handler.send(path).ok();
         }
         result
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_fs/src/interner.rs` around lines 32 - 42, The call to
path.to_path_buf() is redundant because path is already a Utf8PathBuf; replace
the allocation with a direct clone (e.g., use path.clone()) or otherwise use
path directly when inserting/sending to avoid an unnecessary conversion. Update
the block around self.storage.pin(), guard.contains(&path), guard.insert(…), and
self.handler.send(…) to create owned via path.clone() (or use path directly) and
then insert/send that owned value so behavior remains identical but without the
redundant to_path_buf() call.
crates/biome_service/src/scanner.rs (1)

430-453: Keep the cached PathKind here.

Line 430 rebuilds each BiomePath from the raw path, so Line 453 has to symlink_path_kind() everything again. Cloning the stored BiomePath keeps the cached kind and lets this optimisation pay off in watch mode too.

♻️ Small refactor
-        let mut evaluated_paths: Vec<_> = ctx
-            .evaluated_paths()
-            .iter()
-            .map(|p| BiomePath::new(p.as_path()))
-            .collect();
+        let mut evaluated_paths: Vec<_> = ctx.evaluated_paths().iter().cloned().collect();

-            } else if ctx.watch && fs.symlink_path_kind(&path).is_ok_and(PathKind::is_dir) {
+            } else if ctx.watch && path.path_kind().is_dir() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_service/src/scanner.rs` around lines 430 - 453, The loop is
rebuilding BiomePath instances later which loses the cached PathKind; instead
construct and keep BiomePath values up front (use evaluated_paths:
Vec<BiomePath> via BiomePath::new) and then clone those BiomePath instances
where needed in the processing loop so the cached kind is preserved; update uses
around evaluated_paths, for path in evaluated_paths, and any place that
previously reconstructed a BiomePath so they consume or clone the stored
BiomePath (references: evaluated_paths, BiomePath::new, ctx.watch,
fs.symlink_path_kind, PathKind::is_dir, handleable_paths).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_cli/src/runner/crawler.rs`:
- Around line 147-150: increment_changed currently calls
evaluated_paths.pin().insert(path.to_written()) but because BiomePath's
Hash/PartialEq omit the was_written flag the insert won't replace the existing
entry; locate the increment_changed function and either (A) remove the existing
key first and then insert the to_written variant (e.g., lookup/remove by the
equivalent key then insert path.to_written()), or (B) change BiomePath's
equality/hash to include was_written so insert will replace the entry; update
the code around evaluated_paths.pin().insert and ensure
push_message(Message::Stats(MessageStat::Changed)) remains unchanged.

---

Nitpick comments:
In `@crates/biome_fs/src/interner.rs`:
- Around line 32-42: The call to path.to_path_buf() is redundant because path is
already a Utf8PathBuf; replace the allocation with a direct clone (e.g., use
path.clone()) or otherwise use path directly when inserting/sending to avoid an
unnecessary conversion. Update the block around self.storage.pin(),
guard.contains(&path), guard.insert(…), and self.handler.send(…) to create owned
via path.clone() (or use path directly) and then insert/send that owned value so
behavior remains identical but without the redundant to_path_buf() call.

In `@crates/biome_service/src/scanner.rs`:
- Around line 430-453: The loop is rebuilding BiomePath instances later which
loses the cached PathKind; instead construct and keep BiomePath values up front
(use evaluated_paths: Vec<BiomePath> via BiomePath::new) and then clone those
BiomePath instances where needed in the processing loop so the cached kind is
preserved; update uses around evaluated_paths, for path in evaluated_paths, and
any place that previously reconstructed a BiomePath so they consume or clone the
stored BiomePath (references: evaluated_paths, BiomePath::new, ctx.watch,
fs.symlink_path_kind, PathKind::is_dir, handleable_paths).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3b34e396-6e86-42b0-b700-d92e6e167276

📥 Commits

Reviewing files that changed from the base of the PR and between 1bb9edc and 533cd86.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock and included by **
📒 Files selected for processing (17)
  • crates/biome_cli/Cargo.toml
  • crates/biome_cli/src/reporter/mod.rs
  • crates/biome_cli/src/reporter/summary.rs
  • crates/biome_cli/src/reporter/terminal.rs
  • crates/biome_cli/src/runner/crawler.rs
  • crates/biome_cli/src/runner/handler.rs
  • crates/biome_cli/src/runner/impls/commands/traversal.rs
  • crates/biome_cli/src/runner/impls/crawlers/default.rs
  • crates/biome_cli/src/runner/impls/finalizers/default.rs
  • crates/biome_fs/src/fs.rs
  • crates/biome_fs/src/fs/os.rs
  • crates/biome_fs/src/interner.rs
  • crates/biome_fs/src/path.rs
  • crates/biome_service/src/scanner.rs
  • crates/biome_service/src/scanner/watcher.rs
  • crates/biome_service/src/scanner/workspace_bridges.rs
  • crates/biome_service/src/workspace/server.rs

@github-actions github-actions bot added the A-LSP Area: language server protocol label Mar 8, 2026
@dyc3
Copy link
Contributor

dyc3 commented Mar 8, 2026

On my relatively small work repo, I got from ~830 ms -> ~795ms. I also tried this on the signoz repo, which takes ~2s for me on 2.4.6, and biome did not exit. It got stuck on something before it could start analyzing.

@ematipico
Copy link
Member Author

Oh good to know. Can you share the repo so I can test it?

@dyc3
Copy link
Contributor

dyc3 commented Mar 8, 2026

Specifically this branch: SigNoz/signoz#10333

  1. cd frontend
  2. yarn install
  3. run biome

https://github.com/elastic/kibana is another good repo to test on

@ematipico
Copy link
Member Author

I also tried this on the signoz repo

I tried too, and it works as expected. This branch is slightly faster, from ~5s to ~4s

@ematipico
Copy link
Member Author

Oh, there's one thing to note. Preview releases enable recommended nursery rules because we don't provide BIOME_VERSION. So if there's some regressions, it's because more rules are executed

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/biome_service/src/scanner/test_utils.rs (1)

88-94: Don't let path_kind become decorative in the mock.

Line 93 drops the new signal entirely, so these watcher tests still stay green if a caller passes None or the wrong kind. Worth teaching the mock to assert or model the expected kind in the ignore-path cases, otherwise this API change is barely exercised here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_service/src/scanner/test_utils.rs` around lines 88 - 94, The
mock implementation of is_ignored currently ignores the new path_kind parameter
(named _path_kind) so tests don’t exercise the API change; update the mock to
accept and validate path_kind (remove the leading underscore), and in the
ignore-path branches assert/expect that path_kind matches the expected PathKind
(or explicitly handle None when applicable), returning an error or failing the
test if it doesn’t match; update any test callers of is_ignored to pass the
correct PathKind where needed so the mock actually verifies the signal.
crates/biome_cli/src/runner/crawler.rs (1)

91-95: Skip the second full snapshot when nothing changed.

On read-only runs, increment_changed() never mutates the set, so this extra collect + sort is just an O(n log n) scenic route. Reusing handle_paths when changed == 0 keeps the fast path fast.

⚡ Possible trim
-        // Re-collect after handle phase to capture was_written mutations
-        let mut evaluated_paths: Vec<_> = ctx.evaluated_paths().into_iter().cloned().collect();
-        evaluated_paths.sort_unstable();
+        // Re-collect only when handle mutated the written bit.
+        let evaluated_paths = if ctx.changed.load(Ordering::Relaxed) == 0 {
+            handle_paths
+        } else {
+            let mut evaluated_paths: Vec<_> = ctx.evaluated_paths().into_iter().cloned().collect();
+            evaluated_paths.sort_unstable();
+            evaluated_paths
+        };
 
         (start.elapsed(), evaluated_paths)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_cli/src/runner/crawler.rs` around lines 91 - 95, The code
currently always re-collects and sorts ctx.evaluated_paths() into
evaluated_paths even when nothing changed; modify the end of the function to
check the mutation count (use the existing changed variable incremented by
increment_changed()) and when changed == 0 return the previously computed
handle_paths (already sorted/collected) instead of re-collecting; keep the
existing tuple return shape (start.elapsed(), paths) and only perform the
ctx.evaluated_paths().into_iter().cloned().collect()/sort_unstable() path when
changed > 0 so read-only runs avoid the extra O(n log n) work.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_cli/src/runner/crawler.rs`:
- Around line 86-87: The handle pass is dropping cached BiomePath.kind by
converting to PathBuf in the loop that calls TraversalScope::handle; update the
API so the handler receives the full BiomePath with its evaluated kind: add a
new TraversalScope::handle_path(&self, ctx, biome_path: BiomePath) (or overload
TraversalScope::handle to accept BiomePath) and change the loop that iterates
handle_paths to call that new method with the original BiomePath (or clone it)
instead of path.to_path_buf(); alternatively, if you prefer not to change the
API, propagate the evaluated path_kind alongside the PathBuf and reconstruct
BiomePath::with_kind(path, path_kind) before calling TraversalScope::handle so
directories and symlinks retain their metadata when Handler::handle inspects
biome_path.path_kind().

---

Nitpick comments:
In `@crates/biome_cli/src/runner/crawler.rs`:
- Around line 91-95: The code currently always re-collects and sorts
ctx.evaluated_paths() into evaluated_paths even when nothing changed; modify the
end of the function to check the mutation count (use the existing changed
variable incremented by increment_changed()) and when changed == 0 return the
previously computed handle_paths (already sorted/collected) instead of
re-collecting; keep the existing tuple return shape (start.elapsed(), paths) and
only perform the
ctx.evaluated_paths().into_iter().cloned().collect()/sort_unstable() path when
changed > 0 so read-only runs avoid the extra O(n log n) work.

In `@crates/biome_service/src/scanner/test_utils.rs`:
- Around line 88-94: The mock implementation of is_ignored currently ignores the
new path_kind parameter (named _path_kind) so tests don’t exercise the API
change; update the mock to accept and validate path_kind (remove the leading
underscore), and in the ignore-path branches assert/expect that path_kind
matches the expected PathKind (or explicitly handle None when applicable),
returning an error or failing the test if it doesn’t match; update any test
callers of is_ignored to pass the correct PathKind where needed so the mock
actually verifies the signal.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0d48dff3-57f3-4b59-a266-4908c633b2f9

📥 Commits

Reviewing files that changed from the base of the PR and between 0cec9f3 and e8701f1.

⛔ Files ignored due to path filters (1)
  • packages/@biomejs/backend-jsonrpc/src/workspace.ts is excluded by !**/backend-jsonrpc/src/workspace.ts and included by **
📒 Files selected for processing (3)
  • crates/biome_cli/src/runner/crawler.rs
  • crates/biome_fs/src/fs/memory.rs
  • crates/biome_service/src/scanner/test_utils.rs

Copy link
Contributor

@arendjr arendjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
.changeset/wet-grapes-cross.md (1)

5-5: Polish the release note wording.

This reads a bit mechanical, and ~2K files feels more like an internal benchmark than an end-user note. Something like Improved CLI performance in large projects. would be cleaner. As per coding guidelines, "Changeset descriptions must be written for end users (explain impact and what changed), not developers (avoid implementation details) in 1-3 sentences".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.changeset/wet-grapes-cross.md at line 5, Update the changeset description
in .changeset/wet-grapes-cross.md to be end-user facing and concise (1–3
sentences); replace the current line "Slightly increased the performance of the
CLI in projects that have more than ~2K files." with a polished note such as
"Improved CLI performance in large projects, reducing time spent scanning many
files." to remove internal benchmarking phrasing and focus on user impact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.changeset/wet-grapes-cross.md:
- Line 5: Update the changeset description in .changeset/wet-grapes-cross.md to
be end-user facing and concise (1–3 sentences); replace the current line
"Slightly increased the performance of the CLI in projects that have more than
~2K files." with a polished note such as "Improved CLI performance in large
projects, reducing time spent scanning many files." to remove internal
benchmarking phrasing and focus on user impact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d3aec368-02ef-46e3-9b85-b9440d422801

📥 Commits

Reviewing files that changed from the base of the PR and between e8701f1 and 5c4fd9a.

📒 Files selected for processing (1)
  • .changeset/wet-grapes-cross.md

@ematipico ematipico merged commit 4bffb66 into main Mar 11, 2026
34 checks passed
@ematipico ematipico deleted the perf/lock-free-collection branch March 11, 2026 16:18
@github-actions github-actions bot mentioned this pull request Mar 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-CLI Area: CLI A-Core Area: core A-LSP Area: language server protocol A-Project Area: project

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants