perf(cli): lock-free storage, and reduce system calls#9391
Conversation
🦋 Changeset detectedLatest commit: 5c4fd9a The changes in this PR will be included in the next version bump. This PR includes changesets to release 13 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
f452450 to
533cd86
Compare
|
If I see some concrete result, I will add a changeset. Otherwise, no, but I would like to keep the change nonetheless. |
WalkthroughThis PR replaces many usages of Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Merging this PR will not alter performance
Comparing Footnotes
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
crates/biome_fs/src/interner.rs (1)
32-42: Redundant.to_path_buf()call.
pathis already aUtf8PathBuf, sopath.to_path_buf()is effectively a clone. You can usepathdirectly:♻️ Proposed simplification
pub fn intern_path(&self, path: Utf8PathBuf) -> bool { let guard = self.storage.pin(); if guard.contains(&path) { return false; } - let owned = path.to_path_buf(); - let result = guard.insert(owned.clone()); + let result = guard.insert(path.clone()); if result { - self.handler.send(owned).ok(); + self.handler.send(path).ok(); } result }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_fs/src/interner.rs` around lines 32 - 42, The call to path.to_path_buf() is redundant because path is already a Utf8PathBuf; replace the allocation with a direct clone (e.g., use path.clone()) or otherwise use path directly when inserting/sending to avoid an unnecessary conversion. Update the block around self.storage.pin(), guard.contains(&path), guard.insert(…), and self.handler.send(…) to create owned via path.clone() (or use path directly) and then insert/send that owned value so behavior remains identical but without the redundant to_path_buf() call.crates/biome_service/src/scanner.rs (1)
430-453: Keep the cachedPathKindhere.Line 430 rebuilds each
BiomePathfrom the raw path, so Line 453 has tosymlink_path_kind()everything again. Cloning the storedBiomePathkeeps the cached kind and lets this optimisation pay off in watch mode too.♻️ Small refactor
- let mut evaluated_paths: Vec<_> = ctx - .evaluated_paths() - .iter() - .map(|p| BiomePath::new(p.as_path())) - .collect(); + let mut evaluated_paths: Vec<_> = ctx.evaluated_paths().iter().cloned().collect(); - } else if ctx.watch && fs.symlink_path_kind(&path).is_ok_and(PathKind::is_dir) { + } else if ctx.watch && path.path_kind().is_dir() {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_service/src/scanner.rs` around lines 430 - 453, The loop is rebuilding BiomePath instances later which loses the cached PathKind; instead construct and keep BiomePath values up front (use evaluated_paths: Vec<BiomePath> via BiomePath::new) and then clone those BiomePath instances where needed in the processing loop so the cached kind is preserved; update uses around evaluated_paths, for path in evaluated_paths, and any place that previously reconstructed a BiomePath so they consume or clone the stored BiomePath (references: evaluated_paths, BiomePath::new, ctx.watch, fs.symlink_path_kind, PathKind::is_dir, handleable_paths).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_cli/src/runner/crawler.rs`:
- Around line 147-150: increment_changed currently calls
evaluated_paths.pin().insert(path.to_written()) but because BiomePath's
Hash/PartialEq omit the was_written flag the insert won't replace the existing
entry; locate the increment_changed function and either (A) remove the existing
key first and then insert the to_written variant (e.g., lookup/remove by the
equivalent key then insert path.to_written()), or (B) change BiomePath's
equality/hash to include was_written so insert will replace the entry; update
the code around evaluated_paths.pin().insert and ensure
push_message(Message::Stats(MessageStat::Changed)) remains unchanged.
---
Nitpick comments:
In `@crates/biome_fs/src/interner.rs`:
- Around line 32-42: The call to path.to_path_buf() is redundant because path is
already a Utf8PathBuf; replace the allocation with a direct clone (e.g., use
path.clone()) or otherwise use path directly when inserting/sending to avoid an
unnecessary conversion. Update the block around self.storage.pin(),
guard.contains(&path), guard.insert(…), and self.handler.send(…) to create owned
via path.clone() (or use path directly) and then insert/send that owned value so
behavior remains identical but without the redundant to_path_buf() call.
In `@crates/biome_service/src/scanner.rs`:
- Around line 430-453: The loop is rebuilding BiomePath instances later which
loses the cached PathKind; instead construct and keep BiomePath values up front
(use evaluated_paths: Vec<BiomePath> via BiomePath::new) and then clone those
BiomePath instances where needed in the processing loop so the cached kind is
preserved; update uses around evaluated_paths, for path in evaluated_paths, and
any place that previously reconstructed a BiomePath so they consume or clone the
stored BiomePath (references: evaluated_paths, BiomePath::new, ctx.watch,
fs.symlink_path_kind, PathKind::is_dir, handleable_paths).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3b34e396-6e86-42b0-b700-d92e6e167276
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lockand included by**
📒 Files selected for processing (17)
crates/biome_cli/Cargo.tomlcrates/biome_cli/src/reporter/mod.rscrates/biome_cli/src/reporter/summary.rscrates/biome_cli/src/reporter/terminal.rscrates/biome_cli/src/runner/crawler.rscrates/biome_cli/src/runner/handler.rscrates/biome_cli/src/runner/impls/commands/traversal.rscrates/biome_cli/src/runner/impls/crawlers/default.rscrates/biome_cli/src/runner/impls/finalizers/default.rscrates/biome_fs/src/fs.rscrates/biome_fs/src/fs/os.rscrates/biome_fs/src/interner.rscrates/biome_fs/src/path.rscrates/biome_service/src/scanner.rscrates/biome_service/src/scanner/watcher.rscrates/biome_service/src/scanner/workspace_bridges.rscrates/biome_service/src/workspace/server.rs
|
On my relatively small work repo, I got from ~830 ms -> ~795ms. I also tried this on the signoz repo, which takes ~2s for me on 2.4.6, and biome did not exit. It got stuck on something before it could start analyzing. |
|
Oh good to know. Can you share the repo so I can test it? |
|
Specifically this branch: SigNoz/signoz#10333
https://github.com/elastic/kibana is another good repo to test on |
I tried too, and it works as expected. This branch is slightly faster, from ~5s to ~4s |
|
Oh, there's one thing to note. Preview releases enable recommended nursery rules because we don't provide |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
crates/biome_service/src/scanner/test_utils.rs (1)
88-94: Don't letpath_kindbecome decorative in the mock.Line 93 drops the new signal entirely, so these watcher tests still stay green if a caller passes
Noneor the wrong kind. Worth teaching the mock to assert or model the expected kind in the ignore-path cases, otherwise this API change is barely exercised here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_service/src/scanner/test_utils.rs` around lines 88 - 94, The mock implementation of is_ignored currently ignores the new path_kind parameter (named _path_kind) so tests don’t exercise the API change; update the mock to accept and validate path_kind (remove the leading underscore), and in the ignore-path branches assert/expect that path_kind matches the expected PathKind (or explicitly handle None when applicable), returning an error or failing the test if it doesn’t match; update any test callers of is_ignored to pass the correct PathKind where needed so the mock actually verifies the signal.crates/biome_cli/src/runner/crawler.rs (1)
91-95: Skip the second full snapshot when nothing changed.On read-only runs,
increment_changed()never mutates the set, so this extra collect + sort is just an O(n log n) scenic route. Reusinghandle_pathswhenchanged == 0keeps the fast path fast.⚡ Possible trim
- // Re-collect after handle phase to capture was_written mutations - let mut evaluated_paths: Vec<_> = ctx.evaluated_paths().into_iter().cloned().collect(); - evaluated_paths.sort_unstable(); + // Re-collect only when handle mutated the written bit. + let evaluated_paths = if ctx.changed.load(Ordering::Relaxed) == 0 { + handle_paths + } else { + let mut evaluated_paths: Vec<_> = ctx.evaluated_paths().into_iter().cloned().collect(); + evaluated_paths.sort_unstable(); + evaluated_paths + }; (start.elapsed(), evaluated_paths)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_cli/src/runner/crawler.rs` around lines 91 - 95, The code currently always re-collects and sorts ctx.evaluated_paths() into evaluated_paths even when nothing changed; modify the end of the function to check the mutation count (use the existing changed variable incremented by increment_changed()) and when changed == 0 return the previously computed handle_paths (already sorted/collected) instead of re-collecting; keep the existing tuple return shape (start.elapsed(), paths) and only perform the ctx.evaluated_paths().into_iter().cloned().collect()/sort_unstable() path when changed > 0 so read-only runs avoid the extra O(n log n) work.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_cli/src/runner/crawler.rs`:
- Around line 86-87: The handle pass is dropping cached BiomePath.kind by
converting to PathBuf in the loop that calls TraversalScope::handle; update the
API so the handler receives the full BiomePath with its evaluated kind: add a
new TraversalScope::handle_path(&self, ctx, biome_path: BiomePath) (or overload
TraversalScope::handle to accept BiomePath) and change the loop that iterates
handle_paths to call that new method with the original BiomePath (or clone it)
instead of path.to_path_buf(); alternatively, if you prefer not to change the
API, propagate the evaluated path_kind alongside the PathBuf and reconstruct
BiomePath::with_kind(path, path_kind) before calling TraversalScope::handle so
directories and symlinks retain their metadata when Handler::handle inspects
biome_path.path_kind().
---
Nitpick comments:
In `@crates/biome_cli/src/runner/crawler.rs`:
- Around line 91-95: The code currently always re-collects and sorts
ctx.evaluated_paths() into evaluated_paths even when nothing changed; modify the
end of the function to check the mutation count (use the existing changed
variable incremented by increment_changed()) and when changed == 0 return the
previously computed handle_paths (already sorted/collected) instead of
re-collecting; keep the existing tuple return shape (start.elapsed(), paths) and
only perform the
ctx.evaluated_paths().into_iter().cloned().collect()/sort_unstable() path when
changed > 0 so read-only runs avoid the extra O(n log n) work.
In `@crates/biome_service/src/scanner/test_utils.rs`:
- Around line 88-94: The mock implementation of is_ignored currently ignores the
new path_kind parameter (named _path_kind) so tests don’t exercise the API
change; update the mock to accept and validate path_kind (remove the leading
underscore), and in the ignore-path branches assert/expect that path_kind
matches the expected PathKind (or explicitly handle None when applicable),
returning an error or failing the test if it doesn’t match; update any test
callers of is_ignored to pass the correct PathKind where needed so the mock
actually verifies the signal.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0d48dff3-57f3-4b59-a266-4908c633b2f9
⛔ Files ignored due to path filters (1)
packages/@biomejs/backend-jsonrpc/src/workspace.tsis excluded by!**/backend-jsonrpc/src/workspace.tsand included by**
📒 Files selected for processing (3)
crates/biome_cli/src/runner/crawler.rscrates/biome_fs/src/fs/memory.rscrates/biome_service/src/scanner/test_utils.rs
There was a problem hiding this comment.
🧹 Nitpick comments (1)
.changeset/wet-grapes-cross.md (1)
5-5: Polish the release note wording.This reads a bit mechanical, and
~2K filesfeels more like an internal benchmark than an end-user note. Something likeImproved CLI performance in large projects.would be cleaner. As per coding guidelines, "Changeset descriptions must be written for end users (explain impact and what changed), not developers (avoid implementation details) in 1-3 sentences".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.changeset/wet-grapes-cross.md at line 5, Update the changeset description in .changeset/wet-grapes-cross.md to be end-user facing and concise (1–3 sentences); replace the current line "Slightly increased the performance of the CLI in projects that have more than ~2K files." with a polished note such as "Improved CLI performance in large projects, reducing time spent scanning many files." to remove internal benchmarking phrasing and focus on user impact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In @.changeset/wet-grapes-cross.md:
- Line 5: Update the changeset description in .changeset/wet-grapes-cross.md to
be end-user facing and concise (1–3 sentences); replace the current line
"Slightly increased the performance of the CLI in projects that have more than
~2K files." with a polished note such as "Improved CLI performance in large
projects, reducing time spent scanning many files." to remove internal
benchmarking phrasing and focus on user impact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d3aec368-02ef-46e3-9b85-b9440d422801
📒 Files selected for processing (1)
.changeset/wet-grapes-cross.md
Summary
This PR attempts to improve the speed of the CLI with two changes:
papaya::HashSetinstead ofBTreeSetfor theevaluated_paths. Sorting is used for predictable reporting; it's useless during crawling, as it only slows us down. So now we use a lock-free data structure for collecting the paths, and we sort them only at the end. There's only one small hiccup, where we need to do the collection twice, but I believe the cost here is less than keeping all paths ordered.BiomePathnow keeps the information of what kind of path is (file/dir/symbolic link), so we can re-use this information in hot paths such asis_ignored. Slow paths such as indexing of dependencies stay the same, so we don't lose anything with this refactor.Test Plan
Current tests should pass.
I'll create a preview release and test it in some projects.
Docs
N/A