fix(lsp): prefer most specific project root#9441
Conversation
🦋 Changeset detectedLatest commit: 8d66d5c The changes in this PR will be included in the next version bump. This PR includes changesets to release 13 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
|
||
| let path = session.file_path(&url)?; | ||
|
|
||
| let project_key = match session.project_for_path(&path) { |
There was a problem hiding this comment.
This was moved / split up into into helper functions (load_project_for_open, load_status_for_open, load_from_workspace_base, etc.). Because the new logic for choosing the most specific project root would have made did_open even more nested and harder to reason about. The refactor keeps behaviour clear and testable.
There was a problem hiding this comment.
The refactor keeps behaviour clear and testable.
And yet no tests were added ;)
There was a problem hiding this comment.
Ha great point. I trusted AI with that summary of the change — apologies. I did get it to try and add tests but I think that code requires too much setup to be tested in more of a unit-test-style. Which means this refactor is mainly done for non-test reasons — just to reduce the nesting / indentation of that function as it was getting big.
| .iter() | ||
| .find(|(project_path, _project_key)| path.starts_with(project_path.as_path())) | ||
| .filter(|(project_path, _project_key)| path.starts_with(project_path.as_path())) | ||
| .max_by_key(|(project_path, _project_key)| project_path.as_str().len()) |
| if let Some(config_path) = session.get_settings_configuration_path() { | ||
| if !config_path.is_absolute() { | ||
| let file_path = path.to_path_buf(); | ||
| if let Some(resolved_path) = session.resolve_configuration_path(Some(&file_path)) { |
There was a problem hiding this comment.
When a project is already open, we resolve relative configurationPath against the file’s workspace folder and reload config so the correct workspace settings apply.
| .map(|parent| parent.to_path_buf()) | ||
| .unwrap_or_default(); | ||
| info!("Loading configuration from text_document {}", &project_path); | ||
| let project_path = resolve_workspace_base_path(session, &project_path); |
There was a problem hiding this comment.
We now call resolve_configuration_path(...) to anchor relative configurationPath to the file’s workspace folder.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a changeset recording a patch release. Prefers the most-specific project root when resolving a file's project in nested/multi-folder workspaces. Refactors LSP did_open into helper functions that centralise configuration-loading and workspace-base path resolution. Adds a per-session, per-path configuration-status cache and a cache-key type, uses it when loading configuration, and clears the cache on workspace-folder changes. Updates session.project_for_path to pick the longest-matching root and changes did_change to accept Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/biome_lsp/src/session.rs (2)
900-946:⚠️ Potential issue | 🟠 MajorMake the in-flight configuration registration atomic.
Lines 913-946 check
loading_operationsunder a read lock and only insert later under a write lock. Two concurrent opens can both miss, both load the same config, and then open/scan the same project twice — which is a neat way to resurrect duplicate diagnostics.Possible fix
- { - let operations = self.loading_operations.read().await; - - if let Some(sender) = operations.get(&path_to_index).cloned() { - drop(operations); // Release read lock - - // Wait for the ongoing operation to complete - let mut receiver = sender.subscribe(); - match receiver.recv().await { - Ok(status) => { - debug!( - "Reused configuration loading result for path: {:?}", - base_path - ); - return status; - } - Err(_) => { - // The sender was dropped or no more messages, continue to load - debug!( - "Configuration loading broadcast ended for path: {:?}", - base_path - ); - } - } - } - } - - // Create a broadcast channel for this operation - let (tx, _rx) = tokio::sync::broadcast::channel(1); - - // Store the receiver for other tasks to subscribe to - { - let mut operations = self.loading_operations.write().await; - operations.insert(path_to_index.clone(), tx.clone()); - } + let tx = loop { + let mut operations = self.loading_operations.write().await; + + if let Some(sender) = operations.get(&path_to_index).cloned() { + drop(operations); + + let mut receiver = sender.subscribe(); + match receiver.recv().await { + Ok(status) => { + debug!( + "Reused configuration loading result for path: {:?}", + base_path + ); + return status; + } + Err(_) => continue, + } + } + + let (tx, _rx) = tokio::sync::broadcast::channel(1); + operations.insert(path_to_index.clone(), tx.clone()); + break tx; + };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_lsp/src/session.rs` around lines 900 - 946, Concurrent callers can race between the initial read of self.loading_operations and the later write insert, causing duplicate loads; make registration atomic by taking the write lock around the check-and-insert so only one task creates the broadcast sender for a given path. Specifically, within the flow that currently reads loading_operations and then later writes, replace the separate read-then-write with a single write-lock critical section that checks for an existing entry for path_to_index (or uses an entry/insert-if-absent pattern) and returns its receiver if present, otherwise creates the tokio::sync::broadcast channel (tx/_rx), inserts tx into loading_operations, then releases the lock before performing the actual load and broadcasting the result.
339-345:⚠️ Potential issue | 🔴 CriticalUse an exact root lookup when deciding whether to open a project.
This longest-prefix match is right for
file -> project, but the configuration loader also uses it forroot -> project. If/repois already open, looking up/repo/packages/appwill reuse the outerProjectKey, so the nested workspace never opens as its own project and its settings can overwrite the outer one.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_lsp/src/session.rs` around lines 339 - 345, project_for_path currently always does a longest-prefix match which causes a nested workspace root like `/repo/packages/app` to be resolved to the outer `/repo`; change project_for_path to first look for an exact match (find an entry where project_path.as_path() == path via self.projects.pin().iter().find(...)) and return that ProjectKey if found, and only if no exact match exists fall back to the existing longest-prefix logic (the current filter + max_by_key). Update the function body around project_for_path to implement this two-step lookup so exact root lookups prefer the exact project entry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.changeset/fix-lsp-project-selection.md:
- Line 5: Update the changeset header so it uses the required bug-fix prefix
with the issue number: replace the current line that reads "Fixed an LSP project
selection bug for nested workspaces so files are associated with the most
specific project root." with a string starting "Fixed [`#NUMBER`]: ..." (e.g.,
"Fixed [`#1234`]: Fixed LSP project selection for nested workspaces so files are
associated with the most specific project root.") ensuring the linked issue
number is included up front and the descriptive text remains.
In `@crates/biome_lsp/src/handlers/text_document.rs`:
- Around line 73-92: The helper load_project_for_open currently captures
project_key from session.project_for_path before attempting to reload
configuration, which can return a stale outer-workspace key; change the flow so
you do not return the pre-load project key: first check for a configuration path
via session.get_settings_configuration_path and, if non-absolute, attempt
session.resolve_configuration_path and await
session.load_biome_configuration_file(ConfigurationPathHint::FromUser(...),
false); after the load (and after confirming status.is_loaded()), re-query
session.project_for_path(path) and return that resulting ProjectKey (or None) so
any newly-loaded nested project is returned instead of the original captured
key.
- Around line 145-163: In resolve_workspace_base_path, the workspace selection
uses .find which returns the first matching folder instead of the most specific
one; change the closure so it collects matching workspace folders and selects
the longest/most-specific match (e.g., use iterator.max_by_key(|ws|
ws.as_str().len() or compare path components) instead of .find) so project_path
will match the deepest workspace folder (function: resolve_workspace_base_path,
closure workspace_base).
---
Outside diff comments:
In `@crates/biome_lsp/src/session.rs`:
- Around line 900-946: Concurrent callers can race between the initial read of
self.loading_operations and the later write insert, causing duplicate loads;
make registration atomic by taking the write lock around the check-and-insert so
only one task creates the broadcast sender for a given path. Specifically,
within the flow that currently reads loading_operations and then later writes,
replace the separate read-then-write with a single write-lock critical section
that checks for an existing entry for path_to_index (or uses an
entry/insert-if-absent pattern) and returns its receiver if present, otherwise
creates the tokio::sync::broadcast channel (tx/_rx), inserts tx into
loading_operations, then releases the lock before performing the actual load and
broadcasting the result.
- Around line 339-345: project_for_path currently always does a longest-prefix
match which causes a nested workspace root like `/repo/packages/app` to be
resolved to the outer `/repo`; change project_for_path to first look for an
exact match (find an entry where project_path.as_path() == path via
self.projects.pin().iter().find(...)) and return that ProjectKey if found, and
only if no exact match exists fall back to the existing longest-prefix logic
(the current filter + max_by_key). Update the function body around
project_for_path to implement this two-step lookup so exact root lookups prefer
the exact project entry.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 212bb792-f67f-48ec-9ddc-f1c7603d5d5f
📒 Files selected for processing (3)
.changeset/fix-lsp-project-selection.mdcrates/biome_lsp/src/handlers/text_document.rscrates/biome_lsp/src/session.rs
|
This looks like it overlaps with #9392 |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
crates/biome_lsp/src/session.rs (1)
1327-1345: Please add an LSP-level regression for thedid_openpath.This unit test proves
project_for_path, but the behaviour change in this PR actually flows throughdid_openand configuration loading. A small end-to-end LSP test that opens a file inside nested workspace folders would lock down the bug this PR is fixing.Based on learnings, "Implement watcher tests for workspace methods in watcher.tests.rs and end-to-end tests in LSP tests".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_lsp/src/handlers/text_document.rs`:
- Around line 104-111: The code sets a session-wide ConfigurationStatus using
session.set_configuration_status(status) based on a single path from
load_status_for_open, which leaks that status to all documents; change this to
store and read configuration status keyed by workspace/root (e.g., add or use a
per-workspace map on Session such as per_workspace_config_status or similar)
instead of the global atomic, update the logic in
load_status_for_open/session.load_extension_settings to return/store the status
into that per-root map, and update update_diagnostics_for_document to query the
configuration status for the document's workspace/root rather than using
session.set_configuration_status/getting the global value; keep existing APIs
like has_initialized and load_extension_settings but route their results into
the per-workspace storage so opening/failing one workspace doesn't affect
others.
- Around line 79-95: After reloading the configuration with
session.load_biome_configuration_file (the let status = ... await), update the
session's cached configuration state by calling
session.set_configuration_status(status) immediately after that await; then
proceed to check status.is_loaded() and the project_for_path branch as before.
This ensures stale Missing/Error state is cleared when
load_biome_configuration_file succeeds or preserved when it fails.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 93093dab-5589-4f48-8802-52acfeddfe5b
📒 Files selected for processing (3)
.changeset/fix-lsp-project-selection.mdcrates/biome_lsp/src/handlers/text_document.rscrates/biome_lsp/src/session.rs
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
crates/biome_lsp/src/handlers/text_document.rs (1)
79-85:⚠️ Potential issue | 🟠 MajorKeep configuration status scoped to the root you just loaded.
These branches still push a path-specific load result through
set_configuration_status, which updates one session-wide status. In a multi-root session, opening one broken workspace can still flip unrelated files into the wrong configuration state. This needs the same per-root scoping as the configuration load cache.Also applies to: 105-112
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_lsp/src/handlers/text_document.rs` around lines 79 - 85, The current code calls session.load_biome_configuration_file(... ConfigurationPathHint::FromUser(resolved_path) ...) and then session.set_configuration_status(status), which updates a session-wide status and wrongly affects other workspace roots; change this to record the configuration status scoped to the specific root you just loaded (keyed by the resolved root/path) instead of using the global setter. Concretely: in the block that calls load_biome_configuration_file (and the similar block at the other location), replace the session.set_configuration_status(status) call with a per-root setter or cache update (e.g., set_configuration_status_for_root(root_id_or_resolved_path, status) or update the same per-root map used by the configuration load cache) so each workspace root retains its own status rather than overwriting a session-wide value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_lsp/src/handlers/text_document.rs`:
- Around line 74-100: Currently the function returns the initially found
project_key (session.project_for_path) before giving the loader a chance to
register a narrower project; change the control flow so that after attempting to
resolve and load a configuration (session.resolve_configuration_path,
session.load_biome_configuration_file, session.set_configuration_status) you
re-query session.project_for_path(path) and return the new, more specific
project if present, falling back to the original project_key only if no more
specific project was registered; ensure any early returns that bypass this
re-query (the final return Some(project_key)) are removed or moved after the
re-check so the cached loader can register nested projects before returning.
---
Duplicate comments:
In `@crates/biome_lsp/src/handlers/text_document.rs`:
- Around line 79-85: The current code calls
session.load_biome_configuration_file(...
ConfigurationPathHint::FromUser(resolved_path) ...) and then
session.set_configuration_status(status), which updates a session-wide status
and wrongly affects other workspace roots; change this to record the
configuration status scoped to the specific root you just loaded (keyed by the
resolved root/path) instead of using the global setter. Concretely: in the block
that calls load_biome_configuration_file (and the similar block at the other
location), replace the session.set_configuration_status(status) call with a
per-root setter or cache update (e.g.,
set_configuration_status_for_root(root_id_or_resolved_path, status) or update
the same per-root map used by the configuration load cache) so each workspace
root retains its own status rather than overwriting a session-wide value.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e825b0a4-fb06-4219-b584-5ab2c0729f9f
📒 Files selected for processing (1)
crates/biome_lsp/src/handlers/text_document.rs
|
Hey @soconnor-seeq , I merged #9392 So now we can rebase, and i'll take a look at it |
4a0026a to
c922650
Compare
| let file_path = path.to_path_buf(); | ||
| if let Some(path) = session.resolve_configuration_path(Some(&file_path)) { | ||
| info!("Loading user configuration from text_document {:?}", &path); | ||
| return session.load_biome_configuration_file(path, false).await; |
There was a problem hiding this comment.
The fix/changes in #9392 is re-created here.
|
|
||
| if let Some(root) = workspace_root { | ||
| root | ||
| } else if fs.path_is_file(path) { |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/biome_lsp/src/handlers/text_document.rs (1)
125-127: Consider inlining this trivial wrapper.
load_status_for_openis a one-liner that just delegates toload_from_workspace_base. Unless there's a plan to add more logic here, it could be inlined to reduce indirection.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_lsp/src/handlers/text_document.rs` around lines 125 - 127, The function load_status_for_open is a trivial one-line wrapper around load_from_workspace_base; remove load_status_for_open and replace all call sites to call load_from_workspace_base(session, path).await directly, updating any imports/usages that referenced load_status_for_open (search for the symbol load_status_for_open) and keep the same return type ConfigurationStatus and parameter types (&Arc<Session>, &Utf8Path) when inlining so behavior is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_lsp/src/session.rs`:
- Around line 1191-1196: The workspace-folder-change handler
did_change_workspace_folders must clear the configuration cache to avoid stale
entries in configuration_status_by_path; after you update the workspace list (in
did_change_workspace_folders) call self.clear_configuration_cache().await
(before or immediately after invoking load_workspace_settings) so removed-folder
entries are purged and subsequent calls to load_workspace_settings (including
load_workspace_settings(false)) cannot return stale cached statuses.
---
Nitpick comments:
In `@crates/biome_lsp/src/handlers/text_document.rs`:
- Around line 125-127: The function load_status_for_open is a trivial one-line
wrapper around load_from_workspace_base; remove load_status_for_open and replace
all call sites to call load_from_workspace_base(session, path).await directly,
updating any imports/usages that referenced load_status_for_open (search for the
symbol load_status_for_open) and keep the same return type ConfigurationStatus
and parameter types (&Arc<Session>, &Utf8Path) when inlining so behavior is
unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 38279563-e66b-4cf7-9189-ebd1b4126e4b
📒 Files selected for processing (3)
.changeset/fix-lsp-project-selection.mdcrates/biome_lsp/src/handlers/text_document.rscrates/biome_lsp/src/session.rs
I rebased this PR to resolve the conflicts (see my comments above that show the fixes in #9392 are still there after my changes). I chose to rebase + force push as I figured another commit (the conflict resolving merge commit) could be more noise on top of the other two CodeRabbit-feedback-commits. I hope that was the right choice and didn't make your live harder! |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_lsp/src/server.rs`:
- Around line 446-449: After updating workspace folders and reloading config in
the block that calls self.session.update_workspace_folders(...),
self.session.clear_configuration_cache().await, and
self.session.load_workspace_settings(true).await, call the server routines that
rebuild registrations and refresh diagnostics: invoke setup_capabilities() (or
the method that re-registers workspace/didChangeWatchedFiles) and then call
update_all_diagnostics() so the new folders use correct file-watching
registrations and open files get refreshed diagnostics; place these calls
immediately after self.session.load_workspace_settings(true).await to ensure
registrations and diagnostics reflect the new workspace set.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5592b31e-e9b3-4451-91c6-f927ead18dee
📒 Files selected for processing (1)
crates/biome_lsp/src/server.rs
9a9848f to
2519dc2
Compare
ematipico
left a comment
There was a problem hiding this comment.
As we're fixing as existing bug, we should also have new tests. The proposed testing strategy isn't enough
ematipico
left a comment
There was a problem hiding this comment.
It seems that with this refactor, we call the same functions multiple times I can't understand why
| async fn load_status_for_open(session: &Arc<Session>, path: &Utf8Path) -> ConfigurationStatus { | ||
| load_from_workspace_base(session, path).await | ||
| } |
There was a problem hiding this comment.
A function that calls one function is redundant. Let's remove it
| Ok(()) | ||
| } | ||
|
|
||
| async fn load_project_for_open(session: &Arc<Session>, path: &Utf8Path) -> Option<ProjectKey> { |
There was a problem hiding this comment.
The name of the function isn't descriptive enough. Can you come with something different?
If not, please provide some docstring that explains the business logic of the function
There was a problem hiding this comment.
load_project_for_open → ensure_project_for_opened_document hopefully that's better. I also got codex to write a function docstring.
| load_from_workspace_base(session, path).await | ||
| } | ||
|
|
||
| async fn load_from_workspace_base(session: &Arc<Session>, path: &Utf8Path) -> ConfigurationStatus { |
There was a problem hiding this comment.
Please choose a better name and add Some docs
There was a problem hiding this comment.
load_from_workspace_base → load_from_workspace_root_for_path with some docs added.
|
|
||
| async fn load_from_workspace_base(session: &Arc<Session>, path: &Utf8Path) -> ConfigurationStatus { | ||
| let file_path = path.to_path_buf(); | ||
| if let Some(path) = session.resolve_configuration_path(Some(&file_path)) { |
There was a problem hiding this comment.
Why do we call this function again?
There was a problem hiding this comment.
I had to get codex to explain this one to me:
We call it “again” because:
- the first resolution may have happened for a different workspace folder/file,
- relative configurationPath is workspace‑folder dependent,
- so when you open a file in another folder, you must resolve it per file to avoid using the wrong config.
Which sounds right — but that's the problem with AI isn't it. Everything does in fact sound plausible. I asked codex if we are actually testing for this and it said:
the updated LSP test
relative_configuration_path_resolves_against_correct_workspace_foldernow opens a file in test_one first (so a project is already open), then opens a file in test_two. The formatting assertion for test_two only passes if the relativeconfigurationPathis re‑resolved per file. That covers the “load again” behavior.
There was a problem hiding this comment.
Ok, I think I understand the why, but I believe what the agent wrote is just a workaround.
If understand correctly, the agent is worried about concurrency. I don't know if it's correct, but let's pretend it is: that's not the right solution.
The resolution, at this point, should happen in a concurency-safe code, which isn't the case for this implementation.
There was a problem hiding this comment.
Ok I think I've made improvements + addressed your concern. I've moved the session.resolve_configuration_path(...) up near the top of did_open this is closer to the current code's behaviour of only resolving the config path once near the top of did_open. The resolved config_path is then passed down through the various functions instead of calling session.resolve_configuration_path later in the did_open-flow.
But I am worried that this only addresses your original comment of "Why do we call this function again?" and not the subsequent comment:
The resolution, at this point, should happen in a concurency-safe code, which isn't the case for this implementation.
And codex wasn't any help here as I got it to try different techniques to satisfy this concurrency concern as I felt like it was just making a mess (e.g. moving all the logic in did_open into session.rs, adding random locks, etc).
Hopefully calling session.resolve_configuration_path once + moving it higher up will alleviate the concurrency concern you pointed out. Feel free to disagree / ask for more changes!
|
|
||
| let path = session.file_path(&url)?; | ||
|
|
||
| let project_key = match session.project_for_path(&path) { |
There was a problem hiding this comment.
The refactor keeps behaviour clear and testable.
And yet no tests were added ;)
Note
AI Assistance Disclosure: This PR was implemented with guidance from OpenAI Codex. The solution was reviewed and validated by me (soconnor).
Summary
Fixes #1630
Prefer the most specific project root when multiple roots match a file in multi‑root/nested workspaces. This prevents files from being associated with the wrong project and aligns LSP behavior with user expectations (see #8163, #2289, and #7127).
Follow-up: a separate PR will add configuration search modes; this fix only addresses project selection.
Test Plan
cargo test -p biome_lspDocumentation