Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): broken LSPs on monorepos #4439

Closed
wants to merge 8 commits into from

Conversation

georgesboris
Copy link

@georgesboris georgesboris commented Oct 23, 2022

Problem

We're currently matching the top-most root for all projects and languages, but a few of them are not workspace oriented and the current behavior prevents the LSP from working in nested projects (a monorepo or a library with an example folder) when working in said languages.

e.g.

lib/
  elm.json
  example/
    elm.json

lib/elm.json would always be the root even if Helix was opened at example – that breaks LSP integrations, auto-formatting, etc.

Solution

For a non-breaking change solution that could be defined as the default for some languages and even configured case-by-case by the user (through a project /.helix folder), I propose adding a match_closest_root language option that is false by default (maintaining current behavior for all languages) but can be marked as true for languages like elm.

[[language]]
name = "elm"
match-closest-root = true

@georgesboris georgesboris changed the title fix: find_root always matching same root fix(core): find_root always matching same root Oct 23, 2022
@the-mikedavis
Copy link
Member

"Top-most" in

/// * Top-most folder containing a root marker in current git repository

means highest in the directory hierarchy but this changes it to lowest.

The top-most heuristic is geared towards umbrella / workspace setups (for example helix) where there's usually a root marker in the same directory as .git and then markers under subdirectories therein for sub-projects. Changing to lowest could break umbrella / workspace root detection.

The problem here looks essentially the same as #4342

@georgesboris
Copy link
Author

@the-mikedavis the current algorithm seems to be geared towards the same solution I described. The exact same situation with multiple apps in the same root directories would be broken with the current setup in most languages.

If using Elixir with Umbrella apps, would the LSP be expected to be started in the umbrella root and not in one of the sub-directories? Wouldn't this be achieved by starting helix on the root directory? In the current scenario the LSP is unable to be fixed by the user if I want to start a LSP in a subfolder of a say a monorepo with multiple package.json (for javascript projects).

@the-mikedavis
Copy link
Member

If using Elixir with Umbrella apps, would the LSP be expected to be started in the umbrella root and not in one of the sub-directories?

My impression was that this is the case - that the umbrella dir should be the workspace root - but I gave it a try and elixir_ls only works when the app under the umbrella is the workspace root, not the umbrella directory itself. rust-analyzer doesn't seem to care either way in cargo workspaces. It looks like this is how nvim-lspconfig determines the root too: https://github.com/neovim/nvim-lspconfig/blob/0b3e5ce95dd70cbeb9dcf0f690f5f7210e004198/lua/lspconfig/util.lua#L319-L353 👍

Wouldn't this be achieved by starting helix on the root directory?

No, the CWD is used as a fallback when we can't determine a workspace directory from markers or find a .git. Where you start within a workspace shouldn't make a difference

Copy link
Member

@the-mikedavis the-mikedavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs for this function also need a change with the behavior

helix-core/src/lib.rs Outdated Show resolved Hide resolved
@georgesboris
Copy link
Author

@the-mikedavis I've made the requested changes + updated the docs. Let me know if it works for you! I've tested it locally after the changes and everything seems good.

@kirawi kirawi added A-core Area: Helix core improvements S-waiting-on-review Status: Awaiting review from a maintainer. R-breaking-change This PR is a breaking change for some behavior labels Oct 25, 2022
@georgesboris georgesboris changed the title fix(core): find_root always matching same root fix(core): broken LSPs on monorepos Oct 26, 2022
@georgesboris
Copy link
Author

Folks - anything I can do to help move this PR? I know this doesn't seem urgent but it is forbidding a few people at my workplace from using Helix due to our usage of monorepos. It may cause a similar reaction to new users as they won't understand why LSP's are just not working for some projects.

Copy link
Member

@archseer archseer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break typescript/javascript language servers where each package has it's own package.json, but you're expected to start the LSP at the workspace root package.json

/// * Top-most folder containing a root marker in current git repository
/// * Git repository root if no marker detected
/// * Top-most folder containing a root marker if not git repository detected
/// * Top-most folder containing either a root marker or a git repository root
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be the closest parent folder, not the top-most anymore.

@georgesboris
Copy link
Author

@archseer how can I make it so it behaves as expected on both typescript and elm.json (as in the example given) ?

Maybe a language flag that would decide if you should use the top most or closest root file?

As mentioned before, the "closest" approach is apparently also used by neovim. My experience there with both typescript and elm projects was good. So maybe the LSPs that support workspace like features have their own logic for discovering the closest root?

The current problem is that there is no escape hatch if I don't want to use the top most root file. And even on typescript that is sometimes the case (e.g. if I want to create an example folder with projects that are not part of a npm/pnpm workspace but meant to be run in isolation)

@TotalKrill
Copy link
Contributor

Basically its the same problem as is popping up everywhere regarding this. There is unfortunately no single solution that fixes all.

Neovim solution that worked best for me was simply not doing any of the smart finding of workspace root, but more just always start neovim in the correct folder. If stuff didnt work, I had an easy fix, start the program in the right folder.

Then comes other issues, because maybe you have to start helix in a folder which is inconvenient in regards to file pickers or similiar. So either you force all projects to conform to predefined folder/project structures, and have the LSP and helix detect all of them.

If we want the smart workspace detector, we also need the escape-hatch that @georgesboris mentions.

@georgesboris
Copy link
Author

On neovim, I was able to aliviate the file pickers problem by using both git based file pickers and pwd based ones. But if the logic for choosing between those is also based on the roots config them this escape hatch would also not exist.

@georgesboris
Copy link
Author

georgesboris commented Oct 27, 2022

My two cents (with breaking changes):

  • search for roots using order as priority so if we use something like [ ".pnpm-workspace.json", "package.json" ] then it would first look for closest pnpm workspace and then go back looking for the closest package.json. If the user still wants to use the top most package.json then they need to open helix at that folder (so the worst case is equal to neovim).

  • file picker would probably need to work on both closest root and git root as they are useful in different ways. coupling them would not be ideal.

  • space+f would always search based on closest root as this is usually more escoped and easier to find what you want.

  • space+F would always search based on git so you can broaden your search if needed.

From my thoughts so far the only easy way to make this failsafe would be for helix to use project-based configs as well but using priority-based roots we get something similar to do this as I can use a dummy file to ignore workspace root per-project.

@georgesboris
Copy link
Author

@archseer @the-mikedavis folks - do you see any short term strategy to get this merged?

totally available for working on other strategies that would stop helix from breaking lsp on this types of projects. it is just too much of a pain for it not to work even with local hacks (like creating a specific file for forcing correct root detection)...

only way I can make it work today is by running helix built with this PR's branch.

@jlloh
Copy link

jlloh commented Oct 30, 2022

The current behaviour also seems to break for Golang monorepos which have a mix of GOMODULEs enabled and disabled (admittedly not a common scenario, but it's a very old monorepo). The repo I'm working with has this structure:

--root_monorepo_folder/
   --> go.mod
   --> subfolder/
        --> not using gomodules

When I open the subfolder, Golang's LSP (Gopls) goes into overdrive and tries to load to LSP in the root monorepo which has hundreds of subfolders, and it ends up hanging.

Perhaps one naive way of solving this is as @georgesboris says, have an optional hidden file that lets users force correct root detection? E.g. .helix or .helixworkspace. That way the user has the ability to override the default behaviour and pick the behaviour he wants.

@kirawi
Copy link
Member

kirawi commented Oct 31, 2022

There is #4363

@kirawi kirawi added the S-needs-discussion Status: Needs discussion or design. label Oct 31, 2022
@archseer
Copy link
Member

@jlloh On the other hand, finding the toplevel on a Go repo with go.work makes gopls work correctly on a project with modules nested inside the root module.

@archseer
Copy link
Member

We already have support for a local config by looking for a .helix folder:

pub fn find_local_config_dirs() -> Vec<PathBuf> {
let current_dir = std::env::current_dir().expect("unable to determine current directory");
let mut directories = Vec::new();
for ancestor in current_dir.ancestors() {
if ancestor.join(".git").is_dir() {
directories.push(ancestor.to_path_buf());
// Don't go higher than repo if we're in one
break;
} else if ancestor.join(".helix").is_dir() {
directories.push(ancestor.to_path_buf());
}
}
directories
}

I think the behaviour should be:

Loop through ancestors:

  • Check for .helix/ presence, if so, that's the root, return immediately.
  • Check for markers: keep track of the highest marker we've seen.
  • Check for .git/. At that point we've hit the toplevel and stop the search.
  • Return either the highest marker, or the toplevel if no markers.

WDYT @the-mikedavis?

@the-mikedavis
Copy link
Member

Using .helix/ presence as the way to force the root makes sense to me 👍. I would prefer using .helix/ to an environment variable or CLI flag (#4363).

As a minor point, we may want to remove the fallback behavior of returning the CWD as the default: #4436. I think this root detection function can return an Option and if there is no .helix/, .git/ or any markers then it should return None. Then the callers can determine what to use for a default (for example, the file picker can just use the CWD)

@georgesboris
Copy link
Author

georgesboris commented Oct 31, 2022

Just as a review with use cases:

umbrella-app (go, elixir, rust?)

  • hx .
  • hx finds marker at . and starts LS.
  • opening subfolder/* files would work bc LS know how to handle that.

umbrella-app subfolder

  • hx subfolder
  • hx finds marker at . and starts LS there.
  • file-picker always opens at . unless space-F is used

monorepo not umbrella (elm lib+example, js lib+example)

  • hx example
  • hx finds marker at example and also at . - uses . as it is top-level - LS starts there and doesn't work on example/* files
  • creates /example/.helix to force LS to start at the right path
  • hx finds helix folder and uses that as top level, even if .git folder or other markers exist at a higher level

those are the scenarios, right?


just to illustrate the broken case a bit more - sometimes there are projects with a subfolder that is completely standalone. it is not part of a workspace or umbrella. it is just versioned together.

so:

/my-library
  package.json
  /examples
    projA/
      package.json
    projB/
      package.json

so in the current proposal the fix would for someone to create a .helix folder under projA or projB, right?

the problems I see are:

  • discoverability (helix will just seem broken for newcomers)
  • .helix folder will need to be either versioned or gitignored. both strategies touching the codebase (which might not be allowed sometimes)

@jlloh
Copy link

jlloh commented Nov 1, 2022

+1 to the convo, I do think a folder or file might be better than an environment variable (as an environment variable is global).

Re: @georgesboris, my two cents:

discoverability (helix will just seem broken for newcomers)

I guess Helix's existing behaviour will still work fine for most cases, but for folks having problems with nested monorepo structures, we can always just explicitly state it in the documentation that they would need this extra folder/file to force the LSP to start at the right folder?

.helix folder will need to be either versioned or gitignored. both strategies touching the codebase (which might not be allowed sometimes)

I feel it's probably fine, because other IDEs do something similar in terms of supporting some form of hidden folders/files. VSCode has .vscode/ (and I guess .workspace). I would leave it to the discretion of the user on how they want to manage that hidden artefact.

p.s

@jlloh On the other hand, finding the toplevel on a Go repo with go.work makes gopls work correctly on a project with modules nested inside the root module.

@archseer I tried go.work but it still doesn't seem to work, but perhaps I set something up wrongly. Thanks for the suggestion. Anyway no worries, will debug.

@archseer
Copy link
Member

archseer commented Nov 1, 2022

  • .helix folder will need to be either versioned or gitignored. both strategies touching the codebase (which might not be allowed sometimes)

We already use this folder for local configuration and I think it's not uncommon in other editors. You could add it to your global ~/.gitignore and avoid versioning it

@archseer
Copy link
Member

archseer commented Nov 1, 2022

I see your point about starting hx in a subdirectory though, you don't really expect it to go past that directory when setting up the workspace. On the other hand, I like being able to start editing in a subdirectory and still being able to access the rest of my files... using :cd could help there. We'd need to implement https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#workspace_didChangeWorkspaceFolders so that the LSP would be correctly reconfigured after :cd

@georgesboris
Copy link
Author

@archseer on neovim (telescope) I can deal with that by opening my file picker using CWD or through git. I have both keybindings so I can still "peek" through my whole project even when starting it in a subfolder.

the find_root function should match the top most root directory for a given file. However, due to not checking if the root was already present, it was overwriting the response so it always matched the bottom most root directory (most commonly the git directory). This broke the expected behavior for projects like monorepos with multiple root directories.
@georgesboris
Copy link
Author

@archseer I've changed the PR based on @jlloh 's suggestion. Do you think this approach would be acceptable?

@archseer archseer added this to the 22.11 milestone Nov 17, 2022
@@ -42,11 +42,13 @@ pub struct Client {

impl Client {
#[allow(clippy::type_complexity)]
#[allow(clippy::too_many_arguments)]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've searched the codebase and noticed we already use this flag in a few places so I took the liberty to add it here as well.

@archseer
Copy link
Member

archseer commented Dec 6, 2022

Sorry for the delay, I've done some more thinking (and looked over #3993 again -- the issue even mentions that this used to be fixable by just creating a .helix/config.toml but was broken by a change). To me it seems that the preference here is not specific to a language but to your workspace layout: a similar issue could arise if you had a examples/ or demo/ sub-directory in a rust workspace.

I propose we make this a user config option, then it's possible for you to either change this globally, or keep the same default and change it in a single workspace with a .helix/config.toml override. We could make this more explicit by something like workspaces = ["foo/bar", "baz"] , the LSP would then use the first matching root from that list

@archseer archseer added the S-needs-discussion Status: Needs discussion or design. label Dec 6, 2022
@archseer archseer removed this from the 22.12 milestone Dec 7, 2022
@georgesboris
Copy link
Author

@archseer so the idea would be something like this?

/project
  /demo
    root_file
root_file

Currently, /project would be the root but the user could specify:

[???]
workspaces = [ "/project/demo" ]

that way we would match the project path and force that to be recognized? is that it? where do you think that setting should live? [editor], [editor.lsp]?

I'd be glad to make that change once we settle on a solution.

@pascalkuthe
Copy link
Member

pascalkuthe commented Dec 11, 2022

@archseer so the idea would be something like this?

/project
  /demo
    root_file
root_file

Currently, /project would be the root but the user could specify:

[???]
workspaces = [ "/project/demo" ]

that way we would match the project path and force that to be recognized? is that it? where do you think that setting should live? [editor], [editor.lsp]?

I'd be glad to make that change once we settle on a solution.

Here are some of my thaughts:

  • Ideally .helix/config.toml should be used to detect a workspace root for file explorer aswell to better support git free workflows
  • The LSP should always use the longest matching root from that list inside the config (that is a prefix to the file of-course). I don't ever see a case where preferring a parent directory makes sense (as that would mean the sub directory would always be ignored and you could just not add it to the list)
  • It might be possible that a folder should be treated as root for language A but not as root for language B so there should be a way to special case languages

For the specific file layout I had been thinking:

[workspace]
roots = ["examples"]
roots.rust = ["examples/rust", "fuzz"] # overwrites roots for rust

Using paths that fallback to a parent path is also used for themes so it seems fitting to use that here aswell.

@archseer
Copy link
Member

archseer commented Jan 6, 2023

Another PR popped up: #5418

Adding this to the next milestone since I'd like to resolve this by next release

@pascalkuthe
Copy link
Member

I implemented a solution based on the ideas from my comment above in #5748. I added that to the next milestone in favor of this PR

@pascalkuthe pascalkuthe removed this from the next milestone Jan 31, 2023
@georgesboris
Copy link
Author

I'm closing this in favor of #5748

@georgesboris georgesboris deleted the fix/find_root branch June 1, 2023 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-core Area: Helix core improvements R-breaking-change This PR is a breaking change for some behavior S-needs-discussion Status: Needs discussion or design. S-waiting-on-review Status: Awaiting review from a maintainer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants