Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursively scanning $PWD #99

Closed
xxxserxxx opened this issue Oct 19, 2022 · 9 comments · Fixed by #110
Closed

Recursively scanning $PWD #99

xxxserxxx opened this issue Oct 19, 2022 · 9 comments · Fixed by #110

Comments

@xxxserxxx
Copy link

I was surprised to find that marksman recursively scans the directory it was run in, and opens every markdown file it finds as well as every .gitignore. This means that if a user with an editor that uses marksman opens a file from their home directory, marksman will attempt to recursively scan their entire home directory; if they open a file from root, marksman will recursively scan the entire filesystem. Among other things, it means marksman puts significant load on the machine and impacts the editor's performance. I'm going to assume that marksman is closing those files before it runs the OS out of file handles, but regardless, it's at least unexpected and at worst worrying behavior. There's a question about whether marksman should be accessing every file it can reach that the user hasn't asked it to.

At the very least, there should be a caveat lector in the readme about this behavior.

@artempyanykh
Copy link
Owner

artempyanykh commented Oct 19, 2022

Thanks for raising this @xxxserxxx . Marksman does this scan only in the workspace mode, that is if you open a file inside the folder which is either under version control or has a marker .marksman.toml file. Marksman needs to scan the workspace in order to provide cross file completion, navigation, and diagnostics.

Otherwise marksman opens a file in a single file mode and doesn't do the recursive scan.

I'd imagine you don't have a .git or .marksman.toml in your $HOME or your / folder? Did you actually observe the recursive scan outside of the "workspace" context?

Also, the initial scan doesn't keep the files open. It is required to hydrate the initial state. After that marksman relies on the editor to communicate text changes and file create/delete events.

@xxxserxxx
Copy link
Author

Yes, I observe it outside of a workspace, and outside of repositories.

You can easily replicate this:

Script started on 2022-10-20 10:45:05-05:00 [COMMAND="sh" TERM="tmux-256color" TTY="/dev/pts/11" COLUMNS="89" LINES="21"]
sh-5.1$ cd /tmp
sh-5.1$ mkdir temp
sh-5.1$ cd temp
sh-5.1$ git rev-parse --show-toplevelgit rev-parse --show-toplevel
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
sh-5.1$ function makeBig() {
   for x in $(seq 1 $1); do
      touch file-$x.md
      mkdir dir-$x
      pushd dir-$x >/dev/null
      touch file-$(($1-1))-$x.md
      makeBig $(($1 -1))
      popd >/dev/null
    done
}
sh-5.1$ makeBig 5
sh-5.1$ firejail --trace=marksman.log hx file-1.mdfirejail --trace=marksman.log hx file-1.md
######################################################################
########### helix opens, wait a couple seconds, close helix ##########
######################################################################
sh-5.1$ find . -name \*.md | wc -l
650
sh-5.1$ grep 'marksman:open .*\.md' marksman.log | wc -l
650
sh-5.1$ exit
exit

Script done on 2022-10-20 10:47:17-05:00 [COMMAND_EXIT_CODE="0"]

The function makeBig 5 will create 650 markdown files in subdirectories up to 5 levels deep. If you visually inspect the log, you'll see marksman entering every directory, recursively, and opening every file.

Most users probably don't change LSP servers for markdown depending on what directory they're in, but many do edit markdown everywhere on their filesystem. For helix, the path is relative to the file being edited, not the directory helix was opened in (contrary to what I originally said -- $PWD is irrelevant).

sudo touch /test.md
hx /test.md

would be interesting on a system that has security alerts for access triggers, as marksman will try to enter /etc/certs, /etc/opendkim, /root, and so on. Thankfully, markdown is not popular for READMEs in /etc, so this sort of scenario is unlikely.

A much more concerning situation is if the user opens a markdown in a directory that has remotely mounted shares. Marksman will cross filesystem boundaries in its search for markdowns:

mkdir temp && cd temp
for x in $(seq 1 3); do
   mkdir remote$x
   sshfs server$x.my.net remote$x
done
ln -s / link-to-root
hx README.md

I can think of several situations where this sort of thing could cause real-life issues; the main point is that its hidden behavior -- a user might fire up an editor, not expecting a full filesystem (and, in some cases, network) traversal.

I know of only two ways to prevent this behavior in Helix: change the lsp to something else (ltex-s), or uninstall marksman. I am not sure the "workspace mode" is toggle-able once the editor is started. Is this behavior that editors should be able to control through the LSP API?

@xxxserxxx
Copy link
Author

I apologize; I think, based on what you've said and I've seen, that I'm seeing a bug. Would you like me to re-file this as a ticket?

@artempyanykh
Copy link
Owner

artempyanykh commented Oct 20, 2022

Great! Thanks for the repro @xxxserxxx!

No need to refile. I'm going to investigate.

@artempyanykh
Copy link
Owner

artempyanykh commented Oct 23, 2022

@xxxserxxx I couldn't reproduce the behavior you described... at first. Until I recompiled helix from master.
The thing is something changed in helix between 66276ce6..ce469abf.
Previously, when you opened a file that was NOT under VCS and NOT marked with a custom root file, helix would initialize the marksman language server with an empty list of workspaceFolders. This enabled a single-file mode in marksman.

Now, with the new version, helix initializes workspaceFolders with cwd. Which doesn't make sense to me and looks like a bug/buggy UX in helix.

Ideally, I'd prefer editors to figure out whether a file is under a workspace/project and configure marksman properly because each editor/LSP client has its own logic around what is a project.

@artempyanykh
Copy link
Owner

artempyanykh commented Oct 23, 2022

Reported to helix helix-editor/helix#4436 Let's see if they agree that this is a bug. If not, I'll have to add a work-around to marksman.

@xxxserxxx
Copy link
Author

Thank you; if I'd have been thorough, I'd have tried this with vim, too. Thanks for tracking it down.

@artempyanykh artempyanykh added the needs-further-input Further information is required from the reporter label Nov 5, 2022
@artempyanykh
Copy link
Owner

@xxxserxxx FYI I've landed a temporary workaround in #110 . The next release will behave saner with Helix.

@artempyanykh artempyanykh added pending-release The feature/fix has landed and will be available with the next release and removed needs-further-input Further information is required from the reporter labels Nov 13, 2022
@artempyanykh artempyanykh reopened this Nov 13, 2022
@artempyanykh artempyanykh removed the pending-release The feature/fix has landed and will be available with the next release label Nov 25, 2022
@artempyanykh
Copy link
Owner

Fix/workaround released in https://github.com/artempyanykh/marksman/releases/tag/2022-11-25

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants