This repository has been archived by the owner on Jan 2, 2021. It is now read-only.

FileExists: set one watcher instead of thousands #831

Merged
merged 2 commits into haskell:master from imp/less-file-watching
Sep 27, 2020

Conversation

michaelpj
Contributor

This is broken. It sends an illegal message before we've sent the initialize response. I got this far because (amusingly) it actually works in emacs, but the test suite complains.

I'm not sure quite how to fix this. @pepeiborra suggested sending the message in fileExistsRules as I'm doing, but this appears to happen too early.

I can think of a few gross ways to fix this (actually do the registration the first time GetFileExists runs, add some extra state to determine whether we've done it), but I thought I'd put it up to solicit ideas.


This prevents us from sending thousands of notifications to the client
on startup, which can lock up some clients like emacs. Instead we send
precisely one.

This has some consequences for the behaviour of the fast file existence
lookup, which I've noted in the code, alongside a description of how it
works (I spent a while figuring it out, I thought I might as well write
it down).

Fixes #776.

@pepeiborra
Collaborator

Yup, the spec is quite clear on this:

In addition the server is not allowed to send any requests or notifications to the client until it has responded with an InitializeResult

To fix this, installing the watcher on the first call to GetFileExists is ok. A nicer way would be to change the Rules () computation taken by initialise to Rules (IO ()), so that rules are allowed to return some delayed initialisation to be run on the LSP.onStartup handler.

https://github.com/haskell/ghcide/blob/master/src/Development/IDE/Core/Service.hs#L41-L44

https://github.com/haskell/ghcide/blob/master/src/Development/IDE/LSP/LanguageServer.hs#L135-L136
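The suggested `Rules (IO ())` refactor can be sketched in miniature. All names below are illustrative stand-ins, not ghcide's actual API: rule definitions accumulate delayed initialisation actions, and the server runs them only after it has responded to `initialize`.

```haskell
import Data.IORef

-- A much-simplified stand-in for Shake's Rules monad: rule setup may
-- register delayed initialisation actions instead of running them
-- eagerly (which would be illegal before the initialize response).
newtype Rules a = Rules (IORef [IO ()] -> IO a)

runRules :: Rules a -> IO (a, IO ())
runRules (Rules f) = do
  ref <- newIORef []
  x <- f ref
  acts <- readIORef ref
  -- The combined delayed initialisation, to run in the onStartup
  -- (post-initialize) handler.
  pure (x, sequence_ (reverse acts))

-- Register an action to run only after the client is initialised.
afterInit :: IO () -> Rules ()
afterInit act = Rules $ \ref -> modifyIORef ref (act :)

-- Hypothetical fileExistsRules: defer the watcher registration
-- instead of sending it during rule setup.
fileExistsRules :: IORef Int -> Rules ()
fileExistsRules registrationsSent =
  afterInit (modifyIORef registrationsSent (+ 1))
```

The point of the shape change is that rule setup stays pure of client traffic; only the returned `IO ()` talks to the client, at a time the server controls.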

@michaelpj michaelpj force-pushed the imp/less-file-watching branch 2 times, most recently from 26ed2a9 to 65b840c Compare September 25, 2020 18:52
@michaelpj
Contributor Author

To fix this, installing the watcher on the first call to GetFileExists is ok.

I did this. It seems to be okay.

A nicer way would be to change the Rules () computation taken by initialise to Rules (IO ()), so that rules are allowed to return some delayed initialisation to be run on the LSP.onStartup handler.

I agree this would be nicer, but it's not clear we can do this: the Rules () eventually gets passed to a function from Shake, and we don't get the return value. Or at least, it seemed non-trivial to make this work.

That said, I've got quite lost trying to follow how the LSP initialization works. So perhaps there is a place we could slip in a hook for something to run post-initialize.

@michaelpj michaelpj force-pushed the imp/less-file-watching branch from 65b840c to 4cc504b Compare September 25, 2020 19:06
@michaelpj
Contributor Author

Okay, now I have an issue where some of the tests are timing out. I'm somewhat mystified.

My initial guess was that this resulted from a TOCTOU issue with checking whether the watcher had been initialized and initializing it, which could potentially result in lots of notifications being sent. But I think I fixed that in the second commit, and things still time out. Any ideas for debugging?
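The TOCTOU concern can be sketched in isolation (illustrative names, not ghcide's actual code): the "have we initialised?" check and the flag update must happen atomically, e.g. inside `modifyMVar_`, so two threads racing on the first `GetFileExists` cannot both observe "not yet initialised" and both send the registration.

```haskell
import Control.Concurrent.MVar

-- TOCTOU-free one-shot initialisation: check-and-set is atomic
-- because it runs inside modifyMVar_, which holds the MVar for the
-- duration of the update.
ensureWatcherRegistered :: MVar Bool -> IO () -> IO ()
ensureWatcherRegistered initialised registerWatcher =
  modifyMVar_ initialised $ \done ->
    if done
      then pure True
      else do
        registerWatcher  -- e.g. send client/registerCapability, once
        pure True
```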

@pepeiborra
Collaborator

There's one test failing:

    add missing module:                                                                                   FAIL
      Exception: Timed out waiting to receive a message from the server.
      Last message received:
      {
          "tag": "NotLogMessage",
          "contents": {
              "jsonrpc": "2.0",
              "params": {
                  "type": 4,
                  "message": "finish: kick (took 0.03s)"
              },
              "method": "window/logMessage"
          }
      }

Tests are parsers that recognise a specific sequence of LSP messages. A timeout means that ghcide didn't send the expected message back within the allotted time.

This test is checking that ghcide notices that a new module was created in the file system. You probably need to update the test expectations to work with your changes.

By the way, I'm surprised that the tests below passed; they count the number of file watchers created for a file:

  watched files
    workspace files:                                                                                      OK (0.18s)
    non workspace file:                                                                                   OK (0.19s)

@michaelpj
Contributor Author

Indeed, I changed the expectations for the watched-files tests! I had a look at the failing test, but it wasn't obvious what it was doing differently (such that it would fail when the others don't). I'll have another stab!

@michaelpj michaelpj force-pushed the imp/less-file-watching branch from a605333 to 2b62893 Compare September 26, 2020 11:34
@michaelpj
Contributor Author

I think I tracked down the problem: in the case where we didn't have cached information for a file, we would delegate to the VFS lookup, but unlike fileExistsSlow, we wouldn't set alwaysRerun, so this result would not get invalidated.

In the old version, if we hit the slow case in fileExistsFast, we would register a listener, and so we would later get notifications from that, which would handle invalidation. Tricky.

I think the handling of invalidation here is maybe a bit more complicated than it needs to be. I might investigate just using alwaysRerun unconditionally, but I expect that could affect performance, so I'll do that in a follow-up PR.

@pepeiborra
Collaborator

The fileExistsFast rule is quite performance critical and a bit special - it doesn't need an alwaysRerun because modifyFileExists calls deleteValue to mutate the state used by defineEarlyCutoff. In other words, it bypasses the Shake machinery by abusing the implementation details of the rules defined by ghcide.

Why is this better than a cheap alwaysRerun with cutoff? Back in the day we didn't have a benchmark suite so it's entirely possible that it isn't any better.

pepeiborra
pepeiborra previously approved these changes Sep 26, 2020
newtype FileExistsMapVar = FileExistsMapVar (Var FileExistsMap)
-- | The state of our file existence cache. A pair of a boolean indicating whether we have initialized the
-- file watcher in the client yet, and a map tracking file existence.
data FileExistsState = FileExistsState { fileExistsWatcherInitialized :: Bool, fileExistsMap :: FileExistsMap }
Collaborator

Both fields should probably be strict

changesMap <- evaluate $ HashMap.fromList changes

-- Masked to ensure that the previous values are flushed together with the map update
mask $ \_ -> do
-- update the map
modifyVar_ var $ evaluate . HashMap.union changesMap
-- flush previous values
modifyVar_ var $ \st -> evaluate $ st{fileExistsMap=HashMap.union changesMap (fileExistsMap st)}
Collaborator

This is why the fileExistsMap field needs to be strict
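The strictness point can be demonstrated in isolation: `evaluate` only forces to WHNF, so with lazy fields the old map can survive inside the state as a retained thunk, while bang-annotated fields make forcing the constructor also force each field. Here `FileExistsMap` is stubbed as an association list; the real one is a `HashMap`.

```haskell
import Control.Exception (SomeException, evaluate, try)

type FileExistsMap = [(FilePath, Bool)]  -- stub for the real HashMap

-- Strict fields, as suggested in review: forcing the constructor to
-- WHNF (what `evaluate` does in modifyVar_) now also forces each field.
data FileExistsState = FileExistsState
  { fileExistsWatcherInitialized :: !Bool
  , fileExistsMap                :: !FileExistsMap
  }

-- Lazy counterpart, to show a thunk slipping through WHNF unnoticed.
data LazyState = LazyState Bool FileExistsMap
```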

@pepeiborra
Collaborator

The CI benchmarks show a 23% performance regression - could you run them locally in a stable environment and verify the results? I suspect this is because the fast path now calls the slow path, and that propagates the alwaysRerun.

@pepeiborra pepeiborra dismissed their stale review September 26, 2020 13:07

benchmarks

@michaelpj
Contributor Author

Yep, I can reproduce the benchmarks being bad. It's not yet clear to me why that is, I'll do some more digging.


On the functional issue: my understanding is that the problem is the invalidation in the fast path. On this branch, if we have a file that's a workspace file but not covered by our watcher, we will use fileExistsFast, do a slow lookup, and then cache that forever, since we're not using alwaysRerun, and we will never get a client notification about changes to that file (since we're not watching it).

A simple fix would be to change the isWorkspaceFile guard for using fileExistsFast to also check if the file matches our watch pattern. That would restore the invariant that "x is handled with fileExistsFast <=> x is watched". But it would add some more checks.
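The proposed guard can be sketched as follows; a toy extension check stands in for matching against the actual glob patterns sent in the watcher registration.

```haskell
import System.FilePath (takeExtension)

-- Toy stand-in for "does this path match the patterns we asked the
-- client to watch?"; the real check would consult the registered
-- glob patterns (e.g. "**/*.hs").
watchedExtensions :: [String]
watchedExtensions = [".hs", ".lhs"]

matchesWatched :: FilePath -> Bool
matchesWatched path = takeExtension path `elem` watchedExtensions

-- Guard for the fast path: take it only for files the client actually
-- watches, restoring "x handled with fileExistsFast <=> x is watched".
-- Everything else falls back to the slow lookup.
useFastPath :: Bool -> FilePath -> Bool
useFastPath isWorkspaceFile path =
  isWorkspaceFile && matchesWatched path
```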

However, it would be nice if we could use alwaysRerun, because the current logic seems racy even for files that are watched. Consider the following series of events:

  • We check for the existence of X.hs
  • This triggers us asking the client to watch all ".hs" files
  • We do a slow lookup for X.hs, it doesn't exist.
  • X.hs is created.
  • The client installs the watcher we ask for.
  • We check for X.hs, and get a cached answer that it doesn't exist... forever.

What I really want to say is that if we got the answer the slow way then we should recheck next time. But I don't know how to do that. alwaysRerun errs in the opposite direction of checking too often.

@michaelpj
Contributor Author

The performance degrades even before the commit which uses alwaysRerun :/

@wz1000
Collaborator

wz1000 commented Sep 26, 2020

Can't we register the watcher immediately after initialization?

@michaelpj
Contributor Author

Can't we register the watcher immediately after initialization?

I just came to this realization myself; in particular, I found https://microsoft.github.io/language-server-protocol/specification#initialized, which seems to be exactly for this. I'm in the process of rewriting to use that, let's see if it works.

Not sure that will help with the performance, but it should be neater.

@michaelpj michaelpj force-pushed the imp/less-file-watching branch from 2b62893 to d65fd03 Compare September 26, 2020 20:45
@pepeiborra
Collaborator

Yep, I can reproduce the benchmarks being bad. It's not yet clear to me why that is, I'll do some more digging.

On the functional issue: my understanding is that the problem is the invalidation in the fast path. On this branch, if we have a file that's a workspace file but not covered by our watcher, we will use fileExistsFast, do a slow lookup, and then cache that forever, since we're not using alwaysRerun, and we will never get a client notification about changes to that file (since we're not watching it).
A simple fix would be to change the isWorkspaceFile guard for using fileExistsFast to also check if the file matches our watch pattern. That would restore the invariant that "x is handled with fileExistsFast <=> x is watched". But it would add some more checks.

Yes. We were assuming that we never care about files not covered by the watcher. Unfortunately this is not the case for arbitrary cradle dependencies, like cabal descriptors and stack.yaml files; we'll need individual watchers for any file not covered by the glob pattern in the workspace watcher.

However, it would be nice if we could use alwaysRerun, because the current logic seems racy even for files that are watched. Consider the following series of events:

  • We check for the existence of X.hs
  • This triggers us asking the client to watch all ".hs" files
  • We do a slow lookup for X.hs, it doesn't exist.
  • X.hs is created.
  • The client installs the watcher we ask for.
  • We check for X.hs, and get a cached answer that it doesn't exist... forever.

What I really want to say is that if we got the answer the slow way then we should recheck next time. But I don't know how to do that. alwaysRerun errs in the opposite direction of checking too often.

It is a bit racy. What's the cost of using alwaysRerun? Let's measure it and make an informed decision: if the cost is small, let's plug the gap; if it's big, the choice is less clear.

@pepeiborra
Collaborator

pepeiborra commented Sep 27, 2020

I measured for the Cabal project using the branch https://github.com/pepeiborra/ghcide/tree/alwaysRerun-fileExists

The results are around 10% slower, so I think it's probably worth dropping the hack and switching to alwaysRerun.

My only concern is whether bigger projects like GHC will see a bigger impact, since the cost of alwaysRerun is linear in the number of possible import paths (the benchmark suite uses the Cabal codebase). I'll see about repeating the measurements with 10x the number of import source folders.

Results

HEAD - Use alwaysRerun
upstream - Use the values hack instead
Y axis - total time

[charts: total time per scenario, HEAD (alwaysRerun) vs upstream (values hack)]

Scenario                    HEAD         upstream
code actions                0.993193472  0.777636776
code actions after edit     36.5799817   32.86563162
completions after edit      26.90540779  22.91404066
documentSymbols after edit  5.872847811  6.360484811
edit                        34.66888288  31.02675419
getDefinition               1.381465592  1.322841643
hover                       0.99478149   1.07828978
hover after edit            54.36070061  51.61647408
Grand Total                 161.7572613  147.9621536

@pepeiborra
Collaborator

With 10 additional hs-source-dirs entries in the Cabal file the impact is much bigger, see the chart below.

This scenario is not unrealistic in codebases like GHC, where (unless things have changed recently) there are multiple src import dirs, or ours at Facebook where dozens of small projects are loaded into ghcide, each one with its own import dir.

I think this justifies the current hack to avoid the Shake overhead.

[chart: total time with 10 extra source dirs, alwaysRerun vs values hack]

@michaelpj
Contributor Author

Okay. I've attempted a version that filters based on whether the file in question would match our watcher, and removes the alwaysRerun.

However, I get the same timeout issue with that branch, so I need to investigate a bit more to figure out a way to get correct behaviour.

@michaelpj michaelpj force-pushed the imp/less-file-watching branch 2 times, most recently from f2549e7 to 114ebf4 Compare September 27, 2020 15:23
@michaelpj michaelpj force-pushed the imp/less-file-watching branch from 114ebf4 to f46a0e1 Compare September 27, 2020 15:24
@michaelpj michaelpj force-pushed the imp/less-file-watching branch from f46a0e1 to 9ac17c7 Compare September 27, 2020 15:34
@michaelpj
Contributor Author

Well, it turns out I was wrong about why the tests were failing; it was actually lukel97/lsp-test#77.

That was inadvertently exercising the failure mode of "the client doesn't send us change notifications even though we asked for them". This will result in us permanently caching the stale results, but this seems okay, since the whole point is to rely on the client for this and not do it ourselves, so there's not much we can do if the client lets us down. I've documented that, nonetheless.

That meant the tests passed, but I still needed to handle the case of workspace-but-not-watched files. I've implemented something like what I described above: we guard the fast path by a check whether the path matches the glob patterns that we are watching. This costs us a bit, but makes it more correct.

My benchmarks seem a bit inconsistent, particularly for "hover", but the general pattern seems to be that:

  • My version without the glob checking (first commit) is faster than master.
  • My version with glob checking (second commit) is slower than the previous version, but still faster than master.

I'd appreciate a check on this, since I'm not super-confident that I'm getting the right results here.

@michaelpj
Contributor Author

Otherwise, I think this is good to go!

@michaelpj
Contributor Author

Ugh, the Azure benchmarks show more of a regression on this PR. They look completely different to when I run them locally 🙄

@pepeiborra
Collaborator

The Azure benchmarks are not reliable, because they run in a shared environment. The only reliable metric in that setting is the total allocations

@pepeiborra
Collaborator

Performance is fine. Total time measurements for your branch (HEAD~2 is upstream):

[chart: total time for this branch (HEAD) vs upstream (HEAD~2)]

@pepeiborra pepeiborra left a comment
Collaborator

This is an awesome contribution, thanks.

@pepeiborra pepeiborra merged commit b279afb into haskell:master Sep 27, 2020
@michaelpj
Contributor Author

Thanks for helping me through the thorns :) And I'm super glad we have a benchmark suite!

@mrBliss

mrBliss commented Sep 28, 2020

Thanks @michaelpj!

pepeiborra pushed a commit to pepeiborra/ide that referenced this pull request Dec 29, 2020
* FileExists: set one watcher instead of thousands

This prevents us from sending thousands of notifications to the client
on startup, which can lock up some clients like emacs. Instead we send
precisely one.

This has some consequences for the behaviour of the fast file existence
lookup, which I've noted in the code, alongside a description of how it
works (I spent a while figuring it out, I thought I might as well write
it down).

Fixes haskell/ghcide#776.

* Use fast rules only if it matches our watcher spec

Successfully merging this pull request may close these issues.

didChangeWatchedFiles notifications lock up emacs on startup