FileExists: set one watcher instead of thousands #831
Yup, the spec is quite clear on this:
To fix this, installing the watcher on the first call to …
Branch updated from 26ed2a9 to 65b840c.
I did this. It seems to be okay.
I agree this would be nicer, but it's not clear we can do this: the … That said, I've gotten quite lost trying to follow how the LSP initialization works, so perhaps there is a place where we could slip in a hook to run something post-initialize.
Branch updated from 65b840c to 4cc504b.
Okay, now I have an issue where some of the tests are timing out, and I'm somewhat mystified. My initial guess was that this resulted from a TOCTOU race between checking whether the watcher had been initialized and initializing it, which could result in lots of notifications being sent. But I think I fixed that in the second commit, and things still time out. Any ideas for debugging?
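One standard way to rule out a check-then-initialize race is to make the check and the update a single atomic operation. A minimal sketch, assuming the "watcher initialized" flag lives in an `IORef Bool` (the PR tracks it in `FileExistsState`) and that `register` is a hypothetical action that sends the registration request:

```haskell
import Control.Monad (unless)
import Data.IORef

-- | Run 'register' at most once, however many times this is called.
-- Reading and setting the flag happen in one atomic step, so there is no
-- window between "is the watcher initialized?" and "mark it initialized".
registerOnce :: IORef Bool -> IO () -> IO ()
registerOnce initializedRef register = do
  -- Atomically set the flag to True and get back its previous value.
  wasInitialized <- atomicModifyIORef' initializedRef (\old -> (True, old))
  -- Only the caller that flipped the flag from False to True registers.
  unless wasInitialized register
```

Note that the flag is set before `register` runs, so if registration throws it will not be retried.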
There's one test failing:
Tests are parsers that recognise a specific sequence of LSP messages. A timeout means that ghcide didn't send the expected message back within the allotted time. This test is checking that ghcide notices that a new module was created in the file system. You probably need to update the test expectations to work with your changes. By the way, I'm surprised that the tests below passed; they count the number of file watchers created for a file:
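As a rough mental model (a toy, not lsp-test's actual API), one common shape of such a test skips unrelated messages until one satisfies the expectation, and fails if none arrives within its budget, which plays the role of the timeout. The name `expectWithin` is hypothetical:

```haskell
-- | Toy model of an lsp-test style expectation. 'budget' stands in for the
-- timeout: it is the number of messages we are willing to inspect.
-- Returns the messages after the first match, or Nothing on "timeout".
expectWithin :: Int -> (msg -> Bool) -> [msg] -> Maybe [msg]
expectWithin budget matches = go budget
  where
    go 0 _ = Nothing               -- budget exhausted: the test times out
    go _ [] = Nothing              -- the server never sent the message
    go n (m : ms)
      | matches m = Just ms        -- found it: continue with the rest
      | otherwise = go (n - 1) ms  -- skip an unrelated message
```

In this model, a change that stops ghcide from sending the expected message looks exactly like a timeout, not like an explicit failure.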
I did change the expectations for the watched files! I had a look at the other test, but it wasn't obvious what it was doing that would make it fail when the others don't. I'll have another stab!
Branch updated from a605333 to 2b62893.
I think I tracked down the problem: in the case where we didn't have cached information for a file, we would delegate to the VFS lookup, but unlike … In the old version, if we hit the slow case in … I think the handling of invalidation here is maybe a bit more complicated than it needs to be. I might investigate just using …
The … Why is this better than a cheap …
```haskell
newtype FileExistsMapVar = FileExistsMapVar (Var FileExistsMap)
-- | The state of our file existence cache. A pair of a boolean indicating whether we have initialized the
-- file watcher in the client yet, and a map tracking file existence.
data FileExistsState = FileExistsState { fileExistsWatcherInitialized :: Bool, fileExistsMap :: FileExistsMap }
```
Both fields should probably be strict.
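A minimal illustration of the suggestion: with bang-annotated fields, evaluating a `FileExistsState` to weak head normal form also evaluates both fields, whereas a lazy record can be forced while its fields remain thunks. `Data.Map.Strict` with `String` keys stands in for the PR's `FileExistsMap`, and `LazyState` and `throwsOnWhnf` are hypothetical helpers for the demonstration:

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import qualified Data.Map.Strict as Map

-- Stand-in for the PR's map (assumption: String keys, not NormalizedFilePath).
type FileExistsMap = Map.Map String Bool

-- Both fields strict: forcing the record to WHNF forces both fields too.
data FileExistsState = FileExistsState
  { fileExistsWatcherInitialized :: !Bool
  , fileExistsMap :: !FileExistsMap
  }

-- Lazy variant for comparison: forcing the record leaves the fields as thunks.
data LazyState = LazyState Bool FileExistsMap

-- | True if evaluating the value to WHNF throws an 'ErrorCall'.
throwsOnWhnf :: a -> IO Bool
throwsOnWhnf x = fmap isErrorCall (try (evaluate x))
  where
    isErrorCall :: Either ErrorCall b -> Bool
    isErrorCall = either (const True) (const False)
```

Forcing `FileExistsState True (error "boom")` throws, while forcing `LazyState True (error "boom")` does not: the lazy version would happily carry an unevaluated map around.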
```diff
 changesMap <- evaluate $ HashMap.fromList changes

 -- Masked to ensure that the previous values are flushed together with the map update
 mask $ \_ -> do
   -- update the map
-  modifyVar_ var $ evaluate . HashMap.union changesMap
+  modifyVar_ var $ \st -> evaluate $ st{fileExistsMap=HashMap.union changesMap (fileExistsMap st)}
   -- flush previous values
```
This is why the `fileExistsMap` field needs to be strict.
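A sketch of why the strictness matters here: `evaluate` only forces the updated record to weak head normal form, so with a lazy `fileExistsMap` field each update would leave an unevaluated `HashMap.union` thunk inside the state. Assumptions: `MVar`/`modifyMVar_` stand in for `Var`/`modifyVar_`, `Data.Map.Strict` with `FilePath` keys stands in for the PR's `HashMap`, and `flush` is a hypothetical callback for invalidating previously computed values:

```haskell
import Control.Concurrent.MVar
import Control.Exception (evaluate, mask_)
import qualified Data.Map.Strict as Map

type FileExistsMap = Map.Map FilePath Bool

-- Strict fields: 'evaluate' on the record then also forces the merged map.
data FileExistsState = FileExistsState
  { fileExistsWatcherInitialized :: !Bool
  , fileExistsMap :: !FileExistsMap
  }

-- | Merge new existence facts into the cache, then invalidate old results.
modifyFileExists :: MVar FileExistsState -> ([FilePath] -> IO ()) -> [(FilePath, Bool)] -> IO ()
modifyFileExists var flush changes = do
  changesMap <- evaluate (Map.fromList changes)
  -- Masked so asynchronous exceptions are not delivered between the update
  -- and the flush (unless one of them blocks interruptibly).
  mask_ $ do
    -- Map.union is left-biased, so the new facts win over cached ones.
    modifyMVar_ var $ \st ->
      evaluate $ st { fileExistsMap = Map.union changesMap (fileExistsMap st) }
    flush (map fst changes)
```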
The CI benchmarks show a 23% performance regression. Could you run them locally in a stable environment and verify the results? I suspect this is because the fast path now calls the slow path, and that propagates the …
Yep, I can reproduce the benchmarks being bad. It's not yet clear to me why that is; I'll do some more digging. On the functional issue: my understanding is that the problem is the invalidation in the fast path. On this branch, if we have a file that's a workspace file but not covered by our watcher, we will use … A simple fix would be to change the … However, it would be nice if we could use …
What I really want to say is that if we got the answer the slow way, then we should recheck next time. But I don't know how to do that.
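One way to encode "recheck next time" is simply not to cache answers obtained via the slow path, keeping only answers that client notifications will keep up to date. A minimal sketch with hypothetical names (`lookupFileExists`, `recordFileExists`); the slow check is a parameter, so nothing is assumed about how ghcide performs it:

```haskell
import Data.IORef
import qualified Data.Map.Strict as Map

-- | Existence cache holding only answers we trust to be kept up to date
-- (e.g. ones fed by client file-change notifications).
type Cache = IORef (Map.Map FilePath Bool)

-- | Fast path: answer from the cache. Slow path: ask 'slowCheck' and return
-- its answer WITHOUT caching it, so the next lookup for that path rechecks.
lookupFileExists :: Cache -> (FilePath -> IO Bool) -> FilePath -> IO Bool
lookupFileExists cache slowCheck path = do
  known <- readIORef cache
  case Map.lookup path known of
    Just exists -> pure exists  -- trusted, notification-backed answer
    Nothing -> slowCheck path   -- uncached: recomputed on every call

-- | Record an answer we trust, e.g. one derived from a change notification.
recordFileExists :: Cache -> FilePath -> Bool -> IO ()
recordFileExists cache path exists = modifyIORef' cache (Map.insert path exists)
```

The trade-off is that unwatched paths always pay for the slow check, which is the price of never serving a stale answer for them.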
The performance degrades even before the commit that uses …
Can't we register the watcher immediately after initialization?
I just came to this realization myself; in particular, I found https://microsoft.github.io/language-server-protocol/specification#initialized, which seems to be exactly for this. I'm in the process of rewriting to use that; let's see if it works. I'm not sure it will help with the performance, but it should be neater.
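A toy model of the ordering problem: the spec does not let the server send requests such as `client/registerCapability` until it has responded to `initialize`, and the client's `initialized` notification arrives after that response (the spec names dynamic capability registration as a typical use of it). The types below are illustrative, not those of an LSP library:

```haskell
-- | Toy model of the start of an LSP session.
data Event
  = InitializeRequest        -- client -> server
  | InitializeResponse       -- server -> client
  | InitializedNotification  -- client -> server, sent after the response
  | RegisterCapability       -- server -> client request (our one watcher)
  | LogMessage               -- server -> client notification
  deriving (Eq, Show)

-- | Server behaviour: register the single file watcher when the client
-- says it is initialized, never earlier.
serverReacts :: Event -> [Event]
serverReacts InitializeRequest = [InitializeResponse]
serverReacts InitializedNotification = [RegisterCapability]
serverReacts _ = []

-- | Is the trace legal? A registration request before the initialize
-- response is not (the spec only allows a few messages, e.g. log messages).
legalTrace :: [Event] -> Bool
legalTrace = go False
  where
    go _ [] = True
    go responded (e : es) = case e of
      InitializeResponse -> go True es
      RegisterCapability | not responded -> False
      _ -> go responded es

-- | Feed client messages to the server, collecting the full trace.
runSession :: [Event] -> [Event]
runSession = concatMap (\msg -> msg : serverReacts msg)
```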
Branch updated from 2b62893 to d65fd03.
Yes. We were assuming that we never care about files not covered by the watcher. Unfortunately, this is not the case for arbitrary cradle dependencies, like cabal descriptors and …
It is a bit racy. What's the cost of using …
I measured this on the Cabal project using the branch https://github.com/pepeiborra/ghcide/tree/alwaysRerun-fileExists. The results are around 10% slower, so I think it's probably worth dropping the hack and switching to … My only concern is whether bigger projects like GHC will see a bigger impact, since the cost of …
Results: HEAD | Use …
With 10 additional … This scenario is not unrealistic in codebases like GHC, where (unless things have changed recently) there are multiple src import dirs, or ours at Facebook, where dozens of small projects are loaded into ghcide, each one with its own import dir. I think this justifies the current hack to avoid the Shake overhead.
Okay. I've attempted a version that filters based on whether the file in question would match our watcher, and removes the … However, I get the same timeout issue with that branch, so I need to investigate a bit more to figure out how to get the correct behaviour.
Branch updated from f2549e7 to 114ebf4.
This prevents us from sending thousands of notifications to the client on startup, which can lock up some clients like emacs. Instead we send precisely one. This has some consequences for the behaviour of the fast file existence lookup, which I've noted in the code, alongside a description of how it works (I spent a while figuring it out, I thought I might as well write it down). Fixes #776.
Branch updated from 114ebf4 to f46a0e1.
Branch updated from f46a0e1 to 9ac17c7.
Well, it turns out I was wrong about why the tests were failing: it was actually lukel97/lsp-test#77. That was inadvertently exercising the failure mode of "the client doesn't send us change notifications even though we asked for them". This will result in us permanently caching the stale results, but that seems okay: the whole point is to rely on the client for this rather than do it ourselves, so there's not much we can do if the client lets us down. I've documented that nonetheless. That got the tests passing, but I still needed to handle the case of workspace-but-not-watched files. I've implemented something like what I described above: we guard the fast path with a check on whether the path matches the glob patterns that we are watching. This costs us a bit, but makes it more correct. My benchmarks seem a bit inconsistent, particularly for "hover", but the general pattern seems to be that: …
I'd appreciate a check on this, since I'm not super-confident that I'm getting the right results here.
Otherwise, I think this is good to go!
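A minimal sketch of guarding the fast path, assuming (hypothetically) that every registered glob has the shape `**/*.<ext>`, so that matching reduces to a suffix check; the extension list is illustrative, not the one the PR registers. Files outside the watcher, such as the cabal descriptors mentioned earlier as cradle dependencies, fall back to the slow path:

```haskell
import Data.List (isSuffixOf)

-- | Hypothetical watcher spec: each glob is assumed to be "**/*.<ext>".
watchedExtensions :: [String]
watchedExtensions = [".hs", ".hs-boot", ".lhs"]

-- | Would the client's watcher report changes to this path?
matchesWatcher :: FilePath -> Bool
matchesWatcher path = any (`isSuffixOf` path) watchedExtensions

-- | Which rule should answer a file-existence query?
data Lookup = FastPath | SlowPath
  deriving (Eq, Show)

-- | Only trust the notification-backed fast path for watched files;
-- everything else takes the slow path.
choosePath :: FilePath -> Lookup
choosePath path
  | matchesWatcher path = FastPath
  | otherwise = SlowPath
```

A real implementation would need a proper glob matcher for arbitrary patterns; the suffix check only works under the stated assumption.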
Ugh, the Azure benchmarks show more of a regression on this PR. They look completely different to when I run them locally 🙄 |
The Azure benchmarks are not reliable because they run in a shared environment. The only reliable metric in that setting is total allocations.
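For checking a single operation locally, GHC also exposes a per-thread allocation counter. A minimal sketch with a hypothetical helper `allocatedBy`; this is not necessarily how the benchmark suite gathers its allocation totals:

```haskell
import Control.Exception (evaluate)
import System.Mem (getAllocationCounter)

-- | Bytes allocated by the current thread while running an action.
allocatedBy :: IO a -> IO (a, Integer)
allocatedBy action = do
  before <- getAllocationCounter
  -- Force the result to WHNF so work deferred by laziness is (partly) counted.
  result <- action >>= evaluate
  after <- getAllocationCounter
  -- The counter counts down as the thread allocates, hence before - after.
  pure (result, toInteger (before - after))
```

Allocation counts depend on the program and its input rather than on machine load, which is why they stay comparable where wall-clock timings on shared machines do not.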
This is an awesome contribution, thanks.
Thanks for helping me through the thorns :) And I'm super glad we have a benchmark suite!
Thanks @michaelpj!
* FileExists: set one watcher instead of thousands This prevents us from sending thousands of notifications to the client on startup, which can lock up some clients like emacs. Instead we send precisely one. This has some consequences for the behaviour of the fast file existence lookup, which I've noted in the code, alongside a description of how it works (I spent a while figuring it out, I thought I might as well write it down). Fixes haskell/ghcide#776. * Use fast rules only if it matches our watcher spec
This is broken. It sends an illegal message before we've sent the initialize response. I got this far because (amusingly) it actually works in emacs, but the test suite complains.
I'm not sure quite how to fix this. @pepeiborra suggested sending the message in `fileExistsRules`, as I'm doing, but this appears to happen too early.

I can think of a few gross ways to fix this (actually doing the registration the first time `GetFileExists` runs, adding some extra state to determine whether we've done it), but I thought I'd put it up to solicit ideas.

This prevents us from sending thousands of notifications to the client
on startup, which can lock up some clients like emacs. Instead we send
precisely one.
This has some consequences for the behaviour of the fast file existence
lookup, which I've noted in the code, alongside a description of how it
works (I spent a while figuring it out, I thought I might as well write
it down).
Fixes #776.