File Search - Improve performance of file search #15384
Subjectively, file search in VS Code is slower than in other editors, and users have noticed. Here are a couple of example issues: "Slow search" and "Slow, cancelling is not discoverable, progress is not helpful". The first step is
To nail down the issue, I'm doing a round of performance tests in controlled conditions to better understand the performance characteristics of our current file search and to find the areas that will benefit most from optimization. I'm running the same search with other tools for comparison.

Goals

VS Code file search should not be, or feel, significantly slower than other editors. Being faster than some is probably achievable. Results should appear quickly, and the UI should remain responsive for the whole duration of the search.

We also want to investigate tools that might provide a very fast indexed search, as well as a more "fuzzy" search that goes beyond literal text, and to evaluate the costs associated with them.

Setup

I'm running these tests in two workspaces, A and B. In each, I'm running a search that produces about the same number of results: workspace A is a very large workspace with roughly evenly distributed results, and workspace B is a very small workspace with densely packed results.

Workspace A is a clone of the chromium repo, made by following the instructions here. It contains 234,655 files, about 1.8 GB. In this workspace I'm doing a case-insensitive search for the phrase

Workspace B is a folder containing 4 files, which together have just the phrase

The regex search uses the regex

Timing is measured with a stopwatch, from pressing Enter on the query until the results are fully loaded into the UI and ready for interaction. The exceptions are VS Code, into which I injected timing instrumentation, and grep, which I timed with the 'time' command. This is fairly ad hoc testing on my MBP and my Windows desktop, both with SSDs. The grep search uses this command:

Results

MacOS
Windows
Analysis

VS Code time breakdown

What happened when VS Code spent 1m49s on a search:

The time from starting the search to finding a match is very short, but then we sit on that match for 100+ seconds, because we don't send results to the frontend until we've filled a batch of 512. Since there are fewer results than a full batch, nothing is visible until the search is over. So we could improve perceived performance considerably by adding a timeout that sends intermediate results over when the batch doesn't fill up.

I changed the batch size to 20 and got results immediately, which is much nicer. But when I scrolled through the list, it jumped around a lot as results were inserted into the middle. Results are returned as files finish, which may or may not be in order. We could either show results only in order (Sublime does this) or show them as they come in. This is also a problem with a batch size of 512, but there are only 4 batches within our maximum results limit of 2048, so it's less noticeable.

As for making the search itself faster, we should consider using the

We can also try running our search in more than one process. I saw a near 4x speedup when running 4 processes of grep instead of 3. The gain will depend on disk speed, the number of cores, and the amount of time we spend just sending information between processes.

Fuzzy search

We also wanted to investigate options for enabling search-engine-style indexed fuzzy search in a workspace. "Indexed" means that an index is built for the files in the workspace, on disk or in memory, so that a search in a large workspace can complete in seconds or milliseconds. "Fuzzy" means a search based not on literal text or a regex, but on word stemming, word proximity, and so on.

Here's an example use case, based on a true story: I'm exploring the vscode code base, trying to find the code responsible for file search.
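The timeout idea from the time breakdown can be sketched as a small batcher that sends a full batch immediately but never holds a partial batch longer than a fixed delay. This is a minimal sketch under stated assumptions: `ResultBatcher`, `maxDelayMs`, and the `send` callback are hypothetical names, not VS Code's actual search code.

```typescript
// Hypothetical sketch, not VS Code's actual implementation: results are sent
// to the UI in batches of `batchSize`, but a partial batch is never held for
// longer than `maxDelayMs`.
class ResultBatcher<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private readonly batchSize: number;
  private readonly maxDelayMs: number;
  private readonly send: (batch: T[]) => void;

  constructor(batchSize: number, maxDelayMs: number, send: (batch: T[]) => void) {
    this.batchSize = batchSize;
    this.maxDelayMs = maxDelayMs;
    this.send = send;
  }

  // Called by the search for every match it finds.
  add(item: T): void {
    this.buffer.push(item);
    if (this.buffer.length >= this.batchSize) {
      // Full batch: send immediately.
      this.flush();
    } else if (this.timer === undefined) {
      // First item of a new partial batch: start its deadline.
      this.timer = setTimeout(() => this.flush(), this.maxDelayMs);
    }
  }

  // Called once the search finishes, so the trailing partial batch is sent.
  done(): void {
    this.flush();
  }

  private flush(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.buffer.length === 0) {
      return;
    }
    const batch = this.buffer;
    this.buffer = [];
    this.send(batch);
  }
}
```

Keeping the batch size at 512 preserves the low per-message overhead for dense result sets, while the deadline (whose value would need tuning) bounds how long a sparse result, like the single early match in the 1m49s case, can sit unsent.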
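For the "show results only in order" option, one approach is a reorder buffer: each file gets a sequence number when the search starts it, and per-file results are released only when every earlier file has finished. This sketch assumes every started file reports completion, even with zero matches (otherwise the buffer stalls); `InOrderReleaser` and its methods are hypothetical names.

```typescript
// Hypothetical reorder buffer: per-file results arrive tagged with the
// sequence number the file was given when the search started it, and are
// released strictly in sequence order, so earlier results never shift.
class InOrderReleaser<T> {
  private nextSeq = 0;
  private readonly pending = new Map<number, T>();
  private readonly release: (item: T) => void;

  constructor(release: (item: T) => void) {
    this.release = release;
  }

  // Must be called once for every started file, even one with no matches.
  complete(seq: number, item: T): void {
    this.pending.set(seq, item);
    // Drain every consecutive finished file starting at nextSeq.
    while (this.pending.has(this.nextSeq)) {
      const ready = this.pending.get(this.nextSeq) as T;
      this.pending.delete(this.nextSeq);
      this.nextSeq++;
      this.release(ready);
    }
  }
}
```

The trade-off is that one slow file holds back every result after it, which is the price of a list that never jumps around while you scroll.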
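Running the search in several processes raises the question of how to split the workspace between them. The analysis doesn't say how the grep runs were split, so this is only a sketch under an assumption: file size is treated as a rough proxy for search time, and files are assigned greedily, largest first, to the least-loaded worker. `FileEntry` and `partitionBySize` are hypothetical names, and spawning the processes is not shown.

```typescript
// Hypothetical input: a file path plus its size in bytes, used as a rough
// proxy for how long searching that file will take.
interface FileEntry {
  path: string;
  size: number;
}

// Greedy "largest first" partitioning: sort files by size, descending, and
// give each one to the worker with the least total work so far. Each shard
// would then be handed to its own search process (not shown here).
function partitionBySize(files: FileEntry[], workers: number): FileEntry[][] {
  if (!Number.isInteger(workers) || workers < 1) {
    throw new RangeError("workers must be a positive integer");
  }
  const shards: FileEntry[][] = Array.from({ length: workers }, () => [] as FileEntry[]);
  const totals: number[] = new Array<number>(workers).fill(0);
  const bySizeDesc = [...files].sort((a, b) => b.size - a.size);
  for (const file of bySizeDesc) {
    // Find the currently least-loaded worker.
    let target = 0;
    for (let i = 1; i < workers; i++) {
      if (totals[i] < totals[target]) {
        target = i;
      }
    }
    shards[target].push(file);
    totals[target] += file.size;
  }
  return shards;
}
```

Balancing by bytes rather than by file count matters in a repo like chromium, where a few very large files could otherwise leave one process running long after the others finish.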
There are a bunch of options for indexed fuzzy search, but any third-party tool here would increase our download size. So we plan to look into exposing search as an extension API and implementing an optional extension that provides this advanced search behavior for people who want it, while making sure our built-in search stays usable and fast.
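To make "indexed" and "fuzzy" concrete, here is a toy in-memory inverted index: each term maps to the set of files containing it, so a query looks up a few sets instead of scanning every file. The stemmer is a deliberately crude suffix stripper (a real engine would use an established algorithm such as Porter's), word proximity is not modeled, and `InvertedIndex`, `tokenize`, and `crudeStem` are hypothetical names.

```typescript
// Deliberately crude suffix stripper, for illustration only; it will
// mis-stem many words (e.g. "class" -> "clas").
function crudeStem(word: string): string {
  for (const suffix of ["ing", "ed", "s"]) {
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Split camelCase identifiers, lowercase, split on non-alphanumerics, stem.
function tokenize(text: string): string[] {
  return text
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0)
    .map(crudeStem);
}

class InvertedIndex {
  // Stemmed term -> ids of the documents (files) that contain it.
  private readonly postings = new Map<string, Set<string>>();

  addDocument(docId: string, text: string): void {
    for (const term of tokenize(text)) {
      let docs = this.postings.get(term);
      if (docs === undefined) {
        docs = new Set<string>();
        this.postings.set(term, docs);
      }
      docs.add(docId);
    }
  }

  // Returns the ids of documents containing every stemmed query term.
  search(query: string): string[] {
    const terms = tokenize(query);
    if (terms.length === 0) {
      return [];
    }
    let result: Set<string> | undefined = undefined;
    for (const term of terms) {
      const docs = this.postings.get(term) ?? new Set<string>();
      result = result === undefined
        ? new Set(docs)
        : new Set(Array.from(result).filter((d) => docs.has(d)));
    }
    return Array.from(result ?? new Set<string>()).sort();
  }
}
```

With this, a query like "searching files" matches a file that only contains `searchFiles`, which a literal text search would miss; building and persisting the index is where the download-size and disk-usage costs come in.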
These results come from running the
…tart, end). This is much faster, although it's not totally obvious why. But it runs for every line of every file in the workspace, so it's very performance-sensitive. See #15384.
Part of #55.
We will focus on simple text search. Regex text searches might be more difficult to optimize and are used less often than simple text searches. We will initially explore our options (e.g., refining our own implementation, command line tools, text search libraries/engines).
We will include full-text search libraries/engines in our investigation and see whether any of them can handle simple text searches while also opening the door to supporting a more natural 'fuzzy' text search later.
The goal is raw search performance, with library size (if any) and index disk usage (if any) as potential trade-offs.