Lint multiple files in parallel [$500] #3565
Nice. I'm very interested in trying this myself too. |
Another question I had in mind: if we need to rewrite everything to be async, should we use the callback pattern? Promises? If so, which library, Q or Bluebird? I personally would prefer promises to callback hell. |
I vote for promises. Bluebird is fast, but makes me nervous because it adds methods to the native |
Why not use built-in promises? Just a question, as I have no experience with promises yet. |
They're not supported in node 0.10, to my knowledge. Besides that, the libraries give some nice "sugar" methods when working with Promises. |
I've had plenty of success using native promises (or a polyfill when native promises aren't supported). That seems like a good starting point to me; if we need more than they provide we could probably swap out something that's API-compatible. |
I think we're putting the cart before the horse here. Let's hold off on promises vs. callbacks until we're at least ready to prototype. Get something working with callbacks and let's see how bad it is (or not). |
ESLint also does a lot of IO (directory traversal, reading source files), so I think we would also profit here if we rewrote ESLint to do non-blocking IO. |
@lo1tuma I haven't profiled it yet, but in my mind the amount of IO we do is negligible compared to the amount of CPU cycles we eat. I will try to profile it and post the results here if I get anything meaningful. |
Using something like NodeClusters - or most other per-process implementations - would avoid the issue of needing to [vastly] rewrite ESLint. (Such implementations are strictly not threading, but they allow managed parallel process execution.) It would mostly just need to IPC-in/out the current ESLint; ESLint parallelism would then be able to work freely over different files in a per-process manner, but it could not (without more rework) run concurrently over a single file. Thus if the goal is to run ESLint over different files in parallel, I would urge such a 'simple' per-process concurrency approach. If the goal is to make ESLint parallel across the same source/AST then .. that is a more complicated can of worms, as it changes the 'divergence' point. If there is a strict v0.10 node target for ESLint, maybe gate this as a feature available only when running a compatible node version. |
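A minimal sketch of the per-process approach described above, assuming modern Node and an `eslint` binary on the PATH; the chunking strategy and process count are illustrative, not part of any existing ESLint API:

```js
// Sketch: fan the file list out across N eslint child processes.
// ESLint itself is untouched; each child is an ordinary CLI run.
const { execFile } = require('child_process');
const os = require('os');

function chunk(files, n) {
  const buckets = Array.from({ length: n }, () => []);
  files.forEach((file, i) => buckets[i % n].push(file));
  return buckets.filter((b) => b.length > 0);
}

function lintInProcesses(files, n = os.cpus().length) {
  return Promise.all(
    chunk(files, n).map(
      (bucket) =>
        new Promise((resolve) => {
          // eslint exits non-zero when it finds problems, so ignore the
          // error and just collect the report text from stdout.
          execFile('eslint', bucket, (_err, stdout) => resolve(stdout));
        })
    )
  );
}
```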
My idea is:
Source files come in a wide variety of sizes, so I think pull-style queue control is important. Library.... I think |
Sorry for this question: Based on all the comments, do we think the reward of this functionality is worth the effort/changes/complications? (like I said just a question) |
@gyandeeps My projects are not big enough, or my computer slow enough, for me to really care either way. In cases where there are sizable projects of many files, on computers with several cores and non-bound I/O, I imagine that it could lead to significantly reduced wall-clock time, approaching Amdahl's law. I would be less optimistic about this gain with fewer, larger files, even with 'actual threading' or post-AST handling - but that is what performance profiles are for. Of course another option is to only lint 'changed' files and provide some sort of result cache, but that comes with additional data management burdens. |
@gyandeeps To answer your question - we do not have definitive information on this. Right now my assumption is that we are CPU-bound, not IO-bound. In that case, utilizing more cores should have a significant impact on larger projects (as @pnstickne mentioned, the impact will be small, or there might even be a negative impact, on a few large files). @pnstickne Thanks for the insights. I'm really not familiar with NodeClusters; I just know that they exist and do not require everything to be async. We are definitely not going to try to make ESLint run AST analysis in parallel: that would be a huge change, and a lot of our rules expect nodes to be reported in the right order, so we would not only need to rewrite pretty much the whole core, we would also need to rewrite a lot of rules. |
So I did some very unscientific performance analysis on both Windows and OSX. |
@ilyavolodin Try this: Start running 8 different eslint processes (over the same set of files should be fine, although it would be 'more fair' to break the files up into 8 equally-sized sets). Compare the wall-clock times it takes for 1 process to complete and for 8 processes to complete. The 8 processes will have done 8x the work (if using the same set of files for each process as for the single process) or the same amount of work (if the source files were split among them) - but in how much more time? This very crudely should show the approximate gain - if any - from multi-process concurrency. |
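A rough way to run the experiment @pnstickne proposes, assuming the project lints with a plain `eslint .` invocation; this is crude wall-clock timing, not a real benchmark:

```js
// Compare: 1 eslint process vs. 8 running concurrently over the same files.
const { execFile } = require('child_process');

function timeRuns(n) {
  const start = Date.now();
  const jobs = Array.from(
    { length: n },
    () => new Promise((resolve) => execFile('eslint', ['.'], () => resolve()))
  );
  return Promise.all(jobs).then(() => Date.now() - start);
}

timeRuns(1).then((t1) =>
  timeRuns(8).then((t8) => {
    // If 8x the work takes much less than 8x the time, the extra
    // processes are buying real parallelism.
    console.log(`1 process: ${t1}ms, 8 processes: ${t8}ms`);
  })
);
```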
Late to the conversation, but... Is anyone opposed to someone starting up a pull request to implement the callback hell approach (i.e., make ESLint async across the board, including for all I/O operations)? Seems like it would make the eventual parallelization easier anyway. |
This is essentially what ESLinter from @royriojas does. |
@IanVS That's pretty cool .. now if only it was built-in to my eslint grunt task :} (Okay, I could get it shimmed in pretty easy - but it'd still be nice to see a 'done package'.) |
@pnstickne #2998 |
@ilyavolodin That's fair enough. I'm wondering, though, would it be worth |
@platinumazure While sync code is a Node anti-pattern, in our case we can't do anything with the code (as in parse it into an AST) until we've read the whole file. So if improving performance is not on the table, then changing the code to async would increase its complexity without gaining us anything. It's worth testing out, and I still think there's a performance gain we can get from parallel execution, but I would like to see some proof of that first. |
@ilyavolodin Reading files asynchronously doesn't mean you read them chunk-by-chunk. While you are waiting for one file to be read, you can lint a different file which has already been read. |
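A small sketch of the overlap @lo1tuma describes, on modern Node (`fs.promises` didn't exist in 0.10); `lintText` is a hypothetical stand-in for the core linting call:

```js
// Kick off all reads up front; lint each file as its contents arrive.
// While the CPU lints one file, the remaining reads proceed in the
// background, so IO and linting overlap.
const fs = require('fs').promises;

async function lintAll(files, lintText) {
  const reads = files.map((file) => fs.readFile(file, 'utf8'));
  const results = [];
  for (let i = 0; i < files.length; i++) {
    results.push(lintText(await reads[i], files[i]));
  }
  return results;
}
```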
Here is the TL;DR on the status of this feature as of February 23, 2023:
TSC Summary

ESLint assumes that each rule and source file can be processed independently. typescript-eslint (ref #42 (comment)) and eslint-plugin-import (ref #42 (comment)) need to do upfront initialization work beyond the scope of a single rule and source file, specifically loading type information and tracing a module graph. Lacking a first-class API, they have inserted these initialization steps into the regular rule linting flow. If we were to ship parallel linting without supporting this use case, the duplicated initialization could make parallel linting slower than single-threaded linting with these plugins. The large number of ESLint users who also use one of these plugins would not benefit from parallel linting. This API would need to provide the plugin with the config and the list of files to be linted, and parallel linting would need a way to share the result with workers.

Pre-requirements

One of the main reasons why we cannot proceed with this issue is that a lot of work needs to be done to the core APIs in order to even move forward. Some of the things discussed:
Alternative Solutions in the Meantime

jest-runner-eslint

Consider looking at jest-runner-eslint. Some folks have seen considerable performance improvements.
Esprint

Esprint could be another alternative in the meantime.

Please let me know if I've missed anything and I'll update this comment. |
Awesome work @sam3k! We are leaving this open because we do still intend to work towards this. |
Just wanna chime in that we've implemented a solution based on running ESLint in worker threads in the Backstage CLI. It provided a massive speed increase as well, especially compared to running lint tasks through Lerna. |
@Rugvip nice! Thanks for sharing. |
Can you provide an example config? I can't make jest-runner-eslint work for some reason. |
A lot of discussion about parallel linting is taking place in #14139, because it will affect plugins that have a significant initialization step. Some salient points from that conversation: |
@stevenpetryk thanks for the summary! Very helpful to have all of this info in one place. |
Wanted to chime in here given the hype about speeding up tools. tl;dr: ESLint can run in parallel with no issues, and the overhead of duplicated work is negligible: main...discord:eslint:parallel. This adds an opt-in

We've been running ESLint on our monorepo in CI for a long time. As the codebase grew, ESLint started taking longer and longer to run. A few years ago, we had about 10k files, and ESLint would take ~3 minutes to run on CI. Then we wanted to add TypeScript rules to do some additional validation, but unfortunately adding those rules kicked our ESLint run time up to almost 8 minutes. Eventually, we split the TS rules out from all of the other ones and ran them as two separate jobs, which actually worked pretty well. Normal ESLint was back down to about 3 minutes, and the TypeScript job also took about 3 or 4 minutes to run just those rules.

Fast-forward to about 2 months ago, and our codebase has continued to expand: now almost 15k files, well over half a million LOC. Normal ESLint now takes 5 minutes to run, and TypeScript rules are getting close to 6. It's pretty infeasible to run them locally at this point, and they're becoming the bottleneck of our CI pipeline. I wanted to do something about it, and toyed with splitting up ESLint jobs even more, but none of that worked too well. Most importantly, reporting on errors became a big hassle.

After some quick investigation, it became pretty clear that the time-consuming part of ESLint isn't the rules...at all. We run TypeScript rules,

Taking inspiration from

Then it was time to see just how much things improved. Running locally on my M2 Max with 12 cores, the normal ESLint run would take about 95 seconds (the 3-5 minute numbers from earlier were on CI, not our local machines). With parallelization across all of those cores (11 workers), the total run time was reduced to...20 seconds. A huge return, more than a 75% speedup.

Then I wanted to see how TypeScript was affected. As mentioned in this thread, the TS rules do a lot of work scanning the entire project, and they use a lot of caching that would be nice to have persistent across everything, but how much does that affect the run? Well, before, the job with just those rules would take about 80 seconds. In parallel...24 seconds. Basically the same speedup, despite not having as good of a cache to rely on. Why? Because it really is just outweighed by the number of files being operated on, and the fact that TS is itself a single-threaded workload. So all of that scanning work also ends up being run in parallel, and the only actual overhead is the slightly higher amount of time spent computing things that might have been cached already if they were all run on a single core. But even then, all of that work has to be done at some point on a single thread anyway.

Our CI times, for reference, show a very similar level of improvement. We use 6-core machines, and the runtime of normal ESLint has dropped from 4 minutes to just over 1 minute, and TS rules from 4 minutes down to just under 2 minutes. Definitely some relative overhead, but still a massive improvement.

All of this story is to say: I think the debate about how to design a good API for making parallel ESLint truly efficient is absolutely worthwhile. A way to cache work across multiple workers would be great, or to have some input on how files get batched to make it more efficient. But none of that is necessary to see huge, safe improvements in runtime just by distributing the work naively.
I'd love to upstream these changes into ESLint itself, both so that everyone else can benefit from them, but also so we don't have to maintain a fork lol. I think it's probably close to being able to be merged, sans a few stylistic things, and of course general review. I'm happy to make a PR to get that process started if it would be welcome. |
@faultyserver Does your typescript configuration include type-check rules as well? |
Yep! The TS job is specifically for the rules that require type checking. We left one or two non-checked ones like It would be great to be able to run them all locally with the LSP, but it just becomes too slow in our codebase (like more than a second delay between hitting save and the file actually saving, etc). |
@faultyserver Thanks so much for sharing this work! It's very interesting to see an update with a functioning prototype in this long-standing discussion. Since this is such a popular issue, I thought I'd put my 2 cents in as well. I have a tool for parallelizing ESLint which I've only been able to test on a few projects so far, so it's probably more experimental than yours. For anyone interested, the repo is here: https://github.com/origin-1/eslint-p (just updated to ESLint v8.55). The results look promising for large codebases, but not so good for smaller ones, especially when typescript-eslint is used. Clearly the overhead involved in setting up and synchronizing the workers can easily exceed the benefits of parallelization. Nonetheless I was able to lint the ESLint repo (JavaScript only) in just under 8 seconds instead of around 16 seconds on an M1 machine with

Maybe from a user perspective it would be best to autotune the process depending on the type of the project, the number of files to be linted, the environment, etc., without asking people to set an option (I like the way you set the concurrency to |
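A trivial sketch of the kind of autotuning heuristic mentioned above; the thresholds are invented for illustration, not taken from eslint-p or the Discord fork:

```js
// Hypothetical heuristic: only add workers when there is enough work
// to amortize their startup cost. The numbers are placeholders, not tuned.
const os = require('os');

function pickConcurrency(fileCount, filesPerWorker = 50) {
  const byWorkload = Math.ceil(fileCount / filesPerWorker);
  // Leave one core for the main process; never go below 1 worker.
  return Math.max(1, Math.min(byWorkload, os.cpus().length - 1));
}
```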
Great to know there are other attempts at this happening too! Some quick feedback from what I've learned in our parallel implementation:
The main reason for using processes rather than threads is to reduce contention on the process making syscalls to read the file system. Unix only gives a process so many accesses at a time (citation needed, but I discovered that while doing this implementation); with all the threads in the same process you're potentially hitting that bottleneck, and the single Node VM also has to wait on those syscalls. Distributing over processes means the VM has less contention. It can also avoid context switching within the VM when changing between threads, which also helped some. Our implementation also lets each worker resolve the config as it goes, so there's no memory sharing necessary at all, making processes much easier to switch to.
Dispatching each file individually between threads or between processes is going to slow down the parallelized version a lot. That's more time spent in serialization and doing IPC (or at least memory copies). Copying a batch of 50 file paths at once and getting the results in one go will be a lot faster than doing each one individually (2 messages vs 100; if each message takes 10ms to send, that's 10ms vs. half a second spent communicating). Batching also ensures that you only spin up new workers when they're actually useful. Fewer than 50 files? It won't even try to spin up more than 1 worker. Fewer than 200? It will never spawn more than 4. 50 seems to be around the boundary where the time to spawn a process becomes negligible, but that's still an arbitrary number. 100 in a batch could also work well, or 20. It depends a lot, but in reality it doesn't matter much. But that leads to the biggest thing, which is having a pool of workers rather than spawning a new one every time. A new process will take a lot longer to spin up (granted, it's less than you probably think, maybe 150ms at most on an M2), but even threads have an initialization cost. Keeping a pool of workers around and queueing them to receive messages will bring down the overhead a lot as well. Synchronizing is also pretty minimal if you keep track of "how busy" each worker is and dispatch to the least busy one. That's pretty cheap to do and helps keep the load balanced over time. (A sketch of this batching-plus-pooling setup follows after this comment.)
With pooling and batching, even sticking with threads, I think this could get up to maybe 70% speedup with 4 threads, probably a little higher with processes, but not too much more, just because the overall time is relatively low, so initialization is a little more costly.
For everything up to medium-large projects (a few hundred files), and with batching, there would likely never be more than 4 or 5 workers going at a time. The only way to really know what's going to be fastest for a given project, though, is to run it a few times and see which configuration gives the best result. It's very dependent on the number of files, but also on the size of those files (and, for TypeScript, their type-level complexity). Keeping parallelism entirely opt-in also ensures that small use cases that are super sensitive to initialization costs stay as fast as possible (i.e., my project with 20 files could lint in 60ms, less than the time needed to spawn a single worker; it's definitely not worth doing it then). |
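A condensed sketch of the batching-plus-pooling approach described above, using Node's worker_threads; the worker script, batch size, and least-busy scheduling are modeled on the description here, not copied from the Discord fork:

```js
// Pool of worker threads; files are dispatched in batches, each batch
// going to the worker with the fewest outstanding batches.
const { Worker } = require('worker_threads');

class LintPool {
  constructor(size, workerScript) {
    this.slots = Array.from({ length: size }, () => {
      const slot = { worker: new Worker(workerScript), resolvers: [] };
      // A worker answers batches in the order they were posted, so a
      // FIFO of resolvers matches each reply to its batch.
      slot.worker.on('message', (results) => slot.resolvers.shift()(results));
      return slot;
    });
  }

  runBatch(files) {
    // Least-busy scheduling: the slot with the fewest pending batches wins.
    const slot = this.slots.reduce((a, b) =>
      a.resolvers.length <= b.resolvers.length ? a : b
    );
    return new Promise((resolve) => {
      slot.resolvers.push(resolve);
      slot.worker.postMessage(files); // one IPC message per batch, not per file
    });
  }
}

async function lintBatched(pool, files, batchSize = 50) {
  const pending = [];
  for (let i = 0; i < files.length; i += batchSize) {
    pending.push(pool.runBatch(files.slice(i, i + batchSize)));
  }
  return (await Promise.all(pending)).flat();
}
```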
It's totally possible that the OS restricts the number of simultaneous I/O accesses for a process, I'm not sure though if the bottleneck limit would be hit already at a concurrency of 2 or 4. It seems more likely that Node.js itself is applying some sort of synchronizing here. And it's totally possible that switching context across threads in Node.js is slower than switching the context to a different process.
Batching is an interesting idea I haven't explored enough. I'll try it and see if it helps! My biggest problem at the time was that there just weren't enough big repos using flat config that I could test against. Pooling is already in place. To be honest I haven't even tried to do without it, as each thread needs to have its own copy of the config and (in the case of typescript-eslint) of the project metadata. Unless I'm overlooking something, loading a fresh copy of the data for each file in a new worker would be terribly slow. |
@faultyserver If you are still following this thread, could you please provide a link to the monorepo on which your fork of ESLint was tested - or to another repo where the speedup is similar? This would be helpful to compare results across different environments and setups. |
@fasttime Unfortunately I cannot link to where I've tested, since those are our company's internal projects, but I can give some additional statistics and information that could help compare:
Running on my M2 Max, here are the results of running eslint on the full project, broken down by the type of run. All of these are run with eslint 8.57.0:
The main findings from this are: |
Are you using the ProjectService setting in ts-eslint? Theoretically this would make things a bit lazier, which especially helps when linting a subset of files, in addition to making monorepo use better, as the same resource sharing done in tsserver can be done for linting. When linting TS itself, enabling the project service makes linting a single file with the CLI go from 3.78s on my machine down to 2.31s. Your mileage may vary; TS does need to do an initial project load no matter what. |
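For reference, turning on the project service in a typescript-eslint flat config looks roughly like this (option name as of recent typescript-eslint versions; check the docs for the version you're on):

```js
// eslint.config.js - enable typescript-eslint's project service so type
// information is loaded lazily per file instead of from a fixed project list.
const tseslint = require('typescript-eslint');

module.exports = tseslint.config({
  files: ['**/*.ts'],
  languageOptions: {
    parserOptions: {
      projectService: true,
      tsconfigRootDir: __dirname,
    },
  },
});
```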
Yeah, we are using |
Well, I'm glad it's a huge help, at least 😅 |
@faultyserver thanks for sharing your parallel fork of ESLint, it sped our CI time up considerably! Do you have plans to update the fork to upstream ESLint v9? |
@fasttime oh awesome, I hadn't seen this project before! |
@faultyserver I assume you're still maintaining your fork at discord? |
We still have that fork, but we have stopped using it in favor of some separately-rolled parallelism infrastructure that also supports parallelizing Prettier, Stylelint, and more in similar ways. We don't plan to update it to v9. |
@faultyserver presumably that just means a tool that will take in a glob, split the glob into well balanced knapsacks, and then execute an arbitrary command on those knapsacks of paths in parallel? That's how you'd get a tool (eslint vs prettier) agnostic parallelization. I feel like I've tried that before but it can get hard because sometimes the argument lists get so long that Unix rejects them with |
No, it's the same parallel infrastructure as in the existing fork: a lazy iterator that dispatches jobs as batches of (by default) 50 files to a pool of workers, sending back results over IPC: https://github.com/discord/eslint/blob/e645c6a6a005e1a50ee1de0a3f70bdefa1c72530/lib/parallel/parallel-engine.js#L84-L109. Another approach we use elsewhere is dumping file lists into args files rather than directly on a command line, but the IPC approach is better for streaming on very-large file sets where you might want to end early if something fails. |
This is a discussion issue for adding the ability to run ESLint in parallel over multiple files.
The idea is that ESLint is mostly CPU-bound, not IO-bound, so creating multiple threads (on machines with multiple cores) might (and probably will) increase performance in a meaningful way. The downside is that ESLint's codebase is currently synchronous, so this would require rewriting everything up to and including eslint.js to be asynchronous, which would be a major effort.
I played with this a little while ago and found a few libraries for Node that handle thread pools, including detection of the number of cores available on the machine.
And there are a ton of other libraries out there for this.
If anyone has experience writing multithreaded applications for node.js and would like to suggest alternatives or comment on the above list, please feel free.
P.S. https://www.airpair.com/javascript/posts/which-async-javascript-libraries-should-i-use
Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.