
Lint multiple files in parallel [$500] #3565

Open · ilyavolodin opened this issue Aug 28, 2015 · 168 comments

Labels: accepted · bounty · cli · feature · needs bikeshedding · needs design

Comments

@ilyavolodin (Member) commented Aug 28, 2015

This is a discussion issue for adding the ability to run ESLint in parallel for multiple files.

The idea is that ESLint is mostly CPU bound, not IO bound, so creating multiple threads (for machines with multiple cores) might (and probably will) increase performance in a meaningful way. The downside is that ESLint's codebase is currently synchronous, so this would require rewriting everything up to and including eslint.js to be asynchronous, which would be a major effort.

I played with this a little while ago and found a few libraries for Node that handle thread pools, including detection of the number of cores available on the machine.

  • Node-threads-a-gogo - seems pretty good, but looks dead.
  • nPool - seems actively in development, but has native components (C++)
  • Node WebWorkers - seems pretty dead too.
  • Parallel - seems dead, and no pool implementation.
  • Node Clusters - not stable yet, and probably isn't going to be available on Node v0.10
  • WebWorkers - seems that they are only implemented in io.js
And there are a ton of other libraries out there for this.

If anyone has experience writing multithreaded applications for Node.js and would like to suggest alternatives or comment on the above list, please feel free.

P.S. https://www.airpair.com/javascript/posts/which-async-javascript-libraries-should-i-use

Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

ilyavolodin added the "needs bikeshedding" label, self-assigned this, and added this to the v2.0.0 milestone Aug 28, 2015
nzakas mentioned this issue Aug 28, 2015
@BYK (Member) commented Aug 28, 2015

Nice. I'm very interested in trying this myself too.

@ilyavolodin (Member, Author):

Another question I had in mind: if we need to rewrite everything to be async, should we use the callback pattern or promises? If promises, which library: Q or Bluebird? I personally would prefer promises to callback hell.

@IanVS (Member) commented Aug 28, 2015

I vote for promises. Bluebird is fast, but makes me nervous because it adds methods to the native Promise, which is likely not a great idea. I think Q might be the best bet. There is also Asequence, but I have no personal experience with it.

@gyandeeps (Member):

Why not use built-in promises? Just a question, as I have no experience with promises yet.

@IanVS (Member) commented Aug 28, 2015

They're not supported in Node 0.10, to my knowledge.

Besides that, the libraries give some nice "sugar" methods when working with Promises.

@btmills (Member) commented Aug 28, 2015

I've had plenty of success using native promises (or a polyfill when native promises aren't supported). That seems like a good starting point to me; if we need more than they provide we could probably swap out something that's API-compatible.

@nzakas (Member) commented Aug 28, 2015

I think we're putting the cart before the horse here. Let's hold off on promises vs. callbacks until we're at least ready to prototype. Get something working with callbacks and let's see how bad it is (or not).

nzakas added the "cli", "accepted", and "feature" labels Aug 28, 2015
@lo1tuma (Member) commented Aug 28, 2015

> The idea is that ESLint is mostly CPU bound, not IO bound

ESLint also does a lot of IO (directory traversal, reading source files), so I think we would also benefit if we rewrote ESLint to do non-blocking IO.

@ilyavolodin (Member, Author):

@lo1tuma I haven't profiled it yet, but in my mind the amount of IO we do is negligible compared to the amount of CPU cycles we eat. I will try to profile it and post results here if I get anything meaningful.

@pnstickne:

Using something like Node Clusters - or most other per-process implementations - would avoid the need to [vastly] rewrite ESLint. (Such implementations are strictly not threading, but they allow managed parallel process execution.)

It would mostly just need to IPC the current ESLint in and out; ESLint parallelism would then work freely over different files in a per-process manner, but it could not (without more rework) run concurrently over a single file.

Thus, if the goal is to run ESLint over different files in parallel, I would urge such a 'simple' per-process concurrency approach. If the goal is to make ESLint parallel across the same source/AST, then... that is a more complicated can of worms, as it changes the 'divergence' point.

If there is a strict Node v0.10 target for ESLint, maybe make this a feature only when running a compatible Node version.

@mysticatea (Member):

My idea is:

  • The master process does queue control and collects results.
  • Worker processes keep a short queue (about 2-4 files deep). A worker:
    • sends the master the result of the last file and requests the path of the next-next-next file;
    • asynchronously reads the next-next file;
    • lints the next file.

Source files vary a lot in size, so I think pull-style queue control is important.

As for a library, I think child_process.fork is enough.
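A rough sketch of that pull-style flow with child_process.fork; for brevity it pulls one path at a time rather than keeping the 2-4 deep read-ahead queue, and the lint-worker.js module and message shapes are hypothetical, not ESLint internals:

```js
// master.js: a rough sketch of the pull-style queue described above.
const { fork } = require("child_process");
const os = require("os");

function lintInParallel(files) {
  return new Promise((resolve) => {
    const queue = files.slice();
    const results = [];
    const workerCount = Math.min(os.cpus().length, queue.length);
    let active = workerCount;
    if (workerCount === 0) return resolve(results);

    for (let i = 0; i < workerCount; i++) {
      const worker = fork(require.resolve("./lint-worker.js"));

      const sendNext = () => {
        const file = queue.shift();
        if (file) {
          worker.send({ file }); // the worker pulls one path at a time
        } else {
          worker.disconnect(); // queue drained; let this worker exit
          if (--active === 0) resolve(results);
        }
      };

      worker.on("message", (result) => {
        results.push(result); // collect each file's result on the master
        sendNext();
      });

      sendNext(); // prime the worker with its first file
    }
  });
}

// lint-worker.js: lints one file per message using the CLIEngine API of the era.
// const { CLIEngine } = require("eslint");
// const engine = new CLIEngine({});
// process.on("message", ({ file }) => {
//   process.send(engine.executeOnFiles([file]).results[0]);
// });
```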

@gyandeeps (Member):

Sorry for this question: based on all the comments, do we think the reward of this functionality is worth the effort/changes/complications? (Like I said, just a question.)
Or is this functionality too early (like Google Glass) to implement without any actual estimates or demand?

@pnstickne:

@gyandeeps My projects are not big enough, or computer slow enough, for me to really care either way.

In cases where there are sizable projects with many files, on computers with several cores and unbound I/O, I imagine that it could significantly reduce wall-clock time, approaching Amdahl's law. I would be less optimistic about this gain with fewer, larger files, even with 'actual threading' or post-AST handling - but that is what performance profiles are for.

Of course, another option is to lint only 'changed' files and provide some sort of result cache, but that comes with additional data-management burdens.

@ilyavolodin (Member, Author):

@gyandeeps To answer your question: we do not have definitive information on this. Right now my assumption is that we are CPU-bound, not IO-bound. In that case, utilizing more cores should have a significant impact on larger projects (as @pnstickne mentioned, the impact will be small, or might even be negative, on a few large files).
I think the first step of the process would be to prove or disprove my assumption on CPU vs. IO. Granted, if it turns out I'm wrong and we are IO-bound, then changing ESLint's code to async would improve performance anyway.

@pnstickne Thanks for the insights. I'm really not familiar with Node Clusters; I just know that they exist and do not require everything to be async. We are definitely not going to try to make ESLint run AST analysis in parallel: that would be a huge change, and a lot of our rules expect nodes to be reported in the right order, so we would need to rewrite not only pretty much the whole core but also a lot of the rules.
I would be interested in learning more about how we could incorporate code that only executes on Node 0.12 and not on earlier versions. I'm not sure that's the approach I would like to take, but it's an interesting option nevertheless.

@ilyavolodin (Member, Author):

So I did some very unscientific performance analysis on both Windows and OSX.
I specifically chose to lint a very large number of files (TodoMVC: 1,597 files, 854 folders).
On Windows, the results were very inconclusive. My CPU usage was never over 15% (on an 8-core machine, none of the cores were overtaxed at any point). But then my I/O never hit anything over 10 MB/s on an SSD that's theoretically capable of 550 MB/s. So I have no idea why it didn't run any faster.
On OSX it never hit more than 15% CPU (I have no idea how many cores my MacBook has, probably 8 too), but I/O was pretty taxed. It looked like a few spikes were reaching 100% disk throughput. So maybe my assumption was wrong and we are not CPU bound?

@pnstickne:

@ilyavolodin Try this:

Start running 8 different eslint processes (over the same set of files should be fine, although it would be 'more fair' to break the files up into 8 equally sized sets).

Compare the wall-clock times it takes for 1 process to complete and for all 8 processes to complete.

The 8 processes will have done 8x the work (if each used the same set of files as the single process) or the same amount of work (if the source files were split among them), but in how much x the time?

This very crudely should show an approximate gain - if any - for using multi-process concurrency.
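For the record, a crude harness along these lines might look like this; the `eslint .` invocation is a placeholder for however you lint your project:

```js
// bench.js: crude wall-clock timing for N concurrent eslint processes.
// Run `node bench.js 1`, then `node bench.js 8`, and compare the times.
const { exec } = require("child_process");

const n = Number(process.argv[2] || 1);
const start = Date.now();
let remaining = n;

for (let i = 0; i < n; i++) {
  // Each process lints the same files, per the suggestion above, so the
  // total work is n times that of the single-process run.
  exec("eslint .", () => {
    if (--remaining === 0) {
      console.log(`${n} process(es): ${((Date.now() - start) / 1000).toFixed(1)}s`);
    }
  });
}
```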

@platinumazure (Member):

Late to the conversation, but... Is anyone opposed to someone starting up a pull request to implement the callback hell approach (i.e., make ESLint async across the board, including for all I/O operations)? Seems like it would make the eventual parallelization easier anyway.

@IanVS (Member) commented Aug 30, 2015

> Of course another option is to only lint 'changed' files and provide some sort of result cache, but comes with additional data management burdens. -- @pnstickne

This is essentially what ESLinter from @royriojas does.

@pnstickne:

@IanVS That's pretty cool... now if only it were built into my eslint grunt task :}

(Okay, I could get it shimmed in pretty easily - but it'd still be nice to see a 'done package'.)

@ilyavolodin (Member, Author):

@pnstickne #2998
@platinumazure I think I would like to first prove that there's something to gain by going async/multithreaded, just so nobody wastes their time creating a very large PR that then gets closed because there is no gain.

@platinumazure (Member):

@ilyavolodin That's fair enough. I'm wondering, though, would it be worth creating a separate issue with the goal of making ESLint's I/O operations asynchronous? Synchronous I/O is a bit of an anti-pattern in Node and it might help us figure out just how much of a breaking change parallelization would likely be.

@ilyavolodin (Member, Author):

@platinumazure While sync code is a Node anti-pattern, in our case we can't do anything with the code (as in, parse it into an AST) until we've read the whole file. So if improving performance is off the table, changing the code to async would increase its complexity without gaining us anything. It's worth testing out, and I still think there's a performance gain we can get from parallel execution, but I would like to see some proof of that first.

@lo1tuma (Member) commented Aug 30, 2015

@ilyavolodin reading files asynchronously doesn’t mean you read them chunk-by-chunk. While you are waiting for one file to be read, you can lint a different file which has been read already.
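That overlap fits in a few lines (a sketch; `lintText` stands in for ESLint's CPU-bound core, which is an assumption for illustration):

```js
const fs = require("fs").promises;

// Start every read up front and lint each file as its contents arrive,
// so CPU work on one file overlaps the pending reads of the others.
async function lintAll(paths, lintText) {
  return Promise.all(
    paths.map(async (p) => {
      const source = await fs.readFile(p, "utf8"); // non-blocking read
      return { path: p, messages: lintText(source) }; // CPU-bound lint
    })
  );
}
```

The lint calls still serialize on the single thread; the win is that the thread never sits idle waiting on the disk.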

@sam3k (Contributor) commented Feb 23, 2023

Here is the TL;DR on the status of this feature as of February 23, 2023:

  • "There is no timeline for parallel listing right now. We have a bunch of other core work that must be completed first. When that core work is done, we can evaluate where we are in terms of parallel linting" -- @nzakas
  • Some of the pre-requirements have been completed so we are moving in the right direction
  • There is still a $500 bounty for this
  • Official RFC
    • RFC has been closed. Per @nzakas: "With all of the internal changes, we are going to need to rethink how to move forward with this, so closing." (Please see the TSC summary below for more info.)

TSC Summary

ESLint assumes that each rule and source file can be processed independently. typescript-eslint (ref #42 (comment)) and eslint-plugin-import (ref #42 (comment)) need to do upfront initialization work beyond the scope of a single rule and source file, specifically loading type information and tracing a module graph. Lacking a first-class API, they have inserted these initialization steps into the regular rule linting flow.

If we were to ship parallel linting without supporting this use case, the duplicated initialization could make parallel linting slower than single-threaded linting with these plugins. The large number of ESLint users who also use one of these plugins would not benefit from parallel linting.

This API would need to provide the plugin with the config and list of files to be linted, and parallel linting would need a way to share the result with workers.

Pre-requirements

One of the main reasons why we cannot proceed with this issue is that a lot of work needs to be done to the core APIs in order to even move forward. Some of the things discussed:

  • Moving cache helper utils into their own module. PR
  • CLIEngine was deprecated in ESLint v8 in favor of the ESLint class
  • ESLint v7 public API (provides async methods, which will allow us to do parallelism)
  • FlatConfig needs to be completed

Alternative Solutions in the Meantime

jest-runner-eslint

Consider looking at jest-runner-eslint. Some folks have seen considerable performance improvements:

> Our ESLint time has gone down from 13 minutes to 2 minutes

Esprint

Esprint could be another alternative in the meantime.


Please let me know if I've missed anything and I'll update this comment.

@nzakas (Member) commented Feb 27, 2023

Awesome work @sam3k!

We are leaving this open because we do still intend to work towards this.

@Rugvip commented Mar 3, 2023

Just wanna chime in that we've implemented a solution based on running ESLint in worker threads in the Backstage CLI. It provided a massive speed increase as well, especially compared to running lint tasks through Lerna.

@nzakas (Member) commented Mar 7, 2023

@Rugvip nice! Thanks for sharing.

@Yankovsky:

> FYI, you can achieve this using https://github.com/jest-community/jest-runner-eslint
>
> Our ESLint time has gone down from 13 minutes to 2 minutes

Can you provide an example config? I can't make jest-runner-eslint work for some reason.

@stevenpetryk commented Jun 9, 2023

A lot of discussion about parallel linting is taking place in #14139, because it will affect plugins that have a significant initialization step. Some salient points from that conversation:

  • Plugins would ideally have the ability to determine how files are split up across workers (for example, typescript-eslint would benefit from being able to shard by TypeScript project so that it's more likely that a single shard would entirely encapsulate one or more TypeScript projects).
  • Plugins would need some kind of shared state that they could establish in an initialization step.
  • Parallelization is likely not worth pursuing until these problems can be solved in a way that supports eslint-plugin-import and typescript-eslint, which are used by 60-70% of ESLint users already.

@nzakas (Member) commented Jun 13, 2023

@stevenpetryk thanks for the summary! Very helpful to have all of this info in one place.

@faultyserver commented Dec 2, 2023

Wanted to chime in here given the hype about speeding up tools.

tl;dr: ESLint can run in parallel with no issues, and the overhead of duplicated work is negligible: main...discord:eslint:parallel. This adds an opt-in --parallel flag to distribute work across multiple cores, but only if the user decides it's helpful.


We've been running ESLint on our monorepo in CI for a long time. As the codebase grew, ESLint started taking longer and longer to run. A few years ago, we had about 10k files, and ESLint would take ~3 minutes to run on CI. Then we wanted to add TypeScript rules to do some additional validation, but unfortunately adding those rules kicked up our ESLint run time to almost 8 minutes. Eventually, we split out the TS rules from all of the other ones and ran them as two separate jobs, which actually worked pretty well. Normal ESLint was back down to about 3 minutes, and the TypeScript job also took about 3 or 4 minutes to run just those rules.

Fast-forward to about 2 months ago: our codebase has continued to expand, now almost 15k files, well over half a million LOC. Normal ESLint now takes 5 minutes to run, and TypeScript rules are getting close to 6. It's pretty infeasible to run them locally at this point, and they're becoming the bottleneck of our CI pipeline. I wanted to do something about it, and toyed with splitting up ESLint jobs even more, but none of that worked too well. Most importantly, reporting on errors became a big hassle.

After some quick investigation, it became pretty clear that the time-consuming part of ESLint isn't the rules...at all. We run TypeScript rules, eslint-plugin-import, and a number of in-house rules that are all fairly expensive to check, but when I looked at the runtime of things, it didn't really add up. The real kicker is just the number of files and the size of the code being processed. Iterating over 15k files and traversing their ASTs just takes a really long time. Period. The actual runtime of all of our rules was only about 12% of the total runtime of ESLint. The rest was spent just doing that iteration.

Taking inspiration from @mixer/parallel-prettier, I decided to just implement a worker model that sends batches of files to worker processes and then collects the results into a single report. Getting that to work with the CLI meant forking ESLint and building directly inside of the codebase, since the internals aren't directly exported for modern versions of Node, but after some initial setup and exploration, I ended up with something that seemed to work, gave all the right results, and printed them out in a single report at the end: main...discord:eslint:parallel. The best part of this is that there's no worry about incorrect configuration or missing files, because it's using all of the same ESLint internals, just dispatching the work to be done across multiple cores.

Then it was time to see just how much things improved.

Running locally on my M2 Max with 12 cores, the normal ESLint run would take about 95 seconds to run (the 3-5 minute numbers from earlier were on CI, not our local machines). With parallelization across all of those cores (11 workers), the total run time was reduced to...20 seconds. A huge return, more than a 75% speedup. Then I wanted to see how TypeScript was affected. As mentioned in this thread, the TS rules do a lot of work scanning the entire project, and it uses a lot of caching that would be nice to have persistent across everything, but how much does that affect the run? Well before, the job with just those rules would take about 80 seconds to run. In parallel...24 seconds. Basically the same speedup, despite not having as good of a cache to rely on.

Why? Because it really is just outweighed by the number of files being operated on, and the fact is that TS is itself a single-threaded workload. So all of that scanning work also ends up being run in parallel, and the only actual overhead is the slightly higher amount of time spent computing things that might have been cached already if they were all run on a single core. But even then, all of that work has to be done at some point on a single thread anyway.

Our CI times, for reference, have a very similar level of improvement. We use 6 core machines, and the runtime of normal ESLint has dropped from 4 minutes to just over 1 minute, and TS rules from 4 minutes down to just under 2 minutes. Definitely some relative overhead, but still a massive improvement.


All of this story is to say: I think the debate about how to design a good API for making parallel ESLint truly efficient is absolutely worthwhile. A way to cache work across multiple workers would be great, as would having some input on how files get batched to make it more efficient. But none of that is necessary to see huge, safe improvements in runtime just by distributing the work naively.

I'd love to upstream these changes into ESLint itself, both so that everyone else can benefit from them and so we don't have to maintain a fork lol. I think it's probably close to being able to be merged, sans a few stylistic things and of course general review. I'm happy to make a PR to get that process started if it would be welcome.

@timocov commented Dec 3, 2023

> Then I wanted to see how TypeScript was affected. As mentioned in this thread, the TS rules do a lot of work scanning the entire project, and it uses a lot of caching that would be nice to have persistent across everything, but how much does that affect the run? Well before, the job with just those rules would take about 80 seconds to run. In parallel...24 seconds. Basically the same speedup, despite not having as good of a cache to rely on.

@faultyserver Does your typescript configuration include type-check rules as well?

@faultyserver:
> @faultyserver Does your typescript configuration include type-check rules as well?

Yep! The TS job is specifically for the rules that require type checking. We left one or two non-type-checked ones like @typescript-eslint/naming-convention in the default job, since they run quickly and are more likely to be encountered while people are writing code; the TS job then only runs in CI.

It would be great to be able to run them all locally with the LSP, but it just becomes too slow in our codebase (more than a second of delay between hitting save and the file actually saving, etc.).

@fasttime (Member) commented Dec 5, 2023

@faultyserver Thanks so much for sharing this work! It's very interesting to see an update with a functioning prototype in this long-standing discussion.

Since this is such a popular issue, I thought I'd also put my 2 cents in. I have a tool for parallelizing ESLint which I've only been able to test on a few projects so far, so it's probably more experimental than yours. For anyone interested, the repo is here: https://github.com/origin-1/eslint-p (just updated to ESLint v8.55).
It's not a tuned-up version of ESLint but rather a separate package that wraps the CLI and other components (way too many additions to integrate into ESLint without a design/RFC). An important difference is that it only runs on projects that use the flat config, not the legacy eslintrc config, which makes it somewhat complicated to compare the results against your solution. Also, I'm not using multiple child processes to split the workload, but multiple worker threads.

The results look promising for large codebases, but not so good for smaller ones, especially when typescript-eslint is used. Clearly the overhead involved in setting up and synchronizing the workers can easily exceed the benefits of parallelization. Nonetheless, I was able to lint the ESLint repo (JavaScript only) in just under 8 seconds instead of around 16 seconds on an M1 machine with --concurrency=4 (a 50% speedup).

Maybe from a user perspective it would be best to autotune the process depending on the type of project, the number of files to be linted, the environment, etc., without asking people to set an option (I like the way you set the concurrency to os.cpus().length - 1 by default). I would also love to hear some feedback from other users.

@faultyserver:
Great to know there are other attempts at this happening too!

Some quick feedback from what I've learned in our parallel implementation:

> Also, I'm not using multiple child processes to split the workload but multiple worker threads.

The main reason for using processes rather than threads is to reduce contention on the process making syscalls to read the file system. Unix only gives a process so many accesses at a time (citation needed, but I discovered this while doing the implementation), so all the threads being in the same process means you're potentially hitting that bottleneck, and the single Node VM also has to wait on those syscalls. Distributing over processes means the VM has less contention. It can also avoid context switching within the VM when changing between threads, which also helped some.

Our implementation also lets each worker resolve the config as it goes, so there's no memory sharing necessary at all, making processes much easier to switch to.

> The results look promising for large codebases, but not so good for smaller ones, especially when typescript-eslint is used. Clearly the overhead involved in setting up and synchronizing the workers can easily exceed the benefits of parallelization.

Dispatching each file individually between threads or between processes is going to slow down the parallelized version a lot: that's more time spent in serialization and doing IPC (or at least memory copies). Copying a batch of 50 file paths at once and getting the results back in one go will be a lot faster than doing each one individually (2 messages vs. 100; at 10ms per message, that's 20ms vs. a full second spent communicating).

Batching also ensures that you only spin up new workers when they're actually useful. Fewer than 50 files? It won't even try to spin up more than 1 worker. Fewer than 200? It'll never spawn more than 4. 50 seems to be around the boundary where the time to spawn a process becomes negligible, but that's still an arbitrary number; 100 per batch could also work well, or 20. It depends a lot, but in reality it doesn't matter much.

But that leads to the biggest thing, which is having a pool of workers rather than spawning a new one every time. A new process takes a lot longer to spin up (granted, it's less than you probably think: maybe 150ms at most on an M2), but even threads have an initialization cost. Keeping a pool of workers around and queueing them to receive messages brings the overhead down a lot as well. Synchronizing is also pretty minimal if you keep track of "how busy" each worker is and dispatch to the least busy one; that's pretty cheap to do and helps keep the load balanced over time.
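A condensed sketch of that pool-plus-batching setup; the batch size, worker count, and message shapes are illustrative assumptions, not taken from either implementation:

```js
const { fork } = require("child_process");
const os = require("os");

const BATCH_SIZE = 50; // arbitrary, per the discussion above

function lintBatched(workerScript, files) {
  // One IPC message per ~50 paths instead of one per file.
  const batches = [];
  for (let i = 0; i < files.length; i += BATCH_SIZE) {
    batches.push(files.slice(i, i + BATCH_SIZE));
  }
  if (batches.length === 0) return Promise.resolve([]);

  // Spin up only as many workers as there are batches to hand out.
  const size = Math.max(1, Math.min(os.cpus().length - 1, batches.length));
  const workers = Array.from({ length: size }, () => ({
    proc: fork(workerScript),
    pending: 0, // "how busy": outstanding batches on this worker
  }));
  const leastBusy = () =>
    workers.reduce((a, b) => (a.pending <= b.pending ? a : b));

  return new Promise((resolve) => {
    const results = [];
    let outstanding = batches.length;
    for (const w of workers) {
      w.proc.on("message", (batchResults) => {
        w.pending--;
        results.push(...batchResults);
        if (--outstanding === 0) {
          workers.forEach((x) => x.proc.disconnect()); // drain the pool
          resolve(results);
        }
      });
    }
    for (const batch of batches) {
      const w = leastBusy(); // dispatch to the least-busy worker
      w.pending++;
      w.proc.send(batch);
    }
  });
}
```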

> Nonetheless, I was able to lint the ESLint repo (JavaScript only) in just under 8 seconds instead of around 16 seconds on an M1 machine with --concurrency=4 (a 50% speedup).

With pooling and batching, even sticking with threads, I think this could get up to maybe 70% speedup with 4 threads, probably a little higher with processes, but not too much more, just because the overall time is relatively low, so initialization is a little more costly.

> Maybe from a user perspective it would be best to autotune the process depending on the type of project, the number of files to be linted, the environment, etc., without asking people to set an option.

For everything up to medium-large projects (a few hundred files), and with batching, there would likely never be more than 4 or 5 workers going at a time. The only way to really know what's fastest for a given project, though, is to run it a few times and see which configuration gives the best result. It's very dependent on the number of files, but also on the size of those files (and, for TypeScript, the type-wise complexity). Keeping parallelism entirely opt-in also ensures that small use cases that are super sensitive to initialization costs stay as fast as possible (e.g., my project with 20 files lints in 60ms, less than the time needed to spawn a single worker; it's definitely not worth doing then).

@fasttime (Member) commented Dec 5, 2023

> The main reason for using processes rather than threads is to reduce contention on the process making syscalls to read the file system. [...]

It's totally possible that the OS restricts the number of simultaneous I/O accesses for a process; I'm not sure, though, whether that bottleneck would already be hit at a concurrency of 2 or 4. It seems more likely that Node.js itself is applying some sort of synchronization here. And it's totally possible that switching context across threads in Node.js is slower than switching to a different process.

> Dispatching each file individually between threads or between processes is going to slow down the parallelized version a lot. [...]
>
> Batching also ensures that you only spin up new workers when they're actually useful. [...]
>
> But that leads to the biggest thing, which is having a pool of workers rather than spawning a new one every time. [...]

Batching is an interesting idea I haven't explored enough; I'll try it and see if it helps! My biggest problem at the time was that there just weren't enough big repos using flat config that I could test on. Pooling is already in place; to be honest, I haven't even tried to do without it, as each thread needs to have its own copy of the config and (in the case of typescript-eslint) of the project metadata. Unless I'm overlooking something, loading a fresh copy of the data for each file in a new worker would be terribly slow.

@fasttime (Member):

@faultyserver If you are still following this thread, could you please provide a link to the monorepo on which your fork of ESLint was tested - or to another repo where the speedup is similar? This would be helpful to compare results across different environments and setups.

@faultyserver:
@fasttime Unfortunately I cannot link to where I've tested, since those are our company's internal projects, but I can give some additional statistics and information that could help compare:

  • Roughly 14k files.
    • 13k+ TSX
    • ~500 plain JS
  • 1.5M lines, 54M chars (both including whitespace, comments, etc.).
  • ~180 lint rules enabled, 16 of which are typescript-eslint rules using type information from tsc
  • Test runs are lints for the entire project, not individual files (i.e., starting worker processes to lint just 50 files is a slowdown not a speedup).
  • TypeScript takes ~55s to check our entire project. (This isn't directly how typescript-eslint works, but it gives some context on the complexity of the types involved in our code.)

Running on my M2 Max, here are the results of running eslint on the full project, broken down by the type of run. All of these are run with eslint 8.57.0:

| run type | default eslint | 4 workers | 6 workers (no thread contention) | 8 workers (all p-cores) | 11 workers (all cores + supervisor process) |
| --- | --- | --- | --- | --- | --- |
| no rules (just file parsing) | 64s | 18.5s | 14s | 13s | 12.7s |
| no typescript rules | 92s | 26s | 21.5s | 19s | 20.5s |
| only typescript rules | 81s | 45s | 41s | 43s | 55s |
| all rules enabled | 144s | 72s | 65s | 60s | 72s |

The main findings from this are:

  • With this form of parallelization, typescript rules suffer from high memory usage, and during the initial start, where almost everything is CPU bound, you'll hit thread contention well below the number of available cores on the machine (see 41s vs. 55s for 6 vs. 11 workers there, and 72s vs. 60s for 11 vs. 8 workers on all rules).
  • File iteration and parsing speed quickly hits a limit of worker efficacy, again because it's CPU bound, and using more e-cores won't provide as much of a benefit as even a single extra p-core.
  • Non-TS rule execution time is less than 40% of the total run time; the majority of the time is spent just traversing the AST and applying configurations.
  • I believe the reason default eslint + all rules enabled is so slow is that it's strictly bottlenecked by a big initial typecheck and can only start processing more files after that initial one is done. The same is true for the others as well (they spend ~30s just doing typechecks on their first batch, then run through all of the remaining files in the rest of that time), but they are able to spread some of the memory load because of how the batching functions, and they just process more quickly in general once that memory is built up.

@jakebailey:
> I believe the reason default eslint + all rules enabled is so slow is that it's strictly bottlenecked by a big initial typecheck and can only start processing more files after that initial one is done.

Are you using the ProjectService setting in ts-eslint? ("EXPERIMENTAL_useProjectService": true in ts-eslint 7, "projectService": true in the upcoming ts-eslint 8).

Theoretically this would make things a bit lazier which especially helps when linting a subset of files, in addition to making monorepo use better as the same resource sharing done in tsserver can be done for linting. When linting TS itself, enabling the project service makes linting a single file with the CLI go from 3.78s on my machine down to 2.31s. Your mileage may vary; TS does need to do an initial project load no matter what.
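For reference, a sketch of what enabling it can look like in a flat config; the option names are as given above, while the parser import and glob patterns are illustrative assumptions:

```js
// eslint.config.js (typescript-eslint v7 spelling; v8 renames it to projectService)
const tsParser = require("@typescript-eslint/parser");

module.exports = [
  {
    files: ["**/*.ts", "**/*.tsx"],
    languageOptions: {
      parser: tsParser,
      parserOptions: {
        // Lazier, tsserver-style shared project loading:
        EXPERIMENTAL_useProjectService: true,
      },
    },
  },
];
```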

@faultyserver:
> Are you using the ProjectService setting in ts-eslint?

Yeah, we are using projectService in our config, and it's a huge help, but unfortunately our project is a single monolith with... many, many interconnected dependencies. I spent quite a while trying to tease out and visualize some dependency graphs last year, and there are very few non-utility files in our app that don't inevitably require about 7,000 files to be typechecked to get full type resolution for a single parent. With services enabled and using tsserver in editors, type checking a single file can still come down to less than 700ms p95, but, as you mentioned, the bear is that initial project load: building up the type memory to even do that first file's checks takes tsserver a solid 30 seconds, both in eslint and in editors.

@jakebailey:
Well, I'm glad it's a huge help, at least 😅

@okize commented Oct 8, 2024

@faultyserver Thanks for sharing your parallel fork of ESLint; it sped our CI time up considerably! Do you have plans to update the fork to the upstream ESLint v9?

@fasttime (Member):

@okize If it helps, eslint-p uses ESLint v9.9.0. I'll update it to the latest version when I have a chance.

@okize commented Oct 11, 2024

@fasttime oh awesome, I hadn't seen this project before!

@codepunkt:
@faultyserver I assume you're still maintaining your fork at discord?

@faultyserver:
> @faultyserver I assume you're still maintaining your fork at discord?

We still have that fork, but we have stopped using it in favor of some separately rolled parallelism infrastructure that also supports parallelizing Prettier, Stylelint, and more in similar ways. We don't plan to update it to v9.

@maschwenk commented Nov 1, 2024

@faultyserver Presumably that just means a tool that takes in a glob, splits it into well-balanced knapsacks, and then executes an arbitrary command on those knapsacks of paths in parallel? That's how you'd get a tool-agnostic (eslint vs. prettier) parallelization.

I feel like I've tried that before, but it can get hard because sometimes the argument lists get so long that Unix rejects them with "Argument list too long" (E2BIG), or whatever that message is.

@faultyserver:
> @faultyserver Presumably that just means a tool that takes in a glob, splits it into well-balanced knapsacks, and then executes an arbitrary command on those knapsacks of paths?

No, it's the same parallel infrastructure as in the existing fork: a lazy iterator that dispatches jobs as batches of (by default) 50 files to a pool of workers, sending back results over IPC: https://github.com/discord/eslint/blob/e645c6a6a005e1a50ee1de0a3f70bdefa1c72530/lib/parallel/parallel-engine.js#L84-L109.

Another approach we use elsewhere is dumping file lists into args files rather than putting them directly on the command line, but the IPC approach is better for streaming over very large file sets where you might want to end early if something fails.
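A generic sketch of that args-file pattern, sidestepping the ARG_MAX limit mentioned above; the --file-list flag belongs to a hypothetical wrapper script you control, not to ESLint itself:

```js
const { execFileSync } = require("child_process");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write the paths to a temp file instead of the command line, so the
// argument list stays tiny no matter how many files there are.
function runWithArgsFile(command, files) {
  const listPath = path.join(os.tmpdir(), `files-${process.pid}.txt`);
  fs.writeFileSync(listPath, files.join("\n")); // one path per line
  try {
    execFileSync(command, ["--file-list", listPath], { stdio: "inherit" });
  } finally {
    fs.unlinkSync(listPath); // always clean up the temp file
  }
}
```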

Projects: Waiting for RFC