New: added draft for async/parallel proposal #4

Aghassi · 2018-12-03T01:39:21Z

Summary

CLIEngine currently doesn't support any async functions. As such, large codebases pay a toll since it requires users to wait for each file ( time O(n) ) to be processed by ESLint. There's a large gain to be made if ESLint can offer parallelized tasks so that users can more quickly statically analyze their code.

Related Issues

PR - eslint/eslint#10606
Google Group RFC Discussion - https://groups.google.com/forum/m/#!topic/eslint/di7l6__w2jk
Bounty Discussion - https://www.bountysource.com/issues/26284182-lint-multiple-files-in-parallel

jsf-clabot · 2018-12-03T01:39:22Z

All committers have signed the CLA.

not-an-aardvark

Thanks for the RFC! I've left a few initial comments -- I think it would be good if you could go into more detail on what design you're proposing.

not-an-aardvark · 2018-12-03T01:48:27Z

designs/2018-async-commands/README.md

+Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points.
+
+1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine).
+2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood.


I'm confused by this point -- to me it seems like the use of promises and threads is orthogonal given that they don't address the same problem. A Promise is a value representing the result of an asynchronous operation, whereas multithreading is a mechanism to make computation faster.

Some of the concerns were that promises would not be faster per say depending on the implementation.

My naive take (I haven't spent any time looking into this yet) is that if we can figure out a way to have each promise represent not just the linting but also the file I/O operations that this would still be a win.

@not-an-aardvark to my understanding, promises are just a fancy way of dealing with callbacks. So, under the hood, they could still be blocking the main thread if the callback that is being invoked is synchronous in the nature (the promise just ends up being a convenience to the consumer, but adds no benefit to them). To @kaicataldo's point, if the promise wraps a call that is non blocking on the main thread (because it too is promise based and async), then that's the best. The thread is a layer I see added on top of an already async system. To be 💯 open, I've not done much work with workers/threads, so I don't know how effective it would be.

I see, thanks for clarifying

I agree that this statement is confusing. The choice isn't really between threads (probably you mean processes in this context) or promises, as these accomplish two different things. At some point in the API there will need be promise support, which allows the JS engine to continue operating while waiting for some result to be returned.

I think it's safe to assume that there will need to be processes that somehow communicate back to the main process, likely via promises.

not-an-aardvark · 2018-12-03T01:49:28Z

designs/2018-async-commands/README.md

+
+Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points.
+
+1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine).


Personally, I don't think putting parallelism behind a flag should be a requirement provided that compatibility is preserved -- it seems like an implementation detail that users shouldn't have to worry about. (However, based on previous discussions there may be others on the team who disagree with me about this.)

I agree with this. I don't really see any value in keeping the ability to do things sync still. That being said, I do think this should go out in a major release.

I'm fine with it being in a major release. Plus, anyone who needs sync functionality can peg to an older version and move on their own time. I don't see a value in keeping things synchronous either unless there is a really good reason.

I would like to see this put behind some kind of flag for at least one release. My concern is that without a way to opt-in/opt-out, it will be difficult to determine if a certain class of bugs is in any way related to the parallelism or not. I wouldn't be comfortable shipping this as the default behavior until we are clear that we've worked out the bugs and gotten feedback from the community. At that point, I'd still want an opt-out flag to help with debugging.

not-an-aardvark · 2018-12-03T01:52:31Z

designs/2018-async-commands/README.md

+
+1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine).
+2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood.
+3. The experience should be opt in so as not to break existing users.


I agree on the need to not break existing users, but I'm not sure being opt-in is a requirement for this (see my comment on the first list item).

And I disagree for the reasons stated in my previous comment. :)

not-an-aardvark · 2018-12-03T02:07:17Z

designs/2018-async-commands/README.md

+
+## Detailed Design
+
+Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points.


This is a good list of points to consider when designing a solution -- I've left some feedback below on the points. However, the purpose of this section (and the RFC process more generally) is to describe a proposed design in significantly more detail, rather than just giving a summary of the problem and the points being considered. (See the design section of this work-in-progress RFC for an example.)

Some questions I have after reading this include:

What specific changes to the CLIEngine API are being proposed? If there are new methods being added, what do they do?

Does this design change the rule API? Before reading this RFC, my expectation for how parallelism would work was that the rule API would remain the same, and parallelism would happen on a per-file basis independently from rules. However, point 4 below mentions that some rules might need to identify themselves as a particular type, and I'm confused about why this would be necessary.

Thanks for the referral. I'll make adjustments. This is my first RFC so I appreciate the feedback 😅 !

kaicataldo · 2018-12-03T02:19:58Z

designs/2018-async-commands/README.md

+1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine).
+2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood.
+3. The experience should be opt in so as not to break existing users.
+4. There should be some sort of mechanism to handle eslint rules that expect `sync` reading. Rules may need to identify themselves as a "type". This way, if a rule is being loaded that requires `sync`, the code can handle it accordingly. Any rule that doesn't identify as a `sync` or `async` would be defaulted to `sync`.


Do you have any examples that demonstrate why this is necessary? I'm not sure I follow.

I was pulling this idea from this comment eslint/eslint#10606 (comment)

I honestly don't know the inner workings of a rule set, so I don't know that rules can necessarily be sync or async by nature.

Gotcha. Currently, those visitor callbacks are synchronous (see eslint/eslint#10606 (comment) and eslint/eslint#10606 (comment)).

I see. So for now I would say this part of the RFC is either out of scope or unnecessary for the first pass at this idea. Ideally, rules should be independent of execution anyway.

Agreed. There is no such thing as an async rule, and even if there was, that should be handled by something lower in the API stack. I think it's safe to remove this reference.

kaicataldo · 2018-12-03T02:23:22Z

designs/2018-async-commands/README.md

+
+## Documentation
+
+1. There should be JSDoc code for sure. In code documentation is always the best.


We currently enforce this in the codebase, so this will happen whether it's in this proposal or not!

I also think we'll want to document any changes that occur to any public APIs as well as possibly do a little write-up of how ESLint does async processing.

The spots that I believe would have to be updated would be:

Node.js API

Command Line Interface (assuming a flag)

kaicataldo · 2018-12-03T02:24:57Z

designs/2018-async-commands/README.md

+
+1. More to maintain for the maintainers
+2. Threading and async problems are hard, and can require considerable effort (even with other projects out there doing it).
+3. This may expose unintended edge cases for eslint rules that expect synchronous scanning. There will need to be a shim mechanism for sure.


Following up on my question above, I can't think of any rules that do this off the top of my head. If they do exist, perhaps we should alter them to not be affected by this (and make that a requirement for rules going forward).

Agreed. Each rule is very self-contained so I don't foresee any drawbacks along those lines.

Some other drawbacks I can see:

How will this interact with the --debug flag? Right now we recommend people use that to help figure out problems, and presumably we would lose some or all of that ability.

Testing the multi-process functionality seems like it would be difficult. Travis uses a 2-core CPU for each VM, so we do have some coverage. It still seems like it would be difficult to determine if this is working automatically.

nzakas

Thanks for getting this started! As @not-an-aardvark mentioned, I think what we're missing here is a design of how this could work. What I'm looking for, specifically, is:

Which parts of the API will change and in what way?
What is the mechanism by which the main process will communicate with subprocesses?
What functionality happens before spinning up subprocesses?
What is each subprocess responsible for?
How are results from the various subprocesses combined and presented?
What happens if a subprocess dies before completing its work?

nzakas · 2018-12-04T17:56:39Z

designs/2018-async-commands/README.md

+
+## Summary
+
+`CLIEngine` currently doesn't support any async functions. As such, large codebases pay a toll since it requires users to wait for each file ( time O(n) ) to be processed by ESLint. There's a large gain to be made if ESLint can offer parallelized tasks so that users can more quickly statically analyze their code.


It seems like the goal of the RFC is broader than this describes. The goal is really to allow ESLint as a whole to lint files in parallel. The changes to CLIEngine are an implementation detail, and I would expect changes to cli to also be included. I'd suggest reframing the summary of this RFC to make the high-level goal clear while explaining some of the implementation changes you're envisioning.

nzakas · 2018-12-04T17:58:26Z

designs/2018-async-commands/README.md

+
+Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points.
+
+1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine).


I would like to see this put behind some kind of flag for at least one release. My concern is that without a way to opt-in/opt-out, it will be difficult to determine if a certain class of bugs is in any way related to the parallelism or not. I wouldn't be comfortable shipping this as the default behavior until we are clear that we've worked out the bugs and gotten feedback from the community. At that point, I'd still want an opt-out flag to help with debugging.

nzakas · 2018-12-04T18:02:25Z

designs/2018-async-commands/README.md

+Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points.
+
+1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine).
+2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood.


I agree that this statement is confusing. The choice isn't really between threads (probably you mean processes in this context) or promises, as these accomplish two different things. At some point in the API there will need be promise support, which allows the JS engine to continue operating while waiting for some result to be returned.

I think it's safe to assume that there will need to be processes that somehow communicate back to the main process, likely via promises.

nzakas · 2018-12-04T18:03:34Z

designs/2018-async-commands/README.md

+
+1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine).
+2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood.
+3. The experience should be opt in so as not to break existing users.


And I disagree for the reasons stated in my previous comment. :)

nzakas · 2018-12-04T18:04:43Z

designs/2018-async-commands/README.md

+1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine).
+2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood.
+3. The experience should be opt in so as not to break existing users.
+4. There should be some sort of mechanism to handle eslint rules that expect `sync` reading. Rules may need to identify themselves as a "type". This way, if a rule is being loaded that requires `sync`, the code can handle it accordingly. Any rule that doesn't identify as a `sync` or `async` would be defaulted to `sync`.


Agreed. There is no such thing as an async rule, and even if there was, that should be handled by something lower in the API stack. I think it's safe to remove this reference.

nzakas · 2018-12-04T18:05:54Z

designs/2018-async-commands/README.md

+
+## Documentation
+
+1. There should be JSDoc code for sure. In code documentation is always the best.


The spots that I believe would have to be updated would be:

Node.js API

Command Line Interface (assuming a flag)

nzakas · 2018-12-04T18:09:21Z

designs/2018-async-commands/README.md

+
+1. More to maintain for the maintainers
+2. Threading and async problems are hard, and can require considerable effort (even with other projects out there doing it).
+3. This may expose unintended edge cases for eslint rules that expect synchronous scanning. There will need to be a shim mechanism for sure.


Agreed. Each rule is very self-contained so I don't foresee any drawbacks along those lines.

Some other drawbacks I can see:

How will this interact with the --debug flag? Right now we recommend people use that to help figure out problems, and presumably we would lose some or all of that ability.

Testing the multi-process functionality seems like it would be difficult. Travis uses a 2-core CPU for each VM, so we do have some coverage. It still seems like it would be difficult to determine if this is working automatically.

nzakas · 2018-12-04T18:09:52Z

designs/2018-async-commands/README.md

+    Are you able to implement this RFC on your own? If not, what kind
+    of help would you need from the team?
+-->
+I would like help from the core team with this effort as I do work full time and even this has been a few months of work in my free time outside of work.


What type of help are you talking about? Editing the RFC? Implementing the change?

Implementing. I tend to only have weekends as time when I can work on side projects. Otherwise, I don't mind taking a few months on weekends working on this 😄. It will be a lot of code reviews for sure is all.

Aghassi · 2018-12-08T16:56:32Z

I'm going to push some changes I've made, but I've not flushed out all the details yet. Please bear with me while I make those changes 😄 .

talked to outputFixes and the constructor

outlined use of executeOnFiles

Aghassi · 2018-12-08T17:16:42Z

I believe this can be reviewed again. I don't expect this review round to be the end, I'm just looking for feedback to understand if I am approaching the RFC correctly (as previously stated, this is my first). Please let me know if there is anything I can expand on or clarify.

nzakas · 2018-12-11T17:54:42Z

Thanks David, we will take another look. And no worries, this process is new for us, as well. You're helping us figure out if we're doing things well just by volunteering this RFC.

…

On Sat, Dec 8, 2018 at 9:16 AM David ***@***.***> wrote: I believe this can be reviewed again. I don't expect this review round to be the end, I'm just looking for feedback to understand if I am approaching the RFC correctly (as previously stated, this is my first). Please let me know if there is anything I can expand on or clarify. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#4 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AACWkhOuxurVZIh9-cWLWdTebv1O7r9Vks5u2_P6gaJpZM4Y9u4y> .

--

______________________________ Nicholas C. Zakas @SlickNet Author, Principles of Object-Oriented JavaScript <http://amzn.to/29Pmfrm> Author, Understanding ECMAScript 6 <http://amzn.to/29K1mIy>

nzakas · 2018-12-13T17:55:12Z

Thanks for the updated content. A few thoughts/questions:

I'm not sure if it makes sense to start going through methods and creating asynchronous versions of all synchronous operations. It seems like figuring out which synchronous methods should be converted would be something that comes after a higher-level concept of this would work. By higher-level concept, what I mean is we should first be looking at what happens when someone runs ESLint on the command line before we worry about which APIs to change. For example, this is what currently happens when you run ESLint on the command line:

Determine list of files to lint
For each file:
- Check the cache to see if the file has changed
- Determine the correct configuration for the file
- Lint the file
- Optionally apply fixes to the file
- Optionally write fixes to the disk
Output lint results to console

What I'm interested in is where in this process do we fork off another process to continue work? On the surface, it seems like generating the list of files to lint should work as it does today, and then we should be forking the process such that the forks are responsible for what happens to each file (maybe completely, maybe not). jscodeshift, for example, does something similar. It figures out the list of files to operate on and then forks the process. It then sends batches of 50 files to each process until the entire group of files has been operated on.

I think figuring out a high-level description of how this would work will then inform what kind of API changes should be made.

Aghassi · 2018-12-15T05:37:24Z

@nzakas Thanks. I now understand what you mean. Let me do some digging on how others, like jscodeshift are doing it and see what is viable. I'll post back here in a comment first discussing the idea and then proceed to update the RFC as necessary once we agree on things 😄

Aghassi · 2018-12-23T00:13:27Z

@nzakas I've done some more reading on the jscodeshitft implementation. Comments are sparse, so I don't know how they picked their CHUNK_SIZE variable. However, the general flow is:

Get all the files to be processed (via a promise)
Determine the threshold of processes they can have running at once. This is done by looking at the CPU count on the given system, and taking Math.min of the files relative to the CPU (which means in most cases this will come out to be 4 assuming a quad core processor). If someone passes runInBand, it is single threaded by default.
Determine how many files can go to each process. Take some arbitrary CHUNK_SIZE (50 in this case) and apply it relative to how efficiently you can split the number of files across the processes provided. If we have 200 files for example and 4 processors, the math works out to ((200/4)/50) = 1.
Send the batched files off to a worker that has a predefined set of functions (in the case of ESLint, this would be the bit you specified). Based on the messages the workers send act accordingly. In the case of ESLint, this gets to the point we mentioned earlier about keeping track of failures. jscodeshift reports issues to the console. One thing we need to determine is whether or not a failure to lint, or even write/fix, is grounds for ending the process. I'd prefer if ESLint failed gracefully and just said "unable to lint/fix files:" and provided output.
Once the workers are done, provide output to the user.

In general, I like the flow of how they do it. I'm not familiar with how something like jest or test runners do this (because they too enumerate files to use). While I did some digging, I gotta say finding well documented code in some of these repos is a real shot in the dark 😅 .

So with regard to the example you gave, I would make these suggestions:

Determine list of files to lint
Determine processors available
Determine batch size <-- I'm debating here if it should be a batch, or if by default the number of CPUs available is the number of files we can handle simultaneously. Does it make sense to have 50 files go to a worker only to have those 50 be process synchronously in a thread (assuming 4 workers, this is 200 files divided over 4 processes)? Or does it make sense to have 4 files go to 4 workers, each processed simultaneously and keep feeding single files to workers as they become free?
Send each batch to a worker
For each ~~file~~ batch sent to a worker:
- Check the cache to see if the file has changed
- Determine the correct configuration for the file
- Lint the file
- Optionally apply fixes to the file
- Optionally write fixes to the disk
- Send a message back to the parent process
Gather messages
Output lint results to console

nzakas · 2018-12-27T20:22:15Z

I think that seems like a good high-level description. The batch size is, most likely, something we'd need to tinker with. I'm guessing that there is some overhead involved with passing messages between the parent and child process, maybe just in the form of a slight delay, such that sending one file at a time to a worker might result in longer execution time than sending batches. So, I'd put in the RFC that the final batch size recommendation would have to come as a result of doing some testing with the implementation.

…

On Sat, Dec 22, 2018 at 4:13 PM David ***@***.***> wrote: @nzakas <https://github.com/nzakas> I've done some more reading on the jscodeshitft implementation <https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L224-L227>. Comments are sparse, so I don't know how they picked their CHUNK_SIZE variable. However, the general flow is: 1. Get all the files to be processed (via a promise <https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L216> ) 2. Determine the threshold of processes <https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L224> they can have running at once. This is done by looking at the CPU count on the given system, and taking Math.min of the files relative to the CPU (which means in most cases this will come out to be 4 assuming a quad core processor). If someone passes runInBand, it is single threaded by default. 3. Determine how many files can go to each process <https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L225>. Take some arbitrary CHUNK_SIZE (50 in this case) and apply it relative to how efficiently you can split the number of files across the processes provided. If we have 200 files for example and 4 processors, the math works out to ((200/4)/50) = 1. 4. Send the batched files off to a worker that has a predefined set of functions (in the case of ESLint, this would be the bit you specified). Based on the messages the workers send <https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L270-L285> act accordingly. In the case of ESLint, this gets to the point we mentioned earlier about keeping track of failures. jscodeshift reports issues to the console <https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L66>. One thing we need to determine is whether or not a failure to lint, or even write/fix, is grounds for ending the process. I'd prefer if ESLint failed gracefully and just said "unable to lint/fix files:" and provided output. 5. Once the workers are done, provide output to the user <https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L292> . In general, I like the flow of how they do it. I'm not familiar with how something like jest or test runners do this (because they too enumerate files to use). While I did some digging, I gotta say finding well documented code in some of these repos is a real shot in the dark 😅 . So with regard to the example you gave, I would make these suggestions: - Determine list of files to lint - *Determine processors available* - *Determine batch size* <-- I'm debating here if it should be a batch, or if by default the number of CPUs available is the number of files we can handle simultaneously. Does it make sense to have 50 files go to a worker only to have those 50 be process synchronously in a thread (assuming 4 workers, this is 200 files divided over 4 processes)? Or does it make sense to have 4 files go to 4 workers, each processed simultaneously and keep feeding single files to workers as they become free? - *Send each batch to a worker* - For each file *batch sent to a worker*: - Check the cache to see if the file has changed - Determine the correct configuration for the file - Lint the file - Optionally apply fixes to the file - Optionally write fixes to the disk - *Send a message back to the parent process* - Gather messages - Output lint results to console — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#4 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AACWkvQiD96TAElJWk9z360W_ZY4XhgWks5u7sqngaJpZM4Y9u4y> .

--

______________________________ Nicholas C. Zakas @SlickNet Author, Principles of Object-Oriented JavaScript <http://amzn.to/29Pmfrm> Author, Understanding ECMAScript 6 <http://amzn.to/29K1mIy>

Add detailed outline.

Aghassi · 2018-12-29T16:39:48Z

@nzakas Thanks! I've updated the RFC with what I said in that comment, and noted the bit about determining chunk size.

nzakas

Thanks, adding the batch information is helpful.

I think we still need more detail on how all of this will work. The goal of an RFC is to provide enough detail such that someone other than the RFC author would be able to reasonably implement the RFC even if the RFC author was unavailable to provide help any context. (It might be helpful to review #3 to see the process and detail we used to merge our first RFC.)

Some open questions in my mind about this proposal are:

How does the top-down architecture of the CLI app change? bin/eslint.js is the entrypoint and then that calls lib/cli.js. It seems like lib/cli.js's execute() method would need to change to return a promise so that it can call some async function on CLIEngine.
After the files to lint are gathered, how is that list managed? I'm assuming some sort of a queue, but it would be good to be specific.
Can this be reasonably implemented in Node.js 6 (without JavaScript async functions) or would we need to drop Node.js 6 support because async functions are the only sane way to do this?
The previously mentioned handling of configuration files and configuration caches needs to be thought out.
How would all of this work with the --cache option? Maybe it would just work without changes, but I think that's something we need to be explicit about.

Thanks again for all of your work and the thought you've put into this. I think this process is really helping to tease out some of the questions we hadn't thought too much about prior.

nzakas · 2019-01-10T17:35:31Z

designs/2018-async-commands/README.md

+- _Send each batch to a worker_
+- For each  _batch sent to a worker_:
+  - Check the cache to see if the file has changed
+  - Determine the correct configuration for the file


I think this might be one of those hidden tricky steps. Right now, we cache configuration files for faster lookup later. In the case of a worker, does that mean each worker would have its own configuration cache? (I think that would be fine, just something we should think about.)

Also, in determining the batches being sent to workers, should there be any consideration to batching together files in the same directory so each worker doesn't have to recalculate configuration info?

Aghassi · 2019-01-26T16:22:09Z

I haven't forgotten about this, I've been a little busy. I'm going to be taking a look at it this weekend hopefully. Just wanted to put that out there so it didn't seem like I ghosted this.

nzakas · 2019-01-29T16:58:10Z

Thanks for the heads up! And no worries, the RFC process tends to take a while anyway, so there is no time pressure.

…

On Sat, Jan 26, 2019 at 8:22 AM David ***@***.***> wrote: I haven't forgotten about this, I've been a little busy. I'm going to be taking a look at it this weekend hopefully. Just wanted to put that out there so it didn't seem like I ghosted this. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#4 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AACWkvmIHQGu7CbawP4DtvDTBFsmK8Yhks5vHICxgaJpZM4Y9u4y> .

--

______________________________ Nicholas C. Zakas @SlickNet Author, Principles of Object-Oriented JavaScript <http://amzn.to/29Pmfrm> Author, Understanding ECMAScript 6 <http://amzn.to/29K1mIy>

Aghassi · 2019-03-05T05:33:35Z

Just for some discussion before I put it into the RFC (just getting around to this now, I know it's been a while).

How does the top-down architecture of the CLI app change? bin/eslint.js is the entrypoint and then that calls lib/cli.js. It seems like lib/cli.js's execute() method would need to change to return a promise so that it can call some async function on CLIEngine.

execute could remain the way it is today, but have a flag to opt into the async functionality. This way it isn't default while the specifics are worked out. I think not renaming execute is critical so that we don't break existing assumptions of consumers. Ideally we would have a one major transition period ( so we could work on this through v6.x, and make it standard in v7). I would also expect that we start to construct a version of CLIEngine called CLIEngineAsync or something similar so that we don't touch the current logic. Would give us a way to take a fresh look at how to implement this functionality, and what to bring forward from the existing implementation.

After the files to lint are gathered, how is that list managed? I'm assuming some sort of a queue, but it would be good to be specific.

I believe a FIFO queue is fine for this. As far as I can tell, order should not matter.

Can this be reasonably implemented in Node.js 6 (without JavaScript async functions) or would we need to drop Node.js 6 support because async functions are the only sane way to do this?

I think if we are looking for backwards compatibility, something like bluebird or rsvp would be necessary. However, as stated above, if we give a major version to let this bake behind a flag, we can look to deprecate Node 6 support in two versions. Ideally, 8 is LTS now, and 10 is soon to be LTS. I know tooling doesn't move as fast Node version wise, but it wouldn't be impossible to justify it 😄.

Alternatively, we could publish to a beta channel with the intent to just deprecate Node 6 all together and avoid the interim period. I leave that decision to the maintainers. I would prefer to write things with async/await out of the gate if possible.

The previously mentioned handling of configuration files and configuration caches needs to be thought out.

I think loading of configuration files and caches, for a first iteration, can remain synchronous. Most of the speed benefit will come from the batch processing of all the files etc as mentioned in the spec. Thus, we can leverage the existing code that exists today. Warm up times would still be a little slow, but that can be improved once the processing changes are solid.

How would all of this work with the --cache option? Maybe it would just work without changes, but I think that's something we need to be explicit about.

As far as I know, it should remain the same since I believe .eslintcache just keeps a list of file paths to quickly load when ESLint is warming up. If we keep it as is today, shouldn't change. We are only aiming to change how the files are processed.

@nzakas If the above is acceptable, I can update the RFC with it. If you have more questions, we can hash them out in the comments and then I will update the RFC once we have consensus 😄 .

EDIT: Do we wish to carry on with this if #11 is in progress?

ljharb · 2019-03-05T08:07:27Z

async/await is just promises; there's zero reason for eslint to drop anything to use it. Consumers can always choose to use async functions as long as eslint wraps return values in Promise.resolve, which works as far back as node 0.12.

not-an-aardvark · 2019-03-05T08:16:10Z

@Aghassi Thanks for continuing to work on this. A few notes:

cli.execute is not exposed in the public API, so there's not a problem with changing it. (However, we do have to preserve the behavior of CLIEngine#executeOnFiles.)
Re. async functions: I largely agree with @ljharb's comment in New: added draft for async/parallel proposal #4 (comment) here. However, the point is moot because we're planning to drop support for Node 6 anyway in ESLint 6, which will probably be out in the next few months. For the purposes of this RFC I think it's fine to assume that we can use async functions.
.eslintcache files store a set of linted files, file modification times, and linting results for those files. When linting with a cache, ESLint checks whether a file has changed since it appears in the cache. If the file hasn't changed, it skips linting and just uses the results from the cache.

I think I agree with your conclusion that the caching logic doesn't need to change, provided that we're only parallelizing what happens after we figure out the list of files to lint.

Aghassi · 2019-03-25T05:25:45Z

@nzakas @not-an-aardvark anything else you think I should add or expand on based on the last push?

Aghassi · 2019-06-24T00:11:13Z

@mysticatea Do you have any feedback? It's been a while and I haven't heard back from the others on what else may be needed for this.

mysticatea

Hi. Thank you very much for your work on this.

Well, because I had not tracked the discussion about parallel processing, I'm sorry if my suggestion was mentioned in the past.

I feel that this RFC lacks the concrete design for user-facing changes. It's important how this change affects public API.

What is a new option?
What is a new API?
How do those work?
How do those affects the existing components (rules, plugins, custom parsers, and shareable configs)?

mysticatea · 2019-06-24T12:47:19Z

designs/2018-async-commands/README.md

+
+Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points.
+
+1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or option when using CLIEngine).


I think that we need a more concrete design about user-facing change. This is pretty vagueness to implement.

I think that this RFC intends to:

add --parallel option (or something like). And we need to describe how it works.

add CLIEngine#executeOnFilesInParallel(patterns). And we need to describe how it works.

About user-facing changes (public API changes) is very important. The purpose of RFC processes is to clear that.

Also, how does the --parallel option work with other CLI options? Especially, --stdin, --fix, --cache, --debug.

mysticatea · 2019-06-24T12:56:51Z

designs/2018-async-commands/README.md

+Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points.
+
+1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or option when using CLIEngine).
+2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. As a first iteration, the API should support `async` calls backed by `Promise`s. This will allow the main thread to continue operating while ESLint is doing its thing.


I feel that this sentence is ... confusing. Why does this compare Promises and Threads? Promises (and async) are the API to receive the result of asynchronous processing. Threads are the mechanism to run arbitrary processes concurrently. Those are things at different layers that we cannot compare. (we can use promises to receive the result of the processes that threads ran. It's the thing that other languages such as C# do.)

I guess that you intended to talk about what the way made parallel. One made IO parallel, another one made verification parallel. Then the former was not efficient.

I'm not sure if this sentence is needed.

mysticatea · 2019-06-24T12:57:58Z

designs/2018-async-commands/README.md

+
+1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or option when using CLIEngine).
+2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. As a first iteration, the API should support `async` calls backed by `Promise`s. This will allow the main thread to continue operating while ESLint is doing its thing.
+3. The experience should not break existing users.


I don't think this sentence is needed.

It should be in the "Backwards Compatibility Analysis" section as:

How it's possible to break the current behavior or not?

mysticatea · 2019-06-25T06:51:09Z

designs/2018-async-commands/README.md

+
+For this change, we will not need to modify the logic for `.eslintcache`, nor `cli.execute` as they are not exposed. We do need to keep the functionality of `CLIEngine.executeOnFiles` unchanged.
+
+Since Node 6 is intended to be deprecated in ESLint 6, it is ok for us to rely on and use `async/await` syntax to support this change.


I don't think this sentence is needed.

mysticatea · 2019-06-25T07:25:05Z

designs/2018-async-commands/README.md

+- Gather messages
+- Output lint results to console
+
+For this change, we will not need to modify the logic for `.eslintcache`, nor `cli.execute` as they are not exposed. We do need to keep the functionality of `CLIEngine.executeOnFiles` unchanged.


I don't think that this sentence is needed.

Instead, we need to describe how it realizes the cache functionality in parallel processing. Especially, the cache is one file. How does it read/write the cache without race condition in parallel?

mysticatea · 2019-06-25T07:42:54Z

designs/2018-async-commands/README.md

+
+## Documentation
+
+1. There should be JSDoc code for sure. In code documentation is always the best. In particular, the NodeJS API and the CLI Documentation would need to reflect this change.


The purpose of this section is to clarify how we describe this change to users. So it should describe how we update our documentation (https://eslint.org/) rather than JSDoc. For example, I think the following notes are needed.

The Command Line Interface page should describe how new --parallel option works.

The CLIEngine section of the "Node.js API" page should describe how new executeOnFilesInParallel(patterns) method works.

The Working with Plugins page should describe parallelism for plugins that parse the whole project such as @typescript-eslint/eslint-plugin.

The Working with Custom Parsers page should describe parallelism. Especially, if the parser author wants to provide functionality that shares arbitrary state via parserServices API, it may not support --parallel.

mysticatea · 2019-06-25T07:56:23Z

designs/2018-async-commands/README.md

+  - Optionally write fixes to the disk
+  - _Send a message back to the parent process_
+- Gather messages
+- Output lint results to console


Since rulesMeta, formatters require all rules that the verification process used. How does the formatter know/get the rules that were used in workers?

designs/2018-async-commands/README.md

Updating RFC PR Co-Authored-By: Toru Nagashima <[email protected]>

Aghassi · 2019-07-04T15:58:51Z

Thanks @mysticatea . Let me look and address your comments. Appreciate the feedback 😄

Gaafar · 2019-08-31T00:30:24Z

I made a quick and dirty PoC here eslint/eslint#12191

The main idea is to split the files into X chunks that can be linted sperately on X workers, I used workerpool for the workers, but most likely it will change. The main issue I have now is passing the ConfigArray object as it breaks when serializing/deserializing. I think it'd make sense to pass it around as a plain array and construct the class right before linting a file, but it might take me some time to refactor that as I'm new to the project.

Please have a look at the PR and let me know what you think of the general idea

UPDATE: I found workaround by passing the ConfigArray instance, then rehydrating the non serializable members in the workers https://github.com/eslint/eslint/pull/12191/files#diff-660ea0590a55a93f96e9f6979144e554R445

btmills · 2020-07-16T22:06:05Z

Thank you all for the discussion in this thread and particularly @Aghassi for putting the RFC together! A lot has changed in the core ESLint APIs as we worked toward making parallel linting possible. The TSC decided today to consolidate the parallel linting work in #42, which @mysticatea is updating to reflect a prototype implementation, so please follow along there.

eslint-deprecated bot added the triage label Dec 3, 2018

Chore: added draft for async/parallel proposal

d50284a

Aghassi force-pushed the async-parallel branch from 38b8690 to d50284a Compare December 3, 2018 01:41

Aghassi changed the title ~~chore: added draft for async/parallel proposal~~ Chore: added draft for async/parallel proposal Dec 3, 2018

not-an-aardvark suggested changes Dec 3, 2018

View reviewed changes

kaicataldo reviewed Dec 3, 2018

View reviewed changes

nzakas requested changes Dec 4, 2018

View reviewed changes

nzakas changed the title ~~Chore: added draft for async/parallel proposal~~ New: added draft for async/parallel proposal Dec 4, 2018

Aghassi added 2 commits December 8, 2018 08:47

refactor: reframed rfc, started hashing out implementation details

627a079

refactor: cleaned up and clarified points based on feedback

50375bc

Aghassi added 4 commits December 8, 2018 08:58

chore: typo optoin --> option

69d5b8d

chore: expanded explenation of necessary changes for methods

c46ef58

talked to outputFixes and the constructor

chore: clarifying the changes don't touch the rule API

fc9c2a5

chore: added information about async commands

30f3e10

outlined use of executeOnFiles

refactor: removed specific function examples

b4dd0bc

Add detailed outline.

nzakas requested changes Jan 10, 2019

View reviewed changes

not-an-aardvark mentioned this pull request Jan 15, 2019

Docs: Clarify when an RFC can be merged #8

Merged

nzakas added the Initial Commenting This RFC is in the initial feedback stage label Jan 29, 2019

nzakas removed triage labels Jan 29, 2019

not-an-aardvark mentioned this pull request Feb 6, 2019

New: Lint files in parallel #11

Closed

docs: added details about implementation questions

7b4667a

This was referenced May 17, 2019

New: Add events in CLIEngine (fileVerify) eslint/eslint#11733

Closed

New: Events Model in CLIEngine. Event: "fileVerify" #24

Closed

mysticatea suggested changes Jun 25, 2019

View reviewed changes

Committing suggestion by @mysticatea

718cac3

Updating RFC PR Co-Authored-By: Toru Nagashima <[email protected]>

kaicataldo mentioned this pull request Aug 5, 2019

New: Adds new info flag eslint/eslint#12060

Closed

This was referenced Sep 28, 2019

New: ESLint Class Replacing CLIEngine #40

Merged

feat: Lint files in parallel if many files exist #42

Closed

SimenB mentioned this pull request Jan 7, 2020

Cache does not work with multiple jest workers jest-community/jest-runner-eslint#82

Open

tunnckoCore mentioned this pull request Mar 15, 2020

@hela/eslint: Implement ESLint RFC#9 helapkg/hela#227

Open

JounQin mentioned this pull request Apr 29, 2020

Support async processing mdx-js/eslint-mdx#203

Closed

nzakas mentioned this pull request Jul 14, 2020

TSC meeting 16-July-2020 eslint/tsc-meetings#187

Closed

btmills closed this Jul 16, 2020

aewing mentioned this pull request Dec 13, 2020

Sync preprocess sveltejs/svelte#5770

Closed


		Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points.

		1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine).


		## Detailed Design

		Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points.


		## Documentation

		1. There should be JSDoc code for sure. In code documentation is always the best.


		## Summary

		`CLIEngine` currently doesn't support any async functions. As such, large codebases pay a toll since it requires users to wait for each file ( time O(n) ) to be processed by ESLint. There's a large gain to be made if ESLint can offer parallelized tasks so that users can more quickly statically analyze their code.


		For this change, we will not need to modify the logic for `.eslintcache`, nor `cli.execute` as they are not exposed. We do need to keep the functionality of `CLIEngine.executeOnFiles` unchanged.

		Since Node 6 is intended to be deprecated in ESLint 6, it is ok for us to rely on and use `async/await` syntax to support this change.

New: added draft for async/parallel proposal #4

New: added draft for async/parallel proposal #4

Conversation

Aghassi commented Dec 3, 2018

Summary

Related Issues

jsf-clabot commented Dec 3, 2018 • edited Loading

not-an-aardvark left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Aghassi Dec 3, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kaicataldo Dec 3, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nzakas left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Aghassi commented Dec 8, 2018

Aghassi commented Dec 8, 2018

nzakas commented Dec 11, 2018 via email

nzakas commented Dec 13, 2018

Aghassi commented Dec 15, 2018

Aghassi commented Dec 23, 2018

nzakas commented Dec 27, 2018 via email

Aghassi commented Dec 29, 2018

nzakas left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Aghassi commented Jan 26, 2019

nzakas commented Jan 29, 2019 via email

Aghassi commented Mar 5, 2019 • edited Loading

How does the top-down architecture of the CLI app change? bin/eslint.js is the entrypoint and then that calls lib/cli.js. It seems like lib/cli.js's execute() method would need to change to return a promise so that it can call some async function on CLIEngine.

After the files to lint are gathered, how is that list managed? I'm assuming some sort of a queue, but it would be good to be specific.

Can this be reasonably implemented in Node.js 6 (without JavaScript async functions) or would we need to drop Node.js 6 support because async functions are the only sane way to do this?

The previously mentioned handling of configuration files and configuration caches needs to be thought out.

How would all of this work with the --cache option? Maybe it would just work without changes, but I think that's something we need to be explicit about.

ljharb commented Mar 5, 2019

not-an-aardvark commented Mar 5, 2019

Aghassi commented Mar 25, 2019

Aghassi commented Jun 24, 2019

mysticatea left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Aghassi commented Jul 4, 2019

Gaafar commented Aug 31, 2019 • edited Loading

btmills commented Jul 16, 2020

jsf-clabot commented Dec 3, 2018 •

edited

Loading

Aghassi Dec 3, 2018 •

edited

Loading

kaicataldo Dec 3, 2018 •

edited

Loading

Aghassi commented Mar 5, 2019 •

edited

Loading

Gaafar commented Aug 31, 2019 •

edited

Loading