Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New: added draft for async/parallel proposal #4

Closed
wants to merge 10 commits into from
81 changes: 81 additions & 0 deletions designs/2018-async-commands/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
- Start Date: 2018-12-2
- RFC PR: (leave this empty, to be filled in later)
Aghassi marked this conversation as resolved.
Show resolved Hide resolved
- Authors: David Aghassi

# Async/Parallizing CLIEngine

## Summary

`CLIEngine` currently doesn't support any async functions. As such, large codebases pay a toll since it requires users to wait for each file ( time O(n) ) to be processed by ESLint. There's a large gain to be made if ESLint can offer parallelized tasks so that users can more quickly statically analyze their code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the goal of the RFC is broader than this describes. The goal is really to allow ESLint as a whole to lint files in parallel. The changes to CLIEngine are an implementation detail, and I would expect changes to cli to also be included. I'd suggest reframing the summary of this RFC to make the high-level goal clear while explaining some of the implementation changes you're envisioning.


## Motivation

On code bases in larger companies (in my case, Intuit) ESLint can take a very long time relative to other build tasks, even with caching enabled. The faster users can iterate and get feedback for their code, the faster they can develop/debug/ship. It also has impacts on CI since the same toll is paid.

## Detailed Design

Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good list of points to consider when designing a solution -- I've left some feedback below on the points. However, the purpose of this section (and the RFC process more generally) is to describe a proposed design in significantly more detail, rather than just giving a summary of the problem and the points being considered. (See the design section of this work-in-progress RFC for an example.)

Some questions I have after reading this include:

  • What specific changes to the CLIEngine API are being proposed? If there are new methods being added, what do they do?
  • Does this design change the rule API? Before reading this RFC, my expectation for how parallelism would work was that the rule API would remain the same, and parallelism would happen on a per-file basis independently from rules. However, point 4 below mentions that some rules might need to identify themselves as a particular type, and I'm confused about why this would be necessary.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the referral. I'll make adjustments. This is my first RFC so I appreciate the feedback 😅 !


1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I don't think putting parallelism behind a flag should be a requirement provided that compatibility is preserved -- it seems like an implementation detail that users shouldn't have to worry about. (However, based on previous discussions there may be others on the team who disagree with me about this.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this. I don't really see any value in keeping the ability to do things sync still. That being said, I do think this should go out in a major release.

Copy link
Author

@Aghassi Aghassi Dec 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with it being in a major release. Plus, anyone who needs sync functionality can peg to an older version and move on their own time. I don't see a value in keeping things synchronous either unless there is a really good reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see this put behind some kind of flag for at least one release. My concern is that without a way to opt-in/opt-out, it will be difficult to determine if a certain class of bugs is in any way related to the parallelism or not. I wouldn't be comfortable shipping this as the default behavior until we are clear that we've worked out the bugs and gotten feedback from the community. At that point, I'd still want an opt-out flag to help with debugging.

2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this point -- to me it seems like the use of promises and threads is orthogonal given that they don't address the same problem. A Promise is a value representing the result of an asynchronous operation, whereas multithreading is a mechanism to make computation faster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the concerns were that promises would not be faster per say depending on the implementation.

My naive take (I haven't spent any time looking into this yet) is that if we can figure out a way to have each promise represent not just the linting but also the file I/O operations that this would still be a win.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@not-an-aardvark to my understanding, promises are just a fancy way of dealing with callbacks. So, under the hood, they could still be blocking the main thread if the callback that is being invoked is synchronous in the nature (the promise just ends up being a convenience to the consumer, but adds no benefit to them). To @kaicataldo's point, if the promise wraps a call that is non blocking on the main thread (because it too is promise based and async), then that's the best. The thread is a layer I see added on top of an already async system. To be 💯 open, I've not done much work with workers/threads, so I don't know how effective it would be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for clarifying

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this statement is confusing. The choice isn't really between threads (probably you mean processes in this context) or promises, as these accomplish two different things. At some point in the API there will need be promise support, which allows the JS engine to continue operating while waiting for some result to be returned.

I think it's safe to assume that there will need to be processes that somehow communicate back to the main process, likely via promises.

3. The experience should be opt in so as not to break existing users.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on the need to not break existing users, but I'm not sure being opt-in is a requirement for this (see my comment on the first list item).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I disagree for the reasons stated in my previous comment. :)

4. There should be some sort of mechanism to handle eslint rules that expect `sync` reading. Rules may need to identify themselves as a "type". This way, if a rule is being loaded that requires `sync`, the code can handle it accordingly. Any rule that doesn't identify as a `sync` or `async` would be defaulted to `sync`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any examples that demonstrate why this is necessary? I'm not sure I follow.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was pulling this idea from this comment eslint/eslint#10606 (comment)

I honestly don't know the inner workings of a rule set, so I don't know that rules can necessarily be sync or async by nature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Currently, those visitor callbacks are synchronous (see eslint/eslint#10606 (comment) and eslint/eslint#10606 (comment)).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So for now I would say this part of the RFC is either out of scope or unnecessary for the first pass at this idea. Ideally, rules should be independent of execution anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. There is no such thing as an async rule, and even if there was, that should be handled by something lower in the API stack. I think it's safe to remove this reference.


## Documentation

1. There should be JSDoc code for sure. In code documentation is always the best.
Copy link
Member

@kaicataldo kaicataldo Dec 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We currently enforce this in the codebase, so this will happen whether it's in this proposal or not!

I also think we'll want to document any changes that occur to any public APIs as well as possibly do a little write-up of how ESLint does async processing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spots that I believe would have to be updated would be:

  1. Node.js API
  2. Command Line Interface (assuming a flag)

2. I do believe this should warrant a blog announcement to say "hey we are moving in this direction, and would like feedback and help with it". Many companies would contribute and help once the initial implementation is there.

## Drawbacks

1. More to maintain for the maintainers
2. Threading and async problems are hard, and can require considerable effort (even with other projects out there doing it).
3. This may expose unintended edge cases for eslint rules that expect synchronous scanning. There will need to be a shim mechanism for sure.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on my question above, I can't think of any rules that do this off the top of my head. If they do exist, perhaps we should alter them to not be affected by this (and make that a requirement for rules going forward).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Each rule is very self-contained so I don't foresee any drawbacks along those lines.

Some other drawbacks I can see:

  1. How will this interact with the --debug flag? Right now we recommend people use that to help figure out problems, and presumably we would lose some or all of that ability.
  2. Testing the multi-process functionality seems like it would be difficult. Travis uses a 2-core CPU for each VM, so we do have some coverage. It still seems like it would be difficult to determine if this is working automatically.


## Backwards Compatibility Analysis

As long as this functionality is marked as "beta" until ready and hid behind a flag or options, it should not break any existing users.

## Alternatives

https://github.com/pinterest/esprint <--- The main option, but Pinterest seems to have lost interest in it

I tried wrapping ESLint in promises, but that was very messy and didn't work out too well.
I tried making a PR that would change the functions I cared about, but the general discussion (in the PR above) showed there would need to be a more thoughtful attempt.

## Open Questions

1. What is the core architecture of ESLint that requires it to be synchronous?

## Help Needed

<!--
This section is optional.

Are you able to implement this RFC on your own? If not, what kind
of help would you need from the team?
-->
I would like help from the core team with this effort as I do work full time and even this has been a few months of work in my free time outside of work.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What type of help are you talking about? Editing the RFC? Implementing the change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing. I tend to only have weekends as time when I can work on side projects. Otherwise, I don't mind taking a few months on weekends working on this 😄. It will be a lot of code reviews for sure is all.


## Frequently Asked Questions

<!--
This section is optional but suggested.

Try to anticipate points of clarification that might be needed by
the people reviewing this RFC. Include those questions and answers
in this section.
-->

## Related Discussions

<!--
This section is optional but suggested.

If there is an issue, pull request, or other URL that provides useful
context for this proposal, please include those links here.
-->

PR - https://github.com/eslint/eslint/issues/10606
Google Group RFC Discussion - https://groups.google.com/forum/m/#!topic/eslint/di7l6__w2jk
Bounty Discussion - https://www.bountysource.com/issues/26284182-lint-multiple-files-in-parallel