From d50284aa1efa9a915d532a043f9e2a7e60cd6b33 Mon Sep 17 00:00:00 2001 From: David Aghassi <3680126+Aghassi@users.noreply.github.com> Date: Sun, 2 Dec 2018 17:35:15 -0800 Subject: [PATCH 01/10] Chore: added draft for async/parallel proposal --- designs/2018-async-commands/README.md | 81 +++++++++++++++++++++++++++ 1 file changed, 81 insertions(+) create mode 100644 designs/2018-async-commands/README.md diff --git a/designs/2018-async-commands/README.md b/designs/2018-async-commands/README.md new file mode 100644 index 00000000..98a18c78 --- /dev/null +++ b/designs/2018-async-commands/README.md @@ -0,0 +1,81 @@ +- Start Date: 2018-12-2 +- RFC PR: (leave this empty, to be filled in later) +- Authors: David Aghassi + +# Async/Parallizing CLIEngine + +## Summary + +`CLIEngine` currently doesn't support any async functions. As such, large codebases pay a toll since it requires users to wait for each file ( time O(n) ) to be processed by ESLint. There's a large gain to be made if ESLint can offer parallelized tasks so that users can more quickly statically analyze their code. + +## Motivation + +On code bases in larger companies (in my case, Intuit) ESLint can take a very long time relative to other build tasks, even with caching enabled. The faster users can iterate and get feedback for their code, the faster they can develop/debug/ship. It also has impacts on CI since the same toll is paid. + +## Detailed Design + +Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points. + +1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine). +2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood. +3. The experience should be opt in so as not to break existing users. +4. There should be some sort of mechanism to handle eslint rules that expect `sync` reading. Rules may need to identify themselves as a "type". This way, if a rule is being loaded that requires `sync`, the code can handle it accordingly. Any rule that doesn't identify as a `sync` or `async` would be defaulted to `sync`. + +## Documentation + +1. There should be JSDoc code for sure. In code documentation is always the best. +2. I do believe this should warrant a blog announcement to say "hey we are moving in this direction, and would like feedback and help with it". Many companies would contribute and help once the initial implementation is there. + +## Drawbacks + +1. More to maintain for the maintainers +2. Threading and async problems are hard, and can require considerable effort (even with other projects out there doing it). +3. This may expose unintended edge cases for eslint rules that expect synchronous scanning. There will need to be a shim mechanism for sure. + +## Backwards Compatibility Analysis + +As long as this functionality is marked as "beta" until ready and hid behind a flag or options, it should not break any existing users. + +## Alternatives + +https://github.com/pinterest/esprint <--- The main option, but Pinterest seems to have lost interest in it + +I tried wrapping ESLint in promises, but that was very messy and didn't work out too well. +I tried making a PR that would change the functions I cared about, but the general discussion (in the PR above) showed there would need to be a more thoughtful attempt. + +## Open Questions + +1. What is the core architecture of ESLint that requires it to be synchronous? + +## Help Needed + + +I would like help from the core team with this effort as I do work full time and even this has been a few months of work in my free time outside of work. + +## Frequently Asked Questions + + + +## Related Discussions + + + +PR - https://github.com/eslint/eslint/issues/10606 +Google Group RFC Discussion - https://groups.google.com/forum/m/#!topic/eslint/di7l6__w2jk +Bounty Discussion - https://www.bountysource.com/issues/26284182-lint-multiple-files-in-parallel \ No newline at end of file From 627a079cb2956f3d41e3ef8e1f25869377960c08 Mon Sep 17 00:00:00 2001 From: David Aghassi <3680126+Aghassi@users.noreply.github.com> Date: Sat, 8 Dec 2018 08:47:24 -0800 Subject: [PATCH 02/10] refactor: reframed rfc, started hashing out implementation details --- designs/2018-async-commands/README.md | 82 +++++++++++++++++++++++++-- 1 file changed, 77 insertions(+), 5 deletions(-) diff --git a/designs/2018-async-commands/README.md b/designs/2018-async-commands/README.md index 98a18c78..2f50fbc9 100644 --- a/designs/2018-async-commands/README.md +++ b/designs/2018-async-commands/README.md @@ -2,11 +2,11 @@ - RFC PR: (leave this empty, to be filled in later) - Authors: David Aghassi -# Async/Parallizing CLIEngine +# Async/Parallizing ESLint Processing Of Files ## Summary -`CLIEngine` currently doesn't support any async functions. As such, large codebases pay a toll since it requires users to wait for each file ( time O(n) ) to be processed by ESLint. There's a large gain to be made if ESLint can offer parallelized tasks so that users can more quickly statically analyze their code. +ESLint currently is entirely synchronous. The underlying code, `CLIEngine` currently doesn't support any async functions. As such, large codebases pay a toll since it requires users to wait for each file ( time O(n) ) to be processed by ESLint. There's a large gain to be made if ESLint can offer parallelized tasks so that users can more quickly statically analyze their code. ## Motivation @@ -18,8 +18,81 @@ Based on discussion in the original [PR](https://github.com/eslint/eslint/issues 1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine). 2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood. -3. The experience should be opt in so as not to break existing users. -4. There should be some sort of mechanism to handle eslint rules that expect `sync` reading. Rules may need to identify themselves as a "type". This way, if a rule is being loaded that requires `sync`, the code can handle it accordingly. Any rule that doesn't identify as a `sync` or `async` would be defaulted to `sync`. +3. The experience should not break existing users. + +### CLIEngine + +I believe the initial implementation needs to stem from the core of ESLint. Many of the core methods rely on `sync` based Node APIs. We can swap these out for `async` based ones that are wrapped in `promisify` to get the promise based async calls we want. However, this will have a bubble up a effect to anything relying on these core APIs since they would return a `Promise`. + +#### constructor + +https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L428 + +#### processFile + +The processFile call is used to process a known file by ESLint. The core function this file wraps is [fs.readFileSync](https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L238). + +Node API allows us to swap + +```javascript +const text = fs.readFileSync(path.resolve(filename), "utf8"); +``` + +for + +```javascript +import fs from 'fs'; +import util from 'util'; + +const promiseReadFile = util.promisify(fs.readFile); + +... + +return promiseReadfile(path.resolve(filename), "utf8").then((text) => { + ... +}); +``` + +Since we _know_ the file exists, we don't have to have an exception check here (though it is good practice to do so). If we do have an exception, we should handle it accordingly. It may be worth noting the file that was not processed and logging it out to a file so as not to stop the entire linting process. Output can then be shown to say "x/total scanned, y unscanned. See log for details". + +`executeOnFiles` would have to handle that (assuming nothing else changes). `executeOnFiles` [calls processFile](https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L612). + +#### getCachedFile + +This method calls [fs.lstatSync](https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L362) under the hood to find the cache file. This can be replaced with the normal `lstat` call and `promisifed` like above. + +```javascript +let fileStats; + +try { + fileStats = fs.lstatSync(resolvedCacheFile); +} catch (ex) { + fileStats = null; +} +``` + +becomes + +```javascript +import fs from 'fs'; +import util from 'util'; + +const promisedLstat = util.promisify(fs.lstat); + +... + +let fileStats = await promisedLstat(resolvedCacheFile).then((stat) => stat). catch ((ex) => null); +``` + +#### executeOnFiles + +https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L577 + +Calls `processFile`: https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L612 + +#### outputFixes + +https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L541 ## Documentation @@ -30,7 +103,6 @@ Based on discussion in the original [PR](https://github.com/eslint/eslint/issues 1. More to maintain for the maintainers 2. Threading and async problems are hard, and can require considerable effort (even with other projects out there doing it). -3. This may expose unintended edge cases for eslint rules that expect synchronous scanning. There will need to be a shim mechanism for sure. ## Backwards Compatibility Analysis From 50375bc22f050b12ecceed3eb7dd553834433f9a Mon Sep 17 00:00:00 2001 From: David Aghassi <3680126+Aghassi@users.noreply.github.com> Date: Sat, 8 Dec 2018 08:54:36 -0800 Subject: [PATCH 03/10] refactor: cleaned up and clarified points based on feedback --- designs/2018-async-commands/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/designs/2018-async-commands/README.md b/designs/2018-async-commands/README.md index 2f50fbc9..14836a47 100644 --- a/designs/2018-async-commands/README.md +++ b/designs/2018-async-commands/README.md @@ -17,7 +17,7 @@ On code bases in larger companies (in my case, Intuit) ESLint can take a very lo Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points. 1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine). -2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. On the flipside, it has been noted that `threads`/workers can be expensive on platforms like Windows. Ideally, the implementation would use a balance of both. For example, `jest` has async methods that use threads under the hood. +2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. As a first iteration, the API should suppor `async` calls backed by `Promise`s. This will allow the main thread to continue operating while ESLint is doing its thing. 3. The experience should not break existing users. ### CLIEngine @@ -53,7 +53,7 @@ return promiseReadfile(path.resolve(filename), "utf8").then((text) => { }); ``` -Since we _know_ the file exists, we don't have to have an exception check here (though it is good practice to do so). If we do have an exception, we should handle it accordingly. It may be worth noting the file that was not processed and logging it out to a file so as not to stop the entire linting process. Output can then be shown to say "x/total scanned, y unscanned. See log for details". +Since we _know_ the file exists, we don't have to have an exception check here (though it is good practice to do so). If we do have an exception, we should keep a tally of those files that caused issues and create a log file like `yarn` does. Since the processes are async, we can keep the exceptions in a map of `Errors` and time stamp each one. This way, we can get the output accordingly. It may be worth noting the file that was not processed and logging it out to a file so as not to stop the entire linting process. Output can then be shown to say "x/total scanned, y unscanned. See log for details". `executeOnFiles` would have to handle that (assuming nothing else changes). `executeOnFiles` [calls processFile](https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L612). @@ -96,7 +96,7 @@ https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L541 ## Documentation -1. There should be JSDoc code for sure. In code documentation is always the best. +1. There should be JSDoc code for sure. In code documentation is always the best. In particular, the NodeJS API and the CLI Documentation would need to reflect this change. 2. I do believe this should warrant a blog announcement to say "hey we are moving in this direction, and would like feedback and help with it". Many companies would contribute and help once the initial implementation is there. ## Drawbacks @@ -127,7 +127,7 @@ I tried making a PR that would change the functions I cared about, but the gener Are you able to implement this RFC on your own? If not, what kind of help would you need from the team? --> -I would like help from the core team with this effort as I do work full time and even this has been a few months of work in my free time outside of work. +I would like help from the core team with this effort when it comes to implementing (assuming this RFC is accepted) as I do work full time and even this has been a few months of work in my free time outside of work. ## Frequently Asked Questions From 69d5b8df518688c8d8801b924bc69d1de682fc4e Mon Sep 17 00:00:00 2001 From: David Aghassi <3680126+Aghassi@users.noreply.github.com> Date: Sat, 8 Dec 2018 08:58:25 -0800 Subject: [PATCH 04/10] chore: typo optoin --> option --- designs/2018-async-commands/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/designs/2018-async-commands/README.md b/designs/2018-async-commands/README.md index 14836a47..d25eda61 100644 --- a/designs/2018-async-commands/README.md +++ b/designs/2018-async-commands/README.md @@ -6,7 +6,7 @@ ## Summary -ESLint currently is entirely synchronous. The underlying code, `CLIEngine` currently doesn't support any async functions. As such, large codebases pay a toll since it requires users to wait for each file ( time O(n) ) to be processed by ESLint. There's a large gain to be made if ESLint can offer parallelized tasks so that users can more quickly statically analyze their code. +ESLint currently is entirely synchronous. The underlying code, `CLIEngine` currently doesn't support any async functions. As such, large codebases pay a toll since it requires users to wait for each file ( time O(n) ) to be processed by ESLint. There's a large gain to be made if ESLint can offer parallelized tasks so that users can more quickly statically analyze their code. This change would also bubble up to the `cli` level for users who don't access the API directly. ## Motivation @@ -16,7 +16,7 @@ On code bases in larger companies (in my case, Intuit) ESLint can take a very lo Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points. -1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or optoin when using CLIEngine). +1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or option when using CLIEngine). 2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. As a first iteration, the API should suppor `async` calls backed by `Promise`s. This will allow the main thread to continue operating while ESLint is doing its thing. 3. The experience should not break existing users. From c46ef586804cfc355a77a94f110081ef52fc7fc6 Mon Sep 17 00:00:00 2001 From: David Aghassi <3680126+Aghassi@users.noreply.github.com> Date: Sat, 8 Dec 2018 09:08:11 -0800 Subject: [PATCH 05/10] chore: expanded explenation of necessary changes for methods talked to outputFixes and the constructor --- designs/2018-async-commands/README.md | 42 +++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/designs/2018-async-commands/README.md b/designs/2018-async-commands/README.md index d25eda61..74cf6a2c 100644 --- a/designs/2018-async-commands/README.md +++ b/designs/2018-async-commands/README.md @@ -28,6 +28,29 @@ I believe the initial implementation needs to stem from the core of ESLint. Many https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L428 +Currently the constructor checks if an ignored file exists synchronously. We can swap this logic for: + +```javascript +import fs from 'fs'; +import util from 'util'; + +const promiseStat = util.promisify(fs.stat); + +... + + if (options.ignore && options.ignorePath) { + promiseStat(options.ignorePath).then((stats) => { + if (!stats.isFile()) { + throw new Error(`${options.ignorePath} is not a file`); + } + }) + .catch((e) => { + e.message = `Error: Could not load file ${options.ignorePath}\nError: ${e.message}`; + throw e; + }); + } +``` + #### processFile The processFile call is used to process a known file by ESLint. The core function this file wraps is [fs.readFileSync](https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L238). @@ -94,6 +117,25 @@ Calls `processFile`: https://github.com/eslint/eslint/blob/master/lib/cli-engine https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L541 +This method calls `fs.writeFileSync`. We can swap that with `promisify` + `fs.writeFile`. That will give us a promise while the file is written. If we fail to write the file, we can keep a running log and exit while out putting a failure log with those files that caused trouble (need more help to understand what mechanisms exist for this currently, if any). + +```javascript +import fs from 'fs'; +import util from 'util'; + +const promiseWriteFile = util.promisify(fs.writeFile); + +... + + static async outputFixes(report) { + report.results.filter(result => Object.prototype.hasOwnProperty.call(result, "output")).forEach(result => { + await promiseWriteFile(result.filePath, result.output).catch((e) => { + // Take care of logging failure + }); + }); + } +``` + ## Documentation 1. There should be JSDoc code for sure. In code documentation is always the best. In particular, the NodeJS API and the CLI Documentation would need to reflect this change. From fc9c2a55f9aec7c53f75e31e94f8d78712456509 Mon Sep 17 00:00:00 2001 From: David Aghassi <3680126+Aghassi@users.noreply.github.com> Date: Sat, 8 Dec 2018 09:09:40 -0800 Subject: [PATCH 06/10] chore: clarifying the changes don't touch the rule API --- designs/2018-async-commands/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/designs/2018-async-commands/README.md b/designs/2018-async-commands/README.md index 74cf6a2c..d7a45033 100644 --- a/designs/2018-async-commands/README.md +++ b/designs/2018-async-commands/README.md @@ -22,7 +22,7 @@ Based on discussion in the original [PR](https://github.com/eslint/eslint/issues ### CLIEngine -I believe the initial implementation needs to stem from the core of ESLint. Many of the core methods rely on `sync` based Node APIs. We can swap these out for `async` based ones that are wrapped in `promisify` to get the promise based async calls we want. However, this will have a bubble up a effect to anything relying on these core APIs since they would return a `Promise`. +I believe the initial implementation needs to stem from the core of ESLint. Many of the core methods rely on `sync` based Node APIs. We can swap these out for `async` based ones that are wrapped in `promisify` to get the promise based async calls we want. However, this will have a bubble up a effect to anything relying on these core APIs since they would return a `Promise`. None of this should touch the rule API, and should remain relegated to the processing of files under the hood. #### constructor From 30f3e103ff251872a15368586d83c666b205f162 Mon Sep 17 00:00:00 2001 From: David Aghassi <3680126+Aghassi@users.noreply.github.com> Date: Sat, 8 Dec 2018 09:15:12 -0800 Subject: [PATCH 07/10] chore: added information about async commands outlined use of executeOnFiles --- designs/2018-async-commands/README.md | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/designs/2018-async-commands/README.md b/designs/2018-async-commands/README.md index d7a45033..6659d635 100644 --- a/designs/2018-async-commands/README.md +++ b/designs/2018-async-commands/README.md @@ -22,7 +22,9 @@ Based on discussion in the original [PR](https://github.com/eslint/eslint/issues ### CLIEngine -I believe the initial implementation needs to stem from the core of ESLint. Many of the core methods rely on `sync` based Node APIs. We can swap these out for `async` based ones that are wrapped in `promisify` to get the promise based async calls we want. However, this will have a bubble up a effect to anything relying on these core APIs since they would return a `Promise`. None of this should touch the rule API, and should remain relegated to the processing of files under the hood. +I believe the initial implementation needs to stem from the core of ESLint. Many of the core methods rely on `sync` based Node APIs. We can swap these out for `async` based ones that are wrapped in `promisify` to get the promise based async calls we want. However, this will have a bubble up a effect to anything relying on these core APIs since they would return a `Promise`. None of this should touch the rule API, and should remain relegated to the processing of files under the hood. To be clear, the methods listed here would not be removed or modified. They would have `async` counter parts. This allows us to build out the functionality without impacting current async users. It also gives us the opportunity to refactor the `CLI` and other parts of the code that rely on the sync code to support both cases. + +In addition, all of these changes will require their own tests. #### constructor @@ -111,7 +113,13 @@ let fileStats = await promisedLstat(resolvedCacheFile).then((stat) => stat). c https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L577 -Calls `processFile`: https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L612 +Calls `processFile`: https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L612. Since this will be `async`, we can use `async/await` here. + +In addition, this method is called in two other places throughout the code base. +https://github.com/eslint/eslint/blob/7ad86dea02feceb7631943a7e1423cc8a113fcfe/lib/cli.js#L197 +https://github.com/eslint/eslint/blob/6009239042cb651bc7ca6b8c81bbe44c40327430/lib/util/source-code-utils.js#L33 + +These would need to be updated to have async options supported. #### outputFixes From b4dd0bc6d5e095491703f23411fcf6bf7e73dd2a Mon Sep 17 00:00:00 2001 From: David Aghassi <3680126+Aghassi@users.noreply.github.com> Date: Sat, 29 Dec 2018 08:37:48 -0800 Subject: [PATCH 08/10] refactor: removed specific function examples Add detailed outline. --- designs/2018-async-commands/README.md | 149 +++++--------------------- 1 file changed, 25 insertions(+), 124 deletions(-) diff --git a/designs/2018-async-commands/README.md b/designs/2018-async-commands/README.md index 6659d635..90ea899c 100644 --- a/designs/2018-async-commands/README.md +++ b/designs/2018-async-commands/README.md @@ -20,129 +20,29 @@ Based on discussion in the original [PR](https://github.com/eslint/eslint/issues 2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. As a first iteration, the API should suppor `async` calls backed by `Promise`s. This will allow the main thread to continue operating while ESLint is doing its thing. 3. The experience should not break existing users. -### CLIEngine - -I believe the initial implementation needs to stem from the core of ESLint. Many of the core methods rely on `sync` based Node APIs. We can swap these out for `async` based ones that are wrapped in `promisify` to get the promise based async calls we want. However, this will have a bubble up a effect to anything relying on these core APIs since they would return a `Promise`. None of this should touch the rule API, and should remain relegated to the processing of files under the hood. To be clear, the methods listed here would not be removed or modified. They would have `async` counter parts. This allows us to build out the functionality without impacting current async users. It also gives us the opportunity to refactor the `CLI` and other parts of the code that rely on the sync code to support both cases. - -In addition, all of these changes will require their own tests. - -#### constructor - -https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L428 - -Currently the constructor checks if an ignored file exists synchronously. We can swap this logic for: - -```javascript -import fs from 'fs'; -import util from 'util'; - -const promiseStat = util.promisify(fs.stat); - -... - - if (options.ignore && options.ignorePath) { - promiseStat(options.ignorePath).then((stats) => { - if (!stats.isFile()) { - throw new Error(`${options.ignorePath} is not a file`); - } - }) - .catch((e) => { - e.message = `Error: Could not load file ${options.ignorePath}\nError: ${e.message}`; - throw e; - }); - } -``` - -#### processFile - -The processFile call is used to process a known file by ESLint. The core function this file wraps is [fs.readFileSync](https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L238). - -Node API allows us to swap - -```javascript -const text = fs.readFileSync(path.resolve(filename), "utf8"); -``` - -for - -```javascript -import fs from 'fs'; -import util from 'util'; - -const promiseReadFile = util.promisify(fs.readFile); - -... - -return promiseReadfile(path.resolve(filename), "utf8").then((text) => { - ... -}); -``` - -Since we _know_ the file exists, we don't have to have an exception check here (though it is good practice to do so). If we do have an exception, we should keep a tally of those files that caused issues and create a log file like `yarn` does. Since the processes are async, we can keep the exceptions in a map of `Errors` and time stamp each one. This way, we can get the output accordingly. It may be worth noting the file that was not processed and logging it out to a file so as not to stop the entire linting process. Output can then be shown to say "x/total scanned, y unscanned. See log for details". - -`executeOnFiles` would have to handle that (assuming nothing else changes). `executeOnFiles` [calls processFile](https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L612). - -#### getCachedFile - -This method calls [fs.lstatSync](https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L362) under the hood to find the cache file. This can be replaced with the normal `lstat` call and `promisifed` like above. - -```javascript -let fileStats; - -try { - fileStats = fs.lstatSync(resolvedCacheFile); -} catch (ex) { - fileStats = null; -} -``` - -becomes - -```javascript -import fs from 'fs'; -import util from 'util'; - -const promisedLstat = util.promisify(fs.lstat); - -... - -let fileStats = await promisedLstat(resolvedCacheFile).then((stat) => stat). catch ((ex) => null); -``` - -#### executeOnFiles - -https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L577 - -Calls `processFile`: https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L612. Since this will be `async`, we can use `async/await` here. - -In addition, this method is called in two other places throughout the code base. -https://github.com/eslint/eslint/blob/7ad86dea02feceb7631943a7e1423cc8a113fcfe/lib/cli.js#L197 -https://github.com/eslint/eslint/blob/6009239042cb651bc7ca6b8c81bbe44c40327430/lib/util/source-code-utils.js#L33 - -These would need to be updated to have async options supported. - -#### outputFixes - -https://github.com/eslint/eslint/blob/master/lib/cli-engine.js#L541 - -This method calls `fs.writeFileSync`. We can swap that with `promisify` + `fs.writeFile`. That will give us a promise while the file is written. If we fail to write the file, we can keep a running log and exit while out putting a failure log with those files that caused trouble (need more help to understand what mechanisms exist for this currently, if any). - -```javascript -import fs from 'fs'; -import util from 'util'; - -const promiseWriteFile = util.promisify(fs.writeFile); - -... - - static async outputFixes(report) { - report.results.filter(result => Object.prototype.hasOwnProperty.call(result, "output")).forEach(result => { - await promiseWriteFile(result.filePath, result.output).catch((e) => { - // Take care of logging failure - }); - }); - } -``` +Based on looking at other libraries, like JSCodeShift, the general flow should look like. + +1) Get all the files to be processed (via a [promise](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L216)) +2) [Determine the threshold of processes](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L224) we can have running at once. This is done by looking at the CPU count on the given system, and taking `Math.min` of the files relative to the CPU (which means in most cases this will come out to be 4 assuming a quad core processor). If someone passes `runInBand`, it is single threaded by default. +3) [Determine how many files can go to each process](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L225). Take some arbitrary `CHUNK_SIZE` (50 in this case) and apply it relative to how efficiently you can split the number of files across the processes provided. If we have 200 files for example and 4 processors, the math works out to ((200/4)/50) = 1. NOTE: `CHUNK_SIZE` will have to be determined through testing for ESLint to see what is most performant. +4) Send the batched files off to a worker that has a predefined set of functions (in the case of ESLint, this would be the bit you specified). Based on the [messages the workers send](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L270-L285) act accordingly. In the case of ESLint, this gets to the point we mentioned earlier about keeping track of failures. `jscodeshift` [reports issues to the console](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L66). One thing we need to determine is whether or not a failure to lint, or even write/fix, is grounds for ending the process. I'd prefer if ESLint failed gracefully and just said "unable to lint/fix files:" and provided output. +5) Once the workers are done, [provide output to the user](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L292). + +The flow in turn would look like: + +- Determine list of files to lint +- _Determine processors available_ +- _Determine batch size_ +- _Send each batch to a worker_ +- For each _batch sent to a worker_: + - Check the cache to see if the file has changed + - Determine the correct configuration for the file + - Lint the file + - Optionally apply fixes to the file + - Optionally write fixes to the disk + - _Send a message back to the parent process_ +- Gather messages +- Output lint results to console ## Documentation @@ -177,6 +77,7 @@ I tried making a PR that would change the functions I cared about, but the gener Are you able to implement this RFC on your own? If not, what kind of help would you need from the team? --> + I would like help from the core team with this effort when it comes to implementing (assuming this RFC is accepted) as I do work full time and even this has been a few months of work in my free time outside of work. ## Frequently Asked Questions @@ -200,4 +101,4 @@ I would like help from the core team with this effort when it comes to implement PR - https://github.com/eslint/eslint/issues/10606 Google Group RFC Discussion - https://groups.google.com/forum/m/#!topic/eslint/di7l6__w2jk -Bounty Discussion - https://www.bountysource.com/issues/26284182-lint-multiple-files-in-parallel \ No newline at end of file +Bounty Discussion - https://www.bountysource.com/issues/26284182-lint-multiple-files-in-parallel From 7b4667abc9c0c7193d80a553b8670aa83f3c77bc Mon Sep 17 00:00:00 2001 From: David Aghassi <3680126+Aghassi@users.noreply.github.com> Date: Fri, 8 Mar 2019 08:14:21 -0800 Subject: [PATCH 09/10] docs: added details about implementation questions --- designs/2018-async-commands/README.md | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/designs/2018-async-commands/README.md b/designs/2018-async-commands/README.md index 90ea899c..f6406d09 100644 --- a/designs/2018-async-commands/README.md +++ b/designs/2018-async-commands/README.md @@ -17,16 +17,16 @@ On code bases in larger companies (in my case, Intuit) ESLint can take a very lo Based on discussion in the original [PR](https://github.com/eslint/eslint/issues/10606#issue-341171187)there are some main points. 1. The features that are intended to be async/parallel need to be behind a flag on the CLI, and opt in when using the API (maybe through an async command or option when using CLIEngine). -2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. As a first iteration, the API should suppor `async` calls backed by `Promise`s. This will allow the main thread to continue operating while ESLint is doing its thing. +2. There are two options here: `promise` based and `thread` based. Some users suggested looking at [esprint](https://github.com/pinterest/esprint) for inspiration. Some of the concerns were that `promises` would not be faster per say depending on the implementation. As a first iteration, the API should support `async` calls backed by `Promise`s. This will allow the main thread to continue operating while ESLint is doing its thing. 3. The experience should not break existing users. Based on looking at other libraries, like JSCodeShift, the general flow should look like. -1) Get all the files to be processed (via a [promise](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L216)) -2) [Determine the threshold of processes](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L224) we can have running at once. This is done by looking at the CPU count on the given system, and taking `Math.min` of the files relative to the CPU (which means in most cases this will come out to be 4 assuming a quad core processor). If someone passes `runInBand`, it is single threaded by default. -3) [Determine how many files can go to each process](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L225). Take some arbitrary `CHUNK_SIZE` (50 in this case) and apply it relative to how efficiently you can split the number of files across the processes provided. If we have 200 files for example and 4 processors, the math works out to ((200/4)/50) = 1. NOTE: `CHUNK_SIZE` will have to be determined through testing for ESLint to see what is most performant. -4) Send the batched files off to a worker that has a predefined set of functions (in the case of ESLint, this would be the bit you specified). Based on the [messages the workers send](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L270-L285) act accordingly. In the case of ESLint, this gets to the point we mentioned earlier about keeping track of failures. `jscodeshift` [reports issues to the console](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L66). One thing we need to determine is whether or not a failure to lint, or even write/fix, is grounds for ending the process. I'd prefer if ESLint failed gracefully and just said "unable to lint/fix files:" and provided output. -5) Once the workers are done, [provide output to the user](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L292). +1. Get all the files to be processed (via a [promise](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L216)). These can be stored in a FIFO (first in first out) queue for later parsing since order should not matter. NOTE: The loading of the files can remain synchronous for first iteration, but should be asynchronous in a second round change once the initial change is released. +2. [Determine the threshold of processes](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L224) we can have running at once. This is done by looking at the CPU count on the given system, and taking `Math.min` of the files relative to the CPU (which means in most cases this will come out to be 4 assuming a quad core processor). If someone passes `runInBand`, it is single threaded by default. +3. [Determine how many files can go to each process](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L225). Take some arbitrary `CHUNK_SIZE` (50 in this case) and apply it relative to how efficiently you can split the number of files across the processes provided. If we have 200 files for example and 4 processors, the math works out to ((200/4)/50) = 1. NOTE: `CHUNK_SIZE` will have to be determined through testing for ESLint to see what is most performant. +4. Send the batched files off to a worker that has a predefined set of functions (in the case of ESLint, this would be the bit you specified). Based on the [messages the workers send](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L270-L285) act accordingly. In the case of ESLint, this gets to the point we mentioned earlier about keeping track of failures. `jscodeshift` [reports issues to the console](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L66). One thing we need to determine is whether or not a failure to lint, or even write/fix, is grounds for ending the process. I'd prefer if ESLint failed gracefully and just said "unable to lint/fix files:" and provided output. +5. Once the workers are done, [provide output to the user](https://github.com/facebook/jscodeshift/blob/master/src/Runner.js#L292). The flow in turn would look like: @@ -34,7 +34,7 @@ The flow in turn would look like: - _Determine processors available_ - _Determine batch size_ - _Send each batch to a worker_ -- For each _batch sent to a worker_: +- For each _batch sent to a worker_: - Check the cache to see if the file has changed - Determine the correct configuration for the file - Lint the file @@ -44,6 +44,10 @@ The flow in turn would look like: - Gather messages - Output lint results to console +For this change, we will not need to modify the logic for `.eslintcache`, nor `cli.execute` as they are not exposed. We do need to keep the functionality of `CLIEngine.executeOnFiles` unchanged. + +Since Node 6 is intended to be deprecated in ESLint 6, it is ok for us to rely on and use `async/await` syntax to support this change. + ## Documentation 1. There should be JSDoc code for sure. In code documentation is always the best. In particular, the NodeJS API and the CLI Documentation would need to reflect this change. @@ -58,6 +62,8 @@ The flow in turn would look like: As long as this functionality is marked as "beta" until ready and hid behind a flag or options, it should not break any existing users. +As noted by [discussions during the RFC](https://github.com/eslint/rfcs/pull/4#issuecomment-469583046), Node 6 or earlier will not need supporting. + ## Alternatives https://github.com/pinterest/esprint <--- The main option, but Pinterest seems to have lost interest in it From 718cac357e61e2d0c1a7ee0fa2839fd4e704d1da Mon Sep 17 00:00:00 2001 From: David <3680126+Aghassi@users.noreply.github.com> Date: Thu, 4 Jul 2019 08:57:22 -0700 Subject: [PATCH 10/10] Committing suggestion by @mysticatea Updating RFC PR Co-Authored-By: Toru Nagashima --- designs/2018-async-commands/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/designs/2018-async-commands/README.md b/designs/2018-async-commands/README.md index f6406d09..62a56114 100644 --- a/designs/2018-async-commands/README.md +++ b/designs/2018-async-commands/README.md @@ -1,5 +1,5 @@ - Start Date: 2018-12-2 -- RFC PR: (leave this empty, to be filled in later) +- RFC PR: https://github.com/eslint/rfcs/pull/4 - Authors: David Aghassi # Async/Parallizing ESLint Processing Of Files