-
Notifications
You must be signed in to change notification settings - Fork 607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rush-lib] Ignore specified files when calculating project state hash #2643
Conversation
* A list of file globs under the project root that will be ignored when | ||
* determining whether this project has changed. | ||
*/ | ||
ignoreChangedFiles?: string[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The phrase "changed files" sounds like it is somehow related to the rush change
feature, which this isn't.
If they are globs, we should call them "globs". (Heft has some settings like "includeGlobs" and "excludeGlobs" for example.)
You suggested ignoreFilePatternsForBuildState
, but that sounds limited to rush build
, which this isn't. The incremental build analysis can be enabled for various Rush commands, which includes rush build
but also custom commands. A user might define an incremental command like rush generate-docs
and then enable this setting:
// /**
// * If true then this command will be incremental like the built-in "build" command
// */
// "incremental": false,
(We could provide separate ignore globs for each command, but that seems like an advanced feature that will get entangled with other design problems like phased commands. Note that phases will introduce "incremental phases" in addition to the "incremental commands" that we have today.)
My point is that "incremental" or "incremental build" is perhaps more clear than "build" by itself.
So based on all this, maybe the setting could be something like:
incrementalIgnoredGlobs
incrementalBuildIgnoredGlobs
incrementalCommandIgnoredGlobs
@iclanton what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think incrementalBuildIgnoredGlobs
is just about right. The phrase "incremental build" is the most common phrasing by far, in my opinion, so it makes sense to say "here's ignored globs for incremental build" (the fact that it applies to other commands potentially is OK, since those commands "act like incremental build" -- that's the way you'd describe them to someone).
|
||
"ignoreChangedFiles": { | ||
"type": "array", | ||
"description": "A list of file globs under the project root that will be ignored when determining whether this project has changed.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible improved wording:
The incremental analyzer can skip Rush commands for projects whose input files have not changed since the last build. Normally, every Git-tracked file under the project folder is assumed to be an input. Use the
incrementalBuildIgnoredGlobs
setting to ignore specific file paths. The paths are specified as globs relative to the project folder.
|
||
const projectConfiguration: | ||
| RushProjectConfiguration | ||
| undefined = await RushProjectConfiguration.tryLoadForProjectAsync(project, undefined, terminal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iclanton Can the rush-project.json
settings be inherited from a rig? That seems useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that they can today, I have it working in our local monorepo with 5.43.0. 🎉
|
||
const projectConfiguration: | ||
| RushProjectConfiguration | ||
| undefined = await RushProjectConfiguration.tryLoadForProjectAsync(project, undefined, terminal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd want to do an async parallel invocation if we are waiting for file reads in this loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is very much critical path for all rush commands that do any work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! I was hunting around the codebase and I just might not have found it yet -- is there a generic bottleneck / rate limiting / parallelism-limited async map somewhere in rush-lib that I can use?
(I'm pretty sure dropping a Promise.all(map)
bomb is actually slower for 200+ files, so I guess I'd feel better if we could limit it to a parallelism of 16 or something.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hunting around the codebase and I just might not have found it yet -- is there a generic bottleneck / rate limiting / parallelism-limited async map somewhere in rush-lib that I can use?
@elliot-nelson Weirdly no, I can't seem to find one. (Which is weird -- I'm sure someone encountered this before!) The idiomatic pattern would be an API something like:
import { Async } from '@rushstack/node-core-library';
// (or whatever the right API design is)
Async.throttle(3,
[
() => doAsyncThing(),
() => doAsyncThing(),
() => doAsyncThing(),
() => doAsyncThing(),
() => doAsyncThing(),
() => doAsyncThing(),
]
);
If you want to recommend a high quality throttling NPM package, we could use this PR to add a @beta
API in @rushstack/node-core-library
that wraps it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put up a PR with an alternate pretty quick option -- #2665.
@@ -92,7 +110,16 @@ export class PackageChangeAnalyzer { | |||
| RushConfigurationProject | |||
| undefined = this._rushConfiguration.findProjectForPosixRelativePath(filePath); | |||
if (owningProject) { | |||
projectHashDeps.get(owningProject.packageName)!.set(filePath, fileHash); | |||
const relativePath: string = path.relative(owningProject.projectRelativeFolder, filePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend just slicing off the prefix, since owningProject
will only be defined if filePath literally starts with projectRelativeFolder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it keeps all paths normalized to POSIX slashes (which git returns, and which projectRelativeFolder has been normalized to).
@@ -92,7 +110,16 @@ export class PackageChangeAnalyzer { | |||
| RushConfigurationProject | |||
| undefined = this._rushConfiguration.findProjectForPosixRelativePath(filePath); | |||
if (owningProject) { | |||
projectHashDeps.get(owningProject.packageName)!.set(filePath, fileHash); | |||
const relativePath: string = path.relative(owningProject.projectRelativeFolder, filePath); | |||
const ignoreMatchers: IMinimatch[] | undefined = ignorePatternsByProject.get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a dependency on ignore
, which can run this like it is processing a .gitignore file and theoretically optimize a bit.
Updated PR title and body, and made following changes:
Took PR out of FEEDBACK state, as I think it is ready to go except for the suggested improvement of loading |
Latest cleanup commit -- load project configuration files in parallel, using the new At this point I believe all feedback has been addressed, let me know if any other concerns remain. |
@@ -29,9 +31,12 @@ export class PackageChangeAnalyzer { | |||
this._git = new Git(this._rushConfiguration); | |||
} | |||
|
|||
public getPackageDeps(projectName: string): Map<string, string> | undefined { | |||
public async getPackageDeps( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This repo uses a suffix to indicate async methods, so generally a function like this should be renamed to getPackageDepsAsync()
when converting. The same applies to other functions modified by this same PR.
): Promise<Ignore> { | ||
const projectConfiguration: | ||
| RushProjectConfiguration | ||
| undefined = await RushProjectConfiguration.tryLoadForProjectAsync(project, undefined, terminal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Ideally this should be cached somewhere, but we don't need to solve that in this PR.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we want to do that here; this is a wrapper for the heft-config-file
library which does is own internal caching of all config files you load with it, and it seems like that's the best way to do it.
(I did file a related ticket, #2691, that will improve the caching further.)
projectHashDeps.get(owningProject.packageName)!.set(filePath, fileHash); | ||
const relativePath: string = filePath | ||
.replace(owningProject.projectRelativeFolder, '') | ||
.replace(/^\//, ''); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a code comment explaining what transformation is happening here?
It might be helpful to include a realistic example of the input/output, something like:
// Example:
// filePath = category/project/src/thing.ts
// projectRelativeFolder = category/project
// relativePath = src/thing.ts
const relativePath: string = filePath
.replace(owningProject.projectRelativeFolder, '')
.replace(/^\//, '');
It seems obvious to me now, but I had to run this code in the debugger to figure out what it is doing. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The correctness of replace()
hinges on the assumption that filePath
will always contain projectRelativeFolder
, but it's difficult to prove that, since filePath
and projectRelativeFolder
come from fairly different subsystems.
Here's a couple ideas to protect against that:
- Use
path.relative()
to operate on path objects rather than text strings; OR - First test whether
filePath
starts withprojectRelativeFolder
and throw anInternalError
if that not. The assumption is probably safe, but if not it would be nice to hear about it so we can fix it.
(BTW I tried some experiments to see if I could break this logic, like altering the case of a project folder on Windows. But the LookupByPath
API malfunctions first preventing this code from running. Maybe I'm being overly defensive -- it's just if replace()
did fail to match, the behavior might be pretty difficult to understand. 🙂)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The base logic here is that, since LookupByPath
cannot return a matching project for a path that does not literally begin with projectRelativeFolder
, all paths must necessarily begin with said string prefix. Personally I'd just blindly slice the expected length off of the start of the string, because that is how I prefer to do path manipulation (as quickly as possible, with a minimum of fuss).
Actually the .slice()
call is safer here, because, contrary to what my memory had said, I only normalize the projectRelativeFolder
to /
when handing it off to LookupByPath
, so you can technically (incorrectly) use \\
inside of your paths in rush.json and weird things will happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've split the difference by adding a comment and converting it to use @dmichon-msft 's suggested logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I am content that sufficient paranoia has been applied here heheh
In the repo I work on, this change has increased the run time of |
Summary
Implementation of #2618. The intent is to allow each project (in its
rush-project.json
file) to optionally specify glob patterns that should be ignored when calculating project state, which in turn, will allow those files to be ignored for the purposes of incremental build watching, cloud build cache hits, etc.The new option is called
incrementalBuildIgnoredGlobs
, and is interpreted as a dot-ignore file.Details
PR needs a little cleanup, but it does work (tested locally on our monorepo here and does exactly what I was hoping). Hoping to get feedback before I finish up the PR, to make sure this approach makes sense.
The new model of
ConfigurationFile
, which helps define riggable / extendable configuration files, is allasync
-based and takes aTerminal
for possible warning output; this means that anything above it that uses it also needs to be async and to accept a Terminal. This requires a little bit of hacking on several chains (likeProjectWatcher
->ProjectBuildCache
->PackageChangeAnalyzer
etc.).It's worth noting that
ProjectBuildCache()
always accepted aRushProjectConfiguration
that was previously loaded, but, that wasn't that useful in this situation. That config is for the project for which a hash is being calculated; if, for example,A
is dependent onB
, ifB
specifieddisableBuildCache
in its project configuration, that doesn't stopA
from calculating the files inB
as part of its hash. But for this new feature, we do needA
to ignore files thatB
ignores, so we need to load the project configuration for every project in the repo essentially. (Or, possibly, load them on the fly as we hit projects we haven't loaded the project configuration for yet; I opted for the former approach for simplicity but we can enhance that.)How it was tested
Currently tested in our current monorepo, for example, with project A depending on B:
ignoreChangedFiles: "*.md"
to B'srush-project.json
.B/src/index.ts
andrush build
; rebuilds B and A as expected.B/README.md
andrush build
; B and A get cache hits as expected.ignoreChangedFiles: "*.png"
to A'srush-project.json
.A/logo.png
andrush build
; A gets cache hit as expected.B/logo.png
andrush build
; rebuilds B and A as expected (A is ignoring PNGs, but B is not)