-
-
Notifications
You must be signed in to change notification settings - Fork 641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Go source excludes; Cleanup old filespec matching #4350
Conversation
SourceField.relative_to_buildroot
respect target spec; Fix Go source excludes
@wisechengyi : Sorry for the delayed review. This looks correct, but there is one more improvement while you're in here: This method is used (recursively) from https://github.com/pantsbuild/pants/blob/master/src/python/pants/build_graph/source_mapper.py#L61 : if you push the The v2 engine only creates |
# This skips dirents. | ||
globs('*/')]) | ||
# skip subdir contents | ||
globs('*/**')]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that this is a legal glob... should it be */**/*
? Should check that it still compiles in latest master now that v2 is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test case added to prove the case.
src/python/pants/source/filespec.py
Outdated
@@ -16,7 +16,7 @@ def glob_to_regex(pattern): | |||
:returns: A regex string that matches same paths as the input glob does. | |||
""" | |||
out = ['^'] | |||
components = pattern.strip('/').replace('.', '[.]').split('/') | |||
components = pattern.strip('/').replace('.', '[.]').replace('$', '[$]').split('/') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might make sense to consider a more general escape mechanism. This covers $
, but there are other non-glob regex characters that can appear in paths.
Initially I was thinking we could just drop re.escape
in here, but because we're actually trying to parse out the glob info, I don't think that would work.
Maybe on line: 37 https://github.com/pantsbuild/pants/pull/4350/files#diff-2d528bdfb8ee22a56a1b0660753b7a3cR37
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been operating under the assumption that this code doesn't need to live very long with the v2 engine on the way. But if it does, this should probably switch to pathspec
, which we already have a dep on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the time when this logic was introduced, pathspec was still using gitignore syntax which is different from glob matching syntax. I am not sure whether pathspec has evolved to support glob matching syntax since.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Tried pathspec
, and it doesn't fit our needs.
|
@@ -91,6 +91,11 @@ def files(self): | |||
def files_hash(self): | |||
return self._files_hash | |||
|
|||
def iter_relative_paths(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to call this paths_from_buildroot_iter()
(and below).
relative
is a vague term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex improvement looks good to me. I am fine with leave it as it is, but I will listen to other reviewers' opinion on escape mechanism.
"""An alternative `__iter__` that joins files with the relative root.""" | ||
for f in self: | ||
path_from_buildroot = os.path.join(self.rel_root, f) | ||
if matches_filespec(path_from_buildroot, self.filespec): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant. The files in the collection are already guaranteed to match the filespec: the filespec is just metadata about how the fileset was constructed.
The point of the SpecAddressMapper
is to not expand the fileset, and to just use the filespec
instead. This was based on the assumption that all FilesetWithSpec
implementations were lazy.
What I was suggesting was an abstract method on FilesetWithSpec
something like:
def matches(self, paths):
"""Returns the from the given set that match this FilesetWithSpec."""
...and the implementation for Eager
would be to just look in the fileset, while the implementation for Lazy
would be to compile the regex from the filespec and use that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected. Pushed matches
to FilesetWithSpec
, and removed all the regex logic.
SourceField.relative_to_buildroot
respect target spec; Fix Go source excludes
As discussed offline, this now removes the optimization represented by https://github.com/pantsbuild/pants/blob/master/src/python/pants/build_graph/source_mapper.py#L21 ... that optimization was only relevant to But please either clean up or remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Yi.
### Problem go sources globs were defined with incorrect excludes. ### Other Change * Cleanup old filespec matching * Add actual files during tests due to slight behavior change (from regex match to actual file match)
### Problem go sources globs were defined with incorrect excludes. ### Other Change * Cleanup old filespec matching * Add actual files during tests due to slight behavior change (from regex match to actual file match)
### Problem #4529 There are 2 kinds of "changed" tasks in pants currently. Neither of them output correct result if a file is deleted. 1. "./pants changed" tasks. They used to have the correct output, but after #4350 , the logic of file matching in SpecSourceMapper was changed. We used to do a regular expression matching to check if the modified file is owned by a target. Now we check if a modified file is in a target's source file set. For a deleted file, it will not be in any target's source file set, thus no target will be considered as the owner of the deleted file, giving incorrect result. 2. "./pants --changed-XXX list". This uses EngineSourceMapper which implements similar logic as above. It never works on deleted files. ### Solution I restored the removed file matching logic (using regular expression). In both SpecSourceMapper and EngineSourceMapper, first check if a file exists. It it exists, then check if it's in target's file set. If not, then it is a deleted file, and use re matching logic on it. Fixes #4529
Problem
go sources globs were defined with incorrect excludes.
Other Change