Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Go source excludes; Cleanup old filespec matching #4350

Merged
merged 30 commits into from
Mar 27, 2017

Conversation

wisechengyi
Copy link
Contributor

@wisechengyi wisechengyi commented Mar 21, 2017

Problem

go sources globs were defined with incorrect excludes.

Other Change

  • Cleanup old filespec matching
  • Add actual files during tests due to slight behavior change (from regex match to actual file match)

@wisechengyi wisechengyi changed the title Getting sources in SourceField should respect target spec Have getting sources from SourceField respect target spec; Fix Go source excludes Mar 21, 2017
@wisechengyi wisechengyi changed the title Have getting sources from SourceField respect target spec; Fix Go source excludes Have SourceField.relative_to_buildroot respect target spec; Fix Go source excludes Mar 21, 2017
@wisechengyi wisechengyi mentioned this pull request Mar 21, 2017
@stuhood
Copy link
Member

stuhood commented Mar 23, 2017

@wisechengyi : Sorry for the delayed review. This looks correct, but there is one more improvement while you're in here:

This method is used (recursively) from https://github.com/pantsbuild/pants/blob/master/src/python/pants/build_graph/source_mapper.py#L61 : if you push the match implementation down onto Filespec (https://github.com/pantsbuild/pants/blob/master/src/python/pants/source/wrapped_globs.py#L21), you can have EagerFilesetWithSpec do a lookup in its file list, and LazyFilesetWithSpec use this implementation.

The v2 engine only creates EagerFilesetWithSpec, because it is able to cache them in the daemon. So if you encounter an EagerFilesetWithSpec, it is "free" to use the file list, and definitely cheaper than compiling regexes to match the filespecs.

# This skips dirents.
globs('*/')])
# skip subdir contents
globs('*/**')])
Copy link
Member

@stuhood stuhood Mar 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this is a legal glob... should it be */**/*? Should check that it still compiles in latest master now that v2 is enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test case added to prove the case.

@@ -16,7 +16,7 @@ def glob_to_regex(pattern):
:returns: A regex string that matches same paths as the input glob does.
"""
out = ['^']
components = pattern.strip('/').replace('.', '[.]').split('/')
components = pattern.strip('/').replace('.', '[.]').replace('$', '[$]').split('/')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might make sense to consider a more general escape mechanism. This covers $, but there are other non-glob regex characters that can appear in paths.

Initially I was thinking we could just drop re.escape in here, but because we're actually trying to parse out the glob info, I don't think that would work.

Maybe on line: 37 https://github.com/pantsbuild/pants/pull/4350/files#diff-2d528bdfb8ee22a56a1b0660753b7a3cR37

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been operating under the assumption that this code doesn't need to live very long with the v2 engine on the way. But if it does, this should probably switch to pathspec, which we already have a dep on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time when this logic was introduced, pathspec was still using gitignore syntax which is different from glob matching syntax. I am not sure whether pathspec has evolved to support glob matching syntax since.

Copy link
Contributor Author

@wisechengyi wisechengyi Mar 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Tried pathspec, and it doesn't fit our needs.

@wisechengyi
Copy link
Contributor Author

@wisechengyi : Sorry for the delayed review. This looks correct, but there is one more improvement while you're in here:

This method is used (recursively) from https://github.com/pantsbuild/pants/blob/master/src/python/pants/build_graph/source_mapper.py#L61 : if you push the match implementation down onto Filespec (https://github.com/pantsbuild/pants/blob/master/src/python/pants/source/wrapped_globs.py#L21), you can have EagerFilesetWithSpec do a lookup in its file list, and LazyFilesetWithSpec use this implementation.

The v2 engine only creates EagerFilesetWithSpec, because it is able to cache them in the daemon. So if you encounter an EagerFilesetWithSpec, it is "free" to use the file list, and definitely cheaper than compiling regexes to match the filespecs.

def iter_relative_paths is now an abstract class and LazyFilesetWithSpec has the new implementation.

@@ -91,6 +91,11 @@ def files(self):
def files_hash(self):
return self._files_hash

def iter_relative_paths(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to call this paths_from_buildroot_iter() (and below).

relative is a vague term.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

Copy link
Contributor

@JieGhost JieGhost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex improvement looks good to me. I am fine with leave it as it is, but I will listen to other reviewers' opinion on escape mechanism.

"""An alternative `__iter__` that joins files with the relative root."""
for f in self:
path_from_buildroot = os.path.join(self.rel_root, f)
if matches_filespec(path_from_buildroot, self.filespec):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant. The files in the collection are already guaranteed to match the filespec: the filespec is just metadata about how the fileset was constructed.

The point of the SpecAddressMapper is to not expand the fileset, and to just use the filespec instead. This was based on the assumption that all FilesetWithSpec implementations were lazy.

What I was suggesting was an abstract method on FilesetWithSpec something like:

def matches(self, paths):
    """Returns the from the given set that match this FilesetWithSpec."""

...and the implementation for Eager would be to just look in the fileset, while the implementation for Lazy would be to compile the regex from the filespec and use that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected. Pushed matches to FilesetWithSpec, and removed all the regex logic.

@wisechengyi wisechengyi changed the title Have SourceField.relative_to_buildroot respect target spec; Fix Go source excludes Fix Go source excludes Mar 25, 2017
@wisechengyi wisechengyi changed the title Fix Go source excludes Fix Go source excludes; Cleanup old filespec matching Mar 25, 2017
@stuhood
Copy link
Member

stuhood commented Mar 27, 2017

As discussed offline, this now removes the optimization represented by https://github.com/pantsbuild/pants/blob/master/src/python/pants/build_graph/source_mapper.py#L21 ... that optimization was only relevant to v1, so I think that I'm ok with this. People should see much more benefit by using the daemon + v2.

But please either clean up or remove SpecSourceMapper, because it no longer uses the spec (and that's ok).

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Yi.

@wisechengyi wisechengyi merged commit a7c5542 into pantsbuild:master Mar 27, 2017
@wisechengyi wisechengyi deleted the go_fix_2 branch March 27, 2017 19:27
peiyuwang pushed a commit to peiyuwang/pants that referenced this pull request Mar 29, 2017
### Problem

go sources globs were defined with incorrect excludes.

### Other Change

* Cleanup old filespec matching
* Add actual files during tests due to slight behavior change (from regex match to actual file match)
lenucksi pushed a commit to lenucksi/pants that referenced this pull request Apr 25, 2017
### Problem

go sources globs were defined with incorrect excludes.

### Other Change

* Cleanup old filespec matching
* Add actual files during tests due to slight behavior change (from regex match to actual file match)
JieGhost added a commit that referenced this pull request May 4, 2017
### Problem

#4529 

There are 2 kinds of "changed" tasks in pants currently. Neither of them output correct result if a file is deleted.
1. "./pants changed" tasks. They used to have the correct output, but after #4350 , the logic of file matching in SpecSourceMapper was changed. We used to do a regular expression matching to check if the modified file is owned by a target. Now we check if a modified file is in a target's source file set. For a deleted file, it will not be in any target's source file set, thus no target will be considered as the owner of the deleted file, giving incorrect result.
2. "./pants --changed-XXX list". This uses EngineSourceMapper which implements similar logic as above. It never works on deleted files.

### Solution

I restored the removed file matching logic (using regular expression). In both SpecSourceMapper and EngineSourceMapper, first check if a file exists. It it exists, then check if it's in target's file set. If not, then it is a deleted file, and use re matching logic on it. Fixes #4529
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants