libfetchers/git: Support export-ignore #9480
@edolstra How do you want source filtering input accessors to work? |
|
@roberth #9497 adds Maybe something similar can be done for export-ignore, i.e. wrap the |
5d8c99c to a5f5744
|
There is also |
2f41fa4 to 1ea9930
| if (getExportIgnoreAttr(input) | ||
| && getSubmodulesAttr(input)) { | ||
| throw UnimplementedError("exportIgnore and submodules are not supported together yet"); |
Note that this combination isn't used in any existing expressions, because the previous implementation did not apply export-ignore when submodules were enabled.
Apart from this check, why don't they work together yet? Given that submodules are also implemented using Git accessors, I would expect the filtering to just work for submodules.
I've added the comment:
/* In this situation, we don't have a git CLI behavior that we can copy.
`git archive` does not support submodules, so it is unclear whether
rules from the parent should affect the submodule or not.
When git may eventually implement this, we need Nix to match its
behavior. */
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2024-01-08-nix-team-meeting-minutes-114/38156/1 |
edolstra
left a comment
Some comments, but LGTM overall.
| A Boolean parameter that specifies whether `export-ignore` from `.gitattributes` should be applied. | ||
| This approximates part of the `git archive` behavior. | ||
| Enabling this option is not recommended because it is unknown whether the Git developers commit to the reproducibility of `export-ignore` in newer Git versions. |
It's a bit odd to say that enabling this option is not recommended and then make "enabled" the default. Probably better to say "We recommend disabling this option because bla bla".
It's not enabled for fetchTree though; just for fetchGit.
(And we can't unrecommend fetchGit until this is stable.)
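To make the split in defaults concrete, here is a minimal sketch of how the effective value could be resolved. The names `Builtin` and `resolveExportIgnore` are hypothetical and not part of the PR; only the defaults themselves (enabled for fetchGit, disabled for fetchTree) come from the discussion.

```cpp
#include <optional>

// Hypothetical: which builtin the fetch was requested through.
enum class Builtin { FetchGit, FetchTree };

// Resolve the effective exportIgnore setting.
// An explicit user value always wins; otherwise fetchGit defaults to
// true (legacy behavior) and fetchTree defaults to false.
bool resolveExportIgnore(Builtin builtin, std::optional<bool> userValue)
{
    if (userValue)
        return *userValue;
    return builtin == Builtin::FetchGit;
}
```

With this shape, the documentation can recommend disabling the option without changing what existing fetchGit expressions evaluate to.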
src/libfetchers/git-utils.cc
Outdated
| std::string pathStr {path.rel()}; | ||
| const char * pathCStr = pathStr.c_str(); |
Better to add a rel_c_str() to CanonPath. I.e.
const char * rel_c_str() const
{ return path.c_str() + 1; }
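For illustration, a self-contained sketch of the suggestion. `CanonPathSketch` is a simplified stand-in for the real class, assuming only that the stored path always starts with `/`; the point is that the relative C string can be returned without copying into a temporary `std::string`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-in for CanonPath (assumption: the stored path is
// absolute, so it always begins with '/').
class CanonPathSketch
{
    std::string path;

public:
    explicit CanonPathSketch(std::string p) : path(std::move(p))
    {
        // Guard the invariant that rel_c_str() relies on.
        assert(!path.empty() && path[0] == '/');
    }

    // Pointer to the path without its leading slash. No copy is made;
    // the pointer stays valid as long as this object is alive.
    const char * rel_c_str() const
    {
        return path.c_str() + 1;
    }
};
```

For the root path `/` this yields an empty string, which is also what a copy of `rel()` would give.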
| bool isAllowed(const CanonPath & path) override { | ||
| return !isExportIgnored(path); |
What's the performance penalty for export-ignore lookups? Should this be cached? The lazy-trees branch has a CachingFilteringInputAccessor that caches isAllowed(), which might be useful here.
It takes about 2× as long on my local nixpkgs clone. Not great.
Caching could reduce the overhead by a factor of three (from roughly +100% to roughly +33%), so we may expect no better than 1.3× from that.
Using the batch variation of the libgit2 call could bring down the overhead some more though. That makes the lookups more eager, which in turn means that it's not a great match with the CachingFilteringInputAccessor interface ("protected" interface part as it were).
- I've picked CachingFilteringInputAccessor from the diff, and that brought it down to a 15% penalty.
- Batch variation: I misremembered while libgit2.org was down. It retrieves multiple attrs, not multiple files, so this is not a solution.
- New idea: turn off the filter when export-ignore is not present at all. I've tried this, but it seems to make fetching Nixpkgs with fetchGit slower (local repo, no fetcher cache, probably low single digit %). That's probably because I did an extra traversal of the whole repo before returning the accessor. It's possible that a per-directory, on-the-fly approach does go below that 15%, but I don't have a lot of confidence right now.
- Another possible strategy: cache it in a table like (parent_dir, filename, allow_bool), which could be queried quite efficiently for either the whole repo or particular directories when lazy. Nonetheless, there's a risk that it's not much better, and this is a lot more work to pull off, so again I'd say not for this release.
Conclusion: will stop optimizing now.
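The caching idea discussed above can be sketched roughly as follows. This is not the CachingFilteringInputAccessor from the lazy-trees branch; `CachingFilterSketch`, its `lookups` counter, and the string-based path type are assumptions made to keep the example self-contained.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical wrapper that memoizes the result of an expensive
// per-path export-ignore lookup (in the PR, a libgit2 attribute query).
class CachingFilterSketch
{
    std::function<bool(const std::string &)> isExportIgnored;
    std::unordered_map<std::string, bool> cache;

public:
    // Number of times the underlying lookup was actually invoked.
    std::size_t lookups = 0;

    explicit CachingFilterSketch(std::function<bool(const std::string &)> f)
        : isExportIgnored(std::move(f))
    { }

    // A path is allowed unless it is export-ignored. Each distinct path
    // triggers at most one underlying lookup.
    bool isAllowed(const std::string & path)
    {
        auto it = cache.find(path);
        if (it != cache.end())
            return it->second;
        ++lookups;
        bool allowed = !isExportIgnored(path);
        cache.emplace(path, allowed);
        return allowed;
    }
};
```

A cache like this only helps when the same path is queried repeatedly, which is consistent with the observation that some overhead remained after caching.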
...with the intention to prevent future regressions in fetchGit
Enabled for fetchGit, which historically had this behavior, among other behaviors we do not want in fetchGit. fetchTree disables this parameter by default. It can choose the simpler behavior, as it is still experimental. I am not confident that the filtering implementation is future proof. It should reuse a source filtering wrapper, which I believe Eelco has already written, but not merged yet.
Intentionally dumb change ahead of architectural improvements.
This will be needed because the accessor will be wrapped, and therefore not be an instance of GitInputAccessor anymore.
Also fingerprint and some preparatory improvements. Testing is still not up to scratch because lots of logic is duplicated between the workdir and commit cases.
efa08de to 469cf26
Co-authored-by: Eelco Dolstra <edolstra@gmail.com>
Defensively because isRoot() is also defensive.
Motivation
Reintroduce export-ignore processing for fetchGit, which historically had this behavior, among other behaviors that I believe we do not want in fetchGit.
Change
- exportIgnore parameter to the git fetcher
- fetchTree disables this parameter by default. It can choose the simpler behavior, as it is still experimental.
- fetchTree too!)
- Nix. fetchGit surprisingly uses export-ignore git attribute #7195
- fetchGit enables this parameter by default, to approximate legacy behavior.
TODO
- I was not confident that the filtering implementation is future proof. It should reuse a source filtering wrapper, such as FilteringInputAccessor from Lazy trees #6530. Not a straightforward backport because we don't have virtual bool isAllowed(const CanonPath & path) = 0; (yet?)
- exportIgnore (Nix) parameter should be inherited)
- _ext git attributes functions, or reimplement (no, half of semantics already hardcoded; not worth it). Need to pass rev
- fetchGit does not export-ignore when it submodules.
Context