Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure repository cloaking #10263

Closed
26 tasks
Tracked by #10258
premun opened this issue Aug 3, 2022 · 12 comments
Closed
26 tasks
Tracked by #10258

Configure repository cloaking #10263

premun opened this issue Aug 3, 2022 · 12 comments
Assignees

Comments

@premun
Copy link
Member

premun commented Aug 3, 2022

Context

As part of the tarball generation process, we are removing files such as binaries or non-required files under a license. We need to either cloak these files via source-mappings.json or bake this step into the VMR assembly process so that these files are in place for source build to work.

Goal

For this step of the tarball generation process, we need to go repo by repo and figure out what files we will need to cloak and should not be part of the VMR.

Progress

  • arcade
  • aspnetcore
  • command-line-api
  • deployment-tools
  • diagnostics
  • format
  • fsharp
  • installer
  • linker
  • llvm-project
  • msbuild
  • nuget-client
  • razor-compiler
  • roslyn
  • roslyn-analyzers
  • runtime
  • sdk
  • source-build-externals
  • source-build-reference-packages
  • sourcelink
  • symreader
  • templating
  • test-templates
  • vstest
  • xdt
  • xliff-tasks
@premun premun changed the title [VMR layout] Configure repository cloaking Configure repository cloaking Aug 4, 2022
@premun
Copy link
Member Author

premun commented Aug 8, 2022

Question

If we remove things such as .dll from the code like we already do for the tarball (https://github.com/dotnet/installer/blob/8f639696e6d57fb09e03e89c6397d913de1231ed/src/SourceBuild/Arcade/tools/SourceBuildArcadeTarball.targets#L246-L265), we might break repo's ability to run tests in the VMR. Should we then really do it or just keep this step?

@premun premun self-assigned this Aug 8, 2022
@premun
Copy link
Member Author

premun commented Aug 8, 2022

I tried to set up cloaking for the VMR, adhering to the current removal rules from current tarball generation which goes as follows:

  1. Prepare a list of files marked for removal by listing all .dll, .Dll, .exe, .pdb, .mdb, .zip, .nupkg files
  2. From the list, remove all under these paths:
    • runtime\src\coreclr\.nuget\
    • runtime*\src\installer\pkg

This effectively preserves two PDB files, e.g. runtime\src\coreclr\.nuget\_.pdb.

Problem

Currently, we cannot do this through the cloaking system which has include and exclude lists. Git takes the include ones and then substracts the exclude ones, whereas here we want another layer - "exclude files from the exclude filter".

Proposed solution

To solve this, we could use the .gitattributes file and give the final say in what gets included to the hands of the individual repos.
We could let the repo assign vmr-keep and vmr-ignore attributes which would override the globs from source-mappings.json.

From this, we could always only include files without the ignore attribute and then "not exclude" any with the keep attribute.
For instance, we could resolve current problem like this:

dotnet/runtime > .gitattributes

src/coreclr/**/*.pdb       vmr-keep
src/installer/**/*.pdb     vmr-keep

VMR > source-mappings.json

{
    "defaults": {
        "defaultRef": "main",
        "exclude": [
            "**/*.dll",
            "**/*.Dll",
            "**/*.exe",
            "**/*.pdb",
            "**/*.mdb",
            "**/*.zip",
            "**/*.nupkg"
        ]
    },
    "mappings": [
        {
            "name": "runtime",
            "defaultRemote": "https://github.com/dotnet/runtime"
        }
    ]
}

This would give us a list of filters such as this:

  • :(glob,attr:!vmr-ignore)**/*
  • :(exclude,glob,attr:!vmr-keep)**/*.dll
  • :(exclude,glob,attr:!vmr-keep)**/*.Dll
  • :(exclude,glob,attr:!vmr-keep)**/*.exe
  • :(exclude,glob,attr:!vmr-keep)**/*.pdb
  • :(exclude,glob,attr:!vmr-keep)**/*.mdb
  • :(exclude,glob,attr:!vmr-keep)**/*.zip
  • :(exclude,glob,attr:!vmr-keep)**/*.nupkg

@premun
Copy link
Member Author

premun commented Aug 8, 2022

@mmitche thoughts on the proposal above? Is this feasible?

@mmitche
Copy link
Member

mmitche commented Aug 8, 2022

How does the cloaking relate to git as it's currently implemented today? Is git used to do the cloaking or are we implementing it ourselves?

The gitattributes idea is interesting. It does look like .gitattributes allows for arbitrary attributes to be applied to a file. It is a little non-intuitive that the inclusion/exclusion metadata would be specified across multiple files, though I think well placed documentation and comments could avoid confusion there.

The other option might be to say that inclusions always filter off the full possible file set, while exclusions remove off the current keep set, then use a list of globs to generate the patterns. So the equivalent would be:

[
  {
    "type" : "include",
    "pattern" : "**/*"
  },
  {
    "type" : "exclude",
    "pattern" : "**/*.dll"
  },
  {
    "type" : "exclude",
    "pattern" : "**/*.pdb"
  },
  {
    "type" : "exclude",
    "pattern" : "**/*.Dll"
  },
  {
    "type" : "include",
    "pattern" : "src/coreclr/**/*.pdb"
  }
]

@mmitche
Copy link
Member

mmitche commented Aug 8, 2022

Question

If we remove things such as .dll from the code like we already do for the tarball (https://github.com/dotnet/installer/blob/8f639696e6d57fb09e03e89c6397d913de1231ed/src/SourceBuild/Arcade/tools/SourceBuildArcadeTarball.targets#L246-L265), we might break repo's ability to run tests in the VMR. Should we then really do it or just keep this step?

Yeah, that might be an issue. It also sounds like another issue we might run into; what if we find files that are required for a build of one platform but are prohibited under another? Basically, required in one scenario but disallowed in another.

In this case, it a dll is required for testing, but disallowed for Linux source build. Maybe this won't happen, maybe it will. Similar cases might be things like:

  • Binary required for testing but needs to be removed for Linux source build. Can it live in the VMR and be filtered later? Maybe instead we just say that X Y and Z tests won't be able to run in the VMR?
  • Source file required for Windows build has a license that does not allow its inclusion in Linux source build.

@MichaelSimons Do you think these scenarios may show up? Is it required that the repository that we hand to our partners already be filtered?

/cc @richlander

@richlander
Copy link
Member

I'm likely missing something basic. I was thinking that cloaking would be oriented on source not binaries. Can you describe the flow, both leading up to and including the cloaking?

@premun
Copy link
Member Author

premun commented Aug 9, 2022

@richlander looking at how we generate the tarball today, it seems like there are 2 "fake" PDB files in dotnet/runtime that have to make it to the VMR for runtime to build.
Example: https://github.com/dotnet/runtime/blob/main/src/coreclr/.nuget/_.pdb (just an empty file).
Using MSBuild, it's easy to list all PDBs and then handpick these two out of there, but not as much with the git setup we have.
It's only two files and their empty, so unlikely to change and we could just copy them to the VMR manually but I feel like we should still be able to have this kind of flexibility built-in in the VMR sync as more cases can come easily and we could avoid problems.

The other question about breaking repo tests without checked-in binaries is connected to things like unit tests that rely on some mock artifacts, e.g. Arcade SDK has targets for manipulating nupkgs and so the unit test projects have some nupkgs checked in which serve as test subjects.
Example: https://github.com/dotnet/arcade/blob/main/src/Microsoft.DotNet.Build.Tasks.Feed.Tests/TestInputs/Nupkgs/test-package-a.1.0.0.nupkg

@mmitche
How does the cloaking relate to git as it's currently implemented today? Is git used to do the cloaking or are we implementing it ourselves?

At the moment, we are actually super lucky, things are as simple as they can possibly be. We just build one git diff command and we call one git apply command and that's it. We are able to take the usual globs and create filters:
https://github.com/dotnet/arcade-services/blob/fbaae548a629b404baef3c8c9683a595dce9241f/src/Microsoft.DotNet.Darc/src/DarcLib/VirtualMonoRepo/VmrManager.cs#L356-L376

Example command would be:

git -C "./runtime/.git" diff --patch --binary --output "./runtime.patch" \
4b825dc642cb6eb9a060e54bf8d69288fbee4904..7320d777078a9e73874308f30800eacbeb459368 -- \
    ':(glob,attr:!vmr-ignore)**/*'             \ # add all files (there could be several of these)
    ':(exclude,glob,attr:!vmr-keep)**/*.dll'   \ # substract using the exclude magic keyword
    ':(exclude,glob,attr:!vmr-keep)**/*.Dll'   \ # this example already includes the .gitattributes idea from above
    ':(exclude,glob,attr:!vmr-keep)**/*.exe'   \
    ':(exclude,glob,attr:!vmr-keep)**/*.pdb'   \
    ':(exclude,glob,attr:!vmr-keep)**/*.mdb'   \
    ':(exclude,glob,attr:!vmr-keep)**/*.zip'   \
    ':(exclude,glob,attr:!vmr-keep)**/*.nupkg'

What git does is it adds all pathspecs into a set, then substracts all exclude pathspecs from that set. So no layering there. It creates a pure regular diff patch that we then apply to the right path.

I like the proposal with an explicit ordered list of include/exclude operations but I don't see how to apply it on the git operations.
Please note that this way git handles automatically files that were for example created in an excluded folder and after N commits were moved into an included one (and vice versa).

To be honest, I kind of like the idea of having the final say in the repo as the repo understands this concern well but it also leaks the knowledge about the existence of the VMR into the repo (though I believe we will not completely avoid that in the future anyway).

The above seems to work though I need a bit more time as I have some problems with git and applying this to a large set of files (it works fine on smaller sets, hopefully this is not a git bug).

@mmitche
Copy link
Member

mmitche commented Aug 9, 2022

To be honest, I kind of like the idea of having the final say in the repo as the repo understands this concern well but it also leaks the knowledge about the existence of the VMR into the repo (though I believe we will not completely avoid that in the future anyway).

I think we're likely to have some info about "last synced sha" or other such info in the individual repos.

The above seems to work though I need a bit more time as I have some problems with git and applying this to a large set of files (it works fine on smaller sets, hopefully this is not a git bug).

Alright, given that this is simple, native git functionality, I think using gitattributes is a reasonable way to go until we know otherwise. If you start getting into other corner cases, it may be worth looking in a different direction.

@premun
Copy link
Member Author

premun commented Aug 10, 2022

Alright, I made it work natively, without corner cases, seems to do just what we need.

What about the attribute names? We need two attributes - positive and negative, e.g. "ignore" and "preserve". They don't necessarily need to be "vmr" but can be something in the sense of "part of product".

It can also be one attribute and have different values (which might be better to make it clear only one is allowed). Example:

src/coreclr/**/*.pdb                              vmr=preserve
src/native/external/llvm-libunwind/LICENSE.TXT    vmr=ignore

EDIT: The diff filter when creating patches cannot do "attribute is not equal" so we cannot do !vmr=preserve or !vmr!=preserve but would have to do !vmr-preserve only unfortunately.

@premun
Copy link
Member Author

premun commented Aug 10, 2022

@mmitche I wonder if this can eventually be a solution for the platform-specific files?

E.g.

platform-specific=osx,linux

It could then be quite easy to synchronize these files between the actual VMRs / branches so that for example downstream RedHat would have a simple way to pull upstream minus win/osx files? (just a quick idea)

It's also possible to create filters (sets of attributes) and assign those:

[filter "win"]
	vmr-preserve
	platform-specific-win

*.cs	filter=win

@mmitche
Copy link
Member

mmitche commented Aug 10, 2022

@mmitche I wonder if this can eventually be a solution for the platform-specific files?

E.g.

platform-specific=osx,linux

It could then be quite easy to synchronize these files between the actual VMRs / branches so that for example downstream RedHat would have a simple way to pull upstream minus win/osx files? (just a quick idea)

It's also possible to create filters (sets of attributes) and assign those:

[filter "win"]
	vmr-preserve
	platform-specific-win

*.cs	filter=win

That's a possibility, but I I'd like to avoid different VMR content per platform if possible, until we absolutely need to.

premun added a commit to dotnet/arcade-services that referenced this issue Aug 11, 2022
This enables the individual repos to have a final say in which files are included/excluded from the VMR by setting git attributes on files named `vmr-preserve` and `vmr-ignore`.

As an example, we are generally removing all `.pdb` files, but there is a file `runtime/src/coreclr/.nuget/_.pdb` which is required. Instead of us having to tiptoe around this file in the `source-mapping.json` file which would get unmanageable pretty fast, we can add a following file into `dotnet/runtime`:

**dotnet/runtime** > **src/coreclr/.nuget/.gitattributes**
```
# we could also use globs such as **/*.pdb
_.pdb    vmr-preserve
```

This file would then not get ignored by the `**/*.pdb` generic filter we set for all repositories.

dotnet/arcade#10263
@premun
Copy link
Member Author

premun commented Oct 12, 2022

We have reach file parity between the tarball and the VMR after dotnet/installer#14706 so closing this

@premun premun closed this as completed Oct 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants