Skip to content

Limit time to live for cache unavailability during ci command#201

Closed
dg0yt wants to merge 1 commit intomicrosoft:mainfrom
dg0yt:cache-ttl
Closed

Limit time to live for cache unavailability during ci command#201
dg0yt wants to merge 1 commit intomicrosoft:mainfrom
dg0yt:cache-ttl

Conversation

@dg0yt
Copy link
Contributor

@dg0yt dg0yt commented Sep 28, 2021

Picking up an issue and an idea I discussed in a previous conversation with @BillyONeal:

  • PR CI builds check cache availability only at the beginning of the ci command.
  • After significant changes to vcpkg repo master, many ports are not available in the binary cache.
  • PR CI runs scheduled within a short time frame after master update will thus identify the same (unrelated) ports as not available.
  • However, this information is outdated when a CI run actually reaches the point of installing a particular port (minutes or even hours later) because other PRs may have created the cache artifact in the meantime. If the cache artifact isn't used, CI wastes rare build resources in peak times.

Attaching time-to-live to the binary cache used during CI may allow rechecking availability without overloading the cache provider.
It is not a perfect solution. Without randomization in build order, parallel builds may run into the same set of unavailable ports with lonbg build times.

This PR compiles, but wasn't tested or verified so far.

@BillyONeal
Copy link
Member

BillyONeal commented Sep 28, 2021

I think this is unlikely to change much about how the pool operates:

  1. We had this problem before the binary caching scheme tried to remember that things were unavailable
  2. try_restore has no effect on blob storage:
    return RestoreResult::unavailable;
  3. It is unlikely that the availability of a port changes during a prbuild because all the builders are building the same things at the same time. I know once upon a time we tried to address this by randomizing parts of the graph.

We do think this is a problem we would like to do better at; introducing randomized order may help but truly solving it is going to need more cross-node communication. (Or some more creative scheme like looking at what ports are edited with git rather than completing the whole ABI list)

@dg0yt
Copy link
Contributor Author

dg0yt commented Sep 29, 2021

3. It is unlikely that the availability of a port changes during a prbuild because all the builders are building the same things at the same time.

Yes, agreed. That's my "same set of unavailable ports".

Focusing on vcpkg PRs, the first step should be to reduce the set of packages to build.
ATM ci rebuilds all packages which don't have a matching cache entry, including unrelated changes from master.
There should be a step which removes the packages from the build which have identical ABI hashes in the master parent of the merge commit (i.e. HEAD~1), unless needed as a dependency: These packages are unrelated to the PR.

@dg0yt
Copy link
Contributor Author

dg0yt commented Sep 29, 2021

Focusing on vcpkg PRs, the first step should be to reduce the set of packages to build.
ATM ci rebuilds all packages which don't have a matching cache entry, including unrelated changes from master.
There should be a step which removes the packages from the build which have identical ABI hashes in the master parent of the merge commit (i.e. HEAD~1), unless needed as a dependency: These packages are unrelated to the PR.

Local simulation (cache poorly filled):

TRIPLET=x64-osx
REF=042e1db92d115819bba6bffd681a174543111139
git checkout $REF~1
./vcpkg ci --dry-run $TRIPLET | grep ":$TRIPLET: \*" | sed -e 's,^.* : \([a-f0-9]*\)$,/\1/ d,' > parent-changes
wc -l parent-changes
git checkout $REF
./vcpkg ci --dry-run $TRIPLET | grep ":$TRIPLET: \*" > head-changes
wc -l head-changes
cat head-changes | sed -f parent-changes | sed -e 's,^ *\([^:]*\).*,\1,' > install-list
wc -l install-list
./vcpkg install --dry-run @install-list | grep ' ->' > actually-installed
wc -l actually-installed

with the following output:

Previous HEAD position was 042e1db92 [vcpkg_cmake_config_fixup] Fix variable name (#20380)
HEAD is now at 5d1839466 [redis_plus_plus] Update to 1.3.0 (#20384)
    1644 parent-changes
Previous HEAD position was 5d1839466 [redis_plus_plus] Update to 1.3.0 (#20384)
HEAD is now at 042e1db92 [vcpkg_cmake_config_fixup] Fix variable name (#20380)
    1629 head-changes
     545 install-list
     878 actually-installed

showing the number of packages going down from 1629 to 545 (affected by PR) / 878 (totally installed). As expected with dependencies, this is still much more than the number of packages actually using vcpkg_cmake_config_fixup:

$ grep -lR vcpkg_cmake_config_fixup ports | wc -l
     229

This would also increase the robustness of PR CI results with regard to errors in unrelated ports.

(Edit: Explicitly include the effect of vcpkg install @install-list)

@ras0219-msft
Copy link
Collaborator

You might find #167 useful, which will directly write out the ci check results to a json doc so you can use jq & friends

@dg0yt
Copy link
Contributor Author

dg0yt commented Sep 29, 2021

You might find #167 useful, which will directly write out the ci check results to a json doc so you can use jq & friends

Thanks. I saw that PR. Machine readable output is welcome. And I even looked at the jq intro. But is it installed on CI machines?

But will the idea work: skip "unrelated" hashes for PR CI?

@ras0219-msft
Copy link
Collaborator

I think the premise is good -- however for the actual implementation I think the best approach would be something like:

$ ./bootstrap-vcpkg
$ git checkout HEAD~1
$ ./vcpkg ci --output-hashes=parent_hashes.json --dry-run
$ git checkout HEAD@{1}
$ ./vcpkg ci --with-parent-hashes=parent_hashes.json

Then, the ci command can include the parent hashes in its delta calculation it's already performing w.r.t. the binary cache. We may want to modify dry-run to skip checking the binary cache in order to avoid hitting it twice.

@dg0yt
Copy link
Contributor Author

dg0yt commented Sep 29, 2021

We may want to modify dry-run to skip checking the binary cache in order to avoid hitting it twice.

I see. My previours script ended with an install command, so it would hit the cache even three times. But we can leave the cache checking entirely to the install command: We only need the hashes from the ci command.

TRIPLET=x64-osx             
REF=042e1db92d115819bba6bffd681a174543111139
git checkout $REF~1 -- .
./vcpkg ci --dry-run --no-binarycaching $TRIPLET | grep ":$TRIPLET: \*" | sed -e 's,^.* : \([a-f0-9]*\)$,/\1/ d,' > parent-hashes
wc -l parent-hashes
git checkout $REF -- .
./vcpkg ci --dry-run --no-binarycaching $TRIPLET | grep ":$TRIPLET: \*" > head-hashes
wc -l head-hashes
cat head-hashes | sed -f parent-hashes | sed -e 's,^ *\([^:]*\).*,\1,' > install-list
wc -l install-list
./vcpkg install --dry-run @install-list | grep ' ->' > actually-installed
wc -l actually-installed

Output:

    1667 parent-hashes
    1667 head-hashes
     560 install-list
     893 actually-installed

(This is a few ports more, just because I have them in cache which is now ignored for determining head-hashes.)

And now correct me if I'm wrong:
vcpkg install doesn't check the cache early but only before actually installing a package - meaning it can benefit from the cache being filled by parallel PR builds?

@dg0yt
Copy link
Contributor Author

dg0yt commented Oct 3, 2021

Closing. The parent-hashes idea is submitted in #210 now.

@dg0yt dg0yt closed this Oct 3, 2021
@dg0yt dg0yt deleted the cache-ttl branch May 18, 2025 11:05
@dg0yt dg0yt restored the cache-ttl branch May 18, 2025 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants