Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ToplevelArtifactsDownloader to download toplevel artifacts #16524

Closed
wants to merge 5 commits into from

Conversation

coeuvre
Copy link
Member

@coeuvre coeuvre commented Oct 21, 2022

Previously, with --remote_download_toplevel, Bazel only downloads toplevel outputs during spawn execution. It has a drawback that, if the toplevel targets are changed in a following invocation, but the generating actions are not able to be executed because of skyframe/action cache, the outputs of new toplevel targets are not downloaded.

ToplevelArtifactsDownloader fixes that issue by listening to the TargetCompleteEvent (which is fired every time after the toplevel target is built event if it hit the cache) and download outputs in the background. Additionally, it can listen to more events during the build hence is more flexible to define additional outputs to be downloaded as toplevel outputs.

Working towards #12665.

Fixes #13625.
Fixes #11834.
Fixes #10525.

@coeuvre coeuvre force-pushed the use-toplevel-downloader branch 3 times, most recently from d4953da to 9f0bdf2 Compare October 21, 2022 11:49
@coeuvre coeuvre marked this pull request as ready for review October 21, 2022 13:11
@coeuvre coeuvre requested a review from a team as a code owner October 21, 2022 13:11
@coeuvre coeuvre requested a review from tjgq October 21, 2022 13:11
@@ -221,12 +221,54 @@ EOF
--remote_download_toplevel \
//a:foobar >& $TEST_log || fail "Failed to build //a:foobar"

expect_log "2 processes: 1 remote cache hit, 1 internal"
expect_log "1 process: 1 internal"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so we won't report a fetch as a cache hit unless the action actually ran?

I guess this makes total sense under the guise of "separating action execution from downloading outputs", but now I'm wondering if we should add an additional "x files downloaded" stanza to the log message. Probably not in this PR, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so we won't report a fetch as a cache hit unless the action actually ran?

This is a little bit complex.

Before this change, we decide whether to download outputs of a spawn by checking whether any of its output is included in the set filesToDownload (the set is computed in the afterAnalysis callback in the RemoteModule based on analysis result). If we decided to download the outputs of the spawn, we download all the outputs but do not inject the metadata. It means we let skyframe to track the outputs on the local filesystem. So if we delete the output, skyframe will rerun the generating action, in which case, it will hit the remote cache and re-download the output.

With this change, we always inject metadata for outputs even if we decide to download them later. It means skyframe will use the metadata to track outputs. If we delete the local outputs, the metadata is still there, in which case, it will hit the skyframe cache.

@ShreeM01 ShreeM01 added team-Remote-Exec Issues and PRs for the Execution (Remote) team awaiting-review PR is awaiting review from an assigned reviewer labels Oct 21, 2022
copybara-service bot pushed a commit that referenced this pull request May 3, 2023
With #16524, we download toplevel artifacts by listening to `TargetCompleteEvent` so that it works for incremental builds (e.g. change toplevel targets, delete downloaded files, etc.).

However, BES sends `TargetComplete` event also by listening to `TargetCompleteEvent` which means at the time when the consumers of BEP received `TargetComplete` events, the outputs might not have been downloaded.

This CL changes the way we download toplevel artifacts from listening `TargetCompleteEvent` to checking whether an output is a toplevel artifact after action execution. It still works for incremental builds because of the RemoteOutputChecker we introduced in 60c9cf3. It doesn't work for BEP yet because the download still happens in the background. A following CL will fix that.

Note that this change makes RemoteOutputChecker depending on *FULL* analysis result in incremental builds because it needs that to determine whether an output is toplevel artifact during action dirtiness check. It's incompatible with skymeld. A way to fix it is described in the code.

PiperOrigin-RevId: 529026614
Change-Id: I6c0c9bf4b3f5bb076f404e46c00bc231789a2820
fweikert pushed a commit to fweikert/bazel that referenced this pull request May 25, 2023
With bazelbuild#16524, we download toplevel artifacts by listening to `TargetCompleteEvent` so that it works for incremental builds (e.g. change toplevel targets, delete downloaded files, etc.).

However, BES sends `TargetComplete` event also by listening to `TargetCompleteEvent` which means at the time when the consumers of BEP received `TargetComplete` events, the outputs might not have been downloaded.

This CL changes the way we download toplevel artifacts from listening `TargetCompleteEvent` to checking whether an output is a toplevel artifact after action execution. It still works for incremental builds because of the RemoteOutputChecker we introduced in bazelbuild@60c9cf3. It doesn't work for BEP yet because the download still happens in the background. A following CL will fix that.

Note that this change makes RemoteOutputChecker depending on *FULL* analysis result in incremental builds because it needs that to determine whether an output is toplevel artifact during action dirtiness check. It's incompatible with skymeld. A way to fix it is described in the code.

PiperOrigin-RevId: 529026614
Change-Id: I6c0c9bf4b3f5bb076f404e46c00bc231789a2820
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-review PR is awaiting review from an assigned reviewer team-Remote-Exec Issues and PRs for the Execution (Remote) team
Projects
None yet
3 participants