Merkle tree cache doesn't cache tree artifacts #17923

Closed
coeuvre opened this issue Mar 30, 2023 · 0 comments
Labels: P1, team-Remote-Exec, type: bug

coeuvre commented Mar 30, 2023

Description of the bug:

When --experimental_remote_merkle_tree_cache is set, Bazel caches the non-leaf nodes of the NestedSet while building the Merkle tree. However, tree artifacts are leaf nodes in the NestedSet, so they are never cached. In some builds a single tree artifact can contain hundreds of thousands of files, so caching tree artifacts could improve performance.
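To make the problem concrete, here is a minimal Python model of the behavior described above. It is a sketch under stated assumptions, not Bazel's Java implementation: the class names `File`, `TreeArtifact`, `NestedSet`, and `Expander` are hypothetical, and the cache is keyed only by non-leaf nested set nodes, as the description says.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class File:
    path: str


@dataclass(frozen=True)
class TreeArtifact:
    root: str
    children: Tuple[str, ...]  # relative paths of the files inside the tree


@dataclass(frozen=True)
class NestedSet:
    name: str
    direct: Tuple[Union[File, TreeArtifact], ...] = ()  # leaf members
    transitive: Tuple["NestedSet", ...] = ()            # non-leaf members


class Expander:
    """Hypothetical model: only non-leaf nodes get a cache entry."""

    def __init__(self) -> None:
        self.cache: Dict[NestedSet, Tuple[str, ...]] = {}
        self.tree_expansions = 0  # how often a tree artifact was walked

    def expand_leaf(self, leaf: Union[File, TreeArtifact]) -> Tuple[str, ...]:
        if isinstance(leaf, TreeArtifact):
            # Leaves are never cached, so every caller walks the whole tree.
            self.tree_expansions += 1
            return tuple(f"{leaf.root}/{child}" for child in leaf.children)
        return (leaf.path,)

    def expand(self, node: NestedSet) -> Tuple[str, ...]:
        if node in self.cache:
            return self.cache[node]
        paths = []
        for leaf in node.direct:
            paths.extend(self.expand_leaf(leaf))
        for child in node.transitive:
            paths.extend(self.expand(child))
        result = tuple(paths)
        self.cache[node] = result
        return result


def demo() -> int:
    """Two actions share one tree but differ in one file each."""
    tree = TreeArtifact("out/gen", ("a", "b", "c"))
    expander = Expander()
    expander.expand(NestedSet("action1", direct=(File("x1"), tree)))
    expander.expand(NestedSet("action2", direct=(File("x2"), tree)))
    return expander.tree_expansions
```

In this model, `demo()` walks the shared tree once per action, because the two enclosing nested sets are different cache keys and the tree itself has no entry of its own.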

Related #17804.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No response

Which operating system are you running Bazel on?

No response

What is the output of bazel info release?

No response

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

coeuvre added the team-Remote-Exec and P1 labels and removed the untriaged label Mar 30, 2023
coeuvre assigned coeuvre and tjgq and unassigned sgowroji and ShreeM01 Mar 30, 2023
tjgq added a commit to tjgq/bazel that referenced this issue Mar 30, 2023
Currently, a large tree artifact cannot benefit from the Merkle tree cache if
it always appears on a nested set together with other (unique per-action)
files.

To improve this, modify SpawnInputExpander to treat the tree as a distinct
node in the input hierarchy that can be cached separately.

Also simplify the cache keys for filesets and runfiles, since the
SpawnInputExpander is a per-build singleton, and this cache is only shared by
actions within a single build.

Fixes bazelbuild#17923.
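The idea in this commit message, giving the tree its own cacheable node, can be sketched in Python as follows. This is an illustrative model only: `MerkleBuilder`, `tree_digest`, and `action_digest` are hypothetical names, and the digests are computed over paths rather than over file contents and directory messages as a real Merkle tree would be.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class TreeArtifact:
    root: str
    children: Tuple[str, ...]  # relative paths of the files inside the tree


class MerkleBuilder:
    """Hypothetical model: each tree artifact is cached as its own node."""

    def __init__(self) -> None:
        self.tree_cache: Dict[TreeArtifact, str] = {}
        self.tree_builds = 0  # how often a tree subtree was actually built

    def tree_digest(self, tree: TreeArtifact) -> str:
        cached = self.tree_cache.get(tree)
        if cached is not None:
            return cached
        self.tree_builds += 1
        h = hashlib.sha256()
        for child in sorted(tree.children):
            h.update(f"{tree.root}/{child}\n".encode())
        digest = h.hexdigest()
        self.tree_cache[tree] = digest
        return digest

    def action_digest(self, files: Iterable[str],
                      trees: Iterable[TreeArtifact]) -> str:
        # Per-action files are hashed directly; each tree contributes only
        # the digest of its separately cached subtree.
        h = hashlib.sha256()
        for path in sorted(files):
            h.update(f"file:{path}\n".encode())
        for tree in sorted(trees, key=lambda t: t.root):
            h.update(f"tree:{self.tree_digest(tree)}\n".encode())
        return h.hexdigest()
```

With this structure, two actions that share a large tree but differ in their other inputs still produce different action digests, while the tree subtree is built only once.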
tjgq added a commit to tjgq/bazel that referenced this issue Apr 4, 2023
Progress on bazelbuild#17923.
copybara-service bot pushed a commit that referenced this issue Apr 5, 2023
Progress on #17923.

Closes #17929.

PiperOrigin-RevId: 522039585
Change-Id: Ia4f2603325acfd4400239894214f2884a71d69cf
tjgq added a commit to tjgq/bazel that referenced this issue Apr 5, 2023
Currently, it's possible for concurrent actions to end up computing the same
Merkle tree, even when the cache is enabled. This change makes it so that a
later action waits for the completion of the computation started by an earlier
action.

Progress on bazelbuild#17923.
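The waiting behavior described in this commit message can be sketched in Python with a map of in-flight futures. This is a minimal sketch, not the code from the commit: `InFlightCache` and its `get` method are hypothetical names, and the policy of forgetting a failed computation so that a later caller can retry is an assumption.

```python
import threading
from concurrent.futures import Future
from typing import Any, Callable, Dict, Hashable


class InFlightCache:
    """Hypothetical cache in which the first caller for a key computes the
    value and concurrent callers for the same key wait for that result."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._futures: Dict[Hashable, Future] = {}

    def get(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        with self._lock:
            future = self._futures.get(key)
            owner = future is None
            if owner:
                # Register the future before computing, so that later
                # callers find it and wait instead of recomputing.
                future = Future()
                self._futures[key] = future
        if owner:
            try:
                future.set_result(compute())
            except BaseException as exc:
                # Assumption: drop failed entries so later callers retry.
                with self._lock:
                    del self._futures[key]
                future.set_exception(exc)
        # Blocks until the owner finishes; re-raises the owner's exception.
        return future.result()
```

The computation runs outside the lock, so callers asking for unrelated keys are not blocked while a large tree is being processed.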
ShreeM01 pushed a commit to ShreeM01/bazel that referenced this issue Apr 5, 2023
Progress on bazelbuild#17923.

Closes bazelbuild#17929.

PiperOrigin-RevId: 522039585
Change-Id: Ia4f2603325acfd4400239894214f2884a71d69cf
tjgq added a commit to tjgq/bazel that referenced this issue Apr 6, 2023
Progress on bazelbuild#17923.
copybara-service bot pushed a commit that referenced this issue Apr 6, 2023
Progress on #17923.

Closes #17995.

PiperOrigin-RevId: 522319291
Change-Id: I68ab952ed6357027ec71a67a104f91a684a7a040
ShreeM01 added a commit that referenced this issue Apr 6, 2023
Progress on #17923.

Closes #17929.

PiperOrigin-RevId: 522039585
Change-Id: Ia4f2603325acfd4400239894214f2884a71d69cf

Co-authored-by: Tiago Quelhas <[email protected]>
ShreeM01 pushed a commit to ShreeM01/bazel that referenced this issue Apr 6, 2023
Progress on bazelbuild#17923.

Closes bazelbuild#17995.

PiperOrigin-RevId: 522319291
Change-Id: I68ab952ed6357027ec71a67a104f91a684a7a040
ShreeM01 added a commit that referenced this issue Apr 11, 2023
Progress on #17923.

Closes #17995.

PiperOrigin-RevId: 522319291
Change-Id: I68ab952ed6357027ec71a67a104f91a684a7a040

Co-authored-by: Tiago Quelhas <[email protected]>
tjgq closed this as completed Apr 13, 2023
fweikert pushed a commit to fweikert/bazel that referenced this issue May 25, 2023
Progress on bazelbuild#17923.

Closes bazelbuild#17929.

PiperOrigin-RevId: 522039585
Change-Id: Ia4f2603325acfd4400239894214f2884a71d69cf
fweikert pushed a commit to fweikert/bazel that referenced this issue May 25, 2023
Progress on bazelbuild#17923.

Closes bazelbuild#17995.

PiperOrigin-RevId: 522319291
Change-Id: I68ab952ed6357027ec71a67a104f91a684a7a040

4 participants