Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--distinct_host_configuration=false causes conflicting actions on bazel test --test_arg #14848

Open
aherrmann opened this issue Feb 17, 2022 · 21 comments
Assignees
Labels
next_month Issues that we will review next month. This is currently mainly used by Configurability team not stale Issues or PRs that are inactive but not considered stale P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug

Comments

@aherrmann
Copy link
Contributor

Description of the problem:

Enabling --distinct_host_configuration=false causes tests to fail to build when --test_arg is set.
Specifically, a bazel test --test_arg=... following a bazel test without --test_arg set fails in this manner.
The test succeeds if both commands do not set --test_arg or if both set the same --test_arg.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Make sure to have bazelisk installed and run the following script in an empty directory.

echo USE_BAZEL_VERSION=5.0.0 >.bazeliskrc
touch WORKSPACE
cat >BUILD.bazel <<EOF
sh_test(
    name = "test",
    srcs = ["test.sh"],
)
EOF
touch test.sh
chmod +x test.sh
cat >.bazelrc <<EOF
test --nocache_test_results
EOF

echo SUCCEEDS
bazel test //:test; bazel test //:test --test_arg=foo

cat >>.bazelrc <<EOF
build --distinct_host_configuration=false
EOF

echo FAILS
bazel test //:test; bazel test //:test --test_arg=foo

Note, this seems to be a regression. Using Bazel version 4.2.2 instead of 5.0.0 does not trigger this issue.

What operating system are you running Bazel on?

Ubuntu 21.04

What's the output of bazel info release?

release 5.0.0

Any other information, logs, or outputs that you want to share?

This issue was discovered here and then stripped down to a minimal repro.

@meteorcloudy meteorcloudy added P2 We'll consider working on this in future. (Assignee optional) team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug and removed P2 We'll consider working on this in future. (Assignee optional) labels Feb 21, 2022
@meteorcloudy
Copy link
Member

#14294 might be related.

/cc @sdtwigg Please confirm if it's the same issue and what should the priority be.

@gregestren gregestren self-assigned this Mar 25, 2022
@gregestren
Copy link
Contributor

Hi @aherrmann .

I hear your point. Addressing this is a bit nuanced since we're basically deprecating --distinct_host_configuration=false: see 78d0fc9.

For that reason I'm not going to prioritize this issue. But I'd love to hear what value --distinct_host_configuration=false serves for you. If the impact is severe enough this can always be a candidate for a patch fix.

For what it's worth, going forward we'd like to provide other mechanisms to offer the benefits --distinct_host_configuration=false offered. I acknowledge that's just a goal now and we don't have designs or implementations to fulfill that. But we're not necessarily intending to just abandon whatever value --distinct_host_configuration=false offered.

@gregestren gregestren added the P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) label Mar 25, 2022
@meteorcloudy
Copy link
Member

As far as I know, TensorFlow also uses --distinct_host_configuration=false in some of their builds to speed up build time.

See https://cs.opensource.google/tensorflow/tensorflow/+/master:.bazelrc;l=361-364

@aherrmann
Copy link
Contributor Author

aherrmann commented Apr 7, 2022

But I'd love to hear what value --distinct_host_configuration=false serves for you. If the impact is severe enough this can always be a candidate for a patch fix.

@gregestren It's to avoid duplicate builds of the same targets to reduce build times. The use-case is the daml repository. One of the main components in this repository is a compiler which is both built for target to bundle in a release artifact as well as for host (exec really) to compile Daml code (Daml stdlib and integration tests and the like). So, in that use-case, without --distinct_host_configuration=false we would always build a large part of the repository twice, which seems like a waste of resources.

@meteorcloudy
Copy link
Member

meteorcloudy commented Apr 7, 2022

@aherrmann That's a very similar use case as TensorFlow. Not sure if we can make Bazel share action cache between host and exec config when the configuration are essentially the same?

@aherrmann
Copy link
Contributor Author

@meteorcloudy Thanks for the reference, that's good to know.

Not sure if we can make Bazel share action cache between host and exec config when the configuration are essentially the same?

This sounds related to the wider issue in #13281. I'm not sure if this is possible in this case. If it were, then, yes, it may well avoid the duplicate builds.

@bhack
Copy link

bhack commented Jun 7, 2022

I got the same error in Tensorflow in these days:
tensorflow/tensorflow@0f9af91

It's really very sneaky

@meteorcloudy
Copy link
Member

I think 78d0fc9 is going to be a big performance regression for TensorFlow if they switch to Bazel 6.0 since it'll essentially cause most things to be built twice

@mihaimaruseac
Copy link

CC @learning-to-play, as this might be needed to be an FYI ahead of time

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this issue Jul 14, 2022
This didn't matter because we have --distinct_host_configuration=true enabled for Windows, but this flag is an no-op with Bazel@HEAD, see bazelbuild/bazel#14848 (comment)

Fixes #56755

PiperOrigin-RevId: 460956334
@gregestren
Copy link
Contributor

@meteorcloudy acknowledged.

What's the best way for us to evaluate the concern? Can anyone else help frame the problem?

I don't know what the right tool is in our arsenal to address this (there could be a few). But I'm certainly not interested in regressions a major project.

@keithkml
Copy link

In case this is helpful as a data point, we found that 18% of a typical build at Stripe was targets being built a second time in host configuration (before/after being built in target configuration).

@meteorcloudy
Copy link
Member

@gregestren The problem is when the host and target platforms are the same, TensorFlow uses --distinct_host_configuration=false to avoid recompiling 80% of the entire build for host configuration, but after 78d0fc9 this no longer works. So the question is basically: is there a new way to tell Bazel host and target platform can use the same toolchain and cache can also be shared?

@fmeum
Copy link
Collaborator

fmeum commented Jul 25, 2022

Since the exec configuration uses both different flag defaults (e.g. compilation mode and cflags) and a different output directory, it differs substantially from the target configuration. I thus wouldn't expect future solutions to #13281 and related issues to have any impact on the problem TensorFlow is facing.

So the question is basically: is there a new way to tell Bazel host and target platform can use the same toolchain and cache can also be shared?

There is a code path that makes the exec configuration behave like this, but I don't know whether or how it can be triggered. On https://github.com/fmeum/bazel/tree/distinct-exec-configuration, I added a --distinct_exec_configuration flag that explicitly turns the exec transition into a no-op if set to false. If that helps TensorFlow, it would at least be a very simple change to make.

@aiuto
Copy link
Contributor

aiuto commented Jul 25, 2022

One of the problems here is that tensorflow makes the assumption that the host platform is the target platform.
The .bazelrc file magically does that here:
https://github.com/tensorflow/tensorflow/blob/de2e1fe1d809aa91ba5910f0e563b3cd47990ada/.bazelrc#L128
and then in configs, like this;
https://github.com/tensorflow/tensorflow/blob/de2e1fe1d809aa91ba5910f0e563b3cd47990ada/.bazelrc#L295

This is a mistaken use of the option --enable_platform_specific_config. It was intended to change the things that might vary for a user based on which of several platforms they build for. For example, if they need different URLs for RBE based on the OS they are invoking from. The idiomatic bazel usage is to declare each platform, e.g.

build:win <something>
build:macos <something else>

and explicitly build with --config=win or --config=macos.

@mihaimaruseac
Copy link

Oh, so that's what that option does (I still have a TODO to document/remove it). I guess the path forward is to remove and move to build:<os> options as needed

@learning-to-play
Copy link

@MichaelHudgins
Copy link

MichaelHudgins commented Jul 26, 2022

Interesting, I am already doing some tests to dial in TensorFlow's RBE usage. I will throw in some tests to see the full impact of distinct_host_configuration as well removing enable_platform_specific_config

@MichaelHudgins
Copy link

Doubling back on this before i forget, the removal of the distinct_host_configuration flag brought the total number of actions on a TF presubmit from 52112 to 72151 . On an RBE linux build it didn't result in a significant increase in execution time but I did not have the time to verify that with other configs (Notably ones without RBE).

@meteorcloudy
Copy link
Member

There is a code path that makes the exec configuration behave like this, but I don't know whether or how it can be triggered. On https://github.com/fmeum/bazel/tree/distinct-exec-configuration, I added a --distinct_exec_configuration flag that explicitly turns the exec transition into a no-op if set to false. If that helps TensorFlow, it would at least be a very simple change to make.

@fmeum Thanks for pointing out that! I think that sounds like the solution.
@gregestren Do you think adding such a flag is acceptable?

@aiuto
Copy link
Contributor

aiuto commented Aug 22, 2022

@sdtwigg is the expert on this.

@gregestren gregestren added the next_month Issues that we will review next month. This is currently mainly used by Configurability team label Nov 4, 2022
fhanau added a commit to cloudflare/workerd that referenced this issue Jun 13, 2023
Compile substantially all of workerd in the bazel 'target' configuration
instead of 'exec', which allows us to avoid compiling major components
twice.
This should offer a large performance improvement, especially when
rebuilding after a V8 update or building all targets and drive down the
worst-case CI build time significantly.
Bazel supports an exec configuration for building code that needs to run
on the host machine during the build, e.g. tools used to generate source
files or type definitions. When not cross-compiling exec and target will
be the same, but bazel does not offer a flag to use just one configuration
when the configurations are identical (bazelbuild/bazel#14848),
making manual changes necessary.
Bazel appears to offer no way to detect if exec and host are the same, so
this feature is implemented through an opt-in command line flag enabled in
.bazelrc, which makes it possible to easily disable the change when
cross-compilation support is needed.
fhanau added a commit to cloudflare/workerd that referenced this issue Jun 13, 2023
Compile substantially all of workerd in the bazel 'target' configuration
instead of 'exec', which allows us to avoid compiling major components
twice.
This should offer a large performance improvement, especially when
rebuilding after a V8 update or building all targets and drive down the
worst-case CI build time significantly.
Bazel supports an exec configuration for building code that needs to run
on the host machine during the build, e.g. tools used to generate source
files or type definitions. When not cross-compiling exec and target will
be the same, but bazel does not offer a flag to use just one configuration
when the configurations are identical (bazelbuild/bazel#14848),
making manual changes necessary.
Bazel appears to offer no way to detect if exec and host are the same, so
this feature is implemented through an opt-in command line flag enabled in
.bazelrc, which makes it possible to easily disable the change when
cross-compilation support is needed.
fhanau added a commit to cloudflare/workerd that referenced this issue Jun 13, 2023
Compile substantially all of workerd in the bazel 'target' configuration
instead of 'exec', which allows us to avoid compiling major components
twice.
This should offer a large performance improvement, especially when
rebuilding after a V8 update or building all targets and drive down the
worst-case CI build time significantly.
Bazel supports an exec configuration for building code that needs to run
on the host machine during the build, e.g. tools used to generate source
files or type definitions. When not cross-compiling exec and target will
be the same, but bazel does not offer a flag to use just one configuration
when the configurations are identical (bazelbuild/bazel#14848),
making manual changes necessary.
Bazel appears to offer no way to detect if exec and host are the same, so
this feature is implemented through an opt-in command line flag enabled in
.bazelrc, which makes it possible to easily disable the change when
cross-compilation support is needed.
fhanau added a commit to cloudflare/workerd that referenced this issue Jun 13, 2023
Compile substantially all of workerd in the bazel 'target' configuration
instead of 'exec', which allows us to avoid compiling major components
twice.
This should offer a large performance improvement, especially when
rebuilding after a V8 update or building all targets and drive down the
worst-case CI build time significantly.
Bazel supports an exec configuration for building code that needs to run
on the host machine during the build, e.g. tools used to generate source
files or type definitions. When not cross-compiling exec and target will
be the same, but bazel does not offer a flag to use just one configuration
when the configurations are identical (bazelbuild/bazel#14848),
making manual changes necessary.
Bazel appears to offer no way to detect if exec and host are the same, so
this feature is implemented through an opt-in command line flag enabled in
.bazelrc, which makes it possible to easily disable the change when
cross-compilation support is needed.
fhanau added a commit to cloudflare/workerd that referenced this issue Jun 13, 2023
Compile substantially all of workerd in the bazel 'target' configuration
instead of 'exec', which allows us to avoid compiling major components
twice.
This should offer a large performance improvement, especially when
rebuilding after a V8 update or building all targets and drive down the
worst-case CI build time significantly.
Bazel supports an exec configuration for building code that needs to run
on the host machine during the build, e.g. tools used to generate source
files or type definitions. When not cross-compiling exec and target will
be the same, but bazel does not offer a flag to use just one configuration
when the configurations are identical (bazelbuild/bazel#14848),
making manual changes necessary.
Bazel appears to offer no way to detect if exec and host are the same, so
this feature is implemented through an opt-in command line flag enabled in
.bazelrc, which makes it possible to easily disable the change when
cross-compilation support is needed.
fhanau added a commit to cloudflare/workerd that referenced this issue Jun 14, 2023
Compile substantially all of workerd in the bazel 'target' configuration
instead of 'exec', which allows us to avoid compiling major components
twice.
This should offer a large performance improvement, especially when
rebuilding after a V8 update or building all targets and drive down the
worst-case CI build time significantly.
Bazel supports an exec configuration for building code that needs to run
on the host machine during the build, e.g. tools used to generate source
files or type definitions. When not cross-compiling exec and target will
be the same, but bazel does not offer a flag to use just one configuration
when the configurations are identical (bazelbuild/bazel#14848),
making manual changes necessary.
Bazel makes it very difficult to set the config for a target at build
time, so default to using cfg "target" globally for now.
fhanau added a commit to cloudflare/workerd that referenced this issue Jun 15, 2023
Compile substantially all of workerd in the bazel 'target' configuration
instead of 'exec', which allows us to avoid compiling major components
twice.
This should offer a large performance improvement, especially when
rebuilding after a V8 update or building all targets and drive down the
worst-case CI build time significantly.
Bazel supports an exec configuration for building code that needs to run
on the host machine during the build, e.g. tools used to generate source
files or type definitions. When not cross-compiling exec and target will
be the same, but bazel does not offer a flag to use just one configuration
when the configurations are identical (bazelbuild/bazel#14848),
making manual changes necessary.
Bazel makes it very difficult to set the config for a target at build
time, so default to using cfg "target" globally for now.
Copy link

github-actions bot commented Jan 9, 2024

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Jan 9, 2024
@meteorcloudy meteorcloudy added not stale Issues or PRs that are inactive but not considered stale and removed stale Issues or PRs that are stale (no activity for 30 days) labels Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
next_month Issues that we will review next month. This is currently mainly used by Configurability team not stale Issues or PRs that are inactive but not considered stale P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug
Projects
None yet
Development

No branches or pull requests

10 participants