Skip to content

For QUICHE integration, add --experimental_remap_main_repo to bazelrc.#5767

Closed
wu-bin wants to merge 1 commit intoenvoyproxy:masterfrom
wu-bin:bazelrc2
Closed

For QUICHE integration, add --experimental_remap_main_repo to bazelrc.#5767
wu-bin wants to merge 1 commit intoenvoyproxy:masterfrom
wu-bin:bazelrc2

Conversation

@wu-bin
Copy link
Contributor

@wu-bin wu-bin commented Jan 30, 2019

Description:

To allow source/extensions/quic_listeners/quiche/platform:quic_platform_impl_lib to depend on envoy build rules, add --experimental_remap_main_repo to bazelrc.

For context, this flag is temporarily necessary in order for the Envoy QUICHE platform implementation to be able to depend on Envoy libraries, without incurring link-time errors. More explanation in this comment. Bazel team's plan for graduating the flag-protected behavior is described here.

Note that the Bazel fix is only available in the most recent Bazel release, 0.22.0, which was released yesterday. @htuch , are there timing considerations around when a PR can assume the most recent version of Bazel?

Risk Level: minimal: build only
Testing: Tested build with PR #5758
Docs Changes: none
Release Notes: none
[Optional Fixes #Issue]
[Optional Deprecated:]

@wu-bin
Copy link
Contributor Author

wu-bin commented Jan 30, 2019

@wu-bin
Copy link
Contributor Author

wu-bin commented Jan 30, 2019

#2557

@mpwarres
Copy link
Contributor

LGTM

/retest

@repokitteh-read-only
Copy link

🔨 rebuilding ci/circleci: asan (failed build)
🔨 rebuilding ci/circleci: tsan (failed build)

🐱

Caused by: a #5767 (comment) was created by @mpwarres.

see: more, trace.

@mpwarres
Copy link
Contributor

Although the error messages appear unrelated, asan and tsan do seem to fail consistently. Bin, could you try repro'ing locally, to see if/how this might be related to the added flag?

@htuch
Copy link
Member

htuch commented Jan 30, 2019

@wu-bin do we build QUICHE by default? I'd like to ensure that for the bulk of Envoy devs, they don't need this flag, even if not using this .bazelrc for a default build, that's a pretty bad dev experience.

@mpwarres
Copy link
Contributor

@htuch what do we need to do to get QUICHE to not be built by default? The doc for disabling extensions makes it sound like extensions are only built if listed in source/extensions/extensions_build_config.bzl, and currently the code under //{source,test}/extensions/quic_listeners/quiche/ is not listed there. That said, it does look like tests from //test/extensions/quic_listeners/quiche/ are run in presubmit checks, so I'm guessing there's something more that we need to be doing, to exclude QUICHE from normal builds?

@htuch
Copy link
Member

htuch commented Jan 30, 2019

@mpwarres good point. I don't think we have a good story for this beyond the heavy hammer of Bazel settings. Take a look at what we do for Google gRPC in https://github.com/envoyproxy/envoy/blob/master/test/common/grpc/BUILD#L63 and other places that use this envoy_select_google_grpc.

I'd suggest a similar approach, defaulting to off for regular builds, and then in CI you could turn it on perhaps. The downside is that things might fail in CI for some folks and not locally until they realize this.

We definitely don't want to make this flag required for regular builds, we've delayed PRs for weeks before to avoid this (see #5218 for example).

@wu-bin
Copy link
Contributor Author

wu-bin commented Jan 31, 2019

I've managed to reproduce the asan ci failure. It looks like a bug with the --experimental_remap_main_repo flag: The asan build was ran from the envoyproxy/envoy-filter-example repo, using command:
bazel_with_collection test ${BAZEL_TEST_OPTIONS} -c dbg --config=clang-asan @envoy//test/... //:echo2_integration_test //:envoy_binary_test

The bazelrc used by this repo is the same as envoy's, so the --experimental_remap_main_repo flag is enabled in the command. The command failed for envoy-filter-example's :envoy_binary_test target, because one of its dependency, '//source/common/common:generate_version_number', depends on envoy's "//:VERSION", but bazel seems to thinking it is envoy-filter-example's "//:VERSION". envoy-filter-example does not have "//:VERSION", hence the failure.

@mpwarres , @htuch : please suggest what should we do to proceed. Thanks!

@mpwarres
Copy link
Contributor

I've managed to reproduce the asan ci failure. [...] The command failed for envoy-filter-example's :envoy_binary_test target, because one of its dependency, '//source/common/common:generate_version_number', depends on envoy's "//:VERSION", but bazel seems to thinking it is envoy-filter-example's "//:VERSION". envoy-filter-example does not have "//:VERSION", hence the failure.

@mpwarres , @htuch : please suggest what should we do to proceed. Thanks!

IIUC, seems like this immediate issue could be addressed by having //source/common/common:generate_version_number depend on @envoy//:VERSION, which perhaps is the right thing to do anyways (irrespective of this PR), if the intent is to rely on envoy's :VERSION?

Past that, we still need to address @htuch's previous comment, though. @wu-bin, LMK if you want to deal with that in this PR, or would like me to address it in a separate one.

@wu-bin
Copy link
Contributor Author

wu-bin commented Jan 31, 2019

I've managed to reproduce the asan ci failure. [...] The command failed for envoy-filter-example's :envoy_binary_test target, because one of its dependency, '//source/common/common:generate_version_number', depends on envoy's "//:VERSION", but bazel seems to thinking it is envoy-filter-example's "//:VERSION". envoy-filter-example does not have "//:VERSION", hence the failure.
@mpwarres , @htuch : please suggest what should we do to proceed. Thanks!

IIUC, seems like this immediate issue could be addressed by having //source/common/common:generate_version_number depend on @envoy//:VERSION, which perhaps is the right thing to do anyways (irrespective of this PR), if the intent is to rely on envoy's :VERSION?

Since it's a within-repo reference(from generate_version_number to VERSION), I think it makes sense to not use @envoy, but it fails even if @envoy is added.

Past that, we still need to address @htuch's previous comment, though. @wu-bin, LMK if you want to deal with that in this PR, or would like me to address it in a separate one.

I just sent #5791 to default disable quic_platform_test. @htuch @mpwarres , can you take a look at it? Thanks!

@mpwarres
Copy link
Contributor

mpwarres commented Feb 1, 2019

Since it's a within-repo reference(from generate_version_number to VERSION), I think it makes sense to not use @envoy, but it fails even if @envoy is added.

Oh, I see, I had missed that //source/common/common:generate_version_number is a rule in Envoy and not in the filter example. I agree that seems like a bug in the Bazel fix. @htuch, you mentioned you have contacts on Bazel team, is there someone you'd recommend? Otherwise I can ask on bazel-discuss@. I'll also try to simplify into a smaller example.

@htuch
Copy link
Member

htuch commented Feb 1, 2019

@mpwarres @dslomov has been a fantastic point-of-contact on Bazel team and is familiar with the Envoy context.

@dslomov
Copy link

dslomov commented Feb 4, 2019

I believe this specific issue has been fixed in Bazel 0.22, where this feature is non-experimental. /cc @dkelmer

@dkelmer
Copy link

dkelmer commented Feb 4, 2019

@dslomov according to the initial post it seems like they are using 0.22.

@wu-bin, are you seeing this error with bazel 0.22? If yes, would you be able to provide some repro instructions so I can look into the bug?

@wu-bin
Copy link
Contributor Author

wu-bin commented Feb 5, 2019

@dslomov according to the initial post it seems like they are using 0.22.

@wu-bin, are you seeing this error with bazel 0.22? If yes, would you be able to provide some repro instructions so I can look into the bug?
@dkelmer : Thanks for looking at this. Yes, I'm seeing this error with bazel 0.22.0.

To reproduce:

  1. Make sure docker is installed. If not, follow go/installdocker.
  2. Clone the envoy repository to workstation. I followed this instruction, but for reproduce the issue (I think) you can just clone the official repo envoyproxy/envoy.
  3. In the envoy directory, edit .bazelrc:
    $ git diff
    diff --git a/.bazelrc b/.bazelrc
    ...
    startup --host_jvm_args=-Xmx512m
    -build --workspace_status_command=bazel/get_workspace_status
    +build --workspace_status_command=bazel/get_workspace_status --experimental_remap_main_repo
    ...
  4. In the envoy directory, run "./ci/run_envoy_docker.sh './ci/do_ci.sh bazel.asan'"

@mpwarres
Copy link
Contributor

mpwarres commented Feb 5, 2019

Just to add some context/history:

  1. We recently added an external dependency to Envoy (QUICHE), in which (once issues are resolved) there will be Envoy target A -> QUICHE -> Envoy target B dependencies.

  2. In doing (1), we ran into link-time errors caused by duplicate definitions. @dkelmer helped us debug this, summary here. Bazel 0.22 plus --experimental_remap_main_repo appears necessary to avoid these. However...

  3. Bazel 0.22 plus --experimental_remap_main_repo causes a separate build error, described here, where the label "//:VERSION" used in an envoy build rule does not properly resolve to that target in envoy, when envoy is built as (itself) an external dependency of envoy-filter-example, which is what some of the CI build targets (e.g. the one referenced in @wu-bin's repro instructions do.

@dkelmer
Copy link

dkelmer commented Feb 6, 2019

@mpwarres and I took a look at this yesterday. @wu-bin, your repro instructions do indeed reproduce the failure, but it is difficult to discern which Bazel targets are being built and from where. We attempted to create a more minimal repro by doing the following:

  1. clone the envoy-filter-example repository and set it up (git submodule update --init)
  2. cd into the envoy directory and modify the .bazelrc file in there to add the --experimental_remap_main_repo flag
  3. bazel build //:envoy and bazel build @envoy//source/common/common:generate_version_number

Both of the above builds succeed even though I'd expect both to fail as that is what is failing when invoking the CI script.

Some additional information from the failure by invoking the CI script is that the top level target that is failing is //:envoy_binary_test in the envoy-filter-example repository. I attempted to bazel test //:envoy_binary_test using Bazel 0.22 without any modifications (i.e. without passing the --experimental_remap_main_repo flag) but the test is failing. I assume this is due to something in my environment being set up incorrectly since I assume the test normally passes. Do you know if there are any flags I need to pass or something else I need to set up?

@mpwarres
Copy link
Contributor

mpwarres commented Feb 6, 2019

Some findings from continuing to play around with this:

  1. I've only been able to repro the failure in the context of ci/do_ci.sh. It seems like the issue may be tied to the particular way in which ci/build_setup.sh (invoked by ci/do_ci.sh) is setting up the envoy and envoy-filter-example workspaces.

  2. The failure has nothing to do with ASAN. It repros even if this line in do_ci.sh is replaced with:

bazel_with_collection test ${BAZEL_TEST_OPTIONS} //source/common/common:generate_version_number
  1. The most (or really only) effective way I've found to get more info on the failure is to insert 'echo' and other shell commands into ci/do_ci.sh, e.g. immediately before the line referenced in (2). When doing this, be sure to 'rm -rf /tmp/envoy-docker-build' (or $ENVOY_DOCKER_BUILD_DIR if set) in between repro attempts, to ensure starting from a fresh slate.

  2. bazel is being executed with a multipart --package_path, which perhaps may play into this? It's set here, the value ends up being "%workspace%:/source". The build environment inside the docker container is:

    • envoy-filter-example repo at /build/envoy-filter-example
    • envoy-filter-example WORKSPACE file replaced with ci/WORKSPACE.filter.example (done here)
    • envoy repo populated under /source (i.e. $ENVOY_SRCDIR)
    • bazel command executed from /build/envoy-filter-example

IIUC, this means that bazel will be executing with a --package_path that includes both envoy-filter-example and envoy repos, even though the envoy repo is also brought in as a local_repository from the WORKSPACE file (here), which seems odd to me. However, if I tweak --package_path to only be "%workspace%", the build fails due to other missing targets.

@mpwarres
Copy link
Contributor

mpwarres commented Feb 6, 2019

Progress (I think): I noticed that the ci/WORKSPACE.filter.example file that ci/build_setup.sh installs for the failing CI runs seems a bit funky: it reuses (here and here) the name "envoy" both for the workspace name and a local_repository name, and also refers to files provided by the envoy local_repository via names that start with "//" rather than "@envoy//" (i.e. "//bazel:repositories.bzl" and "//bazel:cc_configure.bzl").

If I change ci/WORKSPACE.filter.example to use distinct names for the envoy-filter-example workspace and the envoy local_repository, and fully qualify the references to //bazel:repositories.bzl and //bazel:cc_configure.bzl, things appear to work, both with and without the --experimental_remap_main_repo flag. This change is PR #5868. A side benefit is that it brings ci/WORKSPACE.filter.example closer in line with the current actual envoy-filter-example WORKSPACE file, and removes the need to set --package_path in ci/build_setup.sh.

My best guess as to why this fouled things up is that it may have confused Bazel about which rules were defined in which workspaces, which perhaps was handled differently/less strictly prior to --experimental_remap_main_repo? I'm also not sure if there's a reason why ci/WORKSPACE.filter.example needs to be the way it is currently. The local_repository(name="envoy") goes a couple years back, to PR #742, perhaps the external dependency story was different back then? Not sure if @htuch may remember.

@dkelmer does the above theory seem plausible? @htuch does #5868 seem like a reasonable change (with or without the actual setting of --experimental_remap_main_repo)?

@htuch
Copy link
Member

htuch commented Feb 7, 2019

@wu-bin other than the package_path thing, #5868 looks reasonable. TBH, how we handle repository naming and qualified accesses across repos has changed a bit since we did #742, so this is probably a welcome update.

@mpwarres
Copy link
Contributor

mpwarres commented Feb 7, 2019

Thanks @htuch, I've reverted the package_path aspect of #5868, hopefully that should address the remaining CI issues. Let's see if @dkelmer has any feedback to add tomorrow.

@dkelmer
Copy link

dkelmer commented Feb 7, 2019

#5868 is absolutely a good change.

My very scientific explanation for why it was working before is "you were lucky" :). Bazel generally ignored the name specified in the workspace() function. But with the --experimental_remap_main_repo, Bazel now uses the name. For example, if the workspace function was workspace(name = "my_project"), then, with the flag, Bazel adds an entry in a map from my_project to @ (which always means the main workspace). And then any references to @my_project in the project will evaluate to @. And so in your case, all references to @envoy became @ which in your CI run was really @envoy-filter-example, hence causing targets that were in the real @envoy to not be found.

@htuch htuch added the waiting label Feb 8, 2019
@htuch
Copy link
Member

htuch commented Feb 8, 2019

@wu-bin can we close this one out now?

@mpwarres
Copy link
Contributor

mpwarres commented Feb 8, 2019

Thanks @dkelmer for the explanation!

Closing this PR SGTM, will work on addressing the remaining TEST_WORKSPACE issue in #5868 so that can land. @wu-bin does that sound ok to you?

@wu-bin
Copy link
Contributor Author

wu-bin commented Feb 8, 2019

Thanks @dkelmer for the explanation!

Closing this PR SGTM, will work on addressing the remaining TEST_WORKSPACE issue in #5868 so that can land. @wu-bin does that sound ok to you?
Sure, I'll close this one. Thanks for digging into this!

@wu-bin wu-bin closed this Feb 8, 2019
htuch pushed a commit that referenced this pull request Feb 11, 2019
Proposed fix for issues seen in #5767

Update envoy/ci/WORKSPACE.filter.example, which is used in CI tests (see setup [here](https://github.com/envoyproxy/envoy/blob/f2511a39cf2c4fe392d5499e854c39f262712100/ci/build_setup.sh#L92)), to more closely match current envoy-filter-example [WORKSPACE](https://github.com/envoyproxy/envoy-filter-example/blob/master/WORKSPACE) file. In particular, remove the dual use of "envoy" as both the name of envoy-filter-example workspace as well as the envoy local_repository, and load //bazel:repositories.bzl and //bazel:cc_configure.bzl using fully qualified target names, which removes the need to [set --package_path](https://github.com/envoyproxy/envoy/blob/f2511a39cf2c4fe392d5499e854c39f262712100/ci/build_setup.sh#L63) in ci/build_setup.sh.

Risk Level: low: errors should result in CI failure
Testing: existing CI tests

Signed-off-by: Michael Warres <mpw@google.com>
fredlas pushed a commit to fredlas/envoy that referenced this pull request Mar 5, 2019
Proposed fix for issues seen in envoyproxy#5767

Update envoy/ci/WORKSPACE.filter.example, which is used in CI tests (see setup [here](https://github.com/envoyproxy/envoy/blob/f2511a39cf2c4fe392d5499e854c39f262712100/ci/build_setup.sh#L92)), to more closely match current envoy-filter-example [WORKSPACE](https://github.com/envoyproxy/envoy-filter-example/blob/master/WORKSPACE) file. In particular, remove the dual use of "envoy" as both the name of envoy-filter-example workspace as well as the envoy local_repository, and load //bazel:repositories.bzl and //bazel:cc_configure.bzl using fully qualified target names, which removes the need to [set --package_path](https://github.com/envoyproxy/envoy/blob/f2511a39cf2c4fe392d5499e854c39f262712100/ci/build_setup.sh#L63) in ci/build_setup.sh.

Risk Level: low: errors should result in CI failure
Testing: existing CI tests

Signed-off-by: Michael Warres <mpw@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants