Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bzlmod lockfile does not track repository mappings of extension .bzl files #20721

Closed
fmeum opened this issue Jan 3, 2024 · 15 comments
Closed
Labels
area-Bzlmod Bzlmod-specific PRs, issues, and feature requests P1 I'll work on this now. (Assignee required) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: bug

Comments

@fmeum
Copy link
Collaborator

fmeum commented Jan 3, 2024

Description of the bug:

When the .bzl files transitively loaded by a module extension don't change, but their repo mapping does, a module extension locked in the lockfile can return stale results.

Which category does this issue belong to?

External Dependency

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

# MODULE.bazel
bazel_dep(name = "rules_go", version = "0.40.0")

ext = use_extension("//:ext.bzl", "ext")
ext.tag(label_str = "@rules_go//go")
use_repo(ext, "repo")

other_ext = use_extension("//:other_ext.bzl", "other_ext")
use_repo(other_ext, "other_repo")

# BUILD
load("@repo//:defs.bzl", "LABEL")
load("@other_repo//:defs.bzl", "STR")
print(LABEL)
print(STR)

# ext.bzl
def _repo_impl(ctx):
    ctx.file("WORKSPACE")
    ctx.file("BUILD")
    ctx.file("defs.bzl", "LABEL = Label({})".format(repr(ctx.attr.label_str)))

_repo = repository_rule(
    implementation=_repo_impl,
    attrs={"label_str": attr.string(mandatory=True)},
)

_tag = tag_class(
    attrs={"label_str": attr.string(mandatory=True)},
)

def _ext_impl(ctx):
    for module in ctx.modules:
        for tag in module.tags.tag:
            _repo(name = "repo", label_str = tag.label_str)

ext = module_extension(
    implementation=_ext_impl,
    tag_classes={"tag": _tag},
)

# other_ext.bzl
load("@repo//:defs.bzl", "LABEL")

def _other_repo_impl(ctx):
    ctx.file("WORKSPACE")
    ctx.file("BUILD")
    ctx.file("defs.bzl", "STR = {}".format(repr(ctx.attr.str)))

_other_repo = repository_rule(
    implementation = _other_repo_impl,
    attrs = {
        "str": attr.string(),
    },
)

def _other_ext_impl(_):
    _other_repo(name = "other_repo", str = str(LABEL))

other_ext = module_extension(
    implementation = _other_ext_impl,
)
$ bazel build //...
DEBUG: /home/fhenneke/tmp/repo-mapping-lock/BUILD:3:6: @@rules_go~0.40.0//go:go
DEBUG: /home/fhenneke/tmp/repo-mapping-lock/BUILD:4:6: @@rules_go~0.40.0//go:go
$ buildozer 'set version 0.41.0' MODULE.bazel:rules_go
$ bazel build //...
DEBUG: /home/fhenneke/tmp/repo-mapping-lock/BUILD:3:6: @@rules_go~0.41.0//go:go
DEBUG: /home/fhenneke/tmp/repo-mapping-lock/BUILD:4:6: @@rules_go~0.40.0//go:go
$ bazel build //... --lockfile_mode=off
DEBUG: /home/fhenneke/tmp/repo-mapping-lock/BUILD:3:6: @@rules_go~0.41.0//go:go
DEBUG: /home/fhenneke/tmp/repo-mapping-lock/BUILD:4:6: @@rules_go~0.41.0//go:go
$ buildozer 'set version 0.40.0' MODULE.bazel:rules_go
$ bazel build //... --lockfile_mode=off
DEBUG: /home/fhenneke/tmp/repo-mapping-lock/BUILD:3:6: @@rules_go~0.40.0//go:go
DEBUG: /home/fhenneke/tmp/repo-mapping-lock/BUILD:4:6: @@rules_go~0.40.0//go:go

Which operating system are you running Bazel on?

No response

What is the output of bazel info release?

7.0.0 and HEAD

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

This is a regression in 7.0.0 since the lockfile is now enabled by default.

Have you found anything relevant by searching the web?

The reproducer is a synthetic, simplified version of the original problem (https://bazelbuild.slack.com/archives/CDBP88Z0D/p1703062323776219), which affects rules_go.

Any other information, logs, or outputs that you want to share?

No response

@fmeum
Copy link
Collaborator Author

fmeum commented Jan 3, 2024

@bazel-io flag

@fmeum fmeum changed the title Bzlmod lockfile does not track repository mappings of extension .bzl files [7.0.0] Bzlmod lockfile does not track repository mappings of extension .bzl files Jan 3, 2024
@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Jan 3, 2024
@fmeum
Copy link
Collaborator Author

fmeum commented Jan 3, 2024

Both #20722 and #20721 can be fixed with this patch:

diff --git a/src/main/java/com/google/devtools/build/lib/skyframe/BzlLoadFunction.java b/src/main/java/com/google/devtools/build/lib/skyframe/BzlLoadFunction.java
index 08a3ee4e2d..af90e5f306 100644
--- a/src/main/java/com/google/devtools/build/lib/skyframe/BzlLoadFunction.java
+++ b/src/main/java/com/google/devtools/build/lib/skyframe/BzlLoadFunction.java
@@ -829,6 +829,10 @@ public class BzlLoadFunction implements SkyFunction {
     if (predeclared == null) {
       return null;
     }
+
+    // Also include the repository mapping as the evaluation of a `.bzl` file depends on it through
+    // the Label function.
+    fp.addStringMap(Maps.transformValues(repoMapping.entries(), RepositoryName::getName));
     byte[] transitiveDigest = fp.digestAndReset();
 
     // The BazelModuleContext holds additional contextual info to be associated with the Module,

This does cause the invalidation of all repos generated by extension in a Bazel module when that module is updated though, which can be costly. A more fine-grained alternative would be to track Label calls similar to ctx.path calls.

CC @Wyverald

@fmeum
Copy link
Collaborator Author

fmeum commented Jan 3, 2024

@bazel-io fork 7.0.1

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Jan 3, 2024
@sgowroji sgowroji added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. area-Bzlmod Bzlmod-specific PRs, issues, and feature requests labels Jan 3, 2024
@iancha1992 iancha1992 changed the title [7.0.0] Bzlmod lockfile does not track repository mappings of extension .bzl files Bzlmod lockfile does not track repository mappings of extension .bzl files Jan 3, 2024
@Wyverald Wyverald added P1 I'll work on this now. (Assignee required) and removed untriaged labels Jan 3, 2024
@Wyverald
Copy link
Member

Wyverald commented Jan 3, 2024

A more fine-grained alternative would be to track Label calls similar to ctx.path calls.

I think we can do this. IIUC, we just need to store the mapping entries for anything used by a Label() call. Storing all repo mapping entries is as you said way too sensitive to change.

Wyverald added a commit that referenced this issue Jan 4, 2024
... that use repo mapping. This is a rather obscure case of the lockfile being stale; if the `Label()` constructor is called in an extension impl function, and that call uses repo mapping of any form (i.e. the argument looks like `@foo//bar`), then we need to be ready to rerun the extension if `@foo` were to suddenly map to something else.

I also did a minor refactoring in `SingleExtensionEvalFunction` around the logic to decide whether the locked extension is up-to-date. Right now we perform a "diff" between the locked extension and what we expect to be up-to-date, and if a "diff" is found *and* `--lockfile_mode=error`, we basically perform a diff again. We also don't short circuit; that is, if the transitive bzl digest has changed, there's no point in seeing whether any files have changed, but we do right now.

A follow-up will be sent to fix the analogous bug for repo rules.

Fixes #20721.
iancha1992 pushed a commit to iancha1992/bazel that referenced this issue Jan 4, 2024
... that use repo mapping. This is a rather obscure case of the lockfile being stale; if the `Label()` constructor is called in an extension impl function, and that call uses repo mapping of any form (i.e. the argument looks like `@foo//bar`), then we need to be ready to rerun the extension if `@foo` were to suddenly map to something else.

I also did a minor refactoring in `SingleExtensionEvalFunction` around the logic to decide whether the locked extension is up-to-date. Right now we perform a "diff" between the locked extension and what we expect to be up-to-date, and if a "diff" is found *and* `--lockfile_mode=error`, we basically perform a diff again. We also don't short circuit; that is, if the transitive bzl digest has changed, there's no point in seeing whether any files have changed, but we do right now.

A follow-up will be sent to fix the analogous bug for repo rules.

Fixes bazelbuild#20721.

Closes bazelbuild#20742.

PiperOrigin-RevId: 595818144
Change-Id: Id660b7a659a7f2e4dde19c16784c2ab18a9ceb69
iancha1992 added a commit that referenced this issue Jan 5, 2024
... that use repo mapping. This is a rather obscure case of the lockfile
being stale; if the `Label()` constructor is called in an extension impl
function, and that call uses repo mapping of any form (i.e. the argument
looks like `@foo//bar`), then we need to be ready to rerun the extension
if `@foo` were to suddenly map to something else.

I also did a minor refactoring in `SingleExtensionEvalFunction` around
the logic to decide whether the locked extension is up-to-date. Right
now we perform a "diff" between the locked extension and what we expect
to be up-to-date, and if a "diff" is found *and*
`--lockfile_mode=error`, we basically perform a diff again. We also
don't short circuit; that is, if the transitive bzl digest has changed,
there's no point in seeing whether any files have changed, but we do
right now.

A follow-up will be sent to fix the analogous bug for repo rules.

Fixes #20721.

Closes #20742.

PiperOrigin-RevId: 595818144
Change-Id: Id660b7a659a7f2e4dde19c16784c2ab18a9ceb69

Co-authored-by: Xdng Yng <[email protected]>
@iancha1992
Copy link
Member

@bazel-io fork 7.1.0

@fmeum
Copy link
Collaborator Author

fmeum commented Jan 8, 2024

@Wyverald I found a special case that isn't covered by the fix:

  def testExtensionRepoMappingChangeTopLevel(self):
    # Regression test for #20721
    self.main_registry.createCcModule('foo', '1.0')
    self.main_registry.createCcModule('foo', '2.0')
    self.main_registry.createCcModule('bar', '1.0')
    self.main_registry.createCcModule('bar', '2.0')
    self.ScratchFile(
      'MODULE.bazel',
      [
        'bazel_dep(name="foo",version="1.0")',
        'bazel_dep(name="bar",version="1.0")',
        'ext = use_extension(":ext.bzl", "ext")',
        'use_repo(ext, "repo")',
      ],
    )
    self.ScratchFile(
      'BUILD.bazel',
      [
        'load("@repo//:defs.bzl", "STR")',
        'print("STR="+STR)',
        'filegroup(name="lol")',
      ],
    )
    self.ScratchFile(
      'constant.bzl',
      [
        'LABEL = Label("@foo//:lib_foo")',
      ],
    )
    self.ScratchFile(
      'ext.bzl',
      [
        'load(":constant.bzl", "LABEL")',
        'def _repo_impl(rctx):',
        '  rctx.file("BUILD")',
        '  rctx.file("defs.bzl", "STR = " + repr(str(rctx.attr.value)))',
        'repo = repository_rule(_repo_impl,attrs={"value":attr.label()})',
        'def _ext_impl(mctx):',
        '  print("ran the extension!")',
        '  repo(name = "repo", value = LABEL)',
        'ext = module_extension(_ext_impl)',
      ],
    )

    _, _, stderr = self.RunBazel(['build', ':lol'])
    self.assertIn('STR=@@foo~1.0//:lib_foo', '\n'.join(stderr))

    # Shutdown bazel to make sure we rely on the lockfile and not skyframe
    self.RunBazel(['shutdown'])
    _, _, stderr = self.RunBazel(['build', ':lol'])
    self.assertNotIn('ran the extension!', '\n'.join(stderr))

    # Shutdown bazel to make sure we rely on the lockfile and not skyframe
    self.RunBazel(['shutdown'])
    # Now, for something spicy: upgrade foo to 2.0 and change nothing else.
    # The extension should rerun despite the lockfile being present, and no
    # usages or .bzl files having changed.
    self.ScratchFile(
      'MODULE.bazel',
      [
        'bazel_dep(name="foo",version="2.0")',
        'bazel_dep(name="bar",version="1.0")',
        'ext = use_extension(":ext.bzl", "ext")',
        'use_repo(ext, "repo")',
      ],
    )
    _, _, stderr = self.RunBazel(['build', ':lol'])
    self.assertIn('STR=@@foo~2.0//:lib_foo', '\n'.join(stderr))

    # Shutdown bazel to make sure we rely on the lockfile and not skyframe
    self.RunBazel(['shutdown'])
    # More spicy! upgrade bar to 2.0 and change nothing else.
    # The extension should NOT rerun, since it never used the @bar repo mapping.
    self.ScratchFile(
      'MODULE.bazel',
      [
        'bazel_dep(name="foo",version="2.0")',
        'bazel_dep(name="bar",version="2.0")',
        'ext = use_extension(":ext.bzl", "ext")',
        'use_repo(ext, "repo")',
      ],
    )
    _, _, stderr = self.RunBazel(['build', ':lol'])
    stderr = '\n'.join(stderr)
    self.assertNotIn('ran the extension!', stderr)
    self.assertIn('STR=@@foo~2.0//:lib_foo', stderr)

@fmeum fmeum reopened this Jan 8, 2024
@Wyverald
Copy link
Member

Wyverald commented Jan 8, 2024

Ah, thanks! We'll need to keep track of the used repo mappings from BzlLoadFunction too.

@Wyverald
Copy link
Member

Wyverald commented Jan 9, 2024

Hmm, this is actually slightly trickier than I thought. In particular, we also need to deal with repo-relative Label() calls.

Consider this scenario:

  • @@//:foo.bzl loads from @bar//:bar.bzl, which initially maps to @@bar~1.0//:bar.bzl
  • we upgrade bar from 1.0 to 2.0, and change nothing else
  • @@//:foo.bzl now actually loads from @@bar~2.0//:bar.bzl
  • if bar.bzl actually contains a MY_CONSTANT = Label("//:whatever"), its value will have changed by the upgrade, despite no bzl digest having changed, or any repo mapping from @@bar~1.0 or @@bar~2.0 having changed

To deal with this scenario, we could either consider load()s a "use" of a repo mapping and track it just like a Label() call, or we could store the "source repo" for every .bzl file as part of its digest. The latter approach is easier to implement but has a larger blast radius (affects BUILD-time loads as well), so I think I'll go with the first approach.


Beyond that, we also need to be careful while verifying the up-to-dateness of stored repo mapping entries. In the scenario above, if @@bar~1.0//:bar.bzl contained a Label("@quux//:whatever) call, then the entry <@@bar~1.0, quux, @@quux~1.0> would be stored. After upgrading bar to 2.0, though, trying to verify this entry could cause a crash as @@bar~1.0 doesn't even exist anymore, so requesting its repo mapping just causes an exception.

Wyverald added a commit that referenced this issue Jan 10, 2024
In the same vein as #20742, we record all repo mapping entries used during the load of a .bzl file too, including any of its `load()` statements and calls to `Label()` that contain an apparent repo name.

See #20721 (comment) for a more detailed explanation for this change, and the test cases in this commit for more potential triggers.

Fixes #20721
Wyverald added a commit that referenced this issue Jan 10, 2024
In the same vein as #20742, we record all repo mapping entries used during the load of a .bzl file too, including any of its `load()` statements and calls to `Label()` that contain an apparent repo name.

See #20721 (comment) for a more detailed explanation for this change, and the test cases in this commit for more potential triggers.

Fixes #20721
Wyverald added a commit that referenced this issue Jan 10, 2024
In the same vein as #20742, we record all repo mapping entries used during the load of a .bzl file too, including any of its `load()` statements and calls to `Label()` that contain an apparent repo name.

See #20721 (comment) for a more detailed explanation for this change, and the test cases in this commit for more potential triggers.

Fixes #20721

Closes #20830.

PiperOrigin-RevId: 597351525
Change-Id: I8f6ed297b81d55f7476a93bdc6668e1e1dcbe536
Wyverald added a commit that referenced this issue Jan 11, 2024
In the same vein as #20742, we
record all repo mapping entries used during the load of a .bzl file too,
including any of its `load()` statements and calls to `Label()` that
contain an apparent repo name.

See
#20721 (comment)
for a more detailed explanation for this change, and the test cases in
this commit for more potential triggers.

Fixes #20721

Closes #20830.

PiperOrigin-RevId: 597351525
Change-Id: I8f6ed297b81d55f7476a93bdc6668e1e1dcbe536
@iancha1992
Copy link
Member

A fix for this issue has been included in Bazel 7.0.1 RC2. Please test out the release candidate and report any issues as soon as possible. Thanks!

Wyverald added a commit that referenced this issue Jan 25, 2024
... that use repo mapping. This is a rather obscure case of the lockfile being stale; if the `Label()` constructor is called in an extension impl function, and that call uses repo mapping of any form (i.e. the argument looks like `@foo//bar`), then we need to be ready to rerun the extension if `@foo` were to suddenly map to something else.

I also did a minor refactoring in `SingleExtensionEvalFunction` around the logic to decide whether the locked extension is up-to-date. Right now we perform a "diff" between the locked extension and what we expect to be up-to-date, and if a "diff" is found *and* `--lockfile_mode=error`, we basically perform a diff again. We also don't short circuit; that is, if the transitive bzl digest has changed, there's no point in seeing whether any files have changed, but we do right now.

A follow-up will be sent to fix the analogous bug for repo rules.

Fixes #20721.

Closes #20742.

PiperOrigin-RevId: 595818144
Change-Id: Id660b7a659a7f2e4dde19c16784c2ab18a9ceb69
Wyverald added a commit that referenced this issue Jan 25, 2024
In the same vein as #20742, we record all repo mapping entries used during the load of a .bzl file too, including any of its `load()` statements and calls to `Label()` that contain an apparent repo name.

See #20721 (comment) for a more detailed explanation for this change, and the test cases in this commit for more potential triggers.

Fixes #20721

Closes #20830.

PiperOrigin-RevId: 597351525
Change-Id: I8f6ed297b81d55f7476a93bdc6668e1e1dcbe536
Wyverald added a commit that referenced this issue Jan 29, 2024
Similar to the fix for #20721, we write recorded repo mapping entries into the marker file so that repo fetching is sensitive to any relevant repo mapping changes.

Fixes #20722.
copybara-service bot pushed a commit that referenced this issue Feb 1, 2024
Similar to the fix for #20721, we write recorded repo mapping entries into the marker file so that repo fetching is sensitive to any relevant repo mapping changes.

Fixes #20722.

Closes #21131.

PiperOrigin-RevId: 603310262
Change-Id: I806f383e8579fed3533fac9b181efd8b75e76fcd
bazel-io pushed a commit to bazel-io/bazel that referenced this issue Feb 1, 2024
Similar to the fix for bazelbuild#20721, we write recorded repo mapping entries into the marker file so that repo fetching is sensitive to any relevant repo mapping changes.

Fixes bazelbuild#20722.

Closes bazelbuild#21131.

PiperOrigin-RevId: 603310262
Change-Id: I806f383e8579fed3533fac9b181efd8b75e76fcd
github-merge-queue bot pushed a commit that referenced this issue Feb 1, 2024
)

Similar to the fix for #20721,
we write recorded repo mapping entries into the marker file so that repo
fetching is sensitive to any relevant repo mapping changes.

Fixes #20722.

Closes #21131.

Commit
9edaddd

PiperOrigin-RevId: 603310262
Change-Id: I806f383e8579fed3533fac9b181efd8b75e76fcd

Co-authored-by: Xdng Yng <[email protected]>
@phst
Copy link
Contributor

phst commented Feb 3, 2024

Is this fixed in all cases? I'm still seeing it with Bazel 7.0.2 (e.g. https://github.com/phst/rules_elisp/actions/runs/7767854632/job/21185125063 - this works with --lockfile_mode=off but not otherwise).

@fmeum
Copy link
Collaborator Author

fmeum commented Feb 3, 2024

@phst Could you try rerunning the build after deleting the lock file once?

@phst
Copy link
Contributor

phst commented Feb 4, 2024

Yeah, that indeed worked, thanks!
I guess there's still some bug lurking somewhere then? Or is it expected that users have to regularly delete and recreate their lockfiles?

@fmeum
Copy link
Collaborator Author

fmeum commented Feb 4, 2024

That's not expected. We should have bumped the lockfile version, but didn't think of that in time for the release. We will bump it for 7.1.0.

Cc @Wyverald

@Wyverald
Copy link
Member

Wyverald commented Feb 5, 2024

Indeed -- this was an oversight on my part. Although frequent lockfile version bumps are also very disruptive, it's still better than hard-to-diagnose correctness issues.

@iancha1992
Copy link
Member

A fix for this issue has been included in Bazel 7.1.0 RC1. Please test out the release candidate and report any issues as soon as possible. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Bzlmod Bzlmod-specific PRs, issues, and feature requests P1 I'll work on this now. (Assignee required) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: bug
Projects
None yet
7 participants