Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "Locating runfiles with Bzlmod" #269

Merged
merged 10 commits into from
Aug 17, 2022
Merged

Conversation

fmeum
Copy link
Contributor

@fmeum fmeum commented Jul 21, 2022

No description provided.

@fmeum fmeum requested review from jin and lberki as code owners July 21, 2022 16:27
@fmeum
Copy link
Contributor Author

fmeum commented Jul 21, 2022

@Wyverald @meteorcloudy @phst I have added you as reviewers based on the discussion we had about this topic. I would like to have a maintainer of either rules_java or rules_cc on the list as well, but wasn't sure who would be available.

I'm a bit uncertain about the wording in https://bazel.build/contribute/design-documents#new-proposal. Should I already send out the email or wait for this PR to be merged? "Submitted" could mean either of the two.

@fmeum
Copy link
Contributor Author

fmeum commented Jul 22, 2022

@comius Looks like you are back, would you be willing to review this proposal from the rules owner perspective?

Copy link
Member

@Wyverald Wyverald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for the great writeup :) I'm still reviewing; leaving first round of comments

designs/2022-07-21-locating-runfiles-with-bzlmod.md Outdated Show resolved Hide resolved
designs/2022-07-21-locating-runfiles-with-bzlmod.md Outdated Show resolved Hide resolved
designs/2022-07-21-locating-runfiles-with-bzlmod.md Outdated Show resolved Hide resolved
designs/2022-07-21-locating-runfiles-with-bzlmod.md Outdated Show resolved Hide resolved
designs/2022-07-21-locating-runfiles-with-bzlmod.md Outdated Show resolved Hide resolved
If all entries (instead of just this limited set) were emitted into the manifest, all actions depending on T would have to rerun if any repository containing a transitive dependency of T would declare a new dependency or change an apparent repository name, even if that repository neither contributes a runfile nor looks up runfiles at runtime.

In order to help runfiles libraries find the repository mapping manifest, it is added to the runfiles manifest (and thus the runfiles directory) under the fixed path `_repo_mapping`.
Since canonical repository names do not start with `_`, there is no risk of this synthetic part being a prefix of an actual runfile.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this part is potentially untrue -- canonical repo names can start with _. We currently don't have such repos (not that I know of, anyway), but we haven't ruled out the possibility. (For example, there's a pseudo-repo called @_builtins.)

Is the runfiles manifest also inside the runfiles directory, or is it only outside of it? Is it crucial for the _repo_mapping file to be inside the runfiles directory? Maybe I'm trying too hard to be 100% sure that a conflict can't happen...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important bit for me was that we can't have collisions with canonical repository names of user-provided repos.

Collisions with internally defined repositories are less of a concern since we know them all. Also, nothing is stopping us from giving essentially arbitrary names to Bazel-defined repositories. But if you prefer a different escaping scheme, I can use @@repo_mapping or even _never_let_this_internal_repo_name_through_code_review_repo_mapping instead. ;-)

designs/2022-07-21-locating-runfiles-with-bzlmod.md Outdated Show resolved Hide resolved
designs/2022-07-21-locating-runfiles-with-bzlmod.md Outdated Show resolved Hide resolved
@aiuto
Copy link
Contributor

aiuto commented Jul 22, 2022

I'm a bit uncertain about the wording in https://bazel.build/contribute/design-documents#new-proposal. Should I already send out the email or wait for this PR to be merged? "Submitted" could mean either of the two.

Since we seem to be reviewing the proposal in code review rather than a doc, you should send mail to bazel-dev@ now, so others can see it without stalking this repo. I always avoid the problem by doing the proposal in a doc and making the PR just be a pointer to the doc. I find github's UI really insufficient for having discussions about text.


In order to support Bzlmod, rulesets providing runfiles libraries will have to be updated to advertise the new provider.
Since a new global symbol can't be feature-detected in Starlark, this can only be done once the ruleset providing the runfiles libraries targets a Bazel version that offers the new provider as its minimum supported version.
For runfiles libraries that do not ship with Bazel, it would thus be helpful to cherry-pick this change into Bazel 5.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it would be helpful if all runfiles libraries were external to Bazel. That facilitates writing code that gets runfiles under bazel test and bazel run, but can also be repackaged and installed in the canonical manner for any OS. Those rarely would have a .runfiles tre e the way we build it.

Even if we make them all external, we may not have to backport this feature to bazel 5.x. Since each Bazel LTS could have the potential to do very different things with runfiles, we will have to account for that at the library side. Either we have versions of library for each LTS track, or, have the runfiles manifest/layout itself contain a version number. Some day in the future, Bazel 9.x will use runfiles-v2, and a single runfiles library at that time could read all back versions. That style might solve the problem of backport to 5.x We could use the non-existance of the runfiles version to know it is the old style, and read accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the appeal of having runfiles be external to Bazel, but in this particular section, a backport of a small part of the functionality proposed here seems actually needed even if all libraries became external already tomorrow: We have to provide a way for runfiles libraries to mark themselves as such and this way has to be understood by Bazel itself.

That's because we are essentially introducing version 1 of a new manifest (the repository mapping manifest) here rather than a new version of an existing manifest. The new manifest type needs some additional support within Bazel itself.

At my company, we are actually using the runfiles libraries as they are provided right now at install time by relying on the manifest- rather than the directory-based implementation.

@fmeum
Copy link
Contributor Author

fmeum commented Jul 22, 2022

Since we seem to be reviewing the proposal in code review rather than a doc, you should send mail to bazel-dev@ now, so others can see it without stalking this repo. I always avoid the problem by doing the proposal in a doc and making the PR just be a pointer to the doc. I find github's UI really insufficient for having discussions about text.

Thanks for the clarification, will send out the mail right away.

I also find comments on Google Docs much more convenient than on GitHub PRs, but I prefer Markdown (especially its code blocks) to Docs formatting. It's also nice that Markdown files checked into the repo can't get lost - regaining edit rights to the original runfiles libraries design doc took quite some effort.


b. If the runfiles library is not shipped with Bazel, it should fall back to the original lookup procedure. This is necessary to support the use of runfiles libraries with versions of Bazel that do not yet create the manifest.

2. The runfiles library parses the repository mapping manifest and stores a map of `A` to `repo_mapping(C, A)` for the fixed canonical repository name `C` corresponding to the repository from which the `Create` method was called.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be better to determine C in Rlocation, not Create? Otherwise, a runfiles object passed across repository boundaries or a single global lazily-initialized runfiles object would give wrong results when used from different repositories.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are probably correct that this is what we should do for consistency (also with languages like Bash that don't even have a Create).

The one drawback of that approach is that it makes it more difficult to migrate to the new API as the potentially many Rlocation calls all have to be updated. But if that turns out to be a concern, we could always add a CreateForSingleRepository function that locks in the repo mapping.

I will change the section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the section, PTAL.

fmeum added a commit to fmeum/bazel that referenced this pull request Aug 10, 2022
The new Starlark-defined RunfilesLibraryInfo provider may be used in the
future to decide whether additional information has to be emitted by a
rule for a target to find its runfiles at runtime. This may include
serializing repository mapping information to a runfile and/or
generating code that provides the canonical name of the current
repository.

The provider is added now rather than later so that third-party runfiles
libraries can rely on it being present earlier - they have to wait for
their minimum supported Bazel version to offer it. If it should turn out
not to be needed, it can safely be removed before any ruleset would have
a reason to depend on it.

Work towards bazelbuild/proposals#269
fmeum added a commit to fmeum/bazel that referenced this pull request Aug 10, 2022
The new Starlark-defined RunfilesLibraryInfo provider may be used in the
future to decide whether additional information has to be emitted by a
rule for a target to find its runfiles at runtime. This may include
serializing repository mapping information to a runfile and/or
generating code that provides the canonical name of the current
repository.

The provider is added now rather than later so that third-party runfiles
libraries can rely on it being present earlier - they have to wait for
their minimum supported Bazel version to offer it. If it should turn out
not to be needed, it can safely be removed before any ruleset would have
a reason to depend on it.

Work towards bazelbuild/proposals#269
fmeum added a commit to fmeum/bazel that referenced this pull request Aug 10, 2022
The new Starlark-defined RunfilesLibraryInfo provider may be used in the
future to decide whether additional information has to be emitted by a
rule for a target to find its runfiles at runtime. This may include
serializing repository mapping information to a runfile and/or
generating code that provides the canonical name of the current
repository.

The provider is added now rather than later so that third-party runfiles
libraries can rely on it being present earlier - they have to wait for
their minimum supported Bazel version to offer it. If it should turn out
not to be needed, it can safely be removed before any ruleset would have
a reason to depend on it.

Work towards bazelbuild/proposals#269
With `my_module` as the root module and Bzlmod enabled, the repository mapping manifest of `//:some_tool` would look as follows:

```
,my_module,my_workspace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify that the following:
,my_module,my_workspace
,my_workspace,my_workspace

entries are needed because of data.txt in the data attribute. It took me some time to figure that out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more explanations.

- `T` transitively depends on a target in `C` that directly depends on a target advertising `RunfilesLibraryInfo`;
- `T`'s runfiles contain an artifact owned by a target in `repo_mapping(C, A)`.
This property ensures that the repository mapping manifest only contains the entries that are actually needed for the target to resolve its runfiles.
If all entries (instead of just this limited set) were emitted into the manifest, all actions depending on T would have to rerun if any repository containing a transitive dependency of T would declare a new dependency or change an apparent repository name, even if that repository neither contributes a runfile nor looks up runfiles at runtime.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this seems very well thought out but I myself am having trouble imagining how all of this will fit together and foreseeing problems like the one mentioned here without actually looking at the final code. Would it be possible to have a prototype of the whole thing working for at least 2 main rules like Java and C++? I think that would make it easier to spot any issues in the design that cause big problems like unnecessary rerunning actions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a prototype for this for Java. I will look into developing it a bit more and including C++.

designs/2022-07-21-locating-runfiles-with-bzlmod.md Outdated Show resolved Hide resolved

End users that do use Bzlmod currently can't use runfiles libraries successfully in all but the simplest cases (lookups of runfiles contained in the main repository), thus backwards compatibility is not a concern.

# Suggested follow-up changes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we leaving anything behind that gets deprecated and would require house-keeping? Assuming a world where everyone is using Bzlmod, is anything left in the code base that is no longer necessary that would need to be cleaned up? If that's the case, can you have a section describing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of anything. Even though Bzlmod is the motivation for this proposal, there is no direct dependency between the two as repository mappings can already be used in WORKSPACE and would break Stardoc today. It just so happens that they are very rarely used (although I believe rules_kotlin relies on them).

fmeum added a commit to fmeum/bazel that referenced this pull request Aug 10, 2022
The new Starlark-defined RunfilesLibraryInfo provider may be used in the
future to decide whether additional information has to be emitted by a
rule for a target to find its runfiles at runtime. This may include
serializing repository mapping information to a runfile and/or
generating code that provides the canonical name of the current
repository.

The provider is added now rather than later so that third-party runfiles
libraries can rely on it being present earlier - they have to wait for
their minimum supported Bazel version to offer it. If it should turn out
not to be needed, it can safely be removed before any ruleset would have
a reason to depend on it.

Work towards bazelbuild/proposals#269
With `my_module` as the root module and Bzlmod enabled, the repository mapping manifest of `//:some_tool` would look as follows:

```
,my_module,my_workspace
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more explanations.

designs/2022-07-21-locating-runfiles-with-bzlmod.md Outdated Show resolved Hide resolved

End users that do use Bzlmod currently can't use runfiles libraries successfully in all but the simplest cases (lookups of runfiles contained in the main repository), thus backwards compatibility is not a concern.

# Suggested follow-up changes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of anything. Even though Bzlmod is the motivation for this proposal, there is no direct dependency between the two as repository mappings can already be used in WORKSPACE and would break Stardoc today. It just so happens that they are very rarely used (although I believe rules_kotlin relies on them).

- `T` transitively depends on a target in `C` that directly depends on a target advertising `RunfilesLibraryInfo`;
- `T`'s runfiles contain an artifact owned by a target in `repo_mapping(C, A)`.
This property ensures that the repository mapping manifest only contains the entries that are actually needed for the target to resolve its runfiles.
If all entries (instead of just this limited set) were emitted into the manifest, all actions depending on T would have to rerun if any repository containing a transitive dependency of T would declare a new dependency or change an apparent repository name, even if that repository neither contributes a runfile nor looks up runfiles at runtime.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a prototype for this for Java. I will look into developing it a bit more and including C++.

@fmeum
Copy link
Contributor Author

fmeum commented Aug 10, 2022

@oquenchil and everybody else who is interested in a demo: I created an example repo that you can try out the main parts of this proposal on using the linked rough prototype of a Bazel implementation of parts 1, 2, and 4 for Java and C++. The example binaries emit their repository mapping manifests as well as the value of the injected current repository name.

@oquenchil
Copy link
Contributor

LGTM

@fmeum
Copy link
Contributor Author

fmeum commented Aug 17, 2022

@oquenchil Thanks for the review, I changed the relevant section to state that we are planning to reuse RunfilesProvider. I also listed you as a reviewer.

Do you happen to see a way we could introduce RunfilesLibraryInfo as a Starlark-defined provider? I tried this in bazelbuild/bazel#16077 (comment), but hit the problem that builtins can't access anything under @bazel_tools.

@Wyverald Wyverald merged commit af3d29d into bazelbuild:main Aug 17, 2022
Comment on lines +42 to +45
2. [Stardoc](https://github.com/bazelbuild/stardoc) generates documentation from `.bzl` files and for that has to resolve labels in `load` statements to `.bzl` files on disk.
Since it doesn't actually evaluate `WORKSPACE` or `MODULE.bazel` files, it has no way of discovering repository mappings in the same way as Bazel.
Thus, with Bzlmod, Stardoc fails to generate documentation for any `.bzl` file that loads `.bzl` files from other repositories.
This issue is tracked by [bazelbuild/bazel#14140](https://github.com/bazelbuild/bazel/issues/14140) and [bazelbuild/stardoc#117](https://github.com/bazelbuild/stardoc/issues/117).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: my current plan (currently in early prototype) is for stardoc to avoid doing any kind of custom disk paths munging, and instead, read documentation from the compiled starlark module object stored in skyframe through exactly the same mechanism as what is currently used for implementing the load statement.

In other words: if you can load the .bzl file, you'll be able to generate docs from it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds great! Could you keep me updated on this on bazelbuild/bazel#16124?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants