Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

py_common API #1647

Open
UebelAndre opened this issue May 22, 2022 · 20 comments
Open

py_common API #1647

UebelAndre opened this issue May 22, 2022 · 20 comments

Comments

@UebelAndre
Copy link
Contributor

Description of the feature request:

Similar to cc_common is there a py_common available?

What underlying problem are you trying to solve with this feature?

I would like to have some rules which generate additional actions but would otherwise be interchangeable with a py_library. I've done similar things before using the cc_common API but cannot find one for python.

Which operating system are you running Bazel on?

Any

What is the output of bazel info release?

release 5.1.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@comius
Copy link
Contributor

comius commented May 24, 2022

cc @rickeylev

py_common is not available at the moment. We're doing rewrite on Python rules to Starlark, starting with the internal ones and then the ones native in Bazel. Implementing py_common now would interfere with the process, however I see no problem distilling it from Starlark implementation when it's finished.

As such I marked this as P4, but will upgrade it to P2 once Starlarkification is done.

@UebelAndre
Copy link
Contributor Author

@comius is there a tracking issue for porting the python rules to starlark?

@UebelAndre
Copy link
Contributor Author

@comius @rickeylev friendly ping. Is there a tracking issue for the port to starlark?

@rickeylev
Copy link
Collaborator

rickeylev commented Jul 4, 2022 via email

@UebelAndre
Copy link
Contributor Author

Could you tell more about the problem you're trying to solve, and features you want? Internally, we have 3-5+ cases that could be described as "create a drop in replacement for py library", but each has some small differences that need different "plugin" points. So I'm curious to know which part of rule behavior your logic needs to intercept/customize/keep/drop

I think the case that would be most impactful to me is being able to take my cc_binary-like rule and update it to provide an interface to python targets I have without needing to make a tree of macros as some of the attributes are hard to manage when select is used for some attributes. I'd come across the desire to do this a couple of times before writing this issue. In most cases I have a web of macros that could easily be condensed to a single rule.

@rickeylev would you be able to enumerate the cases you're aware of?

@rickeylev
Copy link
Collaborator

fyi, i created bazelbuild/bazel#15897 as a tracking issue for the starlark rewrite

@rickeylev rickeylev self-assigned this Oct 6, 2022
@rickeylev
Copy link
Collaborator

A quick brain dump of some known points of customization:

  • Being able to invoke pyc compilation as part of library building; similarly where to put such files in the output
  • Writing out the executable script (internally, we replace this with a C executable; aspect does something to create venv-startup wrapper thing; internally we use a different stub script)
  • Attaching aspects to various attributes and still acting as a regular py_library (useful for e.g. type checkers)
  • Replicating env and args expansion for a "packaging" rule around a binary (e.g., you should be able to drop-in replace a regular py_binary with a par-based, pyoxidze-based, etc binary rule and have its args/env behavior be the same)
  • Merging PyInfo objects (i.e. PyInfo equiv of cc_common.merge_cc_infos)
  • Populating/propagating providers that a regular py_library would (this allows more flexibility and interoperability between the canonical py_library and custom variants of it)
  • Being able to create compatible attribute definitions (e.g., the "stamp" attribute varies for tests vs binaries, or could vary by a particular type of binary; most variants of the canonical py_X rules will want to support largely the same attrs with the same behavior as the canonical rule)
  • Maybe also helpers to help identify where to put files in runfiles

@rickeylev
Copy link
Collaborator

From Phillip Shrader over slack, the sort of customization they do at Aurora:

Our current approach is: bazel-generated wrapper calls into a shell script we wrote that sets up some hermeticity things and eventually calls path/to/python3 -SsB path/to/hermetic_python3.py actual/file/of/interest.py . And then path/to/hermetic_python3.py does a few things like clearing sys.path[0] and a few other hermeticity things and then eventually delegates to actual/file/of/interest.py .
So another potential interest of mine is to maybe combine the bazel generated wrapper with our shell script to reduce the number of steps involved in setting up all the hermetic stuff.

Copy link

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@UebelAndre
Copy link
Contributor Author

Cc @rickeylev this is still desired but maybe the issue can be moved to rules_puthon? Now that Bazel 7 is out and we can use the starlark rules there?

@rickeylev rickeylev transferred this issue from bazelbuild/bazel Dec 21, 2023
@UebelAndre
Copy link
Contributor Author

@rickeylev do you know if this would be something that's easy to do? I still find myself wanting to make custom rules that build python binaries. Having an interface for that would be fantastic

@UebelAndre
Copy link
Contributor Author

@rickeylev would it be acceptable to have an experimental flag that gates the use of some py_common struct that would contain a single function, py_executable_base_impl, for now?

def py_executable_base_impl(ctx, *, semantics, is_test, inherited_environment = []):
"""Base rule implementation for a Python executable.
Google and Bazel call this common base and apply customizations using the
semantics object.
Args:
ctx: The rule ctx
semantics: BinarySemantics struct; see create_binary_semantics_struct()
is_test: bool, True if the rule is a test rule (has `test=True`),
False if not (has `executable=True`)
inherited_environment: List of str; additional environment variable
names that should be inherited from the runtime environment when the
executable is run.
Returns:
DefaultInfo provider for the executable
"""

This would cover my primary use for py_common which is simply writing Bazel rules that equate to py_binary or py_test.

The only changes to py_executable_base_impl (or whatever would be a part of the interface) is that ctx.attrs and ctx.files are pulled in as a separate parameters so other rules have the chance to address missing fields without actually needing to add attributes to the rule. This approach would have some splash damage where access directly from ctx would need to be updated but the changes should be otherwise pretty straight forward.

@UebelAndre
Copy link
Contributor Author

@aignas do you have any thoughts here?

@arrdem
Copy link
Contributor

arrdem commented Aug 5, 2024

This is to some degree a dupe of my #1860; interested in whatever the outcome here is.

@UebelAndre
Copy link
Contributor Author

I would be willing to help drive some MVP but I'm concerned about potentially wasting time without prior design sign-off from the maintainers.

@aignas
Copy link
Collaborator

aignas commented Aug 6, 2024

Playaing devil's advocate here - why would we have to support this API when bazel is working on subrule support? Is it not possible to use the py_binary and friends together with the subrule?

Aside from that, I am personally not super familiar with that part of the code, so cannot comment on implications or the maintenance burden in the long run. The request in general looks reasonable, but I would love to not expose anything that is not needed and the semantics arg should not be exposed, IMHO.

IMHO using subrule as described in https://docs.google.com/document/d/1RbNC88QieKvBEwir7iV5zZU08AaMlOzxhVkPnmKDedQ/edit#heading=h.9v5sndhcy9u and decomposing the py_binary into parts would be ideal.

@UebelAndre
Copy link
Contributor Author

Playaing devil's advocate here - why would we have to support this API when bazel is working on subrule support? Is it not possible to use the py_binary and friends together with the subrule?

If subrule (which I've not heard about until just now) exists then I can probably do what I want which is to write a rule which creates a py_binary or py_test and there wouldn't be a need for this interface. That being said though, what is time timeline for subrule? Adding py_common would be ideal if there was an unbounded timeline. This is behavior that I and others I know have wanted for a while and the lack of it had lead to very clunky/error-prone macros that ought to be rules.

Aside from that, I am personally not super familiar with that part of the code, so cannot comment on implications or the maintenance burden in the long run. The request in general looks reasonable, but I would love to not expose anything that is not needed and the semantics arg should not be exposed, IMHO.

Do you know who would have this context and could you ping them on this issue? Again, I ultimately don't care what form the implementation takes as long as the result is some publicly accessible API for creating python binaries in custom rules (and is available within the span of a month or two).

@rickeylev
Copy link
Collaborator

I'm +1 on creating some sort of py_common API. It might be easier to discuss the API of it given some more concrete use cases.

subrule timeline

I think it's supposed to be part of Bazel 8? Not sure on it wrt Bazel 7.

exposing semantics

Semantics shouldn't be exposed. It's purely a mechanism to make it somewhat easier for Google to internally patch the code. It's also not a very stable API.

The def py_executable_base_impl(ctx, *, semantics, is_test, inherited_environment = []): signature

This signature is pretty internal, so exposing it as-is would be a bad idea. Exposing something without the semantics, is_test, and inherited_environment args seems pretty reasonable, though. For reference:

  • is_test: Right now, it's only used for C++ related integrations that are only active on the Google side of things (though most are also possible with regular Bazel). It only exists as an arg because, surprisingly, Starlark ctx doesn't actually tell if things are test=true or executable=true (these settings are mutually exclusive). In retrospect, I should have just put a Java helper in to tell the value instead of wiring an arg through. Maybe it should just move to an attribute, which is the next best thing.
  • inherited_environment is for the env_inherit attribute. This attribute is historically test-only, but now with Document behavior of custom env and env_inherit attributes in Starlark rules bazel#12758, regular binaries can also implement it if they so choose. It should probably just graduate into an env_inherit attribute both tests and binaries have, then this arg can go away

re: API considerations:

The main consideration I have is: I don't want otherwise internal refactorings or non-breaking feature additions to be constrained because of a lower-level of API access. Some examples being:

  • If we need to add/remove/change some of the private attributes.
  • Adding a new public attribute.
  • If we need to change the toolchain types an executable needs
  • Really any of the args passed to rule()

What this loosely translates to is: exposing py_executable_base_impl isn't sufficient. What has to be exposed is something like create_executable_rule(), so that all the args passed to rule() can be properly setup.

If there aren't any modifications of the existing attributes/rule-args desired, then exposing something like create_executable_rule() should be mostly straight forward. Maybe the implementation function you pass is called with (ctx, struct_of_stuff) ?

If there are modifications to the existing rule-args, then, well, that might be a bit more messy, but I'm thinking a builder-pattern esq dict-of-args might work OK.

If you look around python/private/common, you can see a broke out various things in an attempt to be composable, and even the tests in tests/base_rules are somewhat setup in a "conformance test" type of way. But, I never got around to refactors to replace semantics and implement pytype type checking rules using a py_common type of approach.

@UebelAndre
Copy link
Contributor Author

I would be happy to try to factor out an interface so consumers of rules_python can also create executables. To provide a concrete example I've created https://github.com/UebelAndre/rules_pytest which has been a tricky set of rules to get right due to the lack of a py_common interface that lets me create a python executable. I'm hoping to do something like the following to convert the series of rules and macro into one single rule which does all the necessary configuration for pytest.

periareon/rules_pytest@main...py_common

The mock interface I was playing with looks like

py_exec_info = py_common.create_executable(
        ctx = ctx,
        toolchain = py_toolchain,
        main = ctx.file._main,
        legacy_create_init = False,
        deps = [pytest_toolchain.pytest] + ctx.attr.deps,
        is_test = True,
    )

where the function signature might look something like

def py_common_create_executable(
    ctx,
    toolchain,
    main,
    legacy_create_init,
    deps,
    is_test,
):
    """Create a python executable

    Args:
        ctx (ctx): The current rule's context object
        toolchain (py_toolchain): The python toolchain
        main (File): The source file to use as the entry point.
        legacy_create_init (bool): Whether to implicitly create empty `__init__.py` files/
        deps (list): A list of python dependencies.
        is_test (bool): Indicates if the executable is a test executable.

    Returns:
        struct: A struct containiging:
            - executable: The executable created.
            - runfiles: All associated runfiles for the executable.
    """

Would you say this is in the ballpark of possibilities? What changes would you make? I think if we can figure out the interface then the internals can change quickly and without impacting users.

github-merge-queue bot pushed a commit that referenced this issue Sep 27, 2024
…#2251)

This add some builders "classes" to make constructing PyInfo, depsets,
and runfiles
a bit cleaner. Previously, building such objects required creating
multiple local
variables for each arg/attribute of the object (e.g. list_of_files and
list_of_depsets
for building a depset). Keeping track of all the variables was verbose
and tedious.

To make it easier, encapsulate that bookkeeping in some objects.
Accumulating parts is
now much easier: builders have an `add()` method that looks at the input
and stores it
in the appropriate place. 

This builders will be exposed to users in an upcoming PR to make
building and merging
PyInfo objects easier.

Work towards #1647
@rickeylev
Copy link
Collaborator

Small update in this area:

We have a design for exposing analysis-time APIs. #2252 is the first installment of this to add a merge_py_infos() function.

The loading-phase side of things isn't addressed yet, though. i.e. how to define a rule with necessary attributes (toolchains, fragments, implicit attributes, exec groups -- all the stuff that rule() needs in order for later ctx.whatever to work). Some of this is mitigated by the analysis-time API design I mention above, but not all.

The basic idea I'm thinking for this is a function that returns a mutable dict of the args that would be passed to rule(). This allows the caller to modify them any way they please, while allowing them to automatically get any "defaults" that a py rule requires. (This idea is inspired by how Bazel internally defined rules in Java: a base Builder class would define the rule, then sub-classes would call various override/add/remove functions to modify it before building it into a rule definition.)

So, something like:

load(":pyrule.bzl", "pyrule")

def create_my_py_library():
  kwargs = pyrule.library_rule_kwargs()
  kwargs["attrs"]["srcs"]["aspects"].append(my_aspect)
  kwargs["toolchains"].append("//extra:toolchain")
  kwargs["attrs"]["deps"]["default"] = "//some:target"
  kwargs["provides"].append(MyInfo)
  kwargs["cfg"] = wrap_with_my_transition(kwargs.get("cfg"))
  kwargs["attrs"]["imports"]["mandatory"] = True
  kwargs["exec_groups"]["py_precompile"]["exec_compatible_with"] = ["@platforms//:linux"]

  # convert the mutable attr dicts to attr objects, etc
  kwargs = py_rule.finalize(kwargs)
  return rule(
    implementation = my_impl,
    **py_rule.finalize(kwargs)
  )

my_py_library = create_my_py_library()

github-merge-queue bot pushed a commit that referenced this issue Oct 7, 2024
This adds a public API for rules (i.e. analysis-phase code) to use code
from rules_python.
The main motivation for this is so that users can propagate PyInfo
without having to know
all the fields of PyInfo and implement the merging logic. With upcoming
PRs adding additional
fields to PyInfo, this becomes much more important.

The way the API is exposed is through a target. There are three reasons
for this:
1. It avoids loading phase costs when the implementation of the API
functions change.
Within Google, this makes changes to rules_python much cheaper and
easier to submit
and revert. This also allows us to worry less about the loading-phase
impact of
   our code.
2. Because a target can have dependencies, it allows us to hide some
details
from users. For example, if we want a flag to affect behavior, we can
add it to the
API target's attributes; users don't have to add it to their rule's
attributes
3. By having the API take the user's `ctx` as an argument, it allows us
to capture it
and use it as part of future API calls (this isn't used now, but gives
us
   flexibility in the future).

Work towards #1647
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants