
Exposing package metadata like entrypoints when running tests #15481

Closed
achimnol opened this issue May 15, 2022 · 12 comments · Fixed by #21062
Labels
backend: Python, enhancement

Comments


achimnol commented May 15, 2022

Is your feature request related to a problem? Please describe.
Currently, pex and pants do not allow editable installs of source packages.
I'd like a way to expose the package metadata (particularly the entry points read via importlib.metadata.entry_points()) inside the pex environment that pytest runs in (./pants test ...).

Describe the solution you'd like

  • Having an option to make editable installations of designated python_distribution() targets inside the pex environment for pytest
  • Having an option to copy BUILD files into the pex environment for pytest
  • Better ideas?

Describe alternatives you've considered
I've written a custom BUILD file parser that scans sibling directories and all subdirectories of the "{buildroot}/src" directory and extracts their entry point mappings. This workaround works well when using the exported venv to run Python code directly.
However, it does not work under ./pants test, because there are no BUILD files or package metadata inside the pex environment for pytest.

Additional context
I'm working on a Python project that enumerates and loads its own (plugin) modules using importlib.metadata.entry_points().
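To make the goal concrete, here is a minimal sketch of the kind of plugin discovery involved. The group name `myapp.plugins` is a made-up placeholder, not something from this project; the version branch accounts for the `importlib.metadata` API change in Python 3.10:

```python
# Minimal sketch of entry-point-based plugin discovery. The group name
# "myapp.plugins" is a hypothetical example, not from this project.
import sys
from importlib.metadata import entry_points

def load_plugins(group: str = "myapp.plugins") -> dict:
    """Map each entry point name in `group` to its loaded object."""
    if sys.version_info >= (3, 10):
        eps = entry_points(group=group)      # selection API (3.10+)
    else:
        eps = entry_points().get(group, [])  # legacy dict-of-lists API
    # ep.load() imports the module and resolves the "module:attr" reference.
    return {ep.name: ep.load() for ep in eps}
```

This only works if the distribution's metadata (notably `entry_points.txt`) is actually installed in the environment, which is exactly what is missing in the pytest sandbox.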

@thejcannon added the `backend: Python` label Jun 7, 2022
@cognifloyd (Member)

I just noticed this. In #18132 I added a rule that generates an entry_points.txt file for use by openstack/stevedore. Effectively, that simulates installing (part of) the python_distribution for tests.

That rule only includes a subset of the entry points. Something like it could probably generate an entry_points.txt file covering all of the entry points in a python_distribution when it is a dependency of the python_tests target.

cognifloyd added a commit that referenced this issue Apr 10, 2023
## About Editable Installs

Editable installs were traditionally done with `pip install
--editable`. They are primarily useful during development, when software
needs access to the entry points metadata.

When [PEP 517](https://peps.python.org/pep-0517/) was adopted, they
punted on how to allow for editable installs. [PEP
660](https://peps.python.org/pep-0660/) extended the PEP 517 backend API
to support building "editable" wheels. Therefore, there is now a
standard way to collect and install the metadata for "editable"
installs, using the "editable" wheel as an interchange between the
backend (which collects the metadata + builds the editable wheel) and
the frontend (which marshals the backend to perform a user-requested
"editable" install).

## Why would we need editable installs in pants?

I need editable installs in pants-exported virtualenvs so that dev tools
outside of pants have access to:
- The locked requirements
- The editable sources on the python path
- The entry points (and any other package metadata that can be loaded
from `dist-info`; entry points are the biggest, most impactful example)

I need to point all the external dev tooling at a virtualenv.
Technically I could export a pex that includes all of the
python-distributions pre-installed and use pex-tools to create a
virtualenv, but then I would have to recreate that venv for every dev
change, which wouldn't be a good dev experience.

One of those dev tools is `nosetest`. I considered using `run` to
support running it, but I am leery of adding all the complex BUILD
machinery to support running a tool that I'm trying to get rid of.
Editable installs are a more generic solution that serves my current
needs while allowing for use in other scenarios.

This PR comes in part from
#16621 (comment)

## Overview of this PR

### Scope & Future work

This PR focuses on adding editable installs to exported virtualenvs. Two
other issues ask for editable installs while running tests:
- #11386
- #15481

We can probably reuse a significant portion of this to generate editable
wheels for use in testing as well. Parts of this code will need to be
refactored to support that use case. But we also have to figure out the
UX for how users will define dependencies on a `python_distribution`
when they want an editable install instead of the built wheel to show up
in the sandbox. Anyway, addressing that is out of scope for this PR.

### New option `[export].py_editables_in_resolves` (a `StrListOption`)

This option allows user resolves to opt in to using the editable
installs. After
[consulting](https://pantsbuild.slack.com/archives/C0D7TNJHL/p1680810411706569?thread_ts=1680809713.310039&cid=C0D7TNJHL)
with @kaos, I decided to add an option for this instead of always trying
to generate/install the editable wheels.

> `python_distribution` does not have a `resolve` field. So figuring out
> which resolve a `python_distribution` belongs to can be expensive:
> calculating the owned deps of all distributions, and for each
> distribution, looking through those deps until one of them has a resolve
> field, and using that for that dist's resolve.
>
> Plus there's the cost of building the PEP-660 wheels: if the
> configured PEP-517 build backend doesn't support the PEP-660 methods,
> then it falls back to a method that is, sadly, optional in PEP-517. If
> that method isn't there, then it falls back to telling that backend to
> build the whole wheel, and then it extracts just the dist-info directory
> from it and discards the rest.
>
> So, installing these editable wheels isn't free. It'll slow down the
> export, though I'm not sure by how much.

For StackStorm, I plan to set this in `pants.toml` for the default
resolve that has python_distributions.

Even without this option, I tried to bail out early if there were no
`python_distribution`s to install.
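For illustration, opting a resolve in would look something like this in `pants.toml` (the resolve name `python-default` is a placeholder; use whichever of your resolves contains `python_distribution` targets):

```toml
[export]
# Build and install PEP-660 editable wheels when exporting this resolve.
py_editables_in_resolves = ["python-default"]
```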

### Installing editable wheels for exports

I added this feature to the new export code path, which requires using
`export --resolve=`. The legacy code path, which uses CLI specs (`export
<address specs>`), did not change at all. I also ignored `tool`
resolves, which cannot have any relevant dists (and are deprecated
anyway). Also, this is only for `mutable_virtualenv` exports,
as we need to modify the virtualenv to install the editable wheels in the
venv after pex creates it from the lockfile.

When exporting a user resolve, we do a `Get(EditableLocalDists,
EditableLocalDistsRequest(resolve=resolve))`: _I'll skip over exactly
how this builds the wheels for now so this section can focus on how
installing works._


https://github.com/pantsbuild/pants/blob/f3a4620e81713f5022bf9a2dd1a4aa5ca100d1af/src/python/pants/backend/python/goals/export.py#L373-L379

As described in the commit message of
b5aa26a, I tried a variety of methods
for installing the editable wheels using pex. Ultimately, the best
approach I came up with is telling pex that the digest containing our
editable wheels is `sources` when building the `requirements.pex` used to
populate the venv, so that they land in the virtualenv (even
though they land as plain wheel files).

Then we run `pex-tools` in a `PostProcessingCommand` to create and
populate the virtualenv, just as we did before this PR.

Once the virtualenv is created, we add 3 more `PostProcessingCommands`
to actually do the editable install. In this step, Pants is essentially
acting as the PEP-660 frontend, though we use pip for some of the heavy
lifting. These commands:
1. move the editable wheels out of the virtualenv lib directory to the
temp dir that gets deleted at the end of the export;
2. use pip to install all of the editable wheels (each of which contains
a `.pth` file that injects the source dir into `sys.path` and a
`.dist-info` directory with dist metadata such as entry points); and
3. replace some of the pip-generated install metadata
(`*.dist-info/direct_url.json`) with our own, so that we comply with
PEP-660 and mark the install as editable with a file URL pointing at
the sources in the build root (vs in a sandbox).

Now, anything that uses the exported venv should have access to the
standardized package metadata (in `.dist-info`) and the relevant source
roots should be automatically included in `sys.path`.
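For reference, the `direct_url.json` substituted in step 3 follows the layout defined by PEP 610 and extended by PEP 660; a minimal sketch (the build-root path is a placeholder):

```python
# Sketch of a PEP 660-style direct_url.json marking an editable install.
# "/path/to/build_root" is a placeholder, not an actual Pants path.
import json

def editable_direct_url(build_root: str) -> str:
    data = {
        "url": "file://" + build_root,    # file URL pointing at the real sources
        "dir_info": {"editable": True},   # marks this install as editable
    }
    return json.dumps(data, indent=2)
```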

### Building PEP-660 editable wheels

The logic that actually builds the editable wheels is in
`pants.backend.python.util_rules.local_dists_pep660`. Building these
wheels requires the same chroot that pants uses to build regular wheels
and sdists. So, I refactored the rule in `util_rules.setup_py` so that I
could reuse the part that builds the `DistBuildRequest`.

These `local_dists_pep660` rules do approximately this, starting with the
rule called in export:
- `Get(EditableLocalDists, EditableLocalDistsRequest(resolve=resolve))`
  uses rule `build_editable_local_dists`
  - injected arg: `ResolveSortedPythonDistributionTargets` comes from
    rule `sort_all_python_distributions_by_resolve`
  - injected arg: `AllPythonDistributionTargets` comes from rule
    `find_all_python_distributions`
- `Get(LocalDistPEP660Wheels, PythonDistributionFieldSet.create(dist))`
  for each dist in the resolve uses rule
  `isolate_local_dist_pep660_wheels`
  - creates a `DistBuildRequest` using the `create_dist_build_request`
    method I exposed in `util_rules.setup_py`
  - `Get(PEP660BuildResult, DistBuildRequest)` uses rule
    `run_pep660_build`
    - generates the `.pth` file that goes in the editable wheel
    - runs a PEP 517 build backend wrapper script I wrote, which:
      - uses the PEP 517 build backend configured for the
        `python_distribution` to generate the `.dist-info` directory
      - generates the `WHEEL` and `RECORD` files to build a conformant
        wheel file
      - includes the `.pth` file previously generated (and placed in the
        sandbox with the wrapper script)
      - uses `zipfile` to build the wheel (using a vendored+modified
        function from the `wheel` package)
      - prints the path to the generated wheel
  - collects the generated editable wheel into a digest and collects
    metadata about the digest, similar to how the `local_dists` rules do
- merges the editable wheel digests for all of the `python_distribution`
  targets; this gets wrapped in `EditableLocalDists`

Much of the rule logic was based on (copied, then modified from)
`pants.backend.python.util_rules.dists` and
`pants.backend.python.util_rules.local_dists`.

cognifloyd commented May 25, 2024

We need a way for python_test[s] to depend on one, more, or all entry points of a python_distribution.

Each entry point is a subset of the python_distribution, specifically it is:

  • metadata about the entry point
  • a dependency on the python file, class, or function that implements the entry point

So, I can see a couple of approaches to modeling a dependency like this. In either case, some codegen would kick in to add the entry points metadata file(s) to the sandbox when depending on them. Depending on the distribution itself should still yield the wheel.

Option 1: add field for entry_points dependencies on python_test[s] targets

This would be similar to the stevedore_namespaces field causing the test to depend on the individual entry point implementations. We would probably need to extend the address syntax so we can specify which (or all) of the entry points to depend on.

Option 2: turn the python_distribution into a target generator

Unlike option 1, we can reuse the existing address syntax.

The python_distribution target generates a python_entry_point target for each entry point. That generated target has an explicit dependency on the entry point implementation.
It also generates a python_entry_points target that depends on all of the generated python_entry_point targets.
The python_distribution itself depends on the generated python_entry_points target instead of getting an inferred dependency on the entry points implementations.

In this way, you could depend on:

  • path/to/python_distribution:tgt_name#some.entry.point.name
  • path/to/python_distribution:tgt_name#console_scripts.name
  • path/to/python_distribution:tgt_name#all_entry_points

Are generated targets supposed to depend on the generator? Or can the generator depend on its generated targets?
Are there any restrictions on the characters used in the generated targets' names?


cognifloyd commented May 25, 2024

I'm leaning towards turning python_distribution into a target generator.

But my thinking was slightly muddled in my last comment. There are two levels of entry point: group and name. So, maybe you could depend on all entry points in a group, or on an individual group+name.

Using a target in the root, addresses might be:

  • //:python_distribution_name#console_scripts
  • //:python_distribution_name#console_scripts[foo-bar]
  • //:python_distribution_name#console_scripts|foo-bar
  • //:python_distribution_name#console_scripts/foo-bar

Or for custom groups like stevedore namespaces:

  • //:python_distribution_name#st2common.runners.runner
  • //:python_distribution_name#st2common.runners.runner[action-chain]
  • //:python_distribution_name#st2common.runners.runner|action-chain
  • //:python_distribution_name#st2common.runners.runner/action-chain

Which option do you like better?
A. [...]
B. |
C. /

I think I like C best.


kaos commented May 25, 2024

Why do you need to pick apart the distribution?
Can't you either a) depend on just the code for the entry point, sidestepping the distribution, or b) depend on the whole distribution and get all its entry points, but only use whatever is required for the test?

@cognifloyd (Member)

Why do you need to pick apart the distribution?

Because I just need the entry points metadata in the sandbox; I don't want to make all my tests depend on the whole distribution and all its dependencies, as that would negate the benefits of fine-grained caching.

Can't you either a) depend on just the code for the entry point side stepping the distribution,

That is only half of what I need. Somehow the entry points metadata file also needs to get written to the sandbox. Effectively, that file "registers" plugins (like flake8 plugins, stevedore extensions, etc.) so they can be discovered at runtime via pkg_resources.
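For context, the file in question is a small INI-style `entry_points.txt` inside the distribution's `.dist-info` directory. The `st2common.runners.runner` group appears elsewhere in this discussion; the entry point names and module paths below are invented examples:

```ini
[console_scripts]
foo-bar = mypkg.cli:main

[st2common.runners.runner]
action-chain = mypkg.runners.action_chain:get_runner
```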

or b) depend on the whole distribution and get all its entry points but only use what ever is being required for the test?

I tried this, and got some weird errors about 3rd-party dependencies missing from the sandbox. I'm not entirely sure what is supposed to happen if you add a python_distribution in the dependencies field of a python_tests target. I thought, incorrectly, that it put the wheel in the sandbox so you can test installing the wheel itself; according to the docs, that behavior only occurs if the dependency is registered via the runtime_package_dependencies field. So, does depending on a python_distribution pull in all of its transitive dependencies, or does it actually install the python_distribution in the sandbox? If it only pulls in the transitive dependencies, that is not enough, as I need the entry_points.txt file to be installed in the sandbox. If it actually installs the distribution, that's overkill, as it makes the test depend on the entire wheel and all its source files instead of just the entry point(s) under test.

Outside of pants, this is achieved by doing an editable install of the sources. That generates/installs the entry_points.txt file and other metadata files while leaving the sources in their original directory. Pants can export a venv with editable installs of the sources, so we have logic in pants that does something similar. But pants does not use editable installs internally; it manipulates PYTHONPATH and the PEX equivalent to make first-party sources available in the sandbox. So we need some way to get the metadata that would be installed with the wheel, or with an editable wheel, into the sandbox. This issue is specifically about getting the entry_points.txt file into the sandbox.


cognifloyd commented Jun 6, 2024

Aargh. Turning PythonDistribution into a TargetGenerator has a significant downside. The python_distribution itself cannot be parametrized if it is a generator:

raise InvalidFieldException(
    f"Only fields which will be moved to generated targets may be parametrized, "
    f"so target generator {address} (with type {target_type.alias}) cannot "
    f"parametrize the {generator_fields_parametrized_text} {noun}."
)

Maybe I can use synthetic targets instead of generated targets? edit: Nope. Synthetic targets happen too early to be useful for this.

edit: maybe a context_aware_object_factory... But then I would have to move python_distribution to a different alias, which would be ugly.


cognifloyd commented Jun 7, 2024

I can't find a way to break python_distribution into multiple targets so that the entry_points metadata is addressable. So, I guess that leaves creating a custom Dependencies-like field on python_test/python_tests that allows specifying a python_distribution address AND an entry point group, or group/name, to depend on (or all of them).

Maybe:

python_tests(
    entry_point_dependencies={
        "//address/to:python_distro_tgt_1": ["*"],  # all entry points
        # only one group of entry points
        "//address/to:python_distro_tgt_2": ["console_scripts"],
        "//address/to:python_distro_tgt_4": ["st2common.runners.runner"],
        # or multiple groups of entry points
        "//address/to:python_distro_tgt_5": ["console_scripts", "st2common.runners.runner"],
        # or 1+ individual entry points
        "//address/to:python_distro_tgt_6": ["console_scripts/foo-bar"],
        "//address/to:python_distro_tgt_7": ["console_scripts/foo-bar", "console_scripts/baz"],
        "//address/to:python_distro_tgt_8": ["st2common.runners.runner/action-chain", "console_scripts/foo-bar"],
    }
)


kaos commented Jun 7, 2024

Perhaps this is a good time to crack the annotated edges (dependencies) feature that pops up every so often. The long-standing issue has been that we can't classify dependencies, such as for the JVM and its runtime vs. test dependencies and whatnot.

I think this could fit in there as well.

@cognifloyd (Member)

Perhaps this is a good time to crack the annotated edges (dependencies) feature that pops up every so often. The long-standing issue has been that we can't classify dependencies, such as for the JVM and its runtime vs. test dependencies and whatnot.

I think this could fit in there as well.

Agreed. Annotated dependencies sounds great.

For python_distribution, that would mean recording one or more ways to depend on a python_distribution:

  • you need the wheel at runtime (currently covered by the runtime_package_dependencies)
  • you just need all the transitive dependencies (the python_sources etc) to be available in the sandbox
  • you need the entry_points, or a subset of them, to be "installed" in the sandbox so that pkg_resources can detect them, or so that a test can do something like run one of the console_scripts as a subprocess.

@cognifloyd (Member)

I'm adding an entry_point_dependencies field to python_test/python_tests targets in #21062.
I believe that will close this request. Does that interface make sense + work for the use cases in this issue?

@cognifloyd (Member)

Also, this issue seems to be a duplicate of #11386. Right?
