
Exposing package metadata like entrypoints when running tests #15481

Closed
achimnol opened this issue May 15, 2022 · 12 comments · Fixed by #21062
Labels
backend: Python, enhancement

Comments


achimnol commented May 15, 2022

Is your feature request related to a problem? Please describe.
Currently, pex and pants do not allow editable installs of source packages.
I'd like a way to expose the package metadata (particularly the entry points read via importlib.metadata.entry_points()) inside the pex environment that pytest runs in (./pants test ...).

Describe the solution you'd like

  • Having an option to make editable installations of designated python_distribution() targets inside the pex environment for pytest
  • Having an option to copy BUILD files into the pex environment for pytest
  • Better ideas?

Describe alternatives you've considered
I've written a custom BUILD file parser that scans sibling directories and all subdirectories of the "{buildroot}/src" directory and extracts their entry point mappings. This workaround works well when using the exported venv to run Python code directly.
However, it does not work under ./pants test, because there are no BUILD files or package metadata inside the pex environment for pytest.

Additional context
I'm working on a Python project that enumerates and loads its own (plugin) modules using importlib.metadata.entry_points().
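To make the goal concrete, here is a minimal sketch of the kind of plugin discovery involved. The group name `myapp.plugins` is a made-up placeholder, not something from this project; the version branch accounts for the `importlib.metadata` API change in Python 3.10:

```python
# Minimal sketch of entry-point-based plugin discovery. The group name
# "myapp.plugins" is a hypothetical example, not from this project.
import sys
from importlib.metadata import entry_points

def load_plugins(group: str = "myapp.plugins") -> dict:
    """Map each entry point name in `group` to its loaded object."""
    if sys.version_info >= (3, 10):
        eps = entry_points(group=group)      # selection API (3.10+)
    else:
        eps = entry_points().get(group, [])  # legacy dict-of-lists API
    # ep.load() imports the module and resolves the "module:attr" reference.
    return {ep.name: ep.load() for ep in eps}
```

This only works if the distribution's metadata (notably `entry_points.txt`) is actually installed in the environment, which is exactly what is missing in the pytest sandbox.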

@thejcannon added the `backend: Python` label Jun 7, 2022
@cognifloyd (Member)

I just noticed this. In #18132 I added a rule that generates an entry_points.txt file for use by openstack/stevedore. Effectively, that simulates installing (part of) the python_distribution for tests.

That rule only includes a subset of the entry points. Something like it could probably generate an entry_points.txt file covering all of the entry points in a python_distribution when it is a dependency of the python_tests target.

cognifloyd added a commit that referenced this issue Apr 10, 2023
## About Editable Installs

Editable installs were traditionally done with `pip install
--editable`. They are primarily useful during development, when software
needs access to the entry points metadata.

When [PEP 517](https://peps.python.org/pep-0517/) was adopted, they
punted on how to allow for editable installs. [PEP
660](https://peps.python.org/pep-0660/) extended the PEP 517 backend API
to support building "editable" wheels. Therefore, there is now a
standard way to collect and install the metadata for "editable"
installs, using the "editable" wheel as an interchange between the
backend (which collects the metadata + builds the editable wheel) and
the frontend (which marshals the backend to perform a user-requested
"editable" install).

## Why would we need editable installs in pants?

I need editable installs in pants-exported virtualenvs so that dev tools
outside of pants have access to:
- The locked requirements
- The editable sources on the python path
- The entry points (and any other package metadata that can be loaded
from `dist-info`; entry points are the biggest, most impactful example)

I need to point all the external dev tooling at a virtualenv.
Technically I could export a pex that includes all of the
python-distributions pre-installed and use pex-tools to create a
virtualenv, but then I would have to recreate that venv for every dev
change, which wouldn't be a good dev experience.

One of those dev tools is `nosetest`. I considered using `run` to
support running it, but I am leery of adding all the complex BUILD
machinery to support running a tool that I'm trying to get rid of.
Editable installs are a more generic solution that serves my current
needs while allowing for use in other scenarios.

This PR comes in part from
#16621 (comment)

## Overview of this PR

### Scope & Future work

This PR focuses on adding editable installs to exported virtualenvs. Two
other issues ask for editable installs while running tests:
- #11386
- #15481

We can probably reuse a significant portion of this to generate editable
wheels for use in testing as well. Parts of this code will need to be
refactored to support that use case. But we also have to figure out the
UX for how users will define dependencies on a `python_distribution`
when they want an editable install instead of the built wheel to show up
in the sandbox. Anyway, addressing that is out of scope for this PR.

### New option `[export].py_editables_in_resolves` (a `StrListOption`)

This option allows user resolves to opt in to using the editable
installs. After
[consulting](https://pantsbuild.slack.com/archives/C0D7TNJHL/p1680810411706569?thread_ts=1680809713.310039&cid=C0D7TNJHL)
with @kaos, I decided to add an option for this instead of always trying
to generate/install the editable wheels.

> `python_distribution` does not have a `resolve` field. So figuring out
> which resolve a `python_distribution` belongs to can be expensive:
> calculating the owned deps of all distributions, and for each
> distribution, looking through those deps until one of them has a resolve
> field, and using that for that dist's resolve.
>
> Plus there's the cost of building the PEP-660 wheels: if the
> configured PEP-517 build backend doesn't support the PEP-660 methods,
> then it falls back to a method that is, sadly, optional in PEP-517. If
> that method isn't there, then it falls back to telling that backend to
> build the whole wheel, and then it extracts just the dist-info directory
> from it and discards the rest.
>
> So, installing these editable wheels isn't free. It'll slow down the
> export, though I'm not sure by how much.

For StackStorm, I plan to set this in `pants.toml` for the default
resolve that has python_distributions.

Even without this option, I tried to bail out early if there were no
`python_distribution`s to install.
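For illustration, opting a resolve in would look something like this in `pants.toml` (the resolve name `python-default` is a placeholder; use whichever of your resolves contains `python_distribution` targets):

```toml
[export]
# Build and install PEP-660 editable wheels when exporting this resolve.
py_editables_in_resolves = ["python-default"]
```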

### Installing editable wheels for exports

I added this feature to the new export code path, which requires using
`export --resolve=`. The legacy code path, which uses CLI specs (`export
<address specs>`), did not change at all. I also ignored `tool`
resolves, which cannot have any relevant dists (and are deprecated
anyway). Also, this is only for `mutable_virtualenv` exports,
as we need to modify the virtualenv to install the editable wheels in the
venv after pex creates it from the lockfile.

When exporting a user resolve, we do a `Get(EditableLocalDists,
EditableLocalDistsRequest(resolve=resolve))`: _I'll skip over exactly
how this builds the wheels for now so this section can focus on how
installing works._


https://github.com/pantsbuild/pants/blob/f3a4620e81713f5022bf9a2dd1a4aa5ca100d1af/src/python/pants/backend/python/goals/export.py#L373-L379

As described in the commit message of
b5aa26a, I tried a variety of methods
for installing the editable wheels using pex. Ultimately, the best
approach I came up with is telling pex that the digest containing our
editable wheels is `sources` when building the `requirements.pex` used to
populate the venv, so that they land in the virtualenv (even
though they land as plain wheel files).

Then we run `pex-tools` in a `PostProcessingCommand` to create and
populate the virtualenv, just as we did before this PR.

Once the virtualenv is created, we add 3 more `PostProcessingCommands`
to actually do the editable install. In this step, Pants is essentially
acting as the PEP-660 frontend, though we use pip for some of the heavy
lifting. These commands:
1. move the editable wheels out of the virtualenv lib directory to the
temp dir that gets deleted at the end of the export;
2. use pip to install all of the editable wheels (each of which contains
a `.pth` file that injects the source dir into `sys.path` and a
`.dist-info` directory with dist metadata such as entry points); and
3. replace some of the pip-generated install metadata
(`*.dist-info/direct_url.json`) with our own, so that we comply with
PEP-660 and mark the install as editable with a file URL pointing at
the sources in the build root (vs in a sandbox).

Now, anything that uses the exported venv should have access to the
standardized package metadata (in `.dist-info`) and the relevant source
roots should be automatically included in `sys.path`.
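For reference, the `direct_url.json` substituted in step 3 follows the layout defined by PEP 610 and extended by PEP 660; a minimal sketch (the build-root path is a placeholder):

```python
# Sketch of a PEP 660-style direct_url.json marking an editable install.
# "/path/to/build_root" is a placeholder, not an actual Pants path.
import json

def editable_direct_url(build_root: str) -> str:
    data = {
        "url": "file://" + build_root,    # file URL pointing at the real sources
        "dir_info": {"editable": True},   # marks this install as editable
    }
    return json.dumps(data, indent=2)
```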

### Building PEP-660 editable wheels

The logic that actually builds the editable wheels is in
`pants.backend.python.util_rules.local_dists_pep660`. Building these
wheels requires the same chroot that pants uses to build regular wheels
and sdists. So, I refactored the rule in `util_rules.setup_py` so that I
could reuse the part that builds the `DistBuildRequest`.

These `local_dists_pep660` rules do approximately this, starting with the
rule called in export:
- `Get(EditableLocalDists, EditableLocalDistsRequest(resolve=resolve))`
  uses rule `build_editable_local_dists`
  - injected arg: `ResolveSortedPythonDistributionTargets` comes from
    rule `sort_all_python_distributions_by_resolve`
  - injected arg: `AllPythonDistributionTargets` comes from rule
    `find_all_python_distributions`
- `Get(LocalDistPEP660Wheels, PythonDistributionFieldSet.create(dist))`
  for each dist in the resolve uses rule
  `isolate_local_dist_pep660_wheels`
  - creates a `DistBuildRequest` using the `create_dist_build_request`
    method I exposed in `util_rules.setup_py`
  - `Get(PEP660BuildResult, DistBuildRequest)` uses rule
    `run_pep660_build`
    - generates the `.pth` file that goes in the editable wheel
    - runs a PEP 517 build backend wrapper script I wrote, which:
      - uses the PEP 517 build backend configured for the
        `python_distribution` to generate the `.dist-info` directory
      - generates the `WHEEL` and `RECORD` files to build a conformant
        wheel file
      - includes the `.pth` file previously generated (and placed in the
        sandbox with the wrapper script)
      - uses `zipfile` to build the wheel (using a vendored+modified
        function from the `wheel` package)
      - prints the path to the generated wheel
  - collects the generated editable wheel into a digest and collects
    metadata about the digest, similar to how the `local_dists` rules do
- merges the editable wheel digests for all of the `python_distribution`
  targets; this gets wrapped in `EditableLocalDists`

Much of the rule logic was based on (copied, then modified from)
`pants.backend.python.util_rules.dists` and
`pants.backend.python.util_rules.local_dists`.

cognifloyd commented May 25, 2024

We need a way for python_test[s] to depend on one, more, or all entry points of a python_distribution.

Each entry point is a subset of the python_distribution, specifically it is:

  • metadata about the entry point
  • a dependency on the python file, class, or function that implements the entry point

So, I can see a couple of approaches to modeling a dependency like this. In either case, some codegen would kick in to add the entry points metadata file(s) to the sandbox when depending on them. Depending on the distribution itself should still yield the wheel.

Option 1: add field for entry_points dependencies on python_test[s] targets

This would be similar to the stevedore_namespaces field causing the test to depend on the individual entry point implementations. We would probably need to extend the address syntax so we can specify which (or all) of the entry points to depend on.

Option 2: turn the python_distribution into a target generator

Unlike option 1, we can reuse the existing address syntax.

The python_distribution target generates a python_entry_point target for each entry point. That generated target has an explicit dependency on the entry point implementation.
It also generates a python_entry_points target that depends on all of the generated python_entry_point targets.
The python_distribution itself depends on the generated python_entry_points target instead of getting an inferred dependency on the entry points implementations.

In this way, you could depend on:

  • path/to/python_distribution:tgt_name#some.entry.point.name
  • path/to/python_distribution:tgt_name#console_scripts.name
  • path/to/python_distribution:tgt_name#all_entry_points

Are generated targets supposed to depend on the generator? Or can the generator depend on its generated targets?
Are there any restrictions on the characters used in the generated targets' names?


cognifloyd commented May 25, 2024

I'm leaning towards turning python_distribution into a target generator.

But my thinking was slightly muddled in my last comment. There are two levels of entry point: group and name. So, maybe you could depend on all entry points in a group, or on an individual group+name.

Using a target in the root, addresses might be:

  • //:python_distribution_name#console_scripts
  • //:python_distribution_name#console_scripts[foo-bar]
  • //:python_distribution_name#console_scripts|foo-bar
  • //:python_distribution_name#console_scripts/foo-bar

Or for custom groups like stevedore namespaces:

  • //:python_distribution_name#st2common.runners.runner
  • //:python_distribution_name#st2common.runners.runner[action-chain]
  • //:python_distribution_name#st2common.runners.runner|action-chain
  • //:python_distribution_name#st2common.runners.runner/action-chain

Which option do you like better?
A. [...]
B. |
C. /

I think I like C best.


kaos commented May 25, 2024

Why do you need to pick apart the distribution?
Can't you either a) depend on just the code for the entry point, sidestepping the distribution, or b) depend on the whole distribution and get all its entry points, but only use whatever is required for the test?

@cognifloyd (Member)

Why do you need to pick apart the distribution?

Because I just need the entry points metadata in the sandbox; I don't want to make all my tests depend on the whole distribution and all its dependencies, as that would negate the benefits of fine-grained caching.

Can't you either a) depend on just the code for the entry point side stepping the distribution,

That is only half of what I need. Somehow the entry points metadata file also needs to get written to the sandbox. Effectively, that file "registers" plugins (like flake8 plugins, stevedore extensions, etc.) so they can be discovered at runtime via pkg_resources.
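For context, the file in question is a small INI-style `entry_points.txt` inside the distribution's `.dist-info` directory. The `st2common.runners.runner` group appears elsewhere in this discussion; the entry point names and module paths below are invented examples:

```ini
[console_scripts]
foo-bar = mypkg.cli:main

[st2common.runners.runner]
action-chain = mypkg.runners.action_chain:get_runner
```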

or b) depend on the whole distribution and get all its entry points but only use what ever is being required for the test?

I tried this, and got some weird errors about 3rd-party dependencies missing from the sandbox. I'm not entirely sure what is supposed to happen if you add a python_distribution in the dependencies field of a python_tests target. I thought, incorrectly, that it put the wheel in the sandbox so you can test installing the wheel itself; according to the docs, that behavior only occurs if the dependency is registered via the runtime_package_dependencies field. So, does depending on a python_distribution pull in all of its transitive dependencies, or does it actually install the python_distribution in the sandbox? If it only pulls in the transitive dependencies, that is not enough, as I need the entry_points.txt file to be installed in the sandbox. If it actually installs the distribution, that's overkill, as it makes the test depend on the entire wheel and all its source files instead of just the entry point(s) under test.

Outside of pants, this is achieved by doing an editable install of the sources. That generates/installs the entry_points.txt file and other metadata files while leaving the sources in their original directory. Pants can export a venv with editable installs of the sources, so we have logic in pants that does something similar. But pants does not use editable installs internally; it manipulates PYTHONPATH and the PEX equivalent to make first-party sources available in the sandbox. So we need some way to get the metadata that would be installed with the wheel, or with an editable wheel, into the sandbox. This issue is specifically about getting the entry_points.txt file into the sandbox.


cognifloyd commented Jun 6, 2024

Aargh. Turning PythonDistribution into a TargetGenerator has a significant downside. The python_distribution itself cannot be parametrized if it is a generator:

raise InvalidFieldException(
    f"Only fields which will be moved to generated targets may be parametrized, "
    f"so target generator {address} (with type {target_type.alias}) cannot "
    f"parametrize the {generator_fields_parametrized_text} {noun}."
)

Maybe I can use synthetic targets instead of generated targets? edit: Nope. Synthetic targets happen too early to be useful for this.

edit: maybe a context_aware_object_factory... But then I would have to move python_distribution to a different alias, which would be ugly.


cognifloyd commented Jun 7, 2024

I can't find a way to break python_distribution into multiple targets so that the entry_points metadata is addressable. So, I guess that leaves creating a custom Dependencies-like field on python_test/python_tests that allows specifying a python_distribution address AND an entry point group, or group/name, to depend on (or all of them).

Maybe:

python_tests(
    entry_point_dependencies={
        "//address/to:python_distro_tgt_1": ["*"],  # all entry points
        # only one group of entry points
        "//address/to:python_distro_tgt_2": ["console_scripts"],
        "//address/to:python_distro_tgt_4": ["st2common.runners.runner"],
        # or multiple groups of entry points
        "//address/to:python_distro_tgt_5": ["console_scripts", "st2common.runners.runner"],
        # or 1+ individual entry points
        "//address/to:python_distro_tgt_6": ["console_scripts/foo-bar"],
        "//address/to:python_distro_tgt_7": ["console_scripts/foo-bar", "console_scripts/baz"],
        "//address/to:python_distro_tgt_8": ["st2common.runners.runner/action-chain", "console_scripts/foo-bar"],
    }
)


kaos commented Jun 7, 2024

Perhaps this is a good time to crack the annotated edges (dependencies) feature that pops up every so often. The long-standing issue has been that we can't classify dependencies, such as for the JVM and its runtime vs. test dependencies and whatnot.

I think this could fit in there as well.

@cognifloyd (Member)

Perhaps this is a good time to crack the annotated edges (dependencies) feature that pops up every so often. The long-standing issue has been that we can't classify dependencies, such as for the JVM and its runtime vs. test dependencies and whatnot.

I think this could fit in there as well.

Agreed. Annotated dependencies sounds great.

For python_distribution, that would mean recording one or more ways to depend on a python_distribution:

  • you need the wheel at runtime (currently covered by the runtime_package_dependencies)
  • you just need all the transitive dependencies (the python_sources etc) to be available in the sandbox
  • you need the entry_points, or a subset of them, to be "installed" in the sandbox so that pkg_resources can detect them, or so that a test can do something like run one of the console_scripts as a subprocess.

@cognifloyd (Member)

I'm adding an entry_point_dependencies field to python_test/python_tests targets in #21062.
I believe that will close this request. Does that interface make sense + work for the use cases in this issue?

@cognifloyd (Member)

Also, this issue seems to be a duplicate of #11386. Right?
