Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add backend for projects that use openstack/stevedore #18132

Merged
merged 15 commits into from
Feb 3, 2023

Conversation

cognifloyd
Copy link
Member

@cognifloyd cognifloyd commented Jan 31, 2023

This adds pants.backend.python.framework.stevedore.

This was originally developed for the StackStorm project in StackStorm/st2#5869 but it seems general enough to include in pants itself. Other people on slack agree, so here's the PR.

What is openstack/stevedore?

Python projects can use openstack/stevedore for dynamic (discovered at runtime) extensions/plugins. I find this page in the stevedore docs to be most helpful in understanding what stevedore does. Here are some of the key points:

Stevedore uses setuptools entry points to define and load plugins. An entry point is standard way to refer to a named object defined inside a Python module or package. The name can be a reference to any class, function, or instance, as long as it is created when the containing module is imported (i.e., it needs to be a module-level global).

Entry points are registered using a name in a namespace.

Entry point names are usually considered user-visible. ... Because they are public, names are typically as short as possible while remaining descriptive. ...

Namespaces, on the other hand, are an implementation detail, and while they are known to developers they are not usually exposed to users. The namespace naming syntax looks a lot like Python’s package syntax (a.b.c) but namespaces do not correspond to Python packages. ...

Each namespace is owned by the code that consumes the plugins and is used to search for entry points. ...

About the stevedore pants plugin

The primary focus of this plugin is to facilitate testing projects that use openstack/stevedore by adding the extensions/plugins to the pytest sandbox and ensuring they are "discoverable" by stevedore.

How does this plugin facilitate testing code that uses stevedore?

Since a stevedore extension does not use the python import system, pants does not know how to infer dependencies on any of this code. We need to extend the dependency inference system to teach it how to handle stevedore extensions. This PR adds a pants plugin that strives to accomplish that by allowing us to:

  • use python_distribution(entry_points={...}, ...) to define the stevedore namespaces and plugins,
    • differentiate namespaces in the entry_points field with a special stevedore_namespace object.
  • use stevedore_namespaces fields to record dependencies on extensions in the given namespaces.

So far, I've only added the stevedore_namespaces field to the python_test and python_tests targets as testing has been my primary focus. We could add it to other targets later if anyone finds that helpful.
When a target has the stevedore_namespaces field, this plugin will:

  1. look up all of the python_distribution targets with an entry_points field that have stevedore_namespace tagged keys.
  2. add/infer dependencies from the test targets to all of the python code that provides the entry points in the required namespaces; in other words, we infer dependencies on a subset of the python_distribution target, not on the python_distribution target itself.
  3. generate a {module_path}.egg-info/entry_points.txt file in the pytest sandbox for each relevant python_distribution target (the entry_points.txt file will only contain entry_points for the required namespaces).

For example, if an project used the namespace st2common.runners.runner, then we could set stevedore_namespaces=["st2common.runners.runner"] on a python_tests() target. Then pants will include all of he python code that provides the named entry points in those namespaces. And then the generated entry_points.txt files make that python code appear to be "installed" (not just added to PYTHONPATH), thus allowing stevedore to discover the entry points and load the plugins during the tests.

This backend provides dependency inference for apps that use
openstack/stevedore to load plugins at runtime.

This was originally developed for the StackStorm project:
StackStorm/st2#5869
@@ -108,7 +108,7 @@ remote = "//:buildgrid_remote"

[tailor]
build_file_header = """\
# Copyright 2022 Pants project contributors (see CONTRIBUTORS.md).
# Copyright 2023 Pants project contributors (see CONTRIBUTORS.md).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this was out-of-date when I tried to use tailor to add new BUILD files. That could probably be a separate 1-character PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer if the template supported some basic placeholders, like # Copyright {year} Pants project... to avoid this issue next year :P

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cognifloyd cognifloyd marked this pull request as ready for review January 31, 2023 03:48
@kaos
Copy link
Member

kaos commented Jan 31, 2023

the docs suggest it has some assumptions about pointing to pex targets, which only really makes sense for the well known console_scripts and gui_scripts "namespaces" (in stevedore's terms) or "groups" (in setuptool's terms). But stevedore extensions are not necessarily "runnable", so I'm not sure how that assumption might interact with this.

We need to clarify the docs if it is not apparent that you can use the entry_points field with any entries as usual. The mention of pex targets is just to show that you may refer to pex_binary targets to pick up their entry point to avoid repeating the same entry point twice.

In this case, it may make sense to have the entry_points closer to the source where they come from, so could make sense as-is. 🤷🏽

Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

I've made a first pass over. Overall lgtm!

Will take a second pass when I've had some time reading up on stevedore particulars.

@@ -108,7 +108,7 @@ remote = "//:buildgrid_remote"

[tailor]
build_file_header = """\
# Copyright 2022 Pants project contributors (see CONTRIBUTORS.md).
# Copyright 2023 Pants project contributors (see CONTRIBUTORS.md).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer if the template supported some basic placeholders, like # Copyright {year} Pants project... to avoid this issue next year :P

# select python_tests targets with stevedore_namespaces field
return (
target.has_field(StevedoreNamespacesField)
and target.get(StevedoreNamespacesField).value is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to Pants maintainers, not particularly for this PR.

As target.get() will return a default valued field if the target doesn't have one, checking for target.has_field() is then only an optimization to avoid creating a default instance, however at the cost of a double field lookup. Maybe worth it in this case to avoid thrashing the GC in case of many many targets but otherwise it would be interesting to run some benchmarks on the effectiveness of this.

My hunch is that we may want to make the Target._maybe_get() public for uses like this, where we don't want a default instance of a field in case it is not declared on the target. (or as a new default=None kwarg to Target.get())

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, possibly. There is certainly more boilerplate in field-getting then there should be.

@cognifloyd
Copy link
Member Author

I'm going to take a stab at implementing this with python_distribution(entry_points=...). I'll probably drop some of the entry_points magic (like mirroring pex_binary in allowing .py in the entry point).

@kaos
Copy link
Member

kaos commented Jan 31, 2023

I'm going to take a stab at implementing this with python_distribution(entry_points=...). I'll probably drop some of the entry_points magic (like mirroring pex_binary in allowing .py in the entry point).

After reading a bit of the stevedore docs.. I get the impression that it's just regular python along with the entry points configuration. The support we're building into this plugin is to make tests work more seamless wrgt bringing the proper sources into the test sandbox based on the plugins (namespaces) used.

If we use the entry points on the python_distribution, we only need something that tells us which of them are namespaces and then for the tests we need a field that says which namespaces are being used to pull in the code (which you have already).
With this, do we even need the stevedore_extension target at all..?

I just realized, that if this plugin uses the SetupKwargs hook, that will lock end users out of using it, as Pants only supports one implementation for SetupKwargs at a time. (unless we lift that restriction)

if len(applicable_setup_kwargs_requests) > 1:
possible_requests = sorted(plugin.__name__ for plugin in applicable_setup_kwargs_requests)
raise ValueError(
softwrap(
f"""
Multiple of the registered `SetupKwargsRequest`s can work on the target
{target.address}, and it's ambiguous which to use: {possible_requests}
Please activate fewer implementations, or make the classmethod `is_applicable()`
more precise so that only one implementation is applicable for this target.
"""
)
)

@cognifloyd
Copy link
Member Author

cognifloyd commented Feb 1, 2023

@kaos Thank you for your review. Could I get another look? It should be much simpler to review now. 😄

I just finished deleting half of this plugin and rewriting the other half. Now I'm taking advantage of python_distribution(entry_points={...}) instead of adding the stevedore_extension target.

One quirky thing here is that I required "tagging" the namespaces with a stevedore_namespace object (actually a subclass of str):

python_distribution(
    ...
    entry_points={
        stevedore_namespace("st2common.runners.runner"): {
            "plugin_name": "en.try:point",
        },
    },
)
  • I'm going to edit the PR description now to reflect the current state of this PR.

@cognifloyd
Copy link
Member Author

I just realized, that if this plugin uses the SetupKwargs hook, that will lock end users out of using it, as Pants only supports one implementation for SetupKwargs at a time. (unless we lift that restriction)

The current version of the plugin no longer provides the SetupKwargs stuff, so this is a moot point. But, I actually didn't use the SetupKwargs hook - I added a new hook that the user of SetupKwargs had to opt-in to using and then feed it into the SetupKwargs hook. I'm glad to drop that complexity. 😅

Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good: thanks!



class StevedoreNamespace(str):
"""Syntactic sugar to tag a namespace in entry_points as a stevedore namespace.
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More than syntactic sugar, it seems like this would actually be necessary for finding the relevant entrypoints to search for? ...Assuming that you want to do "unowned entrypoint" warnings/errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. True. I wrote this docstring before I implemented things using it. I'll reword this a bit and drop the "sugar" bit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. wdyt now?

):
if namespace not in requested_namespaces.value:
continue

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this block have an unowned-entrypoint error somewhere in it? Or am I missing it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I basically copied pants.backend.python.target_type_rules.infer_python_distribution_dependencies and adjusted things to work for multiple targets at once. If there's a missing error, then it should probably be added to infer_python_distribution_dependencies as well.

Does either of Get(ResolvedPythonDistributionEntryPoints, ...) or Get(PythonModuleOwners, ...) handle erroring on unowned entry points?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does either of Get(ResolvedPythonDistributionEntryPoints, ...) or Get(PythonModuleOwners, ...) handle erroring on unowned entry points?

Nope. I don't see anything that errors if an entry point module is unowned.
That applies for:

So, if an unowned entry point is an issue, it is probably an issue for python_distribution and pex_binary as well. It looks to me like unowned imports are handled for python_source, but not for the others: infer_python_dependencies_via_source -> _handle_unowned_imports

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are almost always an issue, but the question is more a level of certainty and user experience: if we cannot be 99% certain that the missing item is a problem, then warning is annoying. And if you can't be 100% certain, then you need a way to silence the error.

In this case, it seems like you could be 100% certain. But I also think that it is fine as a followup.

# select python_tests targets with stevedore_namespaces field
return (
target.has_field(StevedoreNamespacesField)
and target.get(StevedoreNamespacesField).value is not None
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, possibly. There is certainly more boilerplate in field-getting then there should be.

Comment on lines 38 to 41
PYTHONPATH during tests. Plus, an entry_points.txt file will be generated
in the sandbox so that the distribution appears to be "installed".
The stevedore namespace format (my.stevedore.extension) is similar
to a python namespace.
Copy link
Sponsor Member

@stuhood stuhood Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is important, and I missed it initially.

Essentially, the primary reason for this field to exist (rather than for python_tests to use the dependencies field to get extensions) is to avoid needing to declare a python_distribution specific to each combination of plugins that tests use.

Not sure if the comment needs editing, but "avoids needing to declare extra python_distributions just for the purposes of testing" is good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way the plugin is implemented now, you have to define python_distribution to define the entry points. But yes, you don't need to duplicate the entry point defs across multiple python_distributions. A previous version of this plugin used a new stevedore_extension target to define the entry points, but that made getting those entry_points into the python_distributions quite ugly.

I hadn't even considered declaring duplicate python_distributions with overlapping sets of entry points. But, I see how that could have been an alternate way of doing things. I still wouldn't want to manually curate a dependencies list of all of the distributions that provide the relevant entry points - it would always be out of date. So, inferring based on the namespace feels more maintainable to me.

I'll think about how I might expand this docstring.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. wdyt now?

@stuhood
Copy link
Sponsor Member

stuhood commented Feb 1, 2023

One of the test failures is relevant: to isolate it, can run:

./pants test src/python/pants/init/load_backends_integration_test.py -- -k stevedore

Probably the register.py needs to include some more Python rules (which your tests likely already do) so that it can be enabled in isolation.

Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Great improvement from the first version of this PR (that still was in great shape) 💯

@@ -108,7 +108,7 @@ remote = "//:buildgrid_remote"

[tailor]
build_file_header = """\
# Copyright 2022 Pants project contributors (see CONTRIBUTORS.md).
# Copyright 2023 Pants project contributors (see CONTRIBUTORS.md).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

)
"""

alias = "stevedore_namespace"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oohh... clever

Comment on lines +50 to +53
stevedore_targets = await Get(
StevedoreExtensionTargets,
StevedoreNamespacesProviderTargetsRequest(requested_namespaces),
)
Copy link
Member Author

@cognifloyd cognifloyd Feb 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future note: If we implement the rule described in these comments (add entry_points.txt for all entry points in a python_distribution if the distribution is a dep of the python_test):

Then, here, we should probably skip any python_distribution targets that are already direct or transitive deps of the requested target. That way the other, more complete, entry_points.txt will not get clobbered by the partial one generated here.

We could probably also factor out the entry_points.txt generation into a separate rule that both this rule and the new rule would use.

@cognifloyd
Copy link
Member Author

@stuhood and @kaos: Thank you for your reviews! I think I've addressed all your feedback. Do you see anything else I can/should improve?

Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍🏽

):
if namespace not in requested_namespaces.value:
continue

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are almost always an issue, but the question is more a level of certainty and user experience: if we cannot be 99% certain that the missing item is a problem, then warning is annoying. And if you can't be 100% certain, then you need a way to silence the error.

In this case, it seems like you could be 100% certain. But I also think that it is fine as a followup.

@stuhood stuhood merged commit 3f6776b into pantsbuild:main Feb 3, 2023
@cognifloyd cognifloyd deleted the stevedore branch June 14, 2023 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants