Decouple PEX runtime interpreter selection from buildtime interpreter selection. #1020

Closed
jsirois opened this issue Aug 28, 2020 · 4 comments · Fixed by #1770

@jsirois
Member

jsirois commented Aug 28, 2020

The Pex CLI supports various options for constraining the interpreters a given PEX should be built to work with. Today these are:

  1. --python or --interpreter-constraint (these are mutually exclusive)
  2. --platform

And combinations of these are then possibly modified by --resolve-local-platforms and --use-first-matching-interpreter.

The result is a PEX file built by one or more local interpreters that can in turn run on an unrelated set of interpreters. The last statement might be surprising, but it is true and exposes the fundamental flaw in trying to use buildtime interpreter selection constraints to constrain runtime interpreter selection. To clarify, consider building a PEX file for Pex itself:

$ python -mpex pex -cpex -opex.pex
$ unzip -qc pex.pex PEX-INFO | jq .distributions
{
  "pex-2.1.15-py2.py3-none-any.whl": "bc4c28769a8f357bba9579aa5e24534b622d4ada"
}
$ head -1 pex.pex
#!/usr/bin/env python3.8

As we can see, the PEX file is universal since all contained distributions are universal, and yet the PEX file will only run on a machine with a python3.8 binary on the PATH. Today this can be hacked around by either specifying an alternative --python-shebang (say #!/usr/bin/env python) or executing the PEX file via a manually selected interpreter (/my/python pex.pex).
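The same inspection can be done from Python. This is a minimal sketch, not part of Pex: `describe_pex` is a hypothetical helper name, and it assumes only what the shell session above shows, namely that the PEX is a zip file with an optional shebang line in front and a JSON `PEX-INFO` entry containing a `distributions` mapping.

```python
import json
import zipfile


def describe_pex(pex_path):
    """Return (shebang, distributions) for a PEX zip file.

    Hypothetical helper for illustration; it is not a Pex API.
    """
    # The shebang, if present, is the first line of the file, ahead of the
    # zip data.
    with open(pex_path, "rb") as fp:
        first_line = fp.readline()
    shebang = None
    if first_line.startswith(b"#!"):
        shebang = first_line.decode("utf-8").strip()

    # zipfile locates the archive from the end of the file, so the shebang
    # prefix does not get in the way.
    with zipfile.ZipFile(pex_path) as zf:
        pex_info = json.loads(zf.read("PEX-INFO").decode("utf-8"))
    return shebang, pex_info.get("distributions", {})
```

Run against the pex.pex built above, this would be expected to report the python3.8 shebang alongside the single py2.py3-none-any wheel, which is exactly the mismatch being described.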

Ideally, once bootstrapped by a Pex-compatible interpreter (Python 2.7 or Python 3.5+), the PEX runtime should select a local interpreter compatible with the actual contained distributions. This would allow PEX files to run on all machines they should be able to run on, assuming #!/usr/bin/env python points to a Pex-compatible Python interpreter.

Switching the PEX runtime to select interpreters based on contained distributions would fix at least 2 bugs:

  1. PEX files could ship working on the maximum set of interpreters given only a reasonable #!/usr/bin/env python (solves the above example bug in pex.pex).
  2. PEX files would fail fast when interpreter constraints on the buildtime machine are met by a narrower set of interpreters than on a runtime machine.

The latter bug manifests today when using, say, --interpreter-constraint="CPython>=3.6", which finds just a CPython 3.7 distribution on the build machine. If platform-specific wheels were built, the resulting PEX can then potentially only run on a CPython 3.7 interpreter, but at runtime a CPython 3.6 interpreter will be selected if found, even if a local CPython 3.7 is present.
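To make the mismatch concrete, here is a rough sketch of selecting by contained distributions rather than by constraint. Everything in it is illustrative: `wheel_tags` and `can_run` are hypothetical helpers, the supported-tag sets are heavily abbreviated stand-ins for what a real interpreter reports, and `example-1.0-...whl` is a made-up wheel name.

```python
import itertools


def wheel_tags(wheel_name):
    """Expand a wheel file name into its set of (python, abi, platform) tags."""
    stem = wheel_name[: -len(".whl")]
    python_tags, abi_tags, platform_tags = stem.split("-")[-3:]
    # Compressed tag sets such as `py2.py3` expand to one tag per component.
    return set(
        itertools.product(
            python_tags.split("."), abi_tags.split("."), platform_tags.split(".")
        )
    )


def can_run(supported_tags, wheel_names):
    """True if every wheel carries at least one tag the interpreter supports."""
    return all(wheel_tags(name) & supported_tags for name in wheel_names)


# Hypothetical, abbreviated supported-tag sets for two local interpreters.
CP36 = {("cp36", "cp36m", "manylinux2014_x86_64"), ("py3", "none", "any")}
CP37 = {("cp37", "cp37m", "manylinux2014_x86_64"), ("py3", "none", "any")}

universal = ["pex-2.1.15-py2.py3-none-any.whl"]
platform_specific = ["example-1.0-cp37-cp37m-manylinux2014_x86_64.whl"]

print(can_run(CP36, universal), can_run(CP37, universal))
print(can_run(CP36, platform_specific), can_run(CP37, platform_specific))
```

Both interpreters satisfy CPython>=3.6, but under these assumptions only the 3.7 one can use the platform-specific wheel, so the constraint alone is not enough to pick a working interpreter.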

@cosmicexplorer
Contributor

I created #1033 a few days ago, which I think may be quite similar in intent to this ticket. There are two main differences I can see:

  1. This ticket is scoped to a PEX file's bootstrap/re-exec behavior, while
  2. #1033 (specify (interpreter-constraint) x (platform), and pass --python-version to pip regardless of a local binary) is scoped to the capabilities of Pex library API and/or the CLI at build time.

I think that they seem quite complementary, in particular regarding this last point of yours:

> The latter bug today manifests when using, say, --interpreter-constraint="CPython>=3.6" which finds just a CPython 3.7 distribution on the build machine. The resulting PEX can then potentially only run on a CPython 3.7 interpreter if platform specific wheels were built but at runtime a CPython 3.6 interpreter will be selected if found even if a local CPython 3.7 is present.

This ticket would address that conundrum, whereas #1033 would enable specifically resolving wheels at build time for an exact CPython==3.6.Y interpreter version, even if that interpreter does not exist on the build machine. This leaves us only unable to handle the case when:

  1. the user specifies an interpreter version (or platform) that doesn't exist on their machine, and
  2. some requirements are platform-specific, and there are no prebuilt wheels for that (interpreter-version) x (platform) pair.

Which I think could be considered the "speed of light" in terms of what it is possible for Pex to do anyway.

@jsirois
Member Author

jsirois commented Sep 27, 2020

> This ticket would address that conundrum, whereas #1033 would enable specifically resolving wheels at build time for an exact CPython==3.6.Y interpreter version, even if that interpreter does not exist on the build machine. This leaves us only unable to handle the case when:

You can already do that. Whenever you don't expect the targeted interpreter (PythonInterpreter) to be on the local machine, you specify a Platform. If the PEX could be built on an unknown machine, you should exclusively use Platforms, say:

--platform linux2014_x86_64-cp-36-m
--platform macosx-10.13-x86_64-cp-36-m

That combined with --resolve-local-platforms gets the most flexibility:

    --resolve-local-platforms
                        When --platforms are specified, attempt to resolve a
                        local interpreter that matches each platform
                        specified. If found, use the interpreter to resolve
                        distributions; if not, resolve for the platform only
                        allowing matching binary distributions and failing if
                        only sdists or non-matching binary distributions can
                        be found.

You directly target two platforms and you don't need to be on either platform to build the PEX (as long as you're pointing at repos that already have all the requisite pre-built wheels available), but, if you do happen to be on one of the platforms and that platform does have the targeted interpreter, it will be resolved and used to build any missing wheels.
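The per-platform decision described by that help text can be sketched roughly as follows. This is not Pex's implementation: `plan_resolves` and the `local_interpreters` mapping are hypothetical, and how a local interpreter is matched to a platform is deliberately left out.

```python
def plan_resolves(platforms, local_interpreters):
    """Map each targeted platform to how its distributions would be resolved.

    `local_interpreters` maps a platform string to the path of a matching
    local interpreter; computing that match is out of scope for this sketch.
    """
    plan = {}
    for platform in platforms:
        interpreter = local_interpreters.get(platform)
        if interpreter is not None:
            # A matching local interpreter can build any missing wheels.
            plan[platform] = ("interpreter", interpreter)
        else:
            # No local match: only pre-built, matching wheels are acceptable,
            # and the resolve fails if none can be found.
            plan[platform] = ("wheels-only", None)
    return plan


plan = plan_resolves(
    ["linux2014_x86_64-cp-36-m", "macosx-10.13-x86_64-cp-36-m"],
    # Assumption: we happen to be on the macOS platform with CPython 3.6.
    {"macosx-10.13-x86_64-cp-36-m": "/usr/local/bin/python3.6"},
)
```

Here the macOS platform would be resolved with the local interpreter, while the Linux platform would fall back to pre-built wheels only.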

@cosmicexplorer
Contributor

cosmicexplorer commented Sep 27, 2020

Thank you so much! I will try to contribute a doc fix (if necessary) in response to your great comments on #1033.

@jsirois jsirois mentioned this issue Oct 9, 2020
3 tasks
@Eric-Arellano Eric-Arellano mentioned this issue Oct 15, 2020
7 tasks
jsirois added a commit to jsirois/pex that referenced this issue Dec 4, 2020
jsirois added a commit that referenced this issue Dec 5, 2020
This is needed to support #1020 and #1108.
jsirois added a commit to jsirois/pex that referenced this issue Jan 4, 2021
jsirois added a commit that referenced this issue Jan 5, 2021
jsirois added a commit to jsirois/pex that referenced this issue Jan 8, 2021
This removes our dependency on pkg_resources Environment / WorkingSet in
favor of performing our own recursive resolve of runtime distributions
to activate using distribution metadata. This fixes an old test bug
noticed by Benjy but, more importantly, sets the stage to fix pex-tool#899, pex-tool#1020
and pex-tool#1108 by equipping PEXEnvironment with the ability to resolve the
appropriate transitive set of distributions from a root set of
requirements instead of the current full set of transitive requirements
stored post-resolve in PexInfo.
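
As a rough illustration of resolving a transitive set from root requirements, the sketch below walks a dependency mapping. It is an assumption-laden simplification, not the PEXEnvironment code: `resolve_transitive` is a hypothetical name, the `requires` dict stands in for distribution metadata, and version specifiers and environment markers are ignored entirely.

```python
def resolve_transitive(root_requirements, requires):
    """Return the set of project names reachable from the root requirements.

    `requires` maps a project name to the names it depends on, standing in
    for the dependency metadata of the distributions inside the PEX.
    """
    resolved = set()
    to_visit = list(root_requirements)
    while to_visit:
        name = to_visit.pop()
        if name in resolved:
            continue
        if name not in requires:
            # No contained distribution satisfies this requirement.
            raise LookupError("No distribution found for: {}".format(name))
        resolved.add(name)
        to_visit.extend(requires[name])
    return resolved


# Hypothetical contents: `unused` is present but not reachable from the root.
contained = {"app": ["lib-a", "lib-b"], "lib-a": ["lib-b"], "lib-b": [], "unused": []}
activated = resolve_transitive(["app"], contained)
```

Starting from the roots means distributions that are present but unreachable, like `unused` here, are never activated.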
@Eric-Arellano Eric-Arellano mentioned this issue Mar 30, 2022
1 task
jsirois added a commit to jsirois/pex that referenced this issue May 17, 2022
We now only use the namespace package APIs when absolutely needed at
runtime.

Work towards pex-tool#1020
@jsirois jsirois mentioned this issue May 18, 2022
2 tasks
jsirois added a commit to jsirois/pex that referenced this issue May 18, 2022
Previously two proxies for interpreter applicability were used:
1. The shebang selected interpreter or the explicit interpreter used
   to invoke the PEX.
2. Any embedded interpreter constraints.

This could lead to selecting an interpreter that was not actually able
to resolve all required distributions from within the PEX, which is the
only real criterion, and a failure to boot.

Fix the runtime interpreter resolution process to test an interpreter
can resolve the PEX before using it or re-execing to it.

In the use case, the resolve test performed is cached work and leads to
no extra overhead. In the re-exec case the resolve test can cost
O(100ms), but at the benefit of ensuring either the selected interpreter
will definitely work or no interpreters on the search path can work.

Fixes pex-tool#1020
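
The selection process described in this commit message might look something like the sketch below. The names are hypothetical and `can_resolve` is a stand-in callable for the resolve test; the actual change is in #1770.

```python
def select_interpreter(current, search_path, can_resolve):
    """Pick the interpreter to run the PEX with, or raise if none can.

    `can_resolve` reports whether an interpreter can resolve every
    distribution the PEX requires.
    """
    # Prefer the interpreter already running the PEX: no re-exec is needed.
    if can_resolve(current):
        return current
    for candidate in search_path:
        if candidate != current and can_resolve(candidate):
            # The caller would re-exec the PEX under this interpreter.
            return candidate
    raise RuntimeError("No interpreter on the search path can resolve this PEX.")
```

The point of the design is that the only test applied is the resolve itself, so a returned interpreter is one that can actually boot the PEX.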
@jsirois
Member Author

jsirois commented May 18, 2022

There is more that can be done here to optimize the PEX ZIPAPP re-exec case, namely caching that the successfully tested interpreter works with the given PEX to avoid repeating the test the next time the PEX ZIPAPP is run, but I'm going to close this out with #1770 since the robustness issue has been addressed.
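
One possible shape for such a cache, purely as a sketch: none of these function names or the on-disk layout come from Pex, and hashing the whole file is just the simplest fingerprint, not necessarily the cheapest one.

```python
import hashlib
import json
import os


def cache_key(pex_path):
    """Fingerprint the PEX file so a rebuilt PEX gets a fresh cache entry."""
    digest = hashlib.sha1()
    with open(pex_path, "rb") as fp:
        for chunk in iter(lambda: fp.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_cached_interpreter(cache_dir, pex_path):
    """Return the previously verified interpreter path, or None."""
    entry = os.path.join(cache_dir, cache_key(pex_path) + ".json")
    try:
        with open(entry) as fp:
            interpreter = json.load(fp)["interpreter"]
    except (OSError, ValueError, KeyError):
        return None
    # The interpreter may have been removed since it was recorded.
    return interpreter if os.path.exists(interpreter) else None


def store_verified_interpreter(cache_dir, pex_path, interpreter):
    """Record that `interpreter` passed the resolve test for this PEX."""
    os.makedirs(cache_dir, exist_ok=True)
    entry = os.path.join(cache_dir, cache_key(pex_path) + ".json")
    with open(entry, "w") as fp:
        json.dump({"interpreter": interpreter}, fp)
```

On a cache hit the re-exec could skip the resolve test; on a miss, or when the recorded interpreter has disappeared, it would fall back to the full check.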

jsirois added a commit to jsirois/pex that referenced this issue May 18, 2022
The fix for pex-tool#1020 did not add language for the new resolve check
failures.