Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An attempt to run Python exe in a more hermetic way #863

Closed
wants to merge 2 commits into from

Conversation

mvukov
Copy link

@mvukov mvukov commented Oct 20, 2022

PR Checklist

Please check if your PR fulfills the following requirements:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature (please, look at the "Scope of the project" section in the README.md file)
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Other... Please describe:

What is the current behavior?

Current custom toolchains allow the repo path and user's local Python packages to sneak in the Python path. For instance, sys.path can be something like:

[
  "/home/mvukov/dev/tmp/hermetic_python",  # The repo root.
  "/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/execroot/hermetic_python/bazel-out/k8-fastbuild/bin/demo.runfiles",
  "/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/execroot/hermetic_python/bazel-out/k8-fastbuild/bin/demo.runfiles/hermetic_python",
  "/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/execroot/hermetic_python/bazel-out/k8-fastbuild/bin/demo.runfiles/python3_8_x86_64-unknown-linux-gnu",
  "/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/external/python3_8_x86_64-unknown-linux-gnu/lib/python38.zip",
  "/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/external/python3_8_x86_64-unknown-linux-gnu/lib/python3.8",
  "/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/external/python3_8_x86_64-unknown-linux-gnu/lib/python3.8/lib-dynload",
  "/home/mvukov/.local/lib/python3.8/site-packages",  # The user's local folder.
  "/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/external/python3_8_x86_64-unknown-linux-gnu/lib/python3.8/site-packages"
]

with a toolchain registered as

python_register_toolchains(
    name = "python3_8",
    # Available versions are listed in @rules_python//python:versions.bzl.
    # We recommend using the same version your team is already standardized on.
    python_version = "3.8",
)

What is the new behavior?

With the proposed fix, based on rules_py I created a small wrapper script for the interpreter that removes the repo root and the local folder from the path.

Does this PR introduce a breaking change?

  • Yes
  • No

Other information

I didn't look into a wrapper for Windows yet. Please let me know if this could end up in the main branch and based on that I can proceed further. An alternative could be to have an extra argument like interpreter_flags in python_register_toolchains.

@mvukov
Copy link
Author

mvukov commented Oct 23, 2022

Not sure why CI fails. When I run bazel test //... locally, all green. Can anyone provide more info, please?


set -o errexit -o nounset -o pipefail

external/{repo_name}/{python_bin} -B -s -I "$@"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want -S (upper case) as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this would harm in the case a custom toolchain from this repo is used. In the case a system python is used, this may even be desirable.


set -o errexit -o nounset -o pipefail

external/{repo_name}/{python_bin} -B -s -I "$@"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the "external/" part here might depend on if a flag is set?
https://bazel.build/reference/command-line-reference#flag--legacy_external_runfiles

I think you can drop the "external/" part, but not 100% sure.

Actually, what's the CWD when this is invoked? The relative path here means it would have to be the runfiles root. But what is doing that chdir()? (Sorry if I missed it)

Copy link
Author

@mvukov mvukov Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. If --nolegacy_external_runfiles, then this path doesn't work. The CWD in my example is

/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/execroot/hermetic_python/bazel-out/k8-fastbuild/bin/demo.runfiles/hermetic_python

where hermetic_python is the repo name. When --legacy_external_runfiles is used, then external/ has to be used...

But what is doing that chdir()? (Sorry if I missed it)

I'm not doing any chdir in this PR :) Can you be a bit more specific?

@groodt
Copy link
Collaborator

groodt commented Oct 24, 2022

I've got a few questions about this:

Is this the right place to be solving the problem? To me, this feels like a similar problem to:
#691, but now we've introduced a hardcoded dependency on the host of /bin/bash.

@mvukov
Copy link
Author

mvukov commented Oct 25, 2022

In the meantime, I found that using -I won't work as per doc "-I : isolate Python from the user's environment (implies -E and -s)". This effectively nulls PYTHONPATH set by the stub. I removed that in 10ffa53. After this, sys.path outputs for this example:

[
  "/home/mvukov/dev/tmp/hermetic_python",
  "/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/execroot/hermetic_python/bazel-out/k8-fastbuild/bin/demo.runfiles",
  "/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/execroot/hermetic_python/bazel-out/k8-fastbuild/bin/demo.runfiles/hermetic_python",
  "/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/execroot/hermetic_python/bazel-out/k8-fastbuild/bin/demo.runfiles/python3_8_x86_64-unknown-linux-gnu",
  "/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/external/python3_8_x86_64-unknown-linux-gnu/lib/python38.zip",
  "/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/external/python3_8_x86_64-unknown-linux-gnu/lib/python3.8",
  "/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/external/python3_8_x86_64-unknown-linux-gnu/lib/python3.8/lib-dynload",
  "/home/mvukov/.cache/bazel/_bazel_mvukov/bc9f8eb0563a76ddb0353ce7869ddf8d/external/python3_8_x86_64-unknown-linux-gnu/lib/python3.8/site-packages"
]

Still, it's a pity that the first entry still sneaks in the path.

Whats the plan for Windows?

If we decide to move forward with this PR, I'd work on support for Windows.

How does this inter-play with the python_stub_template.txt

The generated stub driven by a platform interpreter calls the generated wrapper, the wrapper then calls a custom interpreter.

Is this the right place to be solving the problem? To me, this feels like a similar problem to:
#691, but now we've introduced a hardcoded dependency on the host of /bin/bash.

Not sure, to be honest. I am looking into an intermediate solution until bazelbuild/bazel#15897 gets implemented or until https://github.com/aspect-build/rules_py gets out of experimental.

TBH, I would be just happy at the moment if py_runtime had an extra argument interpreter_flags that could be hard-coded in the stub. Would this alternative be better?

@mattem
Copy link
Collaborator

mattem commented Oct 25, 2022

👋 As you found, -I won't work with rules_python, as it makes Python ignore all env vars (including PYTHON_PATH that bazel crafts), so it effectively breaks us.

rules_py on the other hand can set this, (I'm the author of rules_py 😄) as it doesn't use the runfiles layout for running py_binary etc, so we don't need to directly meddle with PYTHON_PATH, so therefore can set the -I flag, and deals with the side affects.

It's probably worth setting the new -P flag in Python 3.11 that was designed for this, but we'd have to detect the interpreter version etc. https://docs.python.org/3.11/using/cmdline.html#cmdoption-P

On the topic of rules_py, I can help with issues using it to push it out of experimental if needed.

@jvolkman
Copy link
Contributor

This is related to #382 and bazelbuild/bazel#7091. Also worth noting that @groodt already added PYTHONSAFEPATH (the env var equivalent of -P) to the stub template in bazelbuild/bazel#15701, but of course that only applies to 3.11.

@mvukov
Copy link
Author

mvukov commented Oct 27, 2022

@mattem Many thanks. I'll try rules_py with a large monorepo to see how that works.

@mvukov mvukov closed this Oct 27, 2022
@mvukov mvukov reopened this Oct 27, 2022
@mvukov
Copy link
Author

mvukov commented Feb 14, 2023

I tested https://github.com/aspect-build/rules_py on a large monorepo and works as expected (after a couple of minor issues were resolved -- the current main branch works fine). Haven't tried out remote execution, but, the ruleset plays well now with remote caching. It would be nice to have rules_python and rules_py in the same repo eventually. :)

@mvukov mvukov closed this Feb 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants