Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pants lockfile generation includes un-used dists and thus un-vetted dists. #12458

Open
jsirois opened this issue Jul 29, 2021 · 5 comments
Open
Labels
backend: Python Python backend-related issues

Comments

@jsirois
Copy link
Contributor

jsirois commented Jul 29, 2021

Pants uses pip-compile to generate lockfiles and pip-compile includes - apparently - all dists for a given version of a requirement whether they were actually used in the resolve or not. This is a security and stability problem.

For example, if the original requirement is foo>=1.0.0 and the IC is CPython==3.7.* the lockfile might contain foo==1.0.0 with hashes for the sdist and the cp37m wheels. Say the lockfile was generated with and later tested with a CPython 3.7 interpreter with the pymalloc extension; so the wheel is what is actually resolved and tested. The sdists will then go unused and untested. As such a CPython 3.7 without the pymalloc extension lockfile consumer will resolve and build the sdist - potentially years later. In normal cases the sdist will be faithful to the cp37m wheel and generate a cp37 wheel that is ~equivalent. Even so; there is no guaranty this wheel behaves the same - there may be code that is conditional upon the pymalloc extension presence and be buggy in the non-presence branch. Worse - the wheels could be a honeypot and the sdist the trap (see: https://docs.google.com/document/d/17Y6_YjLqv6hY5APWLaRw5CMvr_FVDBIWK6WS57mioNg/edit?usp=sharing for related concerns).

It appears to be the case that poetry.lock and Pipfile.lock have the same issue.

@Eric-Arellano Eric-Arellano changed the title Pans lockfile generation includes un-used dists and thus un-vetted dists. Pants lockfile generation includes un-used dists and thus un-vetted dists. Jul 29, 2021
@Eric-Arellano
Copy link
Contributor

(From Slack):

pip-compile/Poetry/Pipenv are making a tradeoff. You generate on a single platform, but may use that lockfile on others.

If I generated a lockfile with pantsbuild.pants wheel on my M1, it would only include the hash for the macos_arm64 wheel, even though people using the lockfile need the other wheels' hashes.

Trading off some security for flexibility to use the lockfile on multiple platforms.

@benjyw went on to point out:

Lockfiles only help with supply chain attacks in that you're guaranteed to resolve the same thing every time. There is an implicit assumption that you've vetted that one thing to be safe.

--

I suspect for the vast majority of users they will prefer the flexibility of the resolve working on multiple platforms over the risk of worse security/stability. We know many Pants users use macOS for desktop use and Linux for CI, for example. I'm assuming this "issue"/"feature" has continued to work well for Pipenv/Poetry/pip-compile, implied by their continued usage of it.

Fwit, organizations who take supply chain attacks really seriously can hand edit the requirements.txt-style lockfile Pants generates to remove hashes for release files they do not trust. If that ends up being important to an org, we could perhaps look into helping to automate that as a followup.

@jsirois
Copy link
Contributor Author

jsirois commented Jul 29, 2021

I think you're right, but I'm also not convinced the current state of the art is the best way to trade off. Another way would be to be able to generate partial lockfiles and then merge them. Not much less convenient, but more secure. This all assumes the current state of the art is not doing the wrong thing and just assuming that if a resolve computes req1==X, then all dists for X should be included. Environment markers can make that assumption invalid.

@benjyw
Copy link
Contributor

benjyw commented Jul 29, 2021

The lockfile does at least limit the window in which you can be exposed to an attack, namely the time you generate the lockfile. Assuming that happened when PyPI was uncompromised, your resolves will continue to be safe until the next lockfile change. That is not nothing.

@Eric-Arellano
Copy link
Contributor

Eric-Arellano commented Aug 12, 2021

@chrisjrn proposed an idea that could allow us to do this if we go with Poetry 🙌 In between running poetry lock and poetry export, we can post-process poetry.lock and mutate it how we want, like deleting requirements out-right (e.g. Windows-specific deps) and removing specific hashes.

--

For hashes, poetry.lock stores like this:

[metadata.files]
attrs = [
    {file = "attrs-21.2.0-py2.py3-none-any.whl", hash = "sha256:149e90d6d8ac20db7a955ad60cf0e6881a3f20d37096140088356da6c716b0b1"},
    {file = "attrs-21.2.0.tar.gz", hash = "sha256:ef6aaac3ca6cd92904cdd0d83f629a15f18053ec84e6432106f7a4d04ae4f5fb"},
]
colorama = [
    {file = "colorama-0.4.4-py2.py3-none-any.whl", hash = "sha256:9f47eda37229f68eee03b24b9748937c7dc3868f906e8ba69fbcbdd3bc5dc3e2"},
    {file = "colorama-0.4.4.tar.gz", hash = "sha256:5941b2b48a20143d2267e95b1c2a7603ce057ee39fd88e7329b0c292aa16869b"},
]

I've confirmed that if you delete an entry from the list for a particular dep, then run poetry export, it will not include that hash.

Presumably, we could inspect the file names to determine if any are unused?

--

You can simply delete a dep from poetry.lock for it to not show up in requirements.txt. But it's more involved to determine if it's platform-specific.

For example, Pytest depends on atomicwrites, but only on Windows. this is the atomicwrites entry, w/o any reference to Windows:

[[package]]
name = "atomicwrites"
version = "1.4.0"
description = "Atomic file writes."
category = "dev"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"

Instead, the platform information comes from Pytest's entry:

[[package]]
name = "pytest"
version = "5.4.3"
description = "pytest: simple powerful testing with Python"
category = "dev"
optional = false
python-versions = ">=3.5"

[package.dependencies]
atomicwrites = {version = ">=1.0", markers = "sys_platform == \"win32\""}
attrs = ">=17.4.0"
colorama = {version = "*", markers = "sys_platform == \"win32\""}
more-itertools = ">=4.0.0"
packaging = "*"
pluggy = ">=0.12,<1.0"
py = ">=1.5.0"
wcwidth = "*"

We'd need to know how to evaluate the marker and then follow the dep chain to recursively remove all un-used dists, iiuc.

Another possibility could be modifying lockfile.txt itself via inspection of markers. I think this would allow us to not have to recursively follow deps, only remove any deps with unused env markers.

atomicwrites==1.4.0; python_version >= "3.5" and python_full_version < "3.0.0" and sys_platform == "win32" or sys_platform == "win32" and python_version >= "3.5" and python_full_version >= "3.4.0" \
    --hash=sha256:6d1784dea7c0c8d4a5172b6c620f40b6e4cbfdf96d783691f2e1302a7b88e197 \
    --hash=sha256:ae70396ad1a434f9c7046fd2dd196fc04b12f9e91ffb859164193be8b6168a7a

--

Of course, we would also need to have figured out which platforms/environments are going to be used. This is easy when platforms are specified, but not as clear for a normal resolve. For non-platform resolves, we could ban all Windows deps given that Pants doesn't work on Windows, for example? Allow users to specify what OSes they plan to use perhaps?

Eric-Arellano added a commit that referenced this issue Aug 13, 2021
…-compile (#12549)

**Disclaimer**: This is not a formal commitment to Poetry, as we still need a more rigorous assessment it can handle everything we need. Instead, this is an incremental improvement in that Poetry handles things much better than pip-compile. 

It gets us closer to the final result we want, and makes it much more ergonomic to use the experimental feature—like `generate_all_lockfiles.sh` now not needing any manual editing. But we may decide to switch from Poetry to something like pdb or Pex.

--

See #12470 for why we are not using pip-compile. 

One of the major motivations is that Poetry generates lockfiles compatible with all requested Python interpreter versions, along with Linux, macOS, and Windows. Meaning, you no longer need to generate the lockfile in each requested environment and manually merge like we used to. This solves #12200 and obviates the need for #12463.

--

This PR adds only basic initial support. If we do decide to stick with Poetry, some of the remaining TODOs:

- Handle PEP 440-style requirements.
- Hook up to `[python-setup]` and `[python-repos]` options.
- Hook up to caching.
- Support `--platform` via post-processing `poetry.lock`: #12557
- Possibly remove un-used deps/hashes to reduce supply chain attack risk: #12458

--

Poetry is more rigorous than pip-compile in ensuring that all interpreter constraints are valid, which prompted needing to tweak a few of our tools' constraints.
@benjyw benjyw removed python labels Sep 9, 2021
@cognifloyd cognifloyd added the backend: Python Python backend-related issues label Mar 13, 2023
@jsirois
Copy link
Contributor Author

jsirois commented Dec 5, 2023

Long overdue follow up, but sloppy thinking here:

The lockfile does at least limit the window in which you can be exposed to an attack, namely the time you generate the lockfile.

Definitely not true. If an sdist is locked and the sdist build system has any floating direct deps or transitive deps (~extremely common), you are not safe from a future injection if the unlocked build system or its dependencies releases new versions.

Towards that end though and also re:

@chrisjrn proposed an idea that could allow us to do this if we go with Poetry 🙌 In between running poetry lock and poetry export, we can post-process poetry.lock and mutate it how we want, like deleting requirements out-right (e.g. Windows-specific deps) and removing specific hashes.

You can use --style universal --no-build to get a universal lock with no sdists, and, belatedly, you can use --target-system to block out Windows dists (Pants does this already).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: Python Python backend-related issues
Projects
None yet
Development

No branches or pull requests

4 participants