⚠️ Cachito's way of supporting Git and HTTP(S) dependencies is currently only compatible with pip >= 10.0
This document describes some of the more intricate details of Cachito support for pip. For a high-level overview, see the README.
Cachito has a number of specific requirements when it comes to pip packages. Some of those stem from the general ideas behind Cachito (e.g. reproducibility), some from the technical challenges of supporting a packaging system which defines most metadata through a Python executable. Read on for more details.
One of the main components of a pip package is the requirements.txt file. Typically, the file might look something like this:
requests
git+https://github.com/containerbuildsystem/dockerfile-parse
https://github.com/containerbuildsystem/operator-manifest/archive/v0.0.3.zip
The dependencies in this file are:
- requests, a PyPI dependency
- dockerfile-parse, a Git dependency
- operator-manifest, an HTTPS dependency
Git and HTTP(S) dependencies will henceforth collectively be referred to as "external."
To make sure builds are reproducible, Cachito will require that all dependencies be pinned to a specific version.
For PyPI dependencies, use the == operator:
requests==2.24.0
For Git dependencies, specify the commit hash:
git+https://github.com/containerbuildsystem/dockerfile-parse@<full-commit-hash>
For HTTP(S) dependencies, include the hash of the source archive using #cachito_hash:
https://github.com/containerbuildsystem/operator-manifest/archive/v0.0.3.zip#cachito_hash=sha256:<full-sha-digest>
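The three pinning rules above can be checked mechanically before submitting a request. The following is a minimal sketch, not Cachito's own validation: the function names are hypothetical, it only handles simple `name==version` lines for PyPI, and it assumes a full commit hash is a 40-character hexadecimal string.

```python
import re

# Assumption: a "full commit hash" is a 40-character hexadecimal SHA-1.
FULL_COMMIT = re.compile(r"@[0-9a-fA-F]{40}$")
# Assumption: PyPI pins are plain "name==version" lines (no extras or markers).
PYPI_PIN = re.compile(r"[A-Za-z0-9._-]+==[^\s=]+")


def kind_of(requirement: str) -> str:
    """Classify one requirement line as 'git', 'url', or 'pypi'."""
    # Drop an optional "<package-name> @ " prefix before looking at the URL.
    target = requirement.split(" @ ", 1)[-1].strip()
    if target.startswith("git+"):
        return "git"
    if target.startswith(("http://", "https://")):
        return "url"
    return "pypi"


def is_pinned(requirement: str) -> bool:
    """Return True if the line is pinned in the way described above."""
    requirement = requirement.strip()
    kind = kind_of(requirement)
    if kind == "pypi":
        return PYPI_PIN.fullmatch(requirement) is not None
    url, _, fragment = requirement.partition("#")
    if kind == "git":
        # The commit hash must be the last thing before any URL fragment.
        return FULL_COMMIT.search(url) is not None
    # HTTP(S): the Cachito-specific hash lives in the URL fragment.
    return "cachito_hash=" in fragment
```

Running `is_pinned` over every non-comment line of a requirements file gives a quick list of entries that still need attention.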
In addition to specifying direct dependencies, recursive dependencies also need to be explicitly defined for two reasons:
- Further enable reproducibility by explicitly specifying every needed package
- Prevent the need for remote execution of setup.py
While this might be onerous to maintain manually, pip-compile from pip-tools can automate the process using the following procedure:
- rename requirements.txt to requirements.in (by convention)
- run pip-compile requirements.in -o requirements.txt
This is the output of the above command:
#
# This file is autogenerated by pip-compile
# To update, run:
#
# pip-compile --output-file=requirements.txt requirements.in
#
certifi==2020.6.20 # via requests
chardet==3.0.4 # via requests
git+https://github.com/containerbuildsystem/dockerfile-parse # via -r requirements.in
idna==2.10 # via requests
https://github.com/containerbuildsystem/operator-manifest/archive/v0.0.3.zip # via -r requirements.in
requests==2.24.0 # via -r requirements.in
ruamel.yaml.clib==0.2.2 # via ruamel.yaml
ruamel.yaml==0.16.12 # via operator-manifest
six==1.15.0 # via dockerfile-parse
urllib3==1.25.10 # via requests
As you can see, pip-compile gathered all the recursive dependencies and pinned all PyPI packages. It did not pin external dependencies, because the mechanism for doing so is specific to Cachito. You can pin these beforehand in the requirements.in file, but if any of the recursive dependencies are external, you may need to edit the generated file anyway.
Note that pip-compile considers some packages "unsafe" in a requirements file (e.g. setuptools). If you do use these packages as runtime dependencies, you will need to pass the --allow-unsafe flag to pip-compile. If you only use them as build time dependencies, you will need to put them in a separate requirements file as described in Build dependencies.
Cachito needs to know the package name for all of your dependencies. For PyPI dependencies, this is trivial, as the name is already present in the requirements file. For external dependencies, resolving the name may require executing the setup.py file. Cachito does have a mechanism for extracting package metadata from setup.py, but it is very limited. That is why, for external dependencies, you will need to explicitly specify package names using one of the mechanisms that pip supports.
a) use @:
<package-name> @ git+https://github.com/namespace/repo
b) use #egg:
git+https://github.com/namespace/repo#egg=<package-name>
As with pinning external dependency versions, you can specify explicit package names in requirements.in to avoid having to edit the file generated by pip-compile. However, pip-compile seems to ignore the @ mechanism, so using #egg may be preferable.
After pinning versions and specifying package names for external dependencies, the requirements.in file at the top of this section would look like this:
requests==2.24.0
git+https://github.com/containerbuildsystem/dockerfile-parse@<full-commit-hash>#egg=dockerfile-parse
https://github.com/containerbuildsystem/operator-manifest/archive/v0.0.3.zip#egg=operator-manifest&cachito_hash=sha256:<full-sha-digest>
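To illustrate how the name and hash travel inside a single requirement line, here is a small sketch that reads them back out. It is not how Cachito parses requirements; `external_metadata` is a hypothetical helper that only understands the `<package-name> @ ` prefix and the `#egg=` / `cachito_hash=` fragment keys shown above.

```python
from urllib.parse import parse_qs, urlsplit


def external_metadata(requirement: str) -> dict:
    """Extract the package name and cachito_hash from an external requirement."""
    name, separator, url = requirement.partition(" @ ")
    if not separator:
        # No "<package-name> @ " prefix: the whole line is the URL.
        name, url = None, requirement
    # The fragment uses query-string syntax, e.g. "egg=foo&cachito_hash=sha256:...".
    fragment = parse_qs(urlsplit(url.strip()).fragment)
    if name is None:
        name = fragment.get("egg", [None])[0]
    else:
        name = name.strip()
    return {
        "name": name,
        "cachito_hash": fragment.get("cachito_hash", [None])[0],
    }
```

A line that yields `None` for the name is one that still needs either the `@` prefix or an `#egg` fragment.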
In general, Cachito handles hash checking the same way that pip does. If --require-hashes is present in the requirements file, or if any dependency uses the --hash option, Cachito will require that all dependencies specify a hash and will check that the hashes are valid.
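The all-or-nothing rule above can be sketched as follows. This is an illustration only, with hypothetical function names; it does not verify that hashes are valid and it uses a simplified view of the requirements file format (comments, options, and backslash line continuations).

```python
def requirement_lines(text: str) -> list:
    """Return the meaningful lines of a requirements file."""
    # Join backslash-continued lines so each requirement is a single line.
    text = text.replace("\\\n", " ")
    cleaned = []
    for raw in text.splitlines():
        # A '#' preceded by whitespace starts a comment; URL fragments do not.
        line = raw.split(" #", 1)[0].strip()
        if line and not line.startswith("#"):
            cleaned.append(line)
    return cleaned


def has_hash(line: str) -> bool:
    """True if the line carries at least one --hash option."""
    return any(token.startswith("--hash") for token in line.split())


def lines_missing_hash(text: str) -> list:
    """List requirements lacking --hash when hash checking is switched on."""
    lines = requirement_lines(text)
    enabled = any(
        "--require-hashes" in line.split() or has_hash(line) for line in lines
    )
    if not enabled:
        return []
    # Lines starting with '-' are options (e.g. --require-hashes), not requirements.
    return [
        line for line in lines if not line.startswith("-") and not has_hash(line)
    ]
```

An empty result means either hash checking is off or every requirement already carries a hash.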
For HTTP(S) dependencies, Cachito will always require a hash and will always validate it. You can provide it using --hash, but as mentioned above, that will turn on hash checking for all your dependencies. If that is not desirable, use the Cachito-specific #cachito_hash URL fragment as shown in the HTTP(S) dependencies example in the Pinning versions section.
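One way to obtain the digest for the fragment is to download the archive and hash it locally. The sketch below assumes the archive is already on disk and that sha256 is the chosen algorithm, as in the examples in this document; `cachito_hash_fragment` is a hypothetical helper name.

```python
import hashlib


def cachito_hash_fragment(path: str, chunk_size: int = 1 << 20) -> str:
    """Return a 'cachito_hash=sha256:<digest>' fragment for a local archive."""
    digest = hashlib.sha256()
    with open(path, "rb") as archive:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: archive.read(chunk_size), b""):
            digest.update(chunk)
    return f"cachito_hash=sha256:{digest.hexdigest()}"
```

The returned string can be appended to the dependency URL after a `#`, or after `&` if the URL already has an `#egg` fragment.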
Setuptools provides a way to specify build dependencies via the setup_requires keyword argument. It is deprecated in favor of the PEP 518 approach but, for reasons similar to those mentioned in the sections above, Cachito supports neither. If you have any build-only dependencies, you will need to put them in a requirements-build.txt file which follows the same rules as requirements.txt.
There are two implications which may not be immediately obvious for build requirements files:
- you need to specify all the runtime and build dependencies for each direct build dependency (recursively)
- you need to repeat the above for all your recursive runtime dependencies
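The two implications above amount to a transitive closure over two kinds of edges. The sketch below makes that concrete using made-up dependency maps; the package names and both helper functions are hypothetical, and in practice the pip_find_builddeps.py script is what discovers this information for you.

```python
def closure(roots, *edge_maps):
    """Return every package reachable from roots via any of the edge maps."""
    seen = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        for edges in edge_maps:
            stack.extend(edges.get(name, ()))
    return seen


def build_requirements(direct_runtime, direct_build, runtime_deps, build_deps):
    """Collect everything that belongs in the build requirements file."""
    # Every recursive runtime dependency may bring its own build dependencies.
    runtime = closure(direct_runtime, runtime_deps)
    build_roots = set(direct_build)
    for name in runtime:
        build_roots.update(build_deps.get(name, ()))
    # Each build dependency needs its runtime AND build dependencies, recursively.
    return closure(build_roots, runtime_deps, build_deps)


# Hypothetical metadata, for illustration only.
runtime_deps = {"app-lib": ["app-util"], "build-tool-a": ["build-helper"]}
build_deps = {"app-util": ["build-tool-b"], "build-tool-a": ["build-tool-b"]}
needed = build_requirements(["app-lib"], ["build-tool-a"], runtime_deps, build_deps)
print(sorted(needed))
```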
You can use the pip_find_builddeps.py script to find all the build dependencies you will need. Here is how you would use it:
- set up requirements.txt as described above
- if you have any direct build dependencies, put them in requirements-build.in
- run pip_find_builddeps.py requirements.txt -o requirements-build.in --append
- run pip-compile requirements-build.in -o requirements-build.txt --allow-unsafe
You can also run this script as a pre-commit hook. To do so, copy pip_find_builddeps.py, then create a .pre-commit-hooks.yaml with the following:
- id: update-build-requirements
  name: update-build-requirements
  description: find build dependencies with cachito's pip_find_builddeps.py script
  entry: path/to/pip_find_builddeps.py
  language: python
  language_version: python3
  pass_filenames: false
  files: ^requirements.txt$
  args: ["requirements.txt", "-o", "requirements-build.in", "-a", "--only-write-on-update"]
...then add the following lines to .pre-commit-config.yaml:
repos:
  - repo: https://github.com/containerbuildsystem/cachito.git
    rev: ... # a sha or tag from cachito that contains the .pre-commit-hooks.yaml file.
    hooks:
      - id: update-build-requirements
  - repo: https://github.com/jazzband/pip-tools
    rev: 6.8.0 # or whichever version you prefer
    hooks:
      - id: pip-compile
        name: pip-compile requirements-build.in
        args: [requirements-build.in, -o, requirements-build.txt, --allow-unsafe]
When building your app using the Cachito-provided content, you will need to make sure build dependencies are installed before runtime dependencies. If you use a packaging system, specify all the build dependencies in the proper location (e.g. options.setup_requires in setup.cfg or build_system.requires in pyproject.toml). If you do not, make sure to pip install the build requirements file(s) before the runtime requirements file(s).
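For the case without a packaging system, the ordering can be sketched as below. The helper name is hypothetical, and the sketch only builds the pip commands in the right order; any environment setup needed to point pip at the Cachito-provided content is outside its scope.

```python
import sys


def pip_install_commands(build_files, runtime_files):
    """Return pip commands that install build requirements before runtime ones."""
    commands = []
    # Build requirement files come first, then runtime requirement files.
    for path in [*build_files, *runtime_files]:
        commands.append([sys.executable, "-m", "pip", "install", "-r", path])
    return commands


commands = pip_install_commands(["requirements-build.txt"], ["requirements.txt"])
for command in commands:
    print(" ".join(command))
```

Each command list can be passed to `subprocess.run(command, check=True)` to perform the installation.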
Pip packages can define their metadata in two files: setup.py or setup.cfg (or a combination of the two). Cachito will scan both of these files (if present) for the name and version of your package. If Cachito fails to resolve either of those values, the request will fail. More details about how (and to what extent) Cachito supports setup files can be found in the docstrings of the corresponding classes in pip.py: SetupPY, SetupCFG.
Support for setup.cfg is more complete and allows greater flexibility when defining the package version compared to setup.py. Nevertheless, both approaches are subject to some compromises on the Cachito side. If Cachito cannot resolve the metadata it needs, you may unfortunately need to make changes in your packaging code.
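To check quickly whether a setup.cfg states its name and version literally, something like the following can help. This is not the SetupCFG class from pip.py; it is a rough sketch that only reads the literal `name` and `version` values from the `[metadata]` section and does not resolve directives such as `attr:` or `file:`.

```python
import configparser


def name_and_version(setup_cfg_text: str):
    """Return (name, version) as written in [metadata], or None where absent."""
    # Disable interpolation so '%' characters in values are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(setup_cfg_text)
    name = parser.get("metadata", "name", fallback=None)
    version = parser.get("metadata", "version", fallback=None)
    return name, version
```

A `None` in either position indicates that the value would have to come from somewhere other than the literal setup.cfg entries.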
Cachito allows you to configure some aspects of a request that uses the pip package manager. You can specify multiple subpackages within the source repository. For each subpackage, you can specify custom locations for your requirements and build requirements file(s). Below is an example request that uses all the available configuration options.
{
  "repo": "https://github.com/example/repo.git",
  "ref": "8adec82cf2fc557d23a6dac2563ed25bb0f46b72",
  "pkg_managers": ["pip"],
  "packages": {
    "pip": [
      {
        "path": ".",
        "requirements_files": ["requirements.txt", "requirements-extras.txt"]
      },
      {
        "path": "some/subpackage",
        "requirements_build_files": ["requirements-build-only.txt"]
      }
    ]
  }
}
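If you generate such requests from a script, the body can be assembled programmatically. The sketch below only builds and serializes the JSON shown above; `pip_request` is a hypothetical helper, the allowed keys are taken from the example request, and how the payload is submitted to Cachito is not covered here.

```python
import json

# Per-package keys, as they appear in the example request above.
ALLOWED_KEYS = {"path", "requirements_files", "requirements_build_files"}


def pip_request(repo: str, ref: str, packages: list) -> dict:
    """Build a request body for the pip package manager."""
    for package in packages:
        unknown = set(package) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"unexpected package keys: {sorted(unknown)}")
    return {
        "repo": repo,
        "ref": ref,
        "pkg_managers": ["pip"],
        "packages": {"pip": packages},
    }


payload = pip_request(
    "https://github.com/example/repo.git",
    "8adec82cf2fc557d23a6dac2563ed25bb0f46b72",
    [
        {"path": ".", "requirements_files": ["requirements.txt"]},
        {"path": "some/subpackage", "requirements_build_files": ["requirements-build-only.txt"]},
    ],
)
print(json.dumps(payload, indent=2))
```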