(Review 1:) Add: package structure & build tools overview to packaging guide #23

Closed
wants to merge 18 commits into from
2 changes: 1 addition & 1 deletion conf.py
@@ -128,7 +128,7 @@
html_theme = 'pydata_sphinx_theme'
html_static_path = ["_static"]
html_css_files = ["pyos.css"]
html_title = "pyOpenSci Package Guide"
html_title = "pyOpenSci Python Packaging Guide"
html_logo = "images/logo/logo.png"

# Add any paths that contain custom static files (such as style sheets) here,
Binary file added images/python-package-tools-2022-survey-pypa.png
Binary file added images/python-package-tools-decision-tree.png
8 changes: 5 additions & 3 deletions index.md
@@ -1,18 +1,20 @@
# pyOpenSci Python Open Source Package Development Guide



```{toctree}
:hidden:
:caption: Documentation

Documentation <documentation/index>
```
Documentation Overview <documentation/index>

```
```{toctree}
:hidden:
:caption: Packaging

Packaging <python-packaging/intro>
Packaging <package-structure-code/intro>

```

```{toctree}
152 changes: 152 additions & 0 deletions package-structure-code/complex-python-package-builds.md
@@ -0,0 +1,152 @@
# Complex Python package builds

This guide is focused on packages that are either pure Python or that
have a few simple extensions in another language such as C or C++.

If your package is more complex, [you may want to refer to this guide
created by Ralf Gommers on Python packaging.](https://pypackaging-native.github.io/)

## Pure Python Packages vs. packages with extensions in other languages

You can classify Python package complexity into three general categories. These
categories can in turn help you select the correct package front-end and
back-end tools.

1. **Pure Python packages:** these packages rely only on Python to function. Building a pure Python package is simpler. As such, you can choose a tool below that
has the features that you want and be done with your decision!
2. **Python packages with non-Python extensions:** these packages have additional components, called extensions, written in other languages (such as `C` or `C++`). If you have a package with non-Python extensions, then you need to select a build back-end tool that allows you to add the additional build steps needed to compile your extension code. Further, if you wish to use a front-end tool to support your workflow, you will need to select a tool that
supports additional build steps. In this case, you could use setuptools. However, we suggest that you choose a build tool that supports custom build steps, such as Hatch with Hatchling or PDM. PDM is an excellent choice as it also allows you to select your build back end of choice. We will discuss this at a high level on the complex builds page.
3. **Python packages that have extensions written in different languages (e.g. Fortran and C++) or that have non-Python dependencies that are difficult to install (e.g. GDAL):** these packages often have complex build steps (more complex than a package with just a few C extensions, for instance). As such, these packages require tools such as [scikit-build](https://scikit-build.readthedocs.io/en/latest/)
or [meson-python](https://mesonbuild.com/Python-module.html) to build. NOTE: you can use meson-python with PDM.
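
The three categories above can be sketched as a rough heuristic that looks at the source files in a project. This is only an illustration: the function name `classify_package` and the suffix table are hypothetical, and a file listing cannot detect hard-to-install non-Python dependencies such as GDAL, which also place a package in the third category.

```python
from pathlib import PurePath

# Hypothetical mapping from file suffix to source language; extend as needed.
EXTENSION_LANGUAGES = {
    ".c": "C",
    ".cpp": "C++",
    ".cc": "C++",
    ".f": "Fortran",
    ".f90": "Fortran",
    ".pyx": "Cython",
    ".rs": "Rust",
}


def classify_package(filenames):
    """Return 1, 2 or 3 for the three complexity categories listed above."""
    languages = set()
    for name in filenames:
        language = EXTENSION_LANGUAGES.get(PurePath(name).suffix.lower())
        if language is not None:
            languages.add(language)

    if not languages:
        return 1  # pure Python: any build tool will do
    if len(languages) == 1:
        return 2  # extensions in one other language: needs custom build steps
    return 3  # several languages: likely needs scikit-build or meson-python
```

For example, `classify_package(["pkg/__init__.py", "pkg/_fast.c"])` falls into the second category, because the only non-Python source is a single C file.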


<!--
On this page, we will focus on using front-end tools to package pure python
packages. We will note if a package does have the flexibility to support other
back-ends and in turn more complex builds (*mentioned in #2 and #3 above*). -->
<!--
## Combine the two sets of statements below...
ELI:
PDM supports C/Cython extensions too: https://pdm.fming.dev/latest/pyproject/build/#build-platform-specific-wheels

It does this by allowing you to write a python script that gets injected into a setuptools build process :) so that's not necessarily the greatest choice. It's a bit like using setuptools directly. ;)

Ralf:
Hatch only supports pure Python packages as of now. setuptools is still a very reasonable choice, and okay if all you have is a few C/Cython extensions. But I'd say you should probably recommend meson-python and scikit-build-core as the two best tools for building packages containing compiled extensions.


* link to ralf's blog and book on complex builds
* keep this page high level so we don't get weight downsides
* can use the examplePy repo stefan and I are working on that will test various build combinations

*****

ELI: It would be more accurate to say that PDM supports using PDM and setuptools at the same time, so you run setuptools to produce the C extensions and then PDM receives the compiled extension files (.so, .pyd) and packages it up alongside the pure python files.

Hatch - https://hatch.pypa.io/latest/config/build/#build-hooks build hooks

Ralf -
Hatch has the worst take on building compiled code by some distance. Unless its author starts developing an understanding of build systems / needs, and implements support for PEP 517 build backend hooks in pyproject.toml, it's pretty much a dead end.
****


Henry: Poetry will move to PEP 621 configuration in version 2.

* pdm, hatch and poetry all have "ways" of supporting c extensions via pdm-build, hatchling and poetry's build back end.
* poetry's support for C extensions is not fully developed and documented (yet). * Poetry doesn't offer a way to facilitate "communication" between poetry front end and another back end like meson to build via a build hook. so while some have used it with other back end builds it's not ideal for this application
* pdm and poetry both rely on setuptools for C extensions. pdm's support claims to be fully developed and documented. poetry claims nothing, and doesn't document it.
* hatch both offers a plugin type approach to support custom build steps
PDM (right now) is the only tool that supports other back ends (hatch is working on this - 2 minor releases away)
At some point a build becomes so complex that you need to use a tool like scikit or meson to support that complexity.



**Setuptools** is the oldest tool in the above list. While it doesn't have a
friendly user front end, we discuss it here because it is the "OG" tool that has been used for Python packaging for over a decade.

**Hatch** and **PDM** are newer, more modern tools that support customization of any
part of your packaging steps. These tools also support some C and C++
extensions.


OFEK - Why use hatchling vs pdm back end -
File inclusion is more configurable and easier by default
There is already a rich ecosystem of plugins and a well-thought-out interface
Consistency since the official Python packaging tutorial uses Hatchling by default


Henry -
The scikit-hep cookie provides 11 backends including flit-core and hatchling, and I've moved packaging to flit-core, and lots of other things to hatchling, and I can say that hatchling's defaults are much nicer than flit-core's. Hatchling uses .gitignore to decide what to put in the SDist. Flit-core basically tries to keep its hands off of adding defaults, so you have to configure everything manually. To make it even more confusing, if you use flit instead of a standard tool like build, it will switch to using VCS and those ignored files won't be added - meaning it is really easy to have a project that doesn't support build, including various GitHub Actions. Hatchling wins this by a ton.

<!-- TODO: add - compatible with other build back ends eg pdm can work with hatchling

Eli:
poetry: supports it, but is undocumented and uses setuptools under the hood, they plan to change how this works and then document it
pdm-backend: supports it, and documents it -- and also uses setuptools under the hood
hatchling: permits you to define hooks for you to write your own custom build steps, including to build C++ extensions

-->



<!-- from eli about pdm
It would be more accurate to say that PDM supports using PDM and setuptools at the same time, so you run setuptools to produce the C extensions and then PDM receives the compiled extension files (.so, .pyd) and packages it up alongside the pure Python files.

Comment about hatch.
https://github.com/pyOpenSci/python-package-guide/pull/23#discussion_r1081108118

From ralf: There are no silver bullets here yet, no workflow tool is complete. Both Hatch and PDM are single-author tools, which is another concern. @eli-schwartz's assessment is unfortunately correct here I believe (at a high level at least, not sure about details). Hatch has the worst take on building compiled code by some distance. Unless its author starts developing an understanding of build systems / needs, and implements support for PEP 517 build backend hooks in pyproject.toml, it's pretty much a dead end.

-->

<!--TODO Add examples of builds using each of the tools below?

pdm, hatch and poetry all have "ways" of supporting c extensions via pdm-build, hatchling and poetry's build back end.
poetry's support for C extensions is not fully developed and documented (yet). Poetry doesn't offer a way to facilitate "communication" between poetry front end and another back end like meson to build via a build hook.
PDM and hatch both offer a plugin type approach to support custom build steps
PDM (right now) is the only tool that supports other back ends (hatch is working on this - 2 minor releases away)
At some point a build becomes so complex that you need to use a tool like scikit or meson to support that complexity.

CORRECTIONS:
pdm doesn't use plugins. Hatch does.
pdm and poetry both rely on setuptools for C extensions. pdm's support claims to be fully developed and documented. poetry claims nothing, and doesn't document it.


??
Poetry supports extensions written in other languages but this functionality is
currently undocumented.

Tools such as Setuptools, PDM, Hatch and Poetry all have some level of support
for C and C++ extensions.
Some Python packaging tools,
such as **Flit** and the **flit-core** build back-end only support pure-Python
package builds.
Some front-end packaging tools, such as PDM, allow you to use other
build back-ends such as **meson** and **scikit-build**.


me:
pdm, hatch and poetry all have "ways" of supporting c extensions via pdm-build, hatchling and poetry's build back end.
poetry's support for C extensions is not fully developed and documented (yet). Poetry doesn't offer a way to facilitate "communication" between poetry front end and another back end like meson to build via a build hook.
PDM and hatch both offer a plugin type approach to support custom build steps
PDM (right now) is the only tool that supports other back ends (hatch is working on this - 2 minor releases away)
At some point a build becomes so complex that you need to use a tool like scikit or meson to support that complexity.
@eli-schwartz eli-schwartz 3 weeks ago
PDM and hatch both offer a plugin type approach to support custom build steps

ELI:
pdm doesn't use plugins. Hatch does.
pdm and poetry both rely on setuptools for C extensions. pdm's support claims to be fully developed and documented. poetry claims nothing, and doesn't document it.


https://pdm.fming.dev/latest/pyproject/build/#build-platform-specific-wheels
-->


<!-- https://github.com/pyOpenSci/python-package-guide/pull/23#discussion_r1071541329
ELI: A complex build could mean running a python script that processes some data file and produces a pure python module file.

Probably not common in the scientific community specifically, but I've seen quite a few setup.py files that contain custom build stages which e.g. build gettext locale catalogs.

The main point is that it is more "complex" than simply copying files or directories as-is into the built wheel.
-->
88 changes: 88 additions & 0 deletions package-structure-code/intro.md
@@ -0,0 +1,88 @@
# Python package structure information

This section provides guidance on your Python package's structure, code formats and style. It also reviews the various packaging tools that you can use to
support building and publishing your package.

```{note}
If you are considering submitting a package for peer review, have a look at the
bare-minimum [editor checks](https://www.pyopensci.org/peer-review-guide/software-peer-review-guide/editor-in-chief-guide.html#editor-checklist-template) that pyOpenSci
performs before a review begins. These checks are useful to explore
for both authors planning to submit a package to us for review and for
anyone who is just getting started with creating a Python package.

In general these are basic items that should be in any open software repository.
```

## What you will learn here

In this section of our Python packaging guide, we:

* Provide an overview of the options available to you when packaging your tool
* Suggest tools and approaches that both meet your needs and also support existing standards.
* Suggest tools and approaches that will allow you to expand upon a workflow that may begin as a pure Python tool and evolve into a tool that requires additional layers of complexity in the packaging build.
* Align our suggestions with the most current, accepted
[PEPs (Python Enhancement Proposals)](https://peps.python.org/pep-0000/) and the [scientific-python community SPECs](https://scientific-python.org/specs/).
* Align with existing best practices implemented by developers of core scientific Python packages such as NumPy, SciPy and others, in an effort to maintain consistency within our community.

## Guidelines for pyOpenSci's packaging recommendations

<!-- Might belong on the LANDING page for this entire guide?-->

The flexibility of the Python programming language lends itself to a diverse
range of tool options for creating a Python package. Python is so flexible that
it is one of the few languages that can be used to wrap around other languages.

If you are building a pure Python package, then your packaging setup can be
simple. However, some scientific packages have complex requirements as they may
need to support extensions or tools written in other languages such as C or C++.

To support the many different uses of Python, there are many ways to create a
Python package. In this guide, we suggest packaging approaches and tools based
upon:

1. What we think will be best and easiest to adopt for those who are newer to packaging
2. Tools that we think are well maintained and documented.
3. A shared goal of standardizing packaging approaches across this (scientific) Python ecosystem.

Here, we also try to align our suggestions with the most current, accepted
[Python community](https://packaging.python.org/en/latest/) and [scientific community](https://scientific-python.org/specs/) standards.


```{admonition} Suggestions in this guide are not pyOpenSci review requirements
:class: important

The suggestions for package layout in this section are made with the
intent of being helpful; they are not specific requirements for your
package to be reviewed and accepted into our pyOpenSci open source ecosystem.

Please check out our [package scope page](https://www.pyopensci.org/software-peer-review/about/package-scope.html) and [review requirements in our author guide](https://www.pyopensci.org/software-peer-review/how-to/author-guide.html#) if you are looking for Python package review requirements!
```

<!--
```{tip}
### Python packaging resources that we love

We think the resources below are excellent but each have particular opinions
that you may or may not find in our packaging guide. For instance, the PyPA
guide encourages users to store their package in a `src/package-name` directory.
While we accept that approach many of our community members prefer to not use
the `src` directory.

* [Python packaging for research software engineers](https://merely-useful.tech/py-rse/)
* [PyPA packaging guide](https://packaging.python.org/en/latest/)
```
-->


```{toctree}
:hidden:
:caption: Package structure & code style

Intro <self>

Python package structure <python-package-structure>
pyproject.toml Package Metadata <pyproject-toml-python-package-metadata>
What are SDist & Wheel Files? <python-package-distribution-files-sdist-wheel>
Package Build Tools <python-package-build-tools>
Complex Builds <complex-python-package-builds>
```
104 changes: 104 additions & 0 deletions package-structure-code/pyproject-toml-python-package-metadata.md
@@ -0,0 +1,104 @@
# Use a pyproject.toml file for your package configuration & metadata

The standard file that Python packages use to [specify build requirements and
metadata is called a **pyproject.toml**](https://packaging.python.org/en/latest/specifications/declaring-project-metadata/). Adding metadata, build requirements
and package dependencies to a **pyproject.toml** file replaces storing that
information in a setup.py or setup.cfg file.

The **pyproject.toml** file is written in [TOML (Tom's Obvious, Minimal Language) format](https://toml.io/en/). TOML is an easy-to-read structure that is founded on `key = value` pairs. Each section in the **pyproject.toml** file begins with a `[table identifier]`.
Below that table identifier are key value pairs that
support configuration for that particular table.
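
To make the table and key-value structure concrete, here is a minimal sketch that parses a small TOML fragment with Python's standard-library `tomllib` module (Python 3.11+). The fallback to the third-party `tomli` backport is an assumption about older environments, and the fragment itself is only illustrative.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the tomli backport is installed

# A small, illustrative pyproject.toml fragment with two tables.
EXAMPLE = """
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "examplePy"
keywords = ["pyOpenSci", "python packaging"]
"""

config = tomllib.loads(EXAMPLE)

# Each [table identifier] becomes a dictionary key ...
tables = sorted(config)

# ... and the key-value pairs below it become entries in a nested dictionary.
project_name = config["project"]["name"]
project_keywords = config["project"]["keywords"]
```

Here `tables` holds the two table names, `build-system` and `project`, and each table's settings are available as ordinary Python strings and lists.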

### Benefits of using a pyproject.toml file

Including your package's metadata in a separate human-readable **pyproject.toml**
file also allows someone to view the project's metadata in a GitHub repository.

<!-- setup.cfg for project metadata is being deprecated - set setuptools guide and
https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html
pypa -
https://packaging.python.org/en/latest/specifications/declaring-project-metadata/ -->

```{admonition} Setup.py is still useful for complex package builds
:class: tip

Using **setup.py** to manage package builds and metadata [can cause problems with package development](https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html).
In some cases where a Python package build is complex, a **setup.py** file may
be required. While this guide will not cover complex builds, we will provide
resources for working with complex builds in the future.

```

### Example pyproject.toml for building using PDM
Below is an example build configuration for a Python project. This example
package setup uses:

* **pdm.pep517.api** to build the [package's SDist and wheels](python-package-distribution-files-sdist-wheel)

```
[build-system]
requires = ["pdm-pep517>=1.0.0"]
build-backend = "pdm.pep517.api"

[project]
name = "examplePy"
authors = [
{name = "Some Maintainer", email = "[email protected]"}
]
maintainers = [{name = "All the contributors"}]
license = {text = "BSD 3-Clause"}
description = "An example Python package used to support Python packaging tutorials"
keywords = ["pyOpenSci", "python packaging"]
readme = "README.md"

dependencies = [
"dependency-package-name-1",
"dependency-package-name-2",
]
```
Notice that dependencies are specified in this file.

### Example pyproject.toml for building using setuptools

The package metadata, including authors, keywords, etc., is also easy to read.
Below you can see the same TOML file that uses a different build system (setuptools).
Notice how simple it is to swap out the tools needed to build this package!

In this example package setup you use:

* **setuptools** to build the [package's SDist and wheels](python-package-distribution-files-sdist-wheel)
* **setuptools_scm** to manage package version updates using version control tags

In the example below `[build-system]` is the first table
of values. It has two keys that specify the build requirements and the build back end for a package:

1. `requires =`
1. `build-backend =`

```
[build-system]
# setuptools 61 or later is needed to read metadata from the [project] table
requires = ["setuptools>=61", "setuptools_scm[toml]>=6.2"]
build-backend = "setuptools.build_meta"

[project]
name = "examplePy"
# the version is not hard-coded; setuptools_scm derives it from version control tags
dynamic = ["version"]
authors = [
{name = "Some Maintainer", email = "[email protected]"}
]
maintainers = [{name = "All the contributors"}]
license = {text = "BSD 3-Clause"}
description = "An example Python package used to support Python packaging tutorials"
keywords = ["pyOpenSci", "python packaging"]
readme = "README.md"

dependencies = [
"dependency-package-name-1",
"dependency-package-name-2",
]

# the presence of this table enables setuptools_scm
[tool.setuptools_scm]
```
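
The following sketch illustrates why swapping build tools is simple: only the `[build-system]` table changes, while the `[project]` table stays the same. It again assumes `tomllib` (Python 3.11+) or the `tomli` backport. The helper names `load` and `split_backend` are hypothetical, and version pins are omitted for brevity. The `module:object` form of `build-backend` follows PEP 517.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the tomli backport is installed

# The [project] table is shared by both configurations.
PROJECT_TABLE = """
[project]
name = "examplePy"
readme = "README.md"
"""

PDM_BUILD = """
[build-system]
requires = ["pdm-pep517"]
build-backend = "pdm.pep517.api"
"""

SETUPTOOLS_BUILD = """
[build-system]
requires = ["setuptools", "setuptools_scm[toml]"]
build-backend = "setuptools.build_meta"
"""


def load(build_table):
    """Parse one [build-system] table together with the shared [project] table."""
    return tomllib.loads(build_table + PROJECT_TABLE)


def split_backend(build_backend):
    """Split a build-backend string into (module path, object path or None)."""
    module, _, obj = build_backend.partition(":")
    return module, (obj or None)


pdm_config = load(PDM_BUILD)
setuptools_config = load(SETUPTOOLS_BUILD)

# The metadata is identical; only the build back end differs.
same_metadata = pdm_config["project"] == setuptools_config["project"]
backends = [
    split_backend(cfg["build-system"]["build-backend"])[0]
    for cfg in (pdm_config, setuptools_config)
]
```

A build front end performs a similar lookup: it installs the packages listed under `requires` and then imports the module named by `build-backend` to build the SDist and wheel.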



```{note}
[Click here to read about our packaging build tools including PDM, setuptools, Poetry and Hatch.](/package-structure-code/python-package-build-tools)
```