Migrate development workflow to Pixi #10888
- Using the bare-minimum.yml requirements file to act as a starting point to build the composable environments
- Add pixi.lock to gitignore (no need to commit lock files in library repos)
- Update .gitattributes (automatically done by pixi)
- Configure xarray as source dependency with dynamic versioning
… files Already ported to pixi
Already migrated to pixi
Update requirements files to remove deps handled by Pixi
Handled by pixi
|
cc @kmuehlbauer |
|
I'm looking at a pixi-based setup for the first time, so please bear with me. To my understanding, in the current nightly we are using h5py, hdf5, netcdf4-python and libnetcdf from conda-forge (where libnetcdf/netcdf4-python as well as h5py are built from the same hdf5). In this pixi-based setup it looks like we get an h5py wheel (https://pypi.anaconda.org/scientific-python-nightly-wheels/simple/h5py/3.15.1/h5py-3.15.1-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl, with bundled hdf5) and a netcdf4 wheel (netcdf4-1.7.3-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl, with bundled libnetcdf/hdf5). I'm not sure whether compatible versions of hdf5 are in the above two wheels. Do we have any means of controlling which versions of the underlying C libraries are bundled into those wheels?
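One way to answer the "which hdf5 is in each wheel" question empirically is to ask the installed bindings themselves. The following is a minimal sketch, not part of this PR: the helper names `bundled_hdf5_versions` and `hdf5_versions_agree` are made up for illustration, and it assumes the `h5py.version.hdf5_version` and `netCDF4.__hdf5libversion__` attributes are present in the installed builds.

```python
import importlib


def bundled_hdf5_versions():
    """Return the HDF5 version each installed binding reports (None if missing)."""
    # Map of importable module name -> function extracting its HDF5 version string.
    probes = {
        "h5py": lambda mod: mod.version.hdf5_version,
        "netCDF4": lambda mod: mod.__hdf5libversion__,
    }
    versions = {}
    for name, get_version in probes.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # package not installed in this environment
        else:
            versions[name] = get_version(module)
    return versions


def hdf5_versions_agree(versions):
    """True when all installed bindings report the same HDF5 version."""
    found = {v for v in versions.values() if v is not None}
    return len(found) <= 1


if __name__ == "__main__":
    report = bundled_hdf5_versions()
    for package, version in report.items():
        print(f"{package}: HDF5 {version or 'not installed'}")
    print("consistent" if hdf5_versions_agree(report) else "MISMATCH")
```

Run inside the nightly environment, this only reports what was installed; it does not give control over what the wheels bundle. If importing both packages in one process crashes, each probe can be run in a separate interpreter instead.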
|
maybe we need to make sure to only install the packages we want from the index ( So for example |
|
@keewis That looks reasonable. Nevertheless, should we aim for nightly testing netcdf4-python, too? I remember vaguely, that we removed that from nightly testing because of some numpy 2 issues. |
|
did we ever do that? As far as I remember I only removed the As for using the |
|
if you're sure the issue is the mismatch in hdf5 versions I'm sure we can try to come up with something (but for now I think it would be easiest to exclude the nightly |
I assume this would mean moving The following diff, however, doesn't work locally for me either:
diff --git a/pixi.toml b/pixi.toml
index cdb9553f..dbc28e51 100644
--- a/pixi.toml
+++ b/pixi.toml
@@ -259,6 +259,7 @@ extra-index-urls = [
[feature.nightly.dependencies]
python = "*"
+h5netcdf = "*"
[feature.nightly.pypi-options.dependency-overrides]
dask = { git = "https://github.com/dask/dask" }
@@ -283,7 +284,7 @@ bottleneck = { git = "https://github.com/pydata/bottleneck" }
fsspec = { git = "https://github.com/intake/filesystem_spec" }
nc-time-axis = { git = "https://github.com/SciTools/nc-time-axis" }
flox = { git = "https://github.com/xarray-contrib/flox" }
-h5netcdf = { git = "https://github.com/h5netcdf/h5netcdf" }
+# h5netcdf = { git = "https://github.com/h5netcdf/h5netcdf" } # https://github.com/pydata/xarray/pull/10888#issuecomment-3544194835
opt_einsum = { git = "https://github.com/dgasmith/opt_einsum" }
# sparse = { git = "https://github.com/pydata/sparse"}
Feeling a bit out of my depth here - is having |
I am +1 on getting this in and making |
|
it should be sufficient to add |
keewis
left a comment
I've got another two questions on the CI setup
cache-pixi-lock:
  uses: ./.github/workflows/cache-pixi-lock.yml
  with:
    pixi-version: "v0.58.0" # keep in sync with env var above
is there a reason why we pin this? I'm asking because this may be a pain to keep in sync or upgrade. If not, would it be possible to use the most recent version, somehow?
If that's not possible, is there a reason why we don't use the environment variable?
-pixi-version: "v0.58.0" # keep in sync with env var above
+pixi-version: "${{ env.PIXI_VERSION }}"
It's because of this thread in the Prefix Discord: https://discord.com/channels/1082332781146800168/1433751284317687868
TLDR(ish)
I have a CI setup that does the following:
- Generate a pixi.lock file based on the pixi.toml (saved under a cache ID of the date and a hash of the contents of the pixi.toml)
- Use the cached pixi.lock to set up the environment
This workflow has been working quite well, and has been reducing CI times significantly without us having to commit lock files.
Though after some weeks I've run into flakiness (https://github.com/Parcels-code/Parcels/actions/runs/18939417196) for the first time, with the message
pixi install --locked
Error: × lock-file not up-to-date with the workspace
There was a lockfile incompatibility between pixi versions, which surfaced since we were using the latest pixi all the time and the lockfile (for that day) was generated with an older version of Pixi, hence breaking runs. Nasty to debug since it requires manually clearing caches from GitHub and re-requesting runs.
@lucascolley mentioned in the thread that they try to maintain lockfile compatibility across versions, but I thought it would be good to pin it just to be safe
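The cache ID described above (the date plus a hash of the pixi.toml contents) can be sketched as follows. This is an illustrative sketch only: the function name `lock_cache_key`, the `pixi-lock` prefix and the choice of SHA-256 are assumptions, not what the cache-pixi-lock.yml workflow necessarily does.

```python
import hashlib
from datetime import date


def lock_cache_key(manifest_bytes, day, prefix="pixi-lock"):
    """Build a cache key from the day and a hash of the manifest contents.

    The key changes when the manifest changes (new hash) or when a new day
    starts (new date), so a fresh lock file is generated in either case.
    """
    digest = hashlib.sha256(manifest_bytes).hexdigest()
    return f"{prefix}_{day.isoformat()}_{digest}"


if __name__ == "__main__":
    # Hypothetical manifest contents, used only to show how the key reacts.
    manifest = b'[dependencies]\npython = "*"\n'
    print(lock_cache_key(manifest, date.today()))
```

Note that the Pixi version is not part of this key, which is why a lock file cached by one Pixi version could be picked up by a later run using a different one.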
If that's not possible, is there a reason why we don't use the environment variable?
Will update
If that's not possible, is there a reason why we don't use the environment variable?
Oh yes, the env context isn't available at this level
"Available expression contexts: github, inputs, vars, needs, strategy, matrix"
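To make the restriction concrete, here is a small sketch that checks a workflow value against the list of contexts quoted in the message above. It is a deliberately simplified illustration: the helper `unavailable_contexts` and its regular expressions are made up for this example, it does not parse the full GitHub Actions expression syntax (string literals containing dots would confuse it), and `vars.PIXI_VERSION` in the usage is a hypothetical variable name.

```python
import re

# Contexts quoted in the message for this position in the workflow file.
AVAILABLE_CONTEXTS = frozenset(
    {"github", "inputs", "vars", "needs", "strategy", "matrix"}
)

# Text between "${{" and "}}".
_EXPRESSION = re.compile(r"\$\{\{(.*?)\}\}")
# Leading identifier of a dotted chain, e.g. "env" in "env.PIXI_VERSION".
_CONTEXT_REF = re.compile(r"(?<![\w.\-])([A-Za-z_][\w-]*)\s*\.")


def unavailable_contexts(value, available=AVAILABLE_CONTEXTS):
    """Return the sorted contexts used in `value` that are not available."""
    found = set()
    for expression in _EXPRESSION.findall(value):
        for context in _CONTEXT_REF.findall(expression):
            if context not in available:
                found.add(context)
    return sorted(found)


if __name__ == "__main__":
    for candidate in ('"${{ env.PIXI_VERSION }}"', '"${{ vars.PIXI_VERSION }}"'):
        print(candidate, "->", unavailable_contexts(candidate) or "ok")
```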
we'll have to find a way to automatically upgrade this version through some bot like dependabot, then; there's no way we can keep this up-to-date manually
in my experience I've never had a particularly pressing reason to bump the Pixi version.
EDIT: of course this would be valuable in case of big security fixes
We can set the Pixi version on a repo level using the vars context https://docs.github.com/en/actions/how-tos/write-workflows/choose-what-workflows-do/use-variables#defining-variables-for-multiple-workflows
Oh yes, the env context isn't available at this level
Is that because you're reusing a workflow? Could we maybe convert the caching of the lock file to an action?
Didn't work for me :/ (testing locally) |
|
I did some experiments myself and realized two things:
This has the ugly side-effect that e.g. I've taken the liberty of pushing three commits to pin python to 3.13, remove the typing feature from nightly, and switch to using the |
|
but that doesn't resolve our issue with mismatching |
is it possible to install all packages which depend on |
|
I don't think so (but I'm not an expert on this), as there is nothing the solver could use to detect which version of hdf5 is contained within the wheel. As soon as you have a fundamental mismatch between the two, you may get segfaults and other weird errors. |
|
perhaps the long-term solution is nightly conda packages, although that will require quite broad ecosystem coordination I imagine |
|
I guess for pure-conda solves something similar to the scientific-python index would be nice |
|
I think this would be somewhat feasible once we can build all the projects we are interested in with |
Add GitHub Codespaces config following https://pixi.sh/latest/integration/editor/vscode/#devcontainer-extension allowing people to develop from anywhere. Waiting for pydata#10888
(posting now for visibility and to request feedback/pixi debugging help)
Overview
Fixes #10732
This PR migrates the dev workflow and CI for Xarray across to Pixi, providing the following benefits:
See the original issue for more info.
Changes so far in this PR:
- pixi.toml split apart into features that I thought were sensible. I left out environment-benchmarks.yml and binder/environment, as those have interactions with asv and Binder - this PR is already big enough, and I think those should be explored another time.
- cache-pixi-lock.yml workflow (see the "Considerations" section below)
- ci.yaml 98% there - for some reason the CI of Pixi is finding which pytest to be .pixi/envs/default/Scripts/pytest while local pixi run -e test-all-deps-py313 which pytest is finding .pixi/envs/test-all-deps-py313/bin/pytest (see test-pixi-dust branch, example action run). Any ideas why, @lucascolley?
I've tried to make the commits tidy to help with reviewing commit by commit, which might be easier. I was also quite diligent when migrating from the old env files to make sure versions were the same.
Testing instructions
Resources: Pixi Scipy 2025 talk | Docs: Manifest Reference
- pixi info -> show info about the pixi environments
- pixi run doc
- pixi run test then choose the environment you want to run the tests in (or pixi run -e environment_name test)
  - test-all-deps-py313 environment (corresponding to the old environment ci/requirements/environment.yml)
- pixi run pre-commit
- pixi run typing
Enter an environment (equivalent to conda activate): pixi shell -e env_name
Exit an environment (equivalent to conda deactivate): exit or Ctrl+D
See all tasks: pixi run
Considerations
Lock files o' lock files
There was some interesting conversation in #10732 (comment) about lock files. To summarise:
We have two choices to handle the lock files, either (a) generate them in CI, or (b) commit them to the repo and periodically update them.
(a) generating in CI (done in this PR):
- pixi.lock to .gitignore
- date + hash(pixi.toml)
- pixi.lock file for environment creation
Pros:
- cache-pixi-lock.yml is re-usable across different projects).
Cons:
- pixi.lock and what's in CI. Local developers need to periodically delete pixi.lock and regenerate it.
(b) commit the lock files
(I think this is the gist of it)
- bleeding-edge which runs every few days by taking the current lockfile, running an update, and then running tests. Any failures can be automatically reported in an issue
- pixi.toml manifest and talk with upstream to see what's up
Pros:
Cons:
@lucascolley knows the full extent as he's been exploring this setup at Scipy
Conclusion
Approach (a) has minimal setup/maintenance with little downside. I think that it's a good solution for smaller projects in particular (we've adopted it at Parcels - cc @maxrjones, this might be interesting based on your comment)
Approach (b) is more robust if having the same environment between all devs is highly valued (@shoyer mentioned during a dev meeting that this would be good for xarray), but requires more setup.
I recommend we go for (a) as is done in this PR, and consider (b) separately.
@lucascolley would it be beneficial to do a write-up of all this on prefix.dev sometime to help guide others dealing with this? I'm happy to write or collab on a blog post.
Feedback wanted: To what extent do we promote Conda dev workflows
Yeah - I don't know. In the projects I'm working on I've gone full Pixi, but those are smaller projects.
I've deleted the old environment files to avoid duplication, but can re-add them to the extent which you want to support conda dev workflows.
I've held off on updating the contributing instructions for this reason.
EDIT: Joined the dev meeting - @keewis doesn't think it's a bad idea to fully migrate dev instructions from conda to Pixi. Later (if people really want conda instructions) we can show how to use pixi to export a conda-compatible env file - no need for us to maintain two separate env files.
I think that's about it! I don't think I've forgotten anything, but it is late on a Friday so maybe - will update if that's the case :)
Let me know if you want me to drop by the dev meeting on 5 Nov - but I'm happy to keep this async otherwise.
(🎉 for my first significant contribution to Xarray!!!)