Skip to content

use .tar.xz archive for jsonpath_lib sources in polars v0.20.2#22450

Merged
lexming merged 3 commits intoeasybuilders:5.0.xfrom
boegel:20250304110025_new_pr_polars0202
Mar 13, 2025
Merged

use .tar.xz archive for jsonpath_lib sources in polars v0.20.2#22450
lexming merged 3 commits intoeasybuilders:5.0.xfrom
boegel:20250304110025_new_pr_polars0202

Conversation

@boegel
Copy link
Member

@boegel boegel commented Mar 4, 2025

(created using eb --new-pr)

This fixes the first part of the issue with installing this easyconfig:

Missing checksum for jsonpath_lib-0.3.0-24eaf0b4416edff38a4d1b6b17bc4b9f3f047b4b.tar.gz in {'jsonpath_lib-0.3.0.tar.gz':
'dfb5eb7f47b5b5a7b35262f7b41787a785958001a23ff039cdae926396c6af96'}

More work will be needed, because with the checksum key updated, the installation still fails with:

== 2025-03-04 10:53:05,667 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/base/exceptions.py:126 in __init__): cmd " /software/Python/3.11.3-GCCcore-12.3.0/bin/python -m pip install --prefix=/tmp/software/polars/0.20.2-gfbf-2023a  --verbose  --no-deps  --ignore-installed  --no-index  --no-build-isolation  .[deltalake,matplotlib,numpy,openpyxl,pandas,pyarrow]" exited with exit code 1 and output:
Using pip 23.1.2 from /software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/pip (python 3.11)
Processing /tmp/easybuild/polars/0.20.2/gfbf-2023a/polars-0.20.2
  Preparing metadata (pyproject.toml): started
  Running command Preparing metadata (pyproject.toml)
  error: failed to get `jsonpath_lib` as a dependency of package `polars-ops v0.35.4 (/tmp/easybuild/polars/0.20.2/gfbf-2023a/polars-0.20.2/crates/polars-ops)`

  Caused by:
    failed to load source for dependency `jsonpath_lib`

  Caused by:
    Unable to update https://github.com/ritchie46/jsonpath?branch=improve_compiled#24eaf0b4

  Caused by:
    can't checkout from 'https://github.com/ritchie46/jsonpath': you are in the offline mode (--offline)
   maturin failed
    Caused by: Cargo metadata failed. Does your crate compile with `cargo build`?
    Caused by: `cargo metadata` exited with an error:
  Error running maturin: Command '['maturin', 'pep517', 'write-dist-info', '--metadata-directory', '/tmp/eb-mcr6xkpm/pip-modern-metadata-9ven5akl', '--interpreter', '/software/Python/3.11.3-GCCcore-12.3.0/bin/python']' returned non-zero exit status 1.
  Checking for Rust toolchain....
  Running `maturin pep517 write-dist-info --metadata-directory /tmp/eb-mcr6xkpm/pip-modern-metadata-9ven5akl --interpreter /software/Python/3.11.3-GCCcore-12.3.0/bin/python`
  error: subprocess-exited-with-error

Note that with EasyBuild 4.x, you will need to use eb --ignore-checksums, because the source tarball for jsonpath_lib is created via git_config (in EasyBuild 5.x the checksum for the source tarball will be reproducible, thanks to easybuilders/easybuild-framework#4660)

edit: requires:

@github-actions github-actions bot added the change label Mar 4, 2025
@boegel boegel added bug fix and removed change labels Mar 4, 2025
@boegel boegel added this to the release after 4.9.4 milestone Mar 4, 2025
@boegel
Copy link
Member Author

boegel commented Mar 4, 2025

The log shows that the unpacked jsonpath_lib tarball is done originally in the `` subdirectory, and it then gets moved afterwards:

== 2025-03-04 10:52:34,288 cargo.py:319 DEBUG Unpacked sources of jsonpath_lib-0.3.0-24eaf0b4416edff38a4d1b6b17bc4b9f3f047b4b.tar.gz into: /tmp/easybuild/polars/0.20.2/gfbf-2023a/easybuild_vendor_git/jsonpath
== 2025-03-04 10:52:34,288 cargo.py:337 INFO creating .cargo-checksums.json file for jsonpath
== 2025-03-04 10:52:34,289 filetools.py:1943 DEBUG Not creating existing path /tmp/easybuild/polars/0.20.2/gfbf-2023a/easybuild_vendor_git/jsonpath
== 2025-03-04 10:52:34,289 filetools.py:1943 DEBUG Not creating existing path /tmp/easybuild/polars/0.20.2/gfbf-2023a/easybuild_vendor
== 2025-03-04 10:52:34,289 filetools.py:2804 INFO /tmp/easybuild/polars/0.20.2/gfbf-2023a/easybuild_vendor_git/jsonpath moved to /tmp/easybuild/polars/0.20.2/gfbf-2023a/easybuild_vendor/jsonpath

See also easybuilders/easybuild-easyblocks#3567

@Micket @Flamefire @lexming Any ideas on this?

@github-actions github-actions bot added the change label Mar 4, 2025
@Flamefire
Copy link
Contributor

I investigated that a bit (BTW: The EB 5 reprod shell is awesome!):

https://github.com/ritchie46/jsonpath?branch=improve_compiled#24eaf0b4

This is requesting a specific branch. In particular the dependency is specified in Cargo.toml as

jsonpath_lib = { version = "0.3", optional = true, git = "https://github.com/ritchie46/jsonpath", branch = "improve_compiled" }

However we do not store the branch information in the easyconfig so we only have:

[source."https://github.com/ritchie46/jsonpath?rev=24eaf0b4416edff38a4d1b6b17bc4b9f3f047b4b"]
git = "https://github.com/ritchie46/jsonpath"
rev = "24eaf0b4416edff38a4d1b6b17bc4b9f3f047b4b"
replace-with = "vendored-sources"

adding branch = "improve_compiled" under rev makes the build start.

I'll investigate how that had worked before.

@Flamefire
Copy link
Contributor

Previously we had this in config.toml which misses any information about branch or revision:

[patch."https://github.com/ritchie46/jsonpath"]
jsonpath_lib = { path = "/dev/shm/polars/0.20.2/gfbf-2023a/easybuild_vendor/jsonpath" }

As an easy way out we can go back to that in most cases, but I'm not fully sure if there could be conflicts if we had cases where the same repo and different revisions/branches were used for different sources.
I guess we can use this old format when we detect that there is no conflict and the new format if there are conflicts detected or if its required due to that issue where multiple crates are in the same folder.

In any case we have to implement support for storing the branch in the crates list.
Question is: Shall we add (even more) logic to avoid e.g. this failure or just update the (hopefully few) affected easyconfigs?

{'js-sys-0.3.66.tar.gz': 'cee9c64da59eae3b50095c18d3e74f8b73c0b86d2792824ff01bbce68ba229ca'},
{'jsonpath_lib-0.3.0.tar.gz': 'dfb5eb7f47b5b5a7b35262f7b41787a785958001a23ff039cdae926396c6af96'},
{'jsonpath_lib-0.3.0-24eaf0b4416edff38a4d1b6b17bc4b9f3f047b4b.tar.gz':
'dfb5eb7f47b5b5a7b35262f7b41787a785958001a23ff039cdae926396c6af96'},
Copy link
Contributor

@lexming lexming Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

part of the issue is that this repo is currently cloned with its .git, which makes the checksum worthless

Copy link
Contributor

@lexming lexming Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a source from github, isn't it? So for the crates can't we "rewrite" them to archive downloads instead of git checkouts as we do now? I mean it is us anyway creating a "source" entry from a crate tuple

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partial improvement to the checksum verification failures: easybuilders/easybuild-framework#4798

Copy link
Contributor

@lexming lexming Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final fix related to the checksums:

  1. this checksum definition needs to be changed to:
{'jsonpath_lib-0.3.0-24eaf0b4.tar.xz': 'cc7ed586a3047975d1d9c2ab8e2c555403fd73ee94358746476053610dc7024f'},
  1. cargo easyblock needs a little update to not use .tar.gz on these sources. I guess I'll PR this change to @Flamefire PR Fix cargo build when git branch is referenced easybuild-easyblocks#3654

@Flamefire
Copy link
Contributor

Flamefire commented Mar 6, 2025

I have updated the cargo easyblock as described using the approach that allows specifying the branch (and output it in the collected crates when the file is run on a cargo project) and also handles the existing EC in a backwards compatible way.

Testing with the easyconfigs that were the reason for the change in the easyblock: #21637, #21638

@Flamefire
Copy link
Contributor

Test report by @Flamefire
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3654
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
i7008 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.8.17
See https://gist.github.com/Flamefire/7c018c5c54fbcd4c46c5578fcf6b0649 for a full test report.

@lexming
Copy link
Contributor

lexming commented Mar 6, 2025

Minimum changes to make this work:

  • use a .tar.xz for the sources of jsonpath_lib
  • add the branch tag to source specification of jsonpath_lib in cargo.toml

So pretty minor stuff. No idea how this worked before though. The dependency on jsonpath_lib must have changed/gained the requirement on that branch but we use the same version of Rust and the same sources of polars as in the original PR.

@Flamefire
Copy link
Contributor

Flamefire commented Mar 6, 2025

So pretty minor stuff. No idea how this worked before though. The dependency on jsonpath_lib must have changed/gained the requirement on that branch but we use the same version of Rust and the same sources of polars as in the original PR.

I tried to explain that above but it is a bit hard to understand. I'll try a bit different:

  • Prior to my (previously merged) easyblock PR we basically replaced the whole github repository with our vendored source without any respect for branch or revision. Basically sudo use this!
  • The change now was to include the revision, I.e. like telling cargo "The GitHub repo with this revision is here: ..."
  • This worked for all tested ECs but now this one here doesn't ask for a specific revision but for a branch. And as with (our) change we neither say where a GitHub repo with that branch is nor that it should use a specific folder for all revisions/branches of a repo it failed to find one.

Hence we either need to go back to the previous approach or include the branch. easybuilders/easybuild-easyblocks#3654 does both: If a branch is specified, even if it is only "no branch required" (by specifying None) we use the new approach, else we use the old one.
So it works with the current EC as-is, see my test report.

But better would be to add the branch to the source specification which also serves as a test and example for the change to the easyblock

Anyway you are right with the 2 suggest changes.

@Flamefire
Copy link
Contributor

Test report by @Flamefire
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3654
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
i7008 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.8.17
See https://gist.github.com/Flamefire/88cd6efce12f63ef9796f096b4496755 for a full test report.

@boegel
Copy link
Member Author

boegel commented Mar 13, 2025

@boegelbot please test @ jsc-zen3

@boegelbot
Copy link
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22450 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22450 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5935

Test results coming soon (I hope)...

Details

- notification for comment with ID 2721033220 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Copy link
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
jsczen3c2.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/152ae279d2a80291526ddc43b4c3f14c for a full test report.

@Flamefire
Copy link
Contributor

== 2025-03-13 12:10:30,592 easyblock.py:4305 WARNING build failed (first 300 chars): git_config currently only supports filename ending in .tar.gz

This needs the EB 5.0x branch. So the easyblocks develop needs to be merged to 5.0x first.

@lexming
Copy link
Contributor

lexming commented Mar 13, 2025

@Flamefire I'm on it

@lexming lexming changed the title fix polars 0.20.2 easyconfig (WIP) use .tar.xz archive for jsonpath_lib sources in polars v0.20.2 Mar 13, 2025
@lexming lexming changed the base branch from develop to 5.0.x March 13, 2025 13:25
@lexming lexming added the EasyBuild-5.0 EasyBuild 5.0 label Mar 13, 2025
@lexming lexming modified the milestones: release after 4.9.4, 5.0.0 Mar 13, 2025
@lexming
Copy link
Contributor

lexming commented Mar 13, 2025

@boegelbot please test @ jsc-zen3
EB_BRANCH=5.0.x

@boegelbot
Copy link
Collaborator

@lexming: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ "5.0.x" != 'develop' ]]; then EB_BRANCH="5.0.x" ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/"5.0.x" source init_env_easybuild_develop.sh; fi; EB_PR=22450 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH="5.0.x" /opt/software/slurm/bin/sbatch --job-name test_PR_22450 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5936

Test results coming soon (I hope)...

Details

- notification for comment with ID 2721617626 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@lexming
Copy link
Contributor

lexming commented Mar 13, 2025

Test report by @lexming
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
obelia - Linux Fedora Linux 41 (KDE Plasma), x86_64, 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, Python 3.9.21
See https://gist.github.com/lexming/bacb80874ded108623986b7028f17eba for a full test report.

@boegelbot
Copy link
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/d962f37e159fdbb554066245a4a29705 for a full test report.

Copy link
Contributor

@lexming lexming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lexming
Copy link
Contributor

lexming commented Mar 13, 2025

Merging, thanks @boegel @Flamefire !

@lexming lexming merged commit cd4901d into easybuilders:5.0.x Mar 13, 2025
10 checks passed
@boegel boegel changed the title use .tar.xz archive for jsonpath_lib sources in polars v0.20.2 use .tar.xz archive for jsonpath_lib sources in polars v0.20.2 Mar 13, 2025
@boegel boegel removed the change label Mar 13, 2025
@boegel boegel deleted the 20250304110025_new_pr_polars0202 branch March 16, 2025 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants