Skip to content

Conversation

@Flamefire
Copy link
Contributor

@Flamefire Flamefire commented Mar 5, 2025

(created using eb --new-pr)

This fixes a regression introduced by #3483 when handling cargo crates from git repositories after an issue was identified in easybuilders/easybuild-easyconfigs#22450.

Previously we replaced the git repository with

[patch."https://github.com/ritchie46/jsonpath"]
jsonpath_lib = { path = "/dev/shm/polars/0.20.2/gfbf-2023a/easybuild_vendor/jsonpath" }

After that PR we included the revision to ensure the correct source is taken:

[source."https://github.com/ritchie46/jsonpath?rev=24eaf0b4416edff38a4d1b6b17bc4b9f3f047b4b"]
git = "https://github.com/ritchie46/jsonpath"
rev = "24eaf0b4416edff38a4d1b6b17bc4b9f3f047b4b"
replace-with = "vendored-sources"

However a cargo crate might specify a branch instead of a revision:
jsonpath_lib = { version = "0.3", optional = true, git = "https://github.com/ritchie46/jsonpath", branch = "improve_compiled" }

As the resolver is very strict it needs branch = "improve_compiled" under rev = ... in this case. The Cargo.lock file actually has that information but we ignored it

Solution: Change the entry in the easyconfig crates list:

-('jsonpath_lib', '0.3.0', 'https://github.com/ritchie46/jsonpath', '24eaf0b4416edff38a4d1b6b17bc4b9f3f047b4b'),
+('jsonpath_lib', '0.3.0', 'https://github.com/ritchie46/jsonpath', '24eaf0b4416edff38a4d1b6b17bc4b9f3f047b4b', 'improve_compiled'),

This PR implements support for that:

  • The extraction (easyblock run directly) now also prints a 5th tuple element which is either None or the branch
  • If the new crate-tuple is used (branch is None or a valid string) then the new format is used including branch = if the specified branch is True-ish
  • If the old crate-tuple is used (backwards compatibility) the old format is used unless the new format is required:
    • for crates with git workspaces
    • when the same repository but with different revisions is used (by different crates)
      This is required because our old format has no way for the revision and might use the wrong for one of those.
      Using the new format here works unless the crates referenced this with a branch which should be rare enough

I also moved the offline-setup code to an own function to make it a bit easier to handle, and some refactoring of the collection code as pylint was noting e.g. that variables from the outer space are being redefined. Also the output should now be a bit more clear.

@Flamefire
Copy link
Contributor Author

For visibility I'd like to discuss the alternatives we have so far here:

  1. Have tuples of (name, version, rev) and (name, version, rev, branch) (optional branch) and use the (new) [source."{url}?rev={rev}"]entries with optional branch.
  2. Have only (name, version, rev) and detect required branches by searching all Cargo.toml files if any specifies a branch. (PR by @lexming )

Advantage of 1) is that it is the flexibility: The code to print out the list can depend on e.g. the toml package, newer Python versions etc. as it doesn't need to run in the same constraints as the build. Easyconfig authors can change/fix issues that code introduced, e.g. manually changing/adding branch information. However it breaks some easyconfigs that exist, as seen in the polars case.
Advantage of 2) is the easier/cleaner format in EasyConfigs and the backwards compatibility. However there might be corner-cases the parsing code did not anticipate and then fails, but that should be caught in test builds I guess. But the logic is/has to be limited to not re-implement a (full) TOML-parser and might fail or return wrong results. E.g. there are packages that use a dependency of the same name with different versions, although in that case not using git but crates.io. That might be hard to handle without a full parser.

For reference a few formats from cargo docs 1 and 2 (not valid, just using the same crate as example):

Cargo.toml:

[dependencies]
regex = { git = "https://github.com/rust-lang/regex.git" }
regex = { git = "https://github.com/rust-lang/regex.git", rev = "9f9f693" }
regex = { git = "https://github.com/rust-lang/regex.git", branch = "next" }
regex = { git = "https://github.com/rust-lang/regex.git", tag = "1.10.3" }
regex = { version = "1.10.3", git = "https://github.com/rust-lang/regex.git", branch = "next" }
some-crate = { version = "1.0", registry = "my-registry" }
time = "0.1.12"

Cargo.lock:

[[package]]
name = "regex"
version = "1.5.0"
source = "git+https://github.com/rust-lang/regex.git#9f9f693768c584971a4d53bc3c586c33ed3a6831"
[[package]]
name = "regex"
version = "1.5.0"
source = "git+https://github.com/rust-lang/regex.git&rev=9f9f693768c584971a4d53bc3c586c33ed3a6831#9f9f693768c584971a4d53bc3c586c33ed3a6831"
[[package]]
name = "regex"
version = "1.5.0"
source = "git+https://github.com/rust-lang/regex.git&branch=next#9f9f693768c584971a4d53bc3c586c33ed3a6831"

Given the different additional keys in Cargo.toml files the approach 2) might be extendable to that too.
I'm also not sure that different versions of the same crate when using git repositories are possible but I assume so as the (transitive) dependencies might use whatever they like so I'm quite sure it can happen in practice

@lexming
Copy link
Contributor

lexming commented Mar 8, 2025

That's a very good summary.

  • I'm against approach 1) mainly because the branch key has no role in fetching the crate sources in EasyBuild. We want a URL and a commit, always. EasyBuild will not fetch the HEAD of any branch, as that cannot be checksumed. So having the branch info in crates is just noise from that perspective.

  • Approach 2) has some risks in parsing the toml as you point out. I agree there. That exmaple with 3 versions of itertools is scary. But the format in Cargo.toml files is well structured enough to have a reliable parsing of the branch key with regexes.
    We should keep in mind that this is only done for crates cloned from git repos, which are quite rare on their own. Crates from git repos with branches are even more rare. So having a case with multiple of those of the same package is a very exotic case. And the code in the easyblock can easily be improved whenever any of those complicated cases show up.

@lexming lexming added the EasyBuild-5.0 EasyBuild 5.0 label Mar 10, 2025
@boegel boegel added the bug fix label Mar 12, 2025
@boegel boegel added this to the 5.0 milestone Mar 12, 2025
@Flamefire
Copy link
Contributor Author

Flamefire commented Mar 12, 2025

Test report by @Flamefire

Overview of tested easyconfigs (in order)

  • SUCCESS uv-0.4.20-GCCcore-13.3.0.eb
  • SUCCESS polars

Build succeeded for 1 out of 1 (1 easyconfigs in total)
i7008 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.8.17
See https://gist.github.com/Flamefire/30022cad1fbd9506c53e0baed8142e5e for a full test report.

@Flamefire
Copy link
Contributor Author

This should now be ready. I'd like to unify workspace and non-workspace git crates (see Flamefire#9 (comment)) but my easy attempt failed as because in jsonpath/wasm it expects also a .cargo-checksums.toml in addition to the main folder created by cloning https://github.com/ritchie46/jsonpath

I assume it needs one checksum file for every Cargo.toml so the whole workspace-parsing code can be replaced by a glob.

@lexming
Copy link
Contributor

lexming commented Mar 13, 2025

Test report by @lexming

Overview of tested easyconfigs (in order)

  • SUCCESS modkit-0.3.3-GCCcore-12.3.0.eb
  • SUCCESS mfqe-0.5.0-GCC-12.3.0.eb
  • SUCCESS maturin-1.1.0-GCCcore-12.3.0.eb
  • SUCCESS tiktoken-0.7.0-GCCcore-12.3.0.eb
  • SUCCESS kyber-0.4.0-GCC-12.3.0.eb
  • SUCCESS tokenizers-0.15.2-GCCcore-12.3.0.eb
  • SUCCESS juliaup-1.17.9-GCCcore-12.3.0.eb
  • SUCCESS jiter-0.4.1-GCCcore-12.3.0.eb
  • SUCCESS Longshot-1.0.0-GCCcore-12.3.0.eb
  • SUCCESS skani-0.2.2-GCCcore-12.3.0.eb
  • SUCCESS smafa-0.8.0-GCC-12.3.0.eb
  • SUCCESS rustworkx-0.15.1-gfbf-2023a.eb
  • SUCCESS Safetensors-0.4.3-gfbf-2023a.eb
  • SUCCESS cramino-0.14.5-GCC-12.3.0.eb
  • SUCCESS chopper-0.9.0-GCCcore-12.3.0.eb
  • SUCCESS Clarabel.rs-0.7.1-gfbf-2023a.eb
  • SUCCESS fastparquet-2024.11.0-gfbf-2023a.eb
  • SUCCESS cryptography-41.0.1-GCCcore-12.3.0.eb
  • SUCCESS phasius-0.2.0-GCC-12.3.0.eb
  • SUCCESS DeltaLake-0.15.1-gfbf-2023a.eb
  • SUCCESS bcrypt-4.0.1-GCCcore-12.3.0.eb
  • SUCCESS bamtofastq-1.4.1-GCCcore-12.3.0.eb
  • SUCCESS pydantic-2.5.3-GCCcore-12.3.0.eb
  • SUCCESS poetry-1.7.1-GCCcore-12.3.0.eb

Build succeeded for 24 out of 24 (24 easyconfigs in total)
obelia - Linux Fedora Linux 41 (KDE Plasma), x86_64, 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, Python 3.9.21
See https://gist.github.com/lexming/a7381efd7192a4702959d6dc23df714c for a full test report.

@lexming
Copy link
Contributor

lexming commented Mar 13, 2025

Merging, thanks @Flamefire !

@lexming lexming merged commit d3caef1 into easybuilders:develop Mar 13, 2025
41 checks passed
@Flamefire Flamefire deleted the 20250305174305_new_pr_cargo branch March 13, 2025 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants