fixVersioneerSourcesHook: init #417859

Open
YorikSar wants to merge 6 commits into NixOS:master from tweag:fix-versioneer-sources-hook

@YorikSar YorikSar commented Jun 18, 2025

Add a hook that fixes a common issue with projects that use Versioneer on GitHub. When GitHub runs its analog of git archive to generate an archive from a tag, it substitutes $Format:%d$ with (HEAD -> main, tag: the_tag) while the tag points to the latest commit. Once more commits are added, the substitution becomes (tag: the_tag), which changes the source hash.

Packages that are updated soon after a release, whether automatically or manually, fetch a source that contains this HEAD -> main part, so their hashes are more likely to stop matching later.

Adding this hook to projects that use Versioneer removes the HEAD -> main part of the description string and makes the source reproducible. Here's how it would look, for example:

src = fetchFromGitHub {
  owner = "ANCPLabOldenburg";
  repo = "ancp-bids";
  tag = version;
  hash = "sha256-vmw8SAikvbaHnPOthBQxTbyvDwnnZwCOV97aUogIgxw=";
  nativeBuildInputs = [ fixVersioneerSourcesHook ];
};
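
The effect described above can be sketched in a few lines of shell. This is only an illustration of the idea, not the hook's actual implementation (which is not shown here): the helper name `strip_head_ref` and the exact `sed` pattern are assumptions, and the two sample strings follow the `(HEAD -> main, tag: the_tag)` and `(tag: the_tag)` forms quoted in the description.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop the "HEAD -> <branch>, " part from a substituted
# ref-names string, leaving only the stable "(tag: ...)" part.
strip_head_ref() {
  sed -E 's/\(HEAD -> [^,)]+, /(/'
}

# What the archive might contain right after the release ...
early='git_refnames = " (HEAD -> main, tag: the_tag)"'
# ... and after more commits have landed on the branch.
late='git_refnames = " (tag: the_tag)"'

# After normalization both variants are identical, so the hash stays stable.
if [ "$(printf '%s\n' "$early" | strip_head_ref)" = "$late" ]; then
  echo "normalized strings match"
fi
```

A string that is already in the `(tag: the_tag)` form passes through unchanged, so in this sketch the same step can be applied no matter when the archive was fetched.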

I ran into this issue when preparing #416464; the general case is described in #84312.

To see which Python packages use Versioneer, I ran the following script:

find-versioneer.sh
#!/usr/bin/env bash
e='
with import ./. {
  system = "x86_64-linux";
  overlays = [ ];
  config = { };
};
python3Packages
|> lib.mapAttrsToList (
  name: p:
  if
    lib.isAttrs p
    && lib.isAttrs p.src or null
    && lib.hasPrefix "https://github.com" p.src.url or ""
    && p.src ? owner
    && p.src ? repo
  then
    {
      inherit name;
      inherit (p.src) url owner repo;
    }
  else
    null
)
|> lib.filter (p: p != null |> builtins.tryEval |> (r: r.success && r.value))
|> lib.concatMapStringsSep "\n" (p: "${p.name} ${p.owner} ${p.repo}")
'

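# Evaluate the Nix expression above; prints one "name owner repo" line per package.
# (Named list_packages rather than eval to avoid shadowing the shell builtin.)
list_packages() {
  nix eval --extra-experimental-features pipe-operators --raw --impure --expr "$e"
}

# --raw prints no trailing newline, so also handle a final unterminated line.
while read -r name owner repo || [ -n "$name" ]; do
  # Show the package currently being checked as a progress indicator.
  echo -n "$name"
  # Keep the name on screen only if the repository has versioneer.py at its root.
  if gh api --cache 48h "/repos/${owner}/${repo}/contents/versioneer.py" > /dev/null 2>&1; then
    echo
  fi
  # Blank the current line so the next name starts from a clean line.
  echo -ne "\r${name//?/ }\r"
done < <(list_packages)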

And it produced the following list of packages that most likely have unstable source hashes after each release:

Packages
ancp-bids
bayespy
birch
cons
coredis
crochet
dask-jobqueue
dask-yarn
datalad
datalad-gooey
datashape
debugpy
ed25519
eliot
etelemetry
etuples
fipy
flask-limiter
gcsfs
geometric
geopandas
great-expectations
hieroglyph
intake-parquet
junos-eznc
jupyter-repo2docker
kotsu
libusb1
limits
llvmlite
logical-unification
magic-wormhole
matchpy
minikanren
monai
monai-deploy
msgspec
nbconflux
ncclient
ndindex
numba
numbaWithCuda
parallel-ssh
pims
propka
pydata-google-auth
pylatex
pyprecice
pyrevolve
pytest-mpi
python-jsonrpc-server
rembg
s3fs
salmon-mail
shapely
spake2
ssh-python
ssh2-python
strenum
trackpy

Trying to rebuild all of them, I got the following packages with broken hashes:

Packages with broken sources
cons
datalad
debugpy
etuples
fipy
intake-parquet
logical-unification
monai
monai-deploy
msgspec
pims
propka
pydata-google-auth
pylatex
rembg
salmon-mail
trackpy
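
The description does not say how the rebuild was done. One possible way to re-check a fixed-output source is `nix-build --check` on the package's `src` attribute, assuming a nixpkgs checkout in the current directory and a source that is already present in the local store. The helper name `src_check_cmd` and the choice of `--check` are assumptions, not taken from this PR. The sketch only prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch only: print (not run) a command that re-fetches a package source and
# compares it with the hash recorded in nixpkgs.
# Assumptions: run from a nixpkgs checkout; packages live under python3Packages.
src_check_cmd() {
  local pkg=$1
  printf 'nix-build . -A python3Packages.%s.src --check\n' "$pkg"
}

# A few names from the list above; pipe the output to `sh` to actually run it.
for pkg in cons datalad debugpy; do
  src_check_cmd "$pkg"
done
```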

I've added the hook to some packages, and fixed their hashes where they were broken, to show some examples. I picked a small number of packages with no dependants to avoid a mass rebuild.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Fits CONTRIBUTING.md, pkgs/README.md, maintainers/README.md and other contributing documentation in corresponding paths.

@nix-owners nix-owners bot requested review from bcdarwin and jluttine June 18, 2025 14:01
@github-actions github-actions bot added 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 6.topic: python Python is a high-level, general-purpose programming language. labels Jun 18, 2025
@nixpkgs-ci nixpkgs-ci bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Jun 30, 2025
@YorikSar YorikSar force-pushed the fix-versioneer-sources-hook branch 2 times, most recently from 774d318 to 63e9348 Compare July 2, 2025 13:15
@ofborg ofborg bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Jul 2, 2025
@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/prs-ready-for-review/3032/5635

@YorikSar YorikSar force-pushed the fix-versioneer-sources-hook branch from 63e9348 to 830b0b7 Compare July 3, 2025 11:52
@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/fetchfromgithub-and-the-versioneer-fixing-source-reproducibility/66539/1

@nixpkgs-ci nixpkgs-ci bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Jul 16, 2025
YorikSar added 6 commits July 21, 2025 17:36
Add a hook that fixes a common issue with projects that use Versioneer
on GitHub. When GitHub runs its analog of `git archive` to generate an
archive from a tag, it substitutes `$Format:%d$` with `(HEAD -> main,
tag: the_tag)` while the tag points to the latest commit. Once more
commits are added, the substitution becomes `(tag: the_tag)`, which
changes the source hash.

Packages that are updated soon after a release, whether automatically
or manually, fetch a source that contains this `HEAD -> main` part, so
their hashes are more likely to stop matching later.

Adding this hook to projects that use Versioneer will remove this
`HEAD -> main` part of the description string and make the source
reproducible. Here's how it would look, for example:

  src = fetchFromGitHub {
    owner = "ANCPLabOldenburg";
    repo = "ancp-bids";
    tag = version;
    hash = "sha256-vmw8SAikvbaHnPOthBQxTbyvDwnnZwCOV97aUogIgxw=";
    nativeBuildInputs = [ fixVersioneerSourcesHook ];
  };

I ran into this issue when preparing NixOS#416464; the general case is
described in NixOS#84312.
Also fix resulting hash, which now matches the current tarball.
Also fix resulting hash, which now matches the current tarball.
@YorikSar YorikSar force-pushed the fix-versioneer-sources-hook branch from 830b0b7 to 4d6204a Compare July 21, 2025 15:38
@nixpkgs-ci nixpkgs-ci bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Jul 21, 2025
@nixpkgs-ci nixpkgs-ci bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Aug 6, 2025
YorikSar added a commit to tweag/nixpkgs that referenced this pull request Aug 8, 2025
This option fetches a tarball from GitHub based on the tree hash instead
of the tag to get unprocessed data from the repo, and then applies a
partial implementation of `export-subst` to replace certain format
strings. Unlike Git, it uses reproducible values for them.

This includes an example of using this in a package affected by this
non-reproducibility.

Related PR: NixOS#417859
Related issue: NixOS#84312
Inspired by discussion: https://discourse.nixos.org/t/fetchfromgithub-and-the-versioneer-fixing-source-reproducibility/66539
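
A rough idea of what a partial `export-subst` with reproducible values could look like is sketched below. The commit message does not list which format strings are handled or which values are used, so everything here is an assumption: the helper name `export_subst_sketch`, the choice of `%H` and `%d`, the fixed ` (tag: TAG)` rendering, and the placeholder revision `0123abc`.

```bash
#!/usr/bin/env bash
# Sketch only: a tiny, partial stand-in for Git's export-subst.
# Assumption: only %H and %d are handled, and %d is always rendered as
# " (tag: TAG)" so that the result does not depend on branch state.
export_subst_sketch() {
  local rev=$1 tag=$2
  sed -e "s|[$]Format:%H[$]|${rev}|g" \
      -e "s|[$]Format:%d[$]| (tag: ${tag})|g"
}

# Unprocessed lines as Versioneer keeps them in the repository;
# 0123abc is a placeholder, not a real commit hash.
printf '%s\n' 'git_refnames = "$Format:%d$"' 'git_full = "$Format:%H$"' |
  export_subst_sketch 0123abc v1.0
```

Because the inputs are only the revision and the tag, the output of this sketch is the same whether or not the branch has moved on since the release.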
@balsoft balsoft requested a review from pilz0 September 16, 2025 16:15