Skip to content

python3Packages.pyproj: change data dir priorities#282139

Open
kidanger wants to merge 1 commit intoNixOS:masterfrom
kidanger:pyproj-datadir-order
Open

python3Packages.pyproj: change data dir priorities#282139
kidanger wants to merge 1 commit intoNixOS:masterfrom
kidanger:pyproj-datadir-order

Conversation

@kidanger
Copy link
Member

Description of changes

Because the ${pkgs.proj}/share/proj folder always exist, the user cannot override the data dir with the environment variable PROJ_DATA as usually possible with proj and pyproj. This PR changes the priority, to first check the environment variable before falling back to proj's folder.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added the 6.topic: python Python is a high-level, general-purpose programming language. label Jan 19, 2024
@ofborg ofborg bot added 10.rebuild-darwin: 11-100 This PR causes between 11 and 100 packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. labels Jan 19, 2024
@imincik
Copy link
Contributor

imincik commented Jan 23, 2024

What about setting multiple paths like this PROJ_DATA=${proj}/share/proj:<PATH2>:<PATH3> ? Can this solve your problem ?

@kidanger
Copy link
Member Author

What about setting multiple paths like this PROJ_DATA=${proj}/share/proj:<PATH2>:<PATH3> ? Can this solve your problem ?

The issue is that the env variable is not even considered by pyproj, since it priorities internal_datadir over proj_lib_dirs/PROJ_DATA.

For a non-nix build of pyproj, the folder referenced by internal_datadir may not exist depending on how it was installed, in which case the user can use PROJ_DATA. For a nix build, this folder always exist, hence this PR.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/week-in-geospatial-team/37035/6

@autra
Copy link
Contributor

autra commented Oct 5, 2024

Do you think this could be proposed upstream? It seems reasonable to me that the env variable always takes precedence.

@imincik
Copy link
Contributor

imincik commented Oct 5, 2024

Do you think this could be proposed upstream? It seems reasonable to me that the env variable always takes precedence.

Yes, I agree.

@kidanger
Copy link
Member Author

kidanger commented Oct 5, 2024

Good idea, I have opened an issue upstream to see if they agree.

@kidanger
Copy link
Member Author

kidanger commented Oct 6, 2024

According to the maintainer, PROJ_DATA is not meant to be used to change pyproj behavior like that.

I guess one can always override the pyproj derivation to achieve ones goal (like bundling different datum grids), and don't play with PROJ_DATA at runtime.

Otherwise, maybe instead of swapping the priorities as I initially proposed in this PR, we can question whether the current patch in nixpkgs is ok or not. Maybe instead of hard-setting the internal_datadir variable, we could change the fallback that looks for the proj binary to the nix store path.

In any case, I don't know if all this is useful for anyone else, it might just be a misunderstanding of pyproj on my part. If that's the case we can close the PR.

@autra
Copy link
Contributor

autra commented Oct 6, 2024

I've skimmed very quickly the conversation, and I think there's a misunderstanding. We are talking from a packager POV and snowman2 is thinking about an end-user using the library.

@autra
Copy link
Contributor

autra commented Oct 6, 2024

I've skimmed very quickly the conversation, and I think there's a misunderstanding. We are talking from a packager POV and snowman2 is thinking about an end-user using the library.

unless I'm the one misunderstanding it, and you really wanted a different proj dir when using this package?

@kidanger
Copy link
Member Author

kidanger commented Oct 6, 2024

I've skimmed very quickly the conversation, and I think there's a misunderstanding. We are talking from a packager POV and snowman2 is thinking about an end-user using the library.

unless I'm the one misunderstanding it, and you really wanted a different proj dir when using this package?

Yes, here is an example of what I used when packaging an application to docker: (might be outdated)

proj_data = pkgs.stdenv.mkDerivation {
  name = "proj_data";
  dontUnpack = true;
  installPhase = ''
    mkdir $out
    cp ${pkgs.fetchurl {
      url = "https://github.com/OSGeo/PROJ-data/raw/a810a6d00ef4e16186e5db9abebb05ca14f2155d/us_nga/us_nga_egm96_15.tif";
      hash = "sha256-20kwJ1YsmwBNciD6iB9WA62tpOHFApuTP6feRUew540=";
    }} $out/us_nga_egm96_15.tif
    cp -ra ${pkgs.proj}/share/proj/* $out
  '';
  outputHashAlgo = "sha256";
  outputHashMode = "recursive";
  outputHash = "sha256-wIwdjOz80BHv0I0E83szpIcgzkw6stVMPOSkTBnaACc=";
};

And adding PROJ_DATA=${proj_data} to the environment variables of the docker image (along with the patch of this PR).

But it's a bit too complex and there might be a simpler way.

@autra
Copy link
Contributor

autra commented Oct 6, 2024

Ah, you actually want to embed only a subset of projdata in a docker image, am I right? I guess it's for saving space?

In this case, I think we should probably package proj-data, because we don't currently (or I didn't find it), and that's a shame. Then, this derivation would be an arg to proj derivation, probably behind a withProjData flag.

Then, you could use your derivation with proj.override { proj_data = proj_data; withProjData = true; }; to build your docker image.

In short, I think that what's missing from the packaging standpoint is actually an argument in the proj derivation. Would that fix your issue?

@kidanger
Copy link
Member Author

kidanger commented Oct 6, 2024

Ah, you actually want to embed only a subset of projdata in a docker image, am I right? I guess it's for saving space?

Yes, I want to add some data to the default ${proj}/share/proj folder, but not bring all of proj-data because of space reasons.

In short, I think that what's missing from the packaging standpoint is actually an argument in the proj derivation. Would that fix your issue?

Indeed, it would solve the issue quite cleanly.

@imincik
Copy link
Contributor

imincik commented Oct 7, 2024

In this case, I think we should probably package proj-data, because we don't currently (or I didn't find it), and that's a shame. Then, this derivation would be an arg to proj derivation, probably behind a withProjData flag.

proj-data PR is here - #290643 .

@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jan 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: python Python is a high-level, general-purpose programming language. 10.rebuild-darwin: 11-100 This PR causes between 11 and 100 packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants