Skip to content

Semi-automatic replacement fetchPyPi with fetchFromGitHub#150251

Closed
KAction wants to merge 302 commits intoNixOS:stagingfrom
KAction:replace-pypi-with-git
Closed

Semi-automatic replacement fetchPyPi with fetchFromGitHub#150251
KAction wants to merge 302 commits intoNixOS:stagingfrom
KAction:replace-pypi-with-git

Conversation

@KAction
Copy link
Contributor

@KAction KAction commented Dec 11, 2021

It is common for python packages to contain reduced set of files in pypi
upload, with tests and documentation omited. It makes some sense, since it
reduce amount of data downloaded by "pip install", which only cares about
executable code.

Hence, for nixpkgs it makes sense to use version control system instead. This
pull request converts subset of python packages from fetchPyPi to fetchGitHub.
It does not change any functionality.

Only packages for which I could have figured out git repository automatically,
were changed. Every package changed in this pull request was succesfully built
before and after on my machine.

@github-actions github-actions bot added 6.topic: cinnamon Desktop environment 6.topic: emacs Text editor 6.topic: GNOME GNOME desktop environment and its underlying platform 6.topic: haskell General-purpose, statically typed, purely functional programming language 6.topic: kernel The Linux kernel 6.topic: lua Lua is a powerful, efficient, lightweight, embeddable scripting language. labels Dec 11, 2021
@github-actions github-actions bot removed 6.topic: kernel The Linux kernel 6.topic: steam Steam game store/launcher (store.steampowered.com) 6.topic: cinnamon Desktop environment 6.topic: ruby A dynamic, open source programming language with a focus on simplicity and productivity. 6.topic: vim Advanced text editor labels Dec 11, 2021
src = fetchFromGitHub {
owner = "autocracy";
repo = "python-ipy";
rev = "IPy-1.01";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't maintain Python stuff in Nixpkgs so cannot properly review this PR.

But... maybe the following way is more preferred? (and same for other packages)

Suggested change
rev = "IPy-1.01";
rev = "IPy-${version}";

Copy link
Contributor Author

@KAction KAction Dec 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right. It will be harder to do it automatically, though.

I'll see what I can do

@sternenseemann
Copy link
Member

I'm opposed to this change. PyPi is the primary way to distribute python packages and holds the release tarballs intended for use in packaging. Release Tarballs and version control repositories can differ and we prefer the proper release tarballs in nixpkgs where they exist.

We already do use the GitHub download when tests are missing, but as package-specific workarounds.

@KAction
Copy link
Contributor Author

KAction commented Dec 11, 2021

I convinced many upstream projects to provide release archives, sometimes even signed ones, during my days in Debian, but now I will disagree that release archives should be primary distribution means.

Let us consider common case of project that uses public git as version control, tags for release and also provides release archive, like GNU Hello. So what are reasons to prefer maintainer-provided release archive to auto-generated git archive, or via versa?

  1. Maintainer releases normally include auto-generated architecture/OS-independent resources, like outputs of yacc(1), bison(1) and autoconf(1). On other hand, git-generated archive would require more build tools and deeper understanding of build process. This is advantage for casual user, but irrelevant for distributions at best. At worst, they tempt distribution maintainer into not doing their job properly and not using, e. g autoreconfHook.

  2. Maintainer releases are rarely (raise hand if you ever did it) verified, either on their own nor to match content of the git repository, while (hopefully) thousands of eyes are gazing at git repository. So from social/trust point of view, automatic git archive is preferred.

  3. For languages with built-in "download and run code from the Internet" functionality, there is incentive to omit optional resources to reduce archive size. While distribution also benefits from smaller release archive, win is pretty small, and only for distribution maintainers or CI, while pain of figuring out whether something useful was omitted is much bigger. Keep in mind, distribution end user does not care about side of inputs, only about size of outputs. On other hand, if git snapshot is incomplete, we have much, much bigger problem.

  4. If upstream git hosting does not provide release download functionality, whole git repository will need to be cloned to make snapshot of single commit. This is quite inefficient, but extremely rare -- cgit, gitweb, github, gitlab, sourcehut, repo.or.cz -- they all provide this functionality.

In summary, in world dominated by public git, maintainer releases are useless at best.

@Ma27
Copy link
Member

Ma27 commented Dec 12, 2021

FYI there was a discussion around this before in #142236.

At worst, they tempt distribution maintainer into not doing their job properly and not using, e. g autoreconfHook.

I'm not sure I follow, why is someone "not doing their job properly" when using an existing ./configure?

Maintainer releases are rarely (raise hand if you ever did it) verified, either on their own nor to match content of the git repository, while (hopefully) thousands of eyes are gazing at git repository

I'd argue that if a repository is important enough to be watched by "thousands of eyes", there are also people checking release tarballs.

Regarding the approach here, I'd suggest to discuss this in an issue first before opening such a large PR for two reasons:

  • With tons of packages having a different source now, these will require a rebuild. Changed derivations mean that ofborg pings a lot of people for a change that isn't even safe to be merged in.
  • As you can see, especially python packages tend to change quite fast, so it will have a lot of merge conflicts to be resolved. If it's agreed to do that, one could file a PR and get it in rather quickly without these problems.

@KAction
Copy link
Contributor Author

KAction commented Dec 12, 2021

FYI there was a discussion around this before in #142236.

Yes, I see. It was discussed previously. Mostly the same arguments, but unfortunately ended with nothing.

At worst, they tempt distribution maintainer into not doing their job properly and not using, e. g autoreconfHook.
I'm not sure I follow, why is someone "not doing their job properly" when using an existing ./configure?

For same reasons as using upstream-provided binary is bad, building from the source is good. Attack surface, readiness for patching, etc. Was discussed in issue you mentioned.

Maintainer releases are rarely (raise hand if you ever did it) verified, either on their own nor to match content of the git repository, while (hopefully) thousands of eyes are gazing at git repository
I'd argue that if a repository is important enough to be watched by "thousands of eyes", there are also people checking release tarballs.

You clearly have difference experience from mine. I do review git-diff for some of my dependencies, but I do it using "git log v1.0..v2.0". If distribution uses release archives, and they have differences no present in git, I would be fooled.

Regarding the approach here, I'd suggest to discuss this in an issue first before opening such a large PR for two reasons:

  • With tons of packages having a different source now, these will require a rebuild. Changed derivations mean that ofborg pings a lot of people for a change that isn't even safe to be merged in.
  • As you can see, especially python packages tend to change quite fast, so it will have a lot of merge conflicts to be resolved. If it's agreed to do that, one could file a PR and get it in rather quickly without these problems.

Making it in one PR is going to be hard anyway, since not all packages have git URL in machine-readable format, but yes, you are right, let's try to get RFC first.

Actually, I was quite surprised that this idea met any opposition.

@jonringer
Copy link
Contributor

Actually, I was quite surprised that this idea met any opposition.

This is a no-go for me until we have rev mentioning the version.

And this also doesn't take into account that some packages have more involved processes as part of the sdist, such as doing a node build.

@KAction
Copy link
Contributor Author

KAction commented Dec 12, 2021

Actually, I was quite surprised that this idea met any opposition.

This is a no-go for me until we have rev mentioning the version.

You mean rev = "v${version}?

@adisbladis
Copy link
Member

Apart from the comments from @sternenseemann, which I fully agree with and is reason enough to close this PR there was also the comment from @jonringer:

And this also doesn't take into account that some packages have more involved processes as part of the sdist, such as doing a node build.

Which made me think.. There is no way you actually tested this change.
While you might have tested the builds you would also need to test the run time behaviour which may not be the same any more if there are missing assets and such.

This change is sweeping and without much regard for the fallout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: python Python is a high-level, general-purpose programming language.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants