SRC_URI instead of git fetch #53

Open
Ud71p opened this issue Sep 16, 2017 · 10 comments

@Ud71p

Ud71p commented Sep 16, 2017

Hi, and thanks a lot for the great Pale Moon ebuilds. I use them all the
time and am very happy with them, except for one small detail: the use
of git fetching to get the sources. I always have to modify the ebuild
to switch to the standard Portage fetch.

Here are some reasons why standard fetch (using SRC_URI) is better
than git-r3_fetch:

Most reasons stem from this super-reason:

Verifying ebuild manifests

The package manager (e.g. Portage) checks the hashes of all
downloads. This does not happen for git fetches.

Hashing prevents corruption of data.

The biggest win of hashing is security. It's important all users and
devs build the package from the same source. Otherwise some user under
attack can fetch sources modified to contain malware, and this will
never be detected.

Another win is quality assurance. It's imperative that everybody,
e.g. the testers of the unstable version and, later, the users after
stabilization, uses exactly the same sources. With version control,
people can make changes and re-release under the same tag/version. This
of course should be forbidden, but I have seen it happen. Hashing
simply prevents this.
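
To make the idea concrete, here is a minimal sketch of checking a download against a digest that was recorded before the fetch. Portage does this itself for SRC_URI files using the Manifest; `verify_sha256` is a hypothetical helper written only for illustration, and SHA-256 is just one possible hash.

```shell
#!/usr/bin/env bash
# Sketch only: compare a file's SHA-256 with a digest known in advance.
# verify_sha256 is a hypothetical helper, not part of Portage.
verify_sha256() {
    local file=$1 expected=$2 actual
    # Refuse to continue if the file cannot be read at all.
    [ -r "$file" ] || return 2
    # Reading from stdin makes sha256sum print "<digest>  -"; keep the digest.
    actual=$(sha256sum < "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}
```

The function returns 0 only when the digests match, so a caller could write something like `verify_sha256 "$tarball" "$recorded_digest" || exit 1`, where both variables are placeholders. The key point is that the expected digest has to come from somewhere other than the download itself.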

I also believe a tarball release is smaller than a git clone.

Also git fetches screw up the whole mirroring system. Mirrors are
great for many things, but only work with standard SRC_URI fetches.

Also SRC_URI is better because then emerge -a correctly calculates and
informs the user of the download size.

SRC_URI is also great when the same source is emerged multiple times
(such as when experimenting with what flags to enable, what
optimizations, which gcc version, etc). I think git fetches refetch
all data on each merge?

Using SRC_URI allows the user to resume an interrupted fetch, because
the partial file resides in DISTDIR. I don't think the same happens for
git fetches. It could in principle, but if the fetch goes into
PORTAGE_TMPDIR, then this is usually cleaned after an unsuccessful merge.

SRC_URI = no need to emerge git. git is a dev tool, end users should
not need it just to install packages.

SRC_URI works well together with tools, such as distclean, to clean
disk space after package is uninstalled.

A tarball is also better due to legal woes - licensing is usually
clearer, as the whole file can be easily regarded as one coherent
release, while fetching a git repo is a bit more risky - some files
can have a different license, but of course this happens rarely.

SRC_URI also works together with FETCHCOMMAND. People with special
proxy/firewall/vpn needs can still get the source, not so for git
fetches.

In general, Gentoo doesn't want cvs/svn/git-fetch sources in the tree:

https://devmanual.gentoo.org/ebuild-writing/functions/src_unpack/svn-sources/index.html

So I hope changing palemoon ebuilds to SRC_URI can facilitate their
inclusion in the official Portage tree.

Here are changes needed for SRC_URI instead of git fetch:

    < inherit palemoon-2 mozlinguas-palemoon git-r3 eutils flag-o-matic pax-utils
    > inherit palemoon-2 mozlinguas-palemoon eutils flag-o-matic pax-utils

    > SRC_URI="https://codeload.github.com/MoonchildProductions/Pale-Moon/tar.gz/${PV}_Release -> ${P}.tar.gz"

    < EGIT_REPO_URI="https://github.com/MoonchildProductions/Pale-Moon.git"
    < GIT_TAG="${PV}_Release"

    < git-r3_fetch ${EGIT_REPO_URI} refs/tags/${GIT_TAG}
    < git-r3_checkout
    > unpack ${A}
    > mv Pale-Moon-${PV}_Release ${P}
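
The reason for the extra `mv` can be tried outside Portage. The following is a plain-bash sketch, not ebuild code: `unpack_and_rename` is a hypothetical helper, its `pv` and `p` arguments stand in for `${PV}` and `${P}`, and it assumes the archive unpacks into a single `Pale-Moon-<PV>_Release` directory, as the `mv` above implies.

```shell
#!/usr/bin/env bash
# Sketch only: plain-bash analogue of "unpack ${A}; mv ...", runnable outside Portage.
# unpack_and_rename is a hypothetical helper; pv and p mirror ${PV} and ${P}.
unpack_and_rename() {
    local tarball=$1 workdir=$2 pv=$3 p=$4
    mkdir -p "$workdir" || return 1
    # Extract the archive; it is assumed to contain "Pale-Moon-<pv>_Release/".
    tar -xzf "$tarball" -C "$workdir" || return 1
    # Rename to the directory name the rest of the build expects.
    mv "$workdir/Pale-Moon-${pv}_Release" "$workdir/$p"
}
```

In a real ebuild, setting `S="${WORKDIR}/Pale-Moon-${PV}_Release"` might be an alternative to renaming, since S defaults to `${WORKDIR}/${P}`. Whether the palemoon-2 eclass is happy with that is not checked here.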

This website is not so good, it didn't allow me to upload a file... :-(

@Bfgeshka

But git also does its own verification...

@Ud71p
Author

Ud71p commented Sep 17, 2017

No. It does not. Proof:
https://github.com/deuiore/palemoon-overlay/blob/master/www-client/palemoon/Manifest
There is no hash of the sources there.

In general, imagine you are to retrieve sources from somewhere and want to make sure you will get what you expect. If you don't have any form of hash or signature before retrieval, then it is mathematically impossible to verify that what you get is what you expect. In other words if you have no hash beforehand, then you don't know what data you expect.

You are probably right that git does some verification, but not the kind we need. It probably just verifies that what you have on your disk after the fetch is the same as what resides in the remote repo at that time. Such a check is nice to have to prevent some data traffic corruption, but not what we need from security perspective.

We need a hash verification with some hash we know before we commence the fetch.

An example of an attack to illustrate some risks:
Say user A emerges palemoon with git fetch, and it's all OK. The sources retrieved are the same as the ones in the repo. Git verification went OK.
Now another user C wants to do the same, but this user is under attack. An attacker B (malware on a palemoon dev's machine, your-favourite-3-letter-agency, github, etc) modifies the sources in the repo before user C fetches them. Then C's fetch goes perfectly. She gets exactly the same sources as are in the repo (with malware). Git verification goes OK. Now the attacker B removes the malware, so all other users get clean, malware-free sources.

The kind of verification we need is one that makes sure absolutely all users always get the same sources for a given ebuild version, and this cannot happen unless there is a hash of the source in the Manifest file.

@deu
Owner

deu commented Sep 17, 2017

The ebuilds used to use SRC_URI with the source package being directly taken from the Pale Moon archives. This was changed for a couple of reasons.

At first, there was no GitHub release and the official source package had broken permissions (I think it still does). After some time they started using GitHub release packages and I switched to those (See this commit).

Unfortunately they weren't consistent. I don't know if it was the Pale Moon developers' fault or GitHub's, but the checksum failed quite often and the Manifest had to be updated every time.
If you glance at the commit history around that time you can see a number of "Updated/Fixed Manifest" commits, and issues were opened because of that.
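
When a checksum fails like this, one way to tell whether the sources themselves changed or only the packaging did is to compare a digest of the unpacked trees. This is only a diagnostic sketch: `tree_digest` is a hypothetical helper, it is not something Portage or the Manifest can use, and it deliberately ignores timestamps, ownership and permissions.

```shell
#!/usr/bin/env bash
# Sketch only: digest of a directory's relative paths and file contents.
# Timestamps, ownership, permissions and archive metadata are ignored.
# tree_digest is a hypothetical helper, not a Portage or GitHub feature.
tree_digest() {
    local dir=$1
    (
        cd "$dir" || exit 1
        # Hash every regular file in a fixed order, then hash the resulting list.
        find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum \
            | sha256sum | cut -d' ' -f1
    )
}
```

If two downloads of the same release unpack to trees with equal digests, the difference was in the archive wrapping and not in the files. It relies on GNU `sort -z` and `xargs -r`.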

No complaints were made in that regard once I switched to directly pulling the version tags from git. It still sometimes occurs when changes are made to the language packs, but that doesn't happen nearly as often.

You raise legitimate concerns, but if going back to using GitHub release packages would mean going back to inconsistent packages to redownload and check every time, then it's probably not worth it in the face of the possibility that the Pale Moon GitHub repository could be compromised. Also, I think that that eventuality could be mitigated by starting to check commit hashes.
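
As a rough sketch of what checking commit hashes could look like, the function below fails unless a tag in an already fetched clone resolves to a commit that was written down beforehand. `tag_matches_commit` is a hypothetical helper and not part of git-r3. The expected value must be the full commit hash.

```shell
#!/usr/bin/env bash
# Sketch only: succeed only if a tag resolves to a commit recorded in advance.
# tag_matches_commit is a hypothetical helper; it is not part of git-r3.
tag_matches_commit() {
    local repo=$1 tag=$2 expected=$3 actual
    # "^{commit}" peels an annotated tag down to the commit it points at.
    actual=$(git -C "$repo" rev-parse --verify --quiet "refs/tags/${tag}^{commit}") || return 2
    [ "$actual" = "$expected" ]
}
```

A moved or re-created tag would then show up as a mismatch instead of being built silently. The git-r3 eclass also documents an `EGIT_COMMIT` variable that may serve a similar purpose inside an ebuild; that route is not tested here.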

Just a couple of things though:

> SRC_URI is also great when the same source is emerged multiple times
> (such as when experimenting with what flags to enable, what
> optimizations, which gcc version, etc). I think git fetches refetch
> all data on each merge?

> Using SRC_URI allows the user to resume an interrupted fetch, because
> the partial file resides in DISTDIR. I don't think the same happens for
> git fetches. It could in principle, but if the fetch goes into
> PORTAGE_TMPDIR, then this is usually cleaned after an unsuccessful merge.

> SRC_URI = no need to emerge git. git is a dev tool, end users should
> not need it just to install packages.

> SRC_URI works well together with tools, such as distclean, to clean
> disk space after package is uninstalled.

I think Portage should fetch into DISTDIR/git3-src by default, so no, you shouldn't have to refetch all data on each merge, and resuming an interrupted fetch should work fine.
And you mean emerge --depclean? Or make distclean? In any case I fail to see how using git-r3 or SRC_URI would make a difference.

All this said, I guess we could test the GitHub release packages once again and see how it goes, but if it starts causing broken Manifests all over again we'll come back to git-r3.

Oh by the way, when you want to paste code you should put it between `s (inline) or ```s (multi-line) to not have GitHub screw up the formatting.

@nick87720z

Direct use of a VCS clone/fetch, with proper eclass assistance, is good for live ebuilds (by the way, I don't see palemoon-9999 there :) ). Release ebuilds, on the other hand, usually set SRC_URI to archived tarballs or, if the hosting allows it (as GitHub does), to a special URL that fetches a zip archive for a specific commit.
When submodules are used, either the code owner could form custom ebuilds, or...
it is still possible to fetch the submodules via SRC_URI and then build the complete source tree in a custom src_unpack().

Files under git control are signed by definition (by git).
As for a Gentoo repo under any VCS control (git, hg, svn, no matter which), such repos benefit from the thin-manifests flag, which causes Manifests to sign only SRC_URI files:
https://wiki.gentoo.org/wiki/Repository_format/metadata/layout.conf#thin-manifests

@deu
Owner

deu commented Sep 28, 2017

Actually I didn't realise I could fetch a zip archive for a specific commit with GitHub.
Will look into that since it seems more ideal than the current solution.

@deu
Owner

deu commented Sep 28, 2017

Unfortunately that doesn't seem to be a solution either.
From some brief research, those archives are inconsistent as well.
This seems to be a common problem (see libgit2/libgit2#4343 for example).
Hopefully there's a solution on the horizon: Homebrew/homebrew-core#18044 (comment)

Having a thin Manifest really wouldn't help. From my understanding, that would only cause the files in this repository, so the ebuilds, not to be signed in the Manifest. The problems are really the files outside of it.
Unless you were just making an aside suggestion unrelated to this issue.

@Ud71p
Author

Ud71p commented Oct 28, 2017

One more thing that works only with SRC_URI and is broken by git fetch is using Tor:

https://wiki.gentoo.org/wiki/Tor#Portage

People who value privacy or want to hide from their ISP/regime/hackers what OS/packages/versions they use can only install SRC_URI and not git-fetched packages.

@nick87720z

nick87720z commented Dec 18, 2017

I looked into which archive links can be obtained in various ways, using version 27.6.2 as an example.

  1. On the releases tab there are two entries, apparently created automatically by GitHub:
    https://github.com/MoonchildProductions/Pale-Moon/archive/27.6.2_Release.tar.gz
    https://github.com/MoonchildProductions/Pale-Moon/archive/27.6.2_Release.zip

  2. Again from the releases page: the sidebar shows two links, one of which is the tag name, linking to the tree view for the commit associated with the tag. Following this link and then selecting "Clone or download"->"Download ZIP" gives the same link as the first way.
    https://github.com/MoonchildProductions/Pale-Moon/archive/27.6.2_Release.zip

  3. Following the link to the commit view, from there to the tree view, and again "Clone or download"->"Download ZIP". This time the archive link is based on the commit hash.
    https://github.com/MoonchildProductions/Pale-Moon/archive/ce6529faeb2f0c11c832a34570c79d04707c3255.zip
    However, replacing the extension with tar.gz also works (though this is not obvious):
    https://github.com/MoonchildProductions/Pale-Moon/archive/ce6529faeb2f0c11c832a34570c79d04707c3255.tar.gz
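
The links above all follow one pattern, which can be sketched as a small function. `github_archive_url` is a hypothetical helper; the pattern is taken only from the URLs listed in this comment, and nothing here checks that GitHub keeps serving them.

```shell
#!/usr/bin/env bash
# Sketch only: build an archive URL from owner, repo, ref (tag or commit) and extension.
# github_archive_url is a hypothetical helper based on the links listed above.
github_archive_url() {
    local owner=$1 repo=$2 ref=$3 ext=${4:-tar.gz}
    printf 'https://github.com/%s/%s/archive/%s.%s\n' "$owner" "$repo" "$ref" "$ext"
}
```

For example, `github_archive_url MoonchildProductions Pale-Moon 27.6.2_Release zip` prints the zip link from item 1, and passing the commit hash as the ref gives the links from item 3.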

As far as I can understand, in all these cases GitHub doesn't prepare archives in advance, but generates them on demand. For release tags, though, I don't know; it could prepare them as well.

As far as I can understand, these commit/tag-based URLs should give the same content as git checkout.

I don't know how the releases page is really maintained: maybe they mark certain tags as releases, or there is some format for tag names.

@sedimentation-fault

sedimentation-fault commented May 2, 2020

In my understanding, there is another problem with using Git fetches:

Suppose I realized that, for some version I installed, something did not go as expected (say, something like this here: #81). Also, suppose that this "something" has its root deeply buried into some incompatibility (or whatever else) introduced in the latest Git commits. I do see two directories with current dates in portage's DISTDIR/git3-src/:

MoonchildProductions_Pale-Moon.git
MoonchildProductions_UXP.git

I decide to revert to an older version, for which I know the problem did not occur. But since the ebuild always fetches the current version of Pale-Moon.git and UXP.git (i.e. since the directories do not contain commit or version information in their names), I lose, don't I?

@deu
Owner

deu commented May 4, 2020

@sedimentation-fault No, it doesn't work like that. It doesn't matter what those directories contain at any given time. When you emerge an ebuild for a specific version, it always checks out and builds the specific version the ebuild specifies, so you don't have to worry on that front.
