This issue was moved to a discussion.

Add apt datasource for private repositories #7041

Closed
micheelengronne opened this issue Aug 20, 2020 · 11 comments
Labels
new datasource New datasource support priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others status:requirements Full requirements are not yet known, so implementation should not be started type:feature Feature (new functionality)

Comments

@micheelengronne

micheelengronne commented Aug 20, 2020

What would you like Renovate to be able to do?

#3722 was closed because we can manage package pinning for public apt repositories through the repology datasource. That does not work for private repositories, or for any repository that is not indexed by repology.

Did you already have any implementation ideas?

We could use the output of apt-cache policy, but that requires the /etc/apt directory to be identical in the Renovate environment running the command and in the environment of the project being updated.

See: https://stackoverflow.com/questions/18885820/how-to-check-the-version-before-installing-a-package-using-apt-get

Are there any workarounds or alternative ideas you've tried to avoid needing this feature?

I create Docker layers with the apt packages installed in them, and rely on the Docker datasource to ensure immutability. But that is not an optimized solution at all.

@HonkingGoose HonkingGoose added new datasource New datasource support type:feature Feature (new functionality) labels Oct 27, 2020
@HonkingGoose

This comment has been minimized.

@rarkins
Collaborator

rarkins commented Oct 27, 2020

There's no formality to worry about with the Project board, and "Needs requirements" is the appropriate column until someone has marked it otherwise.

@rarkins rarkins added the status:requirements Full requirements are not yet known, so implementation should not be started label Jan 12, 2021
@nejch
Contributor

nejch commented Mar 7, 2021

Does anyone else here have a use case to manage the dependency versions of actual debian packages themselves, in debian/control files via Depends/Pre-Depends and so on? https://debian-handbook.info/browse/stable/sect.package-meta-information.html

@HonkingGoose HonkingGoose added priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others and removed priority-5-triage labels Mar 8, 2021
@nejch
Contributor

nejch commented Jun 25, 2021

So I looked into this a bit more and I think some of the discussion in the previous issue (#3722) is still relevant:

  1. renovate should not rely on any kind of API, as it might not be available in other implementations like aptly, Artifactory, Nexus, and soon GitLab.
  2. renovate should not try to run apt list, because it cannot know the architecture of the target host (e.g. a package may already be available for amd64 but not yet for armhf or so).

IMO the best way forward here is:

  1. define private repo(s) for the datasource with enough information to replicate an entry* in sources.list:
    1. registryUrl / uri
    2. distribution
    3. component(s)
    4. architecture (since renovate cannot know what arch the target host is running on)
  2. renovate fetches the Packages.gz index for the given registry/dist/component/arch, and extracts the package versions, e.g.:
    • ${registryUrl}/dists/${dist}/${component}/binary-${arch}/Packages.gz (arch-specific packages), and also
    • ${registryUrl}/dists/${dist}/${component}/binary-all/Packages.gz (packages for all architectures, like header-only libs that aren't compiled - if the index exists)
  3. *Perhaps two datasources make sense: deb and deb-src, each downloading the index from:
    • ${registryUrl}/dists/${dist}/${component}/binary-${arch}/Packages.gz + all (for deb datasource)
    • ${registryUrl}/dists/${dist}/${component}/sources/Sources.gz (for deb-src datasource)
  4. The actual parsing should be quite easy, an entry in a Package index looks like this:
    Package: debdry
    Version: 0.2.2-1
    Installed-Size: 60
    Maintainer: Debian QA Group <[email protected]>
    Architecture: all
    Replaces: python3-debdry
    Depends: python3:any (>= 3.3.2-2~), python3-apt
    Conflicts: python3-debdry
    Description: Semi-assisted automatic Debian packaging
    Homepage: https://anonscm.debian.org/cgit/collab-maint/debdry.git
    Description-md5: 5df92a437462dcd6581e059ecb2db772
    Section: devel
    Priority: optional
    Filename: pool/main/d/debdry/debdry_0.2.2-1_all.deb
    Size: 11948
    MD5sum: 69dc0ce1206e1e5af2a3e1179067da23
    SHA256: 655e0a454a9d28dad47faed4ffb6f34083803be6234caab8e4a670b9d5ffa298
    
  5. Recommend against (or perhaps even forbid, forcing use of repology instead) any attempt to use this with known public Debian repos: they often keep only one version of a package, so pinning makes no sense, and they also have quite large index files (although these can be cached).
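
To make steps 1 and 2 concrete, here is a minimal sketch of how the index URLs could be derived from a sources.list-like entry. The `DebRepoConfig` shape and the `packageIndexUrls` name are assumptions for illustration only and are not part of Renovate; the URL layout is the one given in the list above.

```typescript
// Hypothetical config shape mirroring one sources.list entry (not a Renovate API).
interface DebRepoConfig {
  registryUrl: string; // base URI of the repository
  distribution: string; // e.g. "wheezy"
  components: string[]; // e.g. ["main"]
  architecture: string; // e.g. "amd64"; must be configured, it cannot be detected
}

// Build the binary Packages.gz index URLs for every component, covering both
// the configured architecture and the architecture-independent "all" index.
function packageIndexUrls(cfg: DebRepoConfig): string[] {
  const base = cfg.registryUrl.replace(/\/+$/, ""); // drop trailing slashes
  const arches =
    cfg.architecture === "all" ? ["all"] : [cfg.architecture, "all"];
  const result: string[] = [];
  for (const component of cfg.components) {
    for (const arch of arches) {
      result.push(
        `${base}/dists/${cfg.distribution}/${component}/binary-${arch}/Packages.gz`
      );
    }
  }
  return result;
}
```

Since the binary-all index only exists in some repositories, a caller would presumably treat a 404 on that URL as "no such index" rather than as an error.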

Most of this is documented here: https://wiki.debian.org/DebianRepository
Example of a relatively smaller index: https://dl.bintray.com/netflixoss/debian/dists/wheezy/main/binary-amd64/

Note: the above only deals with the datasource, e.g. for use with regex managers, not how renovate should incorporate that natively into a manager, although I think one could exist for managing debian/control files (see my comment above).

Would love to hear some feedback if I missed anything here, from the Debian experts from the previous thread @psyb0t @joerocklin @ppmathis @ndbroadbent, and also from @rarkins if you have already had to deal with other datasources that require extra parameters (not just registryUrl) or that have multiple platforms.

One thing I've never tried with renovate - can multiple registryUrls/sources be combined for a dependency so that renovate picks the latest from a list of sources, like apt would? Not needed for an initial MVC, just wondering.
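
Picking the latest version "like apt would" implies Debian's version ordering (epoch, then upstream version, then revision, with `~` sorting before everything, even the end of the string). Below is a rough sketch of that comparison, following the algorithm described in the Debian Policy Manual. All function names are made up for illustration, and the code does no validation of malformed version strings.

```typescript
const isDigit = (c: string): boolean => c !== "" && c >= "0" && c <= "9";

// Sort weight of one character in a non-digit run: "~" sorts first, then the
// end of the string (or a digit), then letters, then everything else.
function charOrder(c: string): number {
  if (c === "" || isDigit(c)) return 0;
  if (c === "~") return -1;
  if (/[A-Za-z]/.test(c)) return c.charCodeAt(0);
  return c.charCodeAt(0) + 256;
}

// Compare two upstream-version or revision strings.
function compareFragment(a: string, b: string): number {
  let i = 0;
  let j = 0;
  while (i < a.length || j < b.length) {
    // Non-digit run: compare character by character using charOrder.
    while (
      (i < a.length && !isDigit(a.charAt(i))) ||
      (j < b.length && !isDigit(b.charAt(j)))
    ) {
      const diff = charOrder(a.charAt(i)) - charOrder(b.charAt(j));
      if (diff !== 0) return diff;
      i++;
      j++;
    }
    // Digit run: compare numerically (skip leading zeros, longer run wins).
    while (a.charAt(i) === "0") i++;
    while (b.charAt(j) === "0") j++;
    let firstDiff = 0;
    while (isDigit(a.charAt(i)) && isDigit(b.charAt(j))) {
      if (firstDiff === 0) firstDiff = a.charCodeAt(i) - b.charCodeAt(j);
      i++;
      j++;
    }
    if (isDigit(a.charAt(i))) return 1;
    if (isDigit(b.charAt(j))) return -1;
    if (firstDiff !== 0) return firstDiff;
  }
  return 0;
}

// Split "[epoch:]upstream[-revision]" into its three parts.
function splitDebVersion(v: string): { epoch: number; upstream: string; revision: string } {
  const colon = v.indexOf(":");
  const epoch = colon === -1 ? 0 : parseInt(v.slice(0, colon), 10);
  const rest = colon === -1 ? v : v.slice(colon + 1);
  const dash = rest.lastIndexOf("-");
  return {
    epoch,
    upstream: dash === -1 ? rest : rest.slice(0, dash),
    revision: dash === -1 ? "" : rest.slice(dash + 1),
  };
}

// Negative if a < b, positive if a > b, zero if they are equal.
function compareDebVersions(a: string, b: string): number {
  const x = splitDebVersion(a);
  const y = splitDebVersion(b);
  if (x.epoch !== y.epoch) return x.epoch - y.epoch;
  return (
    compareFragment(x.upstream, y.upstream) ||
    compareFragment(x.revision, y.revision)
  );
}
```

With a comparator like this, choosing the newest version from several sources is a matter of sorting the combined list, for example `versions.sort(compareDebVersions)`.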

@rarkins
Collaborator

rarkins commented Jun 26, 2021

Really great info, thanks!

You can indeed combine the results from multiple registries, with registryStrategy=merge in the datasource.

Example from maven:

export const registryStrategy = 'merge';
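
As an illustration of what a merge strategy amounts to for this use case, the sketch below combines the versions found in several registries into one list and drops duplicates. It is not Renovate's implementation; the `DebRelease` type and the `mergeReleases` function are hypothetical names used only here.

```typescript
// Hypothetical release record: a version plus the registry it came from.
interface DebRelease {
  version: string;
  registryUrl: string;
}

// Merge per-registry results into one list, keeping the first occurrence of
// each version so that earlier registries take precedence.
function mergeReleases(perRegistry: DebRelease[][]): DebRelease[] {
  const seen = new Set<string>();
  const merged: DebRelease[] = [];
  for (const releases of perRegistry) {
    for (const release of releases) {
      if (!seen.has(release.version)) {
        seen.add(release.version);
        merged.push(release);
      }
    }
  }
  return merged;
}
```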

What are the implications of debian repos having only one version? Is it that if you pin a particular version then your build can "break" at any time because they could update the repo to only have a newer version? In such case, what can Renovate do, repology or otherwise?

@nejch
Contributor

nejch commented Jun 26, 2021

What are the implications of debian repos having only one version? Is it that if you pin a particular version then your build can "break" at any time because they could update the repo to only have a newer version? In such case, what can Renovate do, repology or otherwise?

Exactly, it doesn't make that much sense for official Debian repos, and that is an issue with any datasource (https://unix.stackexchange.com/q/544432). For this specific datasource, though, the main reason to steer people away from the public repos may be their large index files (50-100 MB in some cases, I think), which would slow down Renovate runs. Small private indices can be a few dozen KB, and there it makes more sense.

But I think this was already discussed in the previous issue and that's why it makes more sense for internal/private repositories where you can ensure that versions persist - even custom servers based on https://wiki.debian.org/DebianRepository/Setup. You basically just need an HTTP server serving a structure as defined by Debian - this is why fetching the index seems to me the most reliable way.

I had a quick look at other ideas from competitors (no one has implemented this properly yet), and the only idea I could find simply relies on running local apt-cache commands while trying to reproduce the target environment (e.g. Dockerfiles, CI, see dependabot/dependabot-core#2129). So that's another approach but not very helpful when you're trying to target other architectures and scenarios like in Ansible playbooks etc.

@Ka0o0

Ka0o0 commented Jan 10, 2022

I've created a branch where I've implemented the Debian package datasource similarly to what @nejch described here, and opened PR #13463.

I've made the architecture for binary packages configurable on a repository basis. My reasoning is that the repository that has a certain package as a dependency knows the destination it will be run at.

I haven't implemented support for source packages (yet) but I think this should be easy.

Currently, my approach is to download the Packages file, extract it, and parse it within Node.js. On my machine, parsing Debian's main amd64 Packages file takes ~800ms. There is plenty of room for improvement there, but all in all I think it's not necessary to use the apt-cache tool.
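
For readers who want a feel for the parsing step, here is a minimal sketch that is not taken from the PR. It assumes the Packages index has already been downloaded and decompressed into a string, and the `parsePackagesIndex` name is hypothetical. It reads only the Package and Version fields of each stanza and ignores everything else.

```typescript
// Parse a decompressed Packages index into a map of package name -> versions.
function parsePackagesIndex(text: string): Map<string, string[]> {
  const versions = new Map<string, string[]>();
  // Stanzas (one per package build) are separated by blank lines.
  for (const stanza of text.replace(/\r\n/g, "\n").split(/\n\s*\n/)) {
    let name: string | undefined;
    let version: string | undefined;
    for (const line of stanza.split("\n")) {
      // Lines starting with whitespace continue the previous field; skip them.
      if (/^\s/.test(line)) continue;
      const sep = line.indexOf(":");
      if (sep === -1) continue;
      const field = line.slice(0, sep).trim().toLowerCase();
      const value = line.slice(sep + 1).trim();
      if (field === "package") name = value;
      else if (field === "version") version = value;
    }
    if (name !== undefined && version !== undefined) {
      const known = versions.get(name) || [];
      if (known.indexOf(version) === -1) known.push(version);
      versions.set(name, known);
    }
  }
  return versions;
}
```

A repository that keeps several builds of the same package lists one stanza per build, which is why the map collects an array of versions per name.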

@Ka0o0 Ka0o0 mentioned this issue Jan 10, 2022
9 tasks
@rarkins
Collaborator

rarkins commented Jan 11, 2022

@Ka0o0 we would prefer to avoid using child_process in datasources, as that would mean they couldn't be used in browsers at some future point. Is there anything about the process which means you must use child process, or could it be refactored to do everything within JS?

@viceice
Member

viceice commented Jan 11, 2022

@rarkins it is, see my review in PR.

@MAGICCC

MAGICCC commented Feb 19, 2023

I stumbled across this issue, and would like to see this in renovate, too bad the PR got closed :( Is there currently an alternative available?

@tewfik-ghariani

Hii
Is there a chance that the work on this issue will be resumed in the future? ^^

@renovatebot renovatebot locked and limited conversation to collaborators Oct 1, 2023
@rarkins rarkins converted this issue into discussion #24906 Oct 1, 2023


8 participants