Aptly 1.4.0+ds1-6ubuntu0.1 breaks mirror handling: Error decoding remote repo: EOF #1399

Open
robin-checkmk opened this issue Nov 29, 2024 · 16 comments

@robin-checkmk

robin-checkmk commented Nov 29, 2024

Detailed Description

After aptly was updated by unattended upgrades from 1.4.0+ds1-6 to 1.4.0+ds1-6ubuntu0.1, mirror management was broken. The symptom was a constant stream of messages like this: Error decoding remote repo: EOF

Context

We use aptly in our CI, and it creates several mirrors and snapshots in one go.
After the upgrade, we saw this behavior:

Mirror [base-bookworm]: http://ftp.de.debian.org/debian/ bookworm successfully added.
You can run 'aptly mirror update base-bookworm' to download repository contents.
2024/11/27 09:06:28 Error decoding remote repo: EOF
2024/11/27 09:06:28 Error decoding remote repo: EOF
2024/11/27 09:06:28 Error decoding remote repo: EOF
ERROR: unable to update: mirror with name base-bookworm not found
Execute: 'aptly -config mirror/aptly.conf mirror list'
List of mirrors:
 * 
 * [base-buster]: http://ftp.de.debian.org/debian/ buster
 * [base]: http://ftp.de.debian.org/debian/ stretch
 * [updates-bookworm]: http://ftp.de.debian.org/debian/ bookworm-updates
 * [updates-buster]: http://ftp.de.debian.org/debian/ buster-updates
 * [updates]: http://ftp.de.debian.org/debian/ stretch-updates

The remarkable things from my point of view:

  • Error decoding remote repo: EOF
  • The empty mirror entry (above the others, consisting only of *)
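
A nameless entry like the one above could be detected automatically before a CI run continues. The following is a minimal sketch, assuming the human-readable `aptly mirror list` output keeps the ` * [name]: url dist` shape shown in this report; `count_blank_mirrors` is a hypothetical helper name, not part of aptly.

```shell
# Hypothetical helper: reads `aptly mirror list` output on stdin and prints
# how many list entries consist of only the "*" bullet with no mirror name.
count_blank_mirrors() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so the exit status is ignored on purpose.
    grep -c -E '^[[:space:]]*\*[[:space:]]*$' || true
}

# Example (config path as used in this report):
#   aptly -config mirror/aptly.conf mirror list | count_blank_mirrors
```

A non-zero count would suggest that the database already contains a damaged mirror record, which is worth knowing before creating further mirrors or snapshots.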

We downgraded back to 1.4.0+ds1-6 and the commands now finish again successfully. However, we still see the aforementioned messages.
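
For anyone wanting to apply the same workaround, here is a rough sketch of a version guard plus the downgrade-and-hold commands. The two version strings come from this report; the helper names are hypothetical, and whether the older version is still installable from your configured archive is an assumption. The plan function only prints the commands, it does not change the system.

```shell
# Version strings as reported in this issue.
AFFECTED_VERSION='1.4.0+ds1-6ubuntu0.1'
KNOWN_GOOD_VERSION='1.4.0+ds1-6'

# Hypothetical helper: succeeds if the given version string is the affected build.
is_affected_aptly() {
    [ "$1" = "$AFFECTED_VERSION" ]
}

# Hypothetical helper: prints the commands one might run to downgrade aptly
# and keep apt from upgrading it again. Nothing is installed or changed here.
print_downgrade_plan() {
    echo "sudo apt-get install --allow-downgrades aptly=$KNOWN_GOOD_VERSION"
    echo "sudo apt-mark hold aptly"
}

# Example:
#   installed=$(dpkg-query -W -f='${Version}' aptly)
#   if is_affected_aptly "$installed"; then print_downgrade_plan; fi
```

A held package should also be skipped by unattended upgrades, but that is worth verifying on your own setup.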

Quite frankly, I do not really know where to look, so I am happy for any hints as to how to fix this.

Possible Implementation

Your Environment

Ubuntu 22.04.5 LTS

The update notes for 1.4.0+ds1-6ubuntu0.1 merely mention this:

aptly (1.4.0+ds1-6ubuntu0.1) jammy-security; urgency=medium

  * No change rebuild due to golang-1.18 update. Note that this package
    was built with golang-1.17.

 -- Allen Huang <email address hidden>  Mon, 18 Nov 2024 15:24:14 +0000
@neolynx
Member

neolynx commented Nov 29, 2024

Since this is not an aptly upstream build, I think it is best to report this to Ubuntu. Maybe some build dependency does not match?

@robin-checkmk
Author

Thanks for the blazing fast response!

I was afraid you would say that, and it makes sense, but I figured I would bring it up here as well, as users might pop by. I will look into reporting it to Ubuntu and will report back on what, if anything, I can learn. If anyone else has ideas and thoughts, do let me know.

P.S.: @neolynx While I have your attention: any hints as to how I might be able to remove those "empty" mirror entries, which seem to cause the Error decoding remote repo: EOF error? I am afraid I will have to mess with the database directly, but maybe I overlooked something obvious.

@neolynx
Member

neolynx commented Dec 2, 2024

aptly 1.4.0 is already quite old. One way might be to use the latest 1.6.0 CI build of aptly, which has a lot of issues fixed. Otherwise, go back to the working version from Ubuntu.

With a known good version, you can try running aptly db cleanup, which compacts the LevelDB and maybe also fixes your EOF problem. Maybe removing and recreating the mirrors does the trick. There are also official Go LevelDB tools for manipulating databases; maybe these can help.

It would be a good idea to back up the .aptly/db directory in case things go wrong. If you have such a backup already, you might be able to compare the two DBs.
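
A backup along these lines could be sketched as follows. This assumes the aptly root directory is `~/.aptly` (the setup in this report uses a custom config, so the real path may differ) and that no aptly process is using the database while the archive is written. `backup_aptly_db` and the default destination directory are hypothetical names.

```shell
# Hypothetical helper: archive <root>/db into a timestamped tarball.
#   $1: aptly root directory (assumed default: ~/.aptly)
#   $2: destination directory for the archive (assumed name)
# Prints the path of the archive on success.
backup_aptly_db() {
    root=${1:-"$HOME/.aptly"}
    dest=${2:-"$HOME/aptly-db-backups"}

    # Refuse to continue if there is no db directory to back up.
    [ -d "$root/db" ] || { echo "no db directory under $root" >&2; return 1; }
    mkdir -p "$dest" || return 1

    archive="$dest/aptly-db-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps the paths inside the archive relative ("db/...").
    tar -czf "$archive" -C "$root" db && echo "$archive"
}
```

Two such archives, taken before and after the problem appears, can be unpacked side by side for the comparison mentioned above.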

Hope this helps!

@robin-checkmk
Author

Thanks for the hints, we will look into them (already thought about some, actually).

I appreciate you taking the time, thank you!

@ju2wheels

ju2wheels commented Dec 2, 2024

I'm hitting the same issue on 22.04 with the 1.6.0 CI build. I'm going to double-check that I used the right packages, but the issue still seems to be present in the latest release. It came out of nowhere: everything was working fine until a few days ago, even with outdated aptly packages.

Maybe removing and recreating the mirrors does the trick.

Can confirm that does work. I still have the old copy of the aptly dir, so I'll put it back and try the db cleanup.

[edit] The db cleanup/recover commands didn't help; it still produces the same EOF error.
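
For reference, the drop-and-recreate approach that worked here might look like the sketch below. It uses the `aptly mirror drop`, `aptly mirror create`, and `aptly mirror update` subcommands; `recreate_mirror` and the `APTLY` variable are hypothetical conveniences, and any filters or extra flags your mirrors were originally created with would need to be added by hand.

```shell
# Command used to invoke aptly. It is expanded unquoted on purpose so that
# it can carry options, e.g. APTLY='aptly -config mirror/aptly.conf'.
APTLY=${APTLY:-aptly}

# Hypothetical helper: drop a mirror definition and create it again.
#   $1: mirror name, $2: archive URL, $3: distribution
recreate_mirror() {
    name=$1
    url=$2
    dist=$3

    # Dropping may fail (for example if the mirror is missing or snapshots
    # still reference it); the sketch carries on regardless.
    $APTLY mirror drop "$name" || true

    # Recreate the mirror and download its contents again.
    $APTLY mirror create "$name" "$url" "$dist" &&
        $APTLY mirror update "$name"
}

# Example, with values from the original report:
#   recreate_mirror base-bookworm http://ftp.de.debian.org/debian/ bookworm
```

Taking a copy of the database directory first keeps the old state available for later inspection.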

@robin-checkmk
Author

@ju2wheels I am glad it is reproducible across releases, which means it is most certainly an Ubuntu issue. Do you have the means to file a bug report with them? I currently lack both the time and an account to do so.

@ju2wheels

ju2wheels commented Dec 3, 2024

It's reproducible once it happens and you have a corrupted LevelDB, but I really don't think it originated on the aptly side (aside from the fact that it could maybe have been caused by an unhandled exception). I have a few thousand devices that run aptly in Docker on container start, on 18.04, 20.04, and 22.04 bases with the default aptly version for each upstream distro. I have aptly on an RPi 4 with 18.04 that hasn't yet corrupted its LevelDB and is still perfectly able to mirror packages from the upstream Ubuntu 18.04 mirror, but I did have this device off for most of last week.

Only 10 of my devices started hitting this EOF issue, all starting on Nov 25.

My current theory is that something happened to the upstream repos last week and caused an unhandled exception in aptly, which led to the corrupted LevelDB, but I don't know enough about this code base to confirm that. It looks from your comment like you were mirroring Debian, though. I'm still looking through my logs to see if I can identify the first occurrence on one of my 10 devices and whether it raised a more useful exception before the LevelDB EOF errors started showing up.

Just to add that in my case I'm getting two EOF errors:

Error decoding mirror: EOF

and

Error decoding remote repo: EOF

I do have a Launchpad account so I can open a ticket later if you guys want.

@neolynx
Member

neolynx commented Dec 3, 2024

Is this happening on arm4 devices only, or also on amd64?

@ju2wheels

ju2wheels commented Dec 3, 2024

The 10 devices I'm currently hitting the issue on are all amd64 and mirroring only amd64.

@neolynx
Member

neolynx commented Dec 11, 2024

OK, it is not architecture-specific then.

I think there is little aptly can do; I hope Ubuntu can fix the build. Shall we close this issue?

@ju2wheels

ju2wheels commented Dec 11, 2024

I've already started deploying 1.6.0 to my devices to try to figure out whether this is an issue there or not. I was also able to find up to 174 devices showing similar logs of LevelDB errors of the form

ERROR: leveldb: manifest corrupted (field 'comparer'): missing [file=MANIFEST-001591

except that for these, aptly seems to be able to recover, unlike with the EOF errors.
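
When triaging logs from many devices, the three error signatures quoted in this thread can be counted separately. This is only a sketch: it assumes the log text reaches the helper on stdin and that the messages appear verbatim as quoted here. `classify_aptly_errors` and the output key names are made up for illustration.

```shell
# Hypothetical helper: reads log lines on stdin and prints one counter per
# error signature mentioned in this thread.
classify_aptly_errors() {
    awk '
        /Error decoding mirror: EOF/      { mirror++ }
        /Error decoding remote repo: EOF/ { repo++ }
        /leveldb: manifest corrupted/     { manifest++ }
        END {
            # Unset awk variables print as 0 with %d.
            printf "decoding_mirror_eof=%d\n", mirror
            printf "decoding_remote_repo_eof=%d\n", repo
            printf "manifest_corrupted=%d\n", manifest
        }
    '
}

# Example (the log path is a placeholder):
#   classify_aptly_errors < /path/to/device.log
```

Separating the recoverable manifest errors from the EOF errors makes it easier to see which devices need their database purged.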

So IMO, given that my devices are in people's homes with varying levels of network connectivity stability, it does kind of point back to upstream issues (whether at the network level or the repo level) contributing to some of these corruption issues in aptly.

I haven't yet had a chance to start interrupting the network on my Docker containers and forcibly restarting/interrupting aptly mirroring to see if I can reproduce any of these EOF or LevelDB errors on 1.6.0, but that's my next step.

@neolynx
Member

neolynx commented Dec 11, 2024

That sounds like a big problem indeed.
@ju2wheels drop me a mail to neolynx 8 gmail ~ com if you'd like to discuss your approach...

neolynx self-assigned this Dec 11, 2024
neolynx added the bug label Dec 11, 2024
@L0sted

L0sted commented Dec 12, 2024

In my case, installing aptly from the official repository over the Ubuntu package fixed the problem. Everything (mirroring/snapshotting/publishing) seems to be okay. This might not work if the database is highly corrupted (I guess), e.g. when there is a mirror with no name, like in the OP's post; in my case I was just not able to update the mirror.

@neolynx
Member

neolynx commented Jan 11, 2025

aptly 1.6.0 has been released (#1414), which includes many bug fixes compared to 1.4.0. It is also built with more up-to-date dependencies, which might make it more stable than the Debian builds.

Could you give it a try?

neolynx added the "please confirm resolved" label (We believe the issue is resolved ! if so, please close the issue, thanks ;-)) and removed the bug label Jan 11, 2025
@ju2wheels

My team is still in the process of deploying 1.6.0 to all our devices in the field (this will likely happen over the next few days). So far, I have only had time for one test: manually killing my container, or disconnecting its network, during initial start near the points where it is doing aptly mirroring. That does not seem to corrupt the LevelDB easily, even after leaving the process running in a loop for 3 hrs of starting and killing or disconnecting the container (this was true for both 1.6.0 and the distro default aptly).

It's not necessarily the best test, as it depends on random timing, but it's the best I could do for now.
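
A simplified version of such an interruption test can be sketched without Docker: run a command repeatedly and kill it after a random delay. This is not the setup described above, which killed or disconnected whole containers; `interrupt_loop` is a hypothetical helper built on the coreutils `timeout` command, and it should only ever be pointed at a disposable copy of the aptly database.

```shell
# Hypothetical helper: run a command repeatedly, killing it after a random
# delay each time.
#   $1: number of iterations
#   $2: maximum delay in seconds before the kill
#   remaining arguments: the command to run
interrupt_loop() {
    iterations=$1
    max_delay=$2
    shift 2

    i=0
    while [ "$i" -lt "$iterations" ]; do
        # Pick a delay between 1 and max_delay seconds. awk's srand() is
        # seeded from the clock, so this is only roughly random.
        delay=$(awk -v max="$max_delay" \
            'BEGIN { srand(); print int(rand() * max) + 1 }')

        # SIGKILL cannot be caught, which approximates a hard container kill.
        status=0
        timeout -s KILL "$delay" "$@" || status=$?

        echo "run $i: killed or exited within ${delay}s (status $status)"
        i=$((i + 1))
    done
}

# Example: 100 attempts, each killed within 30 seconds.
#   interrupt_loop 100 30 aptly mirror update base-bookworm
```

Checking `aptly mirror list` for errors after each batch of runs would show whether any interruption left the database damaged.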

I have not seen any new occurrences of the corruption on my device fleet since purging the LevelDB on the problematic devices in December. As a result, I don't really have any new info here to pinpoint the source of the issue.

@robin-checkmk
Author

I am afraid we will not be able to test this soon, but I appreciate everyone collaborating here and will report back as soon as we get around to testing. Currently, our workaround (ignoring the Ubuntu update) is stable.
