Update the location and post-upload command to use the new repository. #26
This looks reasonable and it's a nice, compact change. Writing the wrapper script is a good idea. I haven't had a chance to test this yet, though.
Once this is ready to be merged, please let me know. I could do a patch release of
As @dirk-thomas pointed out in a separate discussion, the time it takes for aptly to intake, re-snapshot, and publish all our distributions is more than just "a little bit longer"; it's quite egregious. Some possible solutions:
Of these, I'd recommend (1) first, and both (1) and (2) second if we continue to have issues with S3 publishing being unreliable.

Some extra info: one of the limitations we're dealing with is that aptly repos¹ cannot contain, as far as I am able to determine, multiple distributions. Instead we have one aptly repo for each distribution we support, and we publish all of them to the same endpoint. Publishing repositories to S3 is the time-intensive part of the process, but it's also unfortunately the one part we cannot parallelize beyond aptly's internal parallelization without option (3) above, which I don't recommend.

¹ Here the term "aptly repos" refers to the aptly internal repository structure. aptly is capable of producing a published repository where multiple distributions share the same package pool.
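To make the shape of that limitation concrete, here is a rough sketch of the per-distribution flow; the repo names, the distro list, and the `s3:ros-bootstrap:` endpoint are my assumptions for illustration, not the actual configuration:

```shell
# Sketch only: one internal aptly repo per distribution, all published
# to a single S3 endpoint. All names below are illustrative.
for dist in xenial bionic stretch; do
  # add newly uploaded packages to this distribution's internal repo
  aptly repo add "ros-bootstrap-$dist" "incoming/$dist"/*.deb
  snap="ros-bootstrap-$dist-$(date +%Y%m%d%H%M%S)"
  aptly snapshot create "$snap" from repo "ros-bootstrap-$dist"
  # the publish step is the slow part, and it runs once per distribution
  aptly publish switch "$dist" "s3:ros-bootstrap:" "$snap"
done
```

This is why the cost is O(n) in the number of distributions: the loop body, including the expensive publish, runs once per distro.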
Each Python package defines its own list of targeted distros. We should try to minimize the work on the server to only the targeted distros (if possible). I didn't pay attention during the release of

I don't think we should remove the possibility to update EOL distros altogether for performance reasons. In the case of
Afaik the bootstrap repository is not a user-facing repository, and even the buildfarm only pulls from it in the

Therefore I would suggest doing option 2 (using a local webserver only) first and then measuring again where we stand. That would also solve the problem of not being able to get a directory listing at the moment.
Every distribution gets snapshotted and republished after new packages are included, in order to avoid the need for out-of-band detection of which distributions have changed. We could try parsing out only the distributions we expect to change, but that adds to the brittleness of the wrapper script and increases the likelihood that what you think you see is not what you'll get.
It would still be doable, but not via ros_release_python exclusively. The process would entail accessing the host via ssh and running the aptly snapshot and publish commands manually for the affected EOL distros.
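Concretely, that manual process might look something like the following, run on the repository host over ssh; the repo, distro, package, and snapshot names are hypothetical:

```shell
# Hypothetical one-off snapshot-and-publish for a single EOL distro,
# performed manually instead of via ros_release_python. Names invented.
aptly repo add ros-bootstrap-wheezy /tmp/python-catkin-pkg-modules_*.deb
aptly snapshot create ros-bootstrap-wheezy-manual-1 from repo ros-bootstrap-wheezy
aptly publish switch wheezy s3:ros-bootstrap: ros-bootstrap-wheezy-manual-1
```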
The advantages of S3 are mostly administrative. It requires less effort and cost to create and maintain recurring backups of the bootstrap repository contents, and it is slightly more redundantly available than our single repository host.
Yeah, since this is only used for bootstrapping repositories I don't think that the CDN is necessary for this repository. It's only manually queried by our buildfarm and any other buildfarms. On our old host the load was undetectable on their 2nd-smallest instance (2nd smallest because we needed more disk space than their smallest). So I would suggest trying (2), and we could certainly add some extra logic to skip unaffected repositories along the lines of (1), but I agree with Dirk that I'd rather not disable us from pushing to older repos. For backup purposes, could we set up the aptly instance to still push to S3, but periodically and out of band, so that it doesn't slow down the developer upload experience?
Yeah this is what I was imagining. Setting it up to do so reliably is straightforward but not trivial work. I'll go ahead and reconfigure things to use a local web server and locally published repository. Option (2). |
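One low-effort shape for that out-of-band push, assuming the repository is published locally and the AWS CLI is available on the host; the bucket name and local path are made up:

```shell
# Sketch: mirror aptly's locally published tree to S3 on a schedule so
# the developer-facing upload path never waits on S3. Could be run from
# a crontab entry such as:
#   0 3 * * * /usr/local/bin/backup-bootstrap-repo
aws s3 sync /var/lib/aptly/public "s3://ros-bootstrap-backup/" --delete
```

`--delete` keeps the bucket an exact mirror, removing objects for packages that were dropped locally.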
repos.ros.org is now set up with a locally published repo hosted by nginx. None of the config in this PR has changed. I'm planning to make a small Bloom release sometime this weekend as the next test. |
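For reference, serving aptly's local publish root with nginx can be as simple as the following sketch; the publish path and server name are assumptions about this setup, not the actual config:

```nginx
# Serve the locally published repository tree; autoindex restores the
# directory listings the S3-hosted repo couldn't provide.
server {
    listen 80;
    server_name repos.ros.org;
    root /var/lib/aptly/public;
    autoindex on;
}
```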
With much thanks to @dirk-thomas for hitting all the issues I seemed not to, I believe we have a relatively stable first round of aptly support. The issue in this second round of testing was memory usage from aptly, which was saturating the nano EC2 instance hosting this repository. I've bumped it up, so we should now be quite comfortable memory-wise unless aptly is doing something pathological.

There are some improvements that could be made to the system which, while not "out of scope", are lower priority than other things I need to get to this month.

**Uploading to the new repository is still significantly slower than it was.** Aptly has limitations reprepro doesn't with regard to maintaining separate distributions. There are some pre-processing steps that may be able to be eliminated with support from aptly (aptly-dev/aptly#757), but ultimately we're going to be O(n) on snapshot and publish operations, where n is the number of distributions the automated tooling supports pushing to. The most dramatic improvement for minimal effort would come from dropping automated support for all or most end-of-life Ubuntu and Debian distributions, which would cut our n down significantly. It would still be possible for us to release into these if necessary by bypassing the automated tooling. The most impactful high-effort change would involve modifying the release script here to only snapshot and publish the aptly repositories after all new packages, source and binary, Python 2 and Python 3, are pushed and added to the aptly repo. This modification to the release script would have the best hope of getting us back to near the performance of the unified reprepro repository, at the cost of no longer being "agnostic" to the repository implementation on the other side of dput.

**No automatic recovery after upload failure.** One of the advantages aptly has over reprepro is that rolling back an undesired change is extremely straightforward, as simple as switching the published snapshots back to older ones.
But if the package upload fails partway through the process, it leaves the repository in a state that so far requires a significant amount of manual intervention to recover from before another release attempt can be made. Some of this recovery effort can be mitigated with improved error handling in the publishing script to roll back the staging repository state. But each release atomically publishes multiple packages (catkin_pkg, for example, causes atomic publishes for python-catkin-pkg src, python-catkin-pkg bin, python-catkin-pkg-modules src, python-catkin-pkg-modules bin, python3-catkin-pkg src, python3-catkin-pkg bin, python3-catkin-pkg-modules src, and python3-catkin-pkg-modules bin), and if a later package fails to publish, the earlier ones are left behind even after rolling back to the previous snapshot. When the release script is re-run, it will try to upload all packages again, and because we publish repositories one distribution at a time, this causes a conflict between the last uploaded version of a package and the newly uploaded one that will need to be resolved for recovery to work.

A way to do that would be to unpublish everything and only re-publish it all once every distribution has the same package set. The current behavior maximizes uptime of the bootstrap repo at the cost of distribution consistency, but I think a few minutes where no packages are available during release is better than a few minutes where distributions are inconsistent. That would also make retries after partial failures easier to manage.
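For context, the snapshot rollback referred to above is a single aptly command; the distribution, endpoint, and snapshot name here are invented for illustration:

```shell
# Sketch: re-point a published distribution at an older snapshot.
# Cheap and atomic, but it does not undo packages already added to the
# internal staging repo, which is what makes recovery from a partial
# upload failure awkward.
aptly publish switch bionic s3:ros-bootstrap: ros-bootstrap-bionic-previous
```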
Not quite ready for merge, as I had to re-create the S3 bucket this publishes to due to an esoteric S3 limitation, but this configuration change will support the new aptly-backed bootstrap repository. It requires no changes to how the script is used.
Releasers should expect publishing packages to take a bit longer, since after the new package files are indexed by aptly, they and the updated repository metadata will be pushed to S3.