Skip to content

Avoid updating Ubuntu Docker image in PR builds#15840

Closed
MiguelWeezardo wants to merge 3 commits intotrinodb:masterfrom
MiguelWeezardo:docker_build_cache
Closed

Avoid updating Ubuntu Docker image in PR builds#15840
MiguelWeezardo wants to merge 3 commits intotrinodb:masterfrom
MiguelWeezardo:docker_build_cache

Conversation

@MiguelWeezardo
Copy link
Member

@MiguelWeezardo MiguelWeezardo commented Jan 25, 2023

Description

Some issues with stability of Ubuntu repositories have surfaced lately.
Our Dockerfile tries to update packages in the base Ubuntu image, and when that fails, the entire maven-checks job fails.

We can use a build cache to avoid running these updates.

Additional context and related issues

#15807 (comment)
https://github.com/trinodb/trino/actions/runs/4004028582/jobs/6872683430

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jan 25, 2023
@MiguelWeezardo MiguelWeezardo force-pushed the docker_build_cache branch 3 times, most recently from 6ce5da2 to aae073a Compare January 26, 2023 23:41
@MiguelWeezardo MiguelWeezardo force-pushed the docker_build_cache branch 5 times, most recently from 452dd7c to 973f8d0 Compare January 29, 2023 22:30
@MiguelWeezardo
Copy link
Member Author

I'm also thinking about running buildx for multiple platforms in parallel: https://github.com/MiguelWeezardo/trino/tree/docker_buildx_parallel

@nineinchnick
Copy link
Member

I'm also thinking about running buildx for multiple platforms in parallel: https://github.com/MiguelWeezardo/trino/tree/docker_buildx_parallel

Cool, but this is out of the scope of this PR. We're trying to improve resiliency against network issues. Let's get this working and merged first, and we can improve the duration of these jobs later.

@nineinchnick
Copy link
Member

CI hit this:

#14 ERROR: error writing layer blob: GitHub Actions is temporarily unavailable. Please visit https://www.githubstatus.com/ for the status of our services.

I started wondering if it'll actually be more reliable.

Also, it doesn't look like it's using separate cache entries for every architecture. If it overwrites the cache for every arch, it won't be able to use it at all. WDYT on how to continue?

@MiguelWeezardo
Copy link
Member Author

CI hit this:

#14 ERROR: error writing layer blob: GitHub Actions is temporarily unavailable. Please visit https://www.githubstatus.com/ for the status of our services.

I started wondering if it'll actually be more reliable.

Also, it doesn't look like it's using separate cache entries for every architecture. If it overwrites the cache for every arch, it won't be able to use it at all. WDYT on how to continue?

I'd like to check if an approach using inline caching will work. It seems to work locally on my laptop.

@MiguelWeezardo MiguelWeezardo force-pushed the docker_build_cache branch 2 times, most recently from 245bf87 to 2cfc118 Compare January 31, 2023 12:07
@MiguelWeezardo
Copy link
Member Author

Some issues with stability of Ubuntu repositories have surfaced lately.
Our Dockerfile tries to update packages in the base Ubuntu image, and
when that fails, the entire `maven-checks` job fails.

We can use layers from the latest trino image to avoid running
these updates.
@MiguelWeezardo MiguelWeezardo force-pushed the docker_build_cache branch 3 times, most recently from ce010b4 to 905095f Compare February 1, 2023 10:50
@MiguelWeezardo MiguelWeezardo force-pushed the docker_build_cache branch 3 times, most recently from a8753db to d147775 Compare February 2, 2023 13:23
@MiguelWeezardo MiguelWeezardo marked this pull request as ready for review February 3, 2023 12:33
@MiguelWeezardo
Copy link
Member Author

It seems that Docker cache cuts down the step time from ~9 minutes to ~7.

Copy link
Member

@nineinchnick nineinchnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % nitpicks

Use GitHub Actions cache in Docker BuildX
@MiguelWeezardo MiguelWeezardo added the dx Issues or pull requests related to developer experience label Feb 3, 2023
@findepi
Copy link
Member

findepi commented Feb 6, 2023

platforms: arm64,ppc64le
- name: Test Docker Image
run: core/docker/build.sh
run: core/docker/build.sh -c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some issues with stability of Ubuntu repositories have surfaced lately.
Our Dockerfile tries to update packages in the base Ubuntu image, and
when that fails, the entire maven-checks job fails.

Updates are not necessary (per Docker's philosophy, updates to the base image should be part of base image lifecycle), but we install new stuff too.

We can use layers from the latest trino image to avoid running
these updates.

We already have some retries

echo 'Acquire::Retries "3";' > /etc/apt/apt.conf.d/80-retries && \
echo 'Acquire::http::Timeout "15";' > /etc/apt/apt.conf.d/80-timeouts && \

are these not sufficient?
what can we do to make them sufficient?

This change may get CI working, but we cannot rely on that for automated releases (once we have them), so would be nice to have some more self-contained solution

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW this means CI won't see any updates to eclipse-temurin:17-jdk image only until we do a release.
So if eclipse-temurin:17-jdk updates affects us in any way, we won't know about that until our users know about that.

- uses: docker/setup-qemu-action@v2
with:
platforms: arm64,ppc64le
- name: Set up Docker Buildx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not well-versed in buildx.
if you want me to continue the review of this PR, I would recommend splitting this out to a follow-up

@MiguelWeezardo
Copy link
Member Author

MiguelWeezardo commented Feb 6, 2023

Additional context and related issues

#15807 (comment) https://github.com/trinodb/trino/actions/runs/4004028582/jobs/6872683430

@MiguelWeezardo could this be an issue?

Yes, that's the problem this PR is trying to solve by caching.
I've reported it as #15992, thanks for pointing that out.

@MiguelWeezardo
Copy link
Member Author

I'm also considering adding more apt mirrors before running apt update so we have a fallback in case the main Ubuntu repo goes down.

@MiguelWeezardo
Copy link
Member Author

I'm also considering adding more apt mirrors before running apt update so we have a fallback in case the main Ubuntu repo goes down.

Testing of that approach continues in MiguelWeezardo#14

@github-actions
Copy link

github-actions bot commented Mar 3, 2023

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Mar 3, 2023
@mosabua
Copy link
Member

mosabua commented Mar 4, 2023

@MiguelWeezardo and @nineinchnick .. is this work still ongoing and valid or do we want to close this PR?

@nineinchnick
Copy link
Member

We paused the work for now. We got some initial feedback from Piotr that we need to address. We can close this now and reopen later.

@MiguelWeezardo
Copy link
Member Author

The cache doesn't reliably solve the flaky Ubuntu repos issue. This approach might be useful for speeding up builds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed dx Issues or pull requests related to developer experience stale

Development

Successfully merging this pull request may close these issues.

4 participants