Build arm containers with new github actions arm runner (PP-2139) #2280

Merged
merged 6 commits into from
Feb 11, 2025

jonathangreen
Member

Description

Build our images using the GitHub ARM runners.

This makes several changes:

  • Actually run our unit tests on the ARM image (I think this is a big win: it was too slow with emulation, but since native libraries can cause issues, it's nice to verify that our tests actually run in the built ARM image).
  • Technically our images get pushed every commit now, but they are not tagged until the tests pass.
    • This lets us use the images in other workflow jobs easily, while making it unlikely anyone will come across a broken image.
    • It has the advantage that if desired, you could pull the image via its hash for debugging.
  • The build takes place in a (pretty ugly IMO) two-step process, where native images are built for each platform, then they are combined together into a manifest and tagged.
    • This is the process that the Docker documentation recommends for a multi-platform build like this, so despite being kind of ugly, it is the officially blessed way to do things.
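
For reviewers who want the second step spelled out, here is a minimal sketch of how the per-platform images might be combined into one tagged manifest. It only builds the argument list for `docker buildx imagetools create`, the command the Docker documentation uses for this. The function name, image name, and digests are hypothetical and are not taken from this PR's workflows.

```python
from typing import List


def manifest_command(image: str, digests: List[str], tags: List[str]) -> List[str]:
    """Build the argv that combines per-platform images into one tagged manifest.

    Hypothetical helper for illustration; the real workflow does this in YAML/shell.
    """
    if not digests:
        raise ValueError("need at least one per-platform digest")
    cmd = ["docker", "buildx", "imagetools", "create"]
    # Each tag is applied to the combined manifest, not to the per-platform images.
    for tag in tags:
        cmd += ["-t", f"{image}:{tag}"]
    # The untagged per-platform images are referenced by digest.
    cmd += [f"{image}@{digest}" for digest in digests]
    return cmd


# Placeholder image name and digests, one digest per platform build.
example = manifest_command(
    "ghcr.io/example/webapp",
    ["sha256:aaa", "sha256:bbb"],
    ["latest"],
)
```

Because the tags are only applied in this step, the images pushed by the per-platform builds stay untagged until the tests pass, but can still be pulled by digest.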

Motivation and Context

GitHub now has native ARM runners:
community/community#148648

This allows us to build our ARM images faster, and without relying on emulation which was causing issues (see: actions/runner-images#11471).

This work is mainly based on this documentation from docker:
https://docs.docker.com/build/ci/github-actions/multi-platform/#distribute-build-across-multiple-runners

But adapted to our environment / workflow.

How Has This Been Tested?

  • I did some testing of these workflows on my fork
  • Workflows running in CI on this PR

Checklist

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

@jonathangreen jonathangreen requested a review from a team February 10, 2025 20:14

codecov bot commented Feb 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.12%. Comparing base (c88b13b) to head (b76b858).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2280   +/-   ##
=======================================
  Coverage   91.12%   91.12%           
=======================================
  Files         363      363           
  Lines       41327    41327           
  Branches     8846     8846           
=======================================
  Hits        37660    37660           
  Misses       2405     2405           
  Partials     1262     1262           


@@ -51,7 +51,7 @@ runs:
 id: cache
 with:
   path: ${{ steps.poetry-dir.outputs.home }}
-  key: ${{ runner.os }}-poetry${{ inputs.version }}-install-py${{ steps.python-version.outputs.version }}
+  key: ${{ runner.os }}-${{ runner.arch }}-poetry${{ inputs.version }}-install-py${{ steps.python-version.outputs.version }}
Member Author

This could be broken out to a separate PR if we want. Our action to install poetry didn't take the runner's architecture into account before, so it tried to use the same cache for both Intel and ARM, causing poetry installs to fail on whichever platform wasn't in the cache.

This adds the platform to the cache key, so it will work with new arm based runners.
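
To illustrate the collision, here is a small sketch that mirrors the old and new key templates in Python. The function names and version numbers are made up for the example; `Linux`, `X64`, and `ARM64` are values that the `runner.os` and `runner.arch` contexts can take.

```python
def old_cache_key(os_name: str, poetry_version: str, python_version: str) -> str:
    """Mirror of the previous key template, which ignored the architecture."""
    return f"{os_name}-poetry{poetry_version}-install-py{python_version}"


def new_cache_key(os_name: str, arch: str, poetry_version: str, python_version: str) -> str:
    """Mirror of the updated key template, which includes runner.arch."""
    return f"{os_name}-{arch}-poetry{poetry_version}-install-py{python_version}"


# Illustrative versions only; both runners report the same OS name.
intel_old = old_cache_key("Linux", "1.8.5", "3.12")
arm_old = old_cache_key("Linux", "1.8.5", "3.12")
intel_new = new_cache_key("Linux", "X64", "1.8.5", "3.12")
arm_new = new_cache_key("Linux", "ARM64", "1.8.5", "3.12")

# The old keys are identical, so one platform restores the other's cache;
# the new keys differ, so each architecture gets its own cache entry.
keys_collide_before = intel_old == arm_old
keys_collide_after = intel_new == arm_new
```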

  - main
paths:
  - .github/workflows/build-base-image.yml
  - docker/Dockerfile.baseimage
Member Author

We now update the base image on pushes to main that modify the base-image workflow or dockerfile. Previously this was done as part of build.yml, but based on the new structure, pushing this within its own workflow made more sense to me.

needs: [build]
permissions:
  contents: read
strategy:
  fail-fast: false
  matrix:
    platform: ["linux/amd64", "linux/arm64"]
    image: ["scripts", "webapp"]
Member Author

The majority of the time in this workflow is spent pulling the image, rather than running the tests, so it made sense to combine the job for scripts and webapp, since running the tests themselves is quick.

@@ -1,4 +1,4 @@
-FROM opensearchproject/opensearch:1 as opensearch
+FROM opensearchproject/opensearch:1 AS opensearch
Member Author

This isn't strictly necessary, but resolves a build warning about mixed cases in statements

Contributor

Yeah, it's funny how persnickety Docker is about this. 😂

@@ -1,5 +1,3 @@
-version: "3.9"

Member Author

Again not strictly necessary, just resolves a build warning about version being deprecated

# Wait for container to start
wait_for_runit "$container"

# Make sure database initialization completed successfully
-timeout 240s grep -q 'Initialization complete' <(docker compose logs "$container" -f 2>&1)
+timeout 240s grep -q -e 'Initialization complete' -e "Migrations complete" <(docker compose logs "$container" -f 2>&1)
Member Author

I needed to change this because there is a race condition now that we test both webapp and scripts in the same container. Previously we would always have hit initialization; now the first container to start does it.
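
The updated check accepts either log message. A rough Python equivalent of the `grep -e ... -e ...` matching is sketched below; it leaves out the 240-second timeout and the log streaming, and the function name and sample log lines are invented for illustration.

```python
from typing import Iterable, Optional, Sequence

# The two messages the updated grep accepts.
MARKERS = ("Initialization complete", "Migrations complete")


def first_marker(lines: Iterable[str], markers: Sequence[str] = MARKERS) -> Optional[str]:
    """Return the first marker found in the log lines, or None if none appears."""
    for line in lines:
        for marker in markers:
            if marker in line:
                return marker
    return None


# Hypothetical logs: the container that starts first initializes the database,
# while the one that starts second only reports that migrations are done.
first_container_log = ["starting up", "db: Initialization complete"]
second_container_log = ["starting up", "db: Migrations complete"]

found_first = first_marker(first_container_log)
found_second = first_marker(second_container_log)
```

Matching only on 'Initialization complete' would return nothing for the second log, which is the failure the race condition produced.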

-from unittest.mock import MagicMock, call
+from unittest.mock import MagicMock, call, patch

import pytest
Member Author

These test changes could be broken out to a separate PR. They are necessary because we now run the tests against a container that has a version set. The tests previously assumed this was not the case, causing some of these tests to fail.

The changes just mock the __version__ variable so it doesn't matter if it's set or not; the tests will always behave correctly.
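
As a sketch of the pattern (not the actual test code from this PR), the snippet below patches `__version__` on a stand-in module so the result is the same whether or not the environment sets a version. The `app` module and `version_label` function are hypothetical.

```python
import types
from unittest.mock import patch

# Hypothetical stand-in for the application's module that exposes __version__.
app = types.ModuleType("app")
app.__version__ = None  # unset locally, set inside a built container


def version_label() -> str:
    """Code under test: report the version, or a fallback when it is unset."""
    return app.__version__ or "unknown"


def label_with_version() -> str:
    # Pin the version so the result does not depend on the environment.
    with patch.object(app, "__version__", "1.2.3"):
        return version_label()


def label_without_version() -> str:
    # Pin the "unset" case explicitly as well.
    with patch.object(app, "__version__", None):
        return version_label()
```

`patch.object` restores the original value when the `with` block exits, so the tests do not leak state into each other.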

Contributor

@tdilauro tdilauro left a comment

This looks good! 🎸

@jonathangreen jonathangreen merged commit b8ba6cc into main Feb 11, 2025
19 checks passed
@jonathangreen jonathangreen deleted the feature/workflow-arm-runner branch February 11, 2025 16:38