Skip to content

Added multiarch build support for teleport-operator#16688

Merged
fheinecke merged 61 commits into
masterfrom
fred/multiarch-teleport-container-images
Oct 19, 2022
Merged

Added multiarch build support for teleport-operator#16688
fheinecke merged 61 commits into
masterfrom
fred/multiarch-teleport-container-images

Conversation

@fheinecke
Copy link
Copy Markdown
Contributor

@fheinecke fheinecke commented Sep 23, 2022

This is part one of a series of PRs updating drone/dronegen to support multiarch builds.

The primary purpose of this PR is to add multiarch Teleport operator container images. This includes amd64, arm, and arm64 images under the $MAJOR_VERSION tag on each release.

Due to the complexity of how we handle building and publishing container images this PR includes a number of other changes:

  • Removed old/unneeded Makefile targets that pertain to building images.
  • Updated makefiles to use the new Dockerfile
  • Refactored several Dronegen functions that are now used in multiple pipelines
  • Refactored several makefile targets to support local archive building, deb building, and image building from source

Regarding multiarch container builds the following features have been added:

  • Added major, minor, and canonical tags based upon release
  • Separate arch-specific images/tags for each major version
  • Image manifest for each major version that "points" to each arch-specific image
  • Publishing to all three container repos (ECR staging, ECR production, Quay)
  • Highly parallelized/efficient container image pipelines that properly build a DAG for step dependencies, executing as many as possible at once
  • All images will continue to be rebuilt/update daily to ensure the latest upstream packages are installed, across the latest three major Teleport versions
  • Wrote dronegen so that it should be very easy to add/remove Dockerfiles (such as the Teleport, Teleport Lab), ISAs, and container repositories

After this PR is merged I will open several much smaller PRs for Teleport and Teleport lab.

Related PR: #15514 #3384

Test Drone run: https://drone.platform.teleport.sh/gravitational/teleport/15845

@github-actions
Copy link
Copy Markdown
Contributor

@fheinecke - this PR is large and will require admin approval to merge. Consider breaking it up into a series smaller changes.

@fheinecke fheinecke requested a review from logand22 September 26, 2022 17:36
Copy link
Copy Markdown
Collaborator

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fheinecke A couple of meta-comments too:

  • The Drone file is MASSIVE now. Does Drone have a concept of imports/includes, or can we for example factor out all cron-related stuff into a separate Drone file? I would also try to move some things that we can into scripts/programs (I left a couple of related comments).
  • Because the changeset is so big, it's pretty scary to put in production right away. Can we make sure that all new pipelines that build multiarch images have failures ignored on them so once this merges we can run for a few releases without their failures (if any) affecting anything and observe general behavior?

@logand22 Can you please give this another look? This got split from the original's @fheinecke's multi-arch PR in the attempt to land it in chunks.

Comment thread .drone.yml Outdated
Comment thread .drone.yml Outdated
Comment thread Makefile
Comment thread dronegen/container_images.go Outdated
Copy link
Copy Markdown
Contributor

@logand22 logand22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for including a passing build in the Comments.

This wasn't necessarily what I had in mind for reducing the size of the PR. It seems like you have a lot of changes still that are orthogonal in goals that could be done completely separately.

  1. Consolidate to a single Dockerfile
  2. Add multi-arch builds
  3. Refactor build pipeline to use dronegen.

Not including operator and teleport lab doesn't necessary reduce this PR by that much.

Other than the small comments I left in the changed files I have questions over the following:

  • Do you need to reference an updated teleport.e ref for teleport-ent images to work?
  • Can you include examples of promote event running as well as push events?
  • Why is the promote event building images? Shouldn't it just be pulling and retagging existing images?
  • Your updates include changes to the cronjob that automatically keep the latest 10.x.x 10.x and 10 version up-to-date daily vs. just 10 (Same for previous versions). Is this intended?

Per @r0mant comment regarding drone file size. I believe drone supports both templates and configuration extensions that could help reduce the size. However this seems either out of scope or a prerequisite for these changes.

Comment thread build.assets/Dockerfile-fips Outdated
Comment thread docs/postrelease.md Outdated
@fheinecke
Copy link
Copy Markdown
Contributor Author

Regarding multiple files @logand22 is right, however templates are not supported in the version of Drone we're using. See https://github.com/harness/drone/blob/master/HISTORY.md#202. It's probably sufficient for this discussion to simply say that it is not feasible to split .drone.yml up within the scope of this PR @r0mant .

That being said, I think the increasing .drone.yml size is an issue I think this would be a great discussion to have outside the scope of this PR.

@russjones russjones linked an issue Sep 30, 2022 that may be closed by this pull request
@fheinecke
Copy link
Copy Markdown
Contributor Author

@logand22 to address your comments/questions above:

This wasn't necessarily what I had in mind for reducing the size of the PR [...]

I considered this when opening the PR however there are several reasons why I split the original PR this way:

  • Splitting as suggested would require significantly more work as the original PR is essentially a complete rewrite of how we build and publish container images. If I split each feature out separately (for example, consolidating Teleport dockerfiles), then I would also need to write a non-trivial amount of logic to make the feature "fit" the existing codebase. That logic would then be completely removed in future PRs. IMO this would add a lot of work for not much gain.
  • Splitting the PR by product multiplicatively reduces the amount of .drone.yml changes. These changes are the vast majority of the PR by line count.
  • If there is an issue not caught until a "production" run (such as a promotion event) it will only affect one product, not all. This ideally will greatly reduce the blast radius of arising issues.

Do you need to reference an updated teleport.e ref for teleport-ent images to work?

No. All logic for building teleport-ent images in the context of a drone pipeline is in dronegen. I will be opening up a separate PR that updates the teleport.e makefile to remove some old targets, and update existing image/related targets to incorporate the Dockerfile changes in this PR. In other words, the teleport.e PR will depend on this one but not the other way around.

Can you include examples of promote event running as well as push events?

The linked test run (https://drone.platform.teleport.sh/gravitational/teleport/15845) was triggered on a push event to this branch. Is that what you're looking for?

Regarding testing a promote event I believe I can test this by removing the "Check if tag is prerelease" step, then promoting a dev tag. Is that sufficient? 99.99% of the logic from the test run should be identical here however testing on a promote trigger may be useful to catch drone vars that are not set the same when a different trigger is ran.

Why is the promote event building images? Shouldn't it just be pulling and retagging existing images?

This was primarily done in an attempt to unify how images are built. Here's what currently happens when a release is about to be published (as I understand it):

  1. A tag event triggers the build-docker-images pipeline, which builds container images and pushes them to staging
  2. A promote event triggers promote-docker-ecr and promote-docker-quay which pull from ECR staging and push to public ECR and Quay.
  3. A cron event, occuring within 24 hours of a promotion, triggers teleport-docker-cron-ecr and teleport-docker-cron. These pipelines rebuild the staging ECR, public ECR, and Quay images from scratch and push to all three registries. This push replaces the tags created in (2).

As a result, the staging images built in (1) and pulled/tagged/pushed in (2) are only available for 24 hours at most. In practice I think this is closer to less than 12 hours. By having the promote triggered pipeline also build images we unify how images are built between pipelines with different triggers.

That being said, I did completely forget to implement build-docker-images for pushing to staging on tag events for testing purposes. I'll get that added.

Your updates include changes to the cronjob that automatically keep the latest 10.x.x 10.x and 10 version up-to-date daily vs. just 10 (Same for previous versions). Is this intended?

Yes this was 100% intended. Doing so provides several advantages:

  • Our customers can "sticky" a major, minor, or canonical version across patches if they so choose
  • We have a consistent image tagging scheme across products (right now teleport, teleport-lab, and teleport-operator tags don't follow the same scheme)
  • There is a bug with our current pipelines where only certain tags receive base images security updates which this fixes

Does this address your comments/concerned? I'm not dead set on how the PR currently changes things - just trying to explain why I made the above decisions.

@fheinecke
Copy link
Copy Markdown
Contributor Author

Test runs prior to merging PR:

Everything is green and working. I need one more commit to disable my testing pipeline then I'll be merging.

@fheinecke fheinecke enabled auto-merge (squash) October 19, 2022 01:48
@fheinecke fheinecke merged commit 633b958 into master Oct 19, 2022
@github-actions
Copy link
Copy Markdown
Contributor

@fheinecke See the table below for backport results.

Branch Result
branch/v10 Failed
branch/v11 Failed
branch/v9 Failed

fheinecke added a commit that referenced this pull request Nov 8, 2022
* Added multiarch build support for teleport-operator (#16688)

* Added multiarch build support for Teleport (#17597)

* Added product min version
fheinecke added a commit that referenced this pull request Nov 14, 2022
* Added multiarch build support for teleport-operator (#16688)

* Added multiarch build support for Teleport (#17597)

* Fix cron jobs updating teleport operator on v9

* Linter fix

* Disabled promote prerelease check

* Linter fix

* Resigned drone.yml after merge

* Removed accidentally added pipline due to branch merge
fheinecke added a commit that referenced this pull request Nov 14, 2022
* Added multiarch build support for teleport-operator (#16688)

* Added multiarch build support for Teleport (#17597)

* Added product min version

* Linter fix

* Disabled promote prerelease check

* Linter fix

* Resigned .drone.yml after merge
@zmb3 zmb3 deleted the fred/multiarch-teleport-container-images branch November 15, 2022 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Request: ARM container builds

3 participants