
Introduce Automatic platform ARGs #1935

Merged

Conversation

@tenzen-y (Member) commented Aug 20, 2022

What this PR does / why we need it:
To resolve the complex build conditions for each CPU architecture, I refactored all Dockerfiles to use Automatic platform ARGs.

Also, I switched the builder to docker buildx, since automatic platform ARGs are only available with the BuildKit backend, as noted in the following documentation:

> This feature is only available when using the BuildKit backend.
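For readers unfamiliar with the feature, the pattern looks roughly like the sketch below. BuildKit pre-populates ARGs such as BUILDPLATFORM, TARGETOS, and TARGETARCH from the value passed to --platform, so a single Dockerfile can cross-compile for every requested architecture instead of branching on hand-maintained build arguments. This is a minimal illustration of the technique, not the exact diff from this PR; the base images, paths, and binary name are placeholders.

```dockerfile
# Run the build stage on the host's native platform and cross-compile,
# instead of emulating the target platform for the whole build.
FROM --platform=$BUILDPLATFORM golang:1.19 AS builder

# BuildKit fills these in automatically from --platform
# (e.g. linux/arm64 -> TARGETOS=linux, TARGETARCH=arm64).
ARG TARGETOS
ARG TARGETARCH

WORKDIR /workspace
COPY . .

# Cross-compile the Go binary for the requested target platform.
RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} \
    go build -a -o katib-controller ./cmd/katib-controller/v1beta1

FROM gcr.io/distroless/static:nonroot
COPY --from=builder /workspace/katib-controller /app/
ENTRYPOINT ["/app/katib-controller"]
```

Because the heavy build stage always runs on the build host's own architecture, only the final copied binary differs per platform, which avoids slow emulation of the compiler itself.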

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Part-of: #1900

Checklist:

  • Docs included if any changes are user facing

@coveralls commented Aug 20, 2022

Coverage Status

Coverage decreased (-0.5%) to 73.456% when pulling bd4587c on tenzen-y:intrduce-automatic-platform-args into 1927e73 on kubeflow:master.

@tenzen-y force-pushed the intrduce-automatic-platform-args branch 3 times, most recently from 924beef to c3d0edf on August 20, 2022 10:35
@tenzen-y changed the title Introduce Automatic platform ARGs → [WIP] Introduce Automatic platform ARGs on Aug 20, 2022
@tenzen-y force-pushed the intrduce-automatic-platform-args branch from c3d0edf to c9c5ada on August 20, 2022 11:30
@tenzen-y force-pushed the intrduce-automatic-platform-args branch from c9c5ada to 5a541f6 on August 20, 2022 11:47
@tenzen-y changed the title [WIP] Introduce Automatic platform ARGs → Introduce Automatic platform ARGs on Aug 20, 2022
@tenzen-y force-pushed the intrduce-automatic-platform-args branch from 471a8a5 to bd4587c on August 20, 2022 15:44
@gaocegege (Member) left a comment

LGTM

@google-oss-prow (bot) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gaocegege, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@anencore94 (Member) left a comment

/lgtm

Looks nice :) BTW, it's hard to test building all images for each CPU architecture now. Do you think we need to test the docker build on a GitHub Actions runner with arm architecture?

@google-oss-prow bot added the lgtm label on Aug 21, 2022
@google-oss-prow bot merged commit fe4d6e7 into kubeflow:master on Aug 21, 2022
@tenzen-y (Member, Author) commented Aug 21, 2022

> /lgtm
>
> Looks nice :) BTW, it's hard to test building all images for each CPU architecture now. Do you think we need to test the docker build on a GitHub Actions runner with arm architecture?

Thanks for your comments, @anencore94.
The goal of #1900 is to be able to build a multi-architecture image (supporting arm64 and amd64 architectures under a single image tag) on gh-actions and on a local machine.
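Concretely, with the buildx-based setup a single command can produce and push one tag whose manifest list covers both architectures. A hedged sketch follows; the registry path and tag are placeholders, not the project's actual image names:

```bash
# Build for both platforms and push them under a single manifest-list tag.
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag example.io/kubeflow/katib-controller:latest \
  --push .

# Inspect the pushed tag to confirm it contains both architectures.
docker buildx imagetools inspect example.io/kubeflow/katib-controller:latest
```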

@tenzen-y deleted the intrduce-automatic-platform-args branch on August 21, 2022 03:14
@anencore94 (Member) commented
> Thanks for your comments, @anencore94. The goal of #1900 is to be able to build a multi-architecture image (supporting arm64 and amd64 architectures under a single image tag) on gh-actions and on a local machine.

I agree with your comment :). Though, what I meant was that I think we may need to check that the arm image is built well.
For example, by constructing a k8s cluster on an arm action runner, setting up Katib with all the arm Katib images under the same tag (the feature this PR adds), and seeing whether all the e2e tests pass. @tenzen-y

What do you think?

@tenzen-y (Member, Author) commented Aug 21, 2022

> I agree with your comment :). Though, what I meant was that I think we may need to check that the arm image is built well. For example, by constructing a k8s cluster on an arm action runner, setting up Katib with all the arm Katib images under the same tag (the feature this PR adds), and seeing whether all the e2e tests pass. @tenzen-y
>
> What do you think?

Thanks for pointing that out, @anencore94.
It would be great to verify the Katib components in an arm64 environment with our e2e tests.
However, it seems that the minikube qemu2 driver does not support emulating arm64 when the host machine's CPU architecture is amd64.
So we need to consider other options, such as:

  1. Run our e2e tests on Amazon EKS with AWS Graviton2 EC2 instances.
  2. Run our e2e tests on GitHub Actions with a self-hosted runner backed by an AWS Graviton2 EC2 instance.
  3. And more...

In any case, we would need to get in touch with the AWS team.
Do you have any other ideas?

@anencore94 (Member) commented
> Thanks for pointing that out, @anencore94. It would be great to verify the Katib components in an arm64 environment with our e2e tests. However, it seems that the minikube qemu2 driver does not support emulating arm64 when the host machine's CPU architecture is amd64. So we need to consider other options, such as:
>
>   1. Run our e2e tests on Amazon EKS with AWS Graviton2 EC2 instances.
>   2. Run our e2e tests on GitHub Actions with a self-hosted runner backed by an AWS Graviton2 EC2 instance.
>   3. And more...
>
> In any case, we would need to get in touch with the AWS team. Do you have any other ideas?

Thanks for your detailed comments 👍 @tenzen-y
As for the minikube qemu2 driver not supporting arm64 emulation when the host machine's CPU is amd64: I agree, and that's why I think we need to set up an arm64 GitHub Actions runner (the second of your suggestions).
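As a stopgap that does not require arm hardware, QEMU user-mode emulation can at least smoke-test the arm64 variant of a multi-arch image on an amd64 host, though it is far too slow for full e2e runs; a hedged sketch, with the image name a placeholder:

```bash
# Register QEMU emulators so arm64 containers can run (slowly) on an amd64 host.
docker run --privileged --rm tonistiigi/binfmt --install arm64

# Smoke-test the arm64 variant of a multi-arch image under emulation.
docker run --rm --platform linux/arm64 example.io/kubeflow/katib-controller:latest --help
```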

@tenzen-y (Member, Author) commented
> that's why I think we need to set up an arm64 GitHub Actions runner (the second of your suggestions).

I think so too.

What do you think? @kubeflow/wg-automl-leads

@johnugeorge (Member) commented
My only concern is that our CI tests will take much longer to test both platforms.

@tenzen-y (Member, Author) commented
> My only concern is that our CI tests will take much longer to test both platforms.

It makes sense, @johnugeorge. Deploying multiple VMs might be helpful for us.

Also, we would need to build a CI system that deploys an ephemeral, clean VM for each job, as described in the following document:

> GitHub-hosted runners execute code within ephemeral and clean isolated virtual machines, meaning there is no way to persistently compromise this environment, or otherwise gain access to more information than was placed in this environment during the bootstrap process.
>
> Self-hosted runners for GitHub do not have guarantees around running in ephemeral clean virtual machines, and can be persistently compromised by untrusted code in a workflow.
>
> As a result, self-hosted runners should almost never be used for public repositories on GitHub, because any user can open pull requests against the repository and compromise the environment. Similarly, be cautious when using self-hosted runners on private or internal repositories, as anyone who can fork the repository and open a pull request (generally those with read access to the repository) are able to compromise the self-hosted runner environment, including gaining access to secrets and the GITHUB_TOKEN which, depending on its settings, can grant write access to the repository. Although workflows can control access to environment secrets by using environments and required reviews, these workflows are not run in an isolated environment and are still susceptible to the same risks when run on a self-hosted runner.

https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#hardening-for-self-hosted-runners

@johnugeorge (Member) commented Aug 22, 2022

@tenzen-y Since we are planning to use AWS credits, we can create a new issue to track the various possible solutions, along with the CI changes required for each. This will help us estimate the effort for each solution.

@tenzen-y (Member, Author) commented
> @tenzen-y Since we are planning to use AWS credits, we can create a new issue to track the various possible solutions, along with the CI changes required for each. This will help us estimate the effort for each solution.

Sounds good. @johnugeorge
I will create an issue to keep tracking this.

@anencore94 (Member) commented

> @tenzen-y Since we are planning to use AWS credits, we can create a new issue to track the various possible solutions, along with the CI changes required for each. This will help us estimate the effort for each solution.
>
> Sounds good. @johnugeorge I will create an issue to keep tracking this.

Thanks @johnugeorge, @tenzen-y. That would be nice.
