Skip to content

Build multi-arch images if PLATFORMS env var is set#143

Merged
hashhar merged 11 commits intotrinodb:masterfrom
nineinchnick:multi-arch
Oct 27, 2022
Merged

Build multi-arch images if PLATFORMS env var is set#143
hashhar merged 11 commits intotrinodb:masterfrom
nineinchnick:multi-arch

Conversation

@nineinchnick
Copy link
Copy Markdown
Member

This is an alternative to #138

This PR wrap calls to docker build, docker tag, and docker push with scripts that recognize the PLATFORMS env var. If it's set, a separate image for every architecture is used. After pushing all images, a manifest is created.

@cla-bot cla-bot bot added the cla-signed label Aug 17, 2022
@nineinchnick nineinchnick force-pushed the multi-arch branch 3 times, most recently from 6a17490 to b4fff74 Compare August 18, 2022 12:03
@nineinchnick nineinchnick force-pushed the multi-arch branch 3 times, most recently from b7675e0 to 6ba28ce Compare August 18, 2022 12:40
@nineinchnick
Copy link
Copy Markdown
Member Author

The only images that we can't build for arm64 are CDH, HDP, and Greenplum. I guess Greenplum is not critical, but we should look for replacements for CDH and HDP.

@hashhar
Copy link
Copy Markdown
Member

hashhar commented Aug 18, 2022

@nineinchnick What do you mean by replacements for CDH and HDP? Also note that it'll always be the case when some software either can't be compiled for ARM or has different behaviour on ARM.

@nineinchnick
Copy link
Copy Markdown
Member Author

@nineinchnick What do you mean by replacements for CDH and HDP? Also note that it'll always be the case when some software either can't be compiled for ARM or has different behaviour on ARM.

I'm not sure TBH. My next step is to go over all product tests in Trino, see which ones work and what's the coverage of those that don't. I'm open to suggestions and only wanted to start a discussion.

@hashhar
Copy link
Copy Markdown
Member

hashhar commented Aug 18, 2022

Sorry, I just meant to say that even when we have ARM images it should be limited to developer workflow only and CI shouldn't use those (unless the software is usually deployed on ARM instances).

@nineinchnick
Copy link
Copy Markdown
Member Author

Agreed, I have no plans on trying to run anything in the CI with emulation.

@hashhar
Copy link
Copy Markdown
Member

hashhar commented Aug 18, 2022

Approach seems sound to me and it's not very invasive.

It might help if you could point out in the commit message + PR description why buildx doesn't work for this use-case for the benefit of other reviewers.

I'll review the wrappers later in the week.

@nineinchnick
Copy link
Copy Markdown
Member Author

@hashhar I updated the commit message. I also added another commit with the necessary changes in the release workflow. I don't like how it needs to keep track of which images are multi-arch and which are not, but the alternative is to keep them in separate directories, and this is worse, because:

  • it'll mess up targets in the Makefile, which expects a simpler convention
  • is too explicit for users, when building these images locally users probably shouldn't care about multi-arch, they should only build for their current arch

I think it's ok as is, as manual releases are discouraged anyway. I don't want to mess with the Makefile any more than necessary.

@nineinchnick
Copy link
Copy Markdown
Member Author

@hashhar @ebyhr @wendigo PTAL

@hashhar
Copy link
Copy Markdown
Member

hashhar commented Aug 30, 2022

We were discussing about how to better support ARM/M1 and some issues were pointed out. Since we won't actually test using the ARM images on CI we can never guarantee that the ARM images work correctly. e.g. it's possible that someone uses ARM image to test and then sees a test failure but the actual cause of failure is in the software being run on ARM instead of Trino itself.

Another issue was that not everything can be built for ARM anyway.

I'm unclear how we want to proceed here - cc: @wendigo @lukasz-walkiewicz

@wendigo
Copy link
Copy Markdown
Contributor

wendigo commented Aug 30, 2022

I'd prefer to work out remote M1 to X86 testing rather than building and maintaining a large set of images for ARM64. I guess that having a basic multinode is fine to have, and we could even execute some tests on it in the CI, but maintaining all the images is too much.

@martint
Copy link
Copy Markdown
Member

martint commented Aug 30, 2022

I'd prefer to work out remote M1 to X86 testing rather than building and maintaining a large set of images for ARM64

ARM64 is going to become, increasingly, the main development platform for most Trino developers, so it needs to be possible to run those tests locally.

@wendigo
Copy link
Copy Markdown
Contributor

wendigo commented Aug 30, 2022

@martint not every system we integrate with will run on arm. How can we workaround that?

@mdesmet
Copy link
Copy Markdown

mdesmet commented Sep 2, 2022

I think we should split this in two:

  1. The low hanging fruit is to support development in core, client and plugins modules. Ideally these should run locally on both ARM and AMD. I think with this PR we can already drastically improve the devex for ARM development. Note that not only these images are causing slowness or instability, but also plenty other docker images (eg Bump version of Hydra to 1.11.9 (multiarch) trino#13634)
  2. Testing on (exotic) systems can be done in CI or can be looked at after fixing the first priority.

@nineinchnick
Copy link
Copy Markdown
Member Author

@martint @hashhar @wendigo @mdesmet I'm sorry but I'm not sure if there's any action for me to do in this PR.

Would it be sufficient to test images with emulation? Currently, only the kerberos image would be tested on all architectures. The centos7-oj17 image doesn't have any tests. I can modify tests to at least see if the image starts.

Until GitHub adds public runners for arm64, emulation is the only option. There's an option of using free arm64 instances from other cloud providers and configuring self-hosted runners in GitHub Actions, but using self-hosted runners for public repositories is highly discouraged by GitHub for security reasons.

@hashhar
Copy link
Copy Markdown
Member

hashhar commented Sep 4, 2022

I'm ok with adding images provided we are all aware that images may or may not work as expected. i.e. tests may fail locally which run fine on CI.

@nineinchnick
Copy link
Copy Markdown
Member Author

nineinchnick commented Sep 4, 2022

tests may fail locally which run fine on CI.

Why would that happen? Or did you mean trino tests? But it's clear which images are provided for which architecture, so I'm not sure why there would be unexpected behavior.

@nineinchnick
Copy link
Copy Markdown
Member Author

@hashhar rebased and ready for another round.

@nineinchnick
Copy link
Copy Markdown
Member Author

@hashhar ping

1 similar comment
@nineinchnick
Copy link
Copy Markdown
Member Author

@hashhar ping

@nineinchnick
Copy link
Copy Markdown
Member Author

@hashhar ping, could you suggest other maintainers that could review it?

make release
exit 0
fi
single_arch=(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now is there some way to make sure that all images get released. Looks like now anything no in either list gets skipped silently.

Is it not reasonable to assume anything not in multi_arch list is single-arch?
Also seems like we now have moved responsibility of images to release and build for in two places (leading to this question).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to figure out how to get a list of all available images and add some validation here to throw an error if there are any not mentioned in any of these two lists. I wouldn't make any assumptions about them being single or multi-arch.

Also see this old comment: #143 (comment)

Copy link
Copy Markdown
Member

@hashhar hashhar Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One idea I had was to place a marker file in each directory for the multi-arch supported ones and both ci.yml and release.yml can use that to decide what needs to be done. (Very hacky and I feel dirty for even suggesting it but I don't see better ways yet to be honest).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! While testing this, I noticed we were missing the recently added centos7-oj11 and spark3-hudi, so it was a great idea, thanks!

)
images=("${single_arch[@]}" "${multi_arch[@]}")
make require-clean-repo require-on-master require-release-version
make prepare-release
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

" Extract prepare-release target from release " - should this come before others (or maybe even together with the change which allows releasing multi-arch images)? Also some context is useful.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'd put them into a single commit, you'd ask me to extract it :-) I moved it so it's before the commit that adds multi-arch releases, but now it adds a target that's unused until the next commit.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you suggest something for more context?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adds a target that's unused until the next commit.

Sorry I has misordered commits while reviewing. It's all good here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context I'd probably say:

With multi-arch releases images for all platforms must be built before they can be published - which requires decoupling prepare-release from release goal.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I has misordered commits while reviewing. It's all good here.

But I moved it, so are you sure?

@nineinchnick nineinchnick force-pushed the multi-arch branch 4 times, most recently from 3b89a50 to bb66ea9 Compare October 26, 2022 19:12
Jan Waś and others added 10 commits October 26, 2022 23:49
Calling `docker build` is wrapped in a shell script, that recognizes
when PLATFORMS is set and runs `docker build` for every requested
platform. This is needed because `--load` currently doesn't work with
multiple arguments in `--platform`.
With multi-arch releases images for all platforms must be built before
they can be published - which requires decoupling prepare-release from
release goal.
@hashhar hashhar merged commit 52c43a1 into trinodb:master Oct 27, 2022
@hashhar
Copy link
Copy Markdown
Member

hashhar commented Oct 27, 2022

I'll release docker-images version 70 so that we can start to test the upgrade in Trino.

@nineinchnick nineinchnick deleted the multi-arch branch October 27, 2022 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

6 participants