Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64 docker images #1553

Merged

Conversation

bdovaz
Copy link
Collaborator

@bdovaz bdovaz commented Jun 23, 2022

Edit: 05/01/2022

I have been working a bit on this PR and thinking....

I understand that sooner or later it will get to the point where all linters are amd64/arm64 compatible either because those linters start to be arm64 compatible or because development of them has been abandoned and they end up being removed from Megalinter.

But as of today as we know, this is not the case and we don't know how long it will take.....

Support of runners on arm64

Windows: actions/runner-images#6175

Linux: actions/runner-images#5631

macOS: github/roadmap#528

As you can see, the "closest" is macOS for Q3 of 2023. For the rest there is no ETA.

Change from docker build to docker buildx

A first step in cross-platform image generation.

Already merged in #2199 and working for amd64.

Supported platform descriptors configuration

With the help of @echoix is added on this PR and is that for each descriptor now allows to configure the supported platforms:

supported_platforms:
  platform:
    - linux/amd64
    - linux/arm64

In some linters it is set to more platforms besides amd64 and arm64 because linter itself supports it but the reality is that initially at least we should focus on arm64 only.

This opens the door to:

  1. Being able to document which descriptors work on which platforms.
  2. Create a Dockerfile (or several) based on those values.

Point 2 leads us to...

Decide how to generate the Dockerfile(s).

There are 2 options that at least I know of:

  • Enter conditions in the RUN command.
    • To do this we would have to do something like this: https://stackoverflow.com/a/60518444/699056. It means that whoever makes a change in a descriptor that supports the 2 platforms has to take it into account when making modifications on the dockerfile section.
    • There is the problem that there are dependencies that are pip,apk,.... In those it is not possible to introduce a logic as with the RUN method and it would have to be solved in another way (I guess in the build.py script?).
  • Generate a Dockerfile (amd64) and another one Dockerfile.aarch64 (arm64)
    • The build_dockerfile function of the build.py script should be refactored.
    • It loses sense the PR of docker buildx that is already merge since being different Dockerfile files, the build would be done in separate commands.

Additional blocks

Update to .NET 6 / 7 #1680

Additional blocks (external)

There may be more...

Testing

I have executed it in the following way (from repo root with Raspberry Pi 4 with Raspberry Pi OS 64 bits):

docker buildx build --platform linux/arm64 -f flavors/ci_light/Dockerfile .
docker buildx build --platform linux/arm64 -f flavors/cupcake/Dockerfile .
docker buildx build --platform linux/arm64 -f flavors/documentation/Dockerfile .
docker buildx build --platform linux/arm64 -f flavors/dotnet/Dockerfile .
docker buildx build --platform linux/arm64 -f flavors/go/Dockerfile .
docker buildx build --platform linux/arm64 -f flavors/java/Dockerfile .
docker buildx build --platform linux/arm64 -f flavors/javascript/Dockerfile .
docker buildx build --platform linux/arm64 -f flavors/php/Dockerfile .
docker buildx build --platform linux/arm64 -f flavors/python/Dockerfile .
docker buildx build --platform linux/arm64 -f flavors/ruby/Dockerfile .
docker buildx build --platform linux/arm64 -f flavors/rust/Dockerfile .
docker buildx build --platform linux/arm64 -f flavors/salesforce/Dockerfile .
docker buildx build --platform linux/arm64 -f flavors/security/Dockerfile .
docker buildx build --platform linux/arm64 -f flavors/swift/Dockerfile .
docker buildx build --platform linux/arm64 -f flavors/terraform/Dockerfile .

@bdovaz bdovaz changed the title Feature/arm docker image ARM64 docker images Jun 23, 2022
@echoix
Copy link
Collaborator

echoix commented Jun 23, 2022

Thanks for the first draft!
What I understood from working with this project the last time is that the Dockerfiles are generated by script, so if there are changes to make in the Dockerfiles, we must change the .megalinter-descriptor.yml like megalinter/descriptors/kubernetes.megalinter-descriptor.yml. Then run the bash build.sh script and see if the Dockerfiles produced match what is expected.

Since you have found the changes needed, a part of the job is done!

With my secondary testing a couple months ago, that didn't finish nor end up with a pull request (after I passed around my raspberry pi 4 wasn't running in a 64 bit OS), I learned that it was possible with the buildx to cross build on my windows amd64 computer, which was quite faster with more cores and way more memory to build simultaneously the aarch64 and amd64. I didn't manage to compile up to the end though, and hadn't had a chance to look at the next problems (solving a problem one layer at the time).

The problem that I didn't have a good idea to fix, that remains, is how to change the .megalinter-descriptor.yml schema to account for the multitude of different conventions of getting and including tools for many architectures. Some might need to be in another format maybe (installed in amd64 vs Docker in aarch64). Sometimes it was a different url scheme, and not everyone names the arm 64 bit architecture the same way. Maybe one of the new active collaborators have some ideas architecture-wise?

@nvuillam
Copy link
Member

Thanks for this POC !

That looks nice, but as Dockerfiles are not written but generated (from build.py) , such enhancement must be done with an evolution of the architecture.

I think that having a Dockerfile_arm at the root and Dockerfile_arm files in flavors could be a good way to implement that.
And to generate such ARM-specific dockerfiles, I think that adding a new property install_arm in YML descriptor would be the way, and when generating dockerfile:

  • If in the linter_descriptor there is an install_arm property, use it
  • else use install property

@bdovaz
Copy link
Collaborator Author

bdovaz commented Jun 24, 2022

@nvuillam I've been analyzing it and from what I understand we have to:

  • megalinter/descriptors/*: add the "install_arm" field where it differs from "normal" and add whatever changes.
  • megalinter/descriptors/schemas/megalinter-descriptor.jsonschema.json: add the "install_arm" field.
  • .automation/build.py: somehow we have to make that script generate the Dockerfile and Dockerfile_arm files (root and flavors) in the same execution no? And in the "normal" case ignore the "install_arm" field but in the case of ARM we will have to search if the "install_arm" field exists and if not use the default "install" field.

@echoix
Copy link
Collaborator

echoix commented Jun 24, 2022

For

  • megalinter/descriptors/*: add the "install_arm" field where it differs from "normal" and add whatever changes.
    and
  • And in the "normal" case ignore the "install_arm" field but in the case of ARM we will have to search if the "install_arm" field exists and if not use the default "install" field.

I think we need to account that not every tool can or could be supported on first try, so if we could have a way to "check" (like in a checkbox) which tools to include in this arm flavor. This way, a compatibility table could be created showing which tools are ported.

@nvuillam
Copy link
Member

@echoix we could add a property arm-compliant on linter descriptors :)

@echoix
Copy link
Collaborator

echoix commented Jun 24, 2022

@nvuillam Shouldn't we be not short-sighted and inverse the logic a bit, to say we comply with these platforms, instead of considering a dichotomy amd64/aarch64(arm64)? Especially in docker architectures, I'm pretty sure some tools (flavors) like python-only could use armv7l, which is still the default Raspbian OS (most Raspberry Pis can run in 64 bit mode, but when there was no >4Gb option, there wasn't any advantage to use 64bit).

@echoix
Copy link
Collaborator

echoix commented Jun 24, 2022

@bdovaz I could be there for help with this PR if you need :)

- Temporarily disable validators
@bdovaz
Copy link
Collaborator Author

bdovaz commented Jun 24, 2022

@echoix I'll appreciate that.

Specifically:

  • Adjust the descriptor schema to add the 2 fields because I have no experience creating schemas:
    • arm_compliant: I would make it optional and set it to true by default because there is only one descriptor that doesn't work (dot-env).
    • install_arm: Optional also with a fallback to install if it doesn't exist.
  • Make the settings of the "build.py" file because I have tried but I don't understand well how the Dockerfiles generation system works and which part is automatic and which is manual.
    • I have disabled validation temporarily until the json schema changes are done.
    • I have tried to create the Dockerfile_arm file of each flavor "in theory" so that it is generated as the original but I have not managed to generate more than the header.

I also see that the main Dockerfile does not use descriptors no? Because just running the original build.py mashes the changes I made and changes are lost.

@echoix
Copy link
Collaborator

echoix commented Jun 24, 2022

For sure @bdovaz, I'm looking into that this afternoon :)

@Kurt-von-Laven
Copy link
Collaborator

I agree with @echoix that it may be better long-term to have a supported_architectures (or arches) list. I wonder how many of the linters we use have an asdf plugin with multi-architecture support. I know many of the tools we use have an asdf plugin (or, more often, a package manager with an asdf plugin).

@echoix
Copy link
Collaborator

echoix commented Jun 24, 2022

Could I have a little recap on asdf, or how it is used or planned to be used in Megalinter? I receive most of the issues of this repo by email, but I still don't understand it in relation to this project.

I'm glad you think the same way as I thought, as I have an idea of how Megalinter could make use of buildx and multi-arch build efficiently if we know how which platform each tool supports.

@Kurt-von-Laven
Copy link
Collaborator

buildx does seem like a good way to go with the caveat that I always optimistically hope there is some approach that involves less work ha ha ha. Is this the same multiarch you are referring to? #870 contains some of my thoughts on asdf's relevance to MegaLinter. asdf is capable of installing all of these tools. It is basically a version manager manager, but it can manage versions of most anything. asdf has expressed interest in supporting multiple architectures in the core if necessary (c.f., asdf-vm/asdf#62), but I don't know which of the plugins presently do. One of the challenges MegaLinter inherently faces is managing many dependencies from many different package ecosystems plus some with bespoke installation methods. It will simplify maintenance if we can eventually install everything that has an asdf plugin (or, better yet, a package manager that has an asdf plugin) through asdf, but the first step would be to simply install one package manager this way. I suppose one way of marrying the ideas would be to use your buildx approach to improve or create asdf plugins since then MegaLinter could leverage that uniformity to install everything with as much multi-architecture support as the plugin (and underlying tool) allows. This might also be a good way to get more hands on deck since the asdf community is presently much larger than MegaLinter's, and I assume many people who don't use MegaLinter share your woes getting things working on various architectures. If there is some tool other than asdf that is already better suited to your goal, I imagine someone there will educate us, particularly since it brings together a diverse cross-section of people using a huge variety of toolchains.

@echoix
Copy link
Collaborator

echoix commented Jun 25, 2022

Is this the same multiarch you are referring to?

Nope, the multi arch in Docker, where there an image's manifest contains different images for many cpu architectures. https://docs.docker.com/desktop/multi-arch/

@bdovaz
Copy link
Collaborator Author

bdovaz commented Jun 27, 2022

For sure @bdovaz, I'm looking into that this afternoon :)

Have you finally started working on it?

Likewise you would have to clearly define how you want to make the changes.

As far as I have understood it would be to put a "supported_architectures" to be able to distinguish the installation commands and that for the future it is better designed.

My doubt is at what level would it be? Below "install", above "install"? I also have doubts with the name "amd64" and "arm64", opinions?

Example 1:

linters:
  - linter_name: ansible-lint
    install:
      supported_architectures:
        amd64:
          pip:
            - ansible-lint==6.0.2
        arm64:
          pip:
            - ansible-lint==6.0.2

Example 2:

linters:
  - linter_name: ansible-lint
    supported_architectures:
      install:
          amd64:
            pip:
              - ansible-lint==6.0.2
          arm64:
            pip:
              - ansible-lint==6.0.2

@echoix
Copy link
Collaborator

echoix commented Jul 3, 2022

(Sorry, I started this message on Monday but never had the opportunity to finish it)

For the name of the architecture used, it's sure that we will be using the names used by Docker, it greatly simplifies writing multi-architecture Dockerfiles.
Ideally, the same Dockerfile can be used for multiple architectures. That way, buildx build --target can be used directly for all architectures (like many multi-arch image https://docs.docker.com/desktop/multi-arch/)

I drafted out scenarios (without committing anything yet), but couldn't find a way that would be a breaking change. The most common problem is when there is a URL that doesn't use the same architecture names.

One of the scenarios that wouldn't be as breaking, would be that we keep the install property with their pip, docker, apk, etc children fields, but add some architecture-specific overrides.
So in this case, supported_architectures, could have the architectures, and any overrides in the same format as install.
One simple override I think could be used to make architecture-independent Dockerfiles is having a mapping of the architecture name, for mapping the shellcheck architecture URL that uses aarch64. Otherwise, we could always use inline shell case/esac construct, in the original RUN command.

A scenario that would be breaking, is explicitly having all instructions for each platform, but doesn't help reuse layer caching.

In the same line of thought, common multi-arch Dockerfile patterns use multi-stage Dockerfiles to be able to execute some build in the builder with one architecture, and for the final layer(s), have them in the target instruction set, so this speeds up building.

@bdovaz
Copy link
Collaborator Author

bdovaz commented Jan 16, 2023

@bdovaz it seems there is a workaround -> docker/buildx#59 (comment)

@echoix can you help me to fix it?

There are several lines here where we should make that change but I don't have much knowledge of bash and I don't know how to change it so that it doesn't affect the following:

In upload-docker.sh:

https://github.com/bdovaz/megalinter/blob/feature/arm-docker-image/.automation/upload-docker.sh#L319

image

In several workflows like this:

https://github.com/bdovaz/megalinter/blob/feature/arm-docker-image/.github/workflows/auto-update-linters.yml#L61

image

@bdovaz
Copy link
Collaborator Author

bdovaz commented Jan 16, 2023

@echoix @nvuillam and about the merge issue, if one of the 2 of you is able to do it, the better because my knowledge of git operations/commands is not very advanced either.

@echoix
Copy link
Collaborator

echoix commented Jan 16, 2023

There are several lines here where we should make that change but I don't have much knowledge of bash and I don't know how to change it so that it doesn't affect the following:

In upload-docker.sh:

https://github.com/bdovaz/megalinter/blob/feature/arm-docker-image/.automation/upload-docker.sh#L319

image

The 2>&1 is a output redirection that sends the output to somewhere. Program output can be stdout and stderr. It's possible to redirect and output to a file. 2 is a file descriptor for stderr. 1 is a file descriptor for stdout. It's also possible to redirect an output to another stream, instead of a file. And that's what is here. Stderr is redirected to stdout. https://stackoverflow.com/a/818265

I was thinking about this, one day we could let go of bash specific syntax. Is there still a use case where the building of the images of megalinter isn't done on GitHub actions? If we aren't able to understand it well, well, we aren't maintaining it.

Maybe a quick search to see if the github action logs already include both outputs.
However, the important shell syntax is what follows, https://github.com/bdovaz/megalinter/blob/ea67c7777ed2e0e4e7f8642b0c6d29e0c9936017/.automation/upload-docker.sh#L324

The $? is a special variable that is equal to the exit status of the last executed command. I don't really like relying on bashisms. If someone doesn't know about it, it's really hard to follow and debug.
https://stackoverflow.com/a/7248048

In several workflows like this:

https://github.com/bdovaz/megalinter/blob/feature/arm-docker-image/.github/workflows/auto-update-linters.yml#L61

image

Here I'm not sure what construct you want to know, is it the pipe character (|)?

Here, that line calls the command normally, and at the end, the pipe sends the stream to something else as an argument, (the read line thing), which is a loop that takes each line outputted ($line being an argument), and instead runs echo, to print out a message to the console, and that message is the current date and time with that format, a pipe character (it's inside a string, it's not the operator), and adds the contents of the variable $line. In short, it prepends a time stamp to each line output. Since GitHub action logs can show the time stamps (maybe it wasn't possible before), the rest of the line after load . can either be kept, or simply removed without impact.

@bdovaz
Copy link
Collaborator Author

bdovaz commented Jan 16, 2023

@echoix are you clear about the change to make without breaking the commands below in the 2 cases I have put? In case you want to make help me with a PR in my fork or if you can tell me what to change it to exactly.

Thanks!

From this change could already be done the merge issue that you were talking about in which as I said, I prefer that you give me a hand because of my limited knowledge in git.

@echoix
Copy link
Collaborator

echoix commented Jan 16, 2023

For this line: https://github.com/bdovaz/megalinter/blob/ea67c7777ed2e0e4e7f8642b0c6d29e0c9936017/.github/workflows/auto-update-linters.yml#L61
It's non breaking to change to

        run: docker buildx build --platform linux/amd64,linux/arm64 --build-arg "BUILD_DATE=${BUILD_DATE}" --build-arg "BUILD_REVISION=auto_update_${GITHUB_SHA}" --build-arg "BUILD_VERSION=auto_update_${GITHUB_SHA}" --no-cache -t oxsecurity/megalinter:auto_update_${GITHUB_SHA} --load .

That is, removing | while read line ; do echo "$(date +'%H:%M:%S')| $line"; done; , as it is purely aesthetic.

The other one, I didn't really specify a change. It's answering what it is. I don't like it, but it might be deep for a first shot.

@echoix
Copy link
Collaborator

echoix commented Jan 16, 2023

@bdovaz

From this change could already be done the merge issue that you were talking about in which as I said, I prefer that you give me a hand because of my limited knowledge in git.

Oh I think I understand. You mean the suggestion to target a branch inside the oxsecurity/megalinter repo, and continue working on it there?

I'll try to take a look this evening, it's still a workday in my timezone and gotta go back ;)

@nvuillam
Copy link
Member

Something i thought about (maybe for a later PR... or this one ? ) is to get rid of our code building docker, as it seems that there is a github action from docker that does that very well using buildx, and the neighbours of Super-Linter are using it

https://github.com/github/super-linter/blob/bb170af75d406d562ae56771946053adeed41ca6/.github/workflows/cd.yml#L59

https://github.com/docker/build-push-action

@nvuillam
Copy link
Member

Another thing that they do well is that when releasing, they don't build again from scratch like we do, but they take the beta images and retag them

@echoix
Copy link
Collaborator

echoix commented Jan 16, 2023

Same thing for Login Docker and Setup QEMU. We can have multiple login docker for multiple registries, and the post action to logout with chain themselves. I tried it yesterday on another repo:
https://github.com/echoix/editorconfig-checker/blob/66b890c81c54c282068e5b1591049cb68eeb75ad/.github/workflows/release.yml#L29-L45

      - # Add support for more platforms with QEMU (optional)
        # https://github.com/docker/setup-qemu-action
        name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      - name: Docker Setup Buildx
        uses: docker/setup-buildx-action@v2
      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Login to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

image

@Kurt-von-Laven
Copy link
Collaborator

Maybe a quick search to see if the github action logs already include both outputs.

They do, yes, but maybe the intention of redirecting stderr to stdout was intended to guarantee stable output order (c.f., isaacs/github#1981) or something else?

I don't really like relying on bashisms. If someone doesn't know about it, it's really hard to follow and debug.

So true, and Bashisms are also unusually difficult to Google.

Something i thought about (maybe for a later PR... or this one ? ) is to get rid of our code building docker, as it seems that there is a github action from docker that does that very well using buildx, and the neighbours of Super-Linter are using it
https://github.com/docker/build-push-action

Good point! Yes, using build-push-action is the best practice I'm aware of here.

@echoix echoix changed the base branch from main to dev/multi-platform-images January 17, 2023 01:45
@echoix echoix marked this pull request as ready for review January 17, 2023 01:46
@echoix echoix requested a review from nvuillam as a code owner January 17, 2023 01:46
@echoix echoix merged commit e223a1a into oxsecurity:dev/multi-platform-images Jan 17, 2023
@echoix
Copy link
Collaborator

echoix commented Jan 17, 2023

@bdovaz I've created that dev branch at https://github.com/oxsecurity/megalinter/tree/dev/multi-platform-images. I let you create a new Draft pull request from that branch to continue discussions there, and to have CI there, and it won't be a fork.
We will announce in the PR the location of the new discussion once done.

@echoix echoix added enhancement New feature or request docker Pull requests that update Docker code github_actions Pull requests that update Github_actions code labels Jan 17, 2023
@bdovaz bdovaz deleted the feature/arm-docker-image branch January 17, 2023 19:38
@bdovaz bdovaz mentioned this pull request Jan 17, 2023
bdovaz pushed a commit that referenced this pull request Feb 26, 2023
bdovaz pushed a commit that referenced this pull request Mar 10, 2023
bdovaz pushed a commit that referenced this pull request Mar 26, 2023
bdovaz pushed a commit that referenced this pull request Mar 26, 2023
bdovaz pushed a commit that referenced this pull request Apr 13, 2023
bdovaz pushed a commit that referenced this pull request May 14, 2023
bdovaz pushed a commit that referenced this pull request May 14, 2023
bdovaz pushed a commit that referenced this pull request May 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docker Pull requests that update Docker code enhancement New feature or request github_actions Pull requests that update Github_actions code nostale This issue or pull request is not stale, keep it open
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants