[Draft] Cross-compilation MVP #36066
base: master
This commit introduces a new configuration file specific to the AMD64 architecture for the experimental ct-ng build. This config file includes specifications for various components like the kernel, C library, binary utilities, etc. It's an automatically generated file and should not be edited manually. It will be used during the crosstool-ng build process.
This commit updates the build script and configuration files to improve the cross-compilation process. The modifications ensure proper setup of the sysroot, device architecture support, and build toolchains. Additionally, the arrangement for building third-party libraries has been updated for efficiency.
Modifications have been made to the build-all.sh script for better support of the i686 architecture and to handle unknown architectures. The sysroot, CC, CXX, LD, and PKG_CONFIG_PATH variables have been updated for more efficient cross-compilation and build process. Moreover, the build directory naming is now architecture-specific and error handling has been added for unrecognized architectures.
Updated the cross-compilation configuration for the "arm" case in the build-all.sh script. Adjusted sysroot path, added path for armv7-centos7-linux-gnueabi binaries, specified new compilers, linker, and pkg-config path for this architecture. Also, made corresponding changes in Dockerfile-build and Makefile to accommodate the updated 'arm' configuration.
Added the 'arm' case to the build-all.sh script for cross compilation. Set appropriate paths for sysroot, armv7-centos7-linux-gnueabi binaries and defined specific compilers
Added a new Dockerfile to set up crosstool-ng for cross-compilation and updated the build-all.sh script to use this Docker container for building. The Dockerfile creates a non-root user and sets up the necessary environment. The build script has also been updated to use these new configurations.
Moved the build-all.sh script to the build.assets directory and modified its Docker build command. Also altered the binary utilities settings and static toolchain option in the amd64.config file under the ct-ng-configs directory. Changes enable better setup for cross-compilation using crosstool-ng in a Docker container.
This commit renames and revises the build-all script to build-arch, and further adapts it for multi-architecture Docker builds involving amd64, i686, arm64 and arm. Correspondingly, Dockerfile-ct-ng has been adjusted to handle these configurations. These changes aim to facilitate better cross-compilation using crosstool-ng within Docker containers.
In this commit, the build script is refactored and renamed from build-all to build-arch. It has been configured to handle multi-architecture Docker builds, including amd64, i686, arm64, and arm architectures. Furthermore, the Dockerfile-ct-ng is adjusted to handle these settings. The purpose of these updates is to improve cross-compilation using
Several modifications are made to the build-arch.sh script, Dockerfile-build and other related files. The adjustments include the use of 'clang-12' instead of 'clang', the removal of 'libtool', and improvements related to executing Docker builds for multiple architectures. These changes are targeted towards optimizing the cross-compilation process using crosstool-ng across Docker environments.
Changes have been made to build-arch.sh, Dockerfile-build and related files to enhance the cross-compilation process. This includes replacing 'clang' with 'clang-12', removing 'libtool' and improving Docker build execution for various architectures. Overall, these adjustments are aimed at optimizing the cross-compilation process using crosstool-ng in Docker environments.
This commit simplifies the Makefile by removing unnecessary conditions that were based on the architecture type. It mostly affects the CGOFLAG and BUILDFLAGS for 'arm' cases, streamlining the build process specifically for Linux OS and 'arm' architecture, and improving the cross-compilation process overall.
Thanks for putting together this MVP. It is something we've been dancing around for the past couple of years -- and it has broad applicability to a range of areas:
- Customer requests: Support for Teleport on Alpine Linux (libmusl) #35398
- Infra (allowing us to deprecate our GHA arm runners)
- Security (e.g. Bump actions/download-artifact from 3 to 4 #35845)
I'd be excited to see it land. That said, this (and the follow-up testing/changes to make it official) feels big enough to be a quarterly goal. Do we want to formally allocate some time (and maybe an RFD), or are we cool with this slipping in the side door? I don't want to impose undue process, but I'm also leery of something partially finished going in, and of you becoming a de facto tools team member for the next 3-6 months while we deal with follow-up. This is ultimately a question for eng leadership.
From the code/security side:
There are a handful of dependency installs where I'd recommend we vendor the install script and checksum the incoming artifacts -- instead of relying on `curl | bash` or `git clone <implicitly master> && make` as a fundamental part of our build process.
This doesn't need to be done at PoC time, but I'd like to see it in place before we distribute binaries generated via this technique to customers.
I'm thinking about attacks like https://blog.gitguardian.com/codecov-supply-chain-breach/ here.
The security team has some upcoming work that is related -- but we probably won't get to it until Q1 or Q2 of next year -- depending on how other priorities fall out.
https://github.com/gravitational/SecOps/issues/455
https://github.com/gravitational/SecOps/issues/458
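Here is a minimal sketch of the "vendor and checksum" pattern suggested above. It assumes GNU coreutils (`sha256sum`) and `curl` are available in the build image. The helper names `verify_sha256` and `fetch_verified` are hypothetical. The expected digest would be pinned in the repo next to the version it belongs to:

```shell
#!/usr/bin/env bash
# Sketch: refuse to use a downloaded artifact unless its SHA-256 matches a
# digest pinned in the repository. Helper names are hypothetical.

# verify_sha256 FILE EXPECTED_HEX
# Returns 0 if FILE's SHA-256 digest equals EXPECTED_HEX, non-zero otherwise.
verify_sha256() {
  local file="$1" expected="$2"
  # sha256sum --check reads "<digest>  <path>" lines; "-" means stdin.
  echo "${expected}  ${file}" | sha256sum --check --status -
}

# fetch_verified URL EXPECTED_HEX OUTPUT
# Downloads URL to a temp file and only moves it to OUTPUT once the checksum
# matches, so a tampered download never reaches the build.
fetch_verified() {
  local url="$1" expected="$2" out="$3" tmp
  tmp="$(mktemp)" || return 1
  if ! curl --proto '=https' --tlsv1.2 -fsSL "$url" -o "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  if ! verify_sha256 "$tmp" "$expected"; then
    echo "checksum mismatch for ${url}" >&2
    rm -f "$tmp"
    return 1
  fi
  mv "$tmp" "$out"
}
```

For example, `fetch_verified https://github.com/crosstool-ng/crosstool-ng/releases/download/crosstool-ng-1.26.0/crosstool-ng-1.26.0.tar.bz2 "<expected-sha256>" crosstool-ng-1.26.0.tar.bz2` could replace a bare `wget`. The digest is a placeholder here. It has to come from a trusted source, not from the same download.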
# Install Go
ARG GOLANG_VERSION
RUN mkdir -p /opt && cd /opt && curl -fsSL https://storage.googleapis.com/golang/${GOLANG_VERSION}.linux-amd64.tar.gz | tar xz && \
Version is pinned, but we could use a checksum here to validate the tarball is what we expect. This isn't super important as we have a higher level of trust for Google and golang distribution.
This is a PoC, and this code has a lot of bad practices. Versions are not pinned, the repositories are GH mirrors instead of the official ones, code is built from `master` instead of a pinned version, build options are not set, etc.
I wanted to show an alternative approach to what we are using today. I fully agree that the production version should have all those concerns addressed.
# Install Rust
ARG RUST_VERSION
ENV PATH=/home/ubuntu/.cargo/bin:$PATH
RUN curl --proto '=https' --tlsv1.2 -fsSL https://sh.rustup.rs | sh -s -- -y --profile minimal --default-toolchain $RUST_VERSION && \
I'd recommend we vendor the shell script and checksum the resulting binary that it fetches. This is probably the worst offender IMO.
I copied this from our "production" code 🙈
https://github.com/gravitational/teleport/blob/master/build.assets/Dockerfile-centos7#L252
USER ctng
WORKDIR /home/ctng
RUN wget https://github.com/crosstool-ng/crosstool-ng/releases/download/crosstool-ng-1.26.0/crosstool-ng-1.26.0.tar.bz2 && \
I'd recommend we verify the checksum of this tarball. This isn't as important because we have a pinned release (which can be overwritten, afaik) and can probably trust GitHub/Microsoft not to get hacked. We're more protecting against the crosstool-ng org/user being compromised.
#
#FROM ubuntu:22.04
#
## Create a non-root user with id 1000 - ubuntu
#RUN useradd -m -u 1000 ubuntu
#USER ubuntu
#WORKDIR /home/ubuntu
#
#COPY --from=ct-ng /home/ctng/x-tools /home/ubuntu/x-tools
This looks like a nice 2-phase build. I'm curious as to why you've commented it out?
Forgot to remove. The idea is simple:
- We build a simple container with `ct-ng`.
- We use that container to build the cross-compiler.
- We build container #2 with minimal dependencies and use it to build Teleport.
For some reason, when I tried to build the cross-compiler as part of `docker build`, I was getting a weird build failure that I was not able to reproduce anywhere else. This part included the cross-compiler in another stage, but because it was not working, I commented it out and switched to building the cross-compiler using `docker run` instead of `docker build`.
# Build and install
#zlib
git clone https://github.com/madler/zlib.git
Needs to be pinned to a commit SHA before building.
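Here is one way to pin these clones. It assumes the full 40-character commit SHA for each dependency is recorded somewhere in the repo. The `clone_pinned` helper is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: clone a dependency and check out an exact commit instead of the
# default branch tip. clone_pinned is a hypothetical helper name.

# clone_pinned REPO_URL FULL_COMMIT_SHA DEST_DIR
clone_pinned() {
  local repo="$1" sha="$2" dest="$3"
  git clone --quiet "$repo" "$dest" || return 1
  # Detach at the pinned commit; this fails if the SHA is not in the repo.
  git -C "$dest" checkout --quiet --detach "$sha" || return 1
  # Confirm HEAD is exactly the pinned full SHA (short SHAs are rejected).
  [ "$(git -C "$dest" rev-parse HEAD)" = "$sha" ]
}
```

For example, `clone_pinned https://github.com/madler/zlib.git <full-commit-sha> zlib`. The same helper would cover the zstd, libelf, libbpf, and linux-pam clones below.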
cd ..
#libzstd
git clone https://github.com/facebook/zstd.git
Needs to be pinned to a commit SHA before building.
cd ..
#libelf
git clone https://github.com/arachsys/libelf.git
Needs to be pinned to a commit SHA before building.
cd ..
#libbpf
git clone https://github.com/libbpf/libbpf.git
Needs to be pinned to a commit SHA before building.
cd ../..
#libtirpc
wget https://zenlayer.dl.sourceforge.net/project/libtirpc/libtirpc/1.3.4/libtirpc-1.3.4.tar.bz2
I'd checksum this tarball. I don't trust SourceForge as much as GitHub or Google.
cd ..
#libpam
git clone https://github.com/linux-pam/linux-pam.git
Should be pinned to a commit SHA before building.
I think this is a question for @r0mant and folks from the release team. I don't mind preparing an RFD, but we need to have a conversation first if this is the direction that we want to take.
Agree. I skipped those just to see if this approach works. I did the same thing with a few different tools in the past, and none of them fully worked (zig is one example here). That's something that we would need to address before rolling this out to our users.
@fheinecke's been advocating for this, and I've got experience with Artifactory from a previous job. I think it is likely we'll assess these tools soon.
This commit enhances the build-arch.sh script by adding a command to remove old builds before moving the new build; it also updates the architecture in rdpclient/client.go for Linux 'arm'. Additionally, it modifies the data type of the 'handle' from 'ulong' to 'size_t' in various function calls, ensuring compatibility with 32-bit architectures.
Updated from ea5c6f7 to 691e0a6
@@ -0,0 +1,45 @@
FROM ubuntu:22.04
Can you add some docs/comments on what this Dockerfile and the other one do? We have so many under `build.assets/**` that I don't even know what they all do anymore.
@@ -0,0 +1,57 @@
# Based on https://github.com/crosstool-ng/crosstool-ng/blob/8825cfc2abf696395bd27bd0e7cea3653004280b/testing/docker/ubuntu22.04/Dockerfile
We probably need to check with the open source group, but if this Dockerfile is based on that link, then we may need to include a GPL license header with the copyright listed as crosstool-ng.
build.assets/build-arch.sh
"arm64")
export ARCH="arm64"
export GO_ARCH="arm64"
export SYSROOT="${HOME}/x-tools/aarch64-centos7-linux-gnu/aarch64-centos7-linux-gnu/sysroot"
Is `$HOME` the appropriate directory for this?
We could move it to `/opt`, but for a hack MVP `$HOME` is good enough 😅
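Here is a sketch of how the per-architecture block could make the toolchain location configurable, so `/opt` could be used without editing the script. `X_TOOLS_ROOT` and `set_arch_env` are hypothetical names. The arm64 and arm triplets come from this PR. The amd64/i686 triplets and the `PKG_CONFIG_PATH` layout are assumptions:

```shell
#!/usr/bin/env bash
# Sketch of a per-architecture setup with an overridable toolchain root,
# e.g. X_TOOLS_ROOT=/opt/x-tools. Names marked "assumed" are not from the PR.

set_arch_env() {
  local triplet
  # Defaults to the current $HOME/x-tools layout used by ct-ng.
  local root="${X_TOOLS_ROOT:-${HOME}/x-tools}"
  case "$1" in
    "amd64") triplet="x86_64-centos7-linux-gnu";    GO_ARCH="amd64" ;; # assumed triplet
    "i686")  triplet="i686-centos7-linux-gnu";      GO_ARCH="386" ;;   # assumed triplet
    "arm64") triplet="aarch64-centos7-linux-gnu";   GO_ARCH="arm64" ;;
    "arm")   triplet="armv7-centos7-linux-gnueabi"; GO_ARCH="arm" ;;
    *) echo "unknown architecture: $1" >&2; return 1 ;;
  esac
  export ARCH="$1" GO_ARCH
  # ct-ng installs into <prefix>/<triplet>/sysroot with <prefix>/bin for tools.
  export SYSROOT="${root}/${triplet}/${triplet}/sysroot"
  export PATH="${root}/${triplet}/bin:${PATH}"
  export CC="${triplet}-gcc" CXX="${triplet}-g++" LD="${triplet}-ld"
  export PKG_CONFIG_PATH="${SYSROOT}/usr/lib/pkgconfig" # assumed layout
}
```

Keeping `$HOME/x-tools` as the default preserves the MVP behavior, while CI or a production image could export `X_TOOLS_ROOT=/opt/x-tools`.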
This is (arguably) a part of one of our quarterly goals, but I'm not sure that we'll get to it this quarter due to GHA migration stuff. Either way, it is hopefully soon™.
Here are some of the things that we need to address (issue-specific but not PR-specific) before merging any solution:
@fheinecke Answering your questions:
Agree. This is a proposal to start writing one, not a final product.
Any links to the discussion? I was also thinking about Clang, but I'm not sure if Clang still works with Glibc 2.17.
I'm open to writing an RFD, but I wanted to know everyone's opinions first. I don't think there is a point in writing an RFD when a proposed solution may not work or there is some reason why the solution makes no sense.
No OS distros other than CentOS 7 use glibc 2.17. If we want to maintain this glibc version, I don't really see other options. I also think that there is a lot of value in having a toolchain that is independent of the OS. We could run any OS we want and keep it up to date without updating other toolchain dependencies if we don't want to.
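To check that a given build actually runs on glibc 2.17, one option is to look at the newest `GLIBC_x.y` symbol version the binary references. Below is a minimal sketch. It assumes the input is `objdump -T` output, ideally from the matching cross binutils (e.g. `aarch64-centos7-linux-gnu-objdump`). `max_glibc_version` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: print the highest GLIBC_x.y symbol version found in `objdump -T`
# output read from stdin. Non-numeric tags like GLIBC_PRIVATE are ignored.
max_glibc_version() {
  grep -o 'GLIBC_[0-9][0-9.]*' \
    | sed 's/^GLIBC_//' \
    | sort -V -u \
    | tail -n 1
}
```

For example, `aarch64-centos7-linux-gnu-objdump -T <path-to-binary> | max_glibc_version` should print `2.17` or lower if the binary stays within the target glibc.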
It's the best tool that I'm aware of. I've been using it for many years now, and it works for me. If you're aware of any alternatives, let me know. I think it's easier to pick a working tool and fix issues as we find them rather than keep debating between different tools with similar functionality.
I think we either need Artifactory or a Docker image with all toolchains (similar to what we do with centos7-assets today). Building all toolchains on every push would be a huge waste of time and resources.
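If prebuilt toolchains are published (in Artifactory or a Docker image), CI could rebuild a toolchain only when its ct-ng config changes by deriving a cache key from that config. Here is a sketch. The key format and the `toolchain_cache_key` name are assumptions:

```shell
#!/usr/bin/env bash
# Sketch: derive a cache key from a ct-ng config file, so a cached toolchain
# is reused until the config that produced it changes.
toolchain_cache_key() {
  local config="$1" arch
  # e.g. ct-ng-configs/arm64.config -> arm64
  arch="$(basename "$config" .config)"
  # First 16 hex digits of the config's SHA-256 identify its contents.
  echo "ct-ng-${arch}-$(sha256sum "$config" | cut -c1-16)"
}
```

For example, `toolchain_cache_key ct-ng-configs/arm64.config` prints `ct-ng-arm64-` followed by 16 hex digits.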
The Go compiler doesn't use LTO, and I don't think LTO would make a huge difference. We could investigate it, but I think this is something additional and not the main goal here. Building everything from source would help with it, though.
I think it should. One day, we will need to update the toolchain and the version of glibc. When this happens, only the latest version of Teleport should require the new glibc. I don't see any problems with having the same toolchain configuration on multiple branches. It gives us more control over what is built and how.
@wadells @fheinecke I was thinking about it, and I think there is a simpler way to achieve the same thing, since we would use Artifactory only as a blob store. The downside of Artifactory is the need to self-host and maintain it. As we would probably host it on AWS, we would also need to pay for AWS -> GH data transfer and, AFAIR, for our CI workers. This is not cheap. What if we create a separate GH repository to manage our 3rd-party dependencies? We could create
WDYT?
@jakule sorry for the delayed response. To address your above comments/questions:
This PR/toolchain discussion
Unfortunately no - I believe this has all been verbal during team meetings over the past few months.
I don't know for certain that it does; however, the LLVM project predates glibc 2.17 by around a decade, so I would be surprised if some Clang version did not support it.
This is a little complicated because this issue is tightly coupled with several others:
With this in mind, I see several possible options:
To be clear, the rankings here are intended to indicate which solutions are more difficult than others, and are NOT intended to convey the total amount of engineering effort required for any given solution. @camscale @tcsc @wadells is this chart missing any options you've thought of, and does it seem reasonably correct?
The toolchain isn't really independent from the OS - it's a part of the OS. Architecturally, an OS is primarily comprised of the following:
Container image OSs exclude the kernel, as containers use the host kernel. With this in mind, if we start building our own toolchain and glibc version, then we've hit the point where we're pretty much maintaining an entire container image OS. Our package manager is RPM (from the base image) as well as the Dockerfile targets + Makefile targets + shell scripts that we use for specific packages. The packages that the buildbox is expected to have are the "default set of software packages" I mentioned above.
See the above table. There are others out there, but the above lists some that I'm aware of.
It may be easier initially, but there is a non-trivial tradeoff with maintenance burden. This is analogous to using the wrong screwdriver to put a screw in: it may be faster/easier than getting the right screwdriver, but when you strip the head it will take a lot more effort to replace/do maintenance.
I agree if we go with this solution, or a similar one. Unfortunately, Artifactory got explicitly axed from our quarterly goals, so if we want that solution, then it's going to have to wait. But I digress.
There is also LTO for compiling Clang itself. This can improve compile time significantly. This is also just an example - there are a ton of CMake vars that can be used to compile Clang/LLVM, and some of them could potentially improve Clang's and/or Teleport's performance.
Has this happened historically? I don't think we've bumped the glibc minimum requirement during my tenure. I think that this is the best approach for our customers, but it does come at a cost to us.
Artifactory discussion
As an aside, this should probably be in a separate issue and/or RFD.
I may be wrong, but I'm fairly certain that we can upload OS packages and arbitrary tarballs to it as well.
This could potentially be free. We self-host some of our GHA runners in AWS EKS clusters, so we could potentially just set up a tunnel between the VPCs (AWS has resources for this) and not pay any egress costs. Either way, though, once opportunity cost is taken into account, setting up and maintaining Artifactory is probably significantly less expensive than rolling and maintaining our own solution.
As outlined above, I think this would be extremely expensive when taking into account the opportunity cost of development and maintenance. I did some work a few quarters ago to significantly lower the engineering costs of self-hosting commercial solutions, so I think generally it would be better to purchase a self-hosted commercial solution than develop a new one.
If we want to solve the dependency caching/pinning issue and this at the same time, then I would recommend switching most of our third-party build processes to Nix, and hosting a Nix package repo/cache in S3/CloudFront. That said, I really don't want to introduce a new build tool of any kind without fully committing to it by moving our existing processes to it. Doing so would be expensive due to the increased complexity, which shows up as slower development speed and additional maintenance requirements. Our current Makefile + Dockerfile + shell script solution is messy, but it at least uses tools that all engineering folks are expected to know.
In other words, if we want to go with any of the "build it for us tool" solutions, then I recommend we pick one that can do everything for us (toolchain + other packages). I also think that if we go this route, then it is in our best interest to use the new tool to build all the other packages we're compiling as well.
Of all the solutions listed above, I personally think that we should make Teleport compatible with musl libc and statically link with it, or remove the libc requirement (if this is even possible). I think fixing this would speed up development, prevent some common customer-facing issues, add support for more OSs, and remove manual testing steps from the major release test plan. Until we fix this, I think that we're going to continue dumping engineering effort into working around the problem, and I think that we will eventually lose revenue because of it, based upon the number of customers raising issues related to this.
We won't be able to use CentOS 7 after it's EOL. It's not about security. Red Hat will probably shut down the RPM repository, so we won't be able to install any packages using
We will have to do it at some point, but we still need some solution to replace CentOS, as there is really no replacement for CentOS 7: CentOS 8 is already EOL, CentOS 9 doesn't exist, and CentOS Stream uses rolling updates AFAIR. Probably the only alternative for us is Debian.
We were thinking about doing this at some point, but I personally found it a bit unstable. Some packages were also lagging a few weeks behind the official releases, which creates an issue when we need to upgrade some packages to get security updates. Someone more experienced with it may have a solution to those issues. Buildroot uses crosstool-ng under the hood to build the compiler. I'm not sure about the Yocto Project, but those tools let you create a whole Linux distribution. We only need a cross-compiler.
I'm not sure about the current situation, but it used to be quite simple to create a musl build in the past. I created one a while ago. The problem is that our implementation of SSH access depends on some glibc features. If we swap glibc for musl, we may have some issues with SSH access. AFAIR, the performance of musl was also an issue when we tested it. That being said, I think we could create an "experimental" musl build and see what issues exist in the wild. At the end of the day, not everyone is using SSH access.
I don't think I understand. You can use cross-compiler to only build code without the need to build the whole OS. Read below.
We dropped support for CentOS 6 with Teleport 8 AFAIR. I think that the whole discussion is missing the main point. Currently, we build everything on a CentOS 7 image. That creates multiple issues: Node.js is not compatible with the image anymore, Clang builds don't work on it, GHA plugins don't work on it, etc. Regarding Artifactory, I don't think it's a bad product. I just think that not having another thing to maintain is a good thing, but I may be missing something here.
Added build configuration for aarch64-unknown-linux-gnu in `.cargo/config.toml` and updated `build-arch.sh` script to ensure a complete make process. Updated `ct-ng-configs/arm64.config` to newer versions for kernel, binutils, glibc, and gcc, enabling new features and properly setting flags for better compatibility and security.
Updated from e02114c to 6250d29
Use gold for linking ARM64 binaries
This PR shows how we could use https://github.com/crosstool-ng/crosstool-ng to create a cross-compiler based on glibc 2.17 and compile Teleport for all supported architectures.
To build cross-compilers and build Teleport:
After the build, you will find `build-${ARCH}` directories with Teleport builds.
Contributes to #40037