
[Draft] Cross-compilation MVP #36066

Draft · wants to merge 20 commits into master
Conversation


@jakule jakule commented Dec 27, 2023

This PR shows how we could use https://github.com/crosstool-ng/crosstool-ng to create a cross-compiler based on glibc 2.17 and compile Teleport for all supported architectures.

To build cross-compilers and build Teleport:

cd build.assets
./build-all.sh # It will build cross-compilers, can take a while
make release-amd64
make release-386
make release-arm64
make release-arm

After the build, you will find build-${ARCH} directories with Teleport builds.

Contributes to #40037

This commit introduces a new configuration file specific to AMD64 architecture for the experimental ct-ng build. This config file includes specifications for various components like kernel, C library, binary utilities, etc. It's an automatically generated file and should not be edited manually. This will be used during the crosstool-ng build process.
This commit updates the build script and configuration files to improve cross-compilation process. The modifications ensure proper setup of sysroot, device architecture support and build toolchains. Additionally, the arrangement for building third-party libraries has been updated for efficiency.
Modifications have been made to the build-all.sh script for better support of the i686 architecture and to handle unknown architectures. The sysroot, CC, CXX, LD, and PKG_CONFIG_PATH variables have been updated for more efficient cross-compilation and build process. Moreover, the build directory naming is now architecture-specific and error handling has been added for unrecognized architectures.
Updated the cross-compilation configuration for the "arm" case in the build-all.sh script. Adjusted sysroot path, added path for armv7-centos7-linux-gnueabi binaries, specified new compilers, linker, and pkg-config path for this architecture. Also, made corresponding changes in Dockerfile-build and Makefile to accommodate the updated 'arm' configuration.
Added the 'arm' case to the build-all.sh script for cross compilation. Set appropriate paths for sysroot, armv7-centos7-linux-gnueabi binaries and defined specific compilers
Added a new Dockerfile to setup crosstool-ng for cross compilation and updated the build-all.sh script to use this Docker container for building. Dockerfile creates a non-root user and sets up the necessary environments. The build script has also been updated to utilize these new configurations.
Moved the build-all.sh script to the build.assets directory and modified its Docker build command. Also altered the binary utilities settings and static toolchain option in the amd64.config file under the ct-ng-configs directory. Changes enable better setup for cross-compilation using crosstool-ng in a Docker container.
This commit renames and revises the build-all script to build-arch, and further adapts it for multi-architecture Docker builds involving amd64, i686, arm64 and arm. Correspondingly, Dockerfile-ct-ng has been adjusted to handle these configurations. These changes aim to facilitate better cross-compilation using crosstool-ng within Docker containers.
Several modifications are made to the build-arch.sh script, Dockerfile-build and other related files. The adjustments include the use of 'clang-12' instead of 'clang', the removal of 'libtool', and improvements related to executing Docker builds for multiple architectures. These changes are targeted towards optimizing the cross-compilation process using crosstool-ng across Docker environments.
This commit simplifies the Makefile by removing unnecessary conditions that were based on the architecture type. It mostly affects the CGOFLAG and BUILDFLAGS for 'arm' cases, streamlining the build process specifically for Linux OS and 'arm' architecture, and improving the cross-compilation process overall.
@wadells wadells left a comment

Thanks for putting together this MVP. It is something we've been dancing around for the past couple of years, and it has broad applicability to a range of areas.

I'd be excited to see it land. That said, this (and the followup testing/changes to make it official) feel big enough to be a quarterly goal. Do we want to formally allocate some time (and maybe an RFD) or are we cool with this slipping in the side door? I don't want to impose undue process, but I'm also leery of something partially finished going in, and you becoming a de-facto tools team member for the next 3-6 months while we deal with followup. This is ultimately a question for eng leadership.


From code/security side:

There are a handful of dependency installs where I'd recommend we vendor the install script and checksum the incoming artifacts, instead of relying on curl | bash or git clone <implicitly master> && make as a fundamental part of our build process.

This doesn't need to be done at PoC time, but I'd like to see it in place before we distribute binaries generated via this technique to customers.

I'm thinking about attacks like https://blog.gitguardian.com/codecov-supply-chain-breach/ here.
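A minimal sketch of the download-then-verify pattern being recommended here (the file name is a stand-in for a fetched tarball; in a real build the expected digest would be vendored in the repo, not computed on the spot):

```shell
# Sketch: verify an artifact against a vendored checksum before using it.
# A local file stands in for a downloaded tarball, and the digest is computed
# here only to make the example self-contained; in CI the tarball would come
# from a pinned `curl -fsSLO <url>` and VENDORED_SHA256 would be committed.
set -eu
cd "$(mktemp -d)"
printf 'example artifact' > artifact.tar.gz
VENDORED_SHA256="$(sha256sum artifact.tar.gz | cut -d' ' -f1)"

# Fail the build if the fetched artifact does not match the vendored digest.
echo "${VENDORED_SHA256}  artifact.tar.gz" | sha256sum -c -
echo "checksum ok"
```

Note that `sha256sum -c` expects two spaces between the digest and the file name, and exits non-zero on mismatch, which is what makes the build fail closed.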

The security team has some upcoming work that is related -- but we probably won't get to it until Q1 or Q2 of next year -- depending on how other priorities fall out.

https://github.com/gravitational/SecOps/issues/455
https://github.com/gravitational/SecOps/issues/458


# Install Go
ARG GOLANG_VERSION
RUN mkdir -p /opt && cd /opt && curl -fsSL https://storage.googleapis.com/golang/${GOLANG_VERSION}.linux-amd64.tar.gz | tar xz && \
Contributor

Version is pinned, but we could use a checksum here to validate the tarball is what we expect. This isn't super important, as we have a higher level of trust for Google and the Go distribution.

Contributor Author

This is a PoC, and this code has a lot of bad practices: versions are not pinned, the repositories are GH mirrors instead of the official ones, code is built from master instead of a pinned version, build options are not set, etc.
I wanted to show an alternative approach to what we are using today. I fully agree that the production version should have all those concerns addressed.

# Install Rust
ARG RUST_VERSION
ENV PATH=/home/ubuntu/.cargo/bin:$PATH
RUN curl --proto '=https' --tlsv1.2 -fsSL https://sh.rustup.rs | sh -s -- -y --profile minimal --default-toolchain $RUST_VERSION && \
Contributor

I'd recommend we vendor the shell script and checksum the resulting binary that it fetches. This is probably the worst offender IMO.


USER ctng
WORKDIR /home/ctng

RUN wget https://github.com/crosstool-ng/crosstool-ng/releases/download/crosstool-ng-1.26.0/crosstool-ng-1.26.0.tar.bz2 && \
@wadells wadells Dec 27, 2023

I'd recommend we verify the checksum of this tarball. This isn't as important because we have a pinned release (can be overwritten afaik) and can probably trust GitHub/Microsoft to not get hacked. We're more protecting against the crosstool-ng org/user being compromised.


Comment on lines +47 to +55
#
#FROM ubuntu:22.04
#
## Create a non-root user with id 1000 - ubuntu
#RUN useradd -m -u 1000 ubuntu
#USER ubuntu
#WORKDIR /home/ubuntu
#
#COPY --from=ct-ng /home/ctng/x-tools /home/ubuntu/x-tools
Contributor

This looks like a nice 2-phase build. I'm curious as to why you've commented it out?

Contributor Author

Forgot to remove. The idea is simple:

  1. We build a simple container with ct-ng
  2. We use that container to build the cross-compiler
  3. We build container #2 with minimal dependencies and use it to build Teleport.

For some reason, when I tried to build the cross-compiler as a part of docker build, I was getting a weird build failure that I was not able to reproduce anywhere else. This part included the cross-compiler in another stage, but because it was not working, I commented it out and switched to building cross-compiler using docker run instead of docker build.
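Sketched as a driver script, the intended three-step flow looks roughly like this (image names, Dockerfile names, and the ct-ng invocation are illustrative, not the exact ones in this PR):

```shell
# Hypothetical driver for the three-step flow described above.
set -eu

# 1. Build a simple container with ct-ng installed.
docker build -f Dockerfile-ct-ng -t teleport-ct-ng build.assets

# 2. Use that container via `docker run` (not a later `docker build` stage)
#    to build the cross-compiler, writing the resulting x-tools/ back to the
#    host. This sidesteps the failure seen when running ct-ng inside
#    `docker build`. Assumes a ct-ng .config is already in place.
docker run --rm -v "$PWD/x-tools:/home/ctng/x-tools" teleport-ct-ng ct-ng build

# 3. Build the minimal-dependency buildbox that copies x-tools/ in and is
#    then used to build Teleport for each architecture.
docker build -f Dockerfile-build -t teleport-buildbox build.assets
```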

# Build and install

#zlib
git clone https://github.com/madler/zlib.git
Contributor

Needs to be pinned to a commit SHA before building.
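What pinning could look like, sketched with a throwaway local repo standing in for the real zlib remote (in practice the vetted SHA would be recorded in the build script):

```shell
# Sketch: clone, then check out an exact vetted commit instead of building
# whatever the default branch currently points at.
set -eu
cd "$(mktemp -d)"

# Stand-in "upstream" repo with a vetted commit followed by a newer one.
git init -q upstream
git -C upstream -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "vetted release"
PINNED_SHA="$(git -C upstream rev-parse HEAD)"
git -C upstream -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "newer, unreviewed change"

# What the build script would do: clone, then pin before building.
git clone -q upstream build-src
git -C build-src checkout -q --detach "${PINNED_SHA}"
[ "$(git -C build-src rev-parse HEAD)" = "${PINNED_SHA}" ] && echo "pinned"
```

The same pattern applies to every `git clone` in this script: the clone still fetches the default branch, but the build only ever runs against the recorded SHA.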


cd ..

#libzstd
git clone https://github.com/facebook/zstd.git
Contributor

Needs to be pinned to a commit SHA before building.


cd ..

#libelf
git clone https://github.com/arachsys/libelf.git
Contributor

Needs to be pinned to a commit SHA before building.


cd ..

#libbpf
git clone https://github.com/libbpf/libbpf.git
Contributor

Needs to be pinned to a commit SHA before building.


cd ../..

#libtirpc
wget https://zenlayer.dl.sourceforge.net/project/libtirpc/libtirpc/1.3.4/libtirpc-1.3.4.tar.bz2
Contributor

I'd checksum this tarball. I don't trust SourceForge as much as GitHub or Google.


cd ..

#libpam
git clone https://github.com/linux-pam/linux-pam.git
Contributor

Should be pinned to a commit SHA before building.



jakule commented Dec 27, 2023

Do we want to formally allocate some time (and maybe an RFD) or are we cool with this slipping in the side door? I don't want to impose undue process, but I'm also leery of something partially finished going in, and you becoming a de-facto tools team member for the next 3-6 months while we deal with followup. This is ultimately a question for eng leadership.

I think this is a question for @r0mant and folks from the release team. I don't mind preparing an RFD, but we need to have a conversation first if this is the direction that we want to take.

There are a handful of dependency installs I'd recommend we vendor the install script and checksum the incoming artifacts -- instead of relying on curl | bash or git clone && make as a fundamental part of our build process.

Agree. I skipped those just to see if this approach works. I did the same thing with a few different tools in the past and none of them fully worked (zig is one example here). That's something that we would need to address before rolling this out to our users.
If we want to have more control over 3rd party dependencies, we could also look into using https://jfrog.com/artifactory/ or https://www.sonatype.com/products/sonatype-nexus-repository. But that's even more work/maintenance for us.

@jakule jakule requested a review from wadells December 27, 2023 22:04
wadells commented Dec 27, 2023

If we want to have more control over 3rd party dependencies, we could also look into using https://jfrog.com/artifactory/ or https://www.sonatype.com/products/sonatype-nexus-repository. But that's even more work/maintenance for us.

@fheinecke's been advocating for this, and I've got experience with artifactory from a previous job. I think it is likely we'll assess these tools soon.

This commit enhances the build-arch.sh script by adding a command to remove old builds before moving the new build; it also updates the architecture in rdpclient/client.go for Linux 'arm'. Additionally, it modifies the data type of the 'handle' from 'ulong' to 'size_t' in various function calls, ensuring compatibility with 32-bit architectures.
@@ -0,0 +1,45 @@
FROM ubuntu:22.04
Contributor

Can you add some docs/comments on what this dockerfile and the other do? We have so many under build.assets/** that I don't even know what they all do anymore.

@@ -0,0 +1,57 @@
# Base on https://github.com/crosstool-ng/crosstool-ng/blob/8825cfc2abf696395bd27bd0e7cea3653004280b/testing/docker/ubuntu22.04/Dockerfile
Contributor

Probably need to check with the open source group, but if this dockerfile is based on that link then we may need to include a GPL license header with copyright listed as crosstool-ng

"arm64")
export ARCH="arm64"
export GO_ARCH="arm64"
export SYSROOT="${HOME}/x-tools/aarch64-centos7-linux-gnu/aarch64-centos7-linux-gnu/sysroot"
Contributor

Is $HOME the appropriate directory for this?

Contributor Author

We could move it to /opt, but for a hack MVP $HOME is good enough 😅

@fheinecke

I think it is likely we'll assess these tools soon.

This is (arguably) a part of one of our quarterly goals, but I'm not sure that we'll get to it this quarter due to GHA migration stuff. Either way it is hopefully soon™.

I think this is a question for @r0mant and folks from the release team.

Personally I think this needs to be an RFD. We've had several discussions internally about how to approach this problem. A Clang-based solution is actually one I've advocated for, but as shown in this PR there are a lot of specifics that need to be hashed out. There are also a lot of alternative options, and potential problems with any approach we take.

Here are some of the things that we need to address (issue specific but not PR specific) before merging any solution:

  • Do we want to maintain our own toolchain build, or use a distro-provided build that is configured properly? We've historically maintained our own, and by "maintain" I mean we've set it up and updated the version once ~2 years later.
  • If we want to build our own toolchain, is crosstool-ng the appropriate tool for the job? Toolchains have been a large issue in the single board computer world for a long time and there are several other options that have stemmed from this.
  • Do we want to implement this prior to other build system related work? Personally I think there is value in doing "caching" work (read: artifactory or a related solution, if appropriate) prior to this. I also think that we need to rework our buildbox build process so that we don't end up spending another hour building a more-appropriately configured toolchain on every PR merge. Last I checked, we are not setting some flags (such as LTO optimization support) when compiling clang/llvm that would improve performance of our binaries, and the compiler itself, at the cost of compiler build time.
  • Should this be versioned with Teleport? We support the same OS/glibc versions across all supported major versions, so I don't think it should be versioned with Teleport. However pulling this out is a non-trivial effort.


jakule commented Jan 3, 2024

@fheinecke Answering your questions:

Personally I think this needs to be an RFD.

Agree. This is a proposal to start writing one, not a final product.

A Clang based solution is actually one I've advocated for,

Any links to the discussion? I was also thinking about Clang, but I'm not sure if Clang still works with Glibc 2.17.

there are a lot of specifics that need to be hashed out. There are also a lot of alternative options, and potential problems with any approach we take.

I'm open to writing an RFD, but I wanted to hear everyone's opinions first. I don't think there is a point in writing an RFD when a proposed solution may not work or there is some reason why the solution makes no sense.

  • Do we want to maintain our own toolchain build, or use a distro-provided build that is configured properly? We've historically maintained our own, and by "maintain" I mean we've set it up and updated the version once ~2 years later.

No OS distros other than CentOS 7 use glibc 2.17. If we want to maintain this glibc version, I don't really see other options. I also think that there is a lot of value in having a toolchain that is independent from the OS. We could run any OS we want and keep it up to date without updating other toolchain dependencies if we don't want to.

If we want to build our own toolchain, is crosstool-ng the appropriate tool for the job? Toolchains have been a large issue in the single board computer world for a long time and there are several other options that have stemmed from this.

It's the best tool that I'm aware of. I've been using it for many years now and it works for me. If you're aware of any alternatives, let me know. I think it's easier to pick a working tool and fix issues as we find them than to keep debating between different tools that have similar functionality.

Do we want to implement this prior to other build system related work? Personally I think there is value in doing "caching" work (read: artifactory or a related solution, if appropriate) prior to this. I also think that we need to rework our buildbox build process so that we don't end up spending another hour building a more-appropriately configured toolchain on every PR merge

I think we either need Artifactory or a Docker image with all toolchains (similar to what we do with centos7-assets today). Building all toolchains on every push would be a huge waste of time and resources.

Last I checked, we are not setting some flags (such as LTO optimization support) when compiling clang/llvm that would improve performance of our binaries, and the compiler itself, at the cost of compiler build time.

The Go compiler doesn't use LTO, and I don't feel like LTO would make a huge difference. We could investigate it, but I think this is something additional and not the main goal here. Building everything from source would help with it, though.

Should this be versioned with Teleport? We support the same OS/glibc versions across all supported major versions, so I don't think it should be versioned with Teleport. However pulling this out is a non-trivial effort.

I think it should. One day, we will need to update the toolchain and update the version of glibc. When this happens, only the latest version of Teleport should require the new Glibc. I don't see any problems with having the same toolchain configuration on multiple branches. It gives us more control over what is built and how.


jakule commented Jan 25, 2024

If we want to have more control over 3rd party dependencies, we could also look into using https://jfrog.com/artifactory/ or https://www.sonatype.com/products/sonatype-nexus-repository. But that's even more work/maintenance for us.

@fheinecke's been advocating for this, and I've got experience with artifactory from a previous job. I think it is likely we'll assess these tools soon.

@wadells @fheinecke I was thinking about it, and I think there is a simpler way to achieve the same thing, as we would use Artifactory only as a blob store. The downside of Artifactory is the need to self-host and maintain it. As we would probably host it on AWS, we would also need to pay for the AWS -> GH data transfer and, AFAIR, for our CI workers. This is not cheap.

What if we create a separate GH repository to manage our 3rd-party dependencies? We could create a gravitational/3rdparty repository with only build scripts for OpenSSL, libbpf, etc. Each build could be posted as a GitHub release (they allow files up to 2 GB). We could generate a release with files with names like openssl-3.0.1-amd64.tar.gz. Then, in our Docker build, we would fetch all required dependencies in the matching versions and install them. This removes the need to maintain an Artifactory instance and the fees for transferring data from AWS. We could also make our builds reproducible, as we could point to the exact library version and not depend on "the latest", in the same way as we could in Artifactory.
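The fetch side of that scheme could look roughly like this (the gravitational/3rdparty repo, the asset naming, and the checksums.sha256 file are all hypothetical):

```shell
# Sketch: resolve a pinned dependency build from a hypothetical
# gravitational/3rdparty GitHub release, then verify it against a vendored
# checksum list before installing. Only the URL construction runs here; the
# fetch and verify steps are left commented since the repo does not exist.
set -eu
DEP=openssl; VERSION=3.0.1; ARCH=amd64
ASSET="${DEP}-${VERSION}-${ARCH}.tar.gz"
URL="https://github.com/gravitational/3rdparty/releases/download/${DEP}-${VERSION}/${ASSET}"
echo "would fetch: ${URL}"
# curl -fsSLO "${URL}"
# grep "  ${ASSET}\$" checksums.sha256 | sha256sum -c -
```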

WDYT?

@fheinecke

@jakule sorry for the delayed response. To address your above comments/questions:

This PR/toolchain discussion

Any links to the discussion?

Unfortunately no - I believe this has all been verbal during team meetings over the past few months.

I was also thinking about Clang, but I'm not sure if Clang still works with Glibc 2.17.

I don't know for certain that it does, however the LLVM project predates glibc 2.17 by around a decade so I would be surprised if some Clang version did not support it.

I'm open to writing an RFD, but I wanted to hear everyone's opinions first

This is a little complicated because this issue is tightly coupled with several others:

  • Internal tools team is at or above capacity for work, so if there are additional maintenance costs then the company will pay for them via reduced output from our team. I know that this is pretty vague, but it essentially equates to our team not being able to contribute as much to high-level company goals. The inverse is true as well.
  • We have issues with dependency management and reliance on dependencies from third party sources. The whole artifactory discussion stems from this, so I don't think I need to elaborate further here.
  • Being dependent on a specific glibc version is the (partial) root cause of not being able to statically compile (or partially statically compile) Teleport, and of not being compatible with Alpine Linux/similar without gcompat.
  • We already build a bunch of third party dependencies from source (such as libbpf)

With this in mind, I see several possible options:

| Solution description | Initial technical difficulty ranking (1 = easiest) | Long-term technical difficulty ranking (1 = easiest) | Additional software packages that need to be maintained | Additional notes |
| --- | --- | --- | --- | --- |
| Keep building on CentOS 7 | 1 | 1 | None | I'm probably missing something here, but I don't think this is any less secure than any of the other solutions if we don't keep the other solutions up to date. Historically we only bump versions of forked and/or built third-party tools when there is a major issue/feature that requires it. |
| Bump the glibc version, tell customers that we no longer support glibc 2.17, and bump the CentOS version | 2 | 1 | | We're probably going to have to do this at some point... but it will not make customers happy. |
| Use package-manager-based Clang for cross compilation | 3 | 2 | None | |
| Use crosstool-ng to build multiple cross-compilation gcc toolchains and glibc | 4 | 7 | crosstool-ng, 4x gcc toolchains, 4x glibc versions | |
| Write our own makefiles/dockerfiles/shell scripts to build GCC toolchains | 5 | 6 | Our own tooling, 4x gcc toolchains, 4x glibc versions | I marked this as easier for maintenance than crosstool-ng as it doesn't introduce new configs/tools. |
| Use a "build from source" package manager such as Nix to build all our packages | 6 | 4 | Nix, 4x gcc toolchains, 4x glibc versions | Nix would make our dependency builds reproducible, and is probably easier to maintain than a collection of makefile targets + dockerfiles + shell scripts. |
| Use a distro-building tool such as buildroot or yocto | 7 | 5 | buildroot/yocto, 4x gcc toolchains, 4x glibc versions | This would unify our build process for third-party dependencies. I think everybody on internal tools has some prior experience with at least one of these tools. |
| Make Teleport compatible with musl libc and statically link with libc, or remove the libc requirement entirely | 8 | 3 | None | Yes, I know this has been investigated several times in the past. My understanding is that it is technically possible but would require a lot of engineering effort. The advantage is that we solve at least a half dozen high-impact issues, several of which customers regularly complain about. |

To be clear, the rankings here are intended to indicate which solutions are more difficult than others, and are NOT intended to convey the total amount of engineering effort required for any given solution.

@camscale @tcsc @wadells is this chart missing any options you've thought of, and does it seem reasonably correct?

If we want to maintain this glibc version I don't really see other options. I also think that there is a lot of value in having a toolchain that is independent from the OS.

The toolchain isn't really independent from the OS - it's a part of the OS. Architecturally, an OS is primarily comprised of the following:

  • A kernel
  • A build toolchain
  • A libc version (or lack thereof for statically linked OSs)
  • A package manager, whose purpose is to provide built software
  • A default set of software packages

Container image OSs exclude the kernel, as containers use the host kernel. With this in mind, if we start building our own toolchain and glibc version then we've hit the point where we're pretty much maintaining an entire container image OS. Our package manager is RPM (from the base image) as well as the dockerfile targets + makefile targets + shell scripts that we use for specific packages. The packages that the buildbox is expected to have are the "default set of software packages" I mentioned above.

It's the best tool that I'm aware of. I've been using it for many years now and it works for me. If you're aware of any alternatives, let me know.

See the above table. There are others out there, but the above lists some that I'm aware of.

I think it's easier to pick a working tool and fix some issue when we find them rather than keep debating between different tools if they have similar functionality.

It may be easier initially, but there is a non-trivial tradeoff in maintenance burden. This is analogous to using the wrong screwdriver to drive a screw: it may be faster/easier than getting the right screwdriver, but when you strip the head it will take a lot more effort to replace it and do maintenance.

I think we either need Artifactory or a Docker image with all toolchains (similar to what we do with centos7-assets today). Building all toolchains on every push would be a huge waste of time and resources.

I agree if we go with this solution, or a similar one. Unfortunately artifactory got explicitly axed from our quarterly goals, so if we want that solution then it's going to have to wait. But I digress.

The Go compiler doesn't use LTO, and I don't feel like LTO would make a huge difference. We could investigate it, but I think this is something additional and not the main goal here. Building everything from source would help with it, though.

There is also LTO for compiling Clang itself. This can improve compile time significantly. This is also just an example - there are a ton of CMake vars that can be used to compile Clang/LLVM, and some of them could potentially improve Clang's and/or Teleport's performance.
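For example, LLVM's CMake build exposes LTO for the compiler itself via `LLVM_ENABLE_LTO` (a real LLVM CMake option); the rest of this invocation is an illustrative minimal sketch, not our actual buildbox configuration:

```shell
# Illustrative: build Clang/LLVM itself with ThinLTO enabled. This trades a
# longer compiler build for a faster resulting compiler. Assumes the usual
# LLVM monorepo layout with a sibling build directory.
cmake -G Ninja ../llvm \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_PROJECTS="clang" \
    -DLLVM_ENABLE_LTO=Thin
ninja clang
```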

When this happens, only the latest version of Teleport should require the new Glibc.

Has this happened historically? I don't think we've bumped glibc minimum requirement during my tenure. I think that this is the best approach for our customers, but it does come at a cost to us.

Artifactory discussion

As an aside, this should probably be in a separate issue and/or RFD

we would use Artifactory only as a blob store

I may be wrong, but I'm fairly certain that we can upload OS packages and arbitrary tarballs to it as well.

The downside of Artifactory is the need to self-host and maintain it. As we would probably host it on AWS, we would also need to pay for the AWS -> GH data transfer and, AFAIR, for our CI workers. This is not cheap.

This could potentially be free. We self host some of our GHA runners in AWS EKS clusters, so we could potentially just setup a tunnel between the VPCs (AWS has resources for this) and not pay any egress costs.

Either way though, setting up and maintaining artifactory is probably significantly less expensive than taking into account the opportunity cost of rolling and maintaining our own solution.

What if we create a separate GH repository to manage our 3rd-party dependencies? We could create a gravitational/3rdparty repository with only build scripts for OpenSSL, libbpf, etc. Each build could be posted as a GitHub release (they allow files up to 2 GB). We could generate a release with files with names like openssl-3.0.1-amd64.tar.gz. Then, in our Docker build, we would fetch all required dependencies in the matching versions and install them. This removes the need to maintain an Artifactory instance and the fees for transferring data from AWS. We could also make our builds reproducible, as we could point to the exact library version and not depend on "the latest", in the same way as we could in Artifactory.

As outlined above, I think this would be extremely expensive when taking into account the opportunity cost of development and maintenance. I did some work a few quarters ago to significantly lower the engineering costs of self-hosting commercial solutions, so I think generally it would be better to purchase a self-hosted commercial solution than develop a new one.


If we want to solve the dependency caching/pinning issue and this at the same time, then I would recommend switching most of our third-party build processes to Nix, and hosting a Nix package repo/cache in S3/CloudFront. That said, I really don't want to introduce a new build tool of any kind without fully committing to it by moving our existing processes to it. Doing so would be expensive due to the increased complexity, which shows up as slower development speed and additional maintenance requirements. Our current makefile + dockerfile + shell script solution is messy, but it at least uses tools that all engineering folks are expected to know.
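For reference, pointing Nix at an S3-backed binary cache is a small configuration change. A sketch of `/etc/nix/nix.conf`, where the bucket name and the signing-key placeholders are illustrative only (`substituters` and `trusted-public-keys` are real nix.conf settings, and Nix supports `s3://` cache stores natively):

```
# Hypothetical S3-backed binary cache alongside the public one.
substituters = https://cache.nixos.org s3://teleport-nix-cache?region=us-east-1
trusted-public-keys = cache.nixos.org-1:<upstream-key> teleport-nix-cache-1:<our-public-key>
```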

In other words, if we want to go with any of the "build it for us" tool solutions, then I recommend we pick one that can do everything for us (toolchain + other packages). I also think that if we go this route then it is in our best interest to use the new tool to build all the other packages we're compiling as well.

Of all the solutions listed above, I personally think that we should make Teleport compatible with musl libc and statically link with it, or remove the libc requirement (if this is even possible). I think fixing this would speed up development, prevent some common customer-facing issues, add support for more OSs, and remove manual testing steps from the major release test plan. Until we fix this I think that we're going to continue dumping engineering effort into working around the problem, and I think that we will eventually lose revenue because of it, based upon the number of customers raising issues related to this.

@jakule
Contributor Author

jakule commented Jan 26, 2024

@fheinecke

Keep building on centos 7 | 1 | 1 | None | I'm probably missing something here, but I don't think this is any less secure than any of the other solutions if we don't keep the other solutions up to date. Historically we only bump versions of forked and/or built third party tools when there is a major issue/feature that requires it.

We won't be able to use CentOS 7 after its EOL. It's not about security: Red Hat will probably shut down the RPM repositories, so we won't be able to install any packages using yum.

Bump glibc version and tell customers that we no longer support glibc 2.17, and bump centos version | 2 | 1 | We're probably going to have to do this at some point... but it will not make customers happy.

We will have to do it at some point, but we still need some solution to replace CentOS, as there is really no replacement for CentOS 7 (CentOS 8 is already EOL, CentOS 9 doesn't exist, and CentOS Stream uses rolling updates AFAIR; probably the only alternative for us is Debian).

Use a "build from source" package manager such as Nix to build all our packages

We were thinking about doing this at some point, but I personally found it a bit unstable. Some packages were also lagging a few weeks behind the official releases, which creates an issue when we need to upgrade some packages to get security updates. Someone more experienced with it may have a solution to those issues.

Use a distro-building tool such as buildroot or yocto

Buildroot uses crosstool-ng under the hood to build the compiler. I'm not sure about the Yocto Project, but both tools are designed to build a whole Linux distribution, while we only need a cross-compiler.

Make Teleport compatible with musl libc and statically link with libc, or remove libc requirement entirely | 8 | 3 | None | Yes, I know this has been investigated several times in the past. My understanding is that it is technically possible but would require a lot of engineering effort. The advantage is that we solve at least a half dozen high impact issues, several of which customers regularly complain about.

I'm not sure about the current situation, but it used to be quite simple to create a musl build in the past. I created one a while ago. The problem is that our implementation of SSH access depends on some glibc features, so if we swap glibc for musl we may have some issues with SSH access. AFAIR the performance of musl was also an issue when we tested it. That being said, I think we could create an "experimental" musl build and see what issues exist in the wild. At the end of the day, not everyone is using SSH access.
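For the record, such an "experimental" static musl build is roughly a one-liner. This is a sketch, assuming musl-gcc is available (e.g. via Debian's musl-tools package); the output path and the absence of extra build tags are assumptions, not the official release recipe:

```
# Sketch: static musl build of the teleport binary (tool/teleport is the
# real main package path; flags beyond these are likely needed in practice).
CGO_ENABLED=1 CC=musl-gcc \
  go build -o build/teleport-musl \
  -ldflags '-linkmode external -extldflags "-static"' \
  ./tool/teleport
```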

With this in mind, if we start building our own toolchain and glibc version then we've hit the point where we're pretty much maintaining an entire container image OS.

I don't think I understand. You can use a cross-compiler to build just our code, without needing to build a whole OS. See below.

Has this happened historically? I don't think we've bumped glibc minimum requirement during my tenure. I think that this is the best approach for our customers, but it does come at a cost to us.

We dropped support for CentOS 6 with Teleport 8 AFAIR.

I think that the whole discussion is missing the main point. Currently, we build everything on a CentOS 7 image. That creates multiple issues: Node.js is no longer compatible with the image, Clang builds don't work on it, GHA plugins don't work on it, etc.
If we switch to the cross-compiler approach proposed here, we can use Ubuntu/Debian or any other distro as the base image without compiling everything from source as we do today.
To build Teleport we need OpenSSL (and the dependent libudev-zero and libcbor), libfido2, libpcsclite, libpam, and libbpf.
We're already building those as a part of our release build. libbpf will be out in Q2.
If we switch to the cross-compiler approach, we can build those dependencies once and store the prebuilt binaries (Artifactory, GHA, S3, etc.). Because cross-compilers run on any OS, we don't need a separate image to build web assets. We also don't need to build Clang anymore, as it can be installed on Ubuntu/Debian from the distro repositories. We don't need to build git, cmake, ninja, or any other tool that we're building right now. As a result, we will only need to maintain 6 C libraries and 4 toolchains. As I also mentioned, we could drop our ARM workers and simplify the process.
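A sketch of how a build-arch.sh-style script could map a Teleport release arch onto a crosstool-ng toolchain triple. aarch64-unknown-linux-gnu is the triple used by this PR's arm64 config; the other triples are assumptions based on common crosstool-ng naming:

```shell
#!/bin/sh
# Map a release architecture to a cross-toolchain triple, with an error
# for unrecognized architectures (mirroring the error handling added to
# build-all.sh in this PR).
toolchain_triple() {
  case "$1" in
    amd64) echo "x86_64-unknown-linux-gnu" ;;
    386)   echo "i686-unknown-linux-gnu" ;;
    arm64) echo "aarch64-unknown-linux-gnu" ;;
    arm)   echo "arm-unknown-linux-gnueabihf" ;;
    *)     echo "unsupported arch: $1" >&2; return 1 ;;
  esac
}

# The release build then just points the Go toolchain at the cross CC, e.g.:
#   CC="$(toolchain_triple arm64)-gcc" CGO_ENABLED=1 GOARCH=arm64 go build ...
toolchain_triple amd64
```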

Regarding Artifactory: I don't think it's a bad product. I just think that not having another thing to maintain is a good thing, but I could be missing something here.

wadells

This comment was marked as outdated.

Added build configuration for aarch64-unknown-linux-gnu in `.cargo/config.toml` and updated `build-arch.sh` script to ensure a complete make process. Updated `ct-ng-configs/arm64.config` to newer versions for kernel, binutils, glibc, and gcc, enabling new features and properly setting flags for better compatibility and security.
@jakule force-pushed the jakule/cross-compile-ct-ng branch from e02114c to 6250d29 on July 16, 2024 04:58