MUSL: Target x86_64_musl resolves incorrectly to x86_64-unknown-linux-gnu toolchain #2726

Closed
marvin-hansen opened this issue Jul 5, 2024 · 19 comments · Fixed by #2829

Comments

@marvin-hansen
Contributor

marvin-hansen commented Jul 5, 2024

Full replication with platform debug: https://github.com/marvin-hansen/musl_cross_compiling

Basically, on linux x86_64, instead of the MUSL Rust toolchain, the Rust x86_64-unknown-linux-gnu toolchain gets selected.

As a result, the binary gets dynamically linked and then fails to run in a Scratch Docker image.
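
To confirm the symptom on a given build output, one option is to check whether the ELF file carries a PT_INTERP program header, which names the dynamic loader and is absent from a fully static binary. The following is a minimal sketch and not part of rules_rust; the function name has_interpreter is made up for illustration, and it only handles 64-bit little-endian ELF files.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def has_interpreter(elf: bytes) -> bool:
    """Return True if a 64-bit little-endian ELF image requests a dynamic loader."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if elf[4] != 2 or elf[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # ELF64 header: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(e_phnum):
        # p_type is the first 4-byte field of each program header entry.
        (p_type,) = struct.unpack_from("<I", elf, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

Calling has_interpreter(open(path, "rb").read()) on the built binary and getting True would indicate a dynamically linked executable, which matches the failure in a scratch image. The file command reports the same distinction.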

This affects me currently because my CI runs on Linux and builds these incorrectly dynamically linked binaries as part of my OCI image pipeline.

The majority of all CI servers out there run on Linux so this issue has real significance.

Also, this affects PR: #2713

The cause of the issue seems to have been identified in issue: #2544

I am open to whatever workaround resolves the toolchain selection issue.

@marvin-hansen
Contributor Author

I'm debugging the rules in an attempt to figure out why the MUSL toolchain resolution for Linux x86_64 fails, but I cannot track down where the actual toolchain resolution happens.

On linux-gnu-x86_64, the rules incorrectly select the rust_linux_x86_64__x86_64-unknown-linux-gnu toolchain instead of the rust_linux_x86_64__x86_64-unknown-linux-musl__stable toolchain. This happens only on x86_64; on ARM it doesn't.

What puzzles me is that the toolchain_repository_hub in rust_register_toolchains seems to have the correct toolchain in store, and yet the wrong one gets selected. I just cannot figure out the actual selection process.

Any ideas or pointers how the toolchain resolution process works?
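
For orientation, here is a toy model of how Bazel's toolchain resolution behaves in general: registered toolchains are tried in order, and the first one whose exec and target constraints are all satisfied by the respective platforms wins. This is a simplified sketch and not rules_rust code; the constraint labels and the Toolchain class are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Toolchain:
    name: str
    exec_compatible_with: frozenset
    target_compatible_with: frozenset


def resolve(toolchains, exec_platform, target_platform):
    """Return the name of the first toolchain whose constraints are all satisfied."""
    for tc in toolchains:
        if (tc.exec_compatible_with <= exec_platform
                and tc.target_compatible_with <= target_platform):
            return tc.name
    return None


# Illustrative constraint labels, not the real ones.
X86, LINUX, MUSL = "cpu:x86_64", "os:linux", "linker:musl"

gnu = Toolchain("x86_64-unknown-linux-gnu",
                frozenset({X86, LINUX}), frozenset({X86, LINUX}))
musl_tagged = Toolchain("x86_64-unknown-linux-musl",
                        frozenset({X86, LINUX}), frozenset({X86, LINUX, MUSL}))

host = frozenset({X86, LINUX})
musl_platform = frozenset({X86, LINUX, MUSL})

# The gnu toolchain only asks for cpu and os, so it also matches a musl platform.
print(resolve([gnu, musl_tagged], host, musl_platform))  # x86_64-unknown-linux-gnu
# With the more specific toolchain considered first, musl is picked for musl targets.
print(resolve([musl_tagged, gnu], host, musl_platform))  # x86_64-unknown-linux-musl
# The plain host platform still falls through to gnu.
print(resolve([musl_tagged, gnu], host, host))           # x86_64-unknown-linux-gnu
```

Under this model, a toolchain with fewer constraints that is considered earlier can shadow a more specific one. That would be consistent with the symptom described here, though it does not show where rules_rust orders its toolchains.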

@marvin-hansen marvin-hansen changed the title MUSL: Target x86_64_musl resolves incorrectly to clang-x86_64-linux MUSL: Target x86_64_musl resolves incorrectly to x86_64-unknown-linux-gnu toolchain Jul 7, 2024
@marvin-hansen
Contributor Author

I've tried the changes mentioned in issue: #2544, but this breaks stuff on the CI so ultimately, I've rolled them all back. Could be that the rules have changed in the past three months or that I did something wrong, but I just could not make this work.

I think one of the few (hopefully) non-breaking ways to solve this issue could be this:

The function rust_toolchain_repository in repository.bzl calls into triple_to_constraint_set to resolve exec and target constraints.

The triple_to_constraint_set function in the triple_mappings.bzl file calls into various utils to build a constraint_set for the given toolchain. However, the abi_to_constraints util can be used to add a custom constraint to the MUSL case since it gets the abi, architecture, and system as arguments, hence it's relatively easy to discern the MUSL from the non-MUSL toolchain.

Specifically, something like:

def abi_to_constraints(abi, *, arch = None, system = None):
 
    if abi == "musl" and arch == "x86_64":
        return ["//rust/platform/constraints:musl_on"]
...

The custom constraint could be something like:

package(default_visibility = ["//visibility:public"])

constraint_value(
    name = "musl_on",
    constraint_setting = ":use_musl",
)

constraint_value(
    name = "musl_off",
    constraint_setting = ":use_musl",
)

constraint_setting(
    name = "use_musl",
    default_constraint_value = ":musl_off",
)

This doesn't work yet, or at least I have not figured out how to make a custom constraint set accessible when the function is called.

Another way could be to use the triple function in the triple.bzl file to set a boolean flag, e.g. musl_static, in the struct that it returns and then use this flag downstream. However, this may conflict with the triple_to_constraint_set resolution, whereas the other way around seems relatively safe AFAICT.
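
The flag-based idea could look roughly like the sketch below. It is written in Python for illustration and is not the actual triple function from triple.bzl; the Triple type, the parse_triple name, and the musl_static field are hypothetical.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    arch: str
    vendor: str
    system: str
    abi: Optional[str]
    musl_static: bool  # hypothetical flag, not present in rules_rust


def parse_triple(triple: str) -> Triple:
    """Split a target triple and flag musl ABIs (musl, musleabi, musleabihf)."""
    parts = triple.split("-")
    if len(parts) == 4:
        arch, vendor, system, abi = parts
    elif len(parts) == 3:
        arch, vendor, system = parts
        abi = None
    else:
        raise ValueError("unsupported triple in this sketch: " + triple)
    is_musl = abi is not None and abi.startswith("musl")
    return Triple(arch, vendor, system, abi, is_musl)
```

Downstream code could then branch on parse_triple(t).musl_static instead of re-deriving the ABI from the string.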

@illicitonion
Collaborator

Thanks for all the digging you've been doing here!

I've been experimenting with the fix in #2731 - it's not quite green on CI yet, but hopefully will be soon. I'm not super available the next few days, but can hopefully finish this off soon (but feel free to pick it up if you're interested!)

The problem diagnosis in #2544 is accurate, but I think there are a few corners of the fix still to work out.

One part of the fix is that we probably want to rename extra_target_triples to just target_triples - if it's set, we don't want to register a default target triple.
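
The difference between the two semantics can be sketched as follows. This is a Python illustration only: both function names are hypothetical, and the model of the existing behaviour (the default triple is always registered alongside the extras) is an assumption inferred from the attribute name.

```python
def with_extra_target_triples(exec_triple, extra_target_triples=()):
    """Assumed existing semantics: the exec triple is always a target as well."""
    targets = [exec_triple]
    for triple in extra_target_triples:
        if triple not in targets:
            targets.append(triple)
    return targets


def with_target_triples(exec_triple, target_triples=None):
    """Proposed semantics: if target_triples is set, no default target is added."""
    if target_triples:
        return list(target_triples)
    return [exec_triple]


host = "x86_64-unknown-linux-gnu"
musl = "x86_64-unknown-linux-musl"

print(with_extra_target_triples(host, [musl]))  # ['x86_64-unknown-linux-gnu', 'x86_64-unknown-linux-musl']
print(with_target_triples(host, [musl]))        # ['x86_64-unknown-linux-musl']
print(with_target_triples(host))                # ['x86_64-unknown-linux-gnu']
```

With the proposed semantics, listing only the musl triple would leave no gnu toolchain registered for that repository set.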

@marvin-hansen
Contributor Author

Thank you, that's helpful.

Thing is, I'm hacking for the first time on a Bazel rule set and the implementation details of this one are a steep learning curve.

That said, if you could get the fix in #2731 merged into main, I can take it from there and fix up the Bzlmod musl example.

@marvin-hansen
Contributor Author

Alright,

I've cloned your PR #2731 and started working on the Bzlmod MUSL example to see if I can make it work with your fix. In a nutshell, your solution makes sense: making the exec and target platforms explicit is a coherent, first-principles approach and fundamentally a good thing.

However, I'm seeing errors for both MUSL targets in my first try config:

No matching toolchains found for types @@rules_rust~//rust:toolchain_type.

That probably indicates that the Bzlmod setup isn't correct yet, since the WORKSPACE config is working fine, but I want to share this anyway. I'll keep working on this one in the meantime.

Here is what I did in my MODULE:

  1. Registered the Rust toolchain as is, without any special configuration:
# Use local rules
bazel_dep(
    name = "rules_rust",
    version = "0.0.0",
)
local_path_override(
    module_name = "rules_rust",
    path = "../../..",
)

RUST_EDITION = "2021"
RUST_VERSION = "1.79.0"

rust = use_extension("@rules_rust//rust:extensions.bzl", "rust")
rust.toolchain(edition = RUST_EDITION, versions = [RUST_VERSION])
use_repo(rust, "rust_toolchains")
register_toolchains("@rust_toolchains//:all")
  2. Added all the triplets from your MUSL example via the rust_repository_set extension:
repos = use_extension("@rules_rust//rust:repositories.bzl", "repos")
repos.rust_repository_set(
    name = "darwin_x86_64_to_x86_64_musl_tuple",
    edition = RUST_EDITION,
    exec_triple = "x86_64-apple-darwin",
    # Setting this extra_target_triples allows differentiating the musl case from the non-musl case, in case multiple linux-targeting toolchains are registered.
    extra_target_triples = {
        "x86_64-unknown-linux-musl": [
            "@//build/linker:musl",
            "@platforms//cpu:x86_64",
            "@platforms//os:linux",
        ],
    },
    versions = [RUST_VERSION],
)
repos.rust_repository_set(
    name = "darwin_arm64_to_x86_64_musl_tuple",
    edition = RUST_EDITION,
    exec_triple = "aarch64-apple-darwin",
    extra_target_triples = {
        "x86_64-unknown-linux-musl": [
            "@//build/linker:musl",
            "@platforms//cpu:x86_64",
            "@platforms//os:linux",
        ],
    },
    versions = [RUST_VERSION],
)
....

use_repo(repos, "darwin_x86_64_to_x86_64_musl_tuple")
use_repo(repos, "darwin_x86_64_to_arm64_musl_tuple")
use_repo(repos, "darwin_arm64_to_x86_64_musl_tuple")
use_repo(repos, "darwin_arm64_to_arm64_musl_tuple")
use_repo(repos, "linux_x86_64_to_x86_64_gnu_tuple")
use_repo(repos, "linux_x86_64_to_x86_64_musl_tuple")
use_repo(repos, "linux_x86_64_to_arm64_musl_tuple")
  3. Configured and registered MUSL:
# MUSL toolchain
load_musl_toolchains = use_extension("@toolchains_musl//:repositories.bzl", "load_musl_toolchains")
load_musl_toolchains.config(extra_target_compatible_with = ["@//build/linker:musl"])

toolchains_musl = use_extension("@toolchains_musl//:toolchains_musl.bzl", "toolchains_musl", dev_dependency = True)
register_toolchains("@toolchains_musl//:all")

I am not so sure about the last one. It's the Bzlmod equivalent of this WORKSPACE snippet:

load("@musl_toolchains//:repositories.bzl", "load_musl_toolchains")

# Setting this extra_target_triples allows differentiating the musl case from the non-musl case, in case multiple linux-targeting toolchains are registered.
load_musl_toolchains(extra_target_compatible_with = ["@//linker_config:musl"])

load("@musl_toolchains//:toolchains.bzl", "register_musl_toolchains")

register_musl_toolchains()

@pnathan

pnathan commented Jul 30, 2024

Hi, I just wanted to chime in and say that I ran full tilt into this issue when trying to build a static binary for loading into my scratch OCI images.

The goal I have is to mirror this capability:

# somewhat cargo-culted from https://www.21analytics.ch/blog/docker-from-scratch-for-rust-applications/

FROM docker.io/library/rust:1.79-alpine as builder
# set to release on release.
ARG LEVEL="cicd"
RUN apk add --no-cache musl-dev  openssl-dev openssl-libs-static pkgconf libpq-dev

# Set `SYSROOT` to a dummy path (default is /usr) because pkg-config-rs *always*
# links those located in that path dynamically but we want static linking, c.f.
# https://github.com/rust-lang/pkg-config-rs/blob/54325785816695df031cef3b26b6a9a203bbc01b/src/lib.rs#L613
ENV SYSROOT=/dummy

# The env var tells pkg-config-rs to statically link libpq.
ENV LIBPQ_STATIC=1

WORKDIR /wd
# copy . /wd/
RUN mkdir -p /wd/project


COPY Cargo.toml /wd/Cargo.toml
COPY Cargo.lock /wd/Cargo.lock
COPY .cargo /wd/.cargo
COPY vendor/ /wd/vendor
COPY project/ /wd/project/

# test should yield a binary in target/$LEVEL
RUN cargo test --profile $LEVEL --target-dir target --offline
############################################################################
#  Final stage used.
FROM scratch
ARG LEVEL="cicd"
ARG version=unknown
ARG release=unreleased
LABEL name="project" 

WORKDIR /opt
COPY --from=builder /tmp/project /opt/project
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=builder /wd/target/$LEVEL/project-server /opt/project-server
ENTRYPOINT ["/opt/project-server"]

The end result here is a container with essentially 2 core files: /opt/project-server and /etc/ssl/certs/ca-certificates.crt, along with any other assets that came along the way.

@marvin-hansen
Contributor Author

@pnathan While it's possible to do what you are trying to do with rules_rust, you will get a dynamically linked binary on X86 / Linux instead of a statically linked one at least until the drafted solution has been completed and merged.

Currently, you can only move forward with distroless or a custom base image that provides libc.

I've been stuck on this issue for a while now, and it's getting increasingly difficult to work around libc to meet security targets.

@lalten

lalten commented Aug 15, 2024

Also running into this.

...
bazel_dep(name = "rules_rust", version = "0.49.3")
rust = use_extension("@rules_rust//rust:extensions.bzl", "rust")
rust.toolchain(
    edition = "2021",
    versions = ["1.80.0"],
    extra_target_triples = [
        "aarch64-unknown-linux-musl",
        "x86_64-unknown-linux-musl",
    ],
)
use_repo(rust, "rust_toolchains")
bazel_dep(name = "toolchains_musl", version = "0.1.17")
...
 in function `std::sys::pal::unix::os::glibc_version':
          /rustc/.../library/std/src/sys/pal/unix/os.rs:777: undefined reference to `gnu_get_libc_version'

I'm confused about the state of musl support in rules_rust. Should this be working?
https://github.com/bazelbuild/rules_rust/tree/bef8d2d/examples/musl_cross_compiling is there
But also there is

# TODO: This ignores musl. Longer term what does Bazel think about musl?
"linux": ["-ldl", "-lpthread"],

@lalten

lalten commented Aug 15, 2024

I applied the rust/extensions.bzl and rust/repositories.bzl patches from #2731 and that seems to make the build work... more?
I'm now hitting "can't find crate for darling_macro" and "can't find crate for thiserror_impl". Not sure what causes that; it might be unrelated.

@marvin-hansen
Contributor Author

@lalten Were you able to make this work?

@lalten

lalten commented Aug 19, 2024

@marvin-hansen
Contributor Author

I see,
Let me clone your repo and see if I can fix the missing crate issue.
Looking at your Module file, there might be an issue with dependencies from workspace members.

@marvin-hansen
Contributor Author

@lalten

I am working on a Mac, Apple Silicon, therefore I used OrbStack to create an ARM64 Linux VM and exec into that, installed Rust, a C Compiler, and Bazel.

Both bazel build and bazel test pass on my machine.

I cannot reproduce the issue you are reporting. In the VM, this builds.

[Screenshot 2024-08-19 at 5:35:34 PM]

@marvin-hansen
Contributor Author

Okay, I see now that you placed the repro in a different branch. I built the main branch without the MUSL config.

Now, the cross-compile branch throws 99 errors. For one, there is a CC toolchain resolution error, and then the test stumbles across a number of sys dependencies.

May I ask you a few questions about your project setup:

  1. Which Linux distro are you using?
  2. Are you building on an Intel or Arm CPU? (This matters a lot for system dependencies)
  3. Are you using gcc, llvm or any other C compiler by default?

I am getting a lot of errors, but not yet the one you are reporting so please help me a bit to replicate the issue.

@lalten

lalten commented Aug 19, 2024

What matters for that project is that it builds on GitHub Actions hosted ubuntu-latest (i.e. x86_64) runners. I need musl to be able to compile "truly static" binaries. I'd be fine with any compiler, but being able to just bazel_dep(name = "toolchains_musl", version = "0.1.17") to get a gcc set up to use musl is a huge win.

@marvin-hansen
Contributor Author

Here is the tricky thing, the original bug only triggers on Linux / X86, which is exactly your GH runner.
That hit me on my CI as well because most CI runners run on Linux / X86...

I have resumed working on investigating a feasible workaround in my repro using a Linux / X86 VM:

https://github.com/marvin-hansen/musl_cross_compiling

I am trying to verify whether your patch actually solves that issue.

As for the dependencies, I generally recommend using either direct dependencies or vendoring, as it removes a number of pitfalls that pop up when using the macros that rely on Cargo.toml. In your case, I suggest keeping the Cargo.toml as is, but changing the Bazel deps to direct dependencies to see if the issue of missing crates still occurs.

https://bazelbuild.github.io/rules_rust/crate_universe_bzlmod.html

@marvin-hansen
Contributor Author

@lalten The static compile seems to be working on Linux. I still get an error from the zerocopy crate, but this will not affect you as you don't use it.

Take a look at the config in the repro that also showcases direct dependencies.
https://github.com/marvin-hansen/musl_cross_compiling

Hope that helps you to fix your CI

github-merge-queue bot pushed a commit that referenced this issue Sep 3, 2024
This allows for selecting non-default toolchains where the exec triple
matches the target triple.

This is tested by enabling the musl static linking tests on the Linux
host platform. Before this PR, the test would fail because the -gnu
rather than -musl rust toolchain would end up getting selected. Now,
everything works.

Fixes #2726
@lalten

lalten commented Sep 8, 2024

Thanks for the PR @illicitonion.
It looks like the problem is still not solved in lalten/appimage-runtime-rs#1 with current rules_rust main.
It's still failing with undefined reference to 'gnu_get_libc_version' even though there are only musl triplets.

@marvin-hansen
Contributor Author

marvin-hansen commented Sep 8, 2024

@lalten

I managed to get the hello world example from yesterday's main branch to work with Bzlmod, kinda.

Basically, I moved all the triplets into a WORKSPACE.bzlmod and then it worked. See this example.

https://github.com/marvin-hansen/rules_rust_fork/tree/bzlmod-musl/examples/musl_cross_compiling

If it does not work with the current main, go back one or two commits.

Because the WORKSPACE format will be disabled by default this December with Bazel 8, but not every rule set may have transitioned to Bzlmod, the WORKSPACE.bzlmod file serves as a bridge between the two until everything has transitioned. It is the best available alternative right now.

There are currently some other issues with MUSL that block the 0.50 release, so it may take a bit of time until things get released. In the meantime, git patching and WORKSPACE.bzlmod hopefully do the trick.
