Wild is a linker with the goal of being very fast for iterative development.
The plan is to eventually make it incremental, however that isn't yet implemented. It is however already pretty fast even without incremental linking.
For production builds, it's recommended to use a more mature linker like GNU ld or LLD.
During development, if you'd like faster warm build times, then you could give Wild a try. It's at the point now where it should be usable for development purposes provided you're developing on x86-64 Linux. If you hit any issues, please file a bug report.
To install a pre-built binary, you can copy and paste the command from the releases
page. Alternatively, you can download the tarball
and manually copy the wild
binary somewhere on your path.
cargo binstall wild-linker
cargo install --locked wild-linker
To build and install the latest, unreleased code:
cargo install --locked --bin wild --git https://github.com/davidlattimore/wild.git wild-linker
If you'd like to use Wild as your default linker for building Rust code, you can put the following
in ~/.cargo/config.toml
.
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=--ld-path=wild"]
Mold is already very fast, however it doesn't do incremental linking and the author has stated that they don't intend to. Wild doesn't do incremental linking yet, but that is the end-goal. By writing Wild in Rust, it's hoped that the complexity of incremental linking will be achievable.
The following platforms / architectures are currently supported:
- x86-64 on Linux
- ARM64 on Linux
The following is working with the caveat that there may be bugs:
- Output to statically linked, non-relocatable binaries
- Output to statically linked, position-independent binaries (static-PIE)
- Output to dynamically linked binaries
- Output to shared objects (.so files)
- Rust proc-macros, when linked with Wild work
- Most of the top downloaded crates on crates.io have been tested with Wild and pass their tests
- Debug info
Lots of stuff. Here are some of the larger things that aren't yet done, roughly sorted by current priority:
- Incremental linking
- Support for more architectures
- Support for a wider range of linker flags
- Linker scripts
- Mac support
- Windows support
- LTO
Install readelf
, then run:
readelf -p .comment my-executable
Look for a line like:
Linker: Wild version 0.1.0
Or if you don't want to install readelf, you can probably get away with:
strings my-executable | grep 'Linker:'
It's somewhat of a tradition for linkers to end with the letters "ld". e.g. "GNU ld, "gold", "lld", "mold". Since the end-goal is for the linker to be incremental, an "I" is added. Let's say the "W" stands for "Wild", since recursive acronyms are popular in open-source projects.
The goal of Wild is to eventually be very fast via incremental linking. However, we also want to be as fast as we can be for non-incremental linking and for the initial link when incremental linking is enabled.
See BENCHMARKING.md for more details on running benchmarks.
All benchmarks are run with output to a tmpfs.
This benchmark was run on David Lattimore's laptop (2020 model System76 Lemur pro), which has 4 cores (8 threads) and 42 GB of RAM.
First, without debug info:
❯ OUT=/home/david/ttt throttle-count hyperfine --warmup 1 -N -n lld-18 './run-with ld.lld --strip-debug' -n mold-2.36-no-fork './run-with mold --no-fork --strip-debug' -n wild-0.4.0-no-fork './run-with wild --no-fork --strip-debug' -n mold-2.36 './run-with mold --strip-debug' -n wild-0.4.0 './run-with wild --strip-debug'
temp: +40.0°C
Benchmark 1: lld-18
Time (mean ± σ): 514.1 ms ± 5.1 ms [User: 1000.1 ms, System: 451.9 ms]
Range (min … max): 507.1 ms … 523.4 ms 10 runs
Benchmark 2: mold-2.36-no-fork
Time (mean ± σ): 388.5 ms ± 5.3 ms [User: 1986.9 ms, System: 436.2 ms]
Range (min … max): 379.0 ms … 396.8 ms 10 runs
Benchmark 3: wild-0.4.0-no-fork
Time (mean ± σ): 244.4 ms ± 3.2 ms [User: 1087.5 ms, System: 313.0 ms]
Range (min … max): 240.3 ms … 251.7 ms 12 runs
Benchmark 4: mold-2.36
Time (mean ± σ): 365.7 ms ± 8.5 ms [User: 9.4 ms, System: 3.1 ms]
Range (min … max): 358.1 ms … 384.9 ms 10 runs
Benchmark 5: wild-0.4.0
Time (mean ± σ): 220.5 ms ± 3.4 ms [User: 2.6 ms, System: 2.0 ms]
Range (min … max): 214.1 ms … 226.1 ms 13 runs
Summary
'wild-0.4.0' ran
1.11 ± 0.02 times faster than 'wild-0.4.0-no-fork'
1.66 ± 0.05 times faster than 'mold-2.36'
1.76 ± 0.04 times faster than 'mold-2.36-no-fork'
2.33 ± 0.04 times faster than 'lld-18'
Throttle pkg: 0 core: 0 ms: 0 temp: +59.0°C
Note, the user and system CPU times for mold and wild when run with default flags are meaningless,
since these linkers fork by default and hyperfine doesn't see the CPU usage of the forked
subprocess. For accurate CPU usage, see the no-fork variants. For later benchmarks, we always
include --no-fork
for these linkers. This makes each of these linkers slower by about 10%.
GNU ld is excluded from the benchmarks because its speed is so totally different to the other linkers that it makes it hard to compare. But for reference, here is the time for GNU ld for this benchmark:
Benchmark 1: GNU-ld-2.38
Time (mean ± σ): 8.414 s ± 0.323 s [User: 7.291 s, System: 1.121 s]
Range (min … max): 7.601 s … 8.668 s 10 runs
Now with debug info:
❯ OUT=/home/david/ttt throttle-count hyperfine --warmup 1 -N -n lld-18 './run-with ld.lld' -n mold-2.36 './run-with mold --no-fork' -n wild-0.4.0 './run-with wild --no-fork'
temp: +42.0°C
Benchmark 1: lld-18
Time (mean ± σ): 11.350 s ± 0.209 s [User: 70.592 s, System: 6.677 s]
Range (min … max): 11.085 s … 11.621 s 10 runs
Benchmark 2: mold-2.36
Time (mean ± σ): 11.826 s ± 0.607 s [User: 73.569 s, System: 5.435 s]
Range (min … max): 11.130 s … 12.721 s 10 runs
Benchmark 3: wild-0.4.0
Time (mean ± σ): 8.800 s ± 0.197 s [User: 49.397 s, System: 8.273 s]
Range (min … max): 8.588 s … 9.136 s 10 runs
Summary
'wild-0.4.0' ran
1.29 ± 0.04 times faster than 'lld-18'
1.34 ± 0.08 times faster than 'mold-2.36'
Throttle pkg: 18454 core: 5916 ms: 328 temp: +77.0°C
Note, despite setting my fans to maximum before the start of the benchmark, I did get some thermal throttling in this run. However, the standard deviations look pretty tight, so I don't think it really invalidated the results.
The big takeaway from this benchmark is that debug info can make your link times really slow, so if you don't need it, turn it off. If you do need it, try to use split debug info or unpacked debug info. The situation with debug info is especially bad for C++ codebases like clang, probably due to header files causing lots of the same information to be repeated.
Without debug info:
❯ OUT=/home/david/ttt throttle-count hyperfine --warmup 1 -N -n lld-18 './run-with ld.lld --strip-debug' -n mold-2.36 './run-with mold --no-fork --strip-debug' -n wild-0.4.0 './run-with wild --no-fork --strip-debug'
temp: +47.0°C
Benchmark 1: lld-18
Time (mean ± σ): 1.485 s ± 0.009 s [User: 2.247 s, System: 0.771 s]
Range (min … max): 1.471 s … 1.499 s 10 runs
Benchmark 2: mold-2.36
Time (mean ± σ): 819.9 ms ± 6.6 ms [User: 3815.0 ms, System: 715.4 ms]
Range (min … max): 810.3 ms … 829.3 ms 10 runs
Benchmark 3: wild-0.4.0
Time (mean ± σ): 476.6 ms ± 46.6 ms [User: 2080.9 ms, System: 514.6 ms]
Range (min … max): 436.7 ms … 553.7 ms 10 runs
Summary
'wild-0.4.0' ran
1.72 ± 0.17 times faster than 'mold-2.36'
3.12 ± 0.31 times faster than 'lld-18'
Throttle pkg: 0 core: 0 ms: 0 temp: +62.0°C
With debug info:
❯ OUT=/home/david/ttt throttle-count hyperfine --warmup 1 -N -n lld-18 './run-with ld.lld' -n mold-2.36 './run-with mold --no-fork' -n wild-0.4.0 './run-with wild --no-fork'
temp: +47.0°C
Benchmark 1: lld-18
Time (mean ± σ): 1.663 s ± 0.022 s [User: 3.244 s, System: 1.024 s]
Range (min … max): 1.633 s … 1.710 s 10 runs
Benchmark 2: mold-2.36
Time (mean ± σ): 1.120 s ± 0.016 s [User: 5.126 s, System: 1.005 s]
Range (min … max): 1.101 s … 1.149 s 10 runs
Benchmark 3: wild-0.4.0
Time (mean ± σ): 646.3 ms ± 10.0 ms [User: 3016.3 ms, System: 807.9 ms]
Range (min … max): 626.0 ms … 657.5 ms 10 runs
Summary
'wild-0.4.0' ran
1.73 ± 0.04 times faster than 'mold-2.36'
2.57 ± 0.05 times faster than 'lld-18'
Throttle pkg: 0 core: 0 ms: 0 temp: +62.0°C
OUT=/run/user/1000/ttt hyperfine --warmup 2 -n lld-19 './run-with ld.lld-19 --strip-debug' -n mold-2.36 './run-with mold --no-fork --strip-debug' -n wild-0.4.0 './run-with wild --no-fork --strip-debug'
Benchmark 1: lld-19
Time (mean ± σ): 1.170 s ± 0.005 s [User: 2.046 s, System: 0.233 s]
Range (min … max): 1.165 s … 1.177 s 10 runs
Benchmark 2: mold-2.36
Time (mean ± σ): 919.3 ms ± 2.7 ms [User: 3120.7 ms, System: 304.4 ms]
Range (min … max): 915.5 ms … 923.7 ms 10 runs
Benchmark 3: wild-0.4.0
Time (mean ± σ): 423.1 ms ± 5.5 ms [User: 1352.9 ms, System: 147.1 ms]
Range (min … max): 419.1 ms … 438.0 ms 10 runs
Summary
wild-0.4.0 ran
2.17 ± 0.03 times faster than mold-2.36
2.76 ± 0.04 times faster than lld-19
Without debug info:
OUT=/run/user/1000/ttt hyperfine -N --warmup 2 -n lld-19 './run-with ld.lld-19 --strip-debug' -n mold-2.36 './run-with mold --no-fork --strip-debug' -n wild-0.4.0 './run-with wild --no-fork --strip-debug'
Benchmark 1: lld-19
Time (mean ± σ): 225.7 ms ± 2.6 ms [User: 321.5 ms, System: 52.5 ms]
Range (min … max): 222.8 ms … 232.7 ms 13 runs
Benchmark 2: mold-2.36
Time (mean ± σ): 152.2 ms ± 0.7 ms [User: 496.0 ms, System: 46.8 ms]
Range (min … max): 150.5 ms … 153.5 ms 19 runs
Benchmark 3: wild-0.4.0
Time (mean ± σ): 78.3 ms ± 0.8 ms [User: 227.9 ms, System: 30.6 ms]
Range (min … max): 76.9 ms … 80.3 ms 38 runs
Summary
wild-0.4.0 ran
1.94 ± 0.02 times faster than mold-2.36
2.88 ± 0.04 times faster than lld-19
With debug info:
OUT=/run/user/1000/ttt hyperfine --warmup 2 -n lld-19 './run-with ld.lld-19' -n mold-2.36 './run-with mold --no-fork' -n wild-0.4.0 './run-with wild --no-fork'
Benchmark 1: lld-19
Time (mean ± σ): 325.0 ms ± 4.5 ms [User: 664.0 ms, System: 66.3 ms]
Range (min … max): 319.2 ms … 333.7 ms 10 runs
Benchmark 2: mold-2.36
Time (mean ± σ): 262.3 ms ± 2.7 ms [User: 890.9 ms, System: 75.3 ms]
Range (min … max): 259.1 ms … 269.6 ms 11 runs
Benchmark 3: wild-0.4.0
Time (mean ± σ): 183.2 ms ± 3.1 ms [User: 588.5 ms, System: 64.5 ms]
Range (min … max): 179.8 ms … 192.3 ms 16 runs
Summary
wild-0.4.0 ran
1.43 ± 0.03 times faster than mold-2.36
1.77 ± 0.04 times faster than lld-19
The following is a cargo test
command-line that can be used to build and test a crate using Wild.
This has been run successfully on a few popular crates (e.g. ripgrep, serde, tokio, rand, bitflags).
It assumes that the "wild" binary is on your path. It also depends on the Clang compiler being
installed, since GCC doesn't allow using an arbitrary linker.
RUSTFLAGS="-Clinker=clang -Clink-args=--ld-path=wild" cargo test
For more information on contributing to wild
see CONTRIBUTING.md.
For a high-level overview of Wild's design, see DESIGN.md.
Many of the posts on David's blog are about various aspects of the Wild linker.
If you'd like to sponsor this work, that would be very much appreciated. The more sponsorship I get the longer I can continue to work on this project full time.
Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Wild by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.