- Copy Ruff's platform-specific allocator configuration to ty:
  - Windows: uses mimalloc
  - Unix-like (x86_64, aarch64, powerpc64, riscv64): uses jemalloc by default
  - Other platforms: uses system allocator
- Add `mimalloc` feature flag to prefer mimalloc over jemalloc on
  platforms that support both allocators
- Add allocator memory usage statistics:
  - Set TY_ALLOCATOR_STATS=1 to print memory stats on exit
  - jemalloc: shows allocated, active, resident, mapped, retained,
    metadata bytes and fragmentation percentage
  - mimalloc: provides guidance for using MIMALLOC_SHOW_STATS=1
- Add tikv-jemalloc-ctl workspace dependency with stats feature
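To make the stats bullet above more concrete, here is a minimal, std-only sketch of how the listed counters could be turned into a fragmentation percentage. It is not the code from this PR: the `AllocatorStats` struct, the `stats_requested` helper, and the fragmentation formula (the share of active bytes that does not back live allocations) are all assumptions made for illustration.

```rust
/// Hypothetical snapshot of the byte counters named in the commit message.
/// In a real build these would be read from jemalloc; here they are plain data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AllocatorStats {
    pub allocated: u64,
    pub active: u64,
    pub resident: u64,
    pub mapped: u64,
    pub retained: u64,
    pub metadata: u64,
}

impl AllocatorStats {
    /// ASSUMPTION: fragmentation is defined as the percentage of active bytes
    /// that are not currently handed out to the application.
    pub fn fragmentation_percent(&self) -> f64 {
        if self.active == 0 {
            // Nothing is active, so there is nothing to fragment.
            return 0.0;
        }
        let unused = self.active.saturating_sub(self.allocated);
        unused as f64 / self.active as f64 * 100.0
    }
}

/// Hypothetical helper: returns true when the value of `TY_ALLOCATOR_STATS`
/// is exactly "1". The raw value is passed in so the check is easy to test.
pub fn stats_requested(value: Option<&str>) -> bool {
    value == Some("1")
}
```

A caller could pass `std::env::var("TY_ALLOCATOR_STATS").ok().as_deref()` to `stats_requested` at exit and print the snapshot only when it returns true.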
Add comprehensive documentation to the allocator module explaining:
- TY_ALLOCATOR_STATS=1 for built-in stats output
- MALLOC_CONF=stats_print:true for detailed jemalloc stats
- stats_print_opts flags for jemalloc (g, m, d, a, b, l, x)
- MIMALLOC_SHOW_STATS=1 and MIMALLOC_VERBOSE=1 for mimalloc

Also update the jemalloc stats output to include a tip about MALLOC_CONF for users wanting more detailed information.
Diagnostic diff on typing conformance tests
No changes detected when running ty on typing conformance tests ✅
Update allocator configuration so that:
- No feature flags: use system allocator (default)
- --features jemalloc: use jemalloc on supported platforms
- --features mimalloc: use mimalloc on all platforms
- Both features enabled: jemalloc on supported platforms, mimalloc elsewhere

This makes it easy to compare allocator performance by simply changing the feature flags during build.
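The feature-flag rules in this commit message can be read as a small decision table. The sketch below encodes that table as a plain function so the four cases are easy to check. The `Allocator` enum and `select_allocator` function are hypothetical names, not items from the PR, and the real selection happens at compile time through `#[cfg(...)]` attributes rather than at runtime.

```rust
/// Hypothetical enum naming the three allocator choices discussed in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Allocator {
    System,
    Jemalloc,
    Mimalloc,
}

/// Mirrors the rules from the commit message:
/// - no features: system allocator
/// - `jemalloc` only: jemalloc where the platform supports it
/// - `mimalloc` only: mimalloc everywhere
/// - both: jemalloc where supported, mimalloc elsewhere
///
/// ASSUMPTION: with only `jemalloc` enabled on an unsupported platform,
/// the build falls back to the system allocator.
pub fn select_allocator(
    jemalloc_feature: bool,
    mimalloc_feature: bool,
    jemalloc_supported: bool,
) -> Allocator {
    if jemalloc_feature && jemalloc_supported {
        Allocator::Jemalloc
    } else if mimalloc_feature {
        Allocator::Mimalloc
    } else {
        Allocator::System
    }
}
```

Checking jemalloc first is what makes the "both features enabled" case prefer jemalloc on supported platforms while still falling through to mimalloc elsewhere.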
macOS
mimalloc wins consistently across almost all benchmarks, with speedups typically ranging from 1.03× to 1.13× over the system allocator.
Key takeaways:
Benchmark results
Linux glibc Allocator Benchmark Summary
The system allocator (glibc) performs significantly worse on Linux compared to macOS, making the alternative allocators much more impactful.
Key differences from macOS:
Bottom line: On Linux, either jemalloc or mimalloc is a massive win over glibc. The choice between them matters less than on macOS, though mimalloc still has a slight edge overall.
Benchmark results
Windows
mimalloc delivers substantial wins over the Windows system allocator. jemalloc isn't included because it doesn't support Windows.
Benchmark results
Summary
Note: this only compares performance, not memory usage.
Cold peak memory usage
I used homeassistant on macOS and Linux.
macOS
Results
peak memory footprint as reported by
System:
Jemalloc
mimalloc
Linux
Results
System:
Jemalloc:
Mimalloc:
Windows
Pandas
Prefect
Results
Pandas, System
Pandas, mimalloc
Prefect, System
Prefect, mimalloc
Script
$proc = Start-Process "..\ruff\target\release\ty-system.exe" -ArgumentList "check -q pandas typings" -PassThru -NoNewWindow
while (!$proc.HasExited) { $proc.Refresh(); Start-Sleep -Milliseconds 10 }
$proc.Refresh()
Write-Host "Peak: $([math]::Round($proc.PeakWorkingSet64 / 1MB, 2)) MB"
Incremental memory usage is much more difficult to capture, and I found that results change a lot between (manual) runs. It's also difficult to know which number to use in the first place: virtual memory or resident memory? How long to wait for decay to kick in? And so on.
macOS
System:
Jemalloc:
mimalloc:
Linux
Jemalloc does pretty well overall, even using less than the system allocator?
System:
jemalloc:
mimalloc:
Windows
System, prefect:
mimalloc:
This looks awesome. Thanks for spending so much time on the analysis!! I'll leave it to others to review the code, however; I feel very underqualified there 😆
#[cfg(all(
    not(target_os = "macos"),
    not(target_os = "windows"),
    not(target_os = "openbsd"),
    not(target_os = "aix"),
    not(target_os = "android"),
    any(
        target_arch = "x86_64",
        target_arch = "aarch64",
        target_arch = "powerpc64",
        target_arch = "riscv64"
    )
))]
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
We do something very similar in fd, and had to add more and more exclusions over the years. Might be worth comparing the lists(?).
Unrelated: it might be good to add a comment here and in Cargo.toml noting that the filtering rules need to be kept in sync. We don't test on all of these platforms.
For reference, I believe this is the same as in Ruff (except you're also omitting macOS (intentionally)). In uv, it looks like we use:
#[cfg(all(
not(target_os = "windows"),
not(target_os = "openbsd"),
not(target_os = "freebsd"),
any(
target_arch = "x86_64",
target_arch = "aarch64",
target_arch = "powerpc64"
)
))]
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
Yes, it's the same as Ruff, but excluding macOS for the reasons described in the PR summary.
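Since this thread is about keeping platform exclusion lists in sync, it may help to restate the ty `cfg` predicate from the snippet under review as an ordinary function that can be unit-tested. This is only an illustrative sketch: `jemalloc_supported` is a hypothetical helper, and it assumes the OS and architecture strings match the values used by `target_os` and `target_arch`.

```rust
/// Hypothetical runtime mirror of the `cfg` predicate in the reviewed snippet:
/// jemalloc is used unless the OS is excluded, and only on listed architectures.
/// ASSUMPTION: `os` and `arch` use the same spellings as `target_os` / `target_arch`.
pub fn jemalloc_supported(os: &str, arch: &str) -> bool {
    // Operating systems excluded by the `not(target_os = ...)` clauses.
    const EXCLUDED_OS: [&str; 5] = ["macos", "windows", "openbsd", "aix", "android"];
    // Architectures accepted by the `any(target_arch = ...)` clause.
    const SUPPORTED_ARCH: [&str; 4] = ["x86_64", "aarch64", "powerpc64", "riscv64"];

    !EXCLUDED_OS.contains(&os) && SUPPORTED_ARCH.contains(&arch)
}
```

Calling it with `std::env::consts::OS` and `std::env::consts::ARCH` gives the answer for the current build target. A table like this does not replace the `cfg` attribute, but it makes differences between lists (for example, uv additionally excluding `freebsd` and omitting `riscv64`) easy to spot in a test.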
Just as a note, we split this into a separate crate in uv a while back to improve developer compile times (https://github.com/astral-sh/uv/tree/main/crates/uv-performance-memory-allocator / astral-sh/uv#7686)
* origin/main:
  Fluent formatting of method chains (#21369)
  [ty] Avoid stack overflow when calculating inferable typevars (#21971)
  [ty] Add "qualify ..." code fix for undefined references (#21968)
  [ty] Use jemalloc on linux (#21975)
  Update MSRV to 1.90 (#21987)
  [ty] Improve check enforcing that an overloaded function must have an implementation (#21978)
  Update actions/checkout digest to 8e8c483 (#21982)
  [ty] Use `ParamSpec` without the attr for inferable check (#21934)
  [ty] Emit diagnostic when a type variable with a default is followed by one without a default (#21787)
FWIW, I've seen substantial RSS usage on macOS (5 GB+) for one of my repositories. Use
Summary
After reviewing the results in the thread below, I propose using jemalloc on Linux only and considering a custom allocator on Windows separately.
Linux
I don't expect this to move the needle on our benchmarks as the improvements mainly show when running ty on many cores.
Windows:
It might be worth using mimalloc in the future but this certainly requires a more in-depth analysis.
macOS:
Part of astral-sh/ty#644
Test Plan
See the following comments on this PR