
perf(aqua): use phf for baked registry lookups #9763

Merged
jdx merged 6 commits into jdx:main from risu729:perf/phf-registry-lookups
May 11, 2026

Conversation

@risu729

@risu729 risu729 commented May 10, 2026

Contributor

Summary

  • Rebased onto current main and narrowed this PR to the baked aqua standard registry lookup path only.
  • Generate the baked aqua package and alias lookup tables as static phf::Map values instead of runtime LazyLock<HashMap<...>> values.
  • Use phf_codegen in build.rs because the map entries are generated from the vendored aqua registry at build time.
  • Keep build-time diagnostics for malformed baked aqua rows, duplicate canonical package IDs, package filename hash collisions, and alias collisions.
  • Moved the mise registry PHF lookup change to the separate, non-stacked draft PR perf(registry): use phf for mise registry lookups #9769.

Performance impact

  • Before this PR, the baked aqua package map and alias map were lazy runtime HashMaps. The first baked-registry lookup initialized the relevant LazyLock, allocated a heap table, and inserted all generated entries.
  • After this PR, those lookup tables are generated as PHF tables at build time. Runtime lookup no longer builds or allocates the HashMap; the process uses static read-only map metadata plus the existing baked package byte slices.
  • The large baked aqua package blobs are not decoded or copied at startup. They were already embedded with include_bytes! before this PR, and they remain static binary data now. The OS can still demand-page those read-only sections as they are touched.
  • So this is not an across-the-board warmed lookup speedup. Warmed HashMap and PHF lookup are close in the microbenchmark. The clearer win is removing first-use map construction and runtime heap allocation for the baked registry maps.
  • In the benchmark below, the old package-file map would allocate an estimated 115.5 KiB or more of heap for the bucket/control storage of its 2,179-entry HashMap after first use. The alias map is smaller but followed the same lazy-allocation pattern. The baked package bytes themselves are unchanged.
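The first-use cost described above comes from the `LazyLock` pattern. The std-only sketch below is illustrative, not the mise source: the static name `FILES`, the `lookup` helper, the two entries, and the `INIT_COUNT` counter are all hypothetical, and the counter exists only to make the lazy initialization visible. Nothing is built until the first lookup, at which point the whole `HashMap` is allocated and filled; a generated `phf::Map` static has no such initializer to run.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

// Counts how many times the lazy initializer has run (illustration only).
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for the old generated table: package ID -> baked bytes.
static FILES: LazyLock<HashMap<&'static str, &'static [u8]>> = LazyLock::new(|| {
    INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    // The heap allocation for buckets and control bytes happens here, on first access.
    HashMap::from([
        ("cli/cli", b"baked cli/cli bytes".as_slice()),
        ("astral-sh/uv", b"baked astral-sh/uv bytes".as_slice()),
    ])
});

// Every lookup goes through the LazyLock; only the first one pays for construction.
fn lookup(id: &str) -> Option<&'static [u8]> {
    FILES.get(id).copied()
}

fn main() {
    println!("before any lookup: {} init(s)", INIT_COUNT.load(Ordering::SeqCst));
    let found = lookup("cli/cli").is_some();
    println!("found={found}, after first lookup: {} init(s)", INIT_COUNT.load(Ordering::SeqCst));
    let _ = lookup("astral-sh/uv");
    println!("after second lookup: {} init(s)", INIT_COUNT.load(Ordering::SeqCst));
}
```

Warmed lookups through either table cost roughly the same, which matches the benchmark; the difference is whether that initializer exists at all.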

phf_codegen

  • phf_codegen is appropriate for this PR because the keys and values are known only after build-script parsing/codegen, not as hand-written Rust source.
  • The dependency reuses the already locked 0.11.x phf crates; phf_codegen is listed in cargo-machete ignores because it is used from the root build script.
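Why the keys must be known before the table can exist: a perfect hash is computed from the complete key set, so that every key gets its own slot and a lookup is one hash plus one key comparison. The toy below illustrates that idea with a brute-force seed search; it is not the algorithm phf uses (phf implements CHD, "compress, hash, and displace", which scales to thousands of keys), and `ToyPhf`, `seeded_hash`, and the entries are hypothetical. In the PR, the equivalent search runs inside build.rs via phf_codegen and the result is emitted as static data, so the binary never performs it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash `key` together with `seed`; changing the seed moves every key to a new slot.
fn seeded_hash(seed: u64, key: &str) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    key.hash(&mut h);
    h.finish()
}

/// Toy perfect-hash table (hypothetical; not the phf crate's layout, and not minimal).
struct ToyPhf {
    seed: u64,
    slots: Vec<Option<(&'static str, &'static str)>>,
}

impl ToyPhf {
    /// "Build time": search for a seed under which no two keys share a slot.
    /// Keys must be unique; with a duplicate key no seed works and this never returns.
    fn build(entries: &[(&'static str, &'static str)]) -> ToyPhf {
        let n = entries.len().max(1) * 2; // 2x slots keeps the brute-force search short
        for seed in 0u64.. {
            let mut slots = vec![None; n];
            let collision_free = entries.iter().all(|&(k, v)| {
                let i = (seeded_hash(seed, k) % n as u64) as usize;
                slots[i].replace((k, v)).is_none()
            });
            if collision_free {
                return ToyPhf { seed, slots };
            }
        }
        unreachable!("ran out of seeds")
    }

    /// "Run time": one hash, one slot read, one key comparison, no allocation.
    fn get(&self, key: &str) -> Option<&'static str> {
        let i = (seeded_hash(self.seed, key) % self.slots.len() as u64) as usize;
        match self.slots[i] {
            Some((k, v)) if k == key => Some(v),
            _ => None, // empty slot, or a different key owns this slot: a miss
        }
    }
}

fn main() {
    let table = ToyPhf::build(&[("cli/cli", "gh"), ("astral-sh/uv", "uv"), ("pnpm/pnpm", "pnpm")]);
    println!("seed {} -> {:?}", table.seed, table.get("astral-sh/uv"));
}
```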

Benchmark

The previous PR body had benchmark results from temporary commits 5c39d6635 / fe8be4c0e, but those measured the earlier stacked PR that also changed the mise registry lookup table. This PR is now aqua-only, so I reran a targeted baked aqua registry benchmark.

The benchmark was intentionally committed and then removed so the PR history keeps the source while the final diff stays clean:

  • Added in d26a809f9 (test(aqua): add temporary registry lookup benchmark).
  • Removed in 23553c1fc (test(aqua): remove temporary registry lookup benchmark).

Command:

MISE_BENCH_ROUNDS=10000 cargo run --release --bin bench_aqua_registry_lookup

Representative run from this environment:

entries: 2179
rounds: 10000
lookups/sample: 21790000
hashmap capacity: 3584
estimated runtime HashMap heap: >= 115.5 KiB
| lookup | ns/lookup | total ms | checksum |
|---|---|---|---|
| phf | 69.50 | 1514.35 | 107630400000 |
| HashMap warmed | 71.07 | 1548.63 | 107630400000 |

| first-use path | total ms | checksum |
|---|---|---|
| HashMap build + first lookup | 0.092135 | 16124 |
| phf first lookup | 0.000058 | 10361 |

Interpretation: warmed lookup is effectively comparable for this table and can vary run-to-run. The meaningful behavior change is that the baked registry no longer pays the first-use HashMap construction/allocation cost.
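The `hashmap capacity: 3584` and `>= 115.5 KiB` lines in the output above can be reproduced by hand. The sketch below assumes std's HashMap follows hashbrown's sizing policy (power-of-two bucket counts, at most 7/8 of buckets occupied) and that each entry is a `(&'static str, &'static [u8])` pair, i.e. 32 bytes on a 64-bit target; both are assumptions, and the helper names are hypothetical. Under them, 2,179 entries need 4,096 buckets with a usable capacity of 3,584, and capacity × (32 + 1 control byte) = 118,272 bytes = 115.5 KiB. Counting every bucket rather than only usable capacity gives a larger figure, which is consistent with 115.5 KiB being a lower bound.

```rust
use std::mem::size_of;

/// Assumed entry type of the old map: package ID -> baked bytes.
type Entry = (&'static str, &'static [u8]);

/// Bucket count for `cap` entries, assuming hashbrown's policy.
fn buckets_for(cap: usize) -> usize {
    if cap < 8 {
        return if cap < 4 { 4 } else { 8 };
    }
    (cap * 8 / 7).next_power_of_two()
}

/// Usable capacity (what `HashMap::capacity()` reports) for a bucket count.
fn capacity_for(buckets: usize) -> usize {
    if buckets <= 8 { buckets - 1 } else { buckets / 8 * 7 }
}

/// Lower bound: one entry slot plus one control byte per unit of capacity.
fn lower_bound_bytes(entries: usize) -> usize {
    capacity_for(buckets_for(entries)) * (size_of::<Entry>() + 1)
}

/// Bucket-based estimate: every bucket has a slot and a control byte, plus a
/// trailing control group (16 bytes assumed, the SSE2 group width).
fn bucket_bytes(entries: usize) -> usize {
    buckets_for(entries) * (size_of::<Entry>() + 1) + 16
}

fn main() {
    let n = 2179;
    let b = buckets_for(n);
    println!("buckets = {b}, capacity = {}", capacity_for(b));
    println!("lower bound = {:.1} KiB", lower_bound_bytes(n) as f64 / 1024.0);
    println!("bucket-based = {:.1} KiB", bucket_bytes(n) as f64 / 1024.0);
}
```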

Tests

  • cargo fmt --all -- --check
  • cargo check -p mise --all-features
  • cargo test -p mise aqua::standard_registry
  • cargo test -p mise aqua::aqua_registry_wrapper
  • cargo test -p aqua-registry

This PR body was generated by an AI coding assistant.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request refactors the tool and Aqua registries to use phf (Perfect Hash Functions) for efficient compile-time lookups and rkyv for binary serialization of Aqua package definitions. The Aqua registry data has been moved to a vendor/ directory, and the aqua-registry crate was simplified by moving baked-in data logic to the main crate. Feedback was provided regarding the deserialize_string_map function, which currently fails to handle null values in YAML, potentially causing deserialization errors.

Comment thread crates/aqua-registry/src/types.rs
@greptile-apps

greptile-apps Bot commented May 10, 2026

Contributor

Greptile Summary

This PR replaces the baked aqua registry's runtime LazyLock<HashMap> lookup tables with phf::Map statics generated by phf_codegen in build.rs, eliminating first-use heap allocation and HashMap construction. It also addresses two previously flagged build-time robustness gaps.

  • aqua_registry_bytes_map_code and aqua_registry_string_map_code now emit phf_codegen::Map output instead of HashMap::from([...]), and AQUA_STANDARD_REGISTRY_FILES / AQUA_STANDARD_REGISTRY_ALIASES are declared as phf::Map statics directly.
  • aqua_package_registries now emits a cargo:warning for any row skipped due to a missing canonical ID, and returns a hard build error on duplicate canonical IDs.
  • aqua_registry_aliases_code now uses BTreeMap (replacing HashMap+sort) for deterministic alias ordering and returns a hard build error on alias collisions instead of silently overwriting.
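The build-time checks listed above can be sketched with std collections alone. This is a hypothetical reconstruction, not the PR's build.rs: the `Row` shape and `collect_rows` are invented for illustration. Rows without a canonical ID are skipped with a `cargo:warning=` line (which cargo surfaces as a warning when printed from a build script), duplicate canonical IDs and conflicting aliases return an error that build.rs could propagate out of `main` to fail the build, and the BTree collections iterate in sorted order so the generated code is identical across builds.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical shape of one parsed row from the vendored aqua registry.
struct Row {
    canonical: Option<&'static str>,
    aliases: Vec<&'static str>,
}

type Collected = (BTreeSet<&'static str>, BTreeMap<&'static str, &'static str>);

/// Collects canonical IDs and an alias -> canonical ID table, rejecting collisions.
fn collect_rows(rows: &[Row]) -> Result<Collected, String> {
    let mut ids = BTreeSet::new();
    let mut aliases = BTreeMap::new();
    for (i, row) in rows.iter().enumerate() {
        let Some(id) = row.canonical else {
            // Printed from build.rs, this stdout line becomes a cargo build warning.
            println!("cargo:warning=skipping aqua registry row {i}: no canonical package ID");
            continue;
        };
        if !ids.insert(id) {
            return Err(format!("duplicate canonical package ID: {id}"));
        }
        for &alias in &row.aliases {
            // Error instead of silently overwriting an alias owned by another package.
            if let Some(prev) = aliases.insert(alias, id) {
                if prev != id {
                    return Err(format!("alias {alias} maps to both {prev} and {id}"));
                }
            }
        }
    }
    Ok((ids, aliases))
}

fn main() {
    let rows = vec![
        Row { canonical: Some("cli/cli"), aliases: vec!["gh"] },
        Row { canonical: None, aliases: vec!["orphan"] },
        Row { canonical: Some("astral-sh/uv"), aliases: vec!["uv"] },
    ];
    match collect_rows(&rows) {
        Ok((ids, aliases)) => println!("{} packages, aliases {:?}", ids.len(), aliases),
        Err(e) => eprintln!("build error: {e}"),
    }
}
```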

Confidence Score: 5/5

Safe to merge — the change is a mechanical substitution of two static lookup tables from lazy-initialized HashMaps to compile-time PHF maps, with no change to the public API or lookup semantics.

The generated PHF tables are structurally validated at build time (collision errors abort the build before the binary is produced), the runtime lookup paths in standard_registry.rs are functionally identical, and the existing test suite covers canonical, alias, and path-only lookups. No behavioral regressions are expected.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| build.rs | Replaces HashMap-based codegen for the baked aqua registry with phf_codegen::Map; adds build-time warnings for skipped malformed rows and hard errors for canonical ID and alias collisions; switches alias accumulator from HashMap+sort to BTreeMap for deterministic ordering. |
| src/aqua/standard_registry.rs | Swaps LazyLock statics for phf::Map statics; removes HashMap/LazyLock imports; lookup logic is unchanged apart from the PHF get/deref pattern. |

Reviews (6): Last reviewed commit: "Merge branch 'main' into perf/phf-regist..."

Comment thread src/registry.rs Outdated
Comment thread build.rs Outdated
Comment thread build.rs
@risu729

This comment was marked as outdated.

@risu729 risu729 force-pushed the perf/phf-registry-lookups branch from d7bcc83 to 71fbf89 May 10, 2026 16:16
@risu729 risu729 changed the title perf(registry): use phf for baked lookup tables perf(aqua): use phf for baked registry lookups May 10, 2026
@risu729 risu729 force-pushed the perf/phf-registry-lookups branch from 71fbf89 to 23553c1 May 11, 2026 12:08
@risu729 risu729 marked this pull request as ready for review May 11, 2026 12:22
@jdx jdx enabled auto-merge (squash) May 11, 2026 19:30
@jdx jdx merged commit 39858d6 into jdx:main May 11, 2026
33 checks passed
@risu729 risu729 deleted the perf/phf-registry-lookups branch May 11, 2026 19:58
mise-en-dev added a commit that referenced this pull request May 13, 2026
### 🐛 Bug Fixes

- **(backend)** use runtime paths for backend bin dirs by @risu729 in
[#9606](#9606)
- **(ci)** preserve vendor/aqua-registry/ in PPA publish workflow by
@jdx in [#9782](#9782)
- **(ci)** set UTF-8 locale in e2e Docker image by @jdx in
[#9820](#9820)
- **(ci)** pass UTF-8 locale through to e2e tests by @jdx in
[#9823](#9823)
- **(conda)** dedup repodata by archive identifier instead of URL by
@jdx in [#9831](#9831)
- **(github)** use default shell for credential command by @risu729 in
[#9664](#9664)
- **(settings)** distinguish unset known settings from unknown ones by
@jdx in [#9818](#9818)
- **(upgrade)** remove completed progress jobs to prevent duplicate
output by @jdx in [#9779](#9779)
- **(vfox)** resolve GitHub token lazily inside Lua plugins by @jdx in
[#9816](#9816)

### 🚜 Refactor

- **(config)** separate core and backend tool options by @risu729 in
[#9753](#9753)
- **(schema)** reuse env directive property schemas by @risu729 in
[#9651](#9651)

### 📚 Documentation

- **(aliases)** fix Aliased Versions example and drop stale asdf callout
by @jdx in [#9830](#9830)

### ⚡ Performance

- **(aqua)** use phf for baked registry lookups by @risu729 in
[#9763](#9763)
- **(task)** cache per-file content hashes for
source_freshness_hash_contents by @jdx in
[#9819](#9819)

### 🧪 Testing

- **(e2e)** pin aube to known-good version in npm package_manager test
by @jdx in [#9794](#9794)

### 📦 Registry

- replace unsupported exe options by @risu729 in
[#9587](#9587)
- update pi by @garysassano in
[#9792](#9792)

### Chore

- **(ci)** use non-large runners for release builds by @jdx in
[#9786](#9786)
- **(ci)** compare registry PRs from fork point by @risu729 in
[#9643](#9643)
- **(ci)** make build-copr.sh the single source of truth for COPR
chroots by @jdx in [#9788](#9788)
- **(ci)** use crates.io trusted publishing in release-plz by @jdx in
[#9793](#9793)
- **(ci)** remove autofix.ci workflow by @jdx in
[#9801](#9801)
- **(ci)** restore -large runner for Linux release builds by @jdx in
[#9815](#9815)
- **(ci)** add zizmor workflow for github actions security analysis by
@jdx in [#9804](#9804)
- **(ci)** assert mise run render produces no diff by @jdx in
[#9803](#9803)
- **(copr)** publish EL9 builds via centos-stream+epel-next-9 chroot by
@jdx in [#9787](#9787)

### Ci

- remove pull_request_target workflow by @jdx in
[#9799](#9799)
- remove caching from publishing workflows by @jdx in
[#9800](#9800)

### Security

- reject shell metacharacters in version strings and CI inputs by @jdx
in [#9814](#9814)

## 📦 Aqua Registry Updates

### New Packages (11)

- [`Code-Hex/Neo-cowsay`](https://github.com/Code-Hex/Neo-cowsay)
- [`SonarSource/sonarqube-cli`](https://github.com/SonarSource/sonarqube-cli)
- [`earendil-works/pi`](https://github.com/earendil-works/pi)
- [`hylo-lang/hylo-new`](https://github.com/hylo-lang/hylo-new)
- [`jfernandez/bpftop`](https://github.com/jfernandez/bpftop)
- [`modem-dev/hunk`](https://github.com/modem-dev/hunk)
- [`npm/cli`](https://github.com/npm/cli)
- [`racket/racket/minimal`](https://github.com/racket/racket)
- [`slackapi/slack-cli`](https://github.com/slackapi/slack-cli)
- [`vectordotdev/vector`](https://github.com/vectordotdev/vector)
- [`wasilibs/go-yamllint`](https://github.com/wasilibs/go-yamllint)

### Updated Packages (10)

- [`DataDog/pup`](https://github.com/DataDog/pup)
- [`aquasecurity/trivy`](https://github.com/aquasecurity/trivy)
- [`astral-sh/uv`](https://github.com/astral-sh/uv)
- [`caarlos0/svu`](https://github.com/caarlos0/svu)
- [`cargo-bins/cargo-binstall`](https://github.com/cargo-bins/cargo-binstall)
- [`foundry-rs/foundry`](https://github.com/foundry-rs/foundry)
- [`gastownhall/beads`](https://github.com/gastownhall/beads)
- [`gruntwork-io/terragrunt`](https://github.com/gruntwork-io/terragrunt)
- [`pnpm/pnpm`](https://github.com/pnpm/pnpm)
- [`santosr2/TerraTidy`](https://github.com/santosr2/TerraTidy)
3PeatVR pushed a commit to 3PeatVR/mise that referenced this pull request May 14, 2026
2 participants