perf(aqua): use phf for baked registry lookups#9763
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the tool and Aqua registries to use phf (Perfect Hash Functions) for efficient compile-time lookups and rkyv for binary serialization of Aqua package definitions. The Aqua registry data has been moved to a vendor/ directory, and the aqua-registry crate was simplified by moving baked-in data logic to the main crate. Feedback was provided regarding the deserialize_string_map function, which currently fails to handle null values in YAML, potentially causing deserialization errors.
Greptile SummaryThis PR replaces the baked aqua registry's runtime
Confidence Score: 5/5Safe to merge — the change is a mechanical substitution of two static lookup tables from lazy-initialized HashMaps to compile-time PHF maps, with no change to the public API or lookup semantics. The generated PHF tables are structurally validated at build time (collision errors abort the build before the binary is produced), the runtime lookup paths in standard_registry.rs are functionally identical, and the existing test suite covers canonical, alias, and path-only lookups. No behavioral regressions are expected. No files require special attention. Important Files Changed
Reviews (6): Last reviewed commit: "Merge branch 'main' into perf/phf-regist..." | Re-trigger Greptile |
This comment was marked as outdated.
This comment was marked as outdated.
d7bcc83 to
71fbf89
Compare
71fbf89 to
23553c1
Compare
### 🐛 Bug Fixes - **(backend)** use runtime paths for backend bin dirs by @risu729 in [#9606](#9606) - **(ci)** preserve vendor/aqua-registry/ in PPA publish workflow by @jdx in [#9782](#9782) - **(ci)** set UTF-8 locale in e2e Docker image by @jdx in [#9820](#9820) - **(ci)** pass UTF-8 locale through to e2e tests by @jdx in [#9823](#9823) - **(conda)** dedup repodata by archive identifier instead of URL by @jdx in [#9831](#9831) - **(github)** use default shell for credential command by @risu729 in [#9664](#9664) - **(settings)** distinguish unset known settings from unknown ones by @jdx in [#9818](#9818) - **(upgrade)** remove completed progress jobs to prevent duplicate output by @jdx in [#9779](#9779) - **(vfox)** resolve GitHub token lazily inside Lua plugins by @jdx in [#9816](#9816) ### 🚜 Refactor - **(config)** separate core and backend tool options by @risu729 in [#9753](#9753) - **(schema)** reuse env directive property schemas by @risu729 in [#9651](#9651) ### 📚 Documentation - **(aliases)** fix Aliased Versions example and drop stale asdf callout by @jdx in [#9830](#9830) ### ⚡ Performance - **(aqua)** use phf for baked registry lookups by @risu729 in [#9763](#9763) - **(task)** cache per-file content hashes for source_freshness_hash_contents by @jdx in [#9819](#9819) ### 🧪 Testing - **(e2e)** pin aube to known-good version in npm package_manager test by @jdx in [#9794](#9794) ### 📦 Registry - replace unsupported exe options by @risu729 in [#9587](#9587) - update pi by @garysassano in [#9792](#9792) ### Chore - **(ci)** use non-large runners for release builds by @jdx in [#9786](#9786) - **(ci)** compare registry PRs from fork point by @risu729 in [#9643](#9643) - **(ci)** make build-copr.sh the single source of truth for COPR chroots by @jdx in [#9788](#9788) - **(ci)** use crates.io trusted publishing in release-plz by @jdx in [#9793](#9793) - **(ci)** remove autofix.ci workflow by @jdx in [#9801](#9801) - **(ci)** restore -large runner for Linux release builds by @jdx in [#9815](#9815) - **(ci)** add zizmor workflow for github actions security analysis by @jdx in [#9804](#9804) - **(ci)** assert mise run render produces no diff by @jdx in [#9803](#9803) - **(copr)** publish EL9 builds via centos-stream+epel-next-9 chroot by @jdx in [#9787](#9787) ### Ci - remove pull_request_target workflow by @jdx in [#9799](#9799) - remove caching from publishing workflows by @jdx in [#9800](#9800) ### Security - reject shell metacharacters in version strings and CI inputs by @jdx in [#9814](#9814) ## 📦 Aqua Registry Updates ### New Packages (11) - [`Code-Hex/Neo-cowsay`](https://github.com/Code-Hex/Neo-cowsay) - [`SonarSource/sonarqube-cli`](https://github.com/SonarSource/sonarqube-cli) - [`earendil-works/pi`](https://github.com/earendil-works/pi) - [`hylo-lang/hylo-new`](https://github.com/hylo-lang/hylo-new) - [`jfernandez/bpftop`](https://github.com/jfernandez/bpftop) - [`modem-dev/hunk`](https://github.com/modem-dev/hunk) - [`npm/cli`](https://github.com/npm/cli) - [`racket/racket/minimal`](https://github.com/racket/racket) - [`slackapi/slack-cli`](https://github.com/slackapi/slack-cli) - [`vectordotdev/vector`](https://github.com/vectordotdev/vector) - [`wasilibs/go-yamllint`](https://github.com/wasilibs/go-yamllint) ### Updated Packages (10) - [`DataDog/pup`](https://github.com/DataDog/pup) - [`aquasecurity/trivy`](https://github.com/aquasecurity/trivy) - [`astral-sh/uv`](https://github.com/astral-sh/uv) - [`caarlos0/svu`](https://github.com/caarlos0/svu) - [`cargo-bins/cargo-binstall`](https://github.com/cargo-bins/cargo-binstall) - [`foundry-rs/foundry`](https://github.com/foundry-rs/foundry) - [`gastownhall/beads`](https://github.com/gastownhall/beads) - [`gruntwork-io/terragrunt`](https://github.com/gruntwork-io/terragrunt) - [`pnpm/pnpm`](https://github.com/pnpm/pnpm) - [`santosr2/TerraTidy`](https://github.com/santosr2/TerraTidy)
### 🐛 Bug Fixes - **(backend)** use runtime paths for backend bin dirs by @risu729 in [jdx#9606](jdx#9606) - **(ci)** preserve vendor/aqua-registry/ in PPA publish workflow by @jdx in [jdx#9782](jdx#9782) - **(ci)** set UTF-8 locale in e2e Docker image by @jdx in [jdx#9820](jdx#9820) - **(ci)** pass UTF-8 locale through to e2e tests by @jdx in [jdx#9823](jdx#9823) - **(conda)** dedup repodata by archive identifier instead of URL by @jdx in [jdx#9831](jdx#9831) - **(github)** use default shell for credential command by @risu729 in [jdx#9664](jdx#9664) - **(settings)** distinguish unset known settings from unknown ones by @jdx in [jdx#9818](jdx#9818) - **(upgrade)** remove completed progress jobs to prevent duplicate output by @jdx in [jdx#9779](jdx#9779) - **(vfox)** resolve GitHub token lazily inside Lua plugins by @jdx in [jdx#9816](jdx#9816) ### 🚜 Refactor - **(config)** separate core and backend tool options by @risu729 in [jdx#9753](jdx#9753) - **(schema)** reuse env directive property schemas by @risu729 in [jdx#9651](jdx#9651) ### 📚 Documentation - **(aliases)** fix Aliased Versions example and drop stale asdf callout by @jdx in [jdx#9830](jdx#9830) ### ⚡ Performance - **(aqua)** use phf for baked registry lookups by @risu729 in [jdx#9763](jdx#9763) - **(task)** cache per-file content hashes for source_freshness_hash_contents by @jdx in [jdx#9819](jdx#9819) ### 🧪 Testing - **(e2e)** pin aube to known-good version in npm package_manager test by @jdx in [jdx#9794](jdx#9794) ### 📦 Registry - replace unsupported exe options by @risu729 in [jdx#9587](jdx#9587) - update pi by @garysassano in [jdx#9792](jdx#9792) ### Chore - **(ci)** use non-large runners for release builds by @jdx in [jdx#9786](jdx#9786) - **(ci)** compare registry PRs from fork point by @risu729 in [jdx#9643](jdx#9643) - **(ci)** make build-copr.sh the single source of truth for COPR chroots by @jdx in [jdx#9788](jdx#9788) - **(ci)** use crates.io trusted publishing in release-plz by @jdx in [jdx#9793](jdx#9793) - **(ci)** remove autofix.ci workflow by @jdx in [jdx#9801](jdx#9801) - **(ci)** restore -large runner for Linux release builds by @jdx in [jdx#9815](jdx#9815) - **(ci)** add zizmor workflow for github actions security analysis by @jdx in [jdx#9804](jdx#9804) - **(ci)** assert mise run render produces no diff by @jdx in [jdx#9803](jdx#9803) - **(copr)** publish EL9 builds via centos-stream+epel-next-9 chroot by @jdx in [jdx#9787](jdx#9787) ### Ci - remove pull_request_target workflow by @jdx in [jdx#9799](jdx#9799) - remove caching from publishing workflows by @jdx in [jdx#9800](jdx#9800) ### Security - reject shell metacharacters in version strings and CI inputs by @jdx in [jdx#9814](jdx#9814) ## 📦 Aqua Registry Updates ### New Packages (11) - [`Code-Hex/Neo-cowsay`](https://github.com/Code-Hex/Neo-cowsay) - [`SonarSource/sonarqube-cli`](https://github.com/SonarSource/sonarqube-cli) - [`earendil-works/pi`](https://github.com/earendil-works/pi) - [`hylo-lang/hylo-new`](https://github.com/hylo-lang/hylo-new) - [`jfernandez/bpftop`](https://github.com/jfernandez/bpftop) - [`modem-dev/hunk`](https://github.com/modem-dev/hunk) - [`npm/cli`](https://github.com/npm/cli) - [`racket/racket/minimal`](https://github.com/racket/racket) - [`slackapi/slack-cli`](https://github.com/slackapi/slack-cli) - [`vectordotdev/vector`](https://github.com/vectordotdev/vector) - [`wasilibs/go-yamllint`](https://github.com/wasilibs/go-yamllint) ### Updated Packages (10) - [`DataDog/pup`](https://github.com/DataDog/pup) - [`aquasecurity/trivy`](https://github.com/aquasecurity/trivy) - [`astral-sh/uv`](https://github.com/astral-sh/uv) - [`caarlos0/svu`](https://github.com/caarlos0/svu) - [`cargo-bins/cargo-binstall`](https://github.com/cargo-bins/cargo-binstall) - [`foundry-rs/foundry`](https://github.com/foundry-rs/foundry) - [`gastownhall/beads`](https://github.com/gastownhall/beads) - [`gruntwork-io/terragrunt`](https://github.com/gruntwork-io/terragrunt) - [`pnpm/pnpm`](https://github.com/pnpm/pnpm) - [`santosr2/TerraTidy`](https://github.com/santosr2/TerraTidy)
Summary
mainand narrowed this PR to the baked aqua standard registry lookup path only.phf::Mapvalues instead of runtimeLazyLock<HashMap<...>>values.phf_codegeninbuild.rsbecause the map entries are generated from the vendored aqua registry at build time.Performance impact
HashMaps. The first baked-registry lookup initialized the relevantLazyLock, allocated a heap table, and inserted all generated entries.HashMap; the process uses static read-only map metadata plus the existing baked package byte slices.include_bytes!before this PR, and they remain static binary data now. The OS can still demand-page those read-only sections as they are touched.HashMapand PHF lookup are close in the microbenchmark. The clearer win is removing first-use map construction and runtime heap allocation for the baked registry maps.HashMapbucket/control storage after first use. The alias map is smaller but had the same lazy-allocation pattern. The baked package bytes themselves are unchanged.phf_codegen
phf_codegenis appropriate for this PR because the keys and values are known only after build-script parsing/codegen, not as hand-written Rust source.phf_codegenis listed in cargo-machete ignores because it is used from the root build script.Benchmark
The previous PR body had benchmark results from temporary commits
5c39d6635/fe8be4c0e, but those measured the earlier stacked PR that also changed the mise registry lookup table. This PR is now aqua-only, so I reran a targeted baked aqua registry benchmark.The benchmark was intentionally committed and then removed so the PR history keeps the source while the final diff stays clean:
d26a809f9(test(aqua): add temporary registry lookup benchmark).23553c1fc(test(aqua): remove temporary registry lookup benchmark).Command:
Representative run from this environment:
Interpretation: warmed lookup is effectively comparable for this table and can vary run-to-run. The meaningful behavior change is that the baked registry no longer pays the first-use
HashMapconstruction/allocation cost.Tests
cargo fmt --all -- --checkcargo check -p mise --all-featurescargo test -p mise aqua::standard_registrycargo test -p mise aqua::aqua_registry_wrappercargo test -p aqua-registryThis PR body was generated by an AI coding assistant.