
perf(aqua): bake rkyv aqua package blobs #9535

Merged
jdx merged 24 commits into jdx:main from risu729:perf/aqua-compiled-baked-registry
May 10, 2026

Conversation

risu729 commented May 2, 2026 (Contributor)

Motivation

  • I am planning to add support for custom aqua registries.
  • mise currently looks up aqua registry entries by pkgs/<owner>/<repo>/registry.yaml from a git-cloned registry repository, but cloning the entire repository is heavier than necessary for registry lookup.
  • aqua does not support that per-package repository layout as the public registry format. The supported registry input is a single merged YAML file.
  • If a user pins a specific tag of the standard registry, that merged YAML file is around 3 MB, which is too large to parse on every registry lookup.
  • aqua converts YAML registries to a single adjacent JSON cache file, which is enough for aqua because it mostly reads registry metadata while installing new tools.
  • mise needs registry metadata in hotter paths because it may need aqua registry data to resolve versions, including from commands like hook-env.
  • For mise, custom registries need a compiled cache that parses the large YAML only when it changes, then resolves a package id or alias and decodes one package entry at runtime.
  • This PR changes the baked-in registry toward that lookup model before adding custom-registry support.

Summary

  • Move the vendored merged aqua registry files to vendor/aqua-registry/, with the bundled registry stored as registry.yml alongside its upstream LICENSE and generated metadata.json.
  • Generate baked aqua registry entries by parsing the bundled registry YAML into typed AquaPackage values from the existing aqua-registry crate, then serializing each typed package as an rkyv blob.
  • Move the rkyv package encode/decode helpers into aqua-registry so build-time baking and runtime lookup use the same codec. The implementation enables rkyv's unaligned mode so include_bytes! package blobs can decode without copying.
  • Skip runtime-only fields with rkyv::with::Skip so archived data stays focused on parsed package data.
  • Keep serde derives to the YAML parsing surface only; rkyv derives handle the baked binary package codec.
  • Tighten RegistryFetcher around fetch_package() only; registry YAML parsing is now an implementation detail of fetchers that still read cloned per-package files.
  • Decode only the requested baked package blob at lookup time, preserving lazy package loading and alias handling.
  • Remove the temporary registry format benchmark example now that the format decision for this PR is reflected in the implementation.
  • Detect generated package filename hash collisions at build time and include package context in baked blob decode errors.
  • Add AquaPackageType::GoBuild after typed build-time parsing exposed go_build entries in the bundled registry, and reject it like go_install in the aqua backend.
  • Keep build-script error handling direct now that the root build script returns Result<()>.
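
The build-time collision check mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code in build.rs: the 64-bit FNV-1a variant, the 16-hex-digit stem format, and the helper name `assign_stems` are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a over the package id bytes.
/// Assumption: the hash width and variant used by the real build script may differ.
fn fnv1a_64(input: &str) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for byte in input.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Hypothetical helper: derive a filename stem for every package id and
/// fail the build if two different ids would write to the same file.
fn assign_stems(ids: &[&str]) -> Result<HashMap<String, String>, String> {
    let mut owner_by_stem: HashMap<String, String> = HashMap::new();
    let mut stem_by_id = HashMap::new();
    for id in ids {
        let stem = format!("{:016x}", fnv1a_64(id));
        if let Some(previous) = owner_by_stem.insert(stem.clone(), id.to_string()) {
            if previous != *id {
                return Err(format!(
                    "hash collision: {previous} and {id} both map to {stem}"
                ));
            }
        }
        stem_by_id.insert(id.to_string(), stem);
    }
    Ok(stem_by_id)
}
```

Failing at build time means a collision can never silently overwrite one package blob with another; the error names both ids so the stem scheme can be adjusted.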

Aqua Registry Compatibility

  • The typed parser does not use deny_unknown_fields, so unknown or unsupported fields in merged aqua registry YAML are silently ignored.
  • The only extra top-level field consumed by the registry row wrapper is aliases, and it is used only to generate the baked alias index. Alias metadata is not archived into per-package rkyv blobs, and nested aliases under overrides or version_overrides remain ignored.
  • If a cloned/custom aqua registry is present but does not contain a requested package, mise still falls back to the baked standard registry when aqua.baked_registry is enabled. Setting aqua.baked_registry = false disables that fallback and returns a registry-unavailable error instead.
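
The fallback rule in the last bullet can be modelled with a small sketch. The function name, the `HashMap` stand-ins for the two registries, and the `PackageNotFound` variant are assumptions for illustration; only the ordering (cloned registry first, baked registry second, error when the baked registry is disabled) comes from the description above.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum LookupError {
    /// The baked fallback is disabled and the cloned registry had no match.
    RegistryUnavailable,
    /// Assumed variant: neither registry contains the package.
    PackageNotFound,
}

/// Hypothetical lookup: `cloned` models a cloned/custom registry that may be
/// absent, `baked` models the registry compiled into the binary.
fn fetch_package(
    id: &str,
    cloned: Option<&HashMap<String, String>>,
    baked: &HashMap<String, String>,
    baked_registry_enabled: bool,
) -> Result<String, LookupError> {
    // 1. Prefer the cloned/custom registry when it has the package.
    if let Some(pkg) = cloned.and_then(|registry| registry.get(id)) {
        return Ok(pkg.clone());
    }
    // 2. Without the baked fallback there is nothing left to consult.
    if !baked_registry_enabled {
        return Err(LookupError::RegistryUnavailable);
    }
    // 3. Otherwise fall back to the baked standard registry.
    baked.get(id).cloned().ok_or(LookupError::PackageNotFound)
}
```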
Registry format benchmark

Command used while evaluating the registry format:

cargo run -p aqua-registry --example bench_registry_formats --release

The benchmark measured aqua-registry package work only. It intentionally did not time a full mise command because command runtime is dominated by config loading, cache state, shell setup, and API calls.

The decode columns are the relevant runtime cost for the baked-in registry: the binary already contains compiled blobs, so lookup only decodes the requested package into AquaPackage.

The convert column models future runtime compilation for custom registries after the YAML source has been parsed into AquaPackage. For every format it measures AquaPackage -> cached format, which is why the YAML row is not zero: YAML conversion serializes the parsed AquaPackage back to YAML. JSON, MessagePack, and rkyv conversion each measure encoding into that format, and MessagePack+zlib conversion measures MessagePack encoding plus zlib compression.
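
For readers unfamiliar with ns/op figures, the sketch below shows one way such a number can be produced with only the standard library. It is not the removed bench_registry_formats example; `ns_per_op` and `decode_line` are hypothetical names, and the toy decoder stands in for a real package decode.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Run `op` `iterations` times and return the mean wall-clock cost in ns.
/// `black_box` keeps the optimizer from discarding the result.
fn ns_per_op<T>(iterations: u32, mut op: impl FnMut() -> T) -> f64 {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(op());
    }
    start.elapsed().as_nanos() as f64 / f64::from(iterations)
}

/// Toy stand-in for "decode one package": split a `key=value` line.
fn decode_line(line: &str) -> Option<(&str, &str)> {
    line.split_once('=')
}
```

Calling `ns_per_op(100_000, || decode_line("name=jaq"))` would give a decode-column style figure for the toy decoder; a convert-column figure would time the opposite direction, from the in-memory value to bytes.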

Representative ns/op from this branch after adding rkyv to the restored benchmark:

| package | format | bytes | decode ns | convert ns |
| --- | --- | ---: | ---: | ---: |
| 01mf02/jaq | YAML | 7,865 | 798,325 | 334,871 |
| 01mf02/jaq | JSON | 7,310 | 99,401 | 18,308 |
| 01mf02/jaq | MessagePack | 5,461 | 91,847 | 17,747 |
| 01mf02/jaq | rkyv | 10,184 | 24,047 | 9,524 |
| 01mf02/jaq | MessagePack+zlib | 681 | 122,384 | 115,430 |
| hashicorp/terraform | YAML | 1,663 | 157,855 | 79,726 |
| hashicorp/terraform | JSON | 1,686 | 9,990 | 4,419 |
| hashicorp/terraform | MessagePack | 1,361 | 6,286 | 2,961 |
| hashicorp/terraform | rkyv | 1,976 | 1,926 | 1,652 |
| hashicorp/terraform | MessagePack+zlib | 638 | 26,193 | 75,436 |
| WebAssembly/binaryen | YAML | 71,406 | 6,992,245 | 5,285,232 |
| WebAssembly/binaryen | JSON | 67,159 | 1,147,305 | 225,388 |
| WebAssembly/binaryen | MessagePack | 53,661 | 929,070 | 194,281 |
| WebAssembly/binaryen | rkyv | 76,060 | 350,336 | 99,903 |
| WebAssembly/binaryen | MessagePack+zlib | 1,750 | 1,276,407 | 1,384,718 |
| commercialhaskell/stack | YAML | 47,673 | 6,464,268 | 7,379,113 |
| commercialhaskell/stack | JSON | 44,605 | 679,860 | 340,178 |
| commercialhaskell/stack | MessagePack | 33,849 | 928,482 | 409,058 |
| commercialhaskell/stack | rkyv | 55,480 | 420,295 | 103,542 |
| commercialhaskell/stack | MessagePack+zlib | 1,277 | 1,346,596 | 1,508,999 |
| fastfetch-cli/fastfetch | YAML | 40,183 | 11,620,460 | 4,788,665 |
| fastfetch-cli/fastfetch | JSON | 36,980 | 1,205,924 | 196,605 |
| fastfetch-cli/fastfetch | MessagePack | 27,678 | 856,442 | 177,785 |
| fastfetch-cli/fastfetch | rkyv | 49,580 | 214,291 | 109,483 |
| fastfetch-cli/fastfetch | MessagePack+zlib | 1,274 | 827,573 | 1,136,177 |
| cri-o/cri-o | YAML | 14,343 | 3,573,568 | 1,641,749 |
| cri-o/cri-o | JSON | 13,884 | 464,733 | 50,593 |
| cri-o/cri-o | MessagePack | 10,657 | 247,866 | 53,999 |
| cri-o/cri-o | rkyv | 16,456 | 95,189 | 20,740 |
| cri-o/cri-o | MessagePack+zlib | 854 | 347,174 | 298,248 |

Fixture total payload sizes for the table fixtures:

| format | bytes |
| --- | ---: |
| YAML | 183,133 |
| JSON | 171,624 |
| MessagePack | 132,667 |
| rkyv | 209,736 |
| MessagePack+zlib | 6,474 |

Notes:

  • aqua itself uses JSON as its YAML registry cache: GitHub YAML registries are converted to adjacent JSON cache files and later reads prefer that JSON file. See pkg/install-registry/json.go.
  • rkyv has the fastest checked decode and conversion path in this microbenchmark, beating uncompressed MessagePack for every measured package.
  • rkyv payloads are larger than uncompressed MessagePack and much larger than MessagePack+zlib. This PR notes that tradeoff, but the format decision is based on runtime speed rather than binary size.
  • MessagePack+zlib is much smaller, but it is slower to decode and convert than rkyv in this benchmark.
  • For future custom registries, conversion should happen only when the downloaded YAML source changes. rkyv keeps that compiled-cache path aligned with the baked-registry lookup path.
  • The large package cases show why avoiding repeated YAML decode matters: YAML decode is much slower than JSON/MessagePack/rkyv.
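
To make the last note concrete, the YAML-to-rkyv decode ratio can be computed directly from the table. The snippet below is only arithmetic over three rows copied from the benchmark above; the `Row` struct and `speedup` function are illustrative names.

```rust
/// Decode timings (ns) copied from the benchmark table for three packages.
struct Row {
    package: &'static str,
    yaml_decode_ns: u64,
    rkyv_decode_ns: u64,
}

const ROWS: [Row; 3] = [
    Row { package: "01mf02/jaq", yaml_decode_ns: 798_325, rkyv_decode_ns: 24_047 },
    Row { package: "hashicorp/terraform", yaml_decode_ns: 157_855, rkyv_decode_ns: 1_926 },
    Row { package: "WebAssembly/binaryen", yaml_decode_ns: 6_992_245, rkyv_decode_ns: 350_336 },
];

/// How many times slower YAML decode is than rkyv decode for one package.
fn speedup(row: &Row) -> f64 {
    row.yaml_decode_ns as f64 / row.rkyv_decode_ns as f64
}

/// Format one line per package, e.g. for printing in a report.
fn report() -> Vec<String> {
    ROWS.iter()
        .map(|row| format!("{}: YAML decode is {:.1}x slower than rkyv", row.package, speedup(row)))
        .collect()
}
```

For these three rows the ratio works out to roughly 33x, 82x, and 20x respectively, so the gap is large for small and large packages alike.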

Follow-up: custom registries

The intended follow-up is to use the same compiled package lookup shape for custom single-file registries with rkyv package blobs on disk:

  1. Download/cache the merged registry YAML source.
  2. Parse and compile it into an id/alias index plus direct rkyv package blobs, and do so only when the source changes.
  3. Runtime lookup resolves the id and decodes one package blob, matching the baked-registry path at the AquaPackage lookup layer.

This keeps custom registries consistent with baked lookup without parsing the whole merged YAML per package request.
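
The three steps can be sketched with the standard library alone. Everything here is an assumption made for illustration: the `id|alias1,alias2|payload` line format stands in for the merged YAML, raw payload bytes stand in for rkyv blobs, and `DefaultHasher` stands in for whatever change detection the follow-up ends up using.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical compiled cache: an alias index plus one blob per package id.
struct CompiledRegistry {
    source_hash: u64,
    aliases: HashMap<String, String>, // alias -> canonical id
    blobs: HashMap<String, Vec<u8>>,  // id -> encoded package
}

fn hash_source(source: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    hasher.finish()
}

/// Step 2: parse the whole source once and split it into per-package blobs.
fn compile(source: &str) -> CompiledRegistry {
    let mut aliases = HashMap::new();
    let mut blobs = HashMap::new();
    for line in source.lines() {
        let mut parts = line.splitn(3, '|');
        let (Some(id), Some(alias_list), Some(payload)) =
            (parts.next(), parts.next(), parts.next())
        else {
            continue; // skip malformed lines in this toy format
        };
        for alias in alias_list.split(',').filter(|a| !a.is_empty()) {
            aliases.insert(alias.to_string(), id.to_string());
        }
        blobs.insert(id.to_string(), payload.as_bytes().to_vec());
    }
    CompiledRegistry { source_hash: hash_source(source), aliases, blobs }
}

/// Recompile only when the source changed; the bool reports whether it did.
fn refresh(cache: Option<CompiledRegistry>, source: &str) -> (CompiledRegistry, bool) {
    match cache {
        Some(cached) if cached.source_hash == hash_source(source) => (cached, false),
        _ => (compile(source), true),
    }
}

/// Step 3: resolve an id or alias, then hand back only that package's blob.
fn lookup<'a>(registry: &'a CompiledRegistry, name: &str) -> Option<&'a [u8]> {
    let id = registry.aliases.get(name).map(String::as_str).unwrap_or(name);
    registry.blobs.get(id).map(Vec::as_slice)
}
```

The point of the shape is that `lookup` never touches the full source: the expensive parse happens in `compile`, and `refresh` keeps it off the hot path while the source is unchanged.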

Tests

  • cargo fmt --all -- --check
  • cargo test -p mise aqua::aqua_registry_wrapper
  • cargo test -p aqua-registry
  • cargo test -p mise aqua::standard_registry
  • cargo check -p mise --all-features

This PR body was generated by an AI coding assistant.

greptile-apps Bot commented May 2, 2026 (Contributor)

Greptile Summary

This PR replaces the per-package YAML registry file lookups with pre-compiled rkyv blobs baked into the mise binary at build time. The bundled aqua registry YAML is parsed once during cargo build, each package is serialized to an rkyv blob and written to OUT_DIR, and a generated HashMap of include_bytes! calls is included at compile time. Runtime lookup decodes only the requested blob.

  • Baked registry pipeline: build.rs parses vendor/aqua-registry/registry.yml into typed AquaPackage values, serializes each as an rkyv blob with a collision-detected FNV-1a filename stem, and emits include_bytes! maps for packages, aliases, and metadata.
  • MiseRegistryFetcher: The git-repo per-package path and baked-registry fallback are composed in aqua_registry_wrapper.rs; the fallback triggers when the cloned registry does not contain the requested package and aqua.baked_registry is enabled.
  • New AquaPackageType::GoBuild: Build-time typed parsing exposed go_build entries; the aqua backend now rejects both go_install and go_build with the same Use the go backend guidance.
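
The package-type rejection described in the last bullet might look something like the sketch below. The `GoInstall` and `GoBuild` variant names come from the PR; the other variants, the function name, and the exact error wording are assumptions for illustration.

```rust
/// Trimmed-down stand-in for AquaPackageType; only GoInstall and GoBuild
/// are taken from the PR, the other variants are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PackageType {
    GithubRelease,
    Http,
    GoInstall,
    GoBuild,
}

impl PackageType {
    fn as_str(self) -> &'static str {
        match self {
            PackageType::GithubRelease => "github_release",
            PackageType::Http => "http",
            PackageType::GoInstall => "go_install",
            PackageType::GoBuild => "go_build",
        }
    }
}

/// Hypothetical check: both Go-based types get the same guidance, and the
/// message names the actual type that was encountered.
fn ensure_supported(package_type: PackageType) -> Result<(), String> {
    match package_type {
        PackageType::GoInstall | PackageType::GoBuild => Err(format!(
            "aqua package type `{}` is not supported; use the go backend instead",
            package_type.as_str()
        )),
        _ => Ok(()),
    }
}
```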

Confidence Score: 5/5

Safe to merge. The baked-registry pipeline is self-contained at build time, the rkyv unaligned feature correctly enables include_bytes! decoding without alignment assumptions, runtime-only fields are properly reconstructed after decode, and build-time hash collision detection prevents silent blob overwrites.

The build-time codegen, rkyv codec, alias resolution, and MiseRegistryFetcher fallback logic are all implemented correctly. The previously raised hash-collision concern is addressed in aqua_registry_files_code. The duplicate early-return for the baked-registry path is an intentional micro-optimization that avoids redundant FS stat calls, not a bug. No data-correctness or runtime-safety issues were found.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| build.rs | Adds codegen_aqua_standard_registry: parses registry YAML, encodes each package as rkyv, detects filename hash collisions at build time, and emits include_bytes! maps for packages, aliases and metadata. Logic is correct; metadata.json is parsed via serde_yaml rather than serde_json (harmless since YAML is a superset of JSON). |
| crates/aqua-registry/src/codec.rs | New file: thin encode/decode wrappers around rkyv with package-id context in error messages. Roundtrip test covers the var-default case. |
| crates/aqua-registry/src/types.rs | Adds rkyv Archive/Serialize/Deserialize derives to all package types. Runtime-only fields (version_filter_expr, var_values) correctly use rkyv::with::Skip so they are absent from blobs and reconstructed after decode. Adds GoBuild variant to AquaPackageType. |
| src/aqua/aqua_registry_wrapper.rs | Replaces DefaultRegistryFetcher with MiseRegistryFetcher that adds baked-registry fallback around the existing git-repo path. The two baked-registry checks are logically redundant but correct; the first is an optimisation that avoids repeated FS stat calls when no git repo is present. |
| src/aqua/standard_registry.rs | New file: exposes LazyLock maps generated at build time (packages, aliases, metadata) and a package() helper that handles alias resolution before rkyv decode. Tests cover direct lookup, alias resolution, path-only packages and metadata shape. |
| crates/aqua-registry/src/registry.rs | AquaRegistry is now generic over RegistryFetcher and CacheStore; DefaultRegistryFetcher retains git-repo reading as the library default. No baked-registry logic here by design. |
| src/backend/aqua.rs | Adds GoBuild to the rejected package types alongside GoInstall; error message now includes the actual type name. |

Reviews (25): Last reviewed commit: "Merge branch 'main' into perf/aqua-compi..."

Comment thread crates/aqua-registry/build.rs Outdated
Comment thread crates/aqua-registry/examples/bench_registry_formats.rs Outdated

gemini-code-assist Bot left a comment (Contributor)

Code Review

This pull request migrates the baked registry storage from YAML strings to Zlib-compressed MessagePack blobs to optimize binary size. It introduces flate2 and rmp-serde dependencies, updates the build script to generate individual compressed files using a hashing scheme, and adds a benchmark to compare registry formats. Review feedback suggests implementing collision detection for the generated filenames, using maximum compression for build-time artifacts, and enhancing error reporting during the decompression of baked data.

Comment thread crates/aqua-registry/build.rs Outdated
Comment thread crates/aqua-registry/build.rs Outdated
Comment thread crates/aqua-registry/src/registry.rs Outdated
@risu729 risu729 changed the title perf(aqua): bake compressed registry package blobs perf(aqua): bake compressed aqua package blobs May 2, 2026
@risu729 risu729 force-pushed the perf/aqua-compiled-baked-registry branch from c03740a to 2db074f Compare May 2, 2026 15:03
@risu729 risu729 marked this pull request as ready for review May 3, 2026 04:22
@risu729 risu729 force-pushed the perf/aqua-compiled-baked-registry branch from bb68be2 to 80714d0 Compare May 4, 2026 04:40

@risu729 risu729 marked this pull request as draft May 4, 2026 06:04
@risu729 risu729 marked this pull request as ready for review May 4, 2026 07:45
jdx commented May 5, 2026 (Owner)

what about rkyv?

@risu729 risu729 marked this pull request as draft May 5, 2026 19:24

risu729 commented May 5, 2026 (Contributor, Author)

I hadn't heard of it, but it looks really fast! The binary is larger than with raw YAML, but I think that's totally fine.

@risu729 risu729 force-pushed the perf/aqua-compiled-baked-registry branch from 0433d6a to 97b95d0 Compare May 6, 2026 13:03
@risu729 risu729 changed the title perf(aqua): bake compressed aqua package blobs perf(aqua): bake rkyv aqua package blobs May 7, 2026

socket-security Bot commented May 9, 2026

Review the following changes in direct dependencies.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | ---: | ---: | ---: | ---: | ---: |
| Added | cargo/rkyv@0.8.16 | 100 | 100 | 93 | 100 | 100 |

@risu729 risu729 marked this pull request as ready for review May 10, 2026 09:36
@jdx jdx merged commit 1a81f78 into jdx:main May 10, 2026
33 of 34 checks passed
@risu729 risu729 deleted the perf/aqua-compiled-baked-registry branch May 10, 2026 12:13
mise-en-dev added a commit that referenced this pull request May 11, 2026
### 🚀 Features

- **(cli)** add minimum release age flag to lock and ls-remote by
@risu729 in [#9269](#9269)
- **(config)** add run field for hooks by @risu729 in
[#9718](#9718)
- **(github)** add native oauth token source by @jdx in
[#9654](#9654)
- **(oci)** scope build to project config by default by @jdx in
[#9766](#9766)
- add support for prefixed latest version queries in outdated checks by
@roele in [#9767](#9767)

### 🐛 Bug Fixes

- **(activate)** guard bash chpwd hook under nounset by @risu729 in
[#9716](#9716)
- **(backend)** date-check latest stable fast path by @risu729 in
[#9650](#9650)
- **(config)** parse core tool options consistently by @risu729 in
[#9742](#9742)
- **(exec)** propagate __MISE_DIFF so nested mise recovers pristine PATH
by @jdx in [#9765](#9765)
- **(forgejo)** include prereleases when opted in by @risu729 in
[#9717](#9717)
- **(github)** avoid caching empty release assets by @risu729 in
[#9616](#9616)
- **(java)** resolve lockfile URLs from metadata by @risu729 in
[#9719](#9719)
- **(lock)** cache unavailable github attestations by @risu729 in
[#9741](#9741)
- **(pipx)** preserve options when reinstalling tools by @risu729 in
[#9663](#9663)
- **(python)** skip redundant lockfile provenance verification by
@risu729 in [#9739](#9739)
- **(vfox)** run pre_uninstall hook by @risu729 in
[#9662](#9662)

### 🚜 Refactor

- **(schema)** extract tool options definition by @risu729 in
[#9649](#9649)

### ⚡ Performance

- **(aqua)** bake rkyv aqua package blobs by @risu729 in
[#9535](#9535)

### 📦️ Dependency Updates

- lock file maintenance by @renovate[bot] in
[#9773](#9773)

### 📦 Registry

- add vector
([github:vectordotdev/vector](https://github.com/vectordotdev/vector))
by @kquinsland in [#9761](#9761)
- add oc and openshift-install (http backend) by @konono in
[#9669](#9669)

### New Contributors

- @konono made their first contribution in
[#9669](#9669)
- @kquinsland made their first contribution in
[#9761](#9761)
jdx pushed a commit that referenced this pull request May 11, 2026
## Summary
- Fix `xtasks/release-plz` so `aqua-registry` is rewritten in both
`[dependencies]` and `[build-dependencies]` after the subcrate is
published.
- Avoid Cargo's mixed-source manifest error during release publishing by
removing both local path entries before re-adding the published
crates.io version.

## Root Cause
The release-plz job for `chore: release 2026.5.6 (#9764)` failed at
https://github.com/jdx/mise/actions/runs/25666525627/job/75340319171
after publishing `aqua-registry v2026.5.5`.

The failing step was:

```text
+ cargo add aqua-registry@2026.5.5
error: failed to parse manifest at `/home/runner/work/mise/mise/Cargo.toml`

Caused by:
  Dependency 'aqua-registry' has different source paths depending on the build target. Each dependency must have a single canonical source path irrespective of build target.
```

PR #9535 introduced the new baked aqua registry pipeline. As part of
that change, the root `mise` crate now depends on `aqua-registry` at
runtime and also in `build.rs`, so `Cargo.toml` has `aqua-registry = {
path = "crates/aqua-registry" }` in both `[dependencies]` and
`[build-dependencies]`.

The release script already handled publishing the subcrate, but it only
ran `cargo add aqua-registry@$AQUA_REGISTRY_VERSION` for the normal
dependency. That made the normal dependency point to crates.io while the
build dependency still pointed at the local workspace path, and Cargo
rejects that mixed source state before the release can continue.

## Test Plan
- `bash -n xtasks/release-plz`
- `mise run lint-fix`
- Manually verified the dependency rewrite sequence: `cargo rm
aqua-registry`, `cargo rm --build aqua-registry`, `cargo add
aqua-registry@2026.5.5`, `cargo add --build aqua-registry@2026.5.5`,
then restored the manifest.

*This PR body was generated by an AI coding assistant.*

Made with [Cursor](https://cursor.com)
jdx added a commit that referenced this pull request May 11, 2026
## Summary

- `cargo vendor` deletes the target directory's contents by default,
which was wiping out the git-tracked `vendor/aqua-registry/` directory
(containing `registry.yml`, `metadata.json`, `LICENSE`).
- `build.rs` reads `vendor/aqua-registry/registry.yml` at compile time
(added in [#9535](#9535)), so the source
tarball uploaded to Launchpad was missing this file and the build failed
with `No such file or directory` at `build.rs:357`.
- Adding `--no-delete` to `cargo vendor` preserves the existing
directory while still vendoring Rust crates alongside it.

Observed failure: [Launchpad build log for
`mise_2026.5.6~resolute1`](https://launchpadlibrarian.net/860699557/buildlog_ubuntu-resolute-amd64.mise_2026.5.6~resolute1_BUILDING.txt.gz).

## Test plan

- [ ] Re-run the `ppa-publish` workflow manually and confirm
`vendor/aqua-registry/registry.yml` is present in the uploaded source
tarball.
- [ ] Confirm the resulting Launchpad build succeeds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk CI-only change that alters `cargo vendor` behavior during
packaging; main risk is unexpected differences in the vendored directory
contents affecting reproducibility.
> 
> **Overview**
> Updates the `ppa-publish` GitHub Actions workflow to run `cargo
vendor` with `--no-delete`, preventing it from wiping the git-tracked
`vendor/aqua-registry/` directory needed at build time.
> 
> Adds brief inline comments documenting why the flag is required; all
other packaging steps remain unchanged.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
e47e7ad. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2 participants