perf(aqua): bake rkyv aqua package blobs #9535
Greptile Summary
This PR replaces the per-package YAML registry file lookups with pre-compiled rkyv blobs baked into the mise binary at build time. The bundled aqua registry YAML is parsed once during
Confidence Score: 5/5
Safe to merge. The baked-registry pipeline is self-contained at build time, the rkyv unaligned feature correctly enables `include_bytes!` decoding without alignment assumptions, runtime-only fields are properly reconstructed after decode, and build-time hash collision detection prevents silent blob overwrites.
The build-time codegen, rkyv codec, alias resolution, and `MiseRegistryFetcher` fallback logic are all implemented correctly. The previously raised hash-collision concern is addressed in `aqua_registry_files_code`. The duplicate early-return for the baked-registry path is an intentional micro-optimization that avoids redundant FS stat calls, not a bug. No data-correctness or runtime-safety issues were found.
No files require special attention.
Code Review
This pull request migrates the baked registry storage from YAML strings to Zlib-compressed MessagePack blobs to optimize binary size. It introduces flate2 and rmp-serde dependencies, updates the build script to generate individual compressed files using a hashing scheme, and adds a benchmark to compare registry formats. Review feedback suggests implementing collision detection for the generated filenames, using maximum compression for build-time artifacts, and enhancing error reporting during the decompression of baked data.
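The collision-detection suggestion in the review above can be illustrated with a minimal sketch. It assumes each package id is hashed into a blob file name at build time; `blob_file_name` and `assign_blob_names` are hypothetical names for illustration, not functions from this PR, and the PR's actual hashing scheme may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: derive a blob file name from a package id.
/// `DefaultHasher` is used only to keep the sketch std-only; its output is
/// not guaranteed stable across Rust releases, so a real build script would
/// likely pick a fixed hash function instead.
fn blob_file_name(package_id: &str) -> String {
    let mut hasher = DefaultHasher::new();
    package_id.hash(&mut hasher);
    format!("{:016x}.bin", hasher.finish())
}

/// Assign a file name to every package id and fail the build if two
/// different ids map to the same name, instead of silently overwriting
/// one blob with the other.
fn assign_blob_names<'a>(
    package_ids: &[&'a str],
    name_for: impl Fn(&str) -> String,
) -> Result<HashMap<String, &'a str>, String> {
    let mut assigned: HashMap<String, &'a str> = HashMap::new();
    for &id in package_ids {
        let name = name_for(id);
        if let Some(existing) = assigned.get(&name) {
            if *existing != id {
                return Err(format!(
                    "hash collision: `{existing}` and `{id}` both map to {name}"
                ));
            }
            // Same id listed twice: it already owns this file name.
            continue;
        }
        assigned.insert(name, id);
    }
    Ok(assigned)
}
```

Passing the naming function as a parameter keeps the collision check testable: a deliberately bad naming function that returns the same name for every id should produce an error rather than a map with one entry.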
what about rkyv?
I haven't heard of it, but it looks really fast! The binary is larger than with raw YAML, but I think that's totally fine.
### 🚀 Features

- **(cli)** add minimum release age flag to lock and ls-remote by @risu729 in [#9269](#9269)
- **(config)** add run field for hooks by @risu729 in [#9718](#9718)
- **(github)** add native oauth token source by @jdx in [#9654](#9654)
- **(oci)** scope build to project config by default by @jdx in [#9766](#9766)
- add support for prefixed latest version queries in outdated checks by @roele in [#9767](#9767)

### 🐛 Bug Fixes

- **(activate)** guard bash chpwd hook under nounset by @risu729 in [#9716](#9716)
- **(backend)** date-check latest stable fast path by @risu729 in [#9650](#9650)
- **(config)** parse core tool options consistently by @risu729 in [#9742](#9742)
- **(exec)** propagate __MISE_DIFF so nested mise recovers pristine PATH by @jdx in [#9765](#9765)
- **(forgejo)** include prereleases when opted in by @risu729 in [#9717](#9717)
- **(github)** avoid caching empty release assets by @risu729 in [#9616](#9616)
- **(java)** resolve lockfile URLs from metadata by @risu729 in [#9719](#9719)
- **(lock)** cache unavailable github attestations by @risu729 in [#9741](#9741)
- **(pipx)** preserve options when reinstalling tools by @risu729 in [#9663](#9663)
- **(python)** skip redundant lockfile provenance verification by @risu729 in [#9739](#9739)
- **(vfox)** run pre_uninstall hook by @risu729 in [#9662](#9662)

### 🚜 Refactor

- **(schema)** extract tool options definition by @risu729 in [#9649](#9649)

### ⚡ Performance

- **(aqua)** bake rkyv aqua package blobs by @risu729 in [#9535](#9535)

### 📦️ Dependency Updates

- lock file maintenance by @renovate[bot] in [#9773](#9773)

### 📦 Registry

- add vector ([github:vectordotdev/vector](https://github.com/vectordotdev/vector)) by @kquinsland in [#9761](#9761)
- add oc and openshift-install (http backend) by @konono in [#9669](#9669)

### New Contributors

- @konono made their first contribution in [#9669](#9669)
- @kquinsland made their first contribution in [#9761](#9761)
## Summary

- Fix `xtasks/release-plz` so `aqua-registry` is rewritten in both `[dependencies]` and `[build-dependencies]` after the subcrate is published.
- Avoid Cargo's mixed-source manifest error during release publishing by removing both local path entries before re-adding the published crates.io version.

## Root Cause

The release-plz job for `chore: release 2026.5.6 (#9764)` failed at https://github.com/jdx/mise/actions/runs/25666525627/job/75340319171 after publishing `aqua-registry v2026.5.5`. The failing step was:

```text
+ cargo add aqua-registry@2026.5.5
error: failed to parse manifest at `/home/runner/work/mise/mise/Cargo.toml`

Caused by:
  Dependency 'aqua-registry' has different source paths depending on the build target. Each dependency must have a single canonical source path irrespective of build target.
```

PR #9535 introduced the new baked aqua registry pipeline. As part of that change, the root `mise` crate now depends on `aqua-registry` at runtime and also in `build.rs`, so `Cargo.toml` has `aqua-registry = { path = "crates/aqua-registry" }` in both `[dependencies]` and `[build-dependencies]`.

The release script already handled publishing the subcrate, but it only ran `cargo add aqua-registry@$AQUA_REGISTRY_VERSION` for the normal dependency. That made the normal dependency point to crates.io while the build dependency still pointed at the local workspace path, and Cargo rejects that mixed source state before the release can continue.

## Test Plan

- `bash -n xtasks/release-plz`
- `mise run lint-fix`
- Manually verified the dependency rewrite sequence: `cargo rm aqua-registry`, `cargo rm --build aqua-registry`, `cargo add aqua-registry@2026.5.5`, `cargo add --build aqua-registry@2026.5.5`, then restored the manifest.

*This PR body was generated by an AI coding assistant.*

Made with [Cursor](https://cursor.com)
## Summary

- `cargo vendor` deletes the target directory's contents by default, which was wiping out the git-tracked `vendor/aqua-registry/` directory (containing `registry.yml`, `metadata.json`, `LICENSE`).
- `build.rs` reads `vendor/aqua-registry/registry.yml` at compile time (added in [#9535](#9535)), so the source tarball uploaded to Launchpad was missing this file and the build failed with `No such file or directory` at `build.rs:357`.
- Adding `--no-delete` to `cargo vendor` preserves the existing directory while still vendoring Rust crates alongside it.

Observed failure: [Launchpad build log for `mise_2026.5.6~resolute1`](https://launchpadlibrarian.net/860699557/buildlog_ubuntu-resolute-amd64.mise_2026.5.6~resolute1_BUILDING.txt.gz).

## Test plan

- [ ] Re-run the `ppa-publish` workflow manually and confirm `vendor/aqua-registry/registry.yml` is present in the uploaded source tarball.
- [ ] Confirm the resulting Launchpad build succeeds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk CI-only change that alters `cargo vendor` behavior during packaging; main risk is unexpected differences in the vendored directory contents affecting reproducibility.
>
> **Overview**
> Updates the `ppa-publish` GitHub Actions workflow to run `cargo vendor` with `--no-delete`, preventing it from wiping the git-tracked `vendor/aqua-registry/` directory needed at build time.
>
> Adds brief inline comments documenting why the flag is required; all other packaging steps remain unchanged.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit e47e7ad. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Motivation
- `pkgs/<owner>/<repo>/registry.yaml` from a git-cloned registry repository, but cloning the entire repository is heavier than necessary for registry lookup.
- `hook-env`.

Summary
- `vendor/aqua-registry/`, with the bundled registry stored as `registry.yml` alongside its upstream `LICENSE` and generated `metadata.json`.
- `AquaPackage` values from the existing `aqua-registry` crate, then serializing each typed package as rkyv blobs.
- `aqua-registry` so build-time baking and runtime lookup use the same codec. The implementation enables rkyv's unaligned mode so `include_bytes!` package blobs can decode without copying.
- `rkyv::with::Skip` so archived data stays focused on parsed package data.
- `RegistryFetcher` around `fetch_package()` only; registry YAML parsing is now an implementation detail of fetchers that still read cloned per-package files.
- `AquaPackageType::GoBuild` after typed build-time parsing exposed `go_build` entries in the bundled registry, and reject it like `go_install` in the aqua backend.
- `Result<()>`.

Aqua Registry Compatibility
- `deny_unknown_fields`, so unknown or unsupported fields in merged aqua registry YAML are silently ignored.
- `aliases`, and it is used only to generate the baked alias index. Alias metadata is not archived into per-package rkyv blobs, and nested `aliases` under `overrides` or `version_overrides` remain ignored.
- `aqua.baked_registry` is enabled. Setting `aqua.baked_registry = false` disables that fallback and returns a registry-unavailable error instead.

Registry format benchmark
Command used while evaluating the registry format:
The benchmark measured aqua-registry package work only. It intentionally did not time a full mise command because command runtime is dominated by config loading, cache state, shell setup, and API calls.
The decode columns are the relevant runtime cost for the baked-in registry: the binary already contains compiled blobs, so lookup only decodes the requested package into `AquaPackage`.

The convert column models future runtime compilation for custom registries after the YAML source has been parsed into `AquaPackage`. It is `AquaPackage -> cached format` for every format, so YAML is not zero: YAML conversion means serializing the parsed `AquaPackage` back to YAML. JSON conversion measures JSON encoding, MessagePack conversion measures MessagePack encoding, rkyv conversion measures rkyv encoding, and MessagePack+zlib conversion measures MessagePack encoding plus zlib compression.

Representative ns/op from this branch after adding rkyv to the restored benchmark:
Fixture total payload sizes for the table fixtures:
Notes:
- `pkg/install-registry/json.go`.

Follow-up: custom registries
The intended follow-up is to use the same compiled package lookup shape for custom single-file registries with rkyv package blobs on disk:
- `AquaPackage` lookup layer.

This keeps custom registries consistent with baked lookup without parsing the whole merged YAML per package request.
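As a rough illustration of that lookup shape, the sketch below resolves an alias to a canonical package id and then decodes only that package's blob. `CompiledRegistry`, its fields, and the caller-supplied `decode` function are hypothetical stand-ins (the real code decodes rkyv bytes into `AquaPackage`); the maps here are in memory, whereas the follow-up describes blobs on disk.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a compiled registry:
/// an alias index plus one encoded blob per canonical package id.
struct CompiledRegistry {
    /// alias -> canonical package id
    aliases: HashMap<String, String>,
    /// canonical package id -> encoded package bytes
    blobs: HashMap<String, Vec<u8>>,
}

impl CompiledRegistry {
    /// Resolve an alias to its canonical id; ids with no alias entry
    /// are returned unchanged.
    fn canonical_id<'a>(&'a self, requested: &'a str) -> &'a str {
        self.aliases
            .get(requested)
            .map(String::as_str)
            .unwrap_or(requested)
    }

    /// Decode only the requested package; no other blob is touched.
    /// `decode` stands in for the real codec.
    fn fetch_package<T>(
        &self,
        requested: &str,
        decode: impl Fn(&[u8]) -> Result<T, String>,
    ) -> Result<T, String> {
        let id = self.canonical_id(requested);
        let bytes = self
            .blobs
            .get(id)
            .ok_or_else(|| format!("package `{id}` not found in registry"))?;
        decode(bytes.as_slice())
    }
}
```

Keeping the alias index separate from the blobs mirrors the compatibility note that alias metadata is not archived into per-package blobs: a lookup by alias and a lookup by canonical id end up decoding the same bytes.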
Tests
- `cargo fmt --all -- --check`
- `cargo test -p mise aqua::aqua_registry_wrapper`
- `cargo test -p aqua-registry`
- `cargo test -p mise aqua::standard_registry`
- `cargo check -p mise --all-features`

This PR body was generated by an AI coding assistant.