Skip to content

feat(aqua): compress baked registry blobs with zstd#9131

Closed
risu729 wants to merge 22 commits into
jdx:mainfrom
risu729:copilot/aqua-registry-zstd
Closed

feat(aqua): compress baked registry blobs with zstd#9131
risu729 wants to merge 22 commits into
jdx:mainfrom
risu729:copilot/aqua-registry-zstd

Conversation

@risu729

@risu729 risu729 commented Apr 15, 2026

Copy link
Copy Markdown
Contributor

Summary

  • compress each baked per-package registry YAML blob at build time with zstd
  • embed compressed blobs via include_bytes! and keep alias/canonical lookup behavior unchanged
  • lazily decompress selected blobs when fetching baked registry entries

Binary size (serious profile, all features)

  • before: 83,100,752 bytes
  • after: 81,233,024 bytes
  • delta: -1,867,728 bytes (-2.25%)

Notes

@risu729 risu729 closed this Apr 15, 2026
@risu729 risu729 deleted the copilot/aqua-registry-zstd branch April 15, 2026 20:17

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the aqua-registry integration by compressing registry data with zstd and bundling it into the binary, which replaces the previous method of including numerous individual YAML files. It introduces metadata tracking for the baked-in registry, adds support for package aliases, and updates the build and release scripts to use a more efficient download-based workflow. Feedback suggests refactoring logic in build.rs and registry.rs to use more idiomatic Rust patterns and iterators.

Comment on lines +112 to +128
fn package_registries(packages: &[Value]) -> Result<Vec<PackageRegistry>> {
let mut registries = Vec::new();
for package in packages {
let Some(id) = canonical_package_id(package) else {
continue;
};
let content = package_registry_yaml(package)?;
let compressed_content = compress_registry_yaml(&content)?;
let aliases = package_aliases(package);
registries.push(PackageRegistry {
id,
compressed_content,
aliases,
});
}
Ok(registries)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function can be rewritten using iterators to be more idiomatic and concise.

fn package_registries(packages: &[Value]) -> Result<Vec<PackageRegistry>> {
    packages
        .iter()
        .filter_map(|package| canonical_package_id(package).map(|id| (id, package)))
        .map(|(id, package)| {
            let content = package_registry_yaml(package)?;
            let compressed_content = compress_registry_yaml(&content)?;
            let aliases = package_aliases(package);
            Ok(PackageRegistry {
                id,
                compressed_content,
                aliases,
            })
        })
        .collect()
}

Comment on lines +71 to +80
fn baked_registry_file(package_id: &str) -> Option<&'static [u8]> {
if let Some(content) = AQUA_STANDARD_REGISTRY_FILES.get(package_id) {
return Some(*content);
}

AQUA_STANDARD_REGISTRY_ALIASES
.get(package_id)
.and_then(|canonical| AQUA_STANDARD_REGISTRY_FILES.get(*canonical))
.copied()
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function can be simplified by using or_else to create a more idiomatic and concise expression.

Suggested change
fn baked_registry_file(package_id: &str) -> Option<&'static [u8]> {
if let Some(content) = AQUA_STANDARD_REGISTRY_FILES.get(package_id) {
return Some(*content);
}
AQUA_STANDARD_REGISTRY_ALIASES
.get(package_id)
.and_then(|canonical| AQUA_STANDARD_REGISTRY_FILES.get(*canonical))
.copied()
}
fn baked_registry_file(package_id: &str) -> Option<&'static [u8]> {
AQUA_STANDARD_REGISTRY_FILES
.get(package_id)
.copied()
.or_else(|| {
AQUA_STANDARD_REGISTRY_ALIASES
.get(package_id)
.and_then(|canonical| AQUA_STANDARD_REGISTRY_FILES.get(*canonical))
.copied()
})
}

@greptile-apps

greptile-apps Bot commented Apr 15, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR compresses per-package aqua registry YAML blobs with zstd at build time (via the build.rs build script) and embeds them via include_bytes!, then lazily decompresses them on first use. The result is a ~1.87 MB (2.25%) binary size reduction with no change to the public lookup behavior. The implementation is clean and correct.

Confidence Score: 5/5

Safe to merge — implementation is correct, well-tested, and only remaining finding is a P2 style issue.

The compression/decompression logic is correct, types are properly updated throughout, the lazy decompression is gated by the existing AquaPackage cache, and tests cover canonical lookup, path-only lookup, alias resolution, and metadata. The only finding is a cosmetic grouping issue in Cargo.toml.

crates/aqua-registry/Cargo.toml — minor: zstd placed under # Logging section

Important Files Changed

Filename Overview
crates/aqua-registry/build.rs Adds per-package zstd compression at build time; writes blobs to OUT_DIR and generates include_bytes! entries; deterministic ordering via sorted package IDs.
crates/aqua-registry/src/registry.rs Changes AQUA_STANDARD_REGISTRY_FILES value type from &'static str to &'static [u8]; adds zstd::stream::decode_all on baked registry lookup; tests updated to decompress before parsing.
crates/aqua-registry/Cargo.toml Adds zstd = "0.13" to both [dependencies] and [build-dependencies]; zstd is placed under the # Logging comment in [dependencies] which is misleading.
.github/workflows/vendored-file-warning.yml New workflow using pull_request_target to warn contributors modifying vendored aqua files; safe because permissions: {} is set and no PR code is checked out.
scripts/gen-aqua-changelog.sh New helper script to generate aqua-registry changelog sections by comparing release tags; correctly handles edge cases like missing tags and empty ranges.
src/cli/doctor/mod.rs Adds aqua registry metadata (repository, tag, tool count) to both the human-readable and JSON doctor output.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["registry.yaml\n(~3MB)"] -->|build.rs| B["Parse packages\n(serde_yaml)"]
    B --> C["Per-package YAML\n(String)"]
    C -->|"zstd::encode_all\n(DEFAULT_COMPRESSION_LEVEL)"| D["Compressed blob\n(.zst, Vec<u8>)"]
    D -->|"fs::write"| E["OUT_DIR/aqua_standard_registry_blobs/{idx}.zst"]
    E -->|"include_bytes! (compile-time)"| F["Binary static bytes\n(&'static [u8])"]
    F -->|"LazyLock HashMap"| G["AQUA_STANDARD_REGISTRY_FILES\nHashMap<&str, &[u8]>"]

    H["fetch_registry(pkg_id)"] --> I{local .git?}
    I -->|Yes| J["Read pkgs/.../registry.yaml\nfrom disk"]
    I -->|No| K{use_baked_registry?}
    K -->|Yes| L["baked_registry_file(pkg_id)\nor via alias map"]
    L -->|"zstd::decode_all"| M["Decompressed YAML bytes"]
    M -->|"serde_yaml::from_slice"| N["RegistryYaml → AquaPackage\n(cached in Mutex<HashMap>)"]
    K -->|No| O["RegistryNotAvailable error"]
Loading

Reviews (1): Last reviewed commit: "feat(aqua): compress baked registry blob..." | Re-trigger Greptile

Comment on lines +50 to 51
zstd = "0.13"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 zstd grouped under # Logging comment

zstd is placed directly after log = "0.4" under the # Logging section header, which will confuse future readers looking for compression-related dependencies.

Suggested change
zstd = "0.13"
# Logging
log = "0.4"
# Compression
zstd = "0.13"

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@risu729

risu729 commented Apr 15, 2026

Copy link
Copy Markdown
Contributor Author

This compresses the size of binary 1,867,728 bytes; considering the original registry file is 3,003,656 bytes, this is good. However I think we don't need this as the binary is already 80MB and zstd compression costs in runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant