refactor(file): rename TarFormat and split archive extraction #10241

Draft
risu729 wants to merge 20 commits into jdx:main from risu729:fix/aqua-unarchive-format-helper
risu729 commented Jun 5, 2026

Contributor

Summary

  • narrow file::untar to tar extraction only
  • rename TarFormat to ArchiveFormat; add extract_archive and ExtractOptions
  • keep decompress_file for single-file compression (gz/xz/zst/bz2)
  • stop canonicalizing aqua format aliases in aqua-registry; ArchiveFormat::from_ext handles aliases at extraction time
  • route aqua compressed assets through decompress_file to the registry bin path
  • remove implicit “unknown format → treat as gzip tar” fallback (see below)

Stacked on #10269 (chore/ignore-proc-macro-error2-advisory). Depends on #10224.

API changes

| Before | After |
| --- | --- |
| TarFormat | ArchiveFormat |
| TarFormat::from_ext → fallback Raw | ArchiveFormat::from_ext → Option<_> |
| unarchive(dest, ArchiveOptions { single_file_dest, ... }) | extract_archive(dest, ExtractOptions) for archives; decompress_file(output_file, ...) for single-file compression |
| ArchiveOptions | ExtractOptions (strip_components, pr, preserve_mtime only) |

extract_archive handles tar archives, zip/vsix, and 7z (Windows only). It does not accept compressed single-file formats.

decompress_file always writes to an explicit output file path (parent dirs are created).
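
A minimal sketch of the output-path contract stated above, assuming only what this description says (an explicit output file, parent directories created). The name write_decompressed is hypothetical, and the decoding step is elided: the bytes passed in stand for data a gz/xz/zst/bz2 decoder has already produced.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for the final step of `decompress_file`: the caller
/// names the exact output file, and any missing parent directories are created.
/// `decoded` represents bytes that a decompressor has already produced.
fn write_decompressed(output_file: &Path, decoded: &[u8]) -> io::Result<()> {
    // Create the parent directory chain before writing the file itself.
    if let Some(parent) = output_file.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(output_file, decoded)
}
```

Because the destination is a file rather than a directory, the caller (for example the aqua backend writing to the registry bin path) decides the final file name; nothing is inferred from the compressed asset's name in this sketch.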

Removal of implicit tar.gz fallback (Raw)

This is the largest behavioral change in the PR.

Previously, unknown or unrecognized format strings silently became TarFormat::Raw, and Raw was treated as gzip-compressed tar in several places:

  1. TarFormat::from_ext: any unrecognized registry/option string parsed to Raw instead of failing.
  2. open_tar: Raw (and Gz) opened with GzDecoder, assuming a tar.gz payload.
  3. untar: also handled zip, 7z, and single-file gz/xz/zst/bz2 inline; a Raw format string could end up on the tar path and get gzip-decoded.
  4. src/backend/aqua.rs: built TarOptions::new(TarFormat::from_ext(format)) for every package, so unknown aqua formats became Raw tar opts. Additionally, format.starts_with("tar") forced untar for any tar.* string even when from_ext did not recognize it (e.g. tar.br → Raw opts but still extracted via the starts_with("tar") branch).

This PR removes that silent defaulting in most paths. Formats must be recognized explicitly; unsupported ones fail instead of guessing tar.gz.
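
To make item 4 concrete, here is a hedged sketch contrasting a prefix check with an explicit allow-list. Both function names are hypothetical, and the allow-list is illustrative rather than mise's actual set of supported variants.

```rust
/// Old aqua routing as described in item 4: any format string starting with
/// "tar" went to tar extraction, whether or not the format was recognized.
fn routed_to_untar_legacy(format: &str) -> bool {
    format.starts_with("tar")
}

/// Stricter routing: only explicitly listed spellings reach tar extraction.
/// The list is an assumption made for this sketch.
fn routed_to_untar_strict(format: &str) -> bool {
    matches!(
        format,
        "tar" | "tar.gz" | "tgz" | "tar.xz" | "txz" | "tar.bz2" | "tbz" | "tbz2" | "tar.zst" | "tzst"
    )
}
```

With these definitions, tar.br passes the prefix check but not the allow-list, so it can no longer reach a gzip decoder by accident. A short alias such as tgz behaves the other way round: it fails the prefix check and passes the allow-list.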

Affected code and behavior

| Location | Before | After | Preserved? |
| --- | --- | --- | --- |
| ArchiveFormat::from_ext (src/file.rs) | parse().unwrap_or(Raw) | parse().ok() → None for unknown strings | N/A: callers choose fallback |
| open_tar (src/file.rs) | TarGz, Gz, Raw → GzDecoder | TarGz, Raw → GzDecoder; Gz bails as non-tar | Raw → gzip tar kept for legacy callers that still pass Raw into untar |
| untar (src/file.rs) | zip / 7z / single-file compression / tar in one function | tar family only; zip/7z/compression moved to extract_archive / decompress_file | same outcome for supported formats, different entry points |
| src/backend/aqua.rs install | TarOptions::new(from_ext(format)) + format.starts_with("tar") catch-all | ArchiveFormat::from_ext(format) on pkg.format(…); explicit raw/dmg/pkg branches; only GithubArchive uses unwrap_or(TarGz) | GithubArchive still defaults to tar.gz; other types fail when from_ext cannot parse the registry format |
| src/backend/aqua.rs SLSA fallback | from_ext(pkg.format(…)) → unknown became Raw, then failed is_archive() | from_ext(pkg.format(…)).ok_or(...): unknown registry format errors immediately | stricter; no silent Raw |
| src/backend/http.rs | single-file compression via untar; archives via untar | decompress_file / extract_archive | same supported formats |
| src/backend/static_helpers.rs | unknown format option → from_ext → Raw | unknown format option → from_ext(...).unwrap_or(Raw) | preserved: explicit opt-in legacy for github:/static installs |
| src/backend/github.rs SLSA fallback | unknown format option → from_ext → Raw | from_ext(...).unwrap_or(Raw) | preserved for explicit format option only |
| src/plugins/core/python.rs | untar with from_file_name | extract_archive with from_file_name | preserved: extensionless filenames still resolve to Raw → gzip tar via open_tar |
| src/plugins/core/java.rs | zip special-cased via metadata file_type | from_ext(file_type) then from_file_name fallback | preserved: metadata file_type honoured again |
| src/plugins/core/{zig,erlang,go,node,ruby,swift}.rs | untar with explicit TarGz (etc.) at call site | unchanged pattern, renamed types | preserved |
| src/cli/generate/tool_stub.rs | unarchive | extract_archive + from_file_name | preserved for recognized archive extensions |
| crates/aqua-registry/src/types.rs | canonicalized tgz/txz/tbz → long forms | returns literal alias; from_ext parses aliases | preserved via strum aliases on ArchiveFormat |
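
The java.rs row can be read as a two-step lookup. The sketch below is an assumption-laden illustration: the enum is a cut-down stand-in, from_file_name here returns None for unknown names (the real helper reportedly resolves extensionless names to Raw), and it follows the commit message's wording that the filename is consulted only when the metadata omits file_type.

```rust
/// Cut-down, hypothetical stand-in for mise's ArchiveFormat.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArchiveFormat {
    TarGz,
    Zip,
}

/// Parse a bare format string such as "zip" or "tgz".
fn from_ext(ext: &str) -> Option<ArchiveFormat> {
    match ext {
        "tar.gz" | "tgz" => Some(ArchiveFormat::TarGz),
        "zip" => Some(ArchiveFormat::Zip),
        _ => None,
    }
}

/// Detect a format from a file name's suffix (simplified for this sketch).
fn from_file_name(name: &str) -> Option<ArchiveFormat> {
    if name.ends_with(".tar.gz") || name.ends_with(".tgz") {
        Some(ArchiveFormat::TarGz)
    } else if name.ends_with(".zip") {
        Some(ArchiveFormat::Zip)
    } else {
        None
    }
}

/// Metadata wins when present; the file name is used only when it is absent.
fn resolve_java_format(file_type: Option<&str>, file_name: &str) -> Option<ArchiveFormat> {
    match file_type {
        Some(ext) => from_ext(ext),
        None => from_file_name(file_name),
    }
}
```

Under this reading, a present but unparseable file_type yields None rather than falling through to the filename; whether the actual code falls back in that case is not stated in this description.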

Intentional behavior changes

  • Aqua (non-GithubArchive): when pkg.format(v, os, arch) returns a string that ArchiveFormat::from_ext cannot parse (and it is not handled by the explicit raw / dmg / pkg / GithubContent branches), install now fails with bail!("unsupported format: …") instead of attempting gzip-tar extraction. That string comes from AquaPackage::format(): either the registry YAML format: field, or detect_format() on the resolved asset/url filename when format: is empty.
  • Aqua: removed the format.starts_with("tar") catch-all; tar.br / tar.lz4 / rar etc. are now recognized enum variants that fail explicitly (initially via unimplemented!(), changed to bail!() in a later commit on this branch) instead of being misread as gzip tar.
  • Aqua SLSA content fallback: unknown registry format fails at parse time instead of flowing through Raw.
  • from_ext generally: no silent Raw unless a caller explicitly opts in with .unwrap_or(Raw).

Explicit defaults that remain

  • GithubArchive packages (src/backend/aqua.rs): ArchiveFormat::from_ext(format).unwrap_or(TarGz) — same as before when aqua registry omits format for GitHub archive installs.
  • Raw in open_tar: still gzip-decoded (marked with a TODO in code). Used when callers detect no extension via from_file_name (e.g. python builds).
  • format install option (static_helpers, github SLSA): unknown value still maps to Raw via .unwrap_or(Raw) — opt-in legacy path for explicit user configuration, not registry auto-detection.
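
The three caller policies listed in this and the previous section can be sketched side by side. Everything here is a simplified assumption: the enum has only three variants, the function names are hypothetical, and the error text is illustrative.

```rust
/// Cut-down, hypothetical stand-in for mise's ArchiveFormat.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArchiveFormat {
    TarGz,
    Zip,
    Raw,
}

/// Strict parser: unknown strings yield None, never a guessed format.
fn from_ext(ext: &str) -> Option<ArchiveFormat> {
    match ext {
        "tar.gz" | "tgz" => Some(ArchiveFormat::TarGz),
        "zip" => Some(ArchiveFormat::Zip),
        _ => None,
    }
}

/// GithubArchive-style policy: default to tar.gz when parsing fails.
fn github_archive_format(ext: &str) -> ArchiveFormat {
    from_ext(ext).unwrap_or(ArchiveFormat::TarGz)
}

/// Explicit `format` install option: opt in to the legacy Raw fallback.
fn format_option(ext: &str) -> ArchiveFormat {
    from_ext(ext).unwrap_or(ArchiveFormat::Raw)
}

/// Registry-derived format: unknown values are an error, not a guess.
fn registry_format(ext: &str) -> Result<ArchiveFormat, String> {
    from_ext(ext).ok_or_else(|| format!("unsupported format: {}", ext))
}
```

Keeping the parser strict and pushing the fallback to each call site makes every remaining default visible at the place that chose it.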

Why format alias canonicalization is no longer needed

Previously, crates/aqua-registry/src/types.rs normalized short archive suffixes before passing a format string to mise:

  • .tgz / explicit format: tgz → tar.gz
  • .txz / txz → tar.xz
  • .tbz / .tbz2 → tar.bz2

That existed because extraction used string-based routing and needed one canonical spelling per archive family.

This PR uses ArchiveFormat::from_ext, which accepts aliases directly (tgz, tbz, txz, tzst, vsix, …). aqua-registry can return literal formats from filenames/registry and extraction still works. This matches aqua upstream (RemoveExtFromAsset returns tgz, not tar.gz).

dmg, pkg, and raw remain special-cased in src/backend/aqua.rs.
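
A rough sketch of alias-aware parsing follows. The PR describes strum aliases on ArchiveFormat; this version uses a hand-written FromStr instead so that it stays dependency-free, and both the variant list and the vsix → Zip mapping are assumptions made for illustration.

```rust
use std::str::FromStr;

/// Cut-down, hypothetical stand-in for mise's ArchiveFormat.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArchiveFormat {
    TarGz,
    TarXz,
    TarBz2,
    TarZst,
    Zip,
}

impl FromStr for ArchiveFormat {
    type Err = ();

    /// Long forms and short aliases map to the same variant, so the registry
    /// crate does not need to rewrite "tgz" to "tar.gz" beforehand.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "tar.gz" | "tgz" => Ok(Self::TarGz),
            "tar.xz" | "txz" => Ok(Self::TarXz),
            "tar.bz2" | "tbz" | "tbz2" => Ok(Self::TarBz2),
            "tar.zst" | "tzst" => Ok(Self::TarZst),
            "zip" | "vsix" => Ok(Self::Zip),
            _ => Err(()),
        }
    }
}

impl ArchiveFormat {
    /// Option-returning wrapper, mirroring the `parse().ok()` shape above.
    fn from_ext(ext: &str) -> Option<Self> {
        ext.parse().ok()
    }
}
```

Since every spelling converges on one variant at parse time, downstream code matches on the enum and never compares format strings.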

Aqua extraction routing

| Format | Helper | Destination |
| --- | --- | --- |
| tar / zip / 7z | extract_archive | install_path |
| gz / xz / zst / bz2 | decompress_file | first_bin_path from registry |
| raw / GithubContent | copy | first_bin_path |
| dmg / pkg | un_dmg / un_pkg | install_path |
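
The routing table above can be expressed as a single match. This is a sketch under stated assumptions: it keys on plain strings rather than on ArchiveFormat, the enum and function names are hypothetical, the listed spellings are illustrative, and GithubContent (a package type rather than a format string) is left out.

```rust
/// Which helper handles the asset (hypothetical names for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum Helper {
    ExtractArchive,
    DecompressFile,
    CopyFile,
    MacInstaller, // stands for un_dmg / un_pkg
}

/// Where the result is written.
#[derive(Debug, PartialEq, Eq)]
enum Destination {
    InstallPath,
    FirstBinPath,
}

/// Map a format string to its helper and destination; None means unsupported.
fn route(format: &str) -> Option<(Helper, Destination)> {
    match format {
        "tar" | "tar.gz" | "tgz" | "tar.xz" | "txz" | "tar.bz2" | "tbz" | "tbz2"
        | "tar.zst" | "tzst" | "zip" | "7z" => {
            Some((Helper::ExtractArchive, Destination::InstallPath))
        }
        "gz" | "xz" | "zst" | "bz2" => Some((Helper::DecompressFile, Destination::FirstBinPath)),
        "raw" => Some((Helper::CopyFile, Destination::FirstBinPath)),
        "dmg" | "pkg" => Some((Helper::MacInstaller, Destination::InstallPath)),
        _ => None,
    }
}
```

The split between the two destinations follows from the payload: a multi-file archive unpacks into a directory, while a single compressed or raw binary has to be given a file name, which the registry's first bin path supplies.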

Caller format support

| Caller | Helper | Supported formats |
| --- | --- | --- |
| src/backend/aqua.rs | extract_archive or decompress_file; raw/dmg/pkg branches | archives + single-file compression; macOS installer formats stay special-cased |
| src/backend/http.rs | decompress_file or extract_archive | same split as before |
| src/backend/static_helpers.rs | decompress_file or extract_archive | same split as before |
| src/backend/spm.rs | unzip | zip only |
| src/cli/generate/tool_stub.rs | extract_archive | archive formats from filename |
| src/plugins/core/java.rs | extract_archive | file_type metadata, then filename |
| src/plugins/core/python.rs | extract_archive | filename; extensionless → Raw → gzip tar (legacy) |
| src/plugins/core/zig.rs | extract_archive | filename-detected archives |
| file::untar callers (erlang, go, node, ruby, swift) | untar | explicit tar formats at call sites |

Tests

  • cargo test test_archive_format
  • cargo test test_decompress
  • cargo test test_extract_archive
  • cargo test test_untar_rejects_single_file_compression
  • cargo check


coderabbitai Bot commented Jun 5, 2026


Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ec73ccf7-e248-42b0-9ff8-bcb96c7366ca

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



greptile-apps Bot commented Jun 5, 2026

Contributor

Greptile Summary

This PR refactors archive handling in mise by renaming TarFormat to ArchiveFormat, splitting the monolithic untar into extract_archive (for multi-file archives) and decompress_file (for single-file compression), and removing the silent "unknown format → treat as gzip-tar" fallback that previously hid unsupported formats.

  • ArchiveFormat gains new variants (TarBr, TarLz4, TarSz, Lz4, Sz, Rar) and all previously unimplemented!() arms are replaced with bail!(), so encountering these formats produces a graceful error instead of a process panic.
  • The aqua backend now routes compressed single-file assets through decompress_file to the registry-specified bin path, and non-GithubArchive packages with unrecognised formats now fail explicitly rather than silently attempting gzip-tar extraction.
  • TarOptions becomes ExtractOptions (dropping the format field), and the aqua-registry crate drops its tgz/txz/tbz2 canonicalization now that ArchiveFormat::from_ext handles those aliases via strum serialize attributes.

Confidence Score: 5/5

Safe to merge; all edge cases touched by the refactor are handled with proper bail! returns, and explicitly documented behavioral changes are intentional and clearly scoped.

The refactor is well-structured: every previously panicking unimplemented!() site is replaced with bail!(), the legacy Raw→gzip-tar fallback is preserved exactly where documented, and callers are systematically updated. No silent swallowed errors or incorrect routing was found across the changed paths.

src/cli/generate/tool_stub.rs has a minor redundant format detection call (pre-existing pattern made more visible by the refactor), but it is harmless.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/file.rs | Core refactor: renames TarFormat → ArchiveFormat, splits untar into extract_archive/decompress_file, replaces unimplemented!() with bail!, adds TarBr/TarLz4/TarSz/Lz4/Sz/Rar variants |
| src/backend/aqua.rs | Routes GithubArchive through extract_archive, compressed files through decompress_file, adds explicit bail! for unsupported formats instead of silent Raw fallback |
| src/backend/http.rs | Replaces untar calls with decompress_file/extract_archive, cleans up FileInfo struct, preserves preserve_mtime:false for archive extraction |
| src/backend/static_helpers.rs | Migrates to ArchiveFormat/ExtractOptions with .unwrap_or(Raw) preserved for explicit format opts; branch logic unchanged |
| src/cli/generate/tool_stub.rs | Switches to extract_archive; introduces a redundant second call to ArchiveFormat::from_file_name for the same archive_path (pre-existing pattern, now more visible) |
| crates/aqua-registry/src/types.rs | Removes tgz/txz/tbz2 canonicalization since ArchiveFormat::from_ext now handles aliases directly via strum serialize attributes |
| src/plugins/core/java.rs | Unifies zip/tar handling: uses ArchiveFormat::from_ext on file_type metadata with from_file_name fallback, routes through extract_archive |
| src/backend/github.rs | Adds .unwrap_or(Raw) for explicit format option (preserving legacy behavior); from_file_name fallback unchanged |

Reviews (13): Last reviewed commit: "refactor(file): remove TarOptions in fav..."

Comment thread src/file.rs Outdated

github-actions Bot commented Jun 6, 2026

This PR currently has merge conflicts. If this continues for 7 days, it will be closed automatically.

This is warning day 1 of 7.

Please update the PR when you have a chance. Feel free to reopen or create a new PR if it is closed and you'd like to continue working on it.

This comment was generated by an automated workflow.

risu729 force-pushed the fix/aqua-unarchive-format-helper branch from 262da3d to 9b660a1 on June 6, 2026 02:12
risu729 changed the title from "refactor(file): split unarchive from untar" to "refactor(file): split archive extraction API" on Jun 7, 2026

github-actions Bot commented Jun 8, 2026

This PR currently has failing checks. If this continues for 7 days, it will be closed automatically.

This is warning day 1 of 7.

Please update the PR when you have a chance. Feel free to reopen or create a new PR if it is closed and you'd like to continue working on it.

This comment was generated by an automated workflow.

Comment thread src/file.rs Outdated
risu729 added 15 commits June 9, 2026 06:41
  • cargo deny fails on the new unmaintained advisory for proc-macro-error2, a transitive dependency with no safe upgrade path yet.
  • TarFormat::from_ext handles tgz/tbz/txz aliases at extraction time.
  • Call TarFormat::from_ext and unarchive directly from the aqua backend.
  • Three changes in one commit:
      - rename TarFormat to ArchiveFormat; from_ext returns Option
      - replace unarchive/ArchiveOptions with extract_archive/ExtractOptions
      - route aqua compressed assets through decompress_file directly
  • Move tar.br, lz4, sz, rar and aliases into ArchiveFormat and fail via unimplemented! in open_tar and related extraction paths instead of a separate aqua-specific format list.
  • Use ArchiveFormat::from_ext on JavaMetadata.file_type when present and fall back to filename detection only when metadata omits the format.
risu729 force-pushed the fix/aqua-unarchive-format-helper branch from 3c6a7fa to d9d5b2c on June 8, 2026 20:42
risu729 changed the title from "refactor(file): split archive extraction API" to "refactor(file): introduce ArchiveFormat and split archive extraction" on Jun 8, 2026
risu729 changed the title from "refactor(file): introduce ArchiveFormat and split archive extraction" to "refactor(file): rename TarFormat and split archive extraction" on Jun 8, 2026
risu729 added 3 commits June 9, 2026 07:14
  • Replace unimplemented! with bail! for tar.br, lz4, sz, and rar so encountering these formats surfaces a CLI error instead of panicking.
  • Satisfy clippy::needless_update after all ExtractOptions fields were set explicitly in extract_archive.
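
The difference between the two failure modes in the unimplemented! → bail! commit can be sketched without external crates. This is an illustration only: the function names are hypothetical, the match keys on strings rather than ArchiveFormat variants, and a plain Result<_, String> stands in for the early-return error that a bail!-style macro produces.

```rust
/// Before: an unsupported format panics, which aborts the command abruptly.
fn open_legacy(format: &str) -> &'static str {
    match format {
        "tar.gz" => "gzip decoder",
        other => unimplemented!("{} is not supported", other),
    }
}

/// After: the same condition is returned as an ordinary error value that the
/// CLI can report and that callers can handle or propagate with `?`.
fn open_checked(format: &str) -> Result<&'static str, String> {
    match format {
        "tar.gz" => Ok("gzip decoder"),
        other => Err(format!("{} is not supported", other)),
    }
}
```

In the second form, a caller that hits rar or tar.br receives an Err it can surface as a normal error message, whereas the first form unwinds the stack with a panic.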