feat(prepare): add blake3 content-hash freshness checking#8404

Merged
jdx merged 5 commits into main from feat/prepare-content-hash on Mar 2, 2026

@jdx

@jdx jdx commented Mar 2, 2026

Owner

Summary

  • Replace mtime-based freshness detection with blake3 content hashing for prepare providers
  • Source file hashes are persisted to .mise/prepare-state.toml after each successful run
  • Add FreshnessResult enum with detailed stale reasons (e.g., "package.json changed", "outputs missing")
  • Dry-run output now shows why providers are stale

This is more reliable than mtime across CI, VCS operations, and clock skew scenarios. On first run (no stored hashes), providers are always considered stale. After a successful prepare run, the current source hashes are saved so subsequent checks compare content rather than timestamps.
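The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `stale_reason`, the map-of-strings representation, and the exact reason strings are assumptions chosen to mirror the examples quoted in the summary.

```rust
use std::collections::BTreeMap;

/// Hypothetical comparison of stored vs. current source hashes.
/// Keys are source paths relative to the project root; values are hex digests.
/// Returns `None` when fresh, or a human-readable stale reason.
fn stale_reason(
    stored: Option<&BTreeMap<String, String>>,
    current: &BTreeMap<String, String>,
) -> Option<String> {
    // First run: nothing was persisted, so the provider is always stale.
    let stored = match stored {
        Some(s) => s,
        None => return Some("no previous state".to_string()),
    };
    // A source whose digest differs, or that is new, makes the provider stale.
    for (path, hash) in current {
        match stored.get(path) {
            Some(old) if old == hash => {}
            Some(_) => return Some(format!("{path} changed")),
            None => return Some(format!("{path} added")),
        }
    }
    // A source that was hashed last time but no longer exists.
    for path in stored.keys() {
        if !current.contains_key(path) {
            return Some(format!("{path} removed"));
        }
    }
    None
}
```

Because only file contents feed the digests, a `touch` or a fresh checkout that rewrites timestamps without changing bytes would leave the result at `None` in this sketch.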

Test plan

  • mise run build compiles
  • mise run test:e2e test_prepare passes — updated tests verify content-hash freshness
  • mise run lint passes

🤖 Generated with Claude Code


Note

Medium Risk
Changes core mise prepare freshness logic from mtime comparison to persisted content hashes, which can alter when providers run and introduces a new on-disk state file under .mise/. Risk is moderated by added e2e coverage, but regressions could cause unnecessary reruns or missed stale detection in edge cases (directories, missing state, corrupt state).

Overview
mise prepare freshness detection is switched from mtime-based checks to blake3 content-hash comparison persisted in a new .mise/prepare-state.toml file.

The engine now saves source hashes after successful runs and uses them on subsequent runs to decide freshness, emitting more specific stale reasons (e.g., outputs missing, requirements.txt changed/added/removed, no previous state) that also surface in --dry-run/--explain output. E2E test_prepare is updated to assert the new hashing-driven behavior and cleans up .mise between cases.

Written by Cursor Bugbot for commit 8174229.

@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reliability of mise prepare by transitioning its freshness detection mechanism from modification times (mtime) to BLAKE3 content hashing. This change addresses issues related to clock skew, VCS operations, and CI environments, providing a more robust way to determine if a prepare step needs to be re-run. The new system persists content hashes, offers detailed reasons for staleness, and improves the clarity of dry-run outputs.

Highlights

  • Content-Hash Freshness: Replaced mtime-based freshness detection with BLAKE3 content hashing for mise prepare providers, ensuring more reliable checks across different environments.
  • Persistent State: Source file hashes are now persisted to .mise/prepare-state.toml after each successful prepare run, allowing subsequent checks to compare content rather than timestamps.
  • Detailed Stale Reasons: Introduced a FreshnessResult enum that provides detailed reasons for a provider being stale (e.g., "package.json changed", "outputs missing").
  • Enhanced Dry-Run Output: The mise prepare --dry-run command now explicitly shows why providers are considered stale, improving debugging and user understanding.
Changelog
  • e2e/cli/test_prepare
    • Updated existing tests to validate the new content-hash based freshness detection.
    • Removed mtime-based freshness test cases.
    • Added assertions to verify that dry-run output includes specific reasons for staleness.
    • Included cleanup for the new .mise state directory in test teardowns.
  • src/cli/prepare.rs
    • Modified the WouldRun variant of PrepareStepResult to include a reason string.
    • Updated the dry-run output message to display the detailed reason for a prepare step being stale.
  • src/prepare/engine.rs
    • Removed the SystemTime import and related mtime-based utility functions (last_modified, newest_file_in_dir).
    • Refactored the check_freshness method to utilize BLAKE3 content hashing and the new FreshnessResult enum.
    • Integrated the PrepareState mechanism to load and save content hashes after successful prepare runs.
    • Updated the PrepareJob struct to include sources and project_root for hash calculation and state management.
    • Modified the run method to persist content hashes for providers that complete successfully.
  • src/prepare/mod.rs
    • Introduced the FreshnessResult enum, defining various states of freshness (Fresh, NoOutputs, OutputsMissing, Stale, NoSources, Forced).
    • Added methods to FreshnessResult for checking freshness status and retrieving human-readable reasons.
    • Integrated the new state module for persistent freshness state management.
  • src/prepare/state.rs
    • Added a new module to manage the persistent state for prepare freshness checking.
    • Defined the PrepareState struct to store BLAKE3 content hashes of source files, keyed by provider ID.
    • Implemented load and save methods for PrepareState to interact with the .mise/prepare-state.toml file.
    • Provided get_hashes and set_hashes methods for managing stored hashes.
    • Included utility functions hash_sources and hash_dir_files to compute BLAKE3 hashes for individual files and recursively within directories.
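For orientation, the `FreshnessResult` enum listed under src/prepare/mod.rs might look something like the sketch below. Only the variant names come from the changelog above; the `String` payload on `Stale`, the method names `is_fresh` and `reason`, the reason strings, and the choice of which variants count as fresh are all assumptions made for illustration.

```rust
/// Illustrative stand-in for the `FreshnessResult` enum described above.
/// Variant names come from the PR summary; everything else is assumed.
#[derive(Debug, Clone, PartialEq)]
enum FreshnessResult {
    Fresh,
    NoSources,
    NoOutputs,
    OutputsMissing,
    Stale(String),
    Forced,
}

impl FreshnessResult {
    /// Assumed here: only `Fresh` and `NoSources` allow the step to be skipped.
    fn is_fresh(&self) -> bool {
        matches!(self, FreshnessResult::Fresh | FreshnessResult::NoSources)
    }

    /// Human-readable reason, of the kind a dry run could print.
    fn reason(&self) -> String {
        match self {
            FreshnessResult::Fresh => "up to date".to_string(),
            FreshnessResult::NoSources => "no sources to check".to_string(),
            FreshnessResult::NoOutputs => "no outputs defined".to_string(),
            FreshnessResult::OutputsMissing => "outputs missing".to_string(),
            FreshnessResult::Stale(why) => why.clone(),
            FreshnessResult::Forced => "forced".to_string(),
        }
    }
}
```

Carrying the reason inside the result lets the CLI layer print it for a would-run step without re-deriving why the step is stale.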
Activity
  • The pull request author, jdx, has provided a clear summary of the changes and their benefits.
  • A comprehensive test plan was executed, including mise run build, mise run test:e2e test_prepare, and mise run lint, all passing successfully.
  • The e2e tests for test_prepare were specifically updated to verify the new content-hash freshness logic.
  • The pull request was generated with the assistance of Claude Code.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request is a significant and well-executed feature enhancement, replacing the mtime-based freshness detection with a more reliable content-hashing mechanism using blake3. The changes are well-structured, with the new logic encapsulated in src/prepare/state.rs and integrated into src/prepare/engine.rs. The addition of FreshnessResult and the improved dry-run output are great for user experience. The suggestions to improve efficiency and error handling are still valid.

Comment thread src/prepare/engine.rs Outdated
Comment on lines 387 to 400
for (step_result, outputs, sources, project_root, provider_id) in run_results {
    for output in &outputs {
        super::clear_output_stale(output);
    }
    // Save content hashes for this provider after successful run
    if let Ok(hashes) = state::hash_sources(&sources, &project_root) {
        let mut st = PrepareState::load(&project_root);
        st.set_hashes(&provider_id, hashes);
        if let Err(e) = st.save(&project_root) {
            warn!("failed to save prepare state: {e}");
        }
    }
    results.push(step_result);
}

Contributor


medium

The current implementation loads and saves the prepare-state.toml file inside a loop for each provider. If multiple providers share the same project root, this can lead to redundant file I/O.

To improve efficiency, you could aggregate the state changes in memory and write to each state file only once after all providers have been processed.

Here's a suggested implementation to replace lines 387-400:

let mut states_to_save: std::collections::BTreeMap<std::path::PathBuf, PrepareState> =
    std::collections::BTreeMap::new();
for (step_result, outputs, sources, project_root, provider_id) in run_results {
    for output in &outputs {
        super::clear_output_stale(output);
    }
    // Save content hashes for this provider after successful run
    if let Ok(hashes) = state::hash_sources(&sources, &project_root) {
        states_to_save
            .entry(project_root.clone())
            .or_insert_with_key(|k| PrepareState::load(k))
            .set_hashes(&provider_id, hashes);
    }
    results.push(step_result);
}
for (project_root, st) in states_to_save {
    if let Err(e) = st.save(&project_root) {
        warn!("failed to save prepare state for {:?}: {e}", project_root);
    }
}

Comment thread src/prepare/state.rs
Comment on lines +27 to +30
match file::read_to_string(&path) {
    Ok(contents) => toml::from_str(&contents).unwrap_or_default(),
    Err(_) => Self::default(),
}

Contributor


medium

The load function currently swallows file read and parsing errors silently, returning a default state. While this is safe, it can hide underlying issues like file permission errors or a corrupted state file.

It would be more informative to log a warning when these errors occur. This will help users diagnose why their prepare steps might be re-running unexpectedly.

Note: You'll need to add use log::warn; at the top of the file for this suggestion to work.

Suggested change
match file::read_to_string(&path) {
    Ok(contents) => match toml::from_str(&contents) {
        Ok(state) => state,
        Err(e) => {
            warn!("Failed to parse prepare state file, assuming stale: {e}");
            Self::default()
        }
    },
    Err(e) => {
        warn!("Failed to read prepare state file, assuming stale: {e}");
        Self::default()
    }
}

@greptile-apps

greptile-apps Bot commented Mar 2, 2026

Contributor

Greptile Summary

Replaces mtime-based freshness detection with blake3 content hashing for prepare providers, significantly improving reliability across CI, VCS operations, and clock skew scenarios. The implementation introduces a new PrepareState that persists source file hashes to .mise/prepare-state.toml, and adds a FreshnessResult enum that provides detailed staleness reasons (e.g., "package.json changed", "outputs missing"). The dry-run output now shows exactly why providers are stale.

Key Changes:

  • Created src/prepare/state.rs with PrepareState struct and blake3 hashing utilities
  • Refactored check_freshness() to use content hashing instead of mtime comparison
  • State is persisted after successful prepare runs and loaded on subsequent checks
  • First runs (no stored state) are always considered stale; subsequent runs compare content hashes
  • Directory sources are hashed recursively up to 3 levels deep for performance
  • Tests updated to verify content-hash-based freshness instead of mtime manipulation
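The depth-limited directory hashing mentioned in the list above could look roughly like this. It is a sketch under stated assumptions: std's `DefaultHasher` stands in for blake3 so the example has no external dependencies, the function signature is hypothetical (only the name `hash_dir_files` appears in the PR), and "3 levels" is interpreted as the starting directory counting as level 1.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::Path;

/// Assumed depth limit, taken from the "3 levels deep" note above.
const MAX_DEPTH: usize = 3;

/// Stand-in digest: std's `DefaultHasher` replaces blake3 here.
/// It is NOT a cryptographic hash and is used only to keep the sketch std-only.
fn digest(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    format!("{:016x}", hasher.finish())
}

/// Hash every regular file under `dir`, descending at most `MAX_DEPTH` levels.
/// Call with `depth = 1` for the top-level directory. Keys in `out` are paths
/// relative to `root`; the `BTreeMap` keeps them in a stable order.
fn hash_dir_files(
    root: &Path,
    dir: &Path,
    depth: usize,
    out: &mut BTreeMap<String, String>,
) -> io::Result<()> {
    if depth > MAX_DEPTH {
        return Ok(()); // deeper levels are skipped to bound the work
    }
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            hash_dir_files(root, &path, depth + 1, out)?;
        } else if path.is_file() {
            let rel = path.strip_prefix(root).unwrap_or(path.as_path());
            out.insert(rel.to_string_lossy().into_owned(), digest(&fs::read(&path)?));
        }
    }
    Ok(())
}
```

One consequence of any depth cap is that edits to files below the limit would not be detected, which is the trade-off against hashing very large trees.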

Confidence Score: 5/5

  • This PR is safe to merge - it's a well-implemented enhancement to freshness detection
  • Clean implementation with comprehensive test coverage, proper error handling, and no breaking changes to existing functionality
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/prepare/state.rs | New file implementing blake3 content hashing for prepare freshness checking with persistent state management |
| src/prepare/engine.rs | Refactored freshness checking from mtime-based to content-hash-based, added detailed staleness reasons |
| src/prepare/mod.rs | Added FreshnessResult enum with detailed staleness reasons and human-readable messages |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Prepare Command] --> B{Outputs exist?}
    B -->|No| C[Stale: outputs missing]
    B -->|Yes| D{Output created this session?}
    D -->|Yes| E[Stale: output created this session]
    D -->|No| F{Sources empty?}
    F -->|Yes| G[Fresh: no sources to check]
    F -->|No| H[Load PrepareState from .mise/prepare-state.toml]
    H --> I{Stored hashes exist?}
    I -->|No| J[Stale: no previous state]
    I -->|Yes| K[Compute blake3 hashes for sources]
    K --> L{Hash sources recursively}
    L --> M[Files: hash directly]
    L --> N[Directories: hash files up to 3 levels]
    M --> O{Hashes match?}
    N --> O
    O -->|All match| P[Fresh]
    O -->|File changed| Q[Stale: path changed]
    O -->|File added| R[Stale: path added]
    O -->|File removed| S[Stale: path removed]
    
    C --> T[Run prepare command]
    E --> T
    J --> T
    Q --> T
    R --> T
    S --> T
    
    T --> U{Success?}
    U -->|Yes| V[Compute new hashes]
    V --> W[Save to PrepareState]
    W --> X[Clear stale markers]
    U -->|No| Y[State not saved]
    
    P --> Z[Skip execution]
    G --> Z

Last reviewed commit: 4e021ee

Comment thread src/prepare/engine.rs
Comment thread src/prepare/engine.rs
@github-actions

github-actions Bot commented Mar 2, 2026


Hyperfine Performance

mise x -- echo

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
| --- | --- | --- | --- | --- |
| mise-2026.2.24 x -- echo | 28.9 ± 0.7 | 27.2 | 34.3 | 1.17 ± 0.05 |
| mise x -- echo | 24.7 ± 0.8 | 23.0 | 29.9 | 1.00 |
✅ Performance improvement for x -- echo is 17%

mise env

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
| --- | --- | --- | --- | --- |
| mise-2026.2.24 env | 28.7 ± 0.7 | 26.8 | 31.5 | 1.18 ± 0.06 |
| mise env | 24.3 ± 1.1 | 22.7 | 41.6 | 1.00 |
✅ Performance improvement for env is 18%

mise hook-env

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
| --- | --- | --- | --- | --- |
| mise-2026.2.24 hook-env | 29.9 ± 0.8 | 28.3 | 39.2 | 1.21 ± 0.04 |
| mise hook-env | 24.7 ± 0.5 | 23.1 | 26.3 | 1.00 |
✅ Performance improvement for hook-env is 21%

mise ls

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
| --- | --- | --- | --- | --- |
| mise-2026.2.24 ls | 23.8 ± 1.1 | 22.2 | 37.8 | 1.06 ± 0.07 |
| mise ls | 22.5 ± 1.2 | 21.0 | 35.5 | 1.00 |

xtasks/test/perf

| Command | mise-2026.2.24 | mise | Variance |
| --- | --- | --- | --- |
| install (cached) | 165ms | 152ms | +8% |
| ls (cached) | 89ms | 81ms | +9% |
| bin-paths (cached) | 97ms | ✅ 86ms | +12% |
| task-ls (cached) | 854ms | 824ms | +3% |

✅ Performance improvement: bin-paths cached is 12%

@jdx jdx force-pushed the feat/prepare-content-hash branch from 38f74c9 to b2b1a91 Compare March 2, 2026 02:42
@jdx

jdx commented Mar 2, 2026

Owner Author

bugbot run

Comment thread e2e/cli/test_prepare
@jdx jdx force-pushed the feat/prepare-content-hash branch from 6844d77 to 9fee6b7 Compare March 2, 2026 03:04
Comment thread e2e/cli/test_prepare
@jdx jdx force-pushed the feat/prepare-content-hash branch 2 times, most recently from b973284 to b2ff36e Compare March 2, 2026 03:31
@jdx jdx enabled auto-merge (squash) March 2, 2026 03:40
jdx and others added 4 commits March 2, 2026 04:08
Replace mtime-based freshness detection with blake3 content hashing.
Source file hashes are persisted to .mise/prepare-state.toml after
each successful run. This is more reliable across CI, VCS operations,
and clock skew scenarios.

- Add FreshnessResult enum with detailed stale reasons
- Add state.rs for persistent hash storage
- Update dry-run output to show why providers are stale
- Update e2e tests for content-hash based freshness

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Change check_staleness() to unwrap_or(FreshnessResult::Fresh) so
  errors are treated as fresh rather than stale, avoiding spurious
  warnings
- Add warn! logging in state.rs when parse or read of prepare-state.toml
  fails instead of silently returning defaults

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jdx jdx force-pushed the feat/prepare-content-hash branch from c076b4b to 49dec80 Compare March 2, 2026 04:08
The explain "fresh" test needs to run prepare first to establish
content-hash state, since freshness is no longer mtime-based.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread src/prepare/engine.rs
}
}
}
}


Content hashes not saved when any provider fails

Medium Severity

The hash-saving code at lines 397–412 only runs after both run_with_deps (line 379) and run_parallel (line 388) return Ok, because both use ? to propagate errors. If any provider fails, the error propagates and no content hashes are saved for any provider — including those that completed successfully. On the next run, all those successful providers will be considered stale ("no previous state") and will re-run unnecessarily. This is a regression from the old mtime approach, where touch_outputs was called per-provider inside the execution handler and persisted even on partial failure.
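One possible way to address this concern, sketched under assumptions and not taken from the PR, is to record each provider's hashes as soon as that provider succeeds, so an error from a later provider does not discard earlier results. The `State` alias, the `run_all` function, and the closure-based runner are hypothetical stand-ins; the real engine runs providers through `run_with_deps` and `run_parallel`.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the persisted prepare state:
/// provider id -> (source path -> digest).
type State = BTreeMap<String, BTreeMap<String, String>>;

/// Run providers in order, recording hashes as soon as each one succeeds.
/// On failure the error is returned via `?`, but entries already recorded in
/// `state` are kept, so the caller can still persist them.
fn run_all<F>(providers: &[&str], state: &mut State, mut run: F) -> Result<(), String>
where
    F: FnMut(&str) -> Result<BTreeMap<String, String>, String>,
{
    for &id in providers {
        let hashes = run(id)?; // early return leaves earlier entries in place
        state.insert(id.to_string(), hashes);
    }
    Ok(())
}
```

Since `state` is borrowed mutably from the caller, it can be saved after `run_all` returns regardless of whether the result is `Ok` or `Err`, which keeps successful providers from being reported as "no previous state" on the next run.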


@jdx jdx merged commit 05288e4 into main Mar 2, 2026
35 checks passed
@jdx jdx deleted the feat/prepare-content-hash branch March 2, 2026 04:25
jdx pushed a commit that referenced this pull request Mar 2, 2026
### 🚀 Features

- **(hooks)** add task references to hooks and watch_files by @jdx in
[#8400](#8400)
- **(prepare)** add git-submodule built-in provider by @jdx in
[#8407](#8407)
- **(prepare)** add human-readable stale reasons to prepare output by
@jdx in [#8408](#8408)
- **(prepare)** add dependency ordering to prepare steps by @jdx in
[#8401](#8401)
- **(prepare)** add --explain flag for provider diagnostics by @jdx in
[#8409](#8409)
- **(prepare)** add per-provider timeout support by @jdx in
[#8405](#8405)
- **(prepare)** add blake3 content-hash freshness checking by @jdx in
[#8404](#8404)
- **(tasks)** monorepo vars and per-task vars by @halms in
[#8248](#8248)

### 🐛 Bug Fixes

- **(aqua)** restore bin_paths disk cache with fresh_file invalidation
by @jdx in [#8398](#8398)
- **(idiomatic)** use generic parser for idiomatic files by @risu729 in
[#8171](#8171)
- **(install)** apply precompiled options to all platforms in lockfile
by @jdx in [#8396](#8396)
- **(install)** normalize "v" prefix when matching lockfile versions by
@jdx in [#8413](#8413)
- **(prepare)** improve git submodule parser and fix check_staleness
error handling by @jdx in [#8412](#8412)
- **(python)** respect precompiled settings in lock file generation by
@jdx in [#8399](#8399)
- **(python)** clarify uv_venv_auto docs + prevent uv shim recursion in
venv creation by @halms in
[#8402](#8402)
- **(task)** remove deprecated `# mise` task header syntax by @jdx in
[#8403](#8403)
- **(vfox)** avoid eager metadata loading during config file detection
by @jdx in [#8397](#8397)
- clarify GitHub attestations to be artifact ones by @scop in
[#8394](#8394)
- ignore comments in idiomatic version files by @iloveitaly in
[#7682](#7682)

### 🚜 Refactor

- unify archive detection by @risu729 in
[#8137](#8137)

### 📚 Documentation

- remove duplicated docs for npm.package_manager by @risu729 in
[#8414](#8414)