feat(prepare): add blake3 content-hash freshness checking #8404
Summary of Changes (Gemini Code Assist)
This pull request significantly enhances the reliability of …
Code Review
This pull request is a significant and well-executed feature enhancement, replacing the mtime-based freshness detection with a more reliable content-hashing mechanism using blake3. The changes are well-structured, with the new logic encapsulated in src/prepare/state.rs and integrated into src/prepare/engine.rs. The addition of FreshnessResult and the improved dry-run output are great for user experience. The suggestions to improve efficiency and error handling are still valid.
```rust
for (step_result, outputs, sources, project_root, provider_id) in run_results {
    for output in &outputs {
        super::clear_output_stale(output);
    }
    // Save content hashes for this provider after successful run
    if let Ok(hashes) = state::hash_sources(&sources, &project_root) {
        let mut st = PrepareState::load(&project_root);
        st.set_hashes(&provider_id, hashes);
        if let Err(e) = st.save(&project_root) {
            warn!("failed to save prepare state: {e}");
        }
    }
    results.push(step_result);
}
```
The current implementation loads and saves the prepare-state.toml file inside a loop for each provider. If multiple providers share the same project root, this can lead to redundant file I/O.
To improve efficiency, you could aggregate the state changes in memory and write to each state file only once after all providers have been processed.
Here's a suggested implementation to replace lines 387-400:
```rust
let mut states_to_save: std::collections::BTreeMap<std::path::PathBuf, PrepareState> =
    std::collections::BTreeMap::new();
for (step_result, outputs, sources, project_root, provider_id) in run_results {
    for output in &outputs {
        super::clear_output_stale(output);
    }
    // Save content hashes for this provider after successful run
    if let Ok(hashes) = state::hash_sources(&sources, &project_root) {
        states_to_save
            .entry(project_root.clone())
            .or_insert_with_key(|k| PrepareState::load(k))
            .set_hashes(&provider_id, hashes);
    }
    results.push(step_result);
}
for (project_root, st) in states_to_save {
    if let Err(e) = st.save(&project_root) {
        warn!("failed to save prepare state for {:?}: {e}", project_root);
    }
}
```

```rust
match file::read_to_string(&path) {
    Ok(contents) => toml::from_str(&contents).unwrap_or_default(),
    Err(_) => Self::default(),
}
```
The load function currently swallows file read and parsing errors silently, returning a default state. While this is safe, it can hide underlying issues like file permission errors or a corrupted state file.
It would be more informative to log a warning when these errors occur. This will help users diagnose why their prepare steps might be re-running unexpectedly.
Note: You'll need to add `use log::warn;` at the top of the file for this suggestion to work.
```diff
-match file::read_to_string(&path) {
-    Ok(contents) => toml::from_str(&contents).unwrap_or_default(),
-    Err(_) => Self::default(),
-}
+match file::read_to_string(&path) {
+    Ok(contents) => match toml::from_str(&contents) {
+        Ok(state) => state,
+        Err(e) => {
+            warn!("Failed to parse prepare state file, assuming stale: {e}");
+            Self::default()
+        }
+    },
+    Err(e) => {
+        warn!("Failed to read prepare state file, assuming stale: {e}");
+        Self::default()
+    }
+}
```
Greptile Summary
Replaces mtime-based freshness detection with blake3 content hashing for prepare providers, significantly improving reliability across CI, VCS operations, and clock skew scenarios. The implementation introduces a new …
Confidence Score: 5/5
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Prepare Command] --> B{Outputs exist?}
    B -->|No| C[Stale: outputs missing]
    B -->|Yes| D{Output created this session?}
    D -->|Yes| E[Stale: output created this session]
    D -->|No| F{Sources empty?}
    F -->|Yes| G[Fresh: no sources to check]
    F -->|No| H[Load PrepareState from .mise/prepare-state.toml]
    H --> I{Stored hashes exist?}
    I -->|No| J[Stale: no previous state]
    I -->|Yes| K[Compute blake3 hashes for sources]
    K --> L{Hash sources recursively}
    L --> M[Files: hash directly]
    L --> N[Directories: hash files up to 3 levels]
    M --> O{Hashes match?}
    N --> O
    O -->|All match| P[Fresh]
    O -->|File changed| Q[Stale: path changed]
    O -->|File added| R[Stale: path added]
    O -->|File removed| S[Stale: path removed]
    C --> T[Run prepare command]
    E --> T
    J --> T
    Q --> T
    R --> T
    S --> T
    T --> U{Success?}
    U -->|Yes| V[Compute new hashes]
    V --> W[Save to PrepareState]
    W --> X[Clear stale markers]
    U -->|No| Y[State not saved]
    P --> Z[Skip execution]
    G --> Z
```
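The hash-comparison branch of the flowchart (changed, added, removed, or no previous state) can be sketched as a pure function over two path-to-hash maps. This is a minimal sketch under stated assumptions: `compare_hashes` is a hypothetical name, the `FreshnessResult` below is a simplified stand-in for the PR's enum (one `Stale(String)` variant carrying the reason), and hashes are treated as opaque strings. The earlier output checks from the flowchart are not modeled.

```rust
use std::collections::BTreeMap;

/// Simplified, hypothetical stand-in for the PR's FreshnessResult.
#[derive(Debug, PartialEq)]
enum FreshnessResult {
    Fresh,
    /// Carries a human-readable reason such as "package.json changed".
    Stale(String),
}

/// Hypothetical helper: compares hashes stored after the last successful run
/// with freshly computed ones. `stored` is None when no state exists yet.
fn compare_hashes(
    stored: Option<&BTreeMap<String, String>>,
    current: &BTreeMap<String, String>,
) -> FreshnessResult {
    let Some(stored) = stored else {
        return FreshnessResult::Stale("no previous state".to_string());
    };
    // A path present now is either unchanged, changed, or newly added.
    for (path, hash) in current {
        match stored.get(path) {
            Some(old) if old == hash => {}
            Some(_) => return FreshnessResult::Stale(format!("{path} changed")),
            None => return FreshnessResult::Stale(format!("{path} added")),
        }
    }
    // A path recorded last time but absent now was removed.
    if let Some(path) = stored.keys().find(|p| !current.contains_key(*p)) {
        return FreshnessResult::Stale(format!("{path} removed"));
    }
    FreshnessResult::Fresh
}

fn main() {
    let mut stored = BTreeMap::new();
    stored.insert("package.json".to_string(), "aaa".to_string());

    let mut current = stored.clone();
    println!("{:?}", compare_hashes(Some(&stored), &current)); // Fresh

    current.insert("package.json".to_string(), "bbb".to_string());
    println!("{:?}", compare_hashes(Some(&stored), &current)); // Stale("package.json changed")

    println!("{:?}", compare_hashes(None, &current)); // Stale("no previous state")
}
```

Returning the first difference found keeps the reason short, which matches the single-reason strings shown in the dry-run output; a real implementation could instead collect every difference.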
Last reviewed commit: 4e021ee
Hyperfine Performance

mise x -- echo
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| mise-2026.2.24 x -- echo | 28.9 ± 0.7 | 27.2 | 34.3 | 1.17 ± 0.05 |
| mise x -- echo | 24.7 ± 0.8 | 23.0 | 29.9 | 1.00 |

✅ Performance improvement for x -- echo is 17%

mise env
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| mise-2026.2.24 env | 28.7 ± 0.7 | 26.8 | 31.5 | 1.18 ± 0.06 |
| mise env | 24.3 ± 1.1 | 22.7 | 41.6 | 1.00 |

✅ Performance improvement for env is 18%

mise hook-env
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| mise-2026.2.24 hook-env | 29.9 ± 0.8 | 28.3 | 39.2 | 1.21 ± 0.04 |
| mise hook-env | 24.7 ± 0.5 | 23.1 | 26.3 | 1.00 |

✅ Performance improvement for hook-env is 21%

mise ls
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| mise-2026.2.24 ls | 23.8 ± 1.1 | 22.2 | 37.8 | 1.06 ± 0.07 |
| mise ls | 22.5 ± 1.2 | 21.0 | 35.5 | 1.00 |

xtasks/test/perf
| Command | mise-2026.2.24 | mise | Variance |
|---|---|---|---|
| install (cached) | 165ms | 152ms | +8% |
| ls (cached) | 89ms | 81ms | +9% |
| bin-paths (cached) | 97ms | ✅ 86ms | +12% |
| task-ls (cached) | 854ms | 824ms | +3% |

✅ Performance improvement: bin-paths cached is 12%
Force-pushed from 38f74c9 to b2b1a91.
bugbot run
Force-pushed from 6844d77 to 9fee6b7.
Force-pushed from b973284 to b2ff36e.
Replace mtime-based freshness detection with blake3 content hashing. Source file hashes are persisted to .mise/prepare-state.toml after each successful run. This is more reliable across CI, VCS operations, and clock skew scenarios.

- Add FreshnessResult enum with detailed stale reasons
- Add state.rs for persistent hash storage
- Update dry-run output to show why providers are stale
- Update e2e tests for content-hash based freshness

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
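The hashing step this commit describes can be sketched with only the standard library. Assumptions, stated plainly: `hash_sources` and `hash_dir` are hypothetical names loosely mirroring the PR's `state::hash_sources`; std's `DefaultHasher` is swapped in for blake3 purely so the example needs no crates (it is not cryptographic and its output is not guaranteed stable across Rust releases, so real state must use blake3 or similar); the 3-level directory limit comes from the review summary's flowchart, and how levels are counted here is a guess; missing sources are skipped so a later comparison reports them as removed.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::Path;

// Assumption: a listed directory is level 1; anything deeper than level 3 is ignored.
const MAX_DEPTH: usize = 3;

// Stand-in for blake3 so the sketch has no dependencies. Not cryptographic and
// not stable across Rust releases: do not persist these values for real.
fn hash_bytes(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    format!("{:016x}", hasher.finish())
}

// Hypothetical counterpart of state::hash_sources: maps each source path
// (relative to `root`) to a content hash. Missing sources are skipped.
fn hash_sources(sources: &[&str], root: &Path) -> io::Result<BTreeMap<String, String>> {
    let mut hashes = BTreeMap::new();
    for src in sources {
        let path = root.join(src);
        if path.is_file() {
            hashes.insert(src.to_string(), hash_bytes(&fs::read(&path)?));
        } else if path.is_dir() {
            hash_dir(&path, root, 1, &mut hashes)?;
        }
    }
    Ok(hashes)
}

// Recursively hashes regular files under `dir`, stopping past MAX_DEPTH.
fn hash_dir(
    dir: &Path,
    root: &Path,
    depth: usize,
    hashes: &mut BTreeMap<String, String>,
) -> io::Result<()> {
    if depth > MAX_DEPTH {
        return Ok(());
    }
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() {
            // Key by path relative to the project root.
            let rel = path.strip_prefix(root).unwrap_or(path.as_path());
            hashes.insert(rel.to_string_lossy().into_owned(), hash_bytes(&fs::read(&path)?));
        } else if path.is_dir() {
            hash_dir(&path, root, depth + 1, hashes)?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join(format!("prepare-hash-sketch-{}", std::process::id()));
    fs::create_dir_all(&root)?;
    fs::write(root.join("package.json"), "{}")?;
    let before = hash_sources(&["package.json"], &root)?;

    // Rewriting identical bytes typically bumps the mtime but not the content hash.
    fs::write(root.join("package.json"), "{}")?;
    let rewritten = hash_sources(&["package.json"], &root)?;
    println!("identical rewrite keeps hash: {}", before == rewritten);

    fs::write(root.join("package.json"), r#"{"name":"demo"}"#)?;
    let edited = hash_sources(&["package.json"], &root)?;
    println!("edit changes hash: {}", before != edited);

    fs::remove_dir_all(&root)
}
```

The identical-rewrite case is the one mtime checks get wrong: a checkout or cache restore that rewrites unchanged files would look stale by timestamp but stays fresh by content.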
- Change check_staleness() to unwrap_or(FreshnessResult::Fresh) so errors are treated as fresh rather than stale, avoiding spurious warnings
- Add warn! logging in state.rs when parse or read of prepare-state.toml fails instead of silently returning defaults

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from c076b4b to 49dec80.
The explain "fresh" test needs to run prepare first to establish content-hash state, since freshness is no longer mtime-based.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Content hashes not saved when any provider fails
Medium Severity
The hash-saving code at lines 397–412 only runs after both run_with_deps (line 379) and run_parallel (line 388) return Ok, because both use ? to propagate errors. If any provider fails, the error propagates and no content hashes are saved for any provider — including those that completed successfully. On the next run, all those successful providers will be considered stale ("no previous state") and will re-run unnecessarily. This is a regression from the old mtime approach, where touch_outputs was called per-provider inside the execution handler and persisted even on partial failure.
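One way to address this, sketched under assumptions: collect a per-provider outcome instead of short-circuiting with `?`, record hashes for every provider that succeeded, and only then return the first error. `ProviderOutcome` and `record_outcomes` are hypothetical names, the provider ids are illustrative, and `PrepareState` here is an in-memory stand-in with just a `set_hashes` method; in the real engine the state would also be written to disk before the error propagates.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-provider outcome: id, source hashes, and run result.
struct ProviderOutcome {
    id: String,
    hashes: BTreeMap<String, String>,
    result: Result<(), String>,
}

/// In-memory stand-in for PrepareState (provider id -> source hashes).
#[derive(Debug, Default)]
struct PrepareState {
    providers: BTreeMap<String, BTreeMap<String, String>>,
}

impl PrepareState {
    fn set_hashes(&mut self, provider_id: &str, hashes: BTreeMap<String, String>) {
        self.providers.insert(provider_id.to_string(), hashes);
    }
}

/// Records hashes for every successful provider before reporting failure,
/// so one failing provider does not force the others to re-run.
fn record_outcomes(state: &mut PrepareState, outcomes: Vec<ProviderOutcome>) -> Result<(), String> {
    let mut first_err: Option<String> = None;
    for outcome in outcomes {
        match outcome.result {
            Ok(()) => state.set_hashes(&outcome.id, outcome.hashes),
            Err(e) => {
                // Keep the first failure; later ones are dropped in this sketch.
                if first_err.is_none() {
                    first_err = Some(format!("{}: {e}", outcome.id));
                }
            }
        }
    }
    // Real engine (assumption): save PrepareState here, before returning the error.
    match first_err {
        Some(e) => Err(e),
        None => Ok(()),
    }
}

fn main() {
    let mut state = PrepareState::default();
    let outcomes = vec![
        ProviderOutcome {
            id: "npm".to_string(),
            hashes: BTreeMap::from([("package.json".to_string(), "aaa".to_string())]),
            result: Ok(()),
        },
        ProviderOutcome {
            id: "pip".to_string(),
            hashes: BTreeMap::new(),
            result: Err("exit status 1".to_string()),
        },
    ];
    let res = record_outcomes(&mut state, outcomes);
    println!("{res:?}"); // Err("pip: exit status 1")
    println!("{:?}", state.providers.keys().collect::<Vec<_>>()); // ["npm"]
}
```

Returning only the first error keeps the caller's error type unchanged; a real implementation might aggregate all failures instead.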
### 🚀 Features

- **(hooks)** add task references to hooks and watch_files by @jdx in [#8400](#8400)
- **(prepare)** add git-submodule built-in provider by @jdx in [#8407](#8407)
- **(prepare)** add human-readable stale reasons to prepare output by @jdx in [#8408](#8408)
- **(prepare)** add dependency ordering to prepare steps by @jdx in [#8401](#8401)
- **(prepare)** add --explain flag for provider diagnostics by @jdx in [#8409](#8409)
- **(prepare)** add per-provider timeout support by @jdx in [#8405](#8405)
- **(prepare)** add blake3 content-hash freshness checking by @jdx in [#8404](#8404)
- **(tasks)** monorepo vars and per-task vars by @halms in [#8248](#8248)

### 🐛 Bug Fixes

- **(aqua)** restore bin_paths disk cache with fresh_file invalidation by @jdx in [#8398](#8398)
- **(idiomatic)** use generic parser for idiomatic files by @risu729 in [#8171](#8171)
- **(install)** apply precompiled options to all platforms in lockfile by @jdx in [#8396](#8396)
- **(install)** normalize "v" prefix when matching lockfile versions by @jdx in [#8413](#8413)
- **(prepare)** improve git submodule parser and fix check_staleness error handling by @jdx in [#8412](#8412)
- **(python)** respect precompiled settings in lock file generation by @jdx in [#8399](#8399)
- **(python)** clarify uv_venv_auto docs + prevent uv shim recursion in venv creation by @halms in [#8402](#8402)
- **(task)** remove deprecated `# mise` task header syntax by @jdx in [#8403](#8403)
- **(vfox)** avoid eager metadata loading during config file detection by @jdx in [#8397](#8397)
- clarify GitHub attestations to be artifact ones by @scop in [#8394](#8394)
- ignore comments in idiomatic version files by @iloveitaly in [#7682](#7682)

### 🚜 Refactor

- unify archive detection by @risu729 in [#8137](#8137)

### 📚 Documentation

- remove duplicated docs for npm.package_manager by @risu729 in [#8414](#8414)


Summary
- … `.mise/prepare-state.toml` after each successful run
- … `FreshnessResult` enum with detailed stale reasons (e.g., "package.json changed", "outputs missing")

This is more reliable than mtime across CI, VCS operations, and clock skew scenarios. On first run (no stored hashes), providers are always considered stale. After a successful prepare run, the current source hashes are saved so subsequent checks compare content rather than timestamps.
Test plan
- `mise run build` compiles
- `mise run test:e2e test_prepare` passes: updated tests verify content-hash freshness
- `mise run lint` passes

🤖 Generated with Claude Code
Note
Medium Risk
Changes core `mise prepare` freshness logic from mtime comparison to persisted content hashes, which can alter when providers run and introduces a new on-disk state file under `.mise/`. Risk is moderated by added e2e coverage, but regressions could cause unnecessary reruns or missed stale detection in edge cases (directories, missing state, corrupt state).

Overview
`mise prepare` freshness detection is switched from mtime-based checks to blake3 content-hash comparison persisted in a new `.mise/prepare-state.toml` file.

The engine now saves source hashes after successful runs and uses them on subsequent runs to decide freshness, emitting more specific stale reasons (e.g., `outputs missing`, `requirements.txt changed/added/removed`, `no previous state`) that also surface in `--dry-run`/`--explain` output. E2E `test_prepare` is updated to assert the new hashing-driven behavior and cleans up `.mise` between cases.

Written by Cursor Bugbot for commit 8174229.