ci: stop a partial mmproj cache from poisoning Mac Studio GGUF CI #5459
The "JSON, images" Mac Studio GGUF CI job hit a stale cache for
${{ runner.os }}-gguf-...-mmproj-F16.gguf-v1 that contains only the
main GGUF, not the mmproj sibling. Because cache-hit == true, the download
step was skipped, and the post-load `ls` then failed:
ls: ...gguf-cache/mmproj-F16.gguf: No such file or directory
This PR layers three guards:
1) Bump cache key v1 -> v2 to invalidate the poisoned entry on the
GitHub-hosted side.
2) New verify-cache step explicitly checks BOTH files are present
before trusting cache-hit. If not, fall through to download.
3) Save step gains a hashFiles() check on the mmproj path so a
partial mmproj download cannot land back in the cache.
Behaviour on a clean run is unchanged: a cache hit that passes
verification skips the re-download, a partial hit triggers a fresh
download, and a successful download saves a complete archive.
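The verify-cache step itself is not shown in this conversation. As a minimal sketch of the idea in guard 2, assuming the step runs a shell script, a check along these lines would only trust the cache when both files are present and non-empty. The function name `verify_gguf_cache` and its arguments are hypothetical, not taken from the workflow file.

```shell
# Sketch only: verify_gguf_cache and its arguments are hypothetical names,
# not copied from the PR's workflow file.

# Succeeds only when both the main GGUF and its mmproj sibling exist in the
# cache directory as non-empty regular files.
verify_gguf_cache() {
  cache_dir="$1"
  gguf_file="$2"
  mmproj_file="$3"
  for f in "$gguf_file" "$mmproj_file"; do
    # -f: regular file; -s: size greater than zero
    if [ ! -f "$cache_dir/$f" ] || [ ! -s "$cache_dir/$f" ]; then
      echo "cache incomplete: missing or empty $cache_dir/$f" >&2
      return 1
    fi
  done
  return 0
}
```

In a workflow step, the result could be written to `"$GITHUB_OUTPUT"` (for example as `complete=true`) so that the download step's `if:` falls through to a fresh download when the check fails, instead of relying on cache-hit alone.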
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 319dcb3103
      # poisoning the cache for the next run.
      - name: Save GGUF + mmproj files
-       if: always() && steps.download-gguf.outcome != 'skipped' && hashFiles('gguf-cache/**/*.gguf') != ''
+       if: always() && steps.download-gguf.outcome != 'skipped' && hashFiles('gguf-cache/**/*.gguf') != '' && hashFiles(format('gguf-cache/{0}', env.MMPROJ_FILE)) != ''
Require the model GGUF before saving the cache
In the JSON/images workflow, a run where mmproj-F16.gguf downloads successfully but the main GGUF download fails or is cancelled still satisfies this `if:`. The mmproj file itself matches hashFiles('gguf-cache/**/*.gguf'), and the added check only proves that the mmproj exists. That allows actions/cache/save to publish a v2 cache containing only the mmproj, which the new verify step will reject on every later run, forcing a full re-download each time instead of ever producing a usable cache. Require GGUF_FILE explicitly here as well.
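To make the gap concrete, here is a small shell emulation of the two conditions. It is a sketch only: the functions mimic the `if:` logic with `find` and `[ -f ]` rather than calling GitHub's hashFiles(), and both function names are hypothetical. In the workflow, the suggested fix would presumably amount to adding a term such as `hashFiles(format('gguf-cache/{0}', env.GGUF_FILE)) != ''`, assuming `env.GGUF_FILE` is defined the same way as `env.MMPROJ_FILE`.

```shell
# Emulation only: these functions mimic the workflow's `if:` conditions in
# plain shell. They are not GitHub's hashFiles(), and the names are hypothetical.

# Mirrors the condition as written in the diff: at least one *.gguf anywhere
# under the cache directory, plus the named mmproj file.
save_guard_as_written() {
  dir="$1"
  mmproj="$2"
  find "$dir" -type f -name '*.gguf' | grep -q . && [ -f "$dir/$mmproj" ]
}

# Mirrors the suggested fix: the main GGUF and the mmproj must both exist by name.
save_guard_explicit() {
  dir="$1"
  gguf="$2"
  mmproj="$3"
  [ -f "$dir/$gguf" ] && [ -f "$dir/$mmproj" ]
}
```

With a cache directory that holds only mmproj-F16.gguf, `save_guard_as_written` succeeds because the mmproj satisfies both of its terms, while `save_guard_explicit` fails until gemma-4-E2B-it-UD-Q4_K_XL.gguf is also present.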
…othai#5475)

The `JSON, images` job in `studio-mac-inference-smoke.yml` (Job 3 of Mac Studio GGUF CI) downloads ~4 GB on a cache miss: 3 GB gemma-4-E2B-it-UD-Q4_K_XL.gguf + ~1 GB mmproj-F16.gguf. The 30 min cap was tight even with `HF_HUB_ENABLE_HF_TRANSFER=1` and parallel downloads, and timed out the cache-miss run on PR unslothai#5430 mid-download (run 25950714888) before Studio install or the smoke assertions ran.

Once the actions/cache restore hits, the job comes in under 10 min, so 45 min only costs runner time on the first run after a cache key bump (v1->v2 was just bumped in unslothai#5459, which is what produced this failure).

Jobs 1 (openai-anthropic, 270M model) and 2 (tool-calling, ~1.5 GB model) are not bumped -- their 25 min cap has been comfortable.