QVAC-18047: prevent artifact poisoning in integration + publish workflows #2317
Mobile integration tests: @qvac/classification-ggml (Android). Result: passed
Mobile integration tests: @qvac/classification-ggml (iOS). Result: passed
CI triage — first full
| Addon | Windows integration job | Result |
|---|---|---|
| OCR | run-integration-tests / test-win32-x64 | ✅ pass (13m15s) |
| LLM | run-integration-tests / test-win32-x64 | ✅ pass (12m57s) |
| Classification-ggml | run-integration-tests / win32-x64-integration-tests | ✅ pass (1m37s) |
| NMTCPP | run-integration-tests / win32-x64-integration-tests | ✅ pass (6m30s) |
| Whispercpp | blocked before integration | ⏳ darwin-x64 prebuild flaked (see below) |
| TTS-ONNX | blocked before integration | ⏳ sanity-checks pre-existing yamlfmt (see below) |

CodeQL is also green — the 8 actions/artifact-poisoning/critical alerts referenced in QVAC-18612 are closed.
Unrelated failures (none caused by this PR)
- `merge-guard / validate-pr` × 6 stale: these are runs that fired before `verified` was applied. They explicitly say "This PR needs the 'verified' label to be merged". Newer merge-guard runs will supersede them.
- `sanity-checks` (ONNX), `Verify that yaml files are formatted` step: yamlfmt flags pre-existing trailing whitespace in files this PR does not touch (`.github/actions/cpp-lint/action.yaml`, `detect-version-bump/action.yml`, `integration-test-ocr-ggml.yml`, `integration-test-tts-ggml.yml`, `integration-test-vla.yml`, …) plus a few blank lines in `integration-test-{transcription-whispercpp,translation-nmtcpp,tts-onnx}.yml` that are also unrelated to this diff. I deliberately did NOT include a repo-wide yamlfmt sweep here; it belongs in its own PR.
- `prebuild / prebuild / darwin-x64` (Whispercpp), `cpp-tests-coverage / linux-x64-cpp-tests` (Whispercpp), `cpp-tests / cpp-tests` (NMTCPP): all three are the same vcpkg cache race on self-hosted runners: `fatal: Unable to create '…/vcpkg/downloads/git-tmp/.git/shallow.lock': File exists. Another git process seems to be running in this repository`. Worth a `gh run rerun --failed` to clear, then Whispercpp's win32-x64 integration leg will run too.
- `run-integration-tests / test-darwin-arm64` (LLM): real LLM test failure (`[TextLlm] failed to decode next token` in finetuning pause/resume + session-cache tests). Pre-existing/flaky on darwin-arm64, unrelated to artifact staging.
Suggested reviewer actions
- Decide whether to `gh run rerun --failed` on the three vcpkg-race jobs to unblock Whispercpp's Windows integration leg as additional confidence (already confirmed on 4 other addons).
- Pre-existing yamlfmt issues and the LLM darwin-arm64 finetuning failure are out of scope for this PR.
Scope reminder
This PR only addresses the 8 actions/artifact-poisoning/critical alerts. The 7 actions/untrusted-checkout/critical alerts on .github/actions/run-lint-and-unit-tests/action.yaml and .github/workflows/on-pr-ocr-onnx.yml are deferred to a follow-up PR (as discussed in the QVAC-18612 thread).
Mobile integration tests: @qvac/ocr-onnx (iOS). Result: passed
Mobile integration tests: @qvac/translation-nmtcpp (iOS). Result: passed
Mobile integration tests: @qvac/ocr-onnx (Android). Result: passed
Mobile integration tests: @qvac/translation-nmtcpp (Android). Result: passed
Mobile integration tests: @qvac/llm-llamacpp (iOS). Result: passed
Mobile integration tests: @qvac/llm-llamacpp (Android). Result: failed
/review
Summary
Re-land the artifact-poisoning hardening that was reverted in #1871 (revert of #1728), using the win32-safe split-step pattern already proven on the VLA addon in 779ee92.
Closes the 8 open critical `actions/artifact-poisoning/critical` CodeQL alerts across 7 workflow files. The remaining 7 open critical `actions/untrusted-checkout/critical` alerts (in `.github/actions/run-lint-and-unit-tests/action.yaml` and `.github/workflows/on-pr-ocr-onnx.yml`) are deferred to a follow-up PR: their fix requires architectural changes to a shared composite action consumed by 7+ on-pr workflows and is intentionally out of scope here to keep this PR low-risk.

What changed
For each affected `actions/download-artifact` consumer, the artifact is now downloaded into `${{ runner.temp }}/<staging>-staging` instead of straight into the workspace, then moved into place by two mutually-exclusive steps gated on `matrix.platform`:

- `matrix.platform != 'win32'`: `bash` + `cp -r staging/* dst/`, followed by `ls -la dst/` for diagnostics.
- `matrix.platform == 'win32'`: `powershell` + `Copy-Item -Recurse -Force` with native Windows paths, followed by `Get-ChildItem -Recurse $dst`.

This is the same pattern shipped in `integration-test-vla.yml` lines 155-183 and proven on win32 since 779ee92.

Files touched (8 alerts, 7 files)
| File | Notes |
|---|---|
| `.github/workflows/integration-test-classification-ggml.yml` | `pattern: classification-ggml-${{ matrix.platform }}-${{ matrix.arch }}*` filter |
| `.github/workflows/integration-test-llm-llamacpp.yml` | |
| `.github/workflows/integration-test-ocr-onnx.yml` | `${{ inputs.workdir \|\| env.PKG_DIR }}` to match existing path resolution |
| `.github/workflows/integration-test-transcription-whispercpp.yml` | |
| `.github/workflows/integration-test-translation-nmtcpp.yml` | |
| `.github/workflows/integration-test-tts-onnx.yml` | `run-integration-tests` (L187) and `run-supertonic-desktop-benchmarks` (L414) jobs |
| `.github/workflows/publish-sdk.yml` | |

Why the previous attempt regressed
PR #1728 introduced the staging pattern but used a single bash `cp -r ... 2>/dev/null || true` step for all platforms. On Windows runners `${{ runner.temp }}` substitutes to `C:\a\_temp`; Git Bash interprets `\a` and `\_` as escape sequences, the source path becomes invalid, `cp -r` returns 1, and the trailing `2>/dev/null || true` silently swallows the error. The destination ends up empty and `bare`'s addon resolver throws the misleading `ADDON_NOT_FOUND`, which is exactly what triggered #1871. Full root-cause writeup in 779ee92.

Invariants vs #1728
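The masking failure described above, and the stricter behaviour the Unix branch now relies on, can be reproduced locally. This is a minimal shell sketch, not the workflow's actual step: the temp-directory layout, the `addon.bare` placeholder, and all variable names are hypothetical, and the unquoted-path line assumes the substituted `C:\a\_temp` value reaches bash without quotes, which the writeup above does not confirm.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: all paths and names here are hypothetical.
set -u

work=$(mktemp -d)

# 1. Assumption: the Windows temp path lands in the script unquoted.
#    Bash then drops each backslash and keeps the character after it.
mangled=C:\a\_temp/staging
echo "path seen by bash: $mangled"   # C:a_temp/staging

# 2. Masked copy (the shape used in #1728): the source does not exist,
#    cp fails, but both the error text and the exit status are discarded.
missing_src="$work/does-not-exist"
mkdir -p "$work/dst-masked" "$work/dst-strict"
cp -r "$missing_src"/* "$work/dst-masked/" 2>/dev/null || true
masked_status=$?
masked_count=$(ls -A "$work/dst-masked" | wc -l | tr -d ' ')
echo "masked copy: exit=$masked_status, files copied=$masked_count"

# 3. Strict copy: same command without the mask, so the failure is visible.
#    (No `set -e` here, so the script can record the status and continue.)
cp -r "$missing_src"/* "$work/dst-strict/"
strict_status=$?
echo "strict copy: exit=$strict_status"

# 4. Happy path: stage outside the destination, copy, then list for diagnostics.
staging="$work/staging"
dst="$work/dst-ok"
mkdir -p "$staging/prebuilds" "$dst"
echo "placeholder" > "$staging/prebuilds/addon.bare"
cp -r "$staging"/* "$dst"/
ok_status=$?
ls -la "$dst"/
```

The masked copy reports success with an empty destination, while the strict copy returns a non-zero status. In a GitHub Actions bash step, which runs with `-e`, that non-zero status would fail the step at the copy instead of later at addon resolution.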
- No `2>/dev/null || true` on the Unix branch: surface real errors instead of masking them.
- … `pattern:` / `name:` / `merge-multiple:` / `continue-on-error:` attribute on the download step.
- … `permissions:` blocks, or the env-var indirection for `npm-token` / `pat-token` in `run-lint-and-unit-tests/action.yaml` (those were already settled by Revert "fix: prevent code injection and untrusted checkout in CI workflows (#1728)" #1871 and remain on `main`).

Test plan
- `on-pr-classification-ggml` Windows (`windows-2025`) matrix leg succeeds; `ls -la prebuilds/` step shows the expected `qvac__*.bare` binary.
- `on-pr-llm-llamacpp` Windows matrix leg succeeds (same check).
- `on-pr-ocr-onnx` Windows matrix leg succeeds (same check).
- `on-pr-transcription-whispercpp` Windows matrix leg succeeds (same check).
- `on-pr-translation-nmtcpp` Windows matrix leg succeeds (same check).
- `on-pr-tts-onnx` Windows matrix leg succeeds for both `run-integration-tests` and `run-supertonic-desktop-benchmarks`.
- `publish-sdk` Ubuntu `publish-gpr` / `publish-npm` jobs succeed; the staged `dist/` contains the expected build output.
- … `|| true` masking).

Out of scope (follow-up PRs)
- `actions/untrusted-checkout/critical` alerts in `.github/actions/run-lint-and-unit-tests/action.yaml` (lines 63, 67, 71, 75) and `.github/workflows/on-pr-ocr-onnx.yml` (lines 131, 148).

Refs