Skip to content
Merged
Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
---
id: B-0063
priority: P1
status: open
title: Streamed-installer download-to-temp + checksum-verify pattern — replace pipe-to-shell for upstream installers (Codex P0 on PR #75)
tier: install-path-supply-chain
effort: M
ask: codex P0 review on PR #75 (5 threads on tools/setup/common/curl-fetch.sh, macos.sh, linux.sh, elan.sh) flagging that even bare `curl --retry` can retry after bytes are written to stdout, leaving the shell consumer with partial+full concatenated script content. PR #75 immediate fix: drop --retry from `curl_fetch_stream` entirely. This row tracks the structurally safe replacement.
created: 2026-04-28
last_updated: 2026-04-28
composes_with: [B-0060]
tags: [install-path, supply-chain, upstream-installers, codex-p0, pr-75, streaming-vs-buffered, checksum]
---

# Streamed-installer download-to-temp + checksum-verify pattern

PR #75 landed a `curl_fetch_stream` helper for upstream
installer URLs (`mise.run`, Homebrew install.sh, elan-init.sh)
that originally used bare `curl --retry`. Codex P0 review
correctly identified that **any retry on a streamed-to-shell
pipe is unsafe**: curl can retry after partial bytes have
already been piped to the consumer, and the consumer
(`sh`, `bash -c "$(...)"`) cannot un-receive what it has
read. The retry then concatenates with the partial,
producing corrupted script content that may re-execute
commands or run truncated halves.

The PR #75 immediate fix dropped `--retry` from the stream
variant entirely. Streamed installers now fail-fast on
transient errors; user re-runs `install.sh`. That removes
the unsafe retry behaviour but does **not** add the safety
margin that the file-output variant has via
`--retry-all-errors`.

This row tracks the structurally safe replacement.

## The proper structural fix

For each streamed-installer call site, replace the
`curl_fetch_stream URL | sh` pattern with:

```bash
# 1. Download to a temp file (curl_fetch is safe with retries
# because the file restarts from scratch on retry).
TEMP="$(mktemp)"
trap 'rm -f "$TEMP"' EXIT
curl_fetch -o "$TEMP" "$URL"

# 2. Verify size + (when upstream publishes one) checksum.
if [ ! -s "$TEMP" ]; then
echo "error: installer empty after download; refusing to exec" >&2
exit 1
fi
# (when upstream publishes a SHA256SUMS or .sig:)
EXPECTED_SHA="<pinned>"
# Cross-platform SHA-256: macOS ships `shasum -a 256` (Perl
# script in /usr/bin) but not `sha256sum`; Linux has both;
# `openssl dgst -sha256` works everywhere openssl is
# available. Detect-and-dispatch keeps the install path
# 4-shell-portable per Otto-235.
if command -v sha256sum >/dev/null 2>&1; then
ACTUAL_SHA="$(sha256sum "$TEMP" | awk '{print $1}')"
elif command -v shasum >/dev/null 2>&1; then
ACTUAL_SHA="$(shasum -a 256 "$TEMP" | awk '{print $1}')"
else
ACTUAL_SHA="$(openssl dgst -sha256 "$TEMP" | awk '{print $NF}')"
fi
if [ "$ACTUAL_SHA" != "$EXPECTED_SHA" ]; then
echo "error: installer checksum mismatch; refusing to exec" >&2
exit 1
fi

# 3. Exec the verified file.
bash "$TEMP" # or: bash "$TEMP" -- ...args
```

This pattern:

- Gets full retry coverage (file-output variant).
- Buffers the entire installer before exec.
- Allows checksum verification when upstream publishes one.
- Fail-fasts on empty / truncated / corrupted downloads.
- Composes with the existing `curl_fetch` helper without
introducing new flags.

## Per-call-site work

1. **`tools/setup/macos.sh` Homebrew install.** Currently:
`HOMEBREW_INSTALLER="$(curl_fetch_stream URL)"; bash -c
"$HOMEBREW_INSTALLER"`. Convert to download-to-temp +
exec. Homebrew does NOT publish a SHA256 of install.sh;
document the size-check-only stance + the upstream
project as the trust anchor.
2. **`tools/setup/linux.sh` mise install.** Currently:
`curl_fetch_stream https://mise.run | sh`. Convert.
mise.run publishes signed releases — investigate
whether install.sh ships in a verifiable form.
3. **`tools/setup/common/elan.sh` Lean-toolchain install.**
Currently: `curl_fetch_stream URL | sh -s -- -y
--default-toolchain none`. Convert. elan-init.sh ships
from `raw.githubusercontent.com/leanprover/elan/master/`
— investigate whether tag-pinned versions are available
(move from `master` to a pinned tag if so).

## Done-criteria

- [ ] All three call sites converted to download-to-temp
+ size-check + exec pattern.
- [ ] For each call site, the upstream's verifiability
story is documented in the inline comment (signed
release / SHA256SUMS / project-as-trust-anchor with
no upstream verification).
- [ ] `tools/setup/common/curl-fetch.sh` doc-comments
reflect the new pattern; the `curl_fetch_stream`
function may then be DEPRECATED-WARNING-on-use or
removed entirely.
- [ ] CI passes on macOS-26, ubuntu-24.04, ubuntu-24.04-arm
with the new pattern.

## Why P1 (not P0)

The PR #75 fix (`curl_fetch_stream` without `--retry`) closes
the immediate retry-replay hazard. The structural fix here
adds defense-in-depth (checksum verification, size guard,
buffered exec) but the immediate hazard is already gone.
P1 = within 2-3 rounds, not an absolute ship-blocker.

## Composes with

- **PR #75** — the originating thread cluster lives at
cids 3151434903 / 3151434921 / 3151434929 / 3151434941 /
3151434956 (now resolved with this row as the concrete
tracking destination per
`feedback_bulk_resolve_is_not_answer_recurring_pattern_aaron_2026_04_28.md`).
- **B-0060** — human-lineage / external-anchor backfill
(the chosen verification mechanisms here should cite
external prior art / RFCs / vendor docs).
- The original cost-driver memory:
`feedback_structural_fix_beats_process_discipline_velocity_multiplier_aaron_2026_04_28.md`
(curl 502 → retry-on-file-output was the structural fix
that closed the original failure class; this is the
follow-on hardening that covers the streamed-shell
pattern that the original fix didn't address).
183 changes: 183 additions & 0 deletions tools/setup/common/curl-fetch.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,183 @@
#!/usr/bin/env bash
#
# tools/setup/common/curl-fetch.sh — sourceable helpers for
# fetching URLs during install.
#
# Two helpers with DIFFERENT retry semantics by output mode:
# - curl_fetch — file-output downloads. `--retry 5`
# + `--retry-all-errors` (safe because
# curl restarts the file from scratch
# on retry).
# - curl_fetch_stream — streamed-to-shell installers
# (`curl ... | sh`, `bash -c
# "$(curl ...)"`). NO retries. Streamed
# retry is unsafe — partial bytes
# already piped to the consumer cannot
# be un-received. Streamed installers
# fail-fast on transient errors;
# caller re-runs install.sh.
# Do NOT assume all curl usage in this repo is retried —
# only the `curl_fetch` (file-output) variant retries. See
# the per-function comments below + B-0063 for the
# download-to-temp structural fix to the streamed case.
#
# WHY
# ===
# Human maintainer 2026-04-28: external-infra failures
# (upstream package mirrors returning 5xx, transient curl-22
# / network blips) should be absorbed by retry-with-backoff
# inside the install path, not kicked out to a workflow-rerun
# discipline. Quote: *"curl 502 pattern i mean why should a
# PR ever fail for this? our code does not handle the retries
# already?"*
#
# This file centralises the retry policy so every call site
# uses the same flags. Previously the policy was inlined in
# `tools/setup/common/verifiers.sh` and missing entirely from
# `linux.sh` (mise install), `macos.sh` (Homebrew install),
# and `elan.sh` (Lean toolchain install). Follow-up framing:
# *"sounds like a common helper would help too rather than
# copy/paste."*
#
# TWO FUNCTIONS — file-output vs streamed
# =======================================
# Two helpers are exposed because the safe retry policy
# differs by output mode. Code review on the original single-
# function form flagged the partial-output-replay risk for
# pipe-to-shell call sites:
#
# curl_fetch — for file-output downloads
# (`-o`/`--output` to disk). Uses
# `--retry-all-errors` because curl
# restarts the file from scratch on
# retry, so partial-output replay
# cannot happen.
#
# curl_fetch_stream — for streamed-to-shell installers
# (`curl ... | sh`, `bash -c "$(curl
# ...)"`). NO --retry. Codex P0 review
# on PR #75 confirmed: even bare
# `--retry` (without `--retry-all-
# errors`) can retry after bytes have
# already been written to stdout, and
# the consumer cannot un-receive piped
# bytes. Streamed installers fail-fast
# on transient errors; the user re-runs
# install.sh. Proper download-to-temp
# hardening tracked as B-0063.
#
# USAGE
# =====
# Source this file, then call the appropriate helper:
#
# # shellcheck source=/dev/null
# source "$REPO_ROOT/tools/setup/common/curl-fetch.sh"
#
# # File output (safe with full retries):
# curl_fetch --output "$path" "$url"
#
# # Streamed pipe (must use the stream variant):
# curl_fetch_stream https://example.com/install.sh | sh
#
# # Command substitution (capture to var first; see
# # IDEMPOTENCE / SET-E note below):
# INSTALLER="$(curl_fetch_stream https://example.com/install.sh)"
# /bin/bash -c "$INSTALLER"
#
# RETRY POLICY (rationale)
# ========================
# --retry 5 — five attempts total. Empirically
# covers the upstream 5xx blips
# this install path has hit.
# --retry-delay 2 — 2-second base delay between retries.
# --retry-all-errors — (file-output only) retry on ALL
# transient errors including HTTP
# 5xx without `Retry-After`. Curl's
# default `--retry` only retries
# connect / DNS / 408 / 429 / 5xx-
# with-Retry-After.
# -fsSL — original flags preserved:
# -f: fail on HTTP errors
# -s: silent (no progress meter)
# -S: show errors when silent
# -L: follow redirects
#
# COMMAND-SUBSTITUTION + SET-E (caveat per codex review)
# ======================================================
# bash's `errexit` (`set -e`) is NOT reliably triggered by a
# command substitution that fails without producing output —
# in some bash versions (especially without `inherit_errexit`
# enabled) `VAR="$(failing_cmd)"` leaves `VAR=""` and continues.
# Our macos.sh capture pattern uses an explicit two-gate
# approach: `if ! HOMEBREW_INSTALLER="$(curl_fetch_stream
# ...)"; then exit 1; fi` (catches curl failure via the
# if-not test on the assignment's exit status — verified on
# bash 3.2.57 / 5.x: `if ! x="$(false)"; then echo CAUGHT;
# fi` does print CAUGHT) PLUS a secondary `[ -z
# "$HOMEBREW_INSTALLER" ] && exit 1` empty-string check.
# Network errors trigger the first gate (curl-22 / curl-6 /
# HTTP-non-2xx via `-fsSL`); the unreachable case where curl
# exits 0 but produces empty output is caught by the second
# gate. Either failure produces a hard `exit 1` with a
# diagnostic message — never falls through to `bash -c ""`.
# This is NOT a defense against partial-byte corruption —
# proper fix is download-to-temp + checksum-verify, tracked
# as B-0063. The current pattern is a small improvement over
# the prior `bash -c "$(curl ...)"` direct form (which
# silently ran whatever partial output survived); it is NOT
# the structurally safe form.
#
# IDEMPOTENCE
# ===========
# Re-sourcing this file is a no-op once both helpers are
# loaded. The guard uses a file-local sentinel variable
# (`_CURL_FETCH_LOADED`) instead of probing for an
# existing `curl_fetch` function: a function-name probe
# would silently skip BOTH definitions if the caller
# environment already had an unrelated `curl_fetch`
# function, leaving `curl_fetch_stream` undefined and
# breaking the streamed callers (`linux.sh` / `macos.sh`
# / `elan.sh`) at runtime with `curl_fetch_stream:
# command not found`. Sentinel-based guarding ties the
# load decision to "did this file load?" instead of "does
# that name exist?" — collisions in the caller environment
# can no longer accidentally suppress our definitions.

if [[ -z "${_CURL_FETCH_LOADED:-}" ]]; then
_CURL_FETCH_LOADED=1

# File-output variant — safe with --retry-all-errors because
# curl restarts the output file from scratch on each retry.
curl_fetch() {
curl -fsSL --retry 5 --retry-delay 2 --retry-all-errors "$@"
Comment thread
AceHack marked this conversation as resolved.
}

# Streamed variant — NO --retry, NO --retry-all-errors.
#
# Codex P0 review on PR #75 surfaced that even bare `curl
# --retry` (without --retry-all-errors) can still retry after
# bytes have been written to stdout: the connect error happens
# mid-transfer, curl resets the input but the bytes already
# piped into the consumer (`sh`, `bash -c "$(...)"`) cannot be
# un-written. The consumer then sees concatenated partial+full
# script content, which can re-execute commands or run
# truncated halves. There is no curl-flag combination that
# gives both retry-on-transient AND safe-restart-on-streamed-
# stdout — those are mutually exclusive without an
# intermediate buffer.
#
# Therefore this variant ships WITHOUT retries. Streamed
# installer failures (mise.run / Homebrew / elan) bubble up
# as install errors; the user re-runs install.sh.
#
# The proper structural fix — download to a temp file with
# `curl_fetch` (file-output), checksum-verify if available,
# then `bash <tempfile` — is tracked as a backlog item under
# `docs/backlog/P1/B-0063-streamed-installer-download-to-temp-
# pattern-codex-p0-pr-75.md`. Until that lands, this variant
# is fail-fast-no-retry by design.
curl_fetch_stream() {
curl -fsSL "$@"
}

fi
13 changes: 12 additions & 1 deletion tools/setup/common/elan.sh
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,20 @@

set -euo pipefail

# shellcheck source=curl-fetch.sh
# shellcheck disable=SC1091 # CI runs without -x; source path verified in tools/setup/common/curl-fetch.sh
source "$(dirname "${BASH_SOURCE[0]}")/curl-fetch.sh"

if ! command -v elan >/dev/null 2>&1; then
echo "↓ installing elan (Lean 4 toolchain manager)..."
curl -fsSL https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh \
# Use the stream variant (NO --retry, NO --retry-all-errors).
# Codex P0 review on PR #75: even bare `--retry` can retry
# after bytes are piped to sh, and the consumer cannot
# un-receive piped bytes. Streamed installers fail-fast on
# transient errors; user re-runs install.sh. Proper
# download-to-temp + checksum-verify hardening tracked as
# B-0063.
curl_fetch_stream https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh \
| sh -s -- -y --default-toolchain none
fi

Expand Down
Loading
Loading