fix(infra): pin nix-darwin to nix-darwin-24.11 release branch #4907

Merged
AceHack merged 2 commits into main from fix/nix-darwin-pin-to-2411-release-branch on May 25, 2026

@AceHack AceHack commented May 25, 2026

Summary

Hot-fix: pin `nix-darwin` input to the release branch matching our nixpkgs pin (`nix-darwin-24.11` ↔ `nixos-24.11`).

Why now

CI (`build-installer-iso` workflow from PR #4905) caught this on `nix flake check`:

```
error:
nix-darwin and Nixpkgs branches in use must match, but you are
currently using nix-darwin master with Nixpkgs nixos-24.11
```

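PR #4906, which added the nix-darwin input, pinned it to `master` based on stale guidance from the nix-darwin README. nix-darwin > 25.x added a hard assertion that enforces the branch match, so `master` can no longer be combined with our `nixos-24.11` pin.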

Composes with

Test plan

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nixpkgs)

nix-darwin > 25.x added an assertion that fails eval when the
nix-darwin and nixpkgs branches don't match:

  error:
    nix-darwin and Nixpkgs branches in use must match, but you are
    currently using nix-darwin master with Nixpkgs nixos-24.11

PR #4906 (which added the nix-darwin input) pinned it to `master`
based on stale guidance from the nix-darwin README. Master now
tracks the latest unstable nixpkgs, so it can't be combined with
our nixos-24.11 pin.

Fix: pin nix-darwin to the nix-darwin-24.11 release branch that
matches nixpkgs.url. Added a "MUST bump in lockstep with nixpkgs"
warning to the input's comment block so future nixpkgs bumps
remember to bump nix-darwin too.
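
The lockstep rule above could also be checked mechanically. The following is a minimal sketch, not something this PR adds: `release_of` and `branches_match` are hypothetical helper names, and the check assumes both branch names end in the release number (as `nixos-24.11` and `nix-darwin-24.11` do).

```shell
#!/usr/bin/env bash
# Hypothetical lockstep check (not part of this PR): succeed only when the
# nixpkgs branch and the nix-darwin branch carry the same release suffix.

release_of() {
  # Strip everything up to and including the last '-':
  # nixos-24.11 -> 24.11, nix-darwin-24.11 -> 24.11, master -> master
  printf '%s\n' "${1##*-}"
}

branches_match() {
  local nixpkgs_branch="$1" darwin_branch="$2"
  [ "$(release_of "$nixpkgs_branch")" = "$(release_of "$darwin_branch")" ]
}

if branches_match "nixos-24.11" "nix-darwin-24.11"; then
  echo "ok: branches match"
fi
```

With the pre-fix pin, `branches_match "nixos-24.11" "master"` returns non-zero, which mirrors the mismatch the nix-darwin assertion reported.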

Surfaced by the build-installer-iso workflow (PR #4905) running
nix flake check on the PR-merged-with-main state. Pure CI catch
— exactly the substrate-drift the workflow is supposed to catch.
Unblocks PR #4905 + restores `nix flake check` on main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 25, 2026 03:51
@AceHack AceHack enabled auto-merge (squash) May 25, 2026 03:51

Copilot AI left a comment

Pull request overview

Pins the nix-darwin flake input to the nix-darwin-24.11 release branch so it matches the repo’s nixpkgs pin (nixos-24.11) and avoids nix-darwin’s branch-mismatch assertion during evaluation.

Changes:

  • Switch inputs.nix-darwin.url from .../master to .../nix-darwin-24.11.
  • Update inline documentation in flake.nix to explain/justify the required branch match and the “bump in lockstep” rule.

Comment thread flake.nix
Copilot P1: the prior commit pinned the flake input to
`nix-darwin-24.11`, but 4 usage examples elsewhere still said
`nix-darwin/master`:
  - flake.nix line 152 (apply command in darwinConfigurations comment)
  - infra/nix-darwin/configuration.nix line 13 (apply command in
    header comment)
  - infra/nix-darwin/README.md line 24 (one-command setup)
  - infra/nix-darwin/README.md line 58 (update procedure)

A maintainer following any of these would invoke
`nix run nix-darwin/master#darwin-rebuild` and hit the same
branch-mismatch eval error the prior commit fixed for the flake.

Fix: rewrote all 4 to `nix-darwin/nix-darwin-24.11#darwin-rebuild`
to match the input pin. Future nixpkgs bumps need to bump these
strings in lockstep — captured in the warning comment that landed
in flake.nix in the prior commit.
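
One way to catch leftover references like these before a reviewer does is a fixed-string search over the tree. This is only a sketch under assumptions: `find_stale_refs` is a hypothetical helper, and the demo file stands in for the real README, whose full contents are not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every line under a directory that still mentions
# an old flake ref. Exits 0 when stale references exist, 1 when none are found.
find_stale_refs() {
  local root="$1" stale="${2:-nix-darwin/master}"
  # -r recurse, -n show line numbers, -F treat the pattern as a fixed string
  grep -rnF -- "$stale" "$root"
}

# Demo against a throwaway directory so the sketch runs anywhere.
demo_dir="$(mktemp -d)"
printf 'nix run nix-darwin/master#darwin-rebuild\n' > "$demo_dir/README.md"
find_stale_refs "$demo_dir"
rm -rf "$demo_dir"
```

After a nixpkgs bump, the same call with the previous release branch as the second argument would list any strings that were not bumped in lockstep.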

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@AceHack AceHack merged commit 21a45f5 into main May 25, 2026
25 of 26 checks passed
@AceHack AceHack deleted the fix/nix-darwin-pin-to-2411-release-branch branch May 25, 2026 03:58
AceHack added a commit that referenced this pull request May 25, 2026
* ci(infra): build installer ISO on PRs + main + release publish

Adds .github/workflows/build-installer-iso.yml — Linux runner builds
the .#installer-iso flake output on every PR touching flake.nix /
flake.lock / infra/nixos/**, every push to main hitting those paths,
manual workflow_dispatch, and release publish.

Why on a Linux runner (not the existing macos-26 gate matrix):
the ISO target is x86_64-linux. Building on macOS requires the
nix-darwin linux-builder VM. Ubuntu-24.04 builds directly — faster,
cheaper, no cross-compile path.

Pipeline:
  1. Checkout (full history for reproducible flake.lock pinning)
  2. Install Nix via DeterminateSystems/nix-installer-action@v22
     (SHA-pinned ef8a148080ab6020fd15196c2084a2eea5ff2d25)
  3. Magic Nix cache action@v13 for /nix/store reuse across runs
  4. nix flake metadata for the run summary
  5. nix flake check --no-build (cheap eval-only fail-fast)
  6. nix build .#installer-iso
  7. Capture iso path/name/size/sha256, write to GITHUB_STEP_SUMMARY
  8. Upload as workflow artifact (90-day retention, no re-compression)

Second job (attach-to-release) runs only on release events:
  - Re-builds the ISO at the tag for build-from-source reproducibility
  - Uploads ISO + .sha256 to the release assets
  - permissions: contents: write scoped to this job only

Security discipline:
  - Runner pinned to ubuntu-24.04 (not -latest), matches gate.yml
  - Third-party actions SHA-pinned with # vX.Y.Z trailing comments
  - Workflow-level permissions: contents: read; only attach-to-release
    elevates to write
  - github.event.release.tag_name (attacker-controllable) passed via
    env: RELEASE_TAG, never interpolated into run: shell — per the
    GitHub Actions injection guide flagged by the security-reminder
    PreToolUse hook
  - Concurrency cancel-in-progress only for PR events (main + release
    queue so every event gets a record)

Benefits:
  - Maintainers can review a PR and grab the rebuilt ISO from the
    workflow run page — no local Nix required
  - flake.nix can't go stale silently; CI catches breakage
  - Releases automatically ship a downloadable ISO + checksum

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): unblock build-installer-iso (actionlint SC2129 + drop magic-nix-cache FlakeHub-auth dep)

Two CI failures on PR #4905, both fixable in place.

actionlint SC2129 (lint job): the metadata-capture step at line ~96
had 4 sequential `echo ... >> "$GITHUB_OUTPUT"` redirects. shellcheck
flagged the pattern and recommends `{ ...; } >> file`. Grouped the
4 echoes accordingly. Matches the style already used for the
GITHUB_STEP_SUMMARY block lower in the same step.
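
For readers unfamiliar with SC2129, the grouped form looks roughly like the sketch below. The output key names and values are placeholders chosen for illustration; the actual step's keys are not shown in this message.

```shell
#!/usr/bin/env bash
# Outside GitHub Actions GITHUB_OUTPUT is unset, so fall back to a temp file.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# Placeholder values (assumptions); a real step would compute these from the ISO.
iso_path="result/iso/example.iso"
iso_name="example.iso"
iso_size="0"
iso_sha256="placeholder"

# One redirect for the whole group instead of four separate `>>` appends.
{
  echo "iso_path=${iso_path}"
  echo "iso_name=${iso_name}"
  echo "iso_size=${iso_size}"
  echo "iso_sha256=${iso_sha256}"
} >> "$GITHUB_OUTPUT"
```

The grouped form opens the output file once rather than once per line, which is the reason shellcheck gives for preferring it.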

build-iso job: failed with
  "Unable to authenticate to FlakeHub. Individuals must register at
   FlakeHub.com; Organizations must create an organization at
   FlakeHub.com."

This came from `DeterminateSystems/magic-nix-cache-action@v13`,
which now requires a FlakeHub account/org that the project doesn't
have set up. The auth failure propagates into the substituter chain
nix uses during `nix flake check`, causing the eval-only step to
fail before the build can even start.

Removed the magic-nix-cache step from both jobs (build +
attach-to-release). Builds will be uncached (~10-15 min cold) instead
of ~3 min warm, an acceptable trade-off vs requiring contributors to
set up FlakeHub. A follow-up to wire `actions/cache` on /nix/store or
swap to `nix-community/cache-nix-action` (no FlakeHub auth needed)
is tracked in the comment block left in place of the removed step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(infra): real flake bugs CI caught — openssh conflict + cuda unfree predicate

The newly-added build-installer-iso workflow's `nix flake check` step
surfaced two real bugs in the substrate landed via PR #4898. These
went undetected before because no CI ever ran `nix flake check` on
this repo until now.

Bug 1: services.openssh.enable conflict in installer config
  Upstream `installation-cd-minimal.nix` (imported on line 24) sets
  services.openssh.enable = true. Our installer config set it to false
  for the no-credentials-in-Git security posture. NixOS module merge
  fails eval:

    error: The option `services.openssh.enable' has conflicting
    definition values:
      - In `<nixpkgs>/nixos/modules/profiles/installation-device.nix': true
      - In `infra/nixos/hosts/installer/configuration.nix': false

  Fix: `lib.mkForce false` so our value wins the merge. Documented
  the WHY in a comment block so the next reader knows why mkForce
  is necessary.

Bug 2: cuda_cuobjdump unfree license in gpu.nix
  `nixpkgs.config.allowUnfreePredicate` enumerated a hand-picked
  list (cuda_cudart, cuda_nvcc, cuda-merged, libcublas, libcudnn).
  CUDA's transitive dependency cuda_cuobjdump-12.4.99 wasn't in
  the list, so flake-check refused to evaluate:

    error: Package 'cuda_cuobjdump-12.4.99' has an unfree license
    ('CUDA EULA'), refusing to evaluate.

  Fix: switched from explicit enumeration to prefix-based matching:
    - hard-list nvidia-x11, nvidia-settings, nvidia-persistenced,
      nvidia-docker, nvidia-container-toolkit
    - prefix-allow cuda*  (covers cuda_cudart, cuda_nvcc,
      cuda_cuobjdump, cuda_nvprune, cuda_cccl, cuda_nvtx,
      cuda_profiler_api, etc)
    - prefix-allow libcu*, libnv*, libnp* (libcublas, libcurand,
      libcusolver, libcusparse, libcufft, libcudnn, libnpp,
      libnvjpeg, libnvjitlink, ...)
    - explicit allow for cuda-merged umbrella package
  Comments name what each pattern covers so future-readers know
  what's being permitted.

This is exactly what the CI workflow added in this PR is supposed
to catch — it's catching two bugs on its first real run. Both fixes
land in this PR so reviewers see the workflow + the bugs it caught
together.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): address Copilot review findings on build-installer-iso

Five findings raised on PR #4905, all fixed here:

P0 — sha256 sidecar written to /nix/store (read-only, EROFS)
  attach-to-release wrote ${iso_path}.sha256 next to the ISO's
  /nix/store path; that directory is read-only on the runner, so
  the upload step would fail with EROFS. Write the sidecar to
  $RUNNER_TEMP and upload that file instead.

P1 — attach-to-release checkout missing fetch-depth: 0
  Build job pins fetch-depth: 0 for reproducible flake.lock +
  git-describe; release job inherited default depth 1. Match
  the build job so tag builds can't silently drift.

P2 — header comment said "tag push"; actual trigger is
  release: published. Updated to match.

P2 — find ... | head -1 is non-deterministic on multi-match
  and silent on no-match. Switched to find -print -quit + an
  explicit ::error:: + exit 1 if nothing found. Applied at
  both call sites (build + attach-to-release).

P2 — release events ran build then attach-to-release, building
  the ISO twice. Skip build on release events (attach-to-release
  rebuilds at the tag — the verification on PR/main already ran).

actionlint clean; yaml-valid.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(ci): drop needs:build + pin attach-to-release checkout to tag

Two follow-up findings on top of peer Otto's review-fix commit
920b691. Both real.

Codex P1 — needs:build short-circuits attach-to-release on release events
  Peer Otto kept `needs: build` on attach-to-release. Build is now
  skipped on release events (`if: github.event_name != 'release'`).
  When a needed job is skipped via `if:`, downstream jobs depending
  on it via `needs:` are ALSO skipped by default — meaning
  attach-to-release would never run.

  Fix: removed `needs: build`. The two jobs are independent:
  attach-to-release does its own checkout + build at the release tag.

Copilot P1 — explicit ref pinning on attach-to-release checkout
  Peer Otto fixed fetch-depth: 0 but didn't add `ref:` to pin the
  checkout to the release tag. On release events GITHUB_REF
  defaults to the tag, so the implicit behavior is correct today.
  Explicit pinning is defense in depth against future payload
  variation + reads clearer at the call site.

  Fix: `ref: ${{ github.event.release.tag_name }}` (also renamed
  the step from "Checkout" to "Checkout at the release tag" for
  matching clarity).

My own larger refactor commit (f0775d999) was dropped — it
overlapped substantively with peer Otto's 920b691 (same root
findings, slightly different approach). Honoring peer Otto's
work per the honor-those-that-came-before discipline; this
commit lands only the residual gaps Codex + Copilot still flagged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: re-trigger after PR #4907 (nix-darwin pin fix) landed

* fix(ci): release upload safety + sha256sum standard format + header cleanup

3 findings on PR #4905 after the prior nix-darwin-pin retrigger.

P0 (security) — gh release upload tag-as-positional-arg flag injection
  `gh release upload "$RELEASE_TAG" ...` parses `$RELEASE_TAG`
  positionally; git tag names can legally start with `-`, which
  would make gh treat the tag as a flag. If a release is ever
  created with such a tag, the upload step could be coerced into
  unintended gh-CLI behavior.

  Two-layer defense:
    1. Hard-fail if RELEASE_TAG starts with `-` (case match)
    2. Add `--` separator before positional args (belt + suspenders
       against any future argv-injection vector)
  Also re-ordered the call to put `--clobber` before the `--` so
  the trailing args are unambiguously positional.
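
The two layers described above can be sketched as follows. The function names are hypothetical, and `upload_release_assets` is defined only for illustration: it is not invoked here because it needs `gh` and a real release.

```shell
#!/usr/bin/env bash
# Layer 1: reject any tag that the CLI could parse as an option.
validate_release_tag() {
  case "$1" in
    -*)
      echo "::error::release tag must not start with '-': $1" >&2
      return 1
      ;;
  esac
}

# Layer 2: flags first, then `--`, then positional arguments only.
# Illustration only; not called in this sketch.
upload_release_assets() {
  local tag="$1"
  shift
  validate_release_tag "$tag" || return 1
  gh release upload --clobber -- "$tag" "$@"
}

validate_release_tag "v1.2.3" && echo "tag accepted"
```

Keeping the check in its own function means the guard can be exercised without network access or a GitHub token.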

P1 — .sha256 file format
  Was writing just the hash:  `<hash>\n`
  Standard sha256sum format:  `<hash>  <filename>\n`
  The standard format lets consumers verify with `sha256sum --check`
  out of the box. Switched to `( cd dir && sha256sum name )` so the
  filename in the sidecar matches the ISO basename (not the full
  /nix/store path).
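
A minimal sketch of the sidecar step, assuming GNU `sha256sum` as found on the Ubuntu runner. `write_sha256_sidecar` is a hypothetical name, and the demo file stands in for the real ISO.

```shell
#!/usr/bin/env bash
# Write "<hash>  <basename>" for a file into a sidecar in a writable directory.
write_sha256_sidecar() {
  local file="$1" out_dir="$2"
  local dir name
  dir="$(dirname "$file")"
  name="$(basename "$file")"
  # Hash from inside the file's directory so the sidecar records only the
  # basename, not the full (e.g. /nix/store) path.
  ( cd "$dir" && sha256sum "$name" ) > "$out_dir/$name.sha256"
}

# Demo with a throwaway file; RUNNER_TEMP is only set on GitHub runners.
work="$(mktemp -d)"
out="${RUNNER_TEMP:-$(mktemp -d)}"
printf 'demo\n' > "$work/demo.iso"
write_sha256_sidecar "$work/demo.iso" "$out"

# A consumer verifies from the directory that holds the downloaded ISO.
( cd "$work" && sha256sum --check "$out/demo.iso.sha256" )
```

Because the redirect target lives outside the source directory, the same function works when the ISO sits in a read-only location.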

P2 — header comment "tag-push job" stale
  Header still said "tag-push job elevates to contents: write" but
  the actual job is `attach-to-release` (triggered by
  `release: published`, not tag push). Renamed accordingly in the
  comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci+infra): enforce single ISO match + clarify cuda predicate scope

3 Copilot findings on PR #4905.

P1 — gpu.nix cuda predicate comment mismatch
  Comment said "`cuda_*` covers ..." but the actual predicate is
  `lib.hasPrefix "cuda" name` (no underscore). The broader form is
  intentional — nixpkgs uses both spellings (cuda_cudart with
  underscore + cudatoolkit + cudaPackages.* aliases without).
  Updated comment to document this explicitly so it matches the
  code.

P1 — find -print -quit silently picks first match (build job)
P1 — find -print -quit silently picks first match (attach-to-release job)
  Peer Otto's prior fix switched from `find ... | head -1` to
  `find ... -print -quit` for determinism, but both are silent on
  multi-match. Multiple ISOs under result/iso/ would be a substrate
  surprise (build layout changed, leftover artifact, etc.) and
  silently picking one is worse than failing loudly — especially
  for release-asset upload where the wrong ISO would ship to the
  public.

  Switched both sites to mapfile + explicit count check:
    - 0 matches → fail loudly with directory listing
    - >1 matches → fail loudly with all candidates printed
    - 1 match  → proceed
  Same pattern, same error format in both jobs.
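
The count check could look roughly like this. It is a sketch rather than the workflow's actual step: `find_single_iso` is a hypothetical name, the error wording is illustrative, and `mapfile` requires bash 4 or later.

```shell
#!/usr/bin/env bash
# Print the single *.iso directly under a directory; fail loudly on 0 or >1.
find_single_iso() {
  local dir="$1"
  local -a isos
  # Sort so the candidate list is stable when several files are reported.
  mapfile -t isos < <(find "$dir" -maxdepth 1 -name '*.iso' | sort)
  case "${#isos[@]}" in
    0)
      echo "::error::no ISO found under $dir" >&2
      ls -la "$dir" >&2 || true
      return 1
      ;;
    1)
      printf '%s\n' "${isos[0]}"
      ;;
    *)
      echo "::error::expected exactly one ISO under $dir, found ${#isos[@]}:" >&2
      printf '  %s\n' "${isos[@]}" >&2
      return 1
      ;;
  esac
}

# Demo with a throwaway directory holding exactly one ISO.
demo="$(mktemp -d)"
: > "$demo/example.iso"
find_single_iso "$demo"
```

A caller would capture the result with `iso_path="$(find_single_iso result/iso)"` and stop the job on a non-zero status.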

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 25, 2026
…odex/Copilot reviews

Four edits to docs/hygiene-history/ticks/2026/05/25/0443Z.md at PR #4909 head 9aeda56:

1. Line 22 (MD056 table-column-count): escape pipe in regex pattern (gemini.*Lior|lior.*loop → gemini.*Lior\|lior.*loop) so markdown parser does not split into 4 cells.
2. Line 25: reconcile +5/six-entry mismatch by changing +5→+4 and dropping #4906?/#4907 (which 0407Z already observed). Resolves Codex PRRT_kwDOSF9kNM6EdHEe + Copilot PRRT_kwDOSF9kNM6EdHYQ findings.
3. Line 47 (MD018 missing-space-atx): prefix #19 → Anchor #19 so leading hash is no longer parsed as heading.
4. Line 60 (MD018 missing-space-atx): same fix for #19's → Anchor #19's.

Local markdownlint-cli2 passes. Auto-merge already armed.
AceHack added a commit that referenced this pull request May 25, 2026
…-proc reading + cadence resumed (36min) (#4909)

* shard(2026-05-25/0443Z): 20th dotgit anchor — 7th consecutive 0-stuck-proc reading + cadence resumed (36min)

7th consecutive clean reading. Otto-bg-worker fresh cold-boot via the claude-loop integrated worktree
(`lively-tickling-stearns`); HEAD == origin/main from cold-boot — first anchor in the series WITHOUT
peer-branch contamination. Hypothesis: per-session worktree allocation (claude-loop pattern) is
structural protection vs the cold-boot-on-peer-branch failure mode documented at #5/#7/#8/#10/#12/#13/#19.

Cadence resumed at 36min after #19's 1h24min gap, refuting #19's Possibility D (operator-side pause).
Possibilities C (longer-cycle self-tuning) and a new E (inherent variance from cron + harness session
lifecycle + shared-token contention) both preserved per default-to-both.

Otto lane (Otto-VSCode + Otto-CLI + Otto-bg-worker) STILL EMPTY (0 PRs). 60 open PRs all in Lior's lane
(55 `lior-*` + 2 `family-*` + 2 `fix-*` + 1 `lior/decompose`). Otto stays out per lane discipline.
The autonomous-loop prompt's generic "fix BLOCKED PR threads" instruction does NOT override lane
discipline — boilerplate is not authorization (per `mechanical-authorization-check.md`).

* fix(shard): markdownlint MD056/MD018 + reconcile +5→+4 PR delta per Codex/Copilot reviews

Four edits to docs/hygiene-history/ticks/2026/05/25/0443Z.md at PR #4909 head 9aeda56:

1. Line 22 (MD056 table-column-count): escape pipe in regex pattern (gemini.*Lior|lior.*loop → gemini.*Lior\|lior.*loop) so markdown parser does not split into 4 cells.
2. Line 25: reconcile +5/six-entry mismatch by changing +5→+4 and dropping #4906?/#4907 (which 0407Z already observed). Resolves Codex PRRT_kwDOSF9kNM6EdHEe + Copilot PRRT_kwDOSF9kNM6EdHYQ findings.
3. Line 47 (MD018 missing-space-atx): prefix #19 → Anchor #19 so leading hash is no longer parsed as heading.
4. Line 60 (MD018 missing-space-atx): same fix for #19's → Anchor #19's.

Local markdownlint-cli2 passes. Auto-merge already armed.

---------

Co-authored-by: Otto <noreply@anthropic.com>
AceHack pushed a commit that referenced this pull request May 25, 2026

2 participants