
test: add shebangs to shell.nix test scripts #14778

Merged
xokdvium merged 1 commit into NixOS:master from agucova:fix-macos-shebang-flakiness
Dec 13, 2025

Conversation

@agucova
Contributor

@agucova agucova commented Dec 12, 2025

Summary

Adds shebangs to the foo, bar, and ruby test scripts in tests/functional/shell.nix to fix intermittent SIGSEGV crashes on macOS.

Motivation

Some of the scripts used by the functional tests are created without shebangs. On macOS (particularly Apple Silicon) this leads to a ~50% build failure rate: the scripts intermittently die with SIGSEGV when they are executed via command substitution inside the nix sandbox.

See #12121 for my root cause analysis and a minimal reproduction script.

Solution

Add #!${shell} shebangs to each test script. With this fix, builds pass consistently (10/10 vs 5/10 without the patch).
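
As a rough illustration of the fix, here is a minimal shell sketch of writing a script with a shebang and checking for one. Both `write_script` and `has_shebang` are hypothetical helpers invented for this example; the actual patch does the equivalent inside the Nix expressions in tests/functional/shell.nix, where `${shell}` supplies the interpreter path.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not part of the PR.

# write_script DEST INTERPRETER BODY
# Writes BODY to DEST with a "#!INTERPRETER" first line and marks it executable.
# INTERPRETER plays the role that ${shell} plays in the patched Nix expressions.
write_script() {
  printf '#!%s\n%s\n' "$2" "$3" > "$1"
  chmod +x "$1"
}

# has_shebang FILE
# Succeeds if the first two bytes of FILE are "#!".
has_shebang() {
  [ "$(head -c 2 -- "$1" 2>/dev/null)" = '#!' ]
}
```

For example, `write_script ./foo /bin/sh 'echo foo'` produces a file for which `has_shebang ./foo` succeeds, while a file created with a bare `echo 'echo foo' > ./foo` does not pass the check.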

Testing

Likely fixes some of the problems people had in #13106, though not all test failures reported are caused by this.

@github-actions github-actions bot added the with-tests label Dec 12, 2025
  Fix intermittent SIGSEGV (exit code 139) on macOS when running
  nix-shell and shebang tests inside the nix sandbox.

  The foo, bar, and ruby test scripts were created without shebangs,
  which causes intermittent crashes when executed via command
  substitution on macOS. Adding proper shebangs resolves the flakiness.

  Potentially closes: NixOS#13106
@agucova agucova force-pushed the fix-macos-shebang-flakiness branch from 47f6743 to 7b3d7eb Compare December 12, 2025 21:04
@agucova agucova marked this pull request as ready for review December 12, 2025 21:05
@agucova agucova requested a review from edolstra as a code owner December 12, 2025 21:05
Contributor

@xokdvium xokdvium left a comment

Can't even fathom how this leads to segfaults and at this point I'm too afraid to ask.

@agucova
Contributor Author

agucova commented Dec 13, 2025

Can't even fathom how this leads to segfaults and at this point I'm too afraid to ask.

I did some digging, and as far as I can tell, it's because fork-without-exec is an unsafe operation under macOS. When a script has no shebang, bash forks and calls execve, but execve fails with ENOEXEC, so bash falls back to running the script itself in the forked child. Depending on a race condition, the sandbox detects this as an unsafe operation and raises a SIGSEGV. macOS logs this as "crashed on child side of fork pre-exec".

I believe it's the same issue as the one reported at benoitc/gunicorn#2761

Having said all this, I couldn't find a minimally reproducible example that doesn't rely on an environment created by nix-build, so it's possible that it's nix's fault somehow.
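
The code path described in this comment can be exercised with a few lines of shell. This is only a sketch of the normal fallback, not a reproduction of the crash: on a healthy system the shell handles the ENOEXEC failure by running the file as a shell script itself, so the command substitution succeeds. The directory and file names are made up for the example.

```shell
#!/bin/sh
# Sketch: run a shebang-less executable through command substitution,
# the same invocation shape the failing tests use.
demo_dir=$(mktemp -d)

# An executable text file with no "#!" line. execve() rejects it with ENOEXEC,
# and the calling shell then interprets the file itself.
printf 'echo hello\n' > "$demo_dir/no-shebang"
chmod +x "$demo_dir/no-shebang"

# Capture both the output and the exit status without aborting on failure.
if result=$("$demo_dir/no-shebang" 2>&1); then
  code=0
else
  code=$?
fi
echo "output=$result exit=$code"

rm -rf "$demo_dir"
```

An exit status of 139 here would correspond to the SIGSEGV discussed in this thread, since shells report death by signal N as 128 + N and SIGSEGV is signal 11.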

@xokdvium
Contributor

nix-build, so it's possible that it's nix's fault somehow.

Probably the specific macOS sandbox profile has to do with it?

@xokdvium xokdvium added this pull request to the merge queue Dec 13, 2025
@agucova
Contributor Author

agucova commented Dec 13, 2025

Probably the specific macOS sandbox profile has to do with it?

Nope, apparently it happens without the sandbox profile enabled, too. I'm now trying to figure out if it's stdenv related, in which case there might be a more serious bug elsewhere.

Merged via the queue into NixOS:master with commit bb718d2 Dec 13, 2025
17 checks passed
@agucova
Contributor Author

agucova commented Dec 13, 2025

@xokdvium want to hear something even more crazy? The bug generally affects any use of runCommand with a shebang-less script. But if one swaps the underlying bash for either bash-interactive or, more conservatively, the same bootstrap bash linked against the ncurses library, the SIGSEGV goes away.

I'm cooking the most cursed reproduction scripts right now...

Running 30 builds for each configuration...

=== Testing BOOTSTRAP BASH (no ncurses) ===
.......X..............X....X..
Bootstrap bash: PASSED=27 FAILED=3 (SIGSEGV=3)

=== Testing BASH + NCURSES ===
..............................
Bash + ncurses: PASSED=30 FAILED=0 (SIGSEGV=0)

==============================================
SUMMARY
==============================================
Bootstrap bash (no ncurses): 27/30 passed, 3/30 failed (3 SIGSEGV)
Bash + ncurses:              30/30 passed, 0/30 failed (0 SIGSEGV)
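
The script that produced the log above is not shown in the thread. A tally loop along these lines would print the same kind of output; `run_trials` is a hypothetical name, and the command passed to it is whatever forces a fresh build of the configuration under test.

```shell
#!/bin/sh
# Hypothetical harness, assumed for illustration; not the author's actual script.
# run_trials LABEL COUNT CMD...
# Runs CMD COUNT times, printing "." per pass and "X" per failure, then a summary.
# The body runs in a subshell, so the counters do not leak into the caller.
run_trials() (
  label=$1
  count=$2
  shift 2
  passed=0 failed=0 segv=0 i=0
  while [ "$i" -lt "$count" ]; do
    i=$((i + 1))
    if "$@" >/dev/null 2>&1; then
      passed=$((passed + 1))
      printf '.'
    else
      code=$?
      failed=$((failed + 1))
      printf 'X'
      # 139 = 128 + 11, the shell's exit status for death by SIGSEGV.
      if [ "$code" -eq 139 ]; then
        segv=$((segv + 1))
      fi
    fi
  done
  printf '\n%s: PASSED=%d FAILED=%d (SIGSEGV=%d)\n' "$label" "$passed" "$failed" "$segv"
)
```

Counting exit status 139 separately from other failures is what lets the summary distinguish the crash from ordinary build errors.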

My plain version looks like:

{ pkgs ? import <nixpkgs> {} }:
let
  # Script WITHOUT a shebang - this triggers the bug
  script = pkgs.runCommand "shebang-less-script" {} ''
    mkdir -p $out/bin
    echo 'echo hello' > $out/bin/test
    chmod +x $out/bin/test
  '';
in
pkgs.runCommand "test-bootstrap-bash" {} ''
  for i in $(seq 1 100); do
    # Capture both output and exit code
    set +e
    result=$(${script}/bin/test 2>&1)
    code=$?
    set -e

    if [ $code -eq 139 ]; then
      echo "SIGSEGV (exit 139) at iteration $i"
      exit 139
    elif [ $code -ne 0 ]; then
      echo "FAILED at iteration $i with exit code $code"
      exit $code
    elif [ "$result" != "hello" ]; then
      echo "FAILED at iteration $i: unexpected output '$result'"
      exit 1
    fi
  done

  mkdir -p $out
  echo "PASSED all 100 iterations" > $out/result
''

My patched version with ncurses is:


{ pkgs ? import <nixpkgs> {} }:
let
  # Build bash with ncurses linked 
  bashWithNcurses = pkgs.bash.overrideAttrs (old: {
    pname = "bash-with-ncurses";
    buildInputs = (old.buildInputs or []) ++ [ pkgs.ncurses ];
    # Force linking ncurses even though bash doesn't use it
    NIX_LDFLAGS = "-lncursesw";
  });

  # Script WITHOUT a shebang - same trigger as the failing case
  script = pkgs.runCommand "shebang-less-script-nc" {} ''
    mkdir -p $out/bin
    echo 'echo hello' > $out/bin/test
    chmod +x $out/bin/test
  '';
in
derivation {
  name = "test-bash-ncurses";
  system = builtins.currentSystem;
  # Bash with ncurses linked
  builder = "${bashWithNcurses}/bin/bash";
  args = ["-c" ''
    export PATH="${pkgs.coreutils}/bin"

    for i in $(seq 1 100); do
      # Capture both output and exit code
      set +e
      result=$(${script}/bin/test 2>&1)
      code=$?
      set -e

      if [ $code -eq 139 ]; then
        echo "SIGSEGV (exit 139) at iteration $i"
        exit 139
      elif [ $code -ne 0 ]; then
        echo "FAILED at iteration $i with exit code $code"
        exit $code
      elif [ "$result" != "hello" ]; then
        echo "FAILED at iteration $i: unexpected output '$result'"
        exit 1
      fi
    done

    mkdir -p $out
    echo "PASSED all 100 iterations" > $out/result
  ''];
  outputs = ["out"];
  __impureHostDeps = "/bin/sh /usr/lib/libSystem.B.dylib /dev/zero /dev/random /dev/urandom";
}

I'll continue my journey into the depths of hell tomorrow, trying to figure out what ncurses is doing and whether this is bash's fault.

@xokdvium xokdvium added the backport 2.28-maintenance, backport 2.29-maintenance, backport 2.30-maintenance, backport 2.31-maintenance, and backport 2.32-maintenance labels Jan 7, 2026
@internal-nix-ci

Successfully created backport PR for 2.28-maintenance:

@internal-nix-ci

Successfully created backport PR for 2.29-maintenance:

@internal-nix-ci

Successfully created backport PR for 2.30-maintenance:

@internal-nix-ci

Successfully created backport PR for 2.31-maintenance:

@internal-nix-ci

Successfully created backport PR for 2.32-maintenance:

lf- pushed a commit to lix-project/lix that referenced this pull request Jan 9, 2026
macos-builder02 in particular has been having problems for a while now
that no one could reproduce. We believe we have finally found the cause:
missing shebangs, not just in the tests themselves but *also* in the
inline scripts `nix-shell` itself writes.

Fixes #1042.
I believe this will also fix #1093.

See also: NixOS/nix#14778

Change-Id: I54d04a770b8e78484815db88a8bc88776a6a6964
DzmingLi pushed a commit to DzmingLi/tangled-pijul-support-fork that referenced this pull request Feb 12, 2026
this will fix nix devshell failing on darwin machine

The fix was introduced on: <NixOS/nix#14778>

Signed-off-by: Seongmin Lee <git@boltless.me>
