
patch-shebangs: optimize bash and add ANSI C implementation#482713

Open
qweered wants to merge 2 commits intoNixOS:stagingfrom
qweered:patch-shebangs-speedup

Contributor

@qweered qweered commented Jan 22, 2026

Motivation

The patchShebangs hook is called during the fixup phase of nearly every package build. For most packages this is fast, but large projects with thousands of scripts become a serious bottleneck:

  • Electron: 5000+ scripts to patch
  • Chromium: 3000+ scripts
  • Node.js monorepos: Often 2000+ scripts in node_modules
  • Qt/KDE applications: 1000+ scripts

With the original bash implementation, patching 1000 files takes 26 seconds. For Electron builds, this means 2+ minutes spent just on shebang patching, which adds up significantly in CI pipelines and when iterating on package development.

Benchmark Results

| Files | Original Bash | Optimized Bash | C Implementation |
|-------|---------------|----------------|------------------|
| 1000  | 26.1s         | 16.5s          | 0.12s            |

The C implementation is about 217x faster than the original bash implementation (0.12s vs 26.1s for 1000 files).

For Electron-scale builds (5000+ files), this reduces shebang patching from ~2.5 minutes to ~0.6 seconds.
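
The figures above can be cross-checked with a few lines of shell. This is only a sketch: the helper names `speedup` and `scale` are made up for illustration, the 5000-file count is taken from the Electron bullet, and the extrapolation assumes run time grows linearly with the number of files, which the PR does not measure.

```shell
# Figures copied from the benchmark table (1000 files).
orig=26.1      # seconds, original bash
opt=16.5       # seconds, optimized bash
c_impl=0.12    # seconds, C implementation
files=5000     # assumed Electron-scale file count

# speedup <baseline_seconds> <new_seconds>  (hypothetical helper)
speedup() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# scale <seconds_per_1000_files> <file_count>  (hypothetical helper)
# Assumes linear scaling with the number of files.
scale() {
  awk -v t="$1" -v n="$2" 'BEGIN { printf "%.1f\n", t * n / 1000 }'
}

echo "optimized bash vs original: $(speedup "$orig" "$opt")x"
echo "C vs original:              $(speedup "$orig" "$c_impl")x"
echo "original bash, $files files: $(scale "$orig" "$files")s"
echo "C, $files files:             $(scale "$c_impl" "$files")s"
```

Under that linear assumption the original bash hook lands a little over two minutes for 5000 files, in line with the "2+ minutes" quoted in the motivation, and the C implementation at roughly 0.6 seconds.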

Implementation

The C implementation:

  • Is added as a package (patchShebangs) to extraNativeBuildInputs of the final stdenv
  • Is auto-detected by the shell hook, which dispatches to the binary when it is available
  • Is skipped during bootstrap stages (before a C compiler is available), where the hook falls back to the bash implementation
  • Handles all edge cases identically to bash: env shebangs, -S flag, read-only files, timestamp preservation, recursive directories
  • Compiles with standard C flags for maximum portability
  • Includes a comprehensive test suite (patchShebangs.passthru.tests)
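
The detect-and-dispatch behaviour described in this list could look roughly like the sketch below. It is not the PR's hook: `patch_shebangs_dispatch` and `patch_shebangs_bash` are hypothetical names, and only the binary name `patch-shebangs` is taken from the diff quoted later in the review.

```shell
# Stand-in for the existing bash implementation (hypothetical name).
patch_shebangs_bash() {
  echo "bash fallback: $*"
}

# Prefer the compiled binary when it is on PATH, otherwise fall back to bash.
# During bootstrap stages the binary would not exist yet, so the fallback runs.
patch_shebangs_dispatch() {
  if command -v patch-shebangs >/dev/null 2>&1; then
    patch-shebangs "$@"
  else
    patch_shebangs_bash "$@"
  fi
}
```

Calling `patch_shebangs_dispatch --build ./configure` would then run whichever implementation is available without the caller having to know which one it got.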

Changes

Commit 1: Bash optimizations

  • Reuse temp file across all files (avoids mktemp per-file overhead)
  • Use touch -r instead of stat + touch --date (avoids subshell)
  • Use printf/tail instead of sed for shebang replacement
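
As a rough illustration of how these three optimizations fit together, the sketch below rewrites the first line with printf/tail into a scratch file that is created once, and uses touch -r to carry the original timestamps across the rewrite. The function names are hypothetical and this is not the code from the commit; read-only handling and error checking are left out.

```shell
# rewrite_shebang <file> <new-shebang-line> <scratch-file>  (hypothetical helper)
rewrite_shebang() {
  local file="$1" new_line="$2" scratch="$3"
  # New first line, followed by the original content from line 2 onward.
  { printf '%s\n' "$new_line"; tail -n +2 "$file"; } > "$scratch"
  # Copy the original timestamps onto the scratch file before touching the target.
  touch -r "$file" "$scratch"
  # Overwrite in place so the inode and permission bits stay the same.
  cat "$scratch" > "$file"
  # Restore the original timestamps without a stat subshell.
  touch -r "$scratch" "$file"
}

# rewrite_all <new-shebang-line> <file>...  (hypothetical helper)
# One mktemp call for the whole batch instead of one per file.
rewrite_all() {
  local new_line="$1" scratch f
  shift
  scratch=$(mktemp) || return 1
  for f in "$@"; do
    rewrite_shebang "$f" "$new_line" "$scratch"
  done
  rm -f "$scratch"
}
```

A real hook would pick the replacement line per file; passing one fixed line to `rewrite_all` only keeps the example short.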

Commit 2: C implementation + integration

  • Portable C implementation with identical CLI interface
  • Package definition at pkgs/build-support/setup-hooks/patch-shebangs/
  • Shell hook dispatches to C binary when available
  • Added to final stdenv's extraNativeBuildInputs
  • Test suite with 14 test cases covering all functionality

Things done

  • Built on platform:
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • Tested, as applicable:
  • Ran nixpkgs-review on this PR. See nixpkgs-review usage.
  • Tested basic functionality of all binary files, usually in ./result/bin/.
  • Nixpkgs Release Notes
    • Package update: when the change is major or breaking.
  • NixOS Release Notes
    • Module addition: when adding a new NixOS module.
    • Module update: when the change is significant.
  • Fits CONTRIBUTING.md, pkgs/README.md, maintainers/README.md and other READMEs.

github-actions[bot]

This comment was marked as outdated.

@qweered qweered force-pushed the patch-shebangs-speedup branch from dfc579d to 6a44c22 Compare January 22, 2026 15:27
@github-actions github-actions bot dismissed their stale review January 22, 2026 15:28

Review dismissed automatically

@nixpkgs-ci nixpkgs-ci bot requested a review from Ericson2314 January 22, 2026 15:34
@nixpkgs-ci nixpkgs-ci bot added labels Jan 22, 2026:

  • 10.rebuild-linux: 501+ (This PR causes many rebuilds on Linux and should normally target the staging branches.)
  • 10.rebuild-darwin: 501+ (This PR causes many rebuilds on Darwin and should normally target the staging branches.)
  • 10.rebuild-linux-stdenv (This PR causes stdenv to rebuild on Linux and must target a staging branch.)
  • 10.rebuild-darwin-stdenv (This PR causes stdenv to rebuild on Darwin and must target a staging branch.)
  • 10.rebuild-darwin: 5001+ (This PR causes many rebuilds on Darwin and must target the staging branches.)
  • 10.rebuild-linux: 5001+ (This PR causes many rebuilds on Linux and must target the staging branches.)

@qweered

This comment was marked as outdated.

@qweered qweered force-pushed the patch-shebangs-speedup branch from 6a44c22 to ec7a09c Compare January 22, 2026 15:39
@qweered qweered requested review from infinisil and roberth January 22, 2026 15:40
@qweered qweered force-pushed the patch-shebangs-speedup branch from d17d854 to f9b68a6 Compare January 24, 2026 00:07
@doronbehar
Contributor

+def main() -> None:
+    args = sys.argv[1:]

Why not use argparse? Speaking of another implementation, I think Python is too far in the bootstrap chain to be used for that. Today with LLMs etc you can write an ANSI C program that would do that much faster, and would not require an interpreter.

Contributor

@doronbehar doronbehar left a comment

Overall looks good to me.


# Make original file writable if it is read-only
local restoreReadOnly
local restoreReadOnly=
Contributor

This change coincides with d61930c from #462319 - cc @bmillwood .

@qweered qweered force-pushed the patch-shebangs-speedup branch 3 times, most recently from f14d9e0 to 461a2de Compare February 2, 2026 13:32
@qweered qweered changed the title from "patch-shebangs: optimize bash implementation" to "patch-shebangs: optimize bash and add ANSI C implementation" Feb 2, 2026
@qweered qweered force-pushed the patch-shebangs-speedup branch 2 times, most recently from 2ac44f6 to a7dda1b Compare February 2, 2026 14:17
@qweered qweered marked this pull request as draft February 2, 2026 14:19
@qweered qweered force-pushed the patch-shebangs-speedup branch 2 times, most recently from 79a049b to 50c2605 Compare February 2, 2026 14:24
@nixpkgs-ci nixpkgs-ci bot added the 6.topic: stdenv Standard environment label Feb 2, 2026
- Reuse temp file across all files (avoids mktemp per-file overhead)
- Use touch -r instead of stat + touch --date (avoids subshell)
- Use printf/tail instead of sed for shebang replacement
- Print status messages immediately (no buffering)
Add a portable C implementation that can be compiled and included
in stdenv for significant performance improvements.

Benchmark (1000 files):
- Original bash: 26.1s
- Optimized bash: 16.5s
- C implementation: 0.12s (217x faster)

Features:
- C99/POSIX implementation for portability
- Compiles with -std=c99 -O3 -Wall
- Handles all edge cases: env shebangs, -S flag, read-only files
- Preserves file timestamps
- Same CLI interface as bash version
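
To make the "env shebangs" and "-S flag" cases concrete, here is a small sketch of how a patcher might work out which program a shebang line refers to before looking it up on PATH. `shebang_program` is a hypothetical helper written for this illustration, not a function from the PR, and it ignores other env options and VAR=value assignments.

```shell
# shebang_program <first-line-of-script>  (hypothetical helper)
# Prints the program name that would need to be resolved.
shebang_program() {
  local line
  line=${1#'#!'}          # drop the leading "#!"
  # Split the rest on whitespace into positional parameters.
  # Globbing is switched off during the split and back on afterwards.
  set -f
  set -- $line
  set +f
  case "${1##*/}" in
    env)
      shift
      # "env -S prog args" passes several words through one shebang argument.
      [ "$1" = "-S" ] && shift
      printf '%s\n' "$1"
      ;;
    *)
      # Direct interpreter path: keep only the basename.
      printf '%s\n' "${1##*/}"
      ;;
  esac
}
```

For example, `shebang_program '#!/usr/bin/env -S node --flag'` yields the program name `node`, while `shebang_program '#!/bin/sh'` yields `sh`.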
@qweered qweered force-pushed the patch-shebangs-speedup branch from 50c2605 to 3697136 Compare February 2, 2026 14:40
@qweered
Contributor Author

qweered commented Feb 2, 2026

OK, I think it's ready for review. I tested the build manually and it behaves exactly like the bash version; I'm now recompiling stdenv to be absolutely sure.

@qweered qweered marked this pull request as ready for review February 2, 2026 14:43
Contributor

@doronbehar doronbehar left a comment

@qweered these changes look super cool, and I strongly endorse them, but it'd be much nicer if before you continue, you could give your review on:

The issues those fix are more bothersome than the optimization issues you tackle here.

Comment on lines +28 to +40
local mode="" update=""
while [[ $# -gt 0 ]]; do
    case "$1" in
        --host) mode="--host"; shift ;;
        --build) mode="--build"; shift ;;
        --update) update="--update"; shift ;;
        --) shift; break ;;
        -*|--*) echo "Unknown option $1 supplied to patchShebangs" >&2; return 1 ;;
        *) break ;;
    esac
done
[[ -z "$mode" ]] && { [[ -n $strictDeps && $1 == "$NIX_STORE"* ]] && mode="--host" || mode="--build"; }
patch-shebangs $mode $update -- "$@"
Contributor

Are you sure you have to put this logic here and not inside the C code?
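
For readers following this thread, the compact defaulting line in the quoted hunk can be unpacked as below. This is a sketch for readability only: `default_mode` is a hypothetical name, and the behaviour is inferred from the quoted line rather than taken from the final code.

```shell
# default_mode <explicit-mode-or-empty> <first-path>  (hypothetical helper)
# Reads strictDeps and NIX_STORE from the environment, as the hook does.
default_mode() {
  local mode="$1" first="$2"
  # An explicit --host or --build always wins.
  if [ -n "$mode" ]; then
    printf '%s\n' "$mode"
    return
  fi
  # With strictDeps set, paths inside the store default to --host.
  if [ -n "${strictDeps:-}" ]; then
    case "$first" in
      "${NIX_STORE:-}"*)
        printf '%s\n' "--host"
        return
        ;;
    esac
  fi
  # Everything else defaults to --build.
  printf '%s\n' "--build"
}
```

Whether this decision lives in the shell wrapper or in the C program, the rule is the same: only the first path argument and the `strictDeps` setting influence the default.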

@@ -23,6 +23,24 @@ fixupOutputHooks+=(patchShebangsAuto)
# $ patchShebangs --build configure

patchShebangs() {
# Use C implementation if available (much faster)
Contributor

FYI, last time I tried to use C for setup hooks it was heavily discouraged, see #394610 (comment).

@philiptaron
Contributor

Hey @qweered, this is AI generated, isn't it? It has a certain Claude flavor. I'm asking because this isn't disclosed.

I agree that the patch shebang hook could have better performance and be much better tested. I share @K900's judgement that using raw C standard library calls is the wrong call.

What better could look like, maybe, is introducing a tool outside of Nixpkgs, with testing and documentation, which then fits into the Nixpkgs stdenv. patchelf looks like this, for instance. But it would take some doing, because of bootstrapping, and because of trust issues. The maintainer would have to prove themselves in judgement and longevity and responsiveness.

I'm loath to make changes on a performance basis here.

@qweered
Contributor Author

qweered commented Feb 4, 2026

Do we really need to have the source in a separate repository? patchShebangs is a lot simpler than patchelf, and we also would not have trust issues.

Yes, this PR was assisted by Claude Opus.

@nixpkgs-ci nixpkgs-ci bot requested a review from a team February 10, 2026 15:49

Labels

  • 6.topic: stdenv (Standard environment)
  • 10.rebuild-darwin: 501+ (This PR causes many rebuilds on Darwin and should normally target the staging branches.)
  • 10.rebuild-darwin: 5001+ (This PR causes many rebuilds on Darwin and must target the staging branches.)
  • 10.rebuild-darwin-stdenv (This PR causes stdenv to rebuild on Darwin and must target a staging branch.)
  • 10.rebuild-linux: 501+ (This PR causes many rebuilds on Linux and should normally target the staging branches.)
  • 10.rebuild-linux: 5001+ (This PR causes many rebuilds on Linux and must target the staging branches.)
  • 10.rebuild-linux-stdenv (This PR causes stdenv to rebuild on Linux and must target a staging branch.)


6 participants