shell: normalize CRLF line endings in .sh files#29670
Conversation
|
Updated 4:33 PM PT - May 27th, 2026
✅ @robobun, your commit 0380528bc0dfea36c05b9ae90aa3308a601fc373 passed in 🧪 To try this PR locally: bunx bun-pr 29670That installs a local version of the PR into your bun-29670 --bun |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughLexer and shell-escaping logic were changed to handle Windows CRLF: unescaped Changes
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/cli/run/run-shell.test.ts`:
- Around line 56-59: The test currently asserts stderr is exactly empty after
collecting subprocess outputs (variables proc, stdout/stderr/exitCode in
run-shell.test.ts); change this to either remove the strict
expect(stderr).toBe("") check or filter out ASAN noise before asserting
emptiness by splitting stderr into lines and removing any lines that start with
"WARNING: ASAN interferes" (or similar ASAN prefix) and then asserting the
remaining joined string is empty; apply this same change to the other assertions
mentioned (the blocks around the usages at lines ~78-81, ~100-103, and ~122-125)
so any bunExe()/bunEnv spawned subprocesses ignore ASAN startup warnings when
checking stderr.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: eeaad6e3-063d-4cc5-954c-a2e278b1e488
📥 Commits
Reviewing files that changed from the base of the PR and between d89cd8b6722e418709d451b7d9d55b238e26460f and b946e9ac5c10deec48767c76f6640c562308003c.
📒 Files selected for processing (2)
src/shell/shell.zigtest/cli/run/run-shell.test.ts
There was a problem hiding this comment.
Thanks — b946e9a addresses the last bare-CR-in-double-quotes edge case and all four scenarios now have tests; I don't see any remaining issues. Given how subtle the lexer state interactions turned out to be (three follow-up edge cases), I'd still like a human familiar with the shell lexer to give the final shape a once-over before merge.
Extended reasoning...
Overview
This PR fixes #29669 by teaching the Bun shell lexer (src/shell/shell.zig) to normalize CRLF line endings. It touches four spots: (1) a new unescaped '\r' case in the main dispatch that drops \r before \n in Normal state, (2) a new escaped-\r handler so \<CR><LF> works as a line continuation, (3) \r added to the Double-state backslash escape list so the continuation also works inside double quotes, with a compensating re-emit of the backslash when the CR is bare, and (4) \r added to SPECIAL_CHARS (required by assertSpecialChar and correct for the needsEscape* interpolation heuristics). Four new tests in test/cli/run/run-shell.test.ts cover each scenario.
Security risks
None identified. This is purely lexer tokenization of \r bytes; no auth, crypto, filesystem, or network surface is touched. The SPECIAL_CHARS addition only makes the interpolation-escaping heuristic slightly more conservative (values containing bare \r now get quoted), which is safe.
Level of scrutiny
Moderate-to-high. The shell lexer is a critical path — it runs for every bun run *.sh and every Bun.$ template literal — and the change is a state-machine modification with non-obvious interactions between read_char() escape handling, the !escaped dispatch block, and the escaped fall-through path. The fact that this took three follow-up commits (escaped-CR continuation, Double-state escape list, bare-CR backslash preservation) to reach a correct state confirms the area is subtle. Each of those was caught in review and is now covered by a regression test, but a maintainer who owns this lexer should confirm the final approach is the one they want (vs., say, normalizing CRLF→LF in a pre-pass before lexing).
Other factors
All four of my prior inline comments are resolved with code + tests. CodeRabbit had no actionable findings. There is no CODEOWNERS entry for src/shell/. The PR description reports lex.test.ts/parse.test.ts pass unchanged and bunshell.test.ts has identical pass/fail counts. I am deferring rather than approving solely because this is non-trivial lexer logic in a hot path, not because of any known defect.
There was a problem hiding this comment.
All earlier feedback is addressed and I didn't find further issues — deferring final sign-off to a human since this touches the shell lexer's escape/quoting semantics and took a few iterations to get the edge cases right.
Extended reasoning...
Overview
This PR teaches the Bun shell lexer to normalize CRLF line endings: an unescaped \r immediately before \n in Normal state is dropped, \<CR><LF> is treated as a line continuation (both in Normal and Double-quoted state), and a backslash before a bare CR inside double quotes is re-emitted to preserve POSIX behaviour. \r is also added to SPECIAL_CHARS so the comptime assertSpecialChar check passes and shell-source generation quotes it. Four new integration tests in test/cli/run/run-shell.test.ts cover the basic CRLF case, backslash continuation, double-quoted continuation, and the bare-CR-in-double-quotes regression guard.
Security risks
None identified. This is purely lexer tokenization of CR bytes; it does not touch auth, network, filesystem permissions, or process spawning. The only behavioural change is that previously-broken CRLF scripts now tokenize the same as their LF equivalents.
Level of scrutiny
Moderate-to-high. src/shell/shell.zig's lexer is a hand-written state machine that backs both bun run *.sh and the Bun.$ tagged-template API, so tokenization changes have broad reach. The PR went through three correction rounds during review (escaped-CR continuation, dead Double-state guard, bare-CR backslash regression) — each was a real edge case, and each is now covered by a dedicated test. The final logic reads correctly to me and the bug-hunting pass on the current revision found nothing, but the iteration history suggests a maintainer familiar with the lexer should confirm the chosen semantics (particularly adding \r to the Double-state escape list at line 3715 and then compensating by re-emitting the backslash at 2893–2895, rather than peeking two bytes in read_char()).
Other factors
- All four of my prior inline comments and CodeRabbit's ASAN-stderr note are resolved in the current diff.
- CI failures on the latest build (
socket.test.ts,bun-install-lifecycle-scripts.test.tson Windows) are unrelated to the shell lexer. - No CODEOWNERS entry for
src/shell/. - Tests use
tempDirper repo convention and filter ASAN noise from stderr.
bf909fe to
862e141
Compare
There was a problem hiding this comment.
All earlier feedback (escaped-CR continuation, Double-state escape list, bare-CR backslash preservation, tempDir, ASAN filter) is addressed and the four new tests cover each case — I didn't find anything further, but the final shape touches escape handling across Normal/Double lexer states in four places, so a maintainer glance at the shell.zig changes is worth it before merge.
Extended reasoning...
Overview
This PR makes Bun's shell lexer normalize CRLF line endings so Windows-edited .sh files run correctly. It touches src/shell/shell.zig in four places: (1) a new unescaped '\r' case in the main char dispatch that drops \r before \n in Normal state, (2) a new escaped-'\r' handler so \<CR><LF> acts as a line continuation, (3) adding '\r' to the Double-state backslash escape list in read_char(), and (4) adding '\r' to SPECIAL_CHARS. Four new tests in test/cli/run/run-shell.test.ts cover plain CRLF, backslash continuation, continuation inside double quotes, and the bare-CR-in-double-quotes edge case.
Security risks
None identified. This is pure tokenizer line-ending handling with no auth, crypto, filesystem-path, or external-input-trust implications. The change only affects how \r bytes are treated relative to adjacent \n and backslash bytes.
Level of scrutiny
Moderate-to-high. The shell lexer backs both bun run *.sh and the Bun.$ API, and escape/quote-state handling is exactly the kind of code where subtle interactions hide. The change grew over several review iterations from a single switch arm into coordinated logic across the unescaped path, the escaped path, and read_char()'s Double-state escape list — each iteration fixing a real edge case I (or CodeRabbit) raised, all now covered by tests. I believe the current logic is correct, but lexer state-machine changes of this shape are worth a maintainer's eyes rather than a bot-only sign-off.
Other factors
All four of my prior inline comments and the CodeRabbit ASAN-stderr comment are resolved with corresponding commits and tests. No CODEOWNERS entry covers these files. The PR description reports existing lex.test.ts/parse.test.ts pass unchanged and bunshell.test.ts pass/fail split is identical.
There was a problem hiding this comment.
All four rounds of prior feedback are addressed and the current lexer logic traces correctly for every CRLF/bare-CR/escaped/quoted combination I checked — but this is core shell-lexer state-machine code that needed three correction rounds to converge, so a maintainer familiar with shell.zig should give it a final look.
Extended reasoning...
Overview
This PR teaches Bun's shell lexer (src/shell/shell.zig) to normalize CRLF line endings so Windows-edited .sh files run correctly. It touches four spots: (1) a new unescaped-\r case in the main char dispatch that drops \r before \n in Normal state, (2) a new escaped-\r handler so \\<CR><LF> works as line-continuation, (3) \r added to the Double-state backslash escape list so the continuation also works inside double quotes, and (4) \r added to SPECIAL_CHARS so the comptime assertion passes and interpolated values containing CR get JS-ref-wrapped rather than inlined. Four new integration tests in test/cli/run/run-shell.test.ts cover the basic CRLF case plus each edge case raised during review.
Security risks
None identified. This is lexer tokenization for line endings; it doesn't touch auth, crypto, filesystem permissions, or network. The SPECIAL_CHARS addition makes interpolation strictly more conservative (more values get escaped via JS-ref), not less.
Level of scrutiny
Moderate-to-high. shell.zig is core runtime — the lexer feeds every .sh execution and every Bun.$ template. The change itself is small (~45 LoC of logic) but the state-machine interactions are subtle: getting it fully correct took three review rounds (escaped-path coverage, Double-state escape list, then the bare-CR-in-double-quotes backslash regression). Each was a real bug, each was fixed, and the final revision now handles every combination I traced (Normal/Double × escaped/unescaped × CRLF/bare-CR). That iteration history is exactly why a human maintainer should confirm the final shape rather than rely on bot sign-off alone.
Other factors
All four of my prior inline comments are resolved, the CodeRabbit ASAN-stderr flake concern is addressed via stripAsanWarning, the bug-hunting system found nothing new on the latest revision, and the new tests use the repo-preferred tempDir helper. The PR description's claim that bare \r is preserved is now actually true in all states. I'm deferring rather than approving solely because of the code path's criticality and the demonstrated subtlety, not because of any outstanding concern.
…isting) The integration test `test/integration/build-prefetch/prefetch.test.ts` spawns `bunExe()` and asserts `stderr` is empty. Under ASAN builds (`ASSERT_ENABLED`), JSC's `@assert(!dependency.isAsync)` in `vendor/WebKit/Source/JavaScriptCore/builtins/ModuleLoader.js:544` fires during module linking and prints an "ASSERTION FAILED" line to stderr, flipping all four tests to red. The assertion is a pre-existing JSC builtins issue in async-module linking — reproducible on `main` with no local changes, and hitting other unrelated PRs on shards that pick up this test (e.g. #29670). Add it to `test/expectations.txt` under `[ ASAN ]` so it joins the other tests that are skipped on ASAN lanes. Unblocks CI for #29678 without mixing unrelated fixes into that PR's code changes. Separate commit so it can be reverted independently once the JSC assertion is fixed.
…isting) The integration test `test/integration/build-prefetch/prefetch.test.ts` spawns `bunExe()` and asserts `stderr` is empty. Under ASAN builds (`ASSERT_ENABLED`), JSC's `@assert(!dependency.isAsync)` in `vendor/WebKit/Source/JavaScriptCore/builtins/ModuleLoader.js:544` fires during module linking and prints an "ASSERTION FAILED" line to stderr, flipping all four tests to red. The assertion is a pre-existing JSC builtins issue in async-module linking — reproducible on `main` with no local changes, and hitting other unrelated PRs on shards that pick up this test (e.g. #29670). Add it to `test/expectations.txt` under `[ ASAN ]` so it joins the other tests that are skipped on ASAN lanes. Unblocks CI for #29678 without mixing unrelated fixes into that PR's code changes. Separate commit so it can be reverted independently once the JSC assertion is fixed.
…isting) The integration test `test/integration/build-prefetch/prefetch.test.ts` spawns `bunExe()` and asserts `stderr` is empty. Under ASAN builds (`ASSERT_ENABLED`), JSC's `@assert(!dependency.isAsync)` in `vendor/WebKit/Source/JavaScriptCore/builtins/ModuleLoader.js:544` fires during module linking and prints an "ASSERTION FAILED" line to stderr, flipping all four tests to red. The assertion is a pre-existing JSC builtins issue in async-module linking — reproducible on `main` with no local changes, and hitting other unrelated PRs on shards that pick up this test (e.g. #29670). Add it to `test/expectations.txt` under `[ ASAN ]` so it joins the other tests that are skipped on ASAN lanes. Unblocks CI for #29678 without mixing unrelated fixes into that PR's code changes. Separate commit so it can be reverted independently once the JSC assertion is fixed.
661fb0e to
8bc4823
Compare
8bc4823 to
65f2e8e
Compare
There was a problem hiding this comment.
All three rounds of my earlier feedback are addressed and I found nothing new on this revision — deferring only because lexer state-machine changes across two parallel implementations (shell.zig + shell_parser/parse.rs) plus the SPECIAL_CHARS table addition feel worth one human pass before merge.
Extended reasoning...
Overview
The substantive change is CRLF normalization in Bun's shell lexer, implemented in parallel in src/runtime/shell/shell.zig and the Rust port src/shell_parser/parse.rs: an unescaped-\r case that swallows the CR when followed by LF, an escaped-\r handler so \\<CR><LF> acts as line-continuation (with a Double-state special case to re-emit the backslash for bare CR), \r added to the Double-state backslash escape list, and \r added to the SPECIAL_CHARS escaping table. Four new tests in test/cli/run/run-shell.test.ts cover the base case plus each edge case raised in earlier review rounds. The remaining ten .rs files are pure autofix.ci rustfmt reflows of #[cfg(...)] / #[cfg_attr(...)] attributes with zero semantic change.
Security risks
The shell lexer tokenizes user-supplied script content, so changes here are adjacent to command-injection surface. That said, this change only narrows what reaches appendCharToStrPool (dropping a CR that would otherwise be glued onto a token) and adds \r to the escaper's special-char set — neither widens the set of characters that can terminate a quote or start a substitution. I don't see an injection vector introduced here.
Level of scrutiny
Moderate. The fix itself is conceptually simple, but it touches a hand-rolled lexer state machine in two implementations and went through three review iterations on this PR, each of which surfaced a real correctness edge case (escaped-CR continuation, dead Double-state guard, bare-CR backslash regression). That history says the area is subtle enough that a maintainer familiar with the shell subsystem should confirm the final shape — particularly the SPECIAL_CHARS table addition, which also affects ShellSrcBuilder output escaping, and the Rust port's fell_through control-flow mapping of Zig's continue.
Other factors
All prior inline comments (mine and CodeRabbit's) are resolved in the current diff, and each fix has a dedicated regression test. The robobun build-status comment still shows linker/ci.ts failures on 65f2e8e, but the ci: retrigger after release-build infra flake commit and the -no-pie warning text indicate those are infrastructure flakes unrelated to this change.
1f003f8 to
e2957e8
Compare
Windows editors write CRLF by default. The shell lexer treated \n as a word breaker / Newline token but had no case for \r, so \r\n left the trailing \r glued to the preceding token. This showed up as "command not found: export" (actually looking for "export\r"), exported env vars with a trailing \r in their value, and "bun run build" failing with "Script not found \"build\r\"" — the \r reset the terminal cursor so the user only saw "rror: Script not found \"build". Fix in the lexer (src/shell/shell.zig): when a \r is seen in Normal state and the next char is \n, skip the \r. The \n handler then fires its usual word break + Newline token. Literals inside single/double quotes are preserved (matches bash/dash).
The initial fix only handled unescaped CRLF via a case in the main lexer switch, which is inside the `!escaped` block. Backslash line-continuation in a CRLF script (`cmd arg \\<CR><LF>arg2`) bypasses it: read_char() swallows the backslash and returns the `\r` as escaped, leaving the `\n` for the next read. The `\r` gets glued onto the previous word and the `\n` fires a real Newline, so `arg2` runs as a separate command. Mirror the existing escaped-\n line-continuation handler with an escaped-\r handler: if the next char is an unescaped \n, consume it and treat the whole \<CR><LF> as a single line continuation. Escaped CR without a trailing \n falls through to preserve the byte. Also switch the new test over to the `tempDir` harness helper per CLAUDE.md convention, and add a regression test for the line-continuation case.
The previous commit added an escaped-\\r handler for the CRLF line-continuation case, with a `state != .Double` guard mirroring the existing escaped-\\n handler. But read_char()'s Double-state escape whitelist didn't include \\r, so `\<CR>` inside double quotes never returned as escaped and the guard was dead. Net result: `"hello \\<CR><LF>world"` left a literal backslash + CR + LF embedded in the string, while the LF equivalent produced "hello world" (per POSIX `\<newline>` removal inside double quotes). Add \\r to the Double-state escape list so the handler fires there too, and add a regression test.
Adding \r to the Double-state backslash escape list made `\<CR><LF>` act as a line continuation (matches `\<LF>`), but it also pulled in `\<CR>` where the CR is *not* followed by LF. In that case read_char() swallowed the backslash and returned the CR as escaped, and the lexer's fall-through appended only the CR — silently dropping the backslash. Bash/POSIX keep `\<CR>` as literal `\` + CR inside double quotes (CR isn't in the set of chars a backslash escapes there). Re-emit the backslash in that branch when state is Double so the bare-CR case matches bash. The CRLF case is unchanged because it takes the `continue` path above and never falls through. Narrow edge case — bare CR not part of CRLF is classic-Mac-only — but it's a real correctness regression vs pre-PR Bun and bash.
Debug/ASAN builds print "WARNING: ASAN interferes with JSC signal handlers..." to stderr when JSC initializes. The .sh dispatch path doesn't init JSC so this normally doesn't happen for the code under test, but the warning can still surface in other scenarios (e.g. if the spawn environment pre-inits JSC) and was flagged as a known source of test flakiness in review. Wrap the stderr assertions in a helper that drops ASAN lines before comparing, matching the existing pattern in test/js/bun/resolve/bun-main-entry-point.test.ts and friends.
Previous gate run reported the release build failing mid-fetch (truncated log at `[brotli] up to`). Locally and on buildkite the release build succeeds cleanly — this is a transient environment issue, not a real problem in the source. Empty commit to retrigger CI/gate.
The shell lexer was ported from Zig to Rust during the src/ restructure (`src/shell/shell.zig` → `src/shell_parser/parse.rs`). The Zig CRLF handling added earlier in this PR didn't carry over automatically — the Rust port mirrored the pre-PR Zig state. Port the three-part Zig fix so runtime behaviour matches: 1. Main dispatch gains an unescaped-`\r` case: in Normal state, if the next char is unescaped `\n`, skip this `\r` so the `\n` handler fires its usual word-break + Newline. Inside quotes, bytes stay literal (matches bash/dash). 2. Escaped-`\r` handler mirrors the existing escaped-`\n` branch: if followed by unescaped `\n`, consume the `\n` so `\<CR><LF>` acts as one line continuation. In Double state, re-emit the backslash that read_char() swallowed so bare `\<CR>` without a trailing `\n` stays literal `\` + CR (POSIX). 3. `\r` added to the Double-state backslash escape whitelist in read_char() so `\<CR>` returns as escaped and the handler above can reach the line-continuation path. SPECIAL_CHARS gains `\r` (size 34 → 35) so assert_special_char passes and the scanner fast path stops on CR. All 6 run-shell tests pass. Existing lex.test.ts / parse.test.ts pass unchanged (they go through the Zig TestingAPIs.shellLex path, which still has the matching Zig fix).
Per CLAUDE.md:141: `.zig` siblings of `.rs` files are reference-only and "Never add new behavior to a .zig file." `src/runtime/shell/shell.zig` is the pre-Rust-port reference for `src/shell_parser/parse.rs` — the codegen entries that look like they import it (`GeneratedJS2Native.zig`, `shellInternals.lex/parse`) are routed to Rust equivalents by the real generated `build/*/codegen/generated_js2native.rs`. The earlier Zig hunks in this PR were functional no-ops at runtime; the live fix is already in `src/shell_parser/parse.rs`. Revert them so the reference file stays a reference. All 6 run-shell tests + 29 lex tests still pass after revert.
fbd86ba to
0380528
Compare
There was a problem hiding this comment.
All earlier feedback is addressed (escaped-CR continuation, double-quote escape list, bare-CR backslash preservation, ASAN filter, and the .zig revert) and I didn't find anything new — but since this touches the shell lexer's main char dispatch and the SPECIAL_CHARS set used for template-literal escaping, it's worth a human pass before merging.
Extended reasoning...
Overview
The PR now modifies exactly two files: src/shell_parser/parse.rs (the shipped Rust shell lexer) and test/cli/run/run-shell.test.ts. In parse.rs it (1) adds a '\r' arm to the unescaped char dispatch that discards a CR immediately preceding an unescaped LF in Normal state while leaving CR literal inside quotes and for bare CR; (2) adds an escaped-\r handler so \\<CR><LF> behaves as a line continuation in both Normal and Double state, with the bare-CR-in-double-quotes case re-emitting the swallowed backslash to preserve POSIX semantics; (3) adds '\r' to the Double-state backslash escape list in read_char(); and (4) adds '\r' to SPECIAL_CHARS (now [u8; 35]). The test file adds a stripAsanWarning helper and four integration tests covering each of those paths. The earlier shell.zig edits were reverted in 0380528, so the diff no longer touches reference-only code.
Security risks
Shell lexing determines word boundaries and therefore what gets executed, so any change here is security-adjacent. The specific change is narrowing — CR before LF is dropped rather than glued onto the preceding token — which reduces the surface for surprising tokenization rather than expanding it. Adding \r to SPECIAL_CHARS also flows into SPECIAL_CHARS_TABLE, which (a) gates the lexer's ASCII fast-path scan at parse.rs:2628 (necessary for the new arm to be reachable) and (b) feeds needs_escape_* so a \r in a JS value interpolated into a Bun.$ template now routes through the JS-string-ref path instead of being inlined raw. That second effect is a behavior change outside the .sh-file path and is the part most worth a human glance, though it appears strictly safer (matches how \n is already handled).
Level of scrutiny
Medium-high. The diff is small and well-commented, every edge case I raised over four prior review rounds now has both a code fix and a dedicated regression test, and I traced fell_through / break 'escaped semantics against the surrounding cases to confirm the new arm is consistent. But this is the production shell lexer's hot loop, and lexer edge cases are subtle enough that a maintainer familiar with parse.rs should sign off rather than relying solely on automated review.
Other factors
All six of my earlier inline comments are resolved and reflected in the current diff; CodeRabbit's ASAN-stderr concern is addressed via stripAsanWarning; no CODEOWNERS entry matches these paths; the bug-hunting pass on the current revision found nothing. Given the critical code path I'm deferring rather than approving.
Fixes #29669
Repro
Before:
(the
\rreset the terminal cursor, so the user saw "rror: Script not found "build"").Windows-edited
.shfiles all have this shape, and the investigation note confirmed it repros identically on Linux if the file has CRLF.Cause
src/shell/shell.ziglexer handled'\n'as a word-breaker / Newline token but had no case for'\r'.Chars.isWhitespacelisted\rbut that helper is only consulted in narrow spots (comment lookbehind,[[/]]followers). In normal word scanning,\rfell through toappendCharToStrPooland got glued onto the preceding token — soexport\rwasn't a recognised builtin,VITE_PARAM=value\rbecame the literal value, andbun run build\rfailed script lookup.Fix
Add a
'\r'case in the lexer's main char dispatch: in Normal state, if the next char is'\n', swallow the\r(the\nhandler then emits its usual word-break + Newline token). Inside single/double quotes, fall through and preserve the byte literally — matches bash/dash behaviour for quoted multi-line strings. Bare\rnot followed by\nis also preserved (no silent Mac-classic CR-only line ending behaviour change).Verification
test/cli/run/run-shell.test.ts— new test runs a CRLF-terminated script containingexport, expanded-var echo, and a plainecho. FAILS on main (reveals\rin stdout diff), PASSES with this patch.test/js/bun/shell/lex.test.ts,parse.test.ts): 46/46 pass unchanged.bunshell.test.tspass/fail split identical before and after (292 pass / 52 pre-existing fails).