Skip to content

shell: normalize CRLF line endings in .sh files#29670

Open
robobun wants to merge 9 commits into
mainfrom
farm/8a526bfc/shell-crlf-line-endings
Open

shell: normalize CRLF line endings in .sh files#29670
robobun wants to merge 9 commits into
mainfrom
farm/8a526bfc/shell-crlf-line-endings

Conversation

@robobun

@robobun robobun commented Apr 24, 2026

Copy link
Copy Markdown
Collaborator

Fixes #29669

Repro

printf 'export VITE_PARAM=value\r\nbun run build\r\n' > repro.sh
echo '{}' > package.json
bun run ./repro.sh

Before:

error: Script not found "build\r"

(the \r reset the terminal cursor, so the user saw "rror: Script not found "build"").

Windows-edited .sh files all have this shape, and the investigation note confirmed it repros identically on Linux if the file has CRLF.

Cause

src/shell/shell.zig lexer handled '\n' as a word-breaker / Newline token but had no case for '\r'. Chars.isWhitespace listed \r but that helper is only consulted in narrow spots (comment lookbehind, [[/]] followers). In normal word scanning, \r fell through to appendCharToStrPool and got glued onto the preceding token — so export\r wasn't a recognised builtin, VITE_PARAM=value\r became the literal value, and bun run build\r failed script lookup.

Fix

Add a '\r' case in the lexer's main char dispatch: in Normal state, if the next char is '\n', swallow the \r (the \n handler then emits its usual word-break + Newline token). Inside single/double quotes, fall through and preserve the byte literally — matches bash/dash behaviour for quoted multi-line strings. Bare \r not followed by \n is also preserved (no silent Mac-classic CR-only line ending behaviour change).

Verification

  • test/cli/run/run-shell.test.ts — new test runs a CRLF-terminated script containing export, expanded-var echo, and a plain echo. FAILS on main (reveals \r in stdout diff), PASSES with this patch.
  • Existing shell lex/parse tests (test/js/bun/shell/lex.test.ts, parse.test.ts): 46/46 pass unchanged.
  • bunshell.test.ts pass/fail split identical before and after (292 pass / 52 pre-existing fails).

@robobun

robobun commented Apr 24, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 4:33 PM PT - May 27th, 2026

@robobun, your commit 0380528bc0dfea36c05b9ae90aa3308a601fc373 passed in Build #58479! 🎉


🧪   To try this PR locally:

bunx bun-pr 29670

That installs a local version of the PR into your bun-29670 executable, so you can run:

bun-29670 --bun

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Lexer and shell-escaping logic were changed to handle Windows CRLF: unescaped \r immediately before unescaped \n is discarded; escaped \r\n is consumed as a single line-continuation; double-quote/backslash parsing now allows escaping \r; \r is treated as a special character in shell generation. Tests for CRLF behaviors were added.

Changes

Cohort / File(s) Summary
Shell lexer & escaping
src/shell/shell.zig
Normalize CRLF handling: drop unescaped \r before unescaped \n in Normal state; consume \n after an escaped \r to treat \<CR><LF> as one escaped-continuation; allow escaping \r inside double quotes and backslash logic; add \r to shell SPECIAL_CHARS and update shell source generation accordingly.
CRLF handling tests
test/cli/run/run-shell.test.ts
Add tempDir-based fixtures and new run-shell tests verifying: trailing \r does not leak into args/output or cause “command not found”; backslash line-continuation across \<CR><LF> joins lines (including inside double quotes); a backslash before a bare \r in double quotes preserves \+\r; tests strip intermittent ASAN/JSC warnings from stderr and assert zero exit code.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: normalizing CRLF line endings in shell files to fix command resolution failures.
Description check ✅ Passed The description includes both required sections: 'What does this PR do?' (covered in Cause/Fix sections) and 'How did you verify your code works?' (covered in Verification section).
Linked Issues check ✅ Passed The PR directly addresses issue #29669 requirements: CRLF handling in lexer, stripping \r before \n, preserving \r in quotes, and preventing \r inclusion in tokens.
Out of Scope Changes check ✅ Passed All changes are scoped to CRLF handling: lexer modifications in shell.zig and related test cases in run-shell.test.ts are directly aligned with issue #29669 requirements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread src/runtime/shell/shell.zig Outdated
Comment thread test/cli/run/run-shell.test.ts Outdated
Comment thread src/runtime/shell/shell.zig Outdated
Comment thread src/runtime/shell/shell.zig Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/cli/run/run-shell.test.ts`:
- Around line 56-59: The test currently asserts stderr is exactly empty after
collecting subprocess outputs (variables proc, stdout/stderr/exitCode in
run-shell.test.ts); change this to either remove the strict
expect(stderr).toBe("") check or filter out ASAN noise before asserting
emptiness by splitting stderr into lines and removing any lines that start with
"WARNING: ASAN interferes" (or similar ASAN prefix) and then asserting the
remaining joined string is empty; apply this same change to the other assertions
mentioned (the blocks around the usages at lines ~78-81, ~100-103, and ~122-125)
so any bunExe()/bunEnv spawned subprocesses ignore ASAN startup warnings when
checking stderr.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: eeaad6e3-063d-4cc5-954c-a2e278b1e488

📥 Commits

Reviewing files that changed from the base of the PR and between d89cd8b6722e418709d451b7d9d55b238e26460f and b946e9ac5c10deec48767c76f6640c562308003c.

📒 Files selected for processing (2)
  • src/shell/shell.zig
  • test/cli/run/run-shell.test.ts

Comment thread test/cli/run/run-shell.test.ts

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — b946e9a addresses the last bare-CR-in-double-quotes edge case and all four scenarios now have tests; I don't see any remaining issues. Given how subtle the lexer state interactions turned out to be (three follow-up edge cases), I'd still like a human familiar with the shell lexer to give the final shape a once-over before merge.

Extended reasoning...

Overview

This PR fixes #29669 by teaching the Bun shell lexer (src/shell/shell.zig) to normalize CRLF line endings. It touches four spots: (1) a new unescaped '\r' case in the main dispatch that drops \r before \n in Normal state, (2) a new escaped-\r handler so \<CR><LF> works as a line continuation, (3) \r added to the Double-state backslash escape list so the continuation also works inside double quotes, with a compensating re-emit of the backslash when the CR is bare, and (4) \r added to SPECIAL_CHARS (required by assertSpecialChar and correct for the needsEscape* interpolation heuristics). Four new tests in test/cli/run/run-shell.test.ts cover each scenario.

Security risks

None identified. This is purely lexer tokenization of \r bytes; no auth, crypto, filesystem, or network surface is touched. The SPECIAL_CHARS addition only makes the interpolation-escaping heuristic slightly more conservative (values containing bare \r now get quoted), which is safe.

Level of scrutiny

Moderate-to-high. The shell lexer is a critical path — it runs for every bun run *.sh and every Bun.$ template literal — and the change is a state-machine modification with non-obvious interactions between read_char() escape handling, the !escaped dispatch block, and the escaped fall-through path. The fact that this took three follow-up commits (escaped-CR continuation, Double-state escape list, bare-CR backslash preservation) to reach a correct state confirms the area is subtle. Each of those was caught in review and is now covered by a regression test, but a maintainer who owns this lexer should confirm the final approach is the one they want (vs., say, normalizing CRLF→LF in a pre-pass before lexing).

Other factors

All four of my prior inline comments are resolved with code + tests. CodeRabbit had no actionable findings. There is no CODEOWNERS entry for src/shell/. The PR description reports lex.test.ts/parse.test.ts pass unchanged and bunshell.test.ts has identical pass/fail counts. I am deferring rather than approving solely because this is non-trivial lexer logic in a hot path, not because of any known defect.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All earlier feedback is addressed and I didn't find further issues — deferring final sign-off to a human since this touches the shell lexer's escape/quoting semantics and took a few iterations to get the edge cases right.

Extended reasoning...

Overview

This PR teaches the Bun shell lexer to normalize CRLF line endings: an unescaped \r immediately before \n in Normal state is dropped, \<CR><LF> is treated as a line continuation (both in Normal and Double-quoted state), and a backslash before a bare CR inside double quotes is re-emitted to preserve POSIX behaviour. \r is also added to SPECIAL_CHARS so the comptime assertSpecialChar check passes and shell-source generation quotes it. Four new integration tests in test/cli/run/run-shell.test.ts cover the basic CRLF case, backslash continuation, double-quoted continuation, and the bare-CR-in-double-quotes regression guard.

Security risks

None identified. This is purely lexer tokenization of CR bytes; it does not touch auth, network, filesystem permissions, or process spawning. The only behavioural change is that previously-broken CRLF scripts now tokenize the same as their LF equivalents.

Level of scrutiny

Moderate-to-high. src/shell/shell.zig's lexer is a hand-written state machine that backs both bun run *.sh and the Bun.$ tagged-template API, so tokenization changes have broad reach. The PR went through three correction rounds during review (escaped-CR continuation, dead Double-state guard, bare-CR backslash regression) — each was a real edge case, and each is now covered by a dedicated test. The final logic reads correctly to me and the bug-hunting pass on the current revision found nothing, but the iteration history suggests a maintainer familiar with the lexer should confirm the chosen semantics (particularly adding \r to the Double-state escape list at line 3715 and then compensating by re-emitting the backslash at 2893–2895, rather than peeking two bytes in read_char()).

Other factors

  • All four of my prior inline comments and CodeRabbit's ASAN-stderr note are resolved in the current diff.
  • CI failures on the latest build (socket.test.ts, bun-install-lifecycle-scripts.test.ts on Windows) are unrelated to the shell lexer.
  • No CODEOWNERS entry for src/shell/.
  • Tests use tempDir per repo convention and filter ASAN noise from stderr.

@robobun robobun force-pushed the farm/8a526bfc/shell-crlf-line-endings branch from bf909fe to 862e141 Compare April 24, 2026 06:38

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All earlier feedback (escaped-CR continuation, Double-state escape list, bare-CR backslash preservation, tempDir, ASAN filter) is addressed and the four new tests cover each case — I didn't find anything further, but the final shape touches escape handling across Normal/Double lexer states in four places, so a maintainer glance at the shell.zig changes is worth it before merge.

Extended reasoning...

Overview

This PR makes Bun's shell lexer normalize CRLF line endings so Windows-edited .sh files run correctly. It touches src/shell/shell.zig in four places: (1) a new unescaped '\r' case in the main char dispatch that drops \r before \n in Normal state, (2) a new escaped-'\r' handler so \<CR><LF> acts as a line continuation, (3) adding '\r' to the Double-state backslash escape list in read_char(), and (4) adding '\r' to SPECIAL_CHARS. Four new tests in test/cli/run/run-shell.test.ts cover plain CRLF, backslash continuation, continuation inside double quotes, and the bare-CR-in-double-quotes edge case.

Security risks

None identified. This is pure tokenizer line-ending handling with no auth, crypto, filesystem-path, or external-input-trust implications. The change only affects how \r bytes are treated relative to adjacent \n and backslash bytes.

Level of scrutiny

Moderate-to-high. The shell lexer backs both bun run *.sh and the Bun.$ API, and escape/quote-state handling is exactly the kind of code where subtle interactions hide. The change grew over several review iterations from a single switch arm into coordinated logic across the unescaped path, the escaped path, and read_char()'s Double-state escape list — each iteration fixing a real edge case I (or CodeRabbit) raised, all now covered by tests. I believe the current logic is correct, but lexer state-machine changes of this shape are worth a maintainer's eyes rather than a bot-only sign-off.

Other factors

All four of my prior inline comments and the CodeRabbit ASAN-stderr comment are resolved with corresponding commits and tests. No CODEOWNERS entry covers these files. The PR description reports existing lex.test.ts/parse.test.ts pass unchanged and bunshell.test.ts pass/fail split is identical.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All four rounds of prior feedback are addressed and the current lexer logic traces correctly for every CRLF/bare-CR/escaped/quoted combination I checked — but this is core shell-lexer state-machine code that needed three correction rounds to converge, so a maintainer familiar with shell.zig should give it a final look.

Extended reasoning...

Overview

This PR teaches Bun's shell lexer (src/shell/shell.zig) to normalize CRLF line endings so Windows-edited .sh files run correctly. It touches four spots: (1) a new unescaped-\r case in the main char dispatch that drops \r before \n in Normal state, (2) a new escaped-\r handler so \\<CR><LF> works as line-continuation, (3) \r added to the Double-state backslash escape list so the continuation also works inside double quotes, and (4) \r added to SPECIAL_CHARS so the comptime assertion passes and interpolated values containing CR get JS-ref-wrapped rather than inlined. Four new integration tests in test/cli/run/run-shell.test.ts cover the basic CRLF case plus each edge case raised during review.

Security risks

None identified. This is lexer tokenization for line endings; it doesn't touch auth, crypto, filesystem permissions, or network. The SPECIAL_CHARS addition makes interpolation strictly more conservative (more values get escaped via JS-ref), not less.

Level of scrutiny

Moderate-to-high. shell.zig is core runtime — the lexer feeds every .sh execution and every Bun.$ template. The change itself is small (~45 LoC of logic) but the state-machine interactions are subtle: getting it fully correct took three review rounds (escaped-path coverage, Double-state escape list, then the bare-CR-in-double-quotes backslash regression). Each was a real bug, each was fixed, and the final revision now handles every combination I traced (Normal/Double × escaped/unescaped × CRLF/bare-CR). That iteration history is exactly why a human maintainer should confirm the final shape rather than rely on bot sign-off alone.

Other factors

All four of my prior inline comments are resolved, the CodeRabbit ASAN-stderr flake concern is addressed via stripAsanWarning, the bug-hunting system found nothing new on the latest revision, and the new tests use the repo-preferred tempDir helper. The PR description's claim that bare \r is preserved is now actually true in all states. I'm deferring rather than approving solely because of the code path's criticality and the demonstrated subtlety, not because of any outstanding concern.

robobun added a commit that referenced this pull request Apr 24, 2026
…isting)

The integration test `test/integration/build-prefetch/prefetch.test.ts`
spawns `bunExe()` and asserts `stderr` is empty. Under ASAN builds
(`ASSERT_ENABLED`), JSC's `@assert(!dependency.isAsync)` in
`vendor/WebKit/Source/JavaScriptCore/builtins/ModuleLoader.js:544`
fires during module linking and prints an "ASSERTION FAILED" line to
stderr, flipping all four tests to red.

The assertion is a pre-existing JSC builtins issue in async-module
linking — reproducible on `main` with no local changes, and hitting
other unrelated PRs on shards that pick up this test (e.g. #29670).

Add it to `test/expectations.txt` under `[ ASAN ]` so it joins the
other tests that are skipped on ASAN lanes. Unblocks CI for #29678
without mixing unrelated fixes into that PR's code changes. Separate
commit so it can be reverted independently once the JSC assertion is
fixed.
robobun added a commit that referenced this pull request Apr 26, 2026
…isting)

The integration test `test/integration/build-prefetch/prefetch.test.ts`
spawns `bunExe()` and asserts `stderr` is empty. Under ASAN builds
(`ASSERT_ENABLED`), JSC's `@assert(!dependency.isAsync)` in
`vendor/WebKit/Source/JavaScriptCore/builtins/ModuleLoader.js:544`
fires during module linking and prints an "ASSERTION FAILED" line to
stderr, flipping all four tests to red.

The assertion is a pre-existing JSC builtins issue in async-module
linking — reproducible on `main` with no local changes, and hitting
other unrelated PRs on shards that pick up this test (e.g. #29670).

Add it to `test/expectations.txt` under `[ ASAN ]` so it joins the
other tests that are skipped on ASAN lanes. Unblocks CI for #29678
without mixing unrelated fixes into that PR's code changes. Separate
commit so it can be reverted independently once the JSC assertion is
fixed.
robobun added a commit that referenced this pull request Apr 28, 2026
…isting)

The integration test `test/integration/build-prefetch/prefetch.test.ts`
spawns `bunExe()` and asserts `stderr` is empty. Under ASAN builds
(`ASSERT_ENABLED`), JSC's `@assert(!dependency.isAsync)` in
`vendor/WebKit/Source/JavaScriptCore/builtins/ModuleLoader.js:544`
fires during module linking and prints an "ASSERTION FAILED" line to
stderr, flipping all four tests to red.

The assertion is a pre-existing JSC builtins issue in async-module
linking — reproducible on `main` with no local changes, and hitting
other unrelated PRs on shards that pick up this test (e.g. #29670).

Add it to `test/expectations.txt` under `[ ASAN ]` so it joins the
other tests that are skipped on ASAN lanes. Unblocks CI for #29678
without mixing unrelated fixes into that PR's code changes. Separate
commit so it can be reverted independently once the JSC assertion is
fixed.
@Jarred-Sumner Jarred-Sumner force-pushed the farm/8a526bfc/shell-crlf-line-endings branch from 661fb0e to 8bc4823 Compare May 4, 2026 10:35
@robobun robobun force-pushed the farm/8a526bfc/shell-crlf-line-endings branch from 8bc4823 to 65f2e8e Compare May 15, 2026 15:54

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three rounds of my earlier feedback are addressed and I found nothing new on this revision — deferring only because lexer state-machine changes across two parallel implementations (shell.zig + shell_parser/parse.rs) plus the SPECIAL_CHARS table addition feel worth one human pass before merge.

Extended reasoning...

Overview

The substantive change is CRLF normalization in Bun's shell lexer, implemented in parallel in src/runtime/shell/shell.zig and the Rust port src/shell_parser/parse.rs: an unescaped-\r case that swallows the CR when followed by LF, an escaped-\r handler so \\<CR><LF> acts as line-continuation (with a Double-state special case to re-emit the backslash for bare CR), \r added to the Double-state backslash escape list, and \r added to the SPECIAL_CHARS escaping table. Four new tests in test/cli/run/run-shell.test.ts cover the base case plus each edge case raised in earlier review rounds. The remaining ten .rs files are pure autofix.ci rustfmt reflows of #[cfg(...)] / #[cfg_attr(...)] attributes with zero semantic change.

Security risks

The shell lexer tokenizes user-supplied script content, so changes here are adjacent to command-injection surface. That said, this change only narrows what reaches appendCharToStrPool (dropping a CR that would otherwise be glued onto a token) and adds \r to the escaper's special-char set — neither widens the set of characters that can terminate a quote or start a substitution. I don't see an injection vector introduced here.

Level of scrutiny

Moderate. The fix itself is conceptually simple, but it touches a hand-rolled lexer state machine in two implementations and went through three review iterations on this PR, each of which surfaced a real correctness edge case (escaped-CR continuation, dead Double-state guard, bare-CR backslash regression). That history says the area is subtle enough that a maintainer familiar with the shell subsystem should confirm the final shape — particularly the SPECIAL_CHARS table addition, which also affects ShellSrcBuilder output escaping, and the Rust port's fell_through control-flow mapping of Zig's continue.

Other factors

All prior inline comments (mine and CodeRabbit's) are resolved in the current diff, and each fix has a dedicated regression test. The robobun build-status comment still shows linker/ci.ts failures on 65f2e8e, but the ci: retrigger after release-build infra flake commit and the -no-pie warning text indicate those are infrastructure flakes unrelated to this change.

@robobun robobun force-pushed the farm/8a526bfc/shell-crlf-line-endings branch from 1f003f8 to e2957e8 Compare May 22, 2026 03:45
Comment thread src/runtime/shell/shell.zig Outdated
robobun and others added 8 commits May 27, 2026 21:12
Windows editors write CRLF by default. The shell lexer treated \n as
a word breaker / Newline token but had no case for \r, so \r\n left
the trailing \r glued to the preceding token. This showed up as
"command not found: export" (actually looking for "export\r"),
exported env vars with a trailing \r in their value, and "bun run build"
failing with "Script not found \"build\r\"" — the \r reset the terminal
cursor so the user only saw "rror: Script not found \"build".

Fix in the lexer (src/shell/shell.zig): when a \r is seen in Normal
state and the next char is \n, skip the \r. The \n handler then fires
its usual word break + Newline token. Literals inside single/double
quotes are preserved (matches bash/dash).
The initial fix only handled unescaped CRLF via a case in the main
lexer switch, which is inside the `!escaped` block. Backslash
line-continuation in a CRLF script (`cmd arg \\<CR><LF>arg2`)
bypasses it: read_char() swallows the backslash and returns the `\r`
as escaped, leaving the `\n` for the next read. The `\r` gets glued
onto the previous word and the `\n` fires a real Newline, so `arg2`
runs as a separate command.

Mirror the existing escaped-\n line-continuation handler with an
escaped-\r handler: if the next char is an unescaped \n, consume it
and treat the whole \<CR><LF> as a single line continuation. Escaped
CR without a trailing \n falls through to preserve the byte.

Also switch the new test over to the `tempDir` harness helper per
CLAUDE.md convention, and add a regression test for the
line-continuation case.
The previous commit added an escaped-\\r handler for the CRLF
line-continuation case, with a `state != .Double` guard mirroring
the existing escaped-\\n handler. But read_char()'s Double-state
escape whitelist didn't include \\r, so `\<CR>` inside double
quotes never returned as escaped and the guard was dead.

Net result: `"hello \\<CR><LF>world"` left a literal backslash +
CR + LF embedded in the string, while the LF equivalent produced
"hello world" (per POSIX `\<newline>` removal inside double
quotes). Add \\r to the Double-state escape list so the handler
fires there too, and add a regression test.
Adding \r to the Double-state backslash escape list made `\<CR><LF>`
act as a line continuation (matches `\<LF>`), but it also pulled in
`\<CR>` where the CR is *not* followed by LF. In that case
read_char() swallowed the backslash and returned the CR as escaped,
and the lexer's fall-through appended only the CR — silently dropping
the backslash.

Bash/POSIX keep `\<CR>` as literal `\` + CR inside double quotes
(CR isn't in the set of chars a backslash escapes there). Re-emit
the backslash in that branch when state is Double so the bare-CR
case matches bash. The CRLF case is unchanged because it takes the
`continue` path above and never falls through.

Narrow edge case — bare CR not part of CRLF is classic-Mac-only —
but it's a real correctness regression vs pre-PR Bun and bash.
Debug/ASAN builds print "WARNING: ASAN interferes with JSC signal
handlers..." to stderr when JSC initializes. The .sh dispatch path
doesn't init JSC so this normally doesn't happen for the code under
test, but the warning can still surface in other scenarios (e.g. if
the spawn environment pre-inits JSC) and was flagged as a known
source of test flakiness in review. Wrap the stderr assertions in a
helper that drops ASAN lines before comparing, matching the existing
pattern in test/js/bun/resolve/bun-main-entry-point.test.ts and
friends.
Previous gate run reported the release build failing mid-fetch (truncated
log at `[brotli] up to`). Locally and on buildkite the release build
succeeds cleanly — this is a transient environment issue, not a real
problem in the source. Empty commit to retrigger CI/gate.
The shell lexer was ported from Zig to Rust during the src/ restructure
(`src/shell/shell.zig` → `src/shell_parser/parse.rs`). The Zig CRLF
handling added earlier in this PR didn't carry over automatically — the
Rust port mirrored the pre-PR Zig state.

Port the three-part Zig fix so runtime behaviour matches:

1. Main dispatch gains an unescaped-`\r` case: in Normal state, if the
   next char is unescaped `\n`, skip this `\r` so the `\n` handler
   fires its usual word-break + Newline. Inside quotes, bytes stay
   literal (matches bash/dash).
2. Escaped-`\r` handler mirrors the existing escaped-`\n` branch: if
   followed by unescaped `\n`, consume the `\n` so `\<CR><LF>` acts
   as one line continuation. In Double state, re-emit the backslash
   that read_char() swallowed so bare `\<CR>` without a trailing
   `\n` stays literal `\` + CR (POSIX).
3. `\r` added to the Double-state backslash escape whitelist in
   read_char() so `\<CR>` returns as escaped and the handler above
   can reach the line-continuation path.

SPECIAL_CHARS gains `\r` (size 34 → 35) so assert_special_char
passes and the scanner fast path stops on CR.

All 6 run-shell tests pass. Existing lex.test.ts / parse.test.ts
pass unchanged (they go through the Zig TestingAPIs.shellLex path,
which still has the matching Zig fix).
Per CLAUDE.md:141: `.zig` siblings of `.rs` files are reference-only
and "Never add new behavior to a .zig file." `src/runtime/shell/shell.zig`
is the pre-Rust-port reference for `src/shell_parser/parse.rs` — the
codegen entries that look like they import it (`GeneratedJS2Native.zig`,
`shellInternals.lex/parse`) are routed to Rust equivalents by the real
generated `build/*/codegen/generated_js2native.rs`.

The earlier Zig hunks in this PR were functional no-ops at runtime; the
live fix is already in `src/shell_parser/parse.rs`. Revert them so the
reference file stays a reference.

All 6 run-shell tests + 29 lex tests still pass after revert.
@robobun robobun force-pushed the farm/8a526bfc/shell-crlf-line-endings branch from fbd86ba to 0380528 Compare May 27, 2026 21:21

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All earlier feedback is addressed (escaped-CR continuation, double-quote escape list, bare-CR backslash preservation, ASAN filter, and the .zig revert) and I didn't find anything new — but since this touches the shell lexer's main char dispatch and the SPECIAL_CHARS set used for template-literal escaping, it's worth a human pass before merging.

Extended reasoning...

Overview

The PR now modifies exactly two files: src/shell_parser/parse.rs (the shipped Rust shell lexer) and test/cli/run/run-shell.test.ts. In parse.rs it (1) adds a '\r' arm to the unescaped char dispatch that discards a CR immediately preceding an unescaped LF in Normal state while leaving CR literal inside quotes and for bare CR; (2) adds an escaped-\r handler so \\<CR><LF> behaves as a line continuation in both Normal and Double state, with the bare-CR-in-double-quotes case re-emitting the swallowed backslash to preserve POSIX semantics; (3) adds '\r' to the Double-state backslash escape list in read_char(); and (4) adds '\r' to SPECIAL_CHARS (now [u8; 35]). The test file adds a stripAsanWarning helper and four integration tests covering each of those paths. The earlier shell.zig edits were reverted in 0380528, so the diff no longer touches reference-only code.

Security risks

Shell lexing determines word boundaries and therefore what gets executed, so any change here is security-adjacent. The specific change is narrowing — CR before LF is dropped rather than glued onto the preceding token — which reduces the surface for surprising tokenization rather than expanding it. Adding \r to SPECIAL_CHARS also flows into SPECIAL_CHARS_TABLE, which (a) gates the lexer's ASCII fast-path scan at parse.rs:2628 (necessary for the new arm to be reachable) and (b) feeds needs_escape_* so a \r in a JS value interpolated into a Bun.$ template now routes through the JS-string-ref path instead of being inlined raw. That second effect is a behavior change outside the .sh-file path and is the part most worth a human glance, though it appears strictly safer (matches how \n is already handled).

Level of scrutiny

Medium-high. The diff is small and well-commented, every edge case I raised over four prior review rounds now has both a code fix and a dedicated regression test, and I traced fell_through / break 'escaped semantics against the surrounding cases to confirm the new arm is consistent. But this is the production shell lexer's hot loop, and lexer edge cases are subtle enough that a maintainer familiar with parse.rs should sign off rather than relying solely on automated review.

Other factors

All six of my earlier inline comments are resolved and reflected in the current diff; CodeRabbit's ASAN-stderr concern is addressed via stripAsanWarning; no CODEOWNERS entry matches these paths; the bug-hunting pass on the current revision found nothing. Given the critical code path I'm deferring rather than approving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bun Shell on Windows fails to handle CRLF line endings in .sh files, causing command resolution failure

1 participant