napi: don't mutate m_pendingNapiModules while range-for iterating it #29981

Merged
Jarred-Sumner merged 2 commits into main from farm/1138e689/napi-reentrant-register
Apr 30, 2026
Conversation

@robobun

@robobun robobun commented Apr 30, 2026

Collaborator

What

process.dlopen iterates globalObject->m_pendingNapiModules after the shared library's static constructors have run, invoking each module's nm_register_func. If that callback itself calls napi_module_register() — or triggers a nested dlopen whose static constructors do — the append can grow the backing WTF::Vector and leave the outer range-for's iterator pointing into freed storage. Re-entrantly registered modules were also silently dropped by the trailing .clear().
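
To make the hazard concrete, here is a minimal sketch using std::vector as a stand-in for WTF::Vector (an assumption; the two containers are different classes, but both move their elements to a new buffer when they outgrow their capacity). PendingModule and appendMovesStorage are hypothetical names used only for illustration. The helper records the buffer address as an integer before appending, so nothing stale is ever dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one pending-module record.
struct PendingModule {
    int id;
};

// Returns true if appending `extra` elements moved the vector's backing buffer.
// The old address is captured as an integer, so no dangling pointer is read.
bool appendMovesStorage(std::vector<PendingModule>& pending, std::size_t extra)
{
    const auto before = reinterpret_cast<std::uintptr_t>(pending.data());
    for (std::size_t i = 0; i < extra; ++i)
        pending.push_back(PendingModule { static_cast<int>(i) });
    const auto after = reinterpret_cast<std::uintptr_t>(pending.data());
    return before != after;
}
```

When the buffer moves, any iterator or reference obtained before the append, including the hidden iterator of an enclosing range-for, refers to the old, freed buffer. If the vector already has enough reserved capacity, the append does not reallocate and the bug stays latent, which is why a large number of re-entrant registrations is needed to trigger it reliably.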

How

std::exchange the pending NAPI/V8 vectors into locals before iterating, then drain m_pendingNapiModules in a loop so modules registered during an init callback are executed rather than discarded. Applied to both execute paths in Process_functionDlopen (fresh static-constructor registrations and the cached-registration replay).
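
The exchange-and-drain pattern described above can be sketched as follows. This is not the code from BunProcess.cpp: Registry, Module, registerModule and drain are hypothetical names, std::vector stands in for WTF::Vector, and std::function stands in for nm_register_func.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a registered module.
struct Module {
    int id;
    std::function<void()> init; // plays the role of nm_register_func
};

struct Registry {
    std::vector<Module> pending; // plays the role of m_pendingNapiModules
    std::vector<int> executed;   // ids in the order their init callbacks ran

    void registerModule(Module m) { pending.push_back(std::move(m)); }

    void drain()
    {
        // Keep going until an init pass registers nothing new.
        while (!pending.empty()) {
            // Move the current batch into a local and leave `pending` empty,
            // so re-entrant appends land in a different vector than the one
            // being iterated.
            std::vector<Module> batch = std::exchange(pending, {});
            for (Module& m : batch) {
                executed.push_back(m.id);
                if (m.init)
                    m.init(); // may call registerModule() re-entrantly
            }
        }
    }
};
```

Because the loop iterates a local batch, a callback that appends to `pending` cannot invalidate the loop's iterator, and the outer `while` picks up whatever was appended instead of discarding it. In this sketch the modules of one batch all run before any module they registered.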

Test

New reentrant_register_addon in test/napi/napi-app: two modules are registered from a static constructor; the first one's nm_register_func registers 64 more. Before this change the 64 re-entrant registrations are never executed (and the second static-constructor registration is reached via a stale iterator into the reallocated buffer); after, all 66 init callbacks run in order.
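
The scenario the test exercises can be simulated without N-API. The sketch below is an illustration under assumptions, not the contents of reentrant_register_addon.cpp: all names are hypothetical, and the old behavior is emulated by iterating a copy so that the example itself has no undefined behavior. It only reproduces the "re-entrant registrations are dropped" half of the bug.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a module with an init callback.
struct Module {
    int id;
    void (*init)(const Module&); // plays the role of nm_register_func
};

std::vector<Module> g_pending; // stand-in for the pending-module list
std::vector<int> g_ran;        // ids in the order their init callbacks ran

void plainInit(const Module& m) { g_ran.push_back(m.id); }

// The first module's init registers 64 more modules (ids 2..65).
void reentrantInit(const Module& m)
{
    g_ran.push_back(m.id);
    for (int i = 0; i < 64; ++i)
        g_pending.push_back(Module { 2 + i, plainInit });
}

// What the addon's static constructor does: two up-front registrations.
void simulateStaticConstructors()
{
    g_pending.clear();
    g_ran.clear();
    g_pending.push_back(Module { 0, reentrantInit });
    g_pending.push_back(Module { 1, plainInit });
}

// Old semantics, emulated safely: run a snapshot, then clear the list.
std::size_t runThenClear()
{
    simulateStaticConstructors();
    const std::vector<Module> snapshot = g_pending; // copy; appends cannot touch it
    for (const Module& m : snapshot)
        m.init(m);
    g_pending.clear(); // the 64 re-entrant registrations are dropped here
    return g_ran.size();
}

// New semantics: exchange into a local batch and drain until empty.
std::size_t runDraining()
{
    simulateStaticConstructors();
    while (!g_pending.empty()) {
        std::vector<Module> batch = std::exchange(g_pending, {});
        for (const Module& m : batch)
            m.init(m);
    }
    return g_ran.size();
}
```

In this simulation the old semantics run 2 init callbacks and the draining version runs all 66, with ids 0 through 65 in ascending order.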

Also gives the beforeAll in napi.test.ts a 120 s timeout — the bunExe() install it runs does a full node-gyp rebuild which can exceed the default 5 s hook timeout under a debug/ASAN binary, at which point the test runner SIGTERMs the install and the whole file aborts with build failed, bailing out!.

process.dlopen iterates m_pendingNapiModules after the shared library's
static constructors have run, calling each module's nm_register_func.
If that callback itself calls napi_module_register() (or triggers a
nested dlopen that does), the append can grow the backing WTF::Vector
and leave the outer range-for's iterator pointing into freed memory.
Any registrations made during init were also silently dropped by the
trailing clear().

Move the pending vectors into locals with std::exchange before
iterating, and drain the global vector in a loop so re-entrantly
registered modules are executed instead of discarded. Same treatment
for the cached-registration replay path.
@robobun

robobun commented Apr 30, 2026

Collaborator Author
Updated 4:35 AM PT - Apr 30th, 2026

@robobun, your commit b1432a9 has 4 failures in Build #49378 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29981

That installs a local version of the PR into your bun-29981 executable, so you can run:

bun-29981 --bun

@coderabbitai

coderabbitai Bot commented Apr 30, 2026

Contributor

Warning

Rate limit exceeded

@robobun has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 5 minutes and 12 seconds before requesting another review.

To keep reviews running without waiting, you can enable the usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3a344c47-bb6d-49ba-8145-fbe112cc0db3

📥 Commits

Reviewing files that changed from the base of the PR and between 360bbb5 and b1432a9.

📒 Files selected for processing (4)
  • src/bun.js/bindings/BunProcess.cpp
  • test/napi/napi-app/binding.gyp
  • test/napi/napi-app/reentrant_register_addon.cpp
  • test/napi/napi.test.ts

Review rate limit: 0/5 reviews remaining, refill in 5 minutes and 12 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Contributor

Found 2 issues this PR may fix:

  1. bun crashes when loading re2 #20454 - ASAN trace shows null function pointer call inside node_module_register during Process_functionDlopen, directly in the code path fixed by this PR
  2. Segmentation Fault on Bun v1.2.13 during bun run with NAPI module #19802 - Segfault with napi_module_register(3) and process_dlopen(4) in features, consistent with vector reallocation corruption when multiple NAPI modules are registered

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #20454
Fixes #19802

🤖 Generated with Claude Code

Comment thread test/napi/napi.test.ts

@claude claude Bot left a comment

Contributor


The fix looks correct and the Windows CRLF issue is resolved — but since this changes iteration/drain semantics in the core process.dlopen NAPI module-loading path (re-entrant registrations now execute rather than being dropped, and aren't added to the DLHandleMap cache), it's worth a maintainer's eyes before merge.

Extended reasoning...

Overview

This PR fixes an iterator-invalidation bug in Process_functionDlopen (src/bun.js/bindings/BunProcess.cpp): the execute loop range-for iterated globalObject->m_pendingNapiModules while nm_register_func callbacks could call napi_module_register() and append to that same vector, reallocating its backing buffer and leaving a dangling iterator (heap-use-after-free under ASAN, garbage function pointer otherwise). The fix std::exchanges the pending vectors into locals before iterating, then drains in a loop so re-entrantly registered modules are executed rather than silently dropped by the trailing .clear(). Applied to both the fresh-registration and cached-replay paths. Ships with a new reentrant_register_addon native test that registers 2 modules via static constructor, the first of which registers 64 more from its init callback to force reallocation. Also bumps the beforeAll node-gyp build timeout to 120s.

Security risks

None identified. No new attack surface — this is a defensive correctness fix in an existing code path. Inputs are still native module init callbacks, which are already fully trusted (loading a .node is arbitrary native code execution by design).

Level of scrutiny

High. process.dlopen is the entry point for every native addon load in Bun, and this change alters both control flow and semantics:

  • Re-entrant napi_module_register() calls from inside an nm_register_func are now executed (previously they were appended then immediately .clear()ed). This is the correct behavior, but it's a deliberate semantic change beyond just fixing the UAF.
  • Re-entrant registrations are not saved to DLHandleMap (only the initial static-constructor batch is). This is consistent — on cached replay the drain loop in the second path will pick them up again when the original nm_register_func re-registers them — but it's a design decision a NAPI-subsystem maintainer should confirm.
  • Re-entrant V8 (node_module_register) registrations are still cleared without execution, matching prior behavior.
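
The caching behavior described in the second bullet can be sketched as follows. This is a simplified model under assumptions, not Bun's implementation: Loader, open and the string-keyed cache are hypothetical, std::map stands in for DLHandleMap, and V8 (node_module_register) registrations are not modeled at all.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a registered module.
struct Module {
    int id;
    std::function<void()> init; // plays the role of nm_register_func
};

struct Loader {
    std::vector<Module> pending;                      // stand-in for m_pendingNapiModules
    std::map<std::string, std::vector<Module>> cache; // stand-in for DLHandleMap
    std::vector<int> executed;

    void drain()
    {
        while (!pending.empty()) {
            std::vector<Module> batch = std::exchange(pending, {});
            for (Module& m : batch) {
                executed.push_back(m.id);
                if (m.init)
                    m.init();
            }
        }
    }

    // `staticConstructors` models what the library registers when it is first
    // mapped; on later opens of the same path it does not run again.
    void open(const std::string& path,
              const std::function<void(Loader&)>& staticConstructors)
    {
        auto it = cache.find(path);
        if (it == cache.end()) {
            staticConstructors(*this);
            cache[path] = pending; // only the initial batch is cached
        } else {
            pending = it->second;  // replay the cached initial batch
        }
        drain(); // re-entrant registrations are regenerated by the init callbacks
    }
};
```

Under this model the cache never needs to hold the re-entrant registrations: replaying the initial batch re-runs the init callback that creates them, and the drain loop executes them again.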

Other factors

  • The std::exchange + drain pattern mirrors recently-merged #29949 for the same class of bug.
  • My earlier inline comment about Windows CRLF in the test assertion was addressed in b1432a9 (now splits on /\r?\n/).
  • The bug-hunting system found no issues on this revision.
  • Linked to crash reports #20454 and #19802.
  • No CODEOWNERS entry covers this file.

The fix itself reads as correct and idiomatic; I'm deferring purely because the dlopen/NAPI-init path is critical enough that a human familiar with the DLHandleMap caching semantics should sign off.

@Jarred-Sumner Jarred-Sumner merged commit 65a0b2f into main Apr 30, 2026
76 of 77 checks passed
@Jarred-Sumner Jarred-Sumner deleted the farm/1138e689/napi-reentrant-register branch April 30, 2026 20:59
xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
…ven-sh#29981)

Co-authored-by: robobun <robobun@users.noreply.github.com>