Skip to content

Allow napi_delete_reference to be called from finalizers during GC#31684

Merged
Jarred-Sumner merged 4 commits into
mainfrom
farm/2ab4862a/napi-delete-ref-in-gc-finalizer
Jun 2, 2026
Merged

Allow napi_delete_reference to be called from finalizers during GC#31684
Jarred-Sumner merged 4 commits into
mainfrom
farm/2ab4862a/napi-delete-ref-in-gc-finalizer

Conversation

@robobun

@robobun robobun commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?

Fixes the napi_reference_unref panic (Sentry BUN-39FR, 11 crashes/14d on 1.4.0-canary, also present on 1.3.x stable) that fires when a NAPI addon's finalize callback calls napi_delete_reference on its own reference while the garbage collector is sweeping.

Reproduction

A module built with the experimental Node-API version runs its finalizers synchronously from the weak-handle sweep (MarkedBlock::Handle::sweepWeakSet::sweepWeakBlock::finalizeNapiRefWeakHandleOwner::finalize). If such a finalizer calls napi_delete_reference — e.g. node-addon-api's ObjectWrap destructor deleting the ref returned by napi_wrap, which is exactly the two addon frames in the crash stacks — Bun aborted:

FATAL ERROR: Finalizer is calling a function that may affect GC state.
...
panic(main thread): napi_reference_unref

Reproduced on Linux with Bun.gc(true) (→ collectNow(Sync)sweepSynchronously()), which enters the same SweepingScope as the incremental sweeper timer seen in the wild — the all-Windows telemetry is population skew, not a platform difference; the guard is shared C++.

Cause

  1. napi_delete_reference used NAPI_CHECK_ENV_NOT_IN_GC. Node.js deliberately does not: it declares the function with node_api_basic_env and plain CHECK_ENV, because deleting the napi_wrap reference from its finalize callback is the documented cleanup pattern ("For a napi_reference returned from napi_wrap, this must be called in the finalizer." — node/src/js_native_api_v8.cc).
  2. The abort message in NapiEnv::checkGC() was hardcoded to napi_reference_unref regardless of which API tripped the check, which is why the crash reports blame the wrong function.

Fix

  • napi_delete_reference no longer performs the GC-access check, and uses NAPI_PREAMBLE_NO_THROW_SCOPE + napi_clear_last_error to match Node's "omit NAPI_PREAMBLE, calls here cannot throw" semantics (it must also work with a pending exception). Deleting the NapiRef mid-sweep is safe: its WeakImpl is already in the Finalized state, so Weak::clear() only marks it Deallocated — the same pattern NapiRefSelfDeletingWeakHandleOwner already uses.
  • checkGC() aborts with an accurate message. napi_reference_unref (and other non-basic functions) remain blocked from in-GC finalizers, matching Node — the existing napi_reference_unref is blocked from finalizers in experimental modules test still covers that.

Verification

New addon test/napi/napi-app/test_delete_ref_in_finalizer_experimental.c (NAPI_EXPERIMENTAL) wraps objects with napi_wrap/napi_add_finalizer whose finalizers delete their own ref, driven by test/napi/napi-finalizer-delete-ref.test.ts:

  • without the fix: panic(main thread): napi_reference_unref — the exact production signature
  • with the fix: all 100 finalizers run during the sweep and every napi_delete_reference succeeds; full test/napi/napi.test.ts suite still passes

Node.js declares napi_delete_reference with node_api_basic_env and does
not perform the GC-access check: deleting the reference returned by
napi_wrap from its finalize callback is the documented cleanup pattern,
and node-addon-api's ObjectWrap destructor relies on it. Bun applied
NAPI_CHECK_ENV_NOT_IN_GC here, so modules built with the experimental
Node-API version (whose finalizers run synchronously while the collector
sweeps) aborted as soon as a finalizer deleted its own reference.

Also replace the hardcoded "napi_reference_unref" abort message in
NapiEnv::checkGC with an accurate one; it previously blamed that
function no matter which Node-API call tripped the check.
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Warning

Review limit reached

@robobun, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 41 minutes and 14 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 06b77a92-1abf-43c1-aaeb-93cd50ae1340

📥 Commits

Reviewing files that changed from the base of the PR and between 561eb8f and 8b5a509.

📒 Files selected for processing (6)
  • src/jsc/bindings/napi.cpp
  • src/jsc/bindings/napi.h
  • test/napi/napi-app/binding.gyp
  • test/napi/napi-app/test_delete_ref_in_finalizer_experimental.c
  • test/napi/napi-finalizer-delete-ref.test.ts
  • test/no-validate-exceptions.txt

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the claude label Jun 2, 2026
@robobun

robobun commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 2:05 AM PT - Jun 2nd, 2026

@robobun, your commit 8b5a509 has 6 failures in Build #59781 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 31684

That installs a local version of the PR into your bun-31684 executable, so you can run:

bun-31684 --bun

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Found 1 issue this PR may fix:

  1. Unclear napi panic can occur when no space is available #22259 - Stack trace shows wrap_cleanup (napi_wrap finalizer) hitting the same NAPI_CHECK_ENV_NOT_IN_GC guard during GC, which this PR removes from napi_delete_reference

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #22259

🤖 Generated with Claude Code

@robobun

robobun commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator Author

#22259 is a different bug and is not fixed by this PR. Its panic is NAPI FATAL ERROR: Error::New napi_create_error — node-addon-api calling napi_fatal_error because napi_create_error returned a failure status inside a finalizer run from wrap_cleanup during env teardown (under ENOSPC). That path never reaches the NAPI_CHECK_ENV_NOT_IN_GC guard this PR removes from napi_delete_reference: teardown finalizers don't run inside the collector, and the abort fixed here prints FATAL ERROR: Finalizer is calling a function that may affect GC state + panics with napi_reference_unref, neither of which appears in that report. napi_create_error failing in a teardown finalizer is more likely pending-exception state during cleanup (what #30291 addresses) or allocation failure.

Comment thread test/napi/napi-finalizer-delete-ref.test.ts
The node-gyp rebuild re-runs on every bun version change and is
occasionally flaky under resource pressure; the addon binary doesn't
link against bun, so reuse it when present and retry a failed install
once before giving up.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find correctness issues, but this relaxes a GC-safety guard in a core NAPI path (deleting a NapiRef while the collector is mid-sweep) and changes pending-exception semantics for napi_delete_reference, so it's worth a human pass — and the x64-asan run for the new test was red on the prior commit.

Extended reasoning...

Overview

The PR changes napi_delete_reference in src/jsc/bindings/napi.cpp to drop NAPI_CHECK_ENV_NOT_IN_GC and the throw-scope preamble (so it no longer aborts when called from a finalizer running synchronously inside the JSC sweep, and no longer short-circuits on a pending exception), and rewords the checkGC() abort message in napi.h. It adds a new NAPI_EXPERIMENTAL C addon plus a standalone test driver and a binding.gyp target. The macro-level diff is small, but the behavioral change is in a hot, memory-safety-sensitive path used by every node-addon-api ObjectWrap.

Security risks

No direct security surface — no parsing, auth, or untrusted input. The risk profile is memory safety / UAF rather than security: delete napiRef now runs while WeakBlock::finalize is on the stack holding the same WeakImpl. The PR's argument (the WeakImpl is already Finalized, so Weak::clear() only flips it to Deallocated, mirroring NapiRefSelfDeletingWeakHandleOwner) is sound on inspection, and Node.js does declare this function with node_api_basic_env, but this is exactly the class of reasoning that benefits from a second pair of eyes familiar with JSC's heap.

Level of scrutiny

High. This is production native code in the NAPI bindings that interacts with the garbage collector's sweeping phase. A mistake here is a heap corruption or UAF that would be hard to reproduce and diagnose. The change also subtly alters semantics: napi_delete_reference will now succeed even with a pending JS exception (previously it returned napi_pending_exception). That matches Node, but it's a behavior change worth a maintainer ack.

Other factors

  • CI on commit 0b9e468 showed the new napi-finalizer-delete-ref.test.ts failing on x64-asan; a follow-up commit (5d7015f5, skip rebuild when addon exists) was pushed, but I don't see a green run reported against it yet. The musl LTO link errors look like infra flakes unrelated to this diff.
  • The existing sibling test (napi_reference_unref is blocked from finalizers in experimental modules) still asserts the negative case, so the guard wasn't removed wholesale — only for napi_delete_reference.
  • My earlier nit about test-file placement was answered with a reasonable rationale and the thread is resolved; not a blocker.
  • No CODEOWNERS entry covers src/jsc/bindings/napi*.

Loading any napi-app addon under BUN_JSC_validateExceptionChecks trips a
pre-existing unchecked-exception report in the module registration path
(napi_create_function -> napi_set_named_property), which is why the
other NAPI addon tests are already listed here. The new test spawns bun
with an addon and inherits the runner's env on the ASAN lane, so it
needs the same exemption.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No bugs found, but this relaxes a GC-safety guard in native NAPI code (delete NapiRef mid-sweep) — worth a human pass on the JSC weak-handle lifecycle reasoning, and CI for 8b5a509 (the no-validate-exceptions exemption) hasn't reported yet.

Extended reasoning...

Overview

The PR removes NAPI_CHECK_ENV_NOT_IN_GC from napi_delete_reference in src/jsc/bindings/napi.cpp and switches it to NAPI_PREAMBLE_NO_THROW_SCOPE + napi_clear_last_error, so that experimental-version NAPI modules can delete their own napi_wrap/napi_add_finalizer reference from inside a finalizer that runs synchronously during the JSC heap sweep. It also replaces the misleading hardcoded napi_reference_unref abort message in NapiEnv::checkGC(). Supporting changes add a new C addon, a standalone test driver, a binding.gyp target, and a no-validate-exceptions.txt entry.

Security risks

None identified. This is a crash-fix in the native addon runtime; no auth, network, filesystem, or untrusted-input parsing is involved. The change loosens a guard rather than tightening one, but only to match Node.js's documented contract for node_api_basic_env functions.

Level of scrutiny

High. The functional change is small (~10 lines in napi.cpp plus a string in napi.h), but it sits at the intersection of JSC garbage collection, weak-handle finalization, and native heap deletion. Correctness rests on the claim that delete napiRef is safe while WeakBlock::finalize is on the stack — i.e., that the WeakImpl is already Finalized so Weak::clear() only flips it to Deallocated, and that ~NapiRef (which also touches strongRef, boundCleanup->deactivate(), and the env's finalizer set) does nothing that perturbs the collector. The argument in the PR description is well-reasoned and mirrors what NapiRefSelfDeletingWeakHandleOwner already does, but it's exactly the kind of invariant a maintainer familiar with the JSC heap should sign off on rather than a bot.

Other factors

  • The new test failed on x64-asan at commit 5d7015f; the follow-up commit 8b5a509 adds it to test/no-validate-exceptions.txt (the validateExceptionChecks opt-out list alongside other NAPI tests), but robobun hasn't posted updated CI results for that commit yet.
  • My earlier inline nit about test-file placement was answered and resolved by the author; nothing further on that.
  • No CODEOWNERS entry covers src/jsc/bindings/napi.cpp.
  • The bug-hunting system found no issues.

@Jarred-Sumner Jarred-Sumner merged commit 17b4071 into main Jun 2, 2026
52 of 60 checks passed
@Jarred-Sumner Jarred-Sumner deleted the farm/2ab4862a/napi-delete-ref-in-gc-finalizer branch June 2, 2026 03:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants