CachedTypes: stop encoding source text into CachedSourceCodeKey#186
Conversation
CachedStringSourceProvider::encode serializes the full source text into the bytecode cache (it lands at byte offset 272 of every .jsc blob), and CachedStringSourceProvider::decode reconstructs it via CachedString::decode → AtomStringImpl::add, heap-allocating a fresh copy. The result is held by a CachedRefPtr finalizer for the lifetime of the Decoder, which in turn is kept alive by every UnlinkedFunctionExecutable with a lazy body. Under BUN_JSC_ADDITIONS, SourceCodeKey::operator== already skips string() == other.string(), so neither the on-disk copy nor the heap copy is ever read. They are pure overhead: ~source_size bytes of .jsc bloat plus ~source_size bytes of dirty footprint at startup, with the source text resident three times (StandaloneModuleGraph mmap, .jsc mmap, heap AtomStringImpl). Encode source().length() instead of the bytes, and at decode reuse the SourceProvider the Decoder was constructed with (decoder.provider()) when its sourceType and length match. The fallback path constructs an empty provider so the key length comparison rejects the entry.
WalkthroughUnder USE(BUN_JSC_ADDITIONS), CachedStringSourceProvider::encode now records only the source length; decode tries to reuse an existing runtime SourceProvider when type and length match, otherwise it skips cached source bytes and synthesizes an empty StringSourceProvider. Non-BUN behavior is unchanged. Changes
🚥 Pre-merge checks | ✅ 1 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Source/JavaScriptCore/runtime/CachedTypes.cpp`:
- Around line 1665-1671: The code unsafely reinterpret_casts a SourceProvider to
StringSourceProvider after matching sourceType and mutates the reused provider
via Base::decode(decoder, *provider), which can cause UB or race conditions; add
a runtime type check (e.g., virtual asStringSourceProvider() or dynamic_cast
when RTTI is available) on the object returned by decoder.provider() before
casting and only cast when that check confirms a StringSourceProvider, and avoid
mutating a shared provider by either decoding into a fresh copy or using a
non-mutating decode API/clone before calling Base::decode so that
sourceURLDirective, sourceMappingURLDirective, and sourceTaintedOrigin are not
changed on a provider that may be shared.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 904ea0da-df9a-48f1-a5a0-82e7e3e54bba
📒 Files selected for processing (1)
Source/JavaScriptCore/runtime/CachedTypes.cpp
Preview Builds
|
Pulls in the CachedTypes change that stops encoding the bundled source text into CachedSourceCodeKey. Under BUN_JSC_ADDITIONS the SourceCodeKey::operator== string comparison is already skipped, so the encoded source bytes were never read — but they were still written into every .jsc blob and heap-allocated as an AtomStringImpl during decodeCodeBlockImpl, then pinned by a Decoder finalizer for the lifetime of every UnlinkedFunctionExecutable with a lazy body. Preview pin so CI can run the full test suite against the WebKit change before it merges; will be replaced with the real commit hash once #186 lands.
Addresses review feedback on the reuse path: - Change the BUN_JSC_ADDITIONS decode() return type to SourceProvider*. The only caller (CachedSourceProvider::decode) already returns SourceProvider*, so the prior reinterpret_cast through the StringSourceProvider sibling was unnecessary type-punning. The runtime provider's leakRef() now upcasts naturally. - Drop Base::decode(decoder, *provider) for the reuse path. It mutated the runtime provider's sourceURLDirective / sourceMappingURLDirective / sourceTaintedOrigin with values encoded from the compile-time provider. The runtime provider already carries the correct values; the decoded key needs only sourceOrigin().url().host() and length() for the equality check, neither of which Base::decode supplies. CachedSourceProviderShape fields are offset-addressed (not stream-positioned), so leaving them undecoded does not affect subsequent reads. No on-disk format change vs the previous commit.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Source/JavaScriptCore/runtime/CachedTypes.cpp`:
- Around line 1682-1686: The fallback in decodeSourceCodeKey() creates an empty
StringSourceProvider (String decodedSource) so provider()->source().length()
becomes 0, causing decodedKey == key to always miss when decoder.provider() is
absent (affecting isCachedBytecodeStillValid() and decodeSourceCodeKey());
instead either thread the original key.source().provider() into the Decoder
constructed in the validity-check path (use key.source().provider() when
decoder.provider() is missing) or, if synthesising a provider, preserve the
original source length by setting m_sourceLength to key.source().length() rather
than hard-coding 0; update decodeSourceCodeKey(), the Decoder construction sites
used by isCachedBytecodeStillValid(), and any StringSourceProvider fallback
logic to use key.source().provider() or preserve m_sourceLength accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f5154e15-64db-4e24-a5c0-de953dce8ebb
📒 Files selected for processing (1)
Source/JavaScriptCore/runtime/CachedTypes.cpp
| // Fallback for callers that did not supply a provider: decode without source | ||
| // bytes. SourceCodeKey::operator== ignores string(), but length() is compared, | ||
| // so synthesize a provider whose source() is empty — length() will mismatch | ||
| // and the cache entry will be rejected, which is the conservative behaviour. | ||
| String decodedSource; |
There was a problem hiding this comment.
Don't make key-only decode paths synthesize a zero-length provider.
When decoder.provider() is absent, Lines 1682-1686 fall back to an empty StringSourceProvider, so source().length() becomes 0. That breaks local no-provider key comparisons under USE(BUN_JSC_ADDITIONS): isCachedBytecodeStillValid() still constructs its Decoder without a provider on Line 2779, so decodedKey == key now becomes an unconditional cache miss for every non-empty source. decodeSourceCodeKey() on Line 2740 is lossy for the same reason.
At minimum, thread key.source().provider() into the validity-check path; otherwise this fallback needs to preserve m_sourceLength instead of hard-coding 0.
🔧 Minimal fix for the validity-check path
bool isCachedBytecodeStillValid(VM& vm, Ref<CachedBytecode> cachedBytecode, const SourceCodeKey& key, SourceCodeType type)
{
auto span = cachedBytecode->span();
if (span.empty())
return false;
auto* cachedEntry = std::bit_cast<const GenericCacheEntry*>(span.data());
- Ref decoder = Decoder::create(vm, WTF::move(cachedBytecode));
+ Ref decoder = Decoder::create(vm, WTF::move(cachedBytecode), &key.source().provider());
return cachedEntry->isStillValid(decoder.get(), key, tagFromSourceCodeType(type));
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Source/JavaScriptCore/runtime/CachedTypes.cpp` around lines 1682 - 1686, The
fallback in decodeSourceCodeKey() creates an empty StringSourceProvider (String
decodedSource) so provider()->source().length() becomes 0, causing decodedKey ==
key to always miss when decoder.provider() is absent (affecting
isCachedBytecodeStillValid() and decodeSourceCodeKey()); instead either thread
the original key.source().provider() into the Decoder constructed in the
validity-check path (use key.source().provider() when decoder.provider() is
missing) or, if synthesising a provider, preserve the original source length by
setting m_sourceLength to key.source().length() rather than hard-coding 0;
update decodeSourceCodeKey(), the Decoder construction sites used by
isCachedBytecodeStillValid(), and any StringSourceProvider fallback logic to use
key.source().provider() or preserve m_sourceLength accordingly.
Stops encoding source text into CachedSourceCodeKey under BUN_JSC_ADDITIONS — the decoded key only feeds SourceCodeKey::operator== which already skips the byte comparison there. Removes a per-chunk ~source_size AtomStringImpl heap allocation at decode and the matching bytes from each .jsc on disk.
Summary
CachedStringSourceProvider::encodeserializes the full source text into the bytecode cache (it appears verbatim at byte offset 272 of every.jscblob), andCachedStringSourceProvider::decodeheap-allocates a fresh copy viaCachedString::decode→AtomStringImpl::add. That copy is then pinned by aCachedRefPtrfinalizer for the lifetime of theDecoder, which everyUnlinkedFunctionExecutablewith a lazy body keeps alive.Under
BUN_JSC_ADDITIONS,SourceCodeKey::operator==already skipsstring() == other.string(), so neither the on-disk copy nor the heap copy is ever read. The source text ends up resident three times at startup: the StandaloneModuleGraph mmap, the.jscmmap, and a heapAtomStringImplof the same bytes.This change, gated on
BUN_JSC_ADDITIONS:source().length()instead of the source bytesdecoder.provider()whensourceTypeand length match, instead of allocating a newStringSourceProvider; otherwise builds an empty provider so the key comparison rejects the entry conservativelyWhy this is safe — where errors/stacks/toString actually get source
The bytecode never carried source for runtime use. There are two independent data flows that meet at
ScriptExecutable:UnlinkedCodeBlockhas noSourceProviderfield — only offsets. Those offsets are resolved againstScriptExecutable::m_source, which is the runtimeSourceCodeset beforedecodeCodeBlockis called:CodeBlock::source()→m_ownerExecutable->source()(CodeBlock.h:432)Function.prototype.toString→FunctionExecutable::source()→ScriptExecutable::m_sourceUnlinkedFunctionExecutable::linkedSourceCode(parentSource)→SourceCode(parentSource.provider(), startOffset, ...)(UnlinkedFunctionExecutable.cpp:191)The
CachedSourceCodeKey's provider (entry.first.m_sourceCode.m_provider) is a separate object that exists for ~10 stack frames insidedecodeCodeBlockImpl, is read once byoperator==(which underBUN_JSC_ADDITIONSignores its source bytes), and is destroyed whenentry.firstgoes out of scope. It was never wired into any executable.Format compatibility
m_sourceLength(4 bytes) replacesCachedString m_sourcein the on-disk layout.GenericCacheEntry::decodechecksisUpToDate()(them_cacheVersionuint32 at offset 0) before touchingm_key, so an old-format.jscis rejected before the layout change is read.m_cacheVersion = hash(BUN_WEBKIT_VERSION)changes when Bun bumps its WebKit pin to pick up this PR.Verified empirically: old
.jsc(system bun) + new decoder →[Disk Cache] Cache miss→ falls back to parse, output correct. New.jsc+ new decoder →[Disk Cache] Cache hit→ output correct.Effect (release build, macOS arm64)
.jscsizeBUN_JSC_verboseDiskCache=1confirms[Disk Cache] Cache hitafter the change.Test plan
bun build --bytecode --compileoutput runs and produces identical stdoutBUN_JSC_verboseDiskCache=1showsCache hit for sourceCode(not falling back to reparse)Function.prototype.toString,Error.stack, nested-lazy-decode, async,eval,new Functionall pass with cache hit.jsc(old encoder + new decoder) cleanly rejected viam_cacheVersion, falls back to parsetest/bundler/bundler_compile.test.ts— 54/54 pass (release)test/bundler/bundler_banner.test.ts— 11/11 passtest/bundler/bun-build-api.test.ts -t bytecode— 1/1 pass