Skip to content

Conversation

@bnoordhuis
Copy link
Contributor

@bnoordhuis bnoordhuis commented Sep 30, 2025

Instead of copying long substring slices, store a reference to and offset into the parent string.

Gets more profitable as strings get longer. For 64k string slices, it's about 6.5x faster than copying.

The downside of ref-counted string slices is that they can keep the parent string alive for longer than it otherwise would have been, leading to memory usage that is higher than without string slices. That's why this optimization is only applied to long-ish slices, currently slices > 1,024 bytes.

Possible future enhancements are slicing only when the substring is > x% of the parent string, or copying lazily when the slice string is the only thing referencing the parent string.


edit: forgot to mention, the next step is adding cons/rope strings

Instead of copying long substring slices, store a reference to and
offset into the parent string.

Gets more profitable as strings get longer. For 64k string slices,
it's about 6.5x faster than copying.

The downside of ref-counted string slices is that they can keep the
parent string alive for longer than it otherwise would have been,
leading to memory usage that is higher than without string slices.
That's why this optimization is only applied to long-ish slices,
currently slices > 1,024 bytes.

Possible future enhancements are slicing only when the substring
is > x% of the parent string, or copying lazily when the slice
string is the only thing referencing the parent string.
Copy link
Contributor

@saghul saghul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a small suggestion.

@bnoordhuis bnoordhuis merged commit 08db51a into quickjs-ng:master Sep 30, 2025
126 checks passed
@bnoordhuis bnoordhuis deleted the string-slices branch September 30, 2025 16:11
bnoordhuis added a commit to bnoordhuis/quickjs that referenced this pull request Oct 10, 2025
This change seems to have introduced a seemingly impossible bug in a
real-world code base where values flip to JS_NULL for no discernible
reason.

My best guess is something manipulating string data directly, without
being aware of slice strings, but it's impossible to track down where.
The weird thing is that the value flipping to null isn't even a string,
it appears to be an array.

This reverts commit 08db51a.
This reverts commit 62b4eed.

Fixes: quickjs-ng#1178
mochaaP added a commit to mcha-forks/quickjs that referenced this pull request Oct 11, 2025
fixup: 08db51a ("Add ref-counted string slices", quickjs-ng#1175)
mochaaP added a commit to mcha-forks/quickjs that referenced this pull request Oct 14, 2025
fixup: 08db51a ("Add ref-counted string slices", quickjs-ng#1175)
@richarddd
Copy link
Contributor

richarddd commented Oct 16, 2025

@bnoordhuis this got me thinking. Do you think it be beneficial from a performance perspective to implement German Style Strings? I suspect a lot of JS code, prop names etc has 12 chars or less

@bnoordhuis
Copy link
Contributor Author

bnoordhuis commented Oct 16, 2025

It's something I've thought about but I don't think it could work for quickjs.

The two mostly-unused JSString fields are hash_next and first_weak_ref fields, 4 and 4/8 bytes respectively, so 8/12 bytes in total.

However, if the string is turned into an atom, or it's put in a WeakMap, there's no way to move the string data out. We can't realloc the JSString because its address has to be stable. With a GC, you alloc a new object and put a forwarding pointer in place. Alas, no GC.

I've been thinking of storing weak references in a JSRuntime side table so JSStrings and JSObjects don't have to carry a first_weak_ref pointer around that's almost never used, but that's one of those "when I get around to it" projects. It's also not completely without drawbacks because it makes weakening fallible when table resizing fails.

@chqrlie
Copy link
Collaborator

chqrlie commented Oct 16, 2025

property names are Atoms, not just regular strings. Using a string variant with immediate contents is enticing but will probably not apply to property names. I tested a variant where 1 code-point strings were immediate (with a JS_TAG_CHAR tag), but the performance impact was not outstanding. You might want to give it a try with more extensive benchmarks and somewhat longer strings that fit in the JSValue object.

@bnoordhuis
Copy link
Contributor Author

bnoordhuis commented Oct 16, 2025

I tested a variant where 1 code-point strings were immediate (with a JS_TAG_CHAR tag), but the performance impact was not outstanding.

Funnily enough I tried something quite similar but, same as you, there was no big impact on performance (although, come to think of it, I only looked at CPU, not memory.)

I've also investigated packing string data into JSValue itself (#907) but there are some tradeoffs and so far I'm undecided if it's worth the cost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants