-
Notifications
You must be signed in to change notification settings - Fork 225
Add ref-counted string slices #1175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Instead of copying long substring slices, store a reference to and offset into the parent string. Gets more profitable as strings get longer. For 64k string slices, it's about 6.5x faster than copying. The downside of ref-counted string slices is that they can keep the parent string alive for longer than it otherwise would have been, leading to memory usage that is higher than without string slices. That's why this optimization is only applied to long-ish slices, currently slices > 1,024 bytes. Possible future enhancements are slicing only when the substring is > x% of the parent string, or copying lazily when the slice string is the only thing referencing the parent string.
563ab13 to
d89658a
Compare
saghul
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a small suggestion.
This change seems to have introduced a seemingly impossible bug in a real-world code base where values flip to JS_NULL for no discernible reason. My best guess is something manipulating string data directly, without being aware of slice strings, but it's impossible to track down where. The weird thing is that the value flipping to null isn't even a string, it appears to be an array. This reverts commit 08db51a. This reverts commit 62b4eed. Fixes: quickjs-ng#1178
fixup: 08db51a ("Add ref-counted string slices", quickjs-ng#1175)
fixup: 08db51a ("Add ref-counted string slices", quickjs-ng#1175)
|
@bnoordhuis this got me thinking. Do you think it be beneficial from a performance perspective to implement German Style Strings? I suspect a lot of JS code, prop names etc has 12 chars or less |
|
It's something I've thought about but I don't think it could work for quickjs. The two mostly-unused JSString fields are hash_next and first_weak_ref fields, 4 and 4/8 bytes respectively, so 8/12 bytes in total. However, if the string is turned into an atom, or it's put in a WeakMap, there's no way to move the string data out. We can't realloc the JSString because its address has to be stable. With a GC, you alloc a new object and put a forwarding pointer in place. Alas, no GC. I've been thinking of storing weak references in a JSRuntime side table so JSStrings and JSObjects don't have to carry a first_weak_ref pointer around that's almost never used, but that's one of those "when I get around to it" projects. It's also not completely without drawbacks because it makes weakening fallible when table resizing fails. |
|
property names are Atoms, not just regular strings. Using a string variant with immediate contents is enticing but will probably not apply to property names. I tested a variant where 1 code-point strings were immediate (with a |
Funnily enough I tried something quite similar but, same as you, there was no big impact on performance (although, come to think of it, I only looked at CPU, not memory.) I've also investigated packing string data into JSValue itself (#907) but there are some tradeoffs and so far I'm undecided if it's worth the cost. |
Instead of copying long substring slices, store a reference to and offset into the parent string.
Gets more profitable as strings get longer. For 64k string slices, it's about 6.5x faster than copying.
The downside of ref-counted string slices is that they can keep the parent string alive for longer than it otherwise would have been, leading to memory usage that is higher than without string slices. That's why this optimization is only applied to long-ish slices, currently slices > 1,024 bytes.
Possible future enhancements are slicing only when the substring is > x% of the parent string, or copying lazily when the slice string is the only thing referencing the parent string.
edit: forgot to mention, the next step is adding cons/rope strings