Strings
Patrick Walton edited this page Jun 6, 2013 · 3 revisions
How do I represent interned strings?
- Gecko has nsIAtom, can't pass directly
- WebKit has a subclass of String called InternedString
- not clear how comparison works.
- bz likes the idea of having interned strings be a subclass of normal strings
- Want comparison between a regular string and an interned string to not require interning the regular string first.
- This can be done by just following the pointers and comparing the contents.
- In the JS engine, JSAtom is just a subclass of JSString.
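
The comparison rules above could be sketched roughly as follows. This is only an illustration, not Servo's actual design: `InternedString`, `Interner`, and `eq_contents` are hypothetical names, and the sketch assumes a single-threaded intern table built on `Rc<str>`. Two interned strings from the same table compare by pointer, while an interned string is compared against a regular string by following the pointer and comparing contents, so the regular string never has to be interned first.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical interned string: a shared pointer to immutable contents.
#[derive(Clone, Debug)]
pub struct InternedString(Rc<str>);

/// Hypothetical intern table: one shared allocation per distinct string.
#[derive(Default)]
pub struct Interner {
    table: HashSet<Rc<str>>,
}

impl Interner {
    /// Returns the existing entry for `s`, or inserts a new one.
    pub fn intern(&mut self, s: &str) -> InternedString {
        if let Some(existing) = self.table.get(s) {
            return InternedString(Rc::clone(existing));
        }
        let rc: Rc<str> = Rc::from(s);
        self.table.insert(Rc::clone(&rc));
        InternedString(rc)
    }
}

impl PartialEq for InternedString {
    /// Interned vs. interned: pointer identity is enough, provided both
    /// values came from the same `Interner` (an assumption of this sketch).
    fn eq(&self, other: &Self) -> bool {
        Rc::ptr_eq(&self.0, &other.0)
    }
}

impl InternedString {
    /// Interned vs. regular string: follow the pointer and compare the
    /// contents, without interning `other`.
    pub fn eq_contents(&self, other: &str) -> bool {
        &*self.0 == other
    }
}
```

With this shape, `interner.intern("div") == interner.intern("div")` is a pointer comparison, and `tag.eq_contents(&some_heap_string)` works on any `&str` without touching the intern table.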
Should our strings be mutable?
- Gecko has mutable strings and this is bad for performance
- Shows up in microbenchmarks
- Immutable strings and string builders
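
One possible reading of "immutable strings and string builders" is sketched below. The names `ImmutableString` and `StringBuilder` are hypothetical and not taken from Servo; the sketch assumes that all mutation happens in a builder, which is then frozen into a shared, read-only string.

```rust
use std::rc::Rc;

/// Hypothetical immutable string: cheap to clone, never modified after creation.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ImmutableString(Rc<str>);

impl ImmutableString {
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Hypothetical builder: the only place where string contents are mutated.
#[derive(Default)]
pub struct StringBuilder {
    buf: String,
}

impl StringBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends `s` to the buffer and returns the builder for chaining.
    pub fn push_str(&mut self, s: &str) -> &mut Self {
        self.buf.push_str(s);
        self
    }

    /// Consumes the builder and freezes its contents into an immutable string.
    pub fn finish(self) -> ImmutableString {
        ImmutableString(Rc::from(self.buf))
    }
}
```

Because `finish` consumes the builder, nothing can mutate the contents afterwards, and cloning an `ImmutableString` only bumps a reference count.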
How about sharing?
- CSS parser and DOM code
What is interned?
- Tag names on DOM nodes
- Attribute names on DOM nodes
- Attribute values (XUL only?)
- IDs on nodes
- Classes
- Tag selectors
- Class selectors
- ID selectors
- During selector matching, we do not need to intern anything
- We must prevent the layout task from taking strong references to interned strings. If layout needs to hold onto the string, it must copy it. This is usually not in performance-critical code, so this is OK.
- 3 kinds of strings:
- HeapString
- StaticInternedString
- RefCountedInternedString
- Nice thing about interned strings: Precomputed hash codes. This is used for the Bloom filter.
- Constructors and especially destructors are expensive
- No static typing
- Example: a JS string comes in and we want to create a Gecko DependentString; the constructor was expensive because it had to check at run time whether the string was a DependentString
- Would be nice to avoid hacks like that
- Cases that Gecko has: ref-counted versus owned versus dependent string versus null-terminated versus stack buffer
- Stay as simple as possible, don't add new string types unless they're really necessary!
- We use UTF-16 in Gecko but UTF-8 in Servo
- Gecko (and WebKit) use an ASCII encoding for text runs
- Could do something DOM-specific to avoid UTF-16 -> UTF-8 conversions in pure DOM->JS microbenchmarky calls
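
The three kinds of strings listed above, together with the precomputed hash code used for the Bloom filter, might be modeled along these lines. Everything here is an assumption made for illustration: the enum `DomString`, the use of FNV-1a as the hash, and the 64-bit `TinyBloom` are hypothetical and not Servo's actual types. The sketch also omits the intern table itself and uses `Rc`, which is single-threaded; that fits the note that layout must copy a string rather than hold a strong reference to it.

```rust
use std::rc::Rc;

/// FNV-1a, chosen here only as a simple stand-in hash function.
pub fn fnv1a(bytes: &[u8]) -> u32 {
    let mut h: u32 = 0x811c_9dc5;
    for &b in bytes {
        h ^= b as u32;
        h = h.wrapping_mul(0x0100_0193);
    }
    h
}

/// Hypothetical enum covering the three kinds of strings from the notes.
pub enum DomString {
    Heap(String),
    StaticInterned { text: &'static str, hash: u32 },
    RefCountedInterned { text: Rc<str>, hash: u32 },
}

impl DomString {
    pub fn static_interned(text: &'static str) -> Self {
        DomString::StaticInterned { text, hash: fnv1a(text.as_bytes()) }
    }

    pub fn ref_counted_interned(text: &str) -> Self {
        DomString::RefCountedInterned { text: Rc::from(text), hash: fnv1a(text.as_bytes()) }
    }

    pub fn as_str(&self) -> &str {
        match self {
            DomString::Heap(s) => s.as_str(),
            DomString::StaticInterned { text, .. } => *text,
            DomString::RefCountedInterned { text, .. } => &**text,
        }
    }

    /// Interned variants return the cached hash; heap strings hash on demand.
    pub fn hash_code(&self) -> u32 {
        match self {
            DomString::Heap(s) => fnv1a(s.as_bytes()),
            DomString::StaticInterned { hash, .. }
            | DomString::RefCountedInterned { hash, .. } => *hash,
        }
    }
}

/// Toy 64-bit Bloom filter that sets two bits per hash code.
#[derive(Default)]
pub struct TinyBloom(u64);

impl TinyBloom {
    fn mask(hash: u32) -> u64 {
        (1u64 << (hash & 63)) | (1u64 << ((hash >> 6) & 63))
    }

    pub fn insert(&mut self, hash: u32) {
        self.0 |= Self::mask(hash);
    }

    /// May return false positives, never false negatives.
    pub fn might_contain(&self, hash: u32) -> bool {
        (self.0 & Self::mask(hash)) == Self::mask(hash)
    }
}
```

In this sketch, inserting an interned class or ID into the filter costs only a field read, since `hash_code` does not rescan the string for the interned variants.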