Patrick Walton edited this page Jun 6, 2013 · 3 revisions

Interned strings

How do I represent interned strings?

  • Gecko has nsIAtom, can't pass directly
  • WebKit has a subclass of String called InternedString
    • not clear how comparison works.
  • bz likes the idea of having interned strings be a subclass of normal strings
  • Want comparison between a regular string and an interned string to not require interning the regular string first.
  • This can be done by just following the pointers and comparing the contents.
  • In the JS engine JSAtom is just a subclass of JSString.
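The comparison rule above (no interning needed to compare a regular string against an interned one) can be sketched as follows. This is only an illustration under stated assumptions: `Interner`, `Interned`, and `matches_str` are hypothetical names, not Servo, Gecko, or WebKit APIs. Two interned strings from the same interner compare by pointer; an interned string compares against a regular string by following the pointer and comparing contents.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical interned-string handle: a shared pointer to unique contents.
#[derive(Clone, Debug)]
pub struct Interned(Rc<str>);

/// Hypothetical interner that keeps one allocation per distinct string.
#[derive(Default)]
pub struct Interner {
    table: HashSet<Rc<str>>,
}

impl Interner {
    /// Returns the existing handle for `s`, or allocates and records a new one.
    pub fn intern(&mut self, s: &str) -> Interned {
        if let Some(existing) = self.table.get(s) {
            return Interned(Rc::clone(existing));
        }
        let rc: Rc<str> = Rc::from(s);
        self.table.insert(Rc::clone(&rc));
        Interned(rc)
    }
}

impl PartialEq for Interned {
    /// Interned vs. interned: pointer comparison only.
    /// Assumes both handles came from the same interner.
    fn eq(&self, other: &Interned) -> bool {
        Rc::ptr_eq(&self.0, &other.0)
    }
}

impl Interned {
    /// Interned vs. regular string: follow the pointer and compare contents,
    /// so the regular string never has to be interned first.
    pub fn matches_str(&self, other: &str) -> bool {
        &*self.0 == other
    }
}
```

The trade-off in this sketch is that the mixed comparison costs a full content compare, while the interned-only comparison stays a single pointer check.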

Mutability

Should our strings be mutable?

  • Gecko has mutable strings and this is bad for performance
  • Shows up in microbenchmarks
  • Immutable strings and string builders
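One way to read "immutable strings and string builders" is sketched below: all mutation happens in a builder, and the finished string is frozen into a shared, immutable value. The `StringBuilder` type and its methods are hypothetical names chosen for illustration, not an existing Servo API.

```rust
use std::rc::Rc;

/// Hypothetical builder: the only place where string contents are mutated.
#[derive(Default)]
pub struct StringBuilder {
    buf: String,
}

impl StringBuilder {
    pub fn new() -> StringBuilder {
        StringBuilder::default()
    }

    /// Appends text and returns the builder so calls can be chained.
    pub fn push_str(&mut self, s: &str) -> &mut StringBuilder {
        self.buf.push_str(s);
        self
    }

    /// Consumes the builder and freezes the contents into an immutable,
    /// reference-counted string that can be shared without copying.
    pub fn finish(self) -> Rc<str> {
        Rc::from(self.buf)
    }
}
```

After `finish`, cloning the result only bumps a reference count; nothing can modify the contents through the shared handle.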

Interned strings again

How about sharing?

  • CSS parser and DOM code

What is interned?

  • Tag names on DOM nodes
  • Attribute names on DOM nodes
  • Attribute values (XUL only?)
  • IDs on nodes
  • Classes
  • Tag selectors
  • Class selectors
  • ID selectors
  • During selector matching, we do not need to intern anything
  • We must prevent the layout task from taking strong references to interned strings. If layout needs to hold onto the string, it must copy it. This is usually not in performance-critical code, so this is OK.
  • 3 kinds of strings:
    • HeapString
    • StaticInternedString
    • RefCountedInternedString
  • Nice thing about interned strings: Precomputed hash codes. This is used for the Bloom filter.
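The three kinds of strings and the precomputed hash could be modeled roughly as below. This is a sketch under assumptions: the variant names follow the notes, but the field layout, the `DomString` and `Bloom` names, the use of `DefaultHasher`, and the 256-bit filter with two probe positions are all illustrative choices, not the actual design.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

fn hash_str(s: &str) -> u64 {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}

/// Hypothetical layout for the three kinds of strings in the notes.
pub enum DomString {
    HeapString(String),
    StaticInternedString { text: &'static str, hash: u64 },
    RefCountedInternedString { text: Rc<str>, hash: u64 },
}

impl DomString {
    /// Interned strings store their hash when they are created.
    pub fn interned(text: &str) -> DomString {
        DomString::RefCountedInternedString { text: Rc::from(text), hash: hash_str(text) }
    }

    /// Interned strings return the stored hash; heap strings must rehash.
    pub fn hash_code(&self) -> u64 {
        match self {
            DomString::HeapString(s) => hash_str(s),
            DomString::StaticInternedString { hash, .. }
            | DomString::RefCountedInternedString { hash, .. } => *hash,
        }
    }
}

/// Tiny illustrative Bloom filter: 256 bits, two positions per hash.
#[derive(Default)]
pub struct Bloom {
    bits: [u64; 4],
}

impl Bloom {
    fn positions(hash: u64) -> [usize; 2] {
        [(hash & 0xff) as usize, ((hash >> 8) & 0xff) as usize]
    }

    pub fn insert(&mut self, hash: u64) {
        for p in Bloom::positions(hash) {
            self.bits[p / 64] |= 1u64 << (p % 64);
        }
    }

    /// False means definitely absent; true means possibly present.
    pub fn might_contain(&self, hash: u64) -> bool {
        Bloom::positions(hash)
            .iter()
            .all(|&p| (self.bits[p / 64] & (1u64 << (p % 64))) != 0)
    }
}
```

With this shape, inserting a tag name, class, or ID into the filter needs only `hash_code()`, which for interned strings reads a stored field instead of walking the characters.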

Cost of creating string objects

  • Constructors and especially destructors are expensive
  • No static typing
  • Example: when a JS string comes in and we want to create a Gecko DependentString, the constructor was expensive because it had to check whether it was a DependentString
  • Would be nice to avoid hacks like that
  • Cases that Gecko has: ref counted versus owned versus dependent string versus null-terminated versus stack buffer
  • Stay as simple as possible, don't add new string types unless they're really necessary!

Encodings: UTF-8 versus UTF-16

  • We use UTF-16 in Gecko but UTF-8 in Servo
  • Gecko (and WebKit) use an ASCII encoding for text runs
  • Could do something DOM-specific to avoid UTF-16 -> UTF-8 conversions in pure DOM->JS microbenchmarky calls
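To make the conversion cost concrete, here is a minimal sketch of what happens at a DOM/JS boundary when the DOM stores UTF-8 and the JS engine hands over UTF-16. The function names `js_to_dom` and `dom_to_js` are hypothetical; the standard-library calls `String::from_utf16` and `str::encode_utf16` are real. Each call walks and re-encodes the whole string, which is the per-call overhead the last bullet wants to avoid.

```rust
/// Hypothetical boundary helper: UTF-16 code units from JS to a UTF-8 DOM string.
/// Returns None if the input contains unpaired surrogates.
pub fn js_to_dom(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

/// Hypothetical boundary helper: UTF-8 DOM string to UTF-16 code units for JS.
pub fn dom_to_js(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}
```

Both directions allocate a new buffer, so a round trip on a short attribute name pays two allocations and two passes over the data.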