Represent strings as sequences of bytes instead of code points #35

jayconrod · 2017-01-21T14:16:52Z

Storing strings as sequences of 32-bit code points uses more memory than UTF-8: four bytes per code point versus one to four. Most common string operations (searching, splitting, comparing for equality) work just as well on UTF-8, if not better. Operations that do require code points (collation, normalization, other locale-specific processing) generally work fine with iteration and do not need random access.
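To make the memory and byte-level claims concrete, here is a rough sketch in Python (not Gypsum or CodeSwitch code). The sample text and variable names are made up for illustration; `utf-32-le` stands in for an array of 32-bit code points.

```python
# Illustration only: compare storage cost and show that search, split,
# and equality work directly on UTF-8 bytes without decoding.
text = "naïve café 日本語"

utf8 = text.encode("utf-8")        # 1 to 4 bytes per code point
utf32 = text.encode("utf-32-le")   # always 4 bytes per code point, no BOM

print("UTF-8 bytes: ", len(utf8))
print("UTF-32 bytes:", len(utf32))

# Substring search on raw bytes. The result is a byte offset, not a
# code point index. UTF-8 is self-synchronizing, so a well-formed needle
# cannot match in the middle of another character's encoding.
needle = "café".encode("utf-8")
offset = utf8.find(needle)
print("byte offset of needle:", offset)

# Splitting on an ASCII separator is also a plain byte operation.
parts = utf8.split(b" ")
print("parts:", [p.decode("utf-8") for p in parts])

# Equality is a byte-for-byte comparison.
print("equal:", utf8 == text.encode("utf-8"))
```

Note that byte-wise equality compares encodings, not canonical equivalence; strings that differ only in normalization form still compare unequal, just as they would as code point arrays.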

The built-in String class should represent strings as an array of UTF-8 bytes. Gypsum and CodeSwitch will both need to be updated.
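As a minimal sketch of what such a class could look like, assuming a design where the bytes are the primary representation and code points are only exposed through iteration: the `Utf8String` name and its methods below are hypothetical and are not the actual Gypsum String API. The decoder assumes mostly well-formed input and does not reject overlong encodings or surrogates.

```python
from typing import Iterator


class Utf8String:
    """Hypothetical string type backed by UTF-8 bytes (illustration only)."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    @classmethod
    def from_str(cls, s: str) -> "Utf8String":
        return cls(s.encode("utf-8"))

    def byte_length(self) -> int:
        # O(1): the length in bytes, not in code points.
        return len(self._data)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Utf8String) and self._data == other._data

    def __hash__(self) -> int:
        return hash(self._data)

    def find(self, needle: "Utf8String") -> int:
        # Byte-level search; returns a byte offset or -1.
        return self._data.find(needle._data)

    def code_points(self) -> Iterator[int]:
        """Decode front to back, yielding one code point at a time."""
        data, i, n = self._data, 0, len(self._data)
        while i < n:
            b0 = data[i]
            if b0 < 0x80:                 # 0xxxxxxx: 1-byte sequence
                cp, length = b0, 1
            elif b0 >> 5 == 0b110:        # 110xxxxx: 2-byte sequence
                cp, length = b0 & 0x1F, 2
            elif b0 >> 4 == 0b1110:       # 1110xxxx: 3-byte sequence
                cp, length = b0 & 0x0F, 3
            elif b0 >> 3 == 0b11110:      # 11110xxx: 4-byte sequence
                cp, length = b0 & 0x07, 4
            else:
                raise ValueError(f"invalid lead byte at offset {i}")
            if i + length > n:
                raise ValueError(f"truncated sequence at offset {i}")
            for b in data[i + 1:i + length]:
                if b >> 6 != 0b10:        # continuation bytes are 10xxxxxx
                    raise ValueError(f"invalid continuation byte near offset {i}")
                cp = (cp << 6) | (b & 0x3F)
            yield cp
            i += length


s = Utf8String.from_str("naïve café 日本語")
print(s.byte_length())
print([hex(cp) for cp in s.code_points()])
```

The class deliberately has no index-by-code-point operation: callers that need code points walk the iterator, which matches the observation that collation and normalization do not need random access.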

jayconrod added codeswitch enhancement gypsum labels Jan 21, 2017

jayconrod closed this as completed in 08b2f15 Feb 18, 2017
