
Represent strings as sequences of bytes instead of code points #35

Closed
jayconrod opened this issue Jan 21, 2017 · 0 comments


Storing strings as sequences of 32-bit code points uses more memory than UTF-8. Most common string operations (search, split, comparing for equality) work just as well, if not better, on UTF-8. Operations that do require code points (collation, normalization, other locale-specific stuff) generally work fine with iteration and do not need random access.

The built-in String class should represent strings as an array of UTF-8 bytes. Gypsum and CodeSwitch will both need to be updated.
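
To illustrate the trade-off described in this issue, here is a minimal sketch in Python (not Gypsum or CodeSwitch code). The helper names `storage_sizes` and `iter_code_points` are hypothetical and exist only for this example. It compares the storage needed by the two representations and shows that code points can still be recovered from UTF-8 bytes with a single sequential pass. The decoder assumes well-formed input and does not validate continuation bytes.

```python
from typing import Iterator, Tuple


def storage_sizes(text: str) -> Tuple[int, int]:
    """Return (UTF-8 size, 32-bit code point size) in bytes for text."""
    utf8_size = len(text.encode("utf-8"))
    # len() on a Python str counts code points; each would take 4 bytes.
    utf32_size = 4 * len(text)
    return utf8_size, utf32_size


def iter_code_points(data: bytes) -> Iterator[int]:
    """Yield code points from well-formed UTF-8 bytes in one forward pass.

    Illustration only: continuation bytes are not validated.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # 0xxxxxxx: 1-byte sequence (ASCII)
            length, cp = 1, lead
        elif lead >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
            length, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
            length, cp = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
            length, cp = 4, lead & 0x07
        else:
            raise ValueError(f"invalid UTF-8 lead byte at offset {i}")
        # Each continuation byte contributes its low 6 bits.
        for b in data[i + 1:i + length]:
            cp = (cp << 6) | (b & 0x3F)
        i += length
        yield cp


if __name__ == "__main__":
    sample = "héllo, wörld"
    print(storage_sizes(sample))
    # Splitting works directly on the bytes, with no decoding step.
    print(sample.encode("utf-8").split(b", "))
    print([hex(cp) for cp in iter_code_points(sample.encode("utf-8"))])
```

Byte-level search and split are safe on UTF-8 because a well-formed encoded character never appears as a partial match inside another character's encoding, so operations like the `split` call above need no knowledge of code points.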
