Js-only utf8 string manipulation.
The built-in nodejs utf8 conversion functions are implemented as fast native C++ modules. Calls from javascript into native modules, however, are rather slow. For short strings the call overhead outweighs the benefits, making it faster (sometimes much faster) to compute the results in javascript.
These functions were originally developed as part of the qbson library.
var qutf8 = require('q-utf8');
encode the substring between `from` and `to` as utf8 bytes into the buffer starting at offset, and return the number of bytes written. Does not check for overflow. The converted bytes are identical to those written by `buffer.write`. Does not use `string.slice` or `buffer.write`.
Also known as `encode`.
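The encoding step described above can be sketched in plain javascript. This is a minimal illustration, not the library source: the name `encodeUtf8Sketch` and its argument order are assumptions made for the example, and unpaired surrogates are written as U+FFFD to match what `Buffer` does.

```javascript
// Hypothetical helper, not the q-utf8 source: encode str[from..to) as utf8
// into buf starting at offset, and return the number of bytes written.
// Like the function described above, it does not check for overflow.
function encodeUtf8Sketch(str, from, to, buf, offset) {
    var start = offset;
    for (var i = from; i < to; i++) {
        var code = str.charCodeAt(i);
        if (code < 0x80) {
            buf[offset++] = code;
        } else if (code < 0x800) {
            buf[offset++] = 0xC0 | (code >> 6);
            buf[offset++] = 0x80 | (code & 0x3F);
        } else if (code >= 0xD800 && code <= 0xDBFF && i + 1 < to &&
                   (str.charCodeAt(i + 1) & 0xFC00) === 0xDC00) {
            // valid surrogate pair: combine into one 4-byte code point
            var low = str.charCodeAt(++i);
            var cp = 0x10000 + ((code - 0xD800) << 10) + (low - 0xDC00);
            buf[offset++] = 0xF0 | (cp >> 18);
            buf[offset++] = 0x80 | ((cp >> 12) & 0x3F);
            buf[offset++] = 0x80 | ((cp >> 6) & 0x3F);
            buf[offset++] = 0x80 | (cp & 0x3F);
        } else if (code >= 0xD800 && code <= 0xDFFF) {
            // unpaired surrogate: write U+FFFD (EF BF BD)
            buf[offset++] = 0xEF;
            buf[offset++] = 0xBF;
            buf[offset++] = 0xBD;
        } else {
            buf[offset++] = 0xE0 | (code >> 12);
            buf[offset++] = 0x80 | ((code >> 6) & 0x3F);
            buf[offset++] = 0x80 | (code & 0x3F);
        }
    }
    return offset - start;
}

var text = 'héllo € \uD83D\uDE00';
var buf = Buffer.alloc(32);
var nbytes = encodeUtf8Sketch(text, 0, text.length, buf, 0);

// compare against the native conversion
console.log(buf.subarray(0, nbytes).equals(Buffer.from(text, 'utf8')));
```

The loop reads char codes directly and writes bytes by index, so it needs neither a substring copy nor a native call, which is where the savings for short strings would come from.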
return the utf8 encoded string in the buffer between offset and limit. Traverses the buffer, does not use `buffer.toString`. Note: for non-trivial strings `buffer.toString()` is faster.
Also known as `decode`.
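A buffer traversal of this kind might look like the sketch below. The name `decodeUtf8Sketch` is an assumption for the example, and the sketch presumes valid utf8: invalid or truncated sequences are not detected.

```javascript
// Hypothetical helper, not the q-utf8 source: decode the utf8 bytes in
// buf[offset..limit) into a string by walking the buffer.
// Assumes valid utf8; invalid or truncated sequences are not detected.
function decodeUtf8Sketch(buf, offset, limit) {
    var str = '';
    for (var i = offset; i < limit; ) {
        var b = buf[i++];
        if (b < 0x80) {
            // 1-byte sequence (ascii)
            str += String.fromCharCode(b);
        } else if (b < 0xE0) {
            // 2-byte sequence
            str += String.fromCharCode(((b & 0x1F) << 6) | (buf[i++] & 0x3F));
        } else if (b < 0xF0) {
            // 3-byte sequence
            str += String.fromCharCode(
                ((b & 0x0F) << 12) | ((buf[i++] & 0x3F) << 6) | (buf[i++] & 0x3F));
        } else {
            // 4-byte sequence: emit as a surrogate pair
            var cp = ((b & 0x07) << 18) | ((buf[i++] & 0x3F) << 12) |
                     ((buf[i++] & 0x3F) << 6) | (buf[i++] & 0x3F);
            cp -= 0x10000;
            str += String.fromCharCode(0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
        }
    }
    return str;
}

var source = Buffer.from('héllo € \uD83D\uDE00', 'utf8');
var decoded = decodeUtf8Sketch(source, 0, source.length);
console.log(decoded === source.toString('utf8'));
```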
like utf8_encode, but control chars 0x00..0x19 are \-escaped (`\n`) or \u-escaped (`\u0001`), and backslashes `\` and double quotes `"` are backslash-escaped as `\\` and `\"`.
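To show what the escaped output looks like, the sketch below produces comparable bytes with built-ins only. It assumes the escaping matches `JSON.stringify`, which the description suggests but does not state, and it uses native calls, so it illustrates the output format rather than the js-only technique. The name `jsonEscapedUtf8Sketch` is made up for the example.

```javascript
// Illustration only, not the q-utf8 source. Assumes the escaping rules
// match JSON.stringify, which the description above does not confirm.
function jsonEscapedUtf8Sketch(str) {
    var quoted = JSON.stringify(str);                 // escapes and adds quotes
    return Buffer.from(quoted.slice(1, -1), 'utf8');  // drop the quotes
}

var input = 'say "hi"\n\u0001 a\\b é';
var escaped = jsonEscapedUtf8Sketch(input);

// wrapping the bytes in double quotes yields parseable JSON
var roundTrip = JSON.parse('"' + escaped.toString('utf8') + '"');
console.log(roundTrip === input);
```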
like utf8_encode but `'\0'` chars are encoded as `\xC0\x80` not `\u0000`.
`C0 80` and `E0 80 80` are overlong utf8 encodings of zero. Strict utf8 forbids overlong forms, but a lenient decoder accepts them, and both decode into a `00` character.
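The byte arithmetic behind this can be checked directly. In the sketch below the helper name `encodeNulAsC080Sketch` is an assumption for illustration, and it post-processes the native encoding rather than encoding in javascript.

```javascript
// Hypothetical helper, not the q-utf8 source: copy utf8 bytes, replacing
// each 00 byte with the overlong two-byte form C0 80.
// In valid utf8 a 00 byte can only be a NUL character, never part of a
// multi-byte sequence, so a byte-wise substitution is safe.
function encodeNulAsC080Sketch(str) {
    var plain = Buffer.from(str, 'utf8');
    var out = [];
    for (var i = 0; i < plain.length; i++) {
        if (plain[i] === 0x00) out.push(0xC0, 0x80);
        else out.push(plain[i]);
    }
    return Buffer.from(out);
}

var encoded = encodeNulAsC080Sketch('a\0b');
console.log(encoded);   // <Buffer 61 c0 80 62>

// Lenient decoding keeps only the payload bits of each byte, and the
// payload bits of both overlong forms are all zero.
var twoByte = ((0xC0 & 0x1F) << 6) | (0x80 & 0x3F);
var threeByte = ((0xE0 & 0x0F) << 12) | ((0x80 & 0x3F) << 6) | (0x80 & 0x3F);
console.log(twoByte, threeByte);   // 0 0
```

The practical benefit of this form is that the encoded bytes never contain a 00 byte, so they can be stored in a NUL-terminated field.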
return the length of the utf8 encoded string found in the buffer between offset and limit. The string is presumed valid utf8 and is not tested for validity. Examines the buffer, does not use `buffer.toString`. Default encoding is 'utf8'.
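One way to count without decoding is sketched below. It assumes "length" means the javascript string length, i.e. UTF-16 code units, so a 4-byte sequence counts as two. The name `utf8StringLengthSketch` is made up for the example, and like the function described above it presumes valid utf8.

```javascript
// Hypothetical helper, not the q-utf8 source: return the length the
// decoded string would have, without building the string.
// Assumes valid utf8 and that length is measured in UTF-16 code units.
function utf8StringLengthSketch(buf, offset, limit) {
    var length = 0;
    for (var i = offset; i < limit; i++) {
        var b = buf[i];
        // every byte that is not a continuation byte (10xxxxxx) starts a char
        if ((b & 0xC0) !== 0x80) length += 1;
        // a 4-byte sequence decodes to a surrogate pair, two code units
        if (b >= 0xF0) length += 1;
    }
    return length;
}

var bytes = Buffer.from('héllo € \uD83D\uDE00', 'utf8');
var counted = utf8StringLengthSketch(bytes, 0, bytes.length);
console.log(counted === bytes.toString('utf8').length);
```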
return the number of bytes needed to store the specified portion of the string. Examines the string, does not use `Buffer.byteLength`.
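The byte count can be computed from char codes alone, as in this sketch. The name `utf8ByteLengthSketch` and the `from`/`to` arguments are assumptions for the example; unpaired surrogates are counted as 3 bytes on the assumption that they are written as U+FFFD.

```javascript
// Hypothetical helper, not the q-utf8 source: return the number of utf8
// bytes needed to store str[from..to).
function utf8ByteLengthSketch(str, from, to) {
    var nbytes = 0;
    for (var i = from; i < to; i++) {
        var code = str.charCodeAt(i);
        if (code < 0x80) {
            nbytes += 1;
        } else if (code < 0x800) {
            nbytes += 2;
        } else if (code >= 0xD800 && code <= 0xDBFF && i + 1 < to &&
                   (str.charCodeAt(i + 1) & 0xFC00) === 0xDC00) {
            nbytes += 4;    // surrogate pair: one 4-byte code point
            i++;            // skip the low surrogate
        } else {
            nbytes += 3;    // other BMP char, or unpaired surrogate as U+FFFD
        }
    }
    return nbytes;
}

var sample = 'héllo € \uD83D\uDE00';
var needed = utf8ByteLengthSketch(sample, 0, sample.length);
console.log(needed === Buffer.byteLength(sample, 'utf8'));
```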
encode the byte range as a base64 string
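A js-only base64 encoder for a byte range could be sketched as follows. The name `base64EncodeSketch` and the `(buf, start, end)` arguments are assumptions for the example; the result is compared against the native `buf.toString('base64', start, end)`.

```javascript
var B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper, not the q-utf8 source: base64-encode buf[start..end).
function base64EncodeSketch(buf, start, end) {
    var out = '';
    var i = start;
    // each full group of 3 bytes becomes 4 base64 chars
    for (; i + 2 < end; i += 3) {
        var n = (buf[i] << 16) | (buf[i + 1] << 8) | buf[i + 2];
        out += B64[n >> 18] + B64[(n >> 12) & 63] + B64[(n >> 6) & 63] + B64[n & 63];
    }
    if (end - i === 1) {
        // 1 trailing byte: 2 chars plus 2 pad chars
        var a = buf[i];
        out += B64[a >> 2] + B64[(a & 3) << 4] + '==';
    } else if (end - i === 2) {
        // 2 trailing bytes: 3 chars plus 1 pad char
        var m = (buf[i] << 8) | buf[i + 1];
        out += B64[m >> 10] + B64[(m >> 4) & 63] + B64[(m & 15) << 2] + '=';
    }
    return out;
}

var data = Buffer.from('hello world', 'utf8');
var b64 = base64EncodeSketch(data, 0, 5);
console.log(b64 === data.toString('base64', 0, 5));
```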
this was an experiment in reassembly of split utf8 byte strings, and is still a work in progress. Over time it has evolved into a fast work-alike of `require('string_decoder')`.
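The reassembly problem is easiest to see with the built-in `string_decoder` that this is modeled on. The example below uses only the nodejs built-in, not q-utf8, and feeds a three-byte character one byte at a time.

```javascript
var StringDecoder = require('string_decoder').StringDecoder;

var decoder = new StringDecoder('utf8');
var euro = Buffer.from('€', 'utf8');    // three bytes: E2 82 AC

// Feed the character one byte at a time, as if it were split across chunks.
var part1 = decoder.write(euro.subarray(0, 1));   // incomplete, held back
var part2 = decoder.write(euro.subarray(1, 2));   // still incomplete
var part3 = decoder.write(euro.subarray(2, 3));   // completes the character

console.log([part1, part2, part3]);   // [ '', '', '€' ]
```

The decoder buffers the partial sequence between writes, so a multi-byte character that straddles a chunk boundary is emitted whole instead of as replacement characters.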
- 0.1.4 - handle surrogate pairs
- 0.1.3 - normalize layout, new scanStringZUtf8, fix invalid utf8 handling, alias as `utf8.encode` and `utf8.decode`
- 0.1.2 - base64 fixes, cleanups
- 0.1.1 - speed up JsonDecoder, now up to 50% faster than `string_decoder`
- 0.1.0 - initial version, to get it out there
- unit tests
- benchmarks
- reconcile method names, eg encodeUtf8 vs utf8_encode
qbson - mongodb bson conversion functions