q-utf8

JavaScript-only utf8 string manipulation.

The built-in nodejs utf8 conversion functions are implemented as fast native C++ modules. Calling a native module from javascript, however, carries a fixed overhead. For short strings that call overhead outweighs the benefit of the native code, making it faster (sometimes much faster) to compute the results in javascript.

These functions were originally developed as part of the qbson library.

Api

var qutf8 = require('q-utf8');

qutf8.utf8_encode( string, from, to, target, offset )

encode the substring of string between from and to as utf8 bytes into the target buffer starting at offset, and return the number of bytes written. Does not check for overflow, so the caller must make sure target is large enough. The converted bytes are identical to those written by buffer.write. Does not use string.slice or buffer.write.

Also known as encode.
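
To illustrate the kind of loop involved, here is a minimal sketch of a pure-javascript utf8 encoder with the same argument order. It is not the q-utf8 source; the name encodeUtf8Sketch is made up for this example, and replacing unpaired surrogates with U+FFFD is an assumption chosen to match what buffer.write does.

```javascript
'use strict';

// Hypothetical sketch, not the q-utf8 implementation.
// Encodes string[from..to) as utf8 into target starting at offset and
// returns the number of bytes written. Like the function described above,
// it does not check for overflow.
function encodeUtf8Sketch(string, from, to, target, offset) {
    var pos = offset;
    for (var i = from; i < to; i++) {
        var code = string.charCodeAt(i);
        if (code >= 0xD800 && code <= 0xDFFF) {
            // surrogate: combine a valid high+low pair, else substitute U+FFFD
            var low = (i + 1 < to) ? string.charCodeAt(i + 1) : 0;
            if (code <= 0xDBFF && low >= 0xDC00 && low <= 0xDFFF) {
                code = 0x10000 + ((code - 0xD800) << 10) + (low - 0xDC00);
                i++;
            } else {
                code = 0xFFFD;
            }
        }
        if (code < 0x80) {
            target[pos++] = code;
        } else if (code < 0x800) {
            target[pos++] = 0xC0 | (code >> 6);
            target[pos++] = 0x80 | (code & 0x3F);
        } else if (code < 0x10000) {
            target[pos++] = 0xE0 | (code >> 12);
            target[pos++] = 0x80 | ((code >> 6) & 0x3F);
            target[pos++] = 0x80 | (code & 0x3F);
        } else {
            target[pos++] = 0xF0 | (code >> 18);
            target[pos++] = 0x80 | ((code >> 12) & 0x3F);
            target[pos++] = 0x80 | ((code >> 6) & 0x3F);
            target[pos++] = 0x80 | (code & 0x3F);
        }
    }
    return pos - offset;
}

var text = 'h\u00e9llo \u20ac';
var target = Buffer.alloc(16);
var written = encodeUtf8Sketch(text, 0, text.length, target, 0);   // 10
```

The caller sizes the target buffer up front, which is why the overflow check can be skipped inside the loop.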

qutf8.utf8_decode( buf, base, bound )

return the utf8 encoded string found in the buffer between base and bound. Traverses the buffer, does not use buffer.toString. Note: for non-trivial strings buffer.toString() is faster.

Also known as decode.
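
A buffer traversal of this kind might look like the sketch below. It is an illustration only, not the library code: decodeUtf8Sketch is a hypothetical name, and the sketch assumes the bytes are valid utf8 and does no error handling.

```javascript
'use strict';

// Hypothetical sketch, not the q-utf8 implementation.
// Decodes the utf8 bytes in buf[base..bound) into a string.
// Assumes the input is valid utf8; invalid bytes are not detected.
function decodeUtf8Sketch(buf, base, bound) {
    var str = '';
    var i = base;
    while (i < bound) {
        var b = buf[i++];
        var code;
        if (b < 0x80) {
            code = b;
        } else if (b < 0xE0) {
            code = ((b & 0x1F) << 6) | (buf[i++] & 0x3F);
        } else if (b < 0xF0) {
            code = ((b & 0x0F) << 12) | ((buf[i++] & 0x3F) << 6) | (buf[i++] & 0x3F);
        } else {
            code = ((b & 0x07) << 18) | ((buf[i++] & 0x3F) << 12) |
                   ((buf[i++] & 0x3F) << 6) | (buf[i++] & 0x3F);
        }
        if (code < 0x10000) {
            str += String.fromCharCode(code);
        } else {
            // code points above U+FFFF become a surrogate pair
            code -= 0x10000;
            str += String.fromCharCode(0xD800 + (code >> 10), 0xDC00 + (code & 0x3FF));
        }
    }
    return str;
}

var bytes = Buffer.from('xxh\u00e9llo \u20acyy');
var decoded = decodeUtf8Sketch(bytes, 2, bytes.length - 2);   // 'h\u00e9llo \u20ac'
```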

qutf8.utf8_encodeJson( string, from, to, target, offset )

like utf8_encode, but control chars 0x00..0x1F are backslash-escaped, either with a short escape such as \n or with a \u-escape such as \u0001, and backslashes \ and double quotes " are escaped as \\ and \".
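
The escaping step on its own can be sketched as follows. This is an assumption-laden illustration, not the library code: escapeJsonSketch is a hypothetical name, it returns a string instead of writing bytes, and it treats every char below 0x20 as a control char, which is what the JSON grammar requires.

```javascript
'use strict';

// Hypothetical sketch, not the q-utf8 implementation.
// Short escapes defined by JSON, keyed by char code.
var SHORT_ESCAPES = {
    8: '\\b', 9: '\\t', 10: '\\n', 12: '\\f', 13: '\\r',
    34: '\\"', 92: '\\\\'
};

// Returns string[from..to) with JSON string escaping applied.
function escapeJsonSketch(string, from, to) {
    var out = '';
    for (var i = from; i < to; i++) {
        var code = string.charCodeAt(i);
        if (SHORT_ESCAPES[code]) {
            out += SHORT_ESCAPES[code];
        } else if (code < 0x20) {
            // remaining control chars become \u00XX
            out += '\\u' + ('0000' + code.toString(16)).slice(-4);
        } else {
            out += string[i];
        }
    }
    return out;
}

var raw = 'say "hi"\n\u0001';
var escaped = escapeJsonSketch(raw, 0, raw.length);
```

For strings without unpaired surrogates the result should match JSON.stringify(raw) with the surrounding quotes removed.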

qutf8.utf8_encodeOverlong( string, from, to, target, offset )

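like utf8_encode, but '\0' chars are encoded as the two bytes \xC0\x80 instead of the single byte \x00. C0 80 and E0 80 80 are overlong encodings: strict utf8 decoders reject them, but lenient decoders accept them and decode both into a 00 character.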
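
The arithmetic behind the overlong forms can be checked with a few lines of javascript. The helper lenientCodePoint is hypothetical and only models a decoder that masks off the marker bits without rejecting overlong sequences; the last line shows that the built-in Buffer decoder does not treat C0 80 as a NUL.

```javascript
'use strict';

// Hypothetical helper: decode one 2-byte or 3-byte sequence by masking
// off the utf8 marker bits, without rejecting overlong forms.
function lenientCodePoint(bytes) {
    if (bytes.length === 2) {
        return ((bytes[0] & 0x1F) << 6) | (bytes[1] & 0x3F);
    }
    return ((bytes[0] & 0x0F) << 12) | ((bytes[1] & 0x3F) << 6) | (bytes[2] & 0x3F);
}

var twoByte = lenientCodePoint([0xC0, 0x80]);          // 0
var threeByte = lenientCodePoint([0xE0, 0x80, 0x80]);  // 0

// The strict built-in decoder does not return a NUL for the same bytes.
var strict = Buffer.from([0xC0, 0x80]).toString('utf8');
var strictIsNul = (strict === '\u0000');               // false
```

Encoding NUL this way keeps the byte 00 out of the output, which is useful when 00 is reserved as a terminator.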

qutf8.utf8_stringLength( buf, base, bound [,encoding] )

return the length of the utf8 encoded string found in the buffer between base and bound. The string is presumed valid utf8 and is not tested for validity. Examines the buffer, does not use buffer.toString. Default encoding is 'utf8'.
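
One way to compute such a length without building the string is to count lead bytes. The sketch below is illustrative only: stringLengthSketch is a hypothetical name, it assumes "length" means the javascript string length in UTF-16 code units, and like the function described above it presumes valid utf8.

```javascript
'use strict';

// Hypothetical sketch, not the q-utf8 implementation.
// Returns the javascript string length of the valid utf8 in buf[base..bound).
function stringLengthSketch(buf, base, bound) {
    var length = 0;
    for (var i = base; i < bound; i++) {
        var b = buf[i];
        if ((b & 0xC0) === 0x80) continue;   // continuation byte, already counted
        // a 4-byte sequence decodes to a surrogate pair, ie two code units
        length += (b >= 0xF0) ? 2 : 1;
    }
    return length;
}

var encoded = Buffer.from('h\u00e9llo \u20ac');
var chars = stringLengthSketch(encoded, 0, encoded.length);   // 7
```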

qutf8.utf8_byteLength( string, from, to )

return the number of bytes needed to store the specified portion of the string. Examines the string, does not use Buffer.byteLength.
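
Counting bytes from the char codes alone could be sketched like this. The name byteLengthSketch is hypothetical, and counting an unpaired surrogate as 3 bytes is an assumption chosen to agree with Buffer.byteLength, which encodes it as U+FFFD.

```javascript
'use strict';

// Hypothetical sketch, not the q-utf8 implementation.
// Returns the number of utf8 bytes needed to store string[from..to).
function byteLengthSketch(string, from, to) {
    var length = 0;
    for (var i = from; i < to; i++) {
        var code = string.charCodeAt(i);
        if (code < 0x80) {
            length += 1;
        } else if (code < 0x800) {
            length += 2;
        } else if (code >= 0xD800 && code <= 0xDBFF && i + 1 < to &&
                   (string.charCodeAt(i + 1) & 0xFC00) === 0xDC00) {
            // valid surrogate pair: one 4-byte sequence for both chars
            length += 4;
            i++;
        } else {
            length += 3;
        }
    }
    return length;
}

var sample = 'h\u00e9llo \u20ac';
var needed = byteLengthSketch(sample, 0, sample.length);   // 10
```

A typical use is sizing the target buffer before calling an encoder that does not check for overflow.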

qutf8.base64_encode( buf, base, bound )

encode the byte range as a base64 string
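
For reference, base64 encoding of a byte range can be written in plain javascript as below. This is a sketch of the standard algorithm with '=' padding, not the library code, and base64EncodeSketch is a hypothetical name.

```javascript
'use strict';

var ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical sketch, not the q-utf8 implementation.
// Returns the bytes buf[base..bound) as a padded base64 string.
function base64EncodeSketch(buf, base, bound) {
    var out = '';
    var n;
    var i = base;
    // full groups: 3 bytes become 4 characters
    for (; i + 2 < bound; i += 3) {
        n = (buf[i] << 16) | (buf[i + 1] << 8) | buf[i + 2];
        out += ALPHABET[n >> 18] + ALPHABET[(n >> 12) & 63] +
               ALPHABET[(n >> 6) & 63] + ALPHABET[n & 63];
    }
    // trailing 1 or 2 bytes are padded with '='
    var rest = bound - i;
    if (rest === 1) {
        n = buf[i] << 16;
        out += ALPHABET[n >> 18] + ALPHABET[(n >> 12) & 63] + '==';
    } else if (rest === 2) {
        n = (buf[i] << 16) | (buf[i + 1] << 8);
        out += ALPHABET[n >> 18] + ALPHABET[(n >> 12) & 63] +
               ALPHABET[(n >> 6) & 63] + '=';
    }
    return out;
}

var input = Buffer.from('hi!');
var b64 = base64EncodeSketch(input, 0, input.length);   // 'aGkh'
```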

qutf8.JsonDecoder( encoding )

this was an experiment in reassembly of split utf8 byte strings, and is still a work in progress. Over time it has evolved into a fast work-alike of require('string_decoder').
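
The problem being solved is easiest to see with the built-in string_decoder that JsonDecoder imitates. The example below uses only the nodejs core module, not q-utf8, and shows a 3-byte character split across two chunks.

```javascript
'use strict';

var StringDecoder = require('string_decoder').StringDecoder;

// The euro sign is 3 utf8 bytes: E2 82 AC. Split it after the first byte.
var euro = Buffer.from('\u20ac');
var chunk1 = euro.slice(0, 1);
var chunk2 = euro.slice(1);

// Converting each chunk separately loses the character.
var naive = chunk1.toString('utf8') + chunk2.toString('utf8');

// A decoder holds back the incomplete bytes until the rest arrive.
var decoder = new StringDecoder('utf8');
var first = decoder.write(chunk1);    // '' (nothing complete yet)
var second = decoder.write(chunk2);   // '\u20ac'
var reassembled = first + second;
```

Here naive ends up holding replacement characters, while reassembled holds the original euro sign.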

Change Log

0.1.4 - handle surrogate pairs
0.1.3 - normalize layout, new scanStringZUtf8, fix invalid utf8 handling, alias as utf8.encode and utf8.decode
0.1.2 - base64 fixes, cleanups
0.1.1 - speed up JsonDecoder, now up to 50% faster than string_decoder
0.1.0 - initial version, to get it out there

Todo

unit tests
benchmarks
reconcile method names, eg encodeUtf8 vs utf8_encode

Related Work

qbson - mongodb bson conversion functions
