new Buffer doesn't handle unicode well #2344

Closed
calvinmetcalf opened this issue Aug 10, 2015 · 3 comments
Labels
buffer Issues and PRs related to the buffer subsystem. doc Issues and PRs related to the documentations. i18n-api Issues and PRs related to the i18n implementation.

Comments

@calvinmetcalf
Contributor

var arr = [255, 216, 255, 224, 0, 16, 74, 70, 73, 70, 0, 1];
// length 12
var buf1 = new Buffer(arr);
// length 12
var hex1 = buf1.toString('hex');
// length 24
var str = buf1.toString('utf8');
// length 12
var buf2 = new Buffer(str);
// length 20
var hex2 = buf2.toString('hex');
// length 40

results for hex2

browserify - efbfbd00104a4649460001 
node - efbfbdefbfbdefbfbdefbfbd00104a4649460001

The value of str in node is ����\u0000\u0010JFIF\u0000\u0001, which suggests that the issue is in how new Buffer handles characters in text.

cc feross/buffer#66

@bnoordhuis
Member

255 is never valid in a UTF-8 byte sequence. It probably worked by accident but invalid UTF-8 is replaced with U+FFFD (the replacement character) as of node.js v0.10.29. The V8 UTF-8 decoder became stricter in v3.x so that might be a factor as well.

I don't think there is anything to do here except make a note of it somewhere in the documentation.
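A minimal sketch of what such a documentation note might show, using the byte array from the original report. It assumes a current Node.js release and uses Buffer.from, which did not exist when this issue was filed, in place of new Buffer. The variable names are illustrative only. The point is that decoding arbitrary bytes as UTF-8 is lossy, while hex or base64 round-trips them unchanged.

```javascript
// Assumption: modern Node.js, so Buffer.from stands in for the issue's `new Buffer`.
const bytes = [255, 216, 255, 224, 0, 16, 74, 70, 73, 70, 0, 1];
const original = Buffer.from(bytes);

// Lossy round trip: each invalid UTF-8 sequence decodes to U+FFFD,
// and U+FFFD re-encodes as three bytes, so the buffer grows.
const viaUtf8 = Buffer.from(original.toString('utf8'), 'utf8');

// Lossless round trips: hex and base64 represent every byte value.
const viaHex = Buffer.from(original.toString('hex'), 'hex');
const viaBase64 = Buffer.from(original.toString('base64'), 'base64');

console.log(original.equals(viaUtf8));   // false
console.log(viaUtf8.length);             // 20
console.log(original.equals(viaHex));    // true
console.log(original.equals(viaBase64)); // true
```

If binary data has to pass through a string, an encoding such as hex or base64 is probably the safer choice than utf8.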

@Fishrock123 Fishrock123 added buffer Issues and PRs related to the buffer subsystem. doc Issues and PRs related to the documentations. labels Aug 10, 2015
@bnoordhuis
Member

U+FFFD (the replacement character)

That's 0xEF 0xBF 0xBD as UTF-8, by the way; the repeated pattern you see in the test's output.
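The byte pattern can be checked directly. The sketch below again assumes a current Node.js release with Buffer.from, and its variable names are illustrative. It encodes U+FFFD as UTF-8 and counts how often that sequence appears in the node output quoted in the report.

```javascript
// Assumption: modern Node.js; Buffer.from replaces the issue's `new Buffer`.
const replacement = Buffer.from('\uFFFD', 'utf8');
const replacementHex = replacement.toString('hex');
console.log(replacementHex); // efbfbd

// The hex2 value reported for node in the original comment.
const nodeHex2 = 'efbfbdefbfbdefbfbdefbfbd00104a4649460001';

// Count occurrences of the replacement sequence in that output.
const count = nodeHex2.split(replacementHex).length - 1;
console.log(count); // 4
```

Four replacement characters correspond to the four leading bytes (255, 216, 255, 224) that do not form valid UTF-8 sequences in the input.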

@mscdex mscdex changed the title new Buffer doens't handle unicode well new Buffer doesn't handle unicode well Aug 10, 2015
@calvinmetcalf
Contributor Author

fixed on the browserify end


4 participants