Segfaults when writing to Buffer using UCS2 encoding #2457

adalinesimonian · 2015-08-20T07:17:28Z

The following code causes a segfault when write is called:

var text = '1234567890123456789012345678901234567890123456789';
var len = Buffer.byteLength(text, 'ucs2');
var buf = new Buffer(len);
var bytesWritten = buf.write(text, 0, 'ucs2');

The odd thing about this bug is that it is intermittent, but consistent. Meaning, the code above always causes a segfault in my tests. However, when I change text to have an extra 0 at the end (12345678901234567890123456789012345678901234567890), it stops causing segfaults.

Even more confusing is that if I take that same piece of text with a 0 at the end, the variant that doesn't cause a segfault, and use it in the following code, it also always causes a segfault.

This code is also the code I've been using to test this bug. Basically, I've been calling this script with a bunch of sample text files, and the only encoding that ever causes a segfault is ucs2.

Test script:

var text = '12345678901234567890123456789012345678901234567890';

if (process.argv.length > 2) {
  var lastArg = process.argv[process.argv.length - 1];
  if (lastArg.indexOf('-') !== 0) {
    var fs = require('fs');
    text = fs.readFileSync(lastArg).toString();
  }
}

console.log('Text to write:', text);
console.log();
console.log(text.length + ' characters');

function write(encoding) {
  console.log();
  console.log('=== Writing using ' + encoding + ' encoding');
  console.log();

  try {

    var len = Buffer.byteLength(text, encoding);

    console.log('Required buffer length: ' + len);

    var buf = new Buffer(len);
    var bytesWritten = buf.write(text, 0, encoding);

    console.log('Wrote ' + bytesWritten + ' bytes');
    console.log(buf);

  } catch (err) {
    console.log('Error during write:', err);
  }
};

write('base64');
write('binary');
write('utf8');
write('hex');
write('ascii');
write('ucs2');

Example output:

Text to write: 12345678901234567890123456789012345678901234567890

50 characters

=== Writing using base64 encoding

Required buffer length: 37
Wrote 37 bytes
<Buffer d7 6d f8 e7 ae fc f7 4d 76 df 8e 7a ef cf 74 d7 6d f8 e7 ae fc f7 4d 76 df 8e 7a ef cf 74 d7 6d f8 e7 ae fc f7>

=== Writing using binary encoding

Required buffer length: 50
Wrote 50 bytes
<Buffer 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30>

=== Writing using utf8 encoding

Required buffer length: 50
Wrote 50 bytes
<Buffer 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30>

=== Writing using hex encoding

Required buffer length: 25
Wrote 25 bytes
<Buffer 12 34 56 78 90 12 34 56 78 90 12 34 56 78 90 12 34 56 78 90 12 34 56 78 90>

=== Writing using ascii encoding

Required buffer length: 50
Wrote 50 bytes
<Buffer 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30>

=== Writing using ucs2 encoding

Required buffer length: 100
[1]    3798 segmentation fault (core dumped)  node main.js

Backtrace:

Full backtrace

#0  0x0000000000b6c595 in void v8::internal::String::WriteToFlat<unsigned short>(v8::internal::String*, unsigned short*, int, int) ()
#1  0x000000000085124a in v8::String::Write(unsigned short*, int, int, int) const ()
#2  0x0000000000d9980a in node::StringBytes::Write(v8::Isolate*, char*, unsigned long, v8::Local<v8::Value>, node::encoding, int*) ()
#3  0x0000000000d74de8 in void node::Buffer::StringWrite<(node::encoding)3>(v8::FunctionCallbackInfo<v8::Value> const&) ()
#4  0x00000000008653f7 in v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) ()

This bug has acted very oddly, but yet I've managed to replicate it both on my current machine and on a fresh install of io.js on a separate computer. It affects versions 3.0.0 and 3.1.0, but I haven't tested any older versions.

The text was updated successfully, but these errors were encountered:

ChALkeR · 2015-08-20T07:42:31Z

I can't reproduce it using either of those two supplied scripts for some reason.

alexlamsl · 2015-08-20T08:11:05Z

Couldn't reproduce with iojs 2.5.0 on Windows Server 2012 R2

However, on an unrelated note, when I made a typo with encoding, Buffer.byteLength() does not err:

Buffer.byteLength(text, 'ucs2');
// 98
Buffer.byteLength(text, 'usc2');
// 48

Whereas buf.write() would throw TypeError: Unknown encoding: usc2

skomski · 2015-08-20T08:48:39Z

Seems like a compiler optimization bug: I can only reproduce it on master with gcc (5.2.0) if I compile with -O3 not -O2 and with clang it works fine on -O3.

ChALkeR · 2015-08-20T08:59:10Z

@vsimonian Did you build the binaries (using which you could reproduce the bug) yourself?
If yes, what was the compiler version and what were the options?

bnoordhuis · 2015-08-20T10:11:53Z

Can someone post a stack trace (from gdb or lldb) of the crash?

adalinesimonian · 2015-08-20T11:29:17Z

@ChALkeR I'm using binaries from the Nodesource Debian/Ubuntu repository. Should I give compiling io.js a try?

skomski · 2015-08-20T17:10:04Z

Only crashes with gcc -O3 hence this all can be fixed by specifying -fno-tree-loop-vectorize.
The problem is maybe undefined unaligned access which results with the vectorizer into a crash like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58039

* thread #1: tid = 19762, 0x0000000000b2e497 iojs`void v8::internal::String::WriteToFlat<unsigned short>(v8::internal::String*, unsigned short*, int, int) + 338 at utils.h:1405, name = 'iojs', stop reason = invalid address (fault address: 0x0)
    frame #0: 0x0000000000b2e497 iojs`void v8::internal::String::WriteToFlat<unsigned short>(v8::internal::String*, unsigned short*, int, int) + 338 at utils.h:1405
   1402       (chars >= static_cast<int>(kMinComplexMemCopy / sizeof(*dest)))) {
   1403     MemCopy(dest, src, chars * sizeof(*dest));
   1404   } else {
-> 1405     while (dest < limit) *dest++ = static_cast<sinkchar>(*src++);
   1406   }
   1407 }
   1408 
(lldb) bt
* thread #1: tid = 19762, 0x0000000000b2e497 iojs`void v8::internal::String::WriteToFlat<unsigned short>(v8::internal::String*, unsigned short*, int, int) + 338 at utils.h:1405, name = 'iojs', stop reason = invalid address (fault address: 0x0)
  * frame #0: 0x0000000000b2e497 iojs`void v8::internal::String::WriteToFlat<unsigned short>(v8::internal::String*, unsigned short*, int, int) + 338 at utils.h:1405
    frame #1: 0x0000000000b2e345 iojs`void v8::internal::String::WriteToFlat<unsigned short>(v8::internal::String*, unsigned short*, int, int) [inlined] void v8::internal::CopyChars<unsigned char, unsigned short>(chars=<unavailable>, src=<unavailable>, dest=<unavailable>) at utils.h:1387
    frame #2: 0x0000000000b2e345 iojs`void v8::internal::String::WriteToFlat<unsigned short>(src=0x00003b639ecb8da3, sink=<unavailable>, f=0, t=18667829) + 69 at objects.cc:9082
    frame #3: 0x0000000000843f37 iojs`v8::String::Write(unsigned short*, int, int, int) const + 169 at api.cc:5087
    frame #4: 0x0000000000843e8e iojs`v8::String::Write(this=<unavailable>, buffer=0x00000000011cd90b, start=0, length=40, options=11) const + 46 at api.cc:5108
    frame #5: 0x0000000000d38121 iojs`node::StringBytes::Write(isolate=<unavailable>, buf=0x00000000011cd90b, buflen=80, val=(val_ = <parent has invalid value.>), encoding=UCS2, chars_written=0x0000000000000000) + 321 at string_bytes.cc:350
    frame #6: 0x0000000000d10ece iojs`node::Buffer::Ucs2Write(v8::FunctionCallbackInfo<v8::Value> const&) + 622 at node_buffer.cc:681
    frame #7: 0x0000000000d10c60 iojs`node::Buffer::Ucs2Write(args=0x00007fffffffd800) + 160 at node_buffer.cc:707
    frame #8: 0x000000000085979b iojs`v8::internal::FunctionCallbackArguments::Call(this=0x00007fffffffd890, f=0x0000000000d10bc0)(v8::FunctionCallbackInfo<v8::Value> const&)) + 155 at arguments.cc:33

indutny · 2015-08-20T18:32:35Z

@bnoordhuis could be alignment problem?

kzc · 2015-08-20T19:16:45Z

char* buf arg is not aligned: buf=0x00000000011cd90b

    frame #5: 0x0000000000d38121 iojs`node::StringBytes::Write(isolate=<unavailable>, buf=0x00000000011cd90b, buflen=80, val=(val_ = <parent has invalid value.>), encoding=UCS2, chars_written=0x0000000000000000) + 321 at string_bytes.cc:350

src/string_bytes.cc:

case UCS2: {
  uint16_t* const dst = reinterpret_cast<uint16_t*>(buf);  // can't safely do this

https://github.com/nodejs/node/blob/master/src/string_bytes.cc#L337

indutny · 2015-08-20T19:34:34Z

Yay, bullseye!

@kzc may I ask you to give a try to this patch?

diff --git a/src/string_bytes.cc b/src/string_bytes.cc
index 0abdbf8..03cfd96 100644
--- a/src/string_bytes.cc
+++ b/src/string_bytes.cc
@@ -340,8 +340,20 @@ size_t StringBytes::Write(Isolate* isolate,
         memcpy(buf, data, nbytes);
         nchars = nbytes / sizeof(*dst);
       } else {
-        nchars = buflen / sizeof(*dst);
-        nchars = str->Write(dst, 0, nchars, flags);
+        // Unaligned `dst`
+        if (reinterpret_cast<intptr_t>(dst) & 1) {
+          uint16_t tmp;
+          nchars = str->Write(&tmp, 0, 1, flags);
+
+          if (nchars != 0) {
+            nchars = (buflen / sizeof(*dst)) - nchars;
+            dst[0] = tmp;
+            nchars = str->Write(dst + 1, 1, nchars, flags) + 1;
+          }
+        } else {
+          nchars = buflen / sizeof(*dst);
+          nchars = str->Write(dst, 0, nchars, flags);
+        }
         nbytes = nchars * sizeof(*dst);
       }
       if (IsBigEndian()) {

kzc · 2015-08-20T19:38:09Z

@indutny - I didn't run it. Just inspected the source code after looking at the stack trace.

indutny · 2015-08-20T19:39:29Z

Oh, right! cc @skomski

kzc · 2015-08-20T19:41:53Z

regarding:

+        if (reinterpret_cast<intptr_t>(dst) & 2) {

did you mean:

+        if (reinterpret_cast<intptr_t>(dst) & 1) {

indutny · 2015-08-20T19:55:04Z

Yikes, sure! Thanks for pointing out.

kzc · 2015-08-20T20:08:36Z

+            nchars = str->Write(dst + 1, 1, nchars, flags) + 1;

wouldn't dst + 1 also be unaligned? Perhaps you want buf + 1?

The nchars logic may need some review.

indutny · 2015-08-20T20:09:34Z

Arrgh... You are right. Let me think about the best way to do it...

indutny · 2015-08-20T20:14:05Z

Ok, here is a revised patch:

diff --git a/src/string_bytes.cc b/src/string_bytes.cc
index 0abdbf8..138b302 100644
--- a/src/string_bytes.cc
+++ b/src/string_bytes.cc
@@ -340,8 +340,27 @@ size_t StringBytes::Write(Isolate* isolate,
         memcpy(buf, data, nbytes);
         nchars = nbytes / sizeof(*dst);
       } else {
-        nchars = buflen / sizeof(*dst);
-        nchars = str->Write(dst, 0, nchars, flags);
+        // Unaligned `dst`
+        if (reinterpret_cast<intptr_t>(dst) & 1) {
+          uint16_t tmp;
+          nchars = str->Write(&tmp, 0, 1, flags);
+
+          if (nchars != 0) {
+            nchars = (buflen / sizeof(*dst)) - nchars;
+            uint8_t* aligned_dst = reinterpret_cast<uint8_t*>(dst) + 1;
+            nchars = str->Write(reinterpret_cast<uint16_t*>(aligned_dst),
+                                1,
+                                nchars,
+                                flags);
+            memmove(aligned_dst + 1, aligned_dst, nchars * 2);
+
+            dst[0] = tmp;
+            nchars++;
+          }
+        } else {
+          nchars = buflen / sizeof(*dst);
+          nchars = str->Write(dst, 0, nchars, flags);
+        }
         nbytes = nchars * sizeof(*dst);
       }
       if (IsBigEndian()) {

@skomski please give it a try.

skomski · 2015-08-20T20:17:51Z

@indutny New patch works for both and original test cases:

text = '1 2 3 4 5 6 7 8 9 1 2'
Buffer(42).write(text, 0, 'ucs2')

text = '12345678901234567890123';
Buffer(44).write(text, 0, 'ucs2');

indutny · 2015-08-20T20:18:35Z

Hooray! ;) @bnoordhuis any comments, suggestions? If none - I'm going to fix the style and send a PR.

kzc · 2015-08-20T20:19:39Z

+            dst[0] = tmp;

isn't this still unaligned? It could work on x86 architecture, but not necessarily others.

indutny · 2015-08-20T20:20:40Z

@kzc this is correct... I guess I should do a memcpy instead.

bnoordhuis · 2015-08-20T20:32:14Z

@indutny I fixed a similar bug a while ago in commits 535fec8 and 56fde66, but for StringBytes::Encode(). There's probably a way to factor out the common code. You may have to take endianness into account.

kzc · 2015-08-20T20:45:31Z

@indutny - the logic regarding nchars and the memmove seems not quite right to me, but I may be mistaken. I think you'll need a number of test cases for unaligned buf UCS2 string conversions of varying lengths to prove it is correct.

indutny · 2015-08-21T00:59:41Z

Here is the fix: #2480 cc @kzc @bnoordhuis

Support unaligned output buffer when writing out UCS2 in `StringBytes::Write`. Fix: nodejs#2457

adalinesimonian · 2015-08-21T06:51:36Z

Not sure how much help this may be, but building io.js 3.1.0 with

'-DNODE_ARCH="x64"' '-DNODE_PLATFORM="linux"' '-DNODE_V8_OPTIONS=""' '-DNODE_WANT_INTERNALS=1' '-DHAVE_OPENSSL=1' '-D__POSIX__' '-DHTTP_PARSER_STRICT=0' '-D_LARGEFILE_SOURCE' '-D_FILE_OFFSET_BITS=64' '-D_POSIX_C_SOURCE=200112' -pthread -Wall -Wextra -Wno-unused-parameter -m64 -O3 -ffunction-sections -fdata-sections -fno-omit-frame-pointer -fno-rtti -fno-exceptions -std=gnu++0x

results in the code triggering a segfault, but building with @indutny's latest version of patch #2480 with the same options fixes the segfault bug.

indutny · 2015-08-21T08:04:01Z

Thank you @vsimonian !

shernshiou · 2015-08-21T14:39:36Z

+1

Support unaligned output buffer when writing out UCS2 in `StringBytes::Write`. Fix: #2457 PR-URL: #2480 Reviewed-By: Trevor Norris <[email protected]>

indutny · 2015-08-21T19:45:48Z

Fixed!

adalinesimonian · 2015-08-21T20:08:16Z

Fantastic work, @indutny! Thank you! 👍

Support unaligned output buffer when writing out UCS2 in `StringBytes::Write`. Fix: #2457 PR-URL: #2480 Reviewed-By: Trevor Norris <[email protected]>

ChALkeR added the buffer Issues and PRs related to the buffer subsystem. label Aug 20, 2015

indutny mentioned this issue Aug 21, 2015

string_bytes: fix unaligned write in UCS2 #2480

Closed

indutny added a commit to indutny/io.js that referenced this issue Aug 21, 2015

string_bytes: fix unaligned write in UCS2

d150144

Support unaligned output buffer when writing out UCS2 in `StringBytes::Write`. Fix: nodejs#2457

kkoopa mentioned this issue Aug 21, 2015

Odd Buffer behavior that leads to crash in iojs 3.0 #2484

Closed

arthurschreiber mentioned this issue Aug 21, 2015

Segmentation fault (core dumped) - Node 0.12.0 / iojs 1.2 tediousjs/tedious#247

Closed

indutny added a commit that referenced this issue Aug 21, 2015

string_bytes: fix unaligned write in UCS2

68e53dd

Support unaligned output buffer when writing out UCS2 in `StringBytes::Write`. Fix: #2457 PR-URL: #2480 Reviewed-By: Trevor Norris <[email protected]>

indutny closed this as completed Aug 21, 2015

indutny added a commit that referenced this issue Aug 22, 2015

string_bytes: fix unaligned write in UCS2

c1ce423

Support unaligned output buffer when writing out UCS2 in `StringBytes::Write`. Fix: #2457 PR-URL: #2480 Reviewed-By: Trevor Norris <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Segfaults when writing to Buffer using UCS2 encoding #2457

Segfaults when writing to Buffer using UCS2 encoding #2457

adalinesimonian commented Aug 20, 2015

ChALkeR commented Aug 20, 2015

alexlamsl commented Aug 20, 2015

skomski commented Aug 20, 2015

ChALkeR commented Aug 20, 2015

bnoordhuis commented Aug 20, 2015

adalinesimonian commented Aug 20, 2015

skomski commented Aug 20, 2015

indutny commented Aug 20, 2015

kzc commented Aug 20, 2015

indutny commented Aug 20, 2015

kzc commented Aug 20, 2015

indutny commented Aug 20, 2015

kzc commented Aug 20, 2015

indutny commented Aug 20, 2015

kzc commented Aug 20, 2015

indutny commented Aug 20, 2015

indutny commented Aug 20, 2015

skomski commented Aug 20, 2015

indutny commented Aug 20, 2015

kzc commented Aug 20, 2015

indutny commented Aug 20, 2015

bnoordhuis commented Aug 20, 2015

kzc commented Aug 20, 2015

indutny commented Aug 21, 2015

adalinesimonian commented Aug 21, 2015

indutny commented Aug 21, 2015

shernshiou commented Aug 21, 2015

indutny commented Aug 21, 2015

adalinesimonian commented Aug 21, 2015

Segfaults when writing to Buffer using UCS2 encoding #2457

Segfaults when writing to Buffer using UCS2 encoding #2457

Comments

adalinesimonian commented Aug 20, 2015

ChALkeR commented Aug 20, 2015

alexlamsl commented Aug 20, 2015

skomski commented Aug 20, 2015

ChALkeR commented Aug 20, 2015

bnoordhuis commented Aug 20, 2015

adalinesimonian commented Aug 20, 2015

skomski commented Aug 20, 2015

indutny commented Aug 20, 2015

kzc commented Aug 20, 2015

indutny commented Aug 20, 2015

kzc commented Aug 20, 2015

indutny commented Aug 20, 2015

kzc commented Aug 20, 2015

indutny commented Aug 20, 2015

kzc commented Aug 20, 2015

indutny commented Aug 20, 2015

indutny commented Aug 20, 2015

skomski commented Aug 20, 2015

indutny commented Aug 20, 2015

kzc commented Aug 20, 2015

indutny commented Aug 20, 2015

bnoordhuis commented Aug 20, 2015

kzc commented Aug 20, 2015

indutny commented Aug 21, 2015

adalinesimonian commented Aug 21, 2015

indutny commented Aug 21, 2015

shernshiou commented Aug 21, 2015

indutny commented Aug 21, 2015

adalinesimonian commented Aug 21, 2015