
Document (or fix?) v8.deserialize's 2gb limitation for input buffer #40059

Closed
beaugunderson opened this issue Sep 10, 2021 · 4 comments
Labels
v8 module Issues and PRs related to the "v8" subsystem.

Comments

@beaugunderson

Version

v16.9.0

Platform

Darwin theodore 20.6.0 Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64 x86_64

Subsystem

v8

What steps will reproduce the bug?

Here is a script that will demonstrate the issue:

#!/usr/bin/env node --max-old-space-size=32768

const v8 = require('v8');

const PADDING_STRING = `Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin non quam in diam laoreet rhoncus condimentum quis neque. Sed luctus arcu eget velit tincidunt rhoncus. Mauris eros libero, lobortis et dolor quis, interdum sagittis dolor. Maecenas sit amet nulla at risus ullamcorper gravida. Sed varius nulla vel faucibus accumsan. Sed luctus purus felis, sagittis vehicula justo sollicitudin sed. Duis laoreet lobortis condimentum. Nunc ac nisi quis dolor malesuada aliquet eget in quam.

Vivamus malesuada leo et nisi feugiat varius. Vivamus ut dapibus tellus. Nunc interdum metus eget odio accumsan efficitur. Donec ac nisl id justo ullamcorper porta. Aliquam molestie dictum purus, non tincidunt mi facilisis non. Praesent lorem felis, pretium at consectetur et, elementum nec nisi. Etiam placerat lorem at maximus vulputate. Donec sollicitudin pretium ligula. Curabitur eu porta leo, sit amet tempor leo. Vivamus venenatis massa metus, at tempor dolor pharetra vel. Cras eget turpis eu nisi elementum dapibus sit amet sit amet nibh. Aliquam rhoncus eros et mauris aliquam, rhoncus condimentum purus placerat. Nam mollis sollicitudin ante, non imperdiet nulla commodo vitae. Mauris sollicitudin quam ut ipsum dignissim, in mollis augue placerat. Morbi suscipit auctor hendrerit. Morbi dictum sagittis nulla nec posuere.

Suspendisse potenti. Proin vehicula est blandit, euismod velit sed, maximus augue. Nullam id rhoncus risus. Donec cursus lobortis porttitor. Donec fringilla, sem ac vehicula finibus, sem nisl ultricies leo, eu finibus sapien metus eu nibh. Fusce ut erat eu arcu aliquet tincidunt. Maecenas tristique enim non ante varius, quis semper justo efficitur. Integer maximus ultrices nisl at molestie.

Proin interdum, quam ut pellentesque congue, magna urna tristique felis, malesuada porta orci metus a purus. Morbi porttitor ex nec arcu mollis luctus. Ut quis tortor purus. Ut eu odio pharetra, fringilla ligula sit amet, sagittis lectus. Aenean ac quam vel ex mollis rutrum. Aliquam nulla leo, varius at mauris eu, porttitor egestas libero. Praesent sed feugiat augue, iaculis feugiat libero.

Suspendisse potenti. Suspendisse blandit ex quis nunc elementum pellentesque. Nam sagittis dui id faucibus faucibus. Pellentesque faucibus augue sit amet lorem cursus cursus. Pellentesque ut venenatis nisl, in placerat quam. Sed in enim at eros condimentum porttitor quis et lectus. Praesent sit amet pulvinar sapien. Quisque sodales mi ante, eu fermentum enim rhoncus sed. Praesent ac arcu eu erat cursus vulputate. Morbi cursus libero lectus, at tempus odio varius eget. Quisque finibus urna sed cursus rhoncus. Vivamus faucibus cursus imperdiet. Phasellus interdum sapien in odio rutrum, ut ornare turpis dictum. Sed mauris dui, molestie in odio nec, vulputate venenatis ex.`;

for (let i = 1; i < 64; i++) {
  const toSerialize = {};

  console.log(`2 ** ${i} (${2 ** i}) attributes`);

  for (let j = 0; j < 2 ** i; j++) {
    toSerialize[j] = PADDING_STRING;
  }

  const buffer = v8.serialize(toSerialize);

  console.log(`buffer length: ${buffer.length}`);

  v8.deserialize(buffer);
}

And here is the output on my system:

$ ./test-case.js

2 ** 1 (2) attributes
buffer length: 5593
2 ** 2 (4) attributes
buffer length: 11181
2 ** 3 (8) attributes
buffer length: 22357
2 ** 4 (16) attributes
buffer length: 44709
2 ** 5 (32) attributes
buffer length: 89413
2 ** 6 (64) attributes
buffer length: 178821
2 ** 7 (128) attributes
buffer length: 357702
2 ** 8 (256) attributes
buffer length: 715462
2 ** 9 (512) attributes
buffer length: 1430982
2 ** 10 (1024) attributes
buffer length: 2862022
2 ** 11 (2048) attributes
buffer length: 5724102
2 ** 12 (4096) attributes
buffer length: 11448262
2 ** 13 (8192) attributes
buffer length: 22896582
2 ** 14 (16384) attributes
buffer length: 45801415
2 ** 15 (32768) attributes
buffer length: 91611079
2 ** 16 (65536) attributes
buffer length: 183230407
2 ** 17 (131072) attributes
buffer length: 366469063
2 ** 18 (262144) attributes
buffer length: 732946375
2 ** 19 (524288) attributes
buffer length: 1465900999
2 ** 20 (1048576) attributes
buffer length: 2931810247
node:v8:344
  der.readHeader();
      ^

Error: Unable to deserialize cloned data.
    at DefaultDeserializer.readHeader (<anonymous>)
    at Object.deserialize (node:v8:344:7)
    at Object.<anonymous> (/Users/beau/p/zed-run/test-case.js:28:6)
    at Module._compile (node:internal/modules/cjs/loader:1101:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:79:12)
    at node:internal/main/run_main_module:17:47

How often does it reproduce? Is there a required condition?

100% of the time if the buffer to deserialize exceeds 2gb.

What is the expected behavior?

Either the documentation is updated to reflect the limit or the limit is removed.

What do you see instead?

Please see the output above.

Additional information

v8's serialize and deserialize are fantastic and very performant for large datasets; much faster than alternatives such as msgpackr. I'd love to continue using them as my dataset grows so that I can put off re-architecting things a bit further, but I realize mine is probably a very extreme use case. Documenting the limitation will save the next person who hits it some time, however! I spent a lot of time assuming that my file-writing code was broken in some way (I had to rewrite it because it also has a 2gb limitation; I hit both limits at the same time but didn't realize deserialize had the same limit).

In the short term I will probably shard keys across multiple files.
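
For illustration, here is a minimal sketch of that sharding workaround, assuming numeric keys as in the repro script above (the shard count, file naming, and helper names writeShards/readShards are made up for the example):

const v8 = require('v8');
const fs = require('fs');

// Keep each shard small enough that its serialized buffer stays well
// under the 2gb deserialization limit.
const SHARD_COUNT = 8; // illustrative; pick based on total dataset size

function writeShards(obj, basePath) {
  const shards = Array.from({ length: SHARD_COUNT }, () => ({}));
  for (const key of Object.keys(obj)) {
    // Numeric keys as in the repro; any stable hash of the key also works.
    shards[Number(key) % SHARD_COUNT][key] = obj[key];
  }
  shards.forEach((shard, i) => {
    fs.writeFileSync(`${basePath}.${i}.v8`, v8.serialize(shard));
  });
}

function readShards(basePath) {
  const merged = {};
  for (let i = 0; i < SHARD_COUNT; i++) {
    Object.assign(merged, v8.deserialize(fs.readFileSync(`${basePath}.${i}.v8`)));
  }
  return merged;
}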

@VoltrexKeyva VoltrexKeyva added the v8 module Issues and PRs related to the "v8" subsystem. label Sep 10, 2021
@rayw000
Contributor

rayw000 commented Sep 13, 2021

After tracking this for a while, I found that the constructor of ValueDeserializer only accepts a size argument that fits in a signed int (which is 32-bit even on 64-bit unix-like systems); otherwise the construction is marked as has_aborted. Meanwhile, the constructor of ValueSerializer doesn't have this limitation. As a result, ValueSerializer can serialize a huge object into a buffer larger than 2GB which ValueDeserializer then cannot deserialize. This is why we get DataCloneDeserializationError, which reads Unable to deserialize cloned data..

See lines 3273 and 3281.

node/deps/v8/src/api/api.cc

Lines 3271 to 3283 in 5c1adda

ValueDeserializer::ValueDeserializer(Isolate* isolate, const uint8_t* data,
                                     size_t size, Delegate* delegate) {
  if (base::IsValueInRangeForNumericType<int>(size)) {
    private_ = new PrivateData(
        reinterpret_cast<i::Isolate*>(isolate),
        base::Vector<const uint8_t>(data, static_cast<int>(size)), delegate);
  } else {
    private_ =
        new PrivateData(reinterpret_cast<i::Isolate*>(isolate),
                        base::Vector<const uint8_t>(nullptr, 0), nullptr);
    private_->has_aborted = true;
  }
}

I've tried simply using size_t, which ranges from 0 to 2^64 - 1 on 64-bit systems, instead of int as the template argument when creating the ValueDeserializer instance, as follows

 ValueDeserializer::ValueDeserializer(Isolate* isolate, const uint8_t* data, 
                                      size_t size, Delegate* delegate) { 
-  if (base::IsValueInRangeForNumericType<int>(size)) {
+  if (base::IsValueInRangeForNumericType<size_t>(size)) {
     private_ = new PrivateData( 
         reinterpret_cast<i::Isolate*>(isolate), 
-        base::Vector<const uint8_t>(data, static_cast<int>(size)), delegate);
+        base::Vector<const uint8_t>(data, static_cast<size_t>(size)), delegate);
   } else { 
     private_ = 
         new PrivateData(reinterpret_cast<i::Isolate*>(isolate), 
                         base::Vector<const uint8_t>(nullptr, 0), nullptr); 
     private_->has_aborted = true; 
   } 
 } 

but got DataCloneDeserializationVersionError, which says Unable to deserialize cloned data due to invalid or unsupported version..

Let me explain why we get this error.

When we try to deserialize a huge buffer, we first need to construct a ValueDeserializer object by calling its constructor. Line 1120 calculates the buffer end by adding data.begin() and data.length(), and assigns the result to ValueDeserializer::end_.

ValueDeserializer::ValueDeserializer(Isolate* isolate,
                                     base::Vector<const uint8_t> data,
                                     v8::ValueDeserializer::Delegate* delegate)
    : isolate_(isolate),
      delegate_(delegate),
      position_(data.begin()),
      end_(data.begin() + data.length()),
      id_map_(isolate->global_handles()->Create(
          ReadOnlyRoots(isolate_).empty_fixed_array())) {}

HOWEVER, data.begin() returns a pointer (64 bits wide), while data.length() returns an integer (32 bits wide). If we have a buffer larger than 2GB, data.begin() + data.length() (i.e. end_) will overflow, which leaves data.begin() (i.e. position_) greater than end_ (line 1143). In this case the version_ variable is never set (line 1146), so calling GetWireFormatVersion() returns its initial value, 0.

Maybe<bool> ValueDeserializer::ReadHeader() {
  if (position_ < end_ &&
      *position_ == static_cast<uint8_t>(SerializationTag::kVersion)) {
    ReadTag().ToChecked();
    if (!ReadVarint<uint32_t>().To(&version_) || version_ > kLatestVersion) {
      isolate_->Throw(*isolate_->factory()->NewError(
          MessageTemplate::kDataCloneDeserializationVersionError));
      return Nothing<bool>();
    }
  }
  return Just(true);
}
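
To make the truncation concrete, here is a tiny Node snippet (purely illustrative, not part of any patch) that mimics the position_/end_ arithmetic, with BigInt values standing in for pointers:

// Simulate end_ = data.begin() + data.length() when length() is an int32.
const begin = 0x7f0000000000n;            // stand-in for data.begin()
const size = 2931810247n;                 // the 2 ** 20 buffer from the repro
const int32Len = BigInt.asIntN(32, size); // -1363157049n after truncation
const end = begin + int32Len;             // lands *before* begin
console.log(end < begin);                 // true, so position_ < end_ fails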

Perhaps we can just leave everything as it is, if we are okay with being unable to deserialize huge objects that v8 successfully serialized. That is a little weird, but it is safe.

Or, we can make every object serialized by ValueSerializer deserializable.

Possible solutions:

  1. Remove the buffer size limit when deserializing. See https://chromium-review.googlesource.com/c/v8/v8/+/3170411
  2. Document the buffer size limit of Node.js. See buffer,doc: Throw error instead of assert when buffer too large #40243
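
Until one of those lands, callers can also guard explicitly before deserializing. A minimal sketch (safeDeserialize is a made-up helper; the threshold matches the signed-int limit described above):

const v8 = require('v8');

// Largest buffer the current ValueDeserializer constructor accepts.
const MAX_DESERIALIZABLE = 2 ** 31 - 1;

function safeDeserialize(buffer) {
  if (buffer.length > MAX_DESERIALIZABLE) {
    throw new RangeError(
      `buffer is ${buffer.length} bytes; v8.deserialize is limited to ${MAX_DESERIALIZABLE}`);
  }
  return v8.deserialize(buffer);
}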

You can pull rayw000@d556d61 and #40243 to test. The following code snippet

#!/usr/bin/env node --max-old-space-size=32768

const v8 = require('v8');

const str = `AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA`
const obj = {}
for (let i = 1; i < 22; i++) {
  for (let j = 0; j < 2 ** i; j++) {
    obj[j] = str;
  }
}
const buffer = v8.serialize(obj)

will output

node:v8:333
  return ser.releaseBuffer();
             ^

Error: Cannot create a Buffer larger than 0x100000000 bytes
    at Object.serialize (node:v8:333:14)
    at Object.<anonymous> (/Users/rayw000/repo/open-source/test-js/index.js:12:19)
    at Module._compile (node:internal/modules/cjs/loader:1095:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1147:10)
    at Module.load (node:internal/modules/cjs/loader:975:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
    at node:internal/main/run_main_module:17:47 {
  code: 'ERR_BUFFER_TOO_LARGE'
}

Node.js v17.0.0-pre

@VoltrexKeyva
Member

I'm kinda +1 on @rayw000's idea: we could just remove the limitation, allow the deserialization of all the objects that can be serialized, and document it somewhere.

@joyeecheung @addaleax what do you both think?

@rayw000
Contributor

rayw000 commented Sep 18, 2021

I just submitted a patch to Gerrit: https://chromium-review.googlesource.com/c/v8/v8/+/3170411

lazyparser pushed a commit to lazyparser/v8 that referenced this issue Sep 27, 2021
1. Now there is no serializer/deserializer-specific buffer size limit.
2. Update AUTHORS

Ref: nodejs/node#40059

Change-Id: Iad4c6d8f68a91ef21d3c404fb7945949e69ad9e2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3170411
Reviewed-by: Jakob Kummerow <[email protected]>
Reviewed-by: Clemens Backes <[email protected]>
Commit-Queue: Jakob Kummerow <[email protected]>
Cr-Commit-Position: refs/heads/main@{#77084}
@targos targos closed this as completed in c83f47f Oct 23, 2021
targos pushed a commit that referenced this issue Oct 23, 2021
PR-URL: #40243
Fixes: #40059
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Michaël Zasso <[email protected]>
targos pushed a commit that referenced this issue Oct 23, 2021
Original commit message:

    [deserialization] Remove unnecessarily limit on buffer size

    1. Now there is no serializer/deserializer-specific buffer size limit.
    2. Update AUTHORS

    Ref: #40059

    Change-Id: Iad4c6d8f68a91ef21d3c404fb7945949e69ad9e2
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3170411
    Reviewed-by: Jakob Kummerow <[email protected]>
    Reviewed-by: Clemens Backes <[email protected]>
    Commit-Queue: Jakob Kummerow <[email protected]>
    Cr-Commit-Position: refs/heads/main@{#77084}

Refs: v8/v8@422dc37

PR-URL: #40450
Fixes: #40059
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@rayw000
Contributor

rayw000 commented Oct 23, 2021

Since #40450 and #40243 have landed, we can now serialize any object that fits in a buffer no larger than kMaxLength. For objects requiring a buffer larger than kMaxLength, an ERR_BUFFER_TOO_LARGE error is thrown.
As for deserialization, any buffer that does not violate the object model can be deserialized, no matter how large it is, memory permitting.
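
For anyone landing here later, a minimal sketch of handling the new behavior (the fallback branch is illustrative):

const v8 = require('v8');

function trySerialize(obj) {
  try {
    return v8.serialize(obj);
  } catch (err) {
    if (err.code === 'ERR_BUFFER_TOO_LARGE') {
      // The object needs a buffer larger than buffer.kMaxLength;
      // fall back to sharding or another format instead.
      return null;
    }
    throw err;
  }
}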

BethGriggs pushed a commit that referenced this issue Nov 24, 2021
PR-URL: #40450
Fixes: #40059
BethGriggs pushed a commit that referenced this issue Nov 24, 2021
PR-URL: #40243
Fixes: #40059