
V8 serialize error on build with huge number of pages(100k+) #17233

Closed
ganapativs opened this issue Aug 30, 2019 · 19 comments · Fixed by #21993
Labels: type: bug (An issue or pull request relating to a bug in Gatsby)

@ganapativs (Contributor) commented Aug 30, 2019

Description

Getting various issues (related to V8 serialize, etc.) when trying to build a large number of pages (80k+ docs of 10kb each) with the latest gatsby + remark, resulting in build failure.

Basically, the build crashes with the errors below.

Without loki

success run page queries - 2027.713 s — 3335/3335 1.64 queries/second

node[11428]: ../src/node_buffer.cc:412:MaybeLocal<v8::Object> node::Buffer::New(node::Environment *, char *, size_t): Assertion `length <= kMaxLength' failed.
1: 0x100033d65 node::Abort() [/usr/local/bin/node]
2: 0x100032dab node::MakeCallback(v8::Isolate*, v8::Local<v8::Object>, char const*, int, v8::Local<v8::Value>*, node::async_context) [/usr/local/bin/node]
3: 0x100046ff5 _register_buffer() [/usr/local/bin/node]
4: 0x100098391 node::(anonymous namespace)::SerializerContext::ReleaseBuffer(v8::FunctionCallbackInfo<v8::Value> const&) [/usr/local/bin/node]
5: 0x10022b83f v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo*) [/usr/local/bin/node]
6: 0x10022ad81 v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) [/usr/local/bin/node]
7: 0x10022a3d0 v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) [/usr/local/bin/node]
8: 0x23830e0841bd
9: 0x23830e093a09
error Command failed with signal "SIGABRT".
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

With loki(GATSBY_DB_NODES=loki)

success run page queries - 1976.632 s — 3335/3335 1.69 queries/second

Stacktrace:
ptr1=0x25b9d2202321
    ptr2=0x0
    ptr3=0x0
    ptr4=0x0
    failure_message_object=0x7ffeefbed370

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x355f68c041bd]
    1: StubFrame [pc: 0x355f68c85ad7]
Security context: 0x25b9bbd9e6c9 <JSObject>
    2: saveState [0x25b910175701] [/Users/guns/public/node_modules/gatsby/dist/db/index.js:30] [bytecode=0x25b9097edb61 offset=181](this=0x25b90cd96279 <Object map = 0x25b90f1ad969>)
    3: /* anonymous */ [0x25b9850fee71](this=0x25b93d108c59 <JSGlobal Object>,0x25b9d2202321 <the_hole>)
    4: StubFrame [pc: 0x355f68c42871]
    5: StubFrame [pc: 0x355f68c21b9a]
    6: EntryFrame [pc: 0x355f68c0ba01]

==== Details ================================================

[0]: ExitFrame [pc: 0x355f68c041bd]
[1]: StubFrame [pc: 0x355f68c85ad7]
[2]: saveState [0x25b910175701] [/Users/guns/public/node_modules/gatsby/dist/db/index.js:30] [bytecode=0x25b9097edb61 offset=181](this=0x25b90cd96279 <Object map = 0x25b90f1ad969>) {
// stack-allocated locals
var .generator_object = 0x25baaacb2ee9 <JSGenerator>
var /* anonymous */ = 0x25baaacb2eb9 <Promise map = 0x25b9b4783e89>
// expression stack (top to bottom)
[11] : 0x25b9d2202321 <the_hole>
[10] : 0x25b9097ed889 <String[24]: Error persisting state: >
[09] : 0x25b921b04c89 <Object map = 0x25b983544361>
[08] : 0x25b9794ede29 <JSBoundFunction (BoundTargetFunction 0x25b9794ecbf1)>
[07] : 0x25b9101767b9 <FunctionContext[9]>
[06] : 0x25b9850ff331 <CatchContext[5]>
[05] : 0x25b9101767b9 <FunctionContext[9]>
[04] : 0x25b9101767b9 <FunctionContext[9]>
[03] : 0x25b90cd96279 <Object map = 0x25b90f1ad969>
[02] : 0x25b910175701 <JSFunction saveState (sfi = 0x25b9996e74f9)>
--------- s o u r c e   c o d e ---------
function saveState() {\x0a  if (saveInProgress) return;\x0a  saveInProgress = true;\x0a\x0a  try {\x0a    await Promise.all(dbs.map(db => db.saveState()));\x0a  } catch (err) {\x0a    report.warn(`Error persisting state: ${err && err.message || err}`);\x0a  }\x0a\x0a  saveInProgress = false;\x0a}
-----------------------------------------
}

[3]: /* anonymous */ [0x25b9850fee71](this=0x25b93d108c59 <JSGlobal Object>,0x25b9d2202321 <the_hole>) {
// optimized frame
--------- s o u r c e   c o d e ---------
<No Source>
-----------------------------------------
}
[4]: StubFrame [pc: 0x355f68c42871]
[5]: StubFrame [pc: 0x355f68c21b9a]
[6]: EntryFrame [pc: 0x355f68c0ba01]
=====================

error Command failed with signal "SIGILL".
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Interestingly, the build for 200k pages (4.5kb each post) runs successfully on [email protected], which uses JSON.stringify to persist state (it shows a redux persisting-state warning, but everything works).

Steps to reproduce

Repro repo: https://github.com/ganapativs/gatsby-v8-issue-repro (README has everything related to the issue and other observations).

Expected result

Build should be successful without V8 serialize error.

Actual result

Build crashed with a V8 serialize error. DANGEROUSLY_DISABLE_OOM would have helped temporarily, but it was removed recently 😅

Environment

System:
OS: macOS 10.15
CPU: (8) x64 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
Shell: 5.7.1 - /bin/zsh

Binaries:
Node: 10.6.0 - /usr/local/bin/node
Yarn: 1.7.0 - ~/.yarn/bin/yarn
npm: 6.1.0 - /usr/local/bin/npm

Languages:
Python: 2.7.16 - /usr/bin/python

Browsers:
Chrome: 76.0.3809.132
Safari: 13.0

npmPackages:
gatsby: 2.14.0 => 2.14.0
gatsby-plugin-styled-components: 3.1.3 => 3.1.3
gatsby-remark-autolink-headers: 2.1.8 => 2.1.8
gatsby-remark-prismjs: 3.3.9 => 3.3.9
gatsby-remark-sub-sup: 1.0.0 => 1.0.0
gatsby-source-mongodb: 2.1.9 => 2.1.9
gatsby-transformer-remark: 2.6.19 => 2.6.19

npmGlobalPackages:
gatsby-cli: 2.7.40

@ganapativs (Contributor, Author)

@sidharthachatterjee As discussed the other day, created a repro here. Please look into it 👍

@ganapativs (Contributor, Author)

Probably some related stuff!
v8.serialize strange issue - nodejs/help#1059

I still get a RangeError: Invalid string length warning on [email protected] in the redux persist stage (which uses JSON.stringify), and lokijs internally uses JSON.stringify to serialize. This article has a nice solution.

@LekoArts LekoArts added the type: bug An issue or pull request relating to a bug in Gatsby label Sep 4, 2019
@wardpeet (Contributor) commented Sep 4, 2019

Awesome @ganapativs thanks for creating this repro.

@gatsbot gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Sep 25, 2019
@gatsbot (bot) commented Sep 25, 2019

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

@ganapativs ganapativs removed the stale? Issue that may be closed soon due to the original author not responding any more. label Sep 30, 2019
@gatsbot (bot) commented Oct 21, 2019

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

@gatsbot gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Oct 21, 2019
@gatsbot (bot) commented Nov 1, 2019

Hey again!

It’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.

Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to reopen this issue or create a new one if you need anything else.

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks again for being part of the Gatsby community!

@gatsbot gatsbot bot closed this as completed Nov 1, 2019
@ganapativs (Contributor, Author) commented Nov 4, 2019

This is a serious scalability bug! Auto closing isn't helpful here.

@ganapativs ganapativs reopened this Nov 4, 2019
@LekoArts LekoArts removed the stale? Issue that may be closed soon due to the original author not responding any more. label Nov 4, 2019
@LekoArts (Contributor) commented Nov 4, 2019

cc @gatsbyjs/core

@starrett67

Currently facing this issue with a production site running on gatsby.

@ganapativs (Contributor, Author)

I have tested this issue on gatsby 2.17.7 and node 13.0.1. Issue still persists.

Updated the repro repo.

@adonig commented Jan 21, 2020

Does anyone have an idea how to prevent this from happening? My site has "only" 30k pages but I run into this issue.

@pvdz (Contributor) commented Jan 30, 2020

Ok, I am looking into it now. Consider this a research post while I'm trying to dig in.

I think nodejs/help#1059 is interesting because it implies that there shouldn't be a concrete difference between v8.serialize and JSON.stringify, apart from a more aggressive GC schedule. This could very well be the reason. Additionally, we might consider that the performance improvement of using v8.serialize over JSON.stringify is only perceived and the cost is ultimately still paid before the process exits. That's an interesting fact.

Keep in mind, async operations may change the impact, as an async operation might give nodejs more idle time to run GC. Of course, if postponing GC leads to OOMs we need to re-evaluate that.
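
For context, the two persistence strategies being compared boil down to roughly this (a simplified sketch, not gatsby's actual persist code; the file names are invented):

    const fs = require(`fs`)
    const v8 = require(`v8`)

    // Old approach: JSON round-trip of the redux state.
    const saveJson = state =>
      fs.writeFileSync(`redux-state.json`, JSON.stringify(state))
    const loadJson = () =>
      JSON.parse(fs.readFileSync(`redux-state.json`, `utf8`))

    // Current approach: v8 serialization, which round-trips Maps, Dates, etc.
    // and produces a Buffer instead of a string.
    const saveV8 = state =>
      fs.writeFileSync(`redux-state.v8`, v8.serialize(state))
    const loadV8 = () =>
      v8.deserialize(fs.readFileSync(`redux-state.v8`))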


The repro.

I had to install mongo (sudo apt-get install mongodb mongo-clients) because that wasn't available on my xfce/ubuntu system.

I had to update the script a tiny bit to get the repro working:

    db.repro.insert({
      title: `${rest.title}-${i}`,
      body: `${rest.body}-${i}`,
      slug: `${rest.slug}-${i}`,
    });

because I was getting "invalid id" errors from mongo (I added the suffix because why not; I realize the repro doesn't have that).

Running it on 10k pages without bumping the memory quickly OOMs during sourcing.

Running it with 12GB:

- Memory consumption remains fairly stable during the sourcing step (~4gb?).
- Sourcing takes about 730s (yikes). I'll have to look into that on a separate note. On the next run this took just 70 seconds, maybe mongo is caching it?
- The run queries step runs at literally 1 query per second. Will be testing a fix that was merged yesterday to improve this ~1000x. This patch will not apply to loki btw.
- Memory consumption during the 1q/s "run queries" slowly increased to 7gb.
- Immediately after the run queries step it crashed out, well under the available memory.

Here's my runtime output:

 😶  13:50:04 (master) v8-issue-repro $ yarn clean; NODE_OPTIONS=--max_old_space_size=12288 SKIP_PAGE_BUILD=1 yarn build
yarn run v1.21.1
$ rm -rf public .cache
Done in 0.06s.
yarn run v1.21.1
$ NODE_ENV=production gatsby build
success open and validate gatsby-configs - 0.020s
success load plugins - 0.300s
success onPreInit - 0.009s
success delete html and css files from previous builds - 0.010s
success initialize cache - 0.007s
success copy gatsby files - 0.022s
success onPreBootstrap - 0.012s

 ERROR 

(node:5411) DeprecationWarning: current Server Discovery and Monitoring engine is deprecated, and will be removed in a future version. To use the new Server Discover and Monitoring engine, pass option { useUnifiedTopology: true } to the MongoClient constructor.

success source and transform nodes - 730.717s
success building schema - 2.564s
success createPages - 2.514s
success createPagesStatefully - 0.177s
success onPreExtractQueries - 0.003s
success update schema - 0.020s
success extract queries from components - 0.075s
warn The GraphQL query in the non-page component "v8-issue-repro/src/templates/post.js" will not be run.
Exported queries are only executed for Page components. It's possible you're
trying to create pages in your gatsby-node.js and that's failing for some
reason.

If the failing component(s) is a regular component and not intended to be a page
component, you generally want to use a <StaticQuery> (https://gatsbyjs.org/docs/static-query)
instead of exporting a page query.

If you're more experienced with GraphQL, you can also export GraphQL
fragments from components and compose the fragments in the Page component
query and pass data down into the child component — http://graphql.org/learn/queries/#fragments
success write out requires - 0.052s
success write out redirect data - 0.004s
success onPostBootstrap - 0.002s
⠀
info bootstrap finished - 739.732 s
⠀
warn Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`
success Building production JavaScript and CSS bundles - 8.467s
success Rewriting compilation hashes - 0.004s
success run queries - 2136.418s - 3335/3335 1.56/s
node/v10.18.1/bin/node[5411]: ../src/node_buffer.cc:420:v8::MaybeLocal<v8::Object> node::Buffer::New(node::Environment*, char*, size_t): Assertion `length <= kMaxLength' failed.
 1: 0x8fa090 node::Abort() [node/v10.18.1/bin/node]
 2: 0x8fa165  [node/v10.18.1/bin/node]
 3: 0x91b8fa  [node/v10.18.1/bin/node]
 4: 0x990755  [node/v10.18.1/bin/node]
 5: 0xb8e7df  [node/v10.18.1/bin/node]
 6: 0xb8f349 v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) [node/v10.18.1/bin/node]
 7: 0x5b254ddbe1d 
Aborted (core dumped)
error Command failed with exit code 134.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

The crash is not a regular OOM but a string length assertion error. Perhaps the serialized content is too much to bear. (After all, there are inherent limits in nodejs; a maximum string length is one of them, and so is the maximum Buffer length from the assert above.)
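
Those caps are exposed by node itself; a quick way to see them (exact values depend on the node version and platform, roughly 2GB for a Buffer on 64-bit Node 10):

    const { constants } = require(`buffer`)

    // Upper bound for a single Buffer (the `kMaxLength` from the assert above).
    console.log(constants.MAX_LENGTH)
    // Upper bound for a single string, which is what JSON.stringify trips over
    // with "RangeError: Invalid string length".
    console.log(constants.MAX_STRING_LENGTH)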

The above ran on [email protected] in node v10.17.

Bumping it to [email protected] (yarn add [email protected]).

This time the sourcing step improved, but there was no change to the run queries time. Debugging that, it seems it doesn't use a filter at all. It seems the slowness is coming from the html handler. Will have to look into that later.

After about three or four restarts (while debugging) the build now OOMs during the createPages step, which took 2s before. And I cannot get it to move forward. In this case I can see the memory grow (relatively) rapidly, and after ~2 minutes the 12GB is used up and it OOMs. I took a break and picked it up the next day. When I got back to it this step was fine again; not sure what is causing this... Can anyone reliably repro this problem?

This makes me wonder whether there aren't two things at play here. My OOM problem certainly seems to stem from another source. I use gatsby clean for every step so it's not something that's stored in .cache or public. Maybe Mongo does something differently. Restarting it did not help.

Regenerating the db with 100 pages makes the run finish fine, in 10s. Not unexpected, but good to see that still works. I checked into why the queries run so slowly. Turns out they are actually under-reporting their work; each query is running remark for every post on the page. By default there are 30 posts on each page, so remark is called 30 times for that fact alone, but it visually counts as one step for the query.

If I go into gatsby-transformer-remark and add return 'foo' in the resolve(markdownNode, { format, pruneLength, truncate }) { function, then the queries speed up drastically and the 100k site completes within a reasonable time (~4.5 minutes). (This does expose a scaling issue, as the small site runs to completion much faster than a big site, but that's a different issue to deal with later.) And with this hack the assert is still triggered, so that's a decent repro. And it allows us to skip a decent chunk of code :)

success run queries - 186.415s - 3335/3335 17.89/s
node[8201]: ../src/node_buffer.cc:420:v8::MaybeLocal<v8::Object> node::Buffer::New(node::Environment*, char*, size_t): Assertion `length <= kMaxLength' failed.
 1: 0x8fa090 node::Abort() [node]
 2: 0x8fa165  [node]
 3: 0x91b8fa  [node]
 4: 0x990755  [node]
 5: 0xb8e7df  [node]
 6: 0xb8f349 v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) [node]
 7: 0x190294e5be1d 
Aborted (core dumped)

Then I confirmed whether this problem pops up with JSON.stringify: change the v8 bits in persist.ts to use JSON instead. No assert crash.

Now this isn't necessarily a surprise. v8.serialize is more efficient for a few reasons, but it probably also serializes more data than JSON.stringify, which is not designed to serialize functions or class instances. Let's take a look at the output for a small number of pages:

I compared v8.serialize to the old way of doing JSON.stringify (using https://github.com/stefanprobst/gatsby/blob/c043816915c0e4b632730091c1d14df08d6249d4/packages/gatsby/src/redux/persist.js as a reference point). Both ways dump about 500KB of data.

Before I "checked" the assert with a simple JSON.stringify. However, I need to run this again since the original way of stringifying was capturing a lot more. Running it again also shows warn Error persisting state: Invalid string length, but it's non-fatal and teh build completes. Probably not saving the redux state, though ;)

Next I checked whether I could catch the assertion error. This is more difficult because the error happens somewhere inside the serialization call and we don't control the assert. It doesn't appear to be try-catchable... :( Arguably this is a bug in nodejs, as the assert should instead throw a catchable error (like what happens with JSON.stringify), but I doubt anyone cares to change that.
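
Roughly, the two failure modes differ like this (an illustrative sketch; `buildHugeState` is a hypothetical helper standing in for a redux state big enough to exceed the caps):

    const fs = require(`fs`)
    const v8 = require(`v8`)

    // Hypothetical: produces a state object whose serialized form exceeds the caps.
    const state = buildHugeState()

    try {
      // JSON.stringify fails soft: it throws a catchable RangeError
      // ("Invalid string length") once the output string passes the limit.
      fs.writeFileSync(`state.json`, JSON.stringify(state))
    } catch (err) {
      console.warn(`Error persisting state: ${err.message}`)
    }

    try {
      // v8.serialize fails hard: past the Buffer cap the kMaxLength assert
      // fires inside node itself and aborts the process, so this catch
      // never runs.
      fs.writeFileSync(`state.v8`, v8.serialize(state))
    } catch (err) {
      console.warn(`not reached for the kMaxLength abort`)
    }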


The way forward

So after some discussion we're going to try to chunk the page nodes. Ultimately we're hitting the arbitrary buffer limit and there's no easy way to fix that. So instead we'll first serialize the redux state without the page nodes. Then we'll try to apply some heuristics to chunk the page nodes such that they stay well below this limit. This means the redux state has to be serialized across multiple files, but that should also mean that it won't fatal.

We're gonna be a little busy the next week but watch this space.

@pvdz pvdz added the scaling label Jan 30, 2020
@ganapativs (Contributor, Author)

Thanks for picking this up and that's a great analysis. I'll try to spend some time this weekend on this and see if I can figure out something more.

pvdz added a commit that referenced this issue Feb 18, 2020
We are using `v8.serialize` to write and read the redux state. This is faster than `JSON.stringify`/`JSON.parse`. Unfortunately, as reported in #17233, this can lead to a fatal when the contents of the redux state are too big to be serialized to a Buffer (hard max of 2GB). We also hit this problem on large sites, like a million small md pages.

The solution is to shard the `nodes` property, which holds all the page data. In this change I've added a simple heuristic to determine the max chunk size (mind you, currently that's basically `Infinity`). It will serialize about 11 individual nodes, measure their size, and based on the biggest node determine how many nodes would fit in 1.5GB.

The serialization process is updated to no longer put the `nodes` in the main redux file, but rather sharded over a few specific files. When reading the state from cache, these files are all read and their contents are put together in a single Map again. If there were no nodes files this part does nothing so it's even backwards compatible.
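
A rough sketch of what that sharding could look like (all names, file layout, and structure here are illustrative, not the actual gatsby implementation):

    const fs = require(`fs`)
    const v8 = require(`v8`)

    // Sample a handful of nodes, take the biggest serialized size, and derive
    // how many nodes fit comfortably under a 1.5GB shard target.
    function guessChunkSize(nodes, sampleCount = 11) {
      const sample = nodes.slice(0, sampleCount)
      const maxNodeSize = Math.max(...sample.map(node => v8.serialize(node).length))
      const targetBytes = 1.5 * 1024 * 1024 * 1024 // stay well under the 2GB Buffer cap
      return Math.max(1, Math.floor(targetBytes / maxNodeSize))
    }

    // Write the nodes Map as multiple v8-serialized shard files.
    function writeNodeShards(nodesMap, dir) {
      const entries = [...nodesMap.entries()]
      const chunkSize = guessChunkSize(entries.map(([, node]) => node))
      for (let i = 0; i * chunkSize < entries.length; i++) {
        const shard = entries.slice(i * chunkSize, (i + 1) * chunkSize)
        fs.writeFileSync(`${dir}/redux.node.state_${i}`, v8.serialize(shard))
      }
    }

    // Read every shard back and merge the entries into a single Map again.
    function readNodeShards(dir) {
      const entries = []
      for (const file of fs.readdirSync(dir)) {
        if (file.startsWith(`redux.node.state_`)) {
          entries.push(...v8.deserialize(fs.readFileSync(`${dir}/${file}`)))
        }
      }
      return new Map(entries)
    }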
@pvdz (Contributor) commented Feb 18, 2020

For anyone bothered by this, can you confirm whether #21555 fixes your issue? I expect to merge this soon so if you're not comfortable with how to check this you can also wait for a bump.

pvdz added further commits that referenced this issue between Feb 18 and Feb 24, 2020, with the same commit message as above (later revisions also describe the transactional `redux` folder write, quoted in full below).
pvdz added a commit that referenced this issue Feb 25, 2020
* fix(gatsby): Chunk nodes when serializing redux to prevent OOM

We are using `v8.serialize` to write and read the redux state. This is faster than `JSON.stringify`/`JSON.parse`. Unfortunately, as reported in #17233, this can lead to a fatal when the contents of the redux state are too big to be serialized to a Buffer (hard max of 2GB). We also hit this problem on large sites, like a million small md pages.

The solution is to shard the `nodes` property, which holds all the page data. In this change I've added a simple heuristic to determine the max chunk size (mind you, currently that's basically `Infinity`). It will serialize about 11 individual nodes, measure their size, and based on the biggest node determine how many nodes would fit in 1.5GB.

The serialization process is updated to no longer put the `nodes` in the main redux file, but rather sharded over a few specific files. When reading the state from cache, these files are all read and their contents are put together in a single Map again. If there were no nodes files this part does nothing so it's even backwards compatible.

Because the write is no longer atomized, the process will now write the redux cache to its own `redux` folder. When writing a new cache it will prepare the new cache in a tmp folder first, then move the existing `redux` folder to a temp location, move the new folder to `redux`, and then try to drop the old folder. This is about as transactional as you can get and should leave the cache in either a stale, empty, or updated state. But never in a partial state.
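
The transactional swap described above amounts to something like this (a sketch with assumed folder names, not the actual implementation):

    const fs = require(`fs`)

    // Build the new cache in a temp folder, park the old one, promote the new
    // one, then drop the old. The on-disk cache is only ever stale, empty, or
    // fully updated, never half-written.
    function swapReduxCache(writeShards) {
      const tmpNew = `.cache/redux-new`
      const tmpOld = `.cache/redux-old`
      const target = `.cache/redux`

      fs.mkdirSync(tmpNew, { recursive: true })
      writeShards(tmpNew) // caller writes the main state file plus node shards here

      if (fs.existsSync(target)) fs.renameSync(target, tmpOld)
      fs.renameSync(tmpNew, target)
      if (fs.existsSync(tmpOld)) {
        fs.rmdirSync(tmpOld, { recursive: true }) // best-effort cleanup (Node >= 12.10)
      }
    }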
@pvdz (Contributor) commented Feb 25, 2020

The fix was published in [email protected]

Please report if there are still problems and provide a repro if that's the case.

(And hey, if your issue is now fixed, let me know too :) )

@pvdz pvdz closed this as completed Feb 25, 2020
@ganapativs (Contributor, Author)

Thank you very much 👍

@nadiavanleur

This, or a similar problem, still occurs for me. Could you confirm whether this is related? #21957

@j218 commented Mar 4, 2020

Hey @pvdz, my team is running into the same issue. The Gatsby build breaks right after run queries with the same error log:
"node[8201]: ../src/node_buffer.cc:420:v8::MaybeLocal<v8::Object> node::Buffer::New(node::Environment*, char*, size_t): Assertion `length <= kMaxLength' failed."

I updated gatsby to the version mentioned and looked into gatsby-source-contentful. It's a site with 15k+ pages and a lot of content in the rich text editor. Could this be another issue?

@pvdz (Contributor) commented Mar 10, 2020

@nadiavanleur @j218 There was a typo (see the last linked PR) which kind of mooted the fix in some cases. Can you confirm whether the problem still exists in >[email protected]?
