Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: fix near heap limit callback #42636

Closed
wants to merge 1 commit into from

Conversation

joyeecheung
Copy link
Member

  • Use the allocated space size to calculate the raised heap
    limit, as that is what V8 uses to determine whether it
    should crash - previously we use the used size for the
    calculation and that was too conservative and did not
    prevent the crashes effectively enough.
  • Use RequestInterrupt() to take the snapshot since we
    need to make sure that the heap limit is raised first
    before the snapshot can be taken.

Refs: #42115
Fixes: nodejs/node-v8#218

- Use the allocated space size to calculate the raised heap
  limit, as that is what V8 uses to determine whether it
  should crash - previously we use the used size for the
  calculation and that was too conservative and did not
  prevent the crashes effectively enough.
- Use RequestInterrupt() to take the snapshot since we
  need to make sure that the heap limit is raised first
  before the snapshot can be taken.
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Apr 7, 2022
@joyeecheung
Copy link
Member Author

joyeecheung commented Apr 7, 2022

cc @targos @codebytere

This fixes the crash locally for me, with or without the V8 update. I have another fix in V8 for this issue (https://chromium-review.googlesource.com/c/v8/v8/+/3575468) but when testing it I realized that we should just use RequestInterrupt to take the snapshot, because even when the assertions are lifted, the current behavior still relies on that V8 allows the callback to be nested so we have a chance to raise the limit when it's invoked again from the GC it triggers, which seems more complicated than necessary if we can just put the snapshot code in a RequestInterrupt callback and go ahead raising the limit the first time the NearHeapLimitCallback is triggered.

@nodejs-github-bot
Copy link
Collaborator

Comment on lines +1578 to +1580
// space_size() returns the allocated size of a given space,
// we use this to calculate the new limit because V8 also
// uses the allocated size to determine whether it should crash.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// space_size() returns the allocated size of a given space,
// we use this to calculate the new limit because V8 also
// uses the allocated size to determine whether it should crash.
// space_size() returns the allocated size of a given space.
// We use this to calculate the new limit because V8 also
// uses the allocated size to determine whether it should crash.

uint64_t estimated_overhead = young_gen_size;
// The new limit must be higher than current_heap_limit or V8 might
// crash.
uint64_t minimun_new_limit = static_cast<uint64_t>(current_heap_limit + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: minimum_new_limit

@targos
Copy link
Member

targos commented Apr 12, 2022

@joyeecheung Are we waiting for the V8 patch instead of this PR?

@joyeecheung
Copy link
Member Author

joyeecheung commented Apr 12, 2022

@targos If we land the V8 fix while still allowing nesting (i.e. the old behavior) then this can land later (the V8 fix alone should be enough to unblock the V8 update without this patch). If the V8 fix disallows nesting this has to be land first and v8/node-ci needs to pick this up to unbreak the V8 CI before the V8 patch can then land. Given that we want the V8 update to unblock sooner I think the first option is better and the reviewer in the V8 CL seems to agree - you could chime in in the CL too about what you think is better, thanks

@joyeecheung
Copy link
Member Author

https://chromium-review.googlesource.com/c/v8/v8/+/3575468 has landed, I think we can fix the TODOs here once our copy of V8 has that fix.

@joyeecheung
Copy link
Member Author

Oops, sorry, we cannot address the TODOs here without another patch in V8, so either way this needs to go first to avoid breaking V8's CI. I will do some test to see if putting the snapshot request behind an interrupt affects the precision of the snapshot.

@joyeecheung
Copy link
Member Author

I just remembered about this PR. So from what I can remember, the status is:

  1. After https://chromium-review.googlesource.com/c/v8/v8/+/3575468 it's okay for us to continue taking snapshots directly from the near heap limit callback without an interrupt dance. The V8 team is also okay with maintaining this for us.
  2. The current leeway we add (which is being raised here a bit more) is still too conservative. I didn't have a clear idea about the reason back then but with node process killed by os during heap snapshot due to OOM #50711 and the investigation we did in https://v8.dev/blog/speeding-up-v8-heap-snapshots I am fairly certain that it was caused by the line-ends-population that V8 does in heap snapshot generation.

I think this one can be closed now because of 1. I will open a separate PR to make the extra leeway less conservative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GC crash in Environment::NearHeapLimitCallback with latest V8
4 participants