Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: cctest MultipleEnvironmentsPerIsolate & AtExitWithArgument are flaky (no-snapshot) #14206

Closed
refack opened this issue Jul 12, 2017 · 22 comments
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. flaky-test Issues and PRs related to the tests with unstable failures on the CI. help wanted Issues that need assistance from volunteers or PRs that need help to proceed. lib / src Issues and PRs related to general changes in the lib or src directory. test Issues and PRs related to the tests. v8 engine Issues and PRs related to the V8 dependency.

Comments

@refack
Copy link
Contributor

refack commented Jul 12, 2017

  • Version: master
  • Platform: multiple
  • Subsystem: test,V8

Moving nodejs/build#789 to core
The test has intermitently failed on:

  • ppcbe-ubuntu1404
  • armv7-wheezy
  • centos5-64
  • docker_alpine34-64

https://ci.nodejs.org/job/node-test-commit-plinux/10137/nodes=ppcbe-ubuntu1404/console
https://ci.nodejs.org/job/node-test-commit-plinux/10138/nodes=ppcbe-ubuntu1404/console
https://ci.nodejs.org/job/node-test-commit-plinux/10139/nodes=ppcbe-ubuntu1404/console
https://ci.nodejs.org/job/node-test-commit-plinux/10156/nodes=ppcbe-ubuntu1404/
https://ci.nodejs.org/job/node-test-commit-arm/10826/nodes=armv7-wheezy/console
https://ci.nodejs.org/job/node-test-commit-plinux/10165/nodes=ppcbe-ubuntu1404/console
https://ci.nodejs.org/job/node-test-commit-plinux/10166/nodes=ppcbe-ubuntu1404/console
https://ci.nodejs.org/job/node-test-commit-linux/11117/nodes=centos5-64
https://ci.nodejs.org/job/node-test-commit-plinux/10172/nodes=ppcbe-ubuntu1404/console
https://ci.nodejs.org/job/node-test-commit-linux/11130/nodes=ubuntu1604_docker_alpine34-64/console

One of the more interesting outputs (from docker_alpine34-64):

[----------] 3 tests from EnvironmentTest
[ RUN      ] EnvironmentTest.AtExitWithEnvironment
[       OK ] EnvironmentTest.AtExitWithEnvironment (384 ms)
[ RUN      ] EnvironmentTest.AtExitWithArgument
[       OK ] EnvironmentTest.AtExitWithArgument (388 ms)
[ RUN      ] EnvironmentTest.MultipleEnvironmentsPerIsolate


#
# Fatal error in , line 0
# API fatal error handler returned after process out of memory
#
Received signal 4 ILL_ILLOPN 55c3c9aafe69

The more common output:

[----------] 3 tests from EnvironmentTest
[ RUN      ] EnvironmentTest.AtExitWithEnvironment
[       OK ] EnvironmentTest.AtExitWithEnvironment (362 ms)
[ RUN      ] EnvironmentTest.AtExitWithArgument
[       OK ] EnvironmentTest.AtExitWithArgument (354 ms)
[ RUN      ] EnvironmentTest.MultipleEnvironmentsPerIsolate
Build was aborted

This is the test code

TEST_F(EnvironmentTest, MultipleEnvironmentsPerIsolate) {
const v8::HandleScope handle_scope(isolate_);
const Argv argv;
Env env1 {handle_scope, isolate_, argv};
Env env2 {handle_scope, isolate_, argv};

AtExit(*env1, at_exit_callback1);
AtExit(*env2, at_exit_callback2);
RunAtExit(*env1);
EXPECT_TRUE(called_cb_1);
EXPECT_FALSE(called_cb_2);

RunAtExit(*env2);
EXPECT_TRUE(called_cb_2);
}

From
test/cctest/test_environment.cc

@refack refack added c++ Issues and PRs that require attention from people who are familiar with C++. flaky-test Issues and PRs related to the tests with unstable failures on the CI. help wanted Issues that need assistance from volunteers or PRs that need help to proceed. lib / src Issues and PRs related to general changes in the lib or src directory. test Issues and PRs related to the tests. v8 engine Issues and PRs related to the V8 dependency. labels Jul 12, 2017
@addaleax
Copy link
Member

Quoting #13861 (comment):

Running the cctest locally makes valgrind complain in a way that points to us not cleaning up libuv handles properly in the multi-Environment tests, similar to the problem fixed by d5db4d2. If I’m right (and valgrind complaining should be a good sign for that), this is not a problem that’s going to effect regular users, and more likely to be a problem with the test itself.

I’ve also previously asked whether it’s possible to get a core dump from that machine to make sure; I think you might be the best person to have an answer?

I think all we need is somebody who has the time to dig into the valgrind logs to figure this out.

@Trott
Copy link
Member

Trott commented Jul 13, 2017

Still happening, of course: https://ci.nodejs.org/job/node-test-commit-plinux/10186/nodes=ppcbe-ubuntu1404/console

Is anyone on this? Anyone especially sensible to loop in on it?

@refack
Copy link
Contributor Author

refack commented Jul 13, 2017

I'll try on Windows, but !@#$@%#$%^ MMMV (it might not even be a problem here 🤷‍♂️ )

@refack
Copy link
Contributor Author

refack commented Jul 13, 2017

We can try to call @indutny to the flag (it's partly his fault, JK nothing but   ❤️ )

@mhdawson
Copy link
Member

mhdawson commented Jul 13, 2017

Created this issue to request access to the BE machine were it recreates most easily for @jBarz : nodejs/build#795

He's tried to reproduce locally with no luck.

@jBarz
Copy link
Contributor

jBarz commented Jul 13, 2017

I can reproduce on build machine. However, it is intermittent (about once in 5 tries)

@mhdawson
Copy link
Member

I wondering if we should exclude this test until we can figure out what is going on. It seems to happen quire regularly for PPC and alpine.

@jBarz
Copy link
Contributor

jBarz commented Jul 14, 2017

Continuing to investigate. So far I have found it to hang at the following location in api.cc from v8 when initializing the isolate.

Isolate::Scope isolate_scope(v8_isolate);
 if (params.entry_hook || !i::Snapshot::Initialize(isolate)) {
   isolate->Init(NULL);  // <--- Hangs here
 }

@refack
Copy link
Contributor Author

refack commented Jul 14, 2017

I wondering if we should exclude this test until we can figure out what is going on. It seems to happen quire regularly for PPC and alpine.

If we have a way to make sure it doesn't get neglected, I'm +1

@mhdawson
Copy link
Member

Any objects to excluding the test ? If I don't see anybody and nobody beats me too it I'll look at a PR for excluding on Monday.

@Trott
Copy link
Member

Trott commented Jul 15, 2017

Any objects to excluding the test ? If I don't see anybody and nobody beats me too it I'll look at a PR for excluding on Monday.

We have Code + Learn in Shanghai in about 30 hours. Hard to say how things will or won't work (lots of variables at play here that we haven't dealt with before), but I would expect at least 25 and probably closer to 50 PRs to result from it. A green CI would be terribly useful. So I'm going to go ahead and remove the affected hosts from CI before Code + Learn if no one objects.

/cc @nodejs/build

@jBarz
Copy link
Contributor

jBarz commented Jul 15, 2017

It seems like it hangs on trying to acquire the mutex to dispatch a job to the CompilerDispatcher class.

#define CODE_EVENT_DISPATCH(code)              \
  base::LockGuard<base::Mutex> guard(&mutex_); \
  for (auto it = listeners_.begin(); it != listeners_.end(); ++it) (*it)->code

Also, I cannot reproduce the error when I run the test in isolation:

out/Release/cctest --gtest_filter=EnvironmentTest.MultipleEnvironmentsPerIsolate

But I can reproduce it when I also run the test prior to it:

out/Release/cctest --gtest_filter=EnvironmentTest.AtExitWithArgument:EnvironmentTest.MultipleEnvironmentsPerIsolate

@Trott
Copy link
Member

Trott commented Jul 16, 2017

But I can reproduce it when I also run the test prior to it:

@jBarz The side effects leaking between tests is probably what @addaleax describes in #14206 (comment), yes?

Trott pushed a commit that referenced this issue Jul 16, 2017
This is a temporary measure until the issue is fixed.

PR-URL: #14246
Refs: #14206
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Gibson Fahnestock <[email protected]>
@jBarz
Copy link
Contributor

jBarz commented Jul 16, 2017

I agree that there is a side effect from the preceding test case. However, not sure if the source of the leak is libuv or v8 itself.

addaleax pushed a commit that referenced this issue Jul 18, 2017
This is a temporary measure until the issue is fixed.

PR-URL: #14246
Refs: #14206
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Gibson Fahnestock <[email protected]>
Fishrock123 pushed a commit that referenced this issue Jul 19, 2017
This is a temporary measure until the issue is fixed.

PR-URL: #14246
Refs: #14206
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Gibson Fahnestock <[email protected]>
@refack
Copy link
Contributor Author

refack commented Jul 19, 2017

@jBarz any news?

@refack
Copy link
Contributor Author

refack commented Jul 22, 2017

This seems to creep up and actually segfaults AtExitWithArgument:

[----------] 2 tests from EnvironmentTest
[ RUN      ] EnvironmentTest.AtExitWithEnvironment
[       OK ] EnvironmentTest.AtExitWithEnvironment (483 ms)
[ RUN      ] EnvironmentTest.AtExitWithArgument
Makefile:354: recipe for target 'test-ci' failed
make[1]: *** [test-ci] Segmentation fault (core dumped)
Makefile:529: recipe for target 'run-ci' failed
make: *** [run-ci] Error 2

https://ci.nodejs.org/job/node-test-commit-linux/11359/nodes=ubuntu1604-32/

@refack refack changed the title test: cctest:EnvironmentTest.MultipleEnvironmentsPerIsolate flakes (post snapshot) test: cctest MultipleEnvironmentsPerIsolate & AtExitWithArgument are flaky (no-snapshot) Jul 22, 2017
@refack
Copy link
Contributor Author

refack commented Jul 30, 2017

@jBarz @mhdawson any news? (I promised not to let this fall between the cracks)

@jBarz
Copy link
Contributor

jBarz commented Jul 31, 2017

Hi @refack , I was on vacation for a week. I am going to look at this either today/tomorrow and give an update asap.

@mhdawson
Copy link
Member

mhdawson commented Aug 1, 2017

@refack was just looking at the issue to see where we were, thanks for keeping an eye on it.

@refack
Copy link
Contributor Author

refack commented Aug 7, 2017

@jBarz @mhdawson
a ping a week
and performance is peak
(sorry not a poet)

@jBarz
Copy link
Contributor

jBarz commented Aug 8, 2017

hey @refack Last week was extremely busy with other things.
I am going to try to get this resolved this week.

addaleax added a commit to addaleax/node that referenced this issue Aug 10, 2017
The `Environment::destroy_ids_timer_handle` should be cleaned up
by `Environment::CleanupHandles()`. Fix that by adding it to the list.

This partially fixes a cctest.

Ref: nodejs#14206
addaleax added a commit to addaleax/node that referenced this issue Aug 10, 2017
The `IsolateData` instance is created before the `Environment` instance,
so free in reverse order.

Fixes: nodejs#14206
addaleax added a commit to addaleax/node that referenced this issue Aug 10, 2017
addaleax added a commit to addaleax/node that referenced this issue Aug 10, 2017
@addaleax addaleax mentioned this issue Aug 10, 2017
3 tasks
@addaleax
Copy link
Member

Fix is in #14749

addaleax added a commit that referenced this issue Aug 10, 2017
The `Environment::destroy_ids_timer_handle` should be cleaned up
by `Environment::CleanupHandles()`. Fix that by adding it to the list.

This partially fixes a cctest.

Ref: #14206
PR-URL: #14749
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
addaleax added a commit that referenced this issue Aug 10, 2017
This reverts commit 75bf8a9.

Ref: #14206
Ref: #14317
PR-URL: #14749
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
addaleax added a commit that referenced this issue Aug 10, 2017
This reverts commit 95ab966.

Ref: #14206
Ref: #14246
PR-URL: #14749
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
addaleax added a commit that referenced this issue Aug 10, 2017
The `Environment::destroy_ids_timer_handle` should be cleaned up
by `Environment::CleanupHandles()`. Fix that by adding it to the list.

This partially fixes a cctest.

Ref: #14206
PR-URL: #14749
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
addaleax added a commit that referenced this issue Aug 10, 2017
The `IsolateData` instance is created before the `Environment` instance,
so free in reverse order.

Fixes: #14206
PR-URL: #14749
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
addaleax added a commit that referenced this issue Aug 10, 2017
This reverts commit 75bf8a9.

Ref: #14206
Ref: #14317
PR-URL: #14749
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
addaleax added a commit that referenced this issue Aug 10, 2017
This reverts commit 95ab966.

Ref: #14206
Ref: #14246
PR-URL: #14749
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. flaky-test Issues and PRs related to the tests with unstable failures on the CI. help wanted Issues that need assistance from volunteers or PRs that need help to proceed. lib / src Issues and PRs related to general changes in the lib or src directory. test Issues and PRs related to the tests. v8 engine Issues and PRs related to the V8 dependency.
Projects
None yet
Development

No branches or pull requests

5 participants