test: exit sequence sanity tests [2] #25085
The test captured a new error: good from the test's perspective, but it makes this a do-not-land-yet candidate.
source line: node/src/node_native_module.cc Line 102 in a361b94
native stack: (dbx) where
pthread_kill(??, ??) at 0x90000000051beb4
_p_raise(??) at 0x90000000051b6e8
raise.raise(??) at 0x90000000002bd4c
abort() at 0x9000000000827c4
_ZN4node5AbortEv() at 0x1001f2314
_ZN4node6AssertEPA4_KPKc(??) at 0x1001f23d0
_ZNK4node13native_module18NativeModuleLoader9GetSourceEPN2v87IsolateEPKc(??, ??, ??) at 0x100875330
_ZN4node13native_module18NativeModuleLoader16LookupAndCompileEN2v85LocalINS2_7ContextEEEPKcPSt6vectorINS3_INS2_6StringEEESaISA_EENS1_21CompilationResultTypeEPNS_11EnvironmentE(??, ??, ??, ??, ??, ??) at 0x100875e98
_ZN4node13native_module18NativeModuleLoader15CompileAsModuleEPNS_11EnvironmentEPKcNS1_21CompilationResultTypeE(??, ??, ??) at 0x1008768f4
_ZN4node13native_module18NativeModuleLoader15CompileFunctionERKN2v820FunctionCallbackInfoINS2_5ValueEEE(??) at 0x100876c28
builtins-api._ZN2v88internal12_GLOBAL__N_119HandleApiCallHelperILb0EEENS0_11MaybeHandleINS0_6ObjectEEEPNS0_7IsolateENS0_6HandleINS0_10HeapObjectEEESA_NS8_INS0_20FunctionTemplateInfoEEENS8_IS4_EENS0_16BuiltinArgumentsE(??, ??, ??, ??, ??, ??, ??) at 0x100048bf0
437f0d7
to
aa0d7fa
this can easily be accommodated in #25083, but needs to stay separate as:
// Allow workers to go live.
setTimeout(() => {
  // process.exit(0);
Maybe remove the commented out line?
@Trott - it was a mistake to comment that out! The abrupt exit is required for the thread interactions to come into play. I have uncommented it now.
So no wonder CI did not report a single related failure. Realistically, I was expecting this test to throw one or two, given that its original run threw up a lot!
Resume Build CI: https://ci.nodejs.org/job/node-test-commit/25439/
c17f254
to
4097ada
The test fails on AIX and Windows.
Sorry, it was a timeout on AIX!
Interesting story from AIX: on a system where the number of CPUs is less than 10 (the number of workers in this test), the already-spawned workers spin for their full allotted time in the JS loop and cause OOM before the main thread gets into action; mostly it was de-scheduled while one or more workers ran. So I want to revisit the
@gireeshpunathil Is that normal for AIX/by design, non-preemptive multithreading?
@addaleax - in terms of multi-tasking, there is nothing abnormal; AIX has a truly preemptive kernel. The difference lies in the way of deciding / allocating the
While no documented evidence exists, all my experiments have shown that:
cat tc.c
#include <unistd.h>
#include <stdio.h>
int main() {
  pid_t pid = fork();
  if (pid == 0) write(1, "child\n", 6);
  else write(1, "parent\n", 7);
}
linux:
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f824a355a10) = 100129
strace: Process 100129 attached
[pid 100128] write(1, "parent\n", 7) = 7
[pid 100129] write(1, "child\n", 6) = 6
aix:
15270132: 100728997: kfork() = 18153586
18153586: kfork() (returning as child ...) = 0
18153586: 23462101: kwrite(1, " c h i l d\n", 6) = 6
18153586: 23462101: kfcntl(1, F_GETFL, 0x20000E0C) = 67110914
18153586: 23462101: kfcntl(2, F_GETFL, 0x2FF22FFC) = 67108865
18153586: 23462101: _exit(0)
15270132: 100728997: kwrite(1, " p a r e n t\n", 7) = 7
the same applies to
In applications of most other languages this does not imply anything: a slight difference in terms of CPU allocation, and who gets slightly ahead. But given that
So in this case one or more workers ate up their remaining slice, leading up to OOM before the main thread got a CPU. So I would like to revert the
@gireeshpunathil Sure, if you think so. It just seemed pretty odd to me that AIX would act like this/that this could OOM.
Execute JS code in worker through same vm context while exiting from the main thread at arbitrary execution points, and make sure that the workers quiesce without crashing.
`worker_threads` are not necessarily the subject of testing, those are used for easy simulation of multi-thread scenarios.
Refs: nodejs#25007
PR-URL: nodejs#25085
Reviewed-By: Anna Henningsen <[email protected]>
29c7138
to
2ea2000
landed as 2ea2000
Execute JS code in worker through same vm context while exiting from the main thread at arbitrary execution points, and make sure that the workers quiesce without crashing.
`worker_threads` are not necessarily the subject of testing, those are used for easy simulation of multi-thread scenarios.
Refs: #25007
Checklist
`make -j4 test` (UNIX), or `vcbuild test` (Windows) passes