Parcel 2.3.1: segmentation fault for unknown reason on first build on M1 macOS #7702
Not able to reproduce on parcel 2.3.1 with node 17.5.0, x86-64, linux 5.16.
Not able to reproduce on parcel 2.2.1 with node 17.5.0, aarch64, macOS M1.
I am having the same problem but am able to build using the --no-optimize flag on my build.
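For reference, a minimal invocation with that flag might look like this (the entry file path is illustrative, not from the original comment):

```sh
# Skip the optimizer/minifier step, which avoids the crashing code path
npx parcel build src/index.html --no-optimize
```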
Try adding the following to your package.json:

```json
"scripts": {
  "build": "rm -rf dist/ && PARCEL_WORKERS=0 parcel build --no-source-maps --log-level verbose",
  "dev": "PARCEL_WORKERS=0 parcel --port 3000",
  "start": "serve dist"
}
```
Thanks. This worked for me.
I've been dealing with this issue as well. Here is a minimal repro case for testing:
It definitely has something to do with multithreading.
If your project is large you might have better luck with
I'm also seeing this on my M1 MacBook Pro (but other colleagues are not on an M1 yet), and here's a reproduction: https://github.com/LekoArts/parcel-segfault-repro

It also includes a segfault log:

```
PID 28527 received SIGSEGV for address: 0xb428
0   segfault-handler.node      0x000000010cecd458 _ZL16segfault_handleriP9__siginfoPv + 272
1   libsystem_platform.dylib   0x00000001bc2204e4 _sigtramp + 56
2   node                       0x00000001005e5250 _ZN2v811HandleScope10InitializeEPNS_7IsolateE + 40
3   node                       0x00000001005e531c _ZN2v811HandleScopeC1EPNS_7IsolateE + 20
4   node.abi93.glibc.node      0x000000010e657b74 _ZN3Nan11AsyncWorker12WorkCompleteEv + 36
5   node.abi93.glibc.node      0x000000010e657ee4 _ZN3Nan20AsyncExecuteCompleteEP9uv_work_si + 32
6   node                       0x0000000100cff144 uv__work_done + 192
7   node                       0x0000000100d028a4 uv__async_io + 320
8   node                       0x0000000100d145b8 uv__io_poll + 1052
9   node                       0x0000000100d02d34 uv_run + 380
10  node                       0x000000010052ac48 _ZN4node6worker16WorkerThreadDataD2Ev + 204
11  node                       0x00000001005279a8 _ZN4node6worker6Worker3RunEv + 684
12  node                       0x000000010052acfc _ZZN4node6worker6Worker11StartThreadERKN2v820FunctionCallbackInfoINS2_5ValueEEEEN3$_38__invokeEPv + 56
13  libsystem_pthread.dylib    0x00000001bc209240 _pthread_start + 148
14  libsystem_pthread.dylib    0x00000001bc204024 thread_start + 8
```

Edit: I changed the repo to only use the Parcel JS API, not the whole Gatsby process. The old repro is still accessible on the gatsby-version branch.
@kriszyp I did some more testing in my repro repository (https://github.com/artnez/parcel-segfault-repro). Reverting to 2.1.7 does indeed fix the problem. Next, I npm linked your repo into the one above and used git bisect to track down the problematic commit. It was this one: kriszyp/lmdb-js@3158415
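For anyone wanting to repeat this kind of dependency bisection, the rough procedure looks something like the sketch below (the version tags and the per-step test command are illustrative, not the exact ones used above):

```sh
# Clone and link a local lmdb-js checkout into the repro project
git clone https://github.com/kriszyp/lmdb-js.git && cd lmdb-js
npm install && npm link

cd ../parcel-segfault-repro
npm link lmdb   # the lmdb-js repo publishes as the "lmdb" package

# Bisect between a known-bad and known-good release,
# rebuilding and re-running the repro at each step
cd ../lmdb-js
git bisect start v2.2.0 v2.1.7   # bad first, then good (tags illustrative)
# at each step: npm install, then run the repro build and mark the result
git bisect good   # or: git bisect bad
```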
@kriszyp One more update. Apologies for doing it here but I'm in a hurry. I was able to work around the issue by hardcoding overlappingSync to false. This option seems to be the cause of the segfault, and turning it off makes everything work again.
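As a sketch of what that workaround looks like from the consumer side (this opens a store directly via lmdb-js as an ES module rather than patching library internals; the path and keys are illustrative, not Parcel's actual cache setup):

```js
// Minimal sketch, assuming lmdb-js v2.x: disable the overlapping-sync
// commit mechanism when opening the environment, instead of hardcoding
// it inside the library as described above.
import { open } from 'lmdb';

const store = open({
  path: './my-cache',       // illustrative path, not Parcel's real cache dir
  overlappingSync: false,   // the option implicated in the segfault
});

await store.put('greeting', { hello: 'world' });
console.log(store.get('greeting'));
await store.close();
```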
@artnez Thank you for the great bisection, really appreciate it! I haven't been able to reproduce this with https://github.com/valkum/parcel-segfault-test yet (I get errors regardless of which lmdb-js version I use). Maybe I will have more luck with yours. BTW, you didn't happen to try the latest master to see if it works, did you? (There were a couple more fixes to improve memory safety that I hadn't published yet, because I was never able to reproduce anything.)
I just tested with kriszyp/lmdb-js@544b3fd and I was still getting the segfault. My test environment is an M1 Mac (ARM64) on macOS Monterey 12.2.1.
@yw662 You might want to change the issue title; I'm also seeing this in Gatsby's CI, which doesn't run on an M1 Mac.
From #7720, this is also being seen on an x64 system running Linux.
I have been able to reproduce this now (with @artnez's repo) and am debugging it, so hopefully I'm narrowing in on a cause/fix.
@LekoArts I am still not seeing it on x64 Linux, though. It may or may not be the same issue.
I was having errors deploying to Netlify, and this script, via @SuttonJack, seems to have fixed it.
@kriszyp Hi! We've tried out the new lmdb version and the segfaults are gone for us.
@LekoArts that's great to hear, and yes, tl;dr: hopefully v2.2.2 addresses this issue for now.

For a little longer story: lmdb-js@v2 introduced a faster mechanism for committing transactions, whereby commits can be written and proceed, OS-cached data is flushed to disk asynchronously, and a later event indicates when this has completed. Other users found this to be extremely performant and effective, so it was turned on by default in v2.2. However, this is when these segfaults started occurring in parcel.

Initially I had assumed there must be some memory-handling fault in this new async flush mechanism that was corrupting memory and leading to these segfaults. Many rabbit trails into verifying the memory handling before the segfault showed no problems; everything was solid. Eventually I realized there was no prior memory corruption: the error was occurring exactly where the reported segfault stack trace (that LekoArts posted above) said it was occurring 😱!

That stack trace shows the segfault occurs while creating a V8 handle scope. Why would that segfault? This goes deep into how NodeJS handles async tasks in worker threads. When a write transaction in LMDB is completed, a second task goes into NodeJS/uv_lib's task queue to flush the disk. In the meantime, since the transaction is committed, parcel can (rightly) declare the job done and ask to terminate the threads.

Thread termination is a perilous and complicated action, though. It is not like terminating a process, where the OS knows exactly what the process owns and can automatically clean it all up; thread termination requires application-level cooperation, and in NodeJS it follows a specific procedure for what it will stop doing and what it won't. NodeJS's conception of thread termination is that it finishes executing the current JS task(s) and then ends and frees the V8 isolate associated with the worker thread, but it does not wait for pending tasks in the task queue to finish. Those tasks still continue to execute, since the queue is part of uv_lib's shared worker pool. Consequently, when one of these tasks completes (specifically the disk-flush task), it queues up its completion callback to execute, but that (JS) callback is set to run against a V8 isolate that no longer exists (it has been freed), which leads to the segmentation fault. This seems like a NAN bug, in that it attempts to call the callback regardless of the associated isolate's state.

So what can be done about this? The most direct solution would be to override the NAN functions to avoid calling the callback when the worker thread has been terminated (there is also a persistent handle that has to be nulled out), and this does actually seem to prevent the segfault in the provided test case. However, this solution does not seem to be foolproof; if the task runs long enough, not only does it extend beyond the life of the V8 isolate, but the thread-termination procedure that shuts down the uv_lib event loop will sometimes crash, reporting that there are still open uv_lib handles. More research is needed, but using NAN's async tasks just doesn't seem capable of working well with thread termination.
However, for the next lmdb-js version, I have been working on porting all the code from NAN to NAPI (which has a more stable API and requires distributing far fewer binaries), and that seems like an appropriate place to replace the NAN async tasks with direct NAPI-based async tasks that will hopefully work better. As for the v2.2.x line, I have simply turned the new overlapping sync option off by default in worker threads. This is a temporary measure; I certainly hope to fully enable it by default in the future, but only after ensuring that the async tasks really can work reliably in conjunction with thread termination.
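To make the described race concrete, here is a minimal, hypothetical sketch of the timing involved. It uses Node core's fs.fsync as a stand-in for lmdb-js's native disk-flush task (Node's own bindings survive termination safely, unlike the NAN path described above); the file name and delay are illustrative:

```js
// Sketch of the race described above, not lmdb-js code.
// A worker schedules async work on libuv's shared thread pool, then the
// main thread terminates it. In the NAN case, the completion callback
// would later fire against the worker's already-freed V8 isolate.
const { Worker, isMainThread } = require('worker_threads');
const fs = require('fs');

if (isMainThread) {
  const worker = new Worker(__filename);
  // Terminate shortly after startup, mimicking parcel ending its worker
  // farm as soon as the transaction has committed.
  worker.on('online', () => setTimeout(() => worker.terminate(), 10));
} else {
  const fd = fs.openSync('flush-test.tmp', 'w');
  // fs.fsync queues work on the shared uv thread pool; its JS callback
  // must run on this worker's isolate, which terminate() tears down.
  fs.fsync(fd, () => console.log('flushed (callback survived termination)'));
}
```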
Wow, what a great find. Thanks for researching and debugging that, @kriszyp! 😍
Going to close this issue since it appears to be fixed by the newer lmdb. If you are still seeing it, make sure your lock file has updated to lmdb 2.2.2.
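If you're unsure which lmdb version your lock file has resolved, something like the following can check and update it (assuming npm; yarn has equivalents):

```sh
# Show which lmdb version(s) are resolved in the dependency tree
npm ls lmdb
# Pull in the patched release within your semver range
npm update lmdb
```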
I am no longer seeing this issue.
For me, removing the .parcel-cache directory worked.
🐛 bug report
🎛 Configuration (.babelrc, package.json, cli command)
platform: node 17.5.0 aarch64, macOS 12.2, Apple M1
package.json:
No other config files.
cli command:

```sh
npx parcel --version; rm -rf .parcel-cache; npm run build
```
output:
It seems to generate the correct result though.
🤔 Expected Behavior
There should be no segmentation fault.
😯 Current Behavior
It crashes with a segfault, although the generated results seem good.
💁 Possible Solution
idk
🔦 Context
It does not seem to affect me though.
💻 Code Sample
It is a simple demo site with plain HTML + Sass + vanilla TypeScript, with no extra dependencies, as you can see.
🌍 Your Environment