Parcel 2.3.1, unknown reason segmentation fault on first build on m1 macos #7702

Closed
yw662 opened this issue Feb 11, 2022 · 24 comments

yw662 commented Feb 11, 2022

🐛 bug report

🎛 Configuration (.babelrc, package.json, cli command)

platform: node 17.5.0 aarch64, macos 12.2 apple m1

package.json:

{
  "private": true,
  "source": "src/index.html",
  "scripts": {
    "build": "rm -rf dist/ && parcel build --no-source-maps --log-level verbose",
    "dev": "parcel --port 3000",
    "start": "serve dist"
  },
  "devDependencies": {
    "@parcel/transformer-sass": "latest",
    "parcel": "latest",
    "serve": "latest"
  }
}

no other config files.

cli command: npx parcel --version; rm -rf .parcel-cache; npm run build

output:

2.3.1

> build
> rm -rf dist/ && parcel build --no-source-maps --log-level verbose

✨ Built in 758ms

dist/index.html              2.62 KB    156ms
dist/favicon.4397b7fe.svg      787 B    147ms
dist/index.eb21fdc4.css      3.18 KB    310ms
dist/index.35aa3a8b.js       6.25 KB    113ms
sh: line 1: 10246 Segmentation fault: 11  parcel build --no-source-maps --log-level verbose

It seems to generate the correct result though.

🤔 Expected Behavior

There should be no segmentation fault.

😯 Current Behavior

It crashes with a segfault, although the generated results look fine.

💁 Possible Solution

idk

🔦 Context

It does not seem to affect me though.

💻 Code Sample

It is a simple demo site with plain html + sass + vanilla typescript, no extra dependencies as you can see.

🌍 Your Environment

Software            Version(s)
Parcel              2.3.1
Node                17.5.0 aarch64
npm/Yarn            8.5.0
Operating System    macos 12.2 on apple m1

yw662 commented Feb 11, 2022

Cannot reproduce on parcel 2.3.1 @ node 17.5.0 x86-64 @ linux 5.16.
Seems to be macOS- or aarch64-related.

yw662 commented Feb 11, 2022

Cannot reproduce on parcel 2.2.1 @ node 17.5.0 aarch64 macos m1.

@yw662 yw662 changed the title Parcel 2.3.1, unknown reason segmentation fault on first build Parcel 2.3.1, unknown reason segmentation fault on first build on macos m1 Feb 11, 2022
@yw662 yw662 changed the title Parcel 2.3.1, unknown reason segmentation fault on first build on macos m1 Parcel 2.3.1, unknown reason segmentation fault on first build on m1 macos Feb 11, 2022
@cynthiateeters

I am having the same problem, but I am able to build by adding the --no-optimize flag to my build.
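
For reference, that amounts to something like this (adapting the scripts from the original report; illustrative only):

"scripts": {
  "build": "rm -rf dist/ && parcel build --no-source-maps --no-optimize --log-level verbose"
}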

@SuttonJack

Try adding PARCEL_WORKERS=0 to your commands,

"scripts": {
  "build": "rm -rf dist/ && PARCEL_WORKERS=0 parcel build --no-source-maps --log-level verbose",
  "dev": "PARCEL_WORKERS=0 parcel --port 3000",
  "start": "serve dist"
 }

@cynthiateeters

Thanks @SuttonJack, this worked for me. Is PARCEL_WORKERS=0 discussed in the documentation?

artnez commented Feb 13, 2022

I've been dealing with this issue as well. Here is a minimal repro case for testing:
https://github.com/artnez/parcel-segfault-repro

$ sw_vers
ProductName:	macOS
ProductVersion:	12.2.1
BuildVersion:	21D62

$ node --version
v17.5.0

$ npx parcel --version
2.3.1

It definitely has something to do with multithreading, because PARCEL_WORKER_BACKEND=process (switching to subprocess workers) fixes it. The trace points the same way:

PID 69840 received SIGSEGV for address: 0xb4f8
0   segfault-handler.node               0x00000001035725f8 _ZL16segfault_handleriP9__siginfoPv + 252
1   libsystem_platform.dylib            0x00000001a22144e4 _sigtramp + 56
2   node                                0x000000010129c160 _ZN2v811HandleScopeC1EPNS_7IsolateE + 20
3   node                                0x000000010129c160 _ZN2v811HandleScopeC1EPNS_7IsolateE + 20
4   node.abi102.glibc.node              0x0000000103d70018 _ZN3Nan11AsyncWorker12WorkCompleteEv + 36
5   node.abi102.glibc.node              0x0000000103d70388 _ZN3Nan20AsyncExecuteCompleteEP9uv_work_si + 32
6   libuv.1.dylib                       0x000000010346b8c0 uv__work_done + 192
7   libuv.1.dylib                       0x000000010346ec38 uv__async_io + 320
8   libuv.1.dylib                       0x000000010347e458 uv__io_poll + 1592
9   libuv.1.dylib                       0x000000010346f058 uv_run + 320
10  node                                0x00000001011dd17c _ZN4node6worker16WorkerThreadDataD2Ev + 212
11  node                                0x00000001011dc914 _ZN4node6worker6Worker3RunEv + 1316
12  node                                0x00000001011dedd0 _ZZN4node6worker6Worker11StartThreadERKN2v820FunctionCallbackInfoINS2_5ValueEEEEN3$_38__invokeEPv + 56
13  libsystem_pthread.dylib             0x00000001a21fd240 _pthread_start + 148
14  libsystem_pthread.dylib             0x00000001a21f8024 thread_start + 8
Segmentation fault: 11

artnez commented Feb 13, 2022

PARCEL_WORKER_BACKEND=process

If your project is large you might have better luck with PARCEL_WORKER_BACKEND=process, so that you still get some multiprocessing; PARCEL_WORKERS=0 will probably do everything serially.
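
For example, mirroring the scripts shown earlier (illustrative only; same project layout assumed):

"scripts": {
  "build": "rm -rf dist/ && PARCEL_WORKER_BACKEND=process parcel build --no-source-maps --log-level verbose",
  "dev": "PARCEL_WORKER_BACKEND=process parcel --port 3000",
  "start": "serve dist"
}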

LekoArts commented Feb 14, 2022

I'm also seeing this on my M1 MacBook Pro (but other colleagues are not on an M1 yet) and here's a reproduction: https://github.com/LekoArts/parcel-segfault-repro

It also has a segfault log from segfault-handler:

PID 28527 received SIGSEGV for address: 0xb428
0   segfault-handler.node               0x000000010cecd458 _ZL16segfault_handleriP9__siginfoPv + 272
1   libsystem_platform.dylib            0x00000001bc2204e4 _sigtramp + 56
2   node                                0x00000001005e5250 _ZN2v811HandleScope10InitializeEPNS_7IsolateE + 40
3   node                                0x00000001005e531c _ZN2v811HandleScopeC1EPNS_7IsolateE + 20
4   node.abi93.glibc.node               0x000000010e657b74 _ZN3Nan11AsyncWorker12WorkCompleteEv + 36
5   node.abi93.glibc.node               0x000000010e657ee4 _ZN3Nan20AsyncExecuteCompleteEP9uv_work_si + 32
6   node                                0x0000000100cff144 uv__work_done + 192
7   node                                0x0000000100d028a4 uv__async_io + 320
8   node                                0x0000000100d145b8 uv__io_poll + 1052
9   node                                0x0000000100d02d34 uv_run + 380
10  node                                0x000000010052ac48 _ZN4node6worker16WorkerThreadDataD2Ev + 204
11  node                                0x00000001005279a8 _ZN4node6worker6Worker3RunEv + 684
12  node                                0x000000010052acfc _ZZN4node6worker6Worker11StartThreadERKN2v820FunctionCallbackInfoINS2_5ValueEEEEN3$_38__invokeEPv + 56
13  libsystem_pthread.dylib             0x00000001bc209240 _pthread_start + 148
14  libsystem_pthread.dylib             0x00000001bc204024 thread_start + 8

Edit: I changed the repro to only use the Parcel JS API, not the whole Gatsby process. The old repro is still accessible on the gatsby-version branch.

artnez commented Feb 16, 2022

It looks like pinning lmdb to a lower version fixes the problem (per #7720). I pinned it to 2.0.2 in my repro repository above and everything is working again. The linked thread suggests 2.1.7, but I haven't tried that yet.

cc @kriszyp since this thread has traces.
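
For anyone who wants to try the same pin, one way to force a transitive lmdb version is an overrides entry in package.json (a sketch: assumes npm 8.3+, yarn users would use "resolutions" instead, and 2.0.2 is just the version I happened to test):

{
  "overrides": {
    "lmdb": "2.0.2"
  }
}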

artnez commented Feb 17, 2022

@kriszyp I did some more testing in my repro repository (https://github.com/artnez/parcel-segfault-repro). Reverting to 2.1.7 does indeed fix the problem.

Next, I npm linked your repo into the repro above and used git bisect to track down the problematic commit. It was this one: kriszyp/lmdb-js@3158415
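
For anyone repeating this, the npm link + git bisect combination looks roughly like the following (a sketch; the exact rebuild step for lmdb-js's native module may differ, and it assumes a v2.1.7 tag exists in the repo):

$ git clone https://github.com/kriszyp/lmdb-js && cd lmdb-js
$ npm install && npm link
$ cd ../parcel-segfault-repro && npm link lmdb && cd ../lmdb-js
$ git bisect start
$ git bisect bad            # current HEAD segfaults
$ git bisect good v2.1.7    # last known-good release
$ # at each step: rebuild the native module, re-run the repro build,
$ # then mark `git bisect good` or `git bisect bad` until the offending commit is printed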

artnez commented Feb 17, 2022

@kriszyp One more update. Apologies for doing it here, but I'm in a hurry. I was able to work around the issue by hardcoding overlappingSync to false. This option seems to be the cause of the segfault, and turning it off makes everything work again.

$ git diff
diff --git a/open.js b/open.js
index cd761caec..480ec8e0d 100644
--- a/open.js
+++ b/open.js
@@ -49,7 +49,7 @@ export function open(path, options) {
                remapChunks,
                keyBytes,
                pageSize: 4096,
-               overlappingSync: (options && (options.noSync || options.readOnly)) ? false : os != 'win32',
+               overlappingSync: false,
                // default map size limit of 4 exabytes when using remapChunks, since it is not preallocated and we can
                // make it super huge.
                mapSize: remapChunks ? 0x10000000000000 :

kriszyp commented Feb 17, 2022

@artnez Thank you for the great bisection, really appreciate it! I haven't been able to reproduce this with https://github.com/valkum/parcel-segfault-test yet (I get errors regardless of which lmdb-js version I use). Maybe I will have more luck with yours. BTW, you didn't happen to try the latest master to see if it works, did you? (There are a couple more fixes to improve memory safety that I haven't published yet, because I was never able to reproduce anything.)

artnez commented Feb 17, 2022

I just tested with kriszyp/lmdb-js@544b3fd and I was still getting the segfault. My test environment is an M1 Mac (ARM64) on macOS Monterey 12.2.1.

LekoArts commented Feb 17, 2022

@yw662 You might want to change the issue title; I'm also seeing this in Gatsby's CI, which doesn't run on an M1 Mac.

valkum commented Feb 17, 2022

Per #7720, I'm seeing this on an x64 system running Linux.

kriszyp commented Feb 17, 2022

I have been able to reproduce this now (with @artnez's repo) and am debugging it, so hopefully I'm narrowing in on a cause/fix.

yw662 commented Feb 17, 2022

@LekoArts I am still not seeing it on x64 linux though. It may or may not be the same issue.

@maiya-22

"scripts": {
  "build": "rm -rf dist/ && PARCEL_WORKERS=0 parcel build --no-source-maps --log-level verbose",
 }

I was having errors deploying to Netlify, and this script (via @SuttonJack) seems to have fixed it.

kriszyp added a commit to kriszyp/lmdb-js that referenced this issue Feb 22, 2022
… deeper NodeJS worker thread termination issues can be solved, parcel-bundler/parcel#7702
@LekoArts

@kriszyp Hi! We've tried out lmdb@2.2.2 in our use of Parcel inside Gatsby and I'm not seeing any segfaults anymore :)

kriszyp commented Feb 22, 2022

@LekoArts that's great to hear, and yes, tl;dr, hopefully v2.2.2 addresses this issue for now. For a little longer story...

lmdb-js v2 introduced a faster mechanism for committing transactions: commits are written and execution proceeds, while the OS-cached data is flushed to disk asynchronously and a later event indicates when that has completed. Other users found this to be extremely performant and effective, so it was turned on by default in v2.2. However, this is when these segfaults started occurring in parcel.

Initially I had assumed there must be some memory-handling fault in this new async flush mechanism that was corrupting memory and leading to these segfaults. Many rabbit trails spent verifying the memory handling before the segfault showed no problems: everything was solid. Eventually I realized that there was no prior memory corruption; the error was occurring exactly where the reported stack trace (the one you/LekoArts posted) said it was occurring 😱!

This stack trace shows that the segfault occurs while creating a V8 handle scope. Why would that segfault? This goes deep into how NodeJS handles async tasks in worker threads. When a write transaction in LMDB is completed, a second task goes into NodeJS/libuv's task queue to flush the disk. In the meantime, since the transaction is committed, parcel can (rightly) declare the job done and ask to terminate the threads.

Thread termination is a pretty perilous and complicated action, though. It is not like terminating a process, where the OS knows exactly what the process owns and can automatically clean it all up; thread termination requires application-level cooperation, and in the case of NodeJS there is a specific procedure for what it will stop doing and what it won't. NodeJS's conception of thread termination is that it will finish executing its current JS task(s), then end and free the V8 isolate associated with the worker thread, but it does not wait for pending tasks in the task queue to finish. Those queued tasks still continue to execute, since they are part of libuv's shared worker pool. Consequently, when one of these tasks completes (specifically the disk-flush task), it queues up its completion callback to execute, but that completion (JS) callback is set to run against a V8 isolate that no longer exists (it has been freed), which leads to the segmentation fault. This seems like a NAN bug, in that it attempts to call the callback regardless of the associated isolate's state.

So what can be done about this? The most direct solution is to override the NAN functions so the callback is not called when the worker thread has been terminated (there is also a persistent handle that has to be nulled out), and this does actually seem to prevent the segfault in the provided test case. However, this solution does not seem to be foolproof: if the task runs long enough, not only does it extend beyond the life of the V8 isolate, but the thread-termination procedure that shuts down the libuv event loop will sometimes crash, reporting that there are open libuv handles. More research is needed, but NAN's async tasks just don't seem capable of working well with thread termination. For the next lmdb-js release, I have been working on porting all the code from NAN to NAPI (which has a more stable API and requires distributing far fewer binaries), and that seems like an appropriate place to replace the NAN async tasks with direct NAPI-based async tasks that hopefully work better.

As for the v2.2.x line, I have simply turned the new overlapping sync option off by default in worker threads. This is a temporary measure; I certainly hope to fully enable this by default in the future, but only after ensuring that the async tasks can really work reliably in conjunction with thread termination.
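
To make the termination race a bit more concrete, here is a rough pure-JS sketch of the shape of the problem (illustrative only: plain Node drops the orphaned callback safely, whereas the NAN-based native completion callback still fires against the freed isolate; file name and temp paths are made up):

// race-sketch.js
const { Worker, isMainThread, parentPort } = require('worker_threads');
const fs = require('fs');
const os = require('os');
const path = require('path');

if (isMainThread) {
  const worker = new Worker(__filename);
  worker.on('message', (msg) => {
    if (msg === 'committed') {
      // The "job" looks finished, so tear the worker down immediately --
      // analogous to parcel shutting down its workers once the build is done.
      worker.terminate();
    }
  });
} else {
  const dir = os.tmpdir();
  // Stand-in for the synchronous part of the commit...
  fs.writeFileSync(path.join(dir, 'commit.tmp'), 'commit');
  parentPort.postMessage('committed');
  // ...and for the asynchronous background flush, whose completion callback is
  // scheduled to run after the worker's isolate may already have been freed.
  fs.writeFile(path.join(dir, 'flush.tmp'), 'flush', () => {
    console.log('flush completed'); // may never run if the worker was terminated first
  });
}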

@devongovett

Wow, what a great find. Thanks for researching and debugging that, @kriszyp! 😍

@devongovett

Going to close this issue since it appears to be fixed by newer lmdb. If you are still seeing it, make sure you update to lmdb 2.2.2 in your lock file.
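
If you are not sure what your lock file resolved to, something along these lines usually does it (assuming the dependency ranges already allow 2.2.2):

$ npm ls lmdb       # check which lmdb version is currently resolved
$ npm update lmdb   # refresh the transitive dependency within the allowed range
$ npm ls lmdb       # confirm it now resolves to 2.2.2 or later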

yw662 commented Feb 27, 2022

I am no longer seeing this issue.

Sambuxc commented Nov 28, 2023

For me, removing the .parcel-cache directory worked.
