Segmentation fault of node 10.8.0 x64 on Fedora 27 when running from PATH #22319

Closed
rocketraman opened this issue Aug 14, 2018 · 24 comments

@rocketraman

rocketraman commented Aug 14, 2018

  • Version:
node --version
v10.8.0
  • Platform:
$ cat /etc/os-release
NAME=Fedora
VERSION="27 (Twenty Seven)"
ID=fedora
VERSION_ID=27
PRETTY_NAME="Fedora 27 (Twenty Seven)"
ANSI_COLOR="0;34"
CPE_NAME="cpe:/o:fedoraproject:fedora:27"
HOME_URL="https://fedoraproject.org/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=27
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=27
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
$ uname -a
Linux edison 4.17.12-100.fc27.x86_64 #1 SMP Fri Aug 3 15:00:33 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem: Node

I get a segfault just by running node -- no arguments, just trying to get an interactive shell:

$ type node
node is /opt/node-v10.8.0-linux-x64/bin/node

$ node
fish: “node” terminated by signal SIGSEGV (Address boundary error)

Here is the output from coredumpctl:

           PID: 91027 (node)
           UID: 1000 (raman)
           GID: 1000 (raman)
        Signal: 11 (SEGV)
     Timestamp: Tue 2018-08-14 12:06:28 EDT (5s ago)
  Command Line: node
    Executable: /opt/node-v10.8.0-linux-x64/bin/node
 Control Group: /user.slice/user-1000.slice/session-2.scope
          Unit: session-2.scope
         Slice: user-1000.slice
       Session: 2
     Owner UID: 1000 (raman)
       Boot ID: 6d5855baaf2c4f51ad2472a1a4631ae8
    Machine ID: ced7a5bf77e545c28c0ba7b313e98ca1
      Hostname: edison
       Storage: /var/lib/systemd/coredump/core.node.1000.6d5855baaf2c4f51ad2472a1a4631ae8.91027.1534262788000000.lz4
       Message: Process 91027 (node) of user 1000 dumped core.
                
                Stack trace of thread 91027:
                #0  0x00007ff3938b8c5a _ZN2v816FunctionTemplate3NewEPNS_7IsolateEPFvRKNS_20FunctionCallbackInfoINS_5ValueEEEENS_5LocalIS4_EENSA_INS_9SignatureEEEi (libnode.so)
                #1  0x00007ff393e52481 _ZN4node17ContextifyContext4InitEPNS_11EnvironmentEN2v85LocalINS3_6ObjectEEE (libnode.so)
                #2  0x00007ff393e52443 _ZN4node14InitContextifyEN2v85LocalINS0_6ObjectEEENS1_INS0_5ValueEEENS1_INS0_7ContextEEE (libnode.so)
                #3  0x00000000008baff3 _ZN4nodeL10GetBindingERKN2v820FunctionCallbackInfoINS0_5ValueEEE (node)
                #4  0x0000000000b4ee49 _ZN2v88internal12_GLOBAL__N_119HandleApiCallHelperILb0EEENS0_11MaybeHandleINS0_6ObjectEEEPNS0_7IsolateENS0_6HandleINS0_10HeapObjectEEESA_NS8_INS0_20FunctionTemplateInfoEEENS8_IS4_EENS0_16B
                #5  0x0000000000b4f9b9 _ZN2v88internal21Builtin_HandleApiCallEiPPNS0_6ObjectEPNS0_7IsolateE (node)
                #6  0x00001fb8db2841bd n/a (n/a)
                #7  0x00001fb8db293a09 n/a (n/a)
                #8  0x00001fb8db293a09 n/a (n/a)
                #9  0x00001fb8db290f95 n/a (n/a)
                #10 0x00001fb8db28a4a1 n/a (n/a)
                #11 0x0000000000e597f3 _ZN2v88internal9Execution4CallEPNS0_7IsolateENS0_6HandleINS0_6ObjectEEES6_iPS6_ (node)
                #12 0x0000000000ae84b3 _ZN2v88Function4CallENS_5LocalINS_7ContextEEENS1_INS_5ValueEEEiPS5_ (node)
                #13 0x00000000008be828 _ZN4node15LoadEnvironmentEPNS_11EnvironmentE (node)
                #14 0x00000000008c2c96 _ZN4node5StartEPN2v87IsolateEPNS_11IsolateDataEiPKPKciS8_ (node)
                #15 0x00000000008c191f _ZN4node5StartEiPPc (node)
                #16 0x00007ff396579f2a __libc_start_main (libc.so.6)
                #17 0x000000000087dbe5 _start (node)

This is running node directly from the tar.xz download on nodejs.org (sha256 sum verified).

Node 8 and 9 work without any issues.

@rocketraman
Author

rocketraman commented Aug 14, 2018

Just tried node-v10.8.1-nightly2018081382830a809b-linux-x64, and it works fine. So it seems this issue was already fixed somewhere along the way.

@rocketraman
Author

Update, ok this is super-strange... never mind my last comment, the nightly has the same issue. It appears that the problem happens only when node is executed via a directory in the PATH:

Works fine when running node with the full path specified:

$ strace -e trace=file /opt/node-v10.8.0-linux-x64/bin/node
execve("/opt/node-v10.8.0-linux-x64/bin/node", ["/opt/node-v10.8.0-linux-x64/bin/"...], 0x7ffcae816b30 /* 125 vars */) = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/opt/node-v10.8.0-linux-x64/bin/node", O_RDONLY|O_CLOEXEC) = 9
openat(AT_FDCWD, "/dev/urandom", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 9
readlink("/proc/self/exe", "/opt/node-v10.8.0-linux-x64/bin/"..., 8191) = 36
readlink("/proc/self/fd/1", "/dev/pts/16", 255) = 11
stat("/dev/pts/16", {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 16), ...}) = 0
openat(AT_FDCWD, "/dev/pts/16", O_RDWR|O_CLOEXEC) = 9
openat(AT_FDCWD, "/dev/null", O_RDONLY|O_CLOEXEC) = 10
readlink("/proc/self/fd/2", "/dev/pts/16", 255) = 11
stat("/dev/pts/16", {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 16), ...}) = 0
openat(AT_FDCWD, "/dev/pts/16", O_RDWR|O_CLOEXEC) = 11
getcwd("/home/raman/tmp", 4096)         = 16
getcwd("/home/raman/tmp", 4096)         = 16
readlink("/proc/self/fd/0", "/dev/pts/16", 255) = 11
stat("/dev/pts/16", {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 16), ...}) = 0
openat(AT_FDCWD, "/dev/pts/16", O_RDWR|O_CLOEXEC) = 12
> +++ exited with 0 +++

But borks when running the same binary from the PATH:

$ strace -e trace=file node
execve("/opt/node-v10.8.0-linux-x64/bin/node", ["node"], 0x7ffd6ef99d30 /* 125 vars */) = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 9
openat(AT_FDCWD, "/lib64/libnode.so", O_RDONLY|O_CLOEXEC) = 9
openat(AT_FDCWD, "/dev/urandom", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 9
readlink("/proc/self/exe", "/opt/node-v10.8.0-linux-x64/bin/"..., 8191) = 36
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x25108dda45} ---
+++ killed by SIGSEGV (core dumped) +++
fish: “strace -e trace=file node” terminated by signal SIGSEGV (Address boundary error)

Also, it works fine as root, but not as an unprivileged user. SELinux does not appear to be the culprit.

@rocketraman rocketraman changed the title Segmentation fault of node 10.8.0 x64 on Fedora 27 Segmentation fault of node 10.8.0 x64 on Fedora 27 when running from PATH Aug 14, 2018
@richardlau
Member

               Stack trace of thread 91027:
               #0  0x00007ff3938b8c5a _ZN2v816FunctionTemplate3NewEPNS_7IsolateEPFvRKNS_20FunctionCallbackInfoINS_5ValueEEEENS_5LocalIS4_EENSA_INS_9SignatureEEEi (libnode.so)

This is running node directly from the tar.xz download on nodejs.org (sha256 sum verified).

The tar.xz download from nodejs.org neither contains nor links to libnode.so, so this looks like interference from another Node.js installation.

@rocketraman
Author

The tar.xz download from nodejs.org neither contains nor links to libnode.so, so this looks like interference from another Node.js installation.

Ah, interesting. Do you have any idea why this might be happening? As you can see from the strace the right node binary is being executed. I confirm it does not link to libnode.so:

# ldd /opt/node-v10.8.0-linux-x64/bin/node
        linux-vdso.so.1 (0x00007fff845e3000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f573a53d000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f573a335000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f5739fad000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f5739c62000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f5739a4b000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f573982d000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f5739477000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f573a741000)

And this issue does not happen with either node 8 or 9, which are installed in exactly the same way as 10.

@rocketraman
Author

Attached is the output of LD_DEBUG=all node. It contains the following:

    120607:     file=libnode.so [0];  dynamically loaded by node [0]
    120607:     find library=libnode.so [0]; searching
    120607:      search cache=/etc/ld.so.cache
    120607:       trying file=/lib64/libnode.so

ld-debug.txt.gz
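
For anyone who wants to pull just those lines out of a large LD_DEBUG dump, here is a minimal sketch. It assumes glibc's LD_DEBUG line format as shown in the excerpt above; the function name dynamically_loaded is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: reads glibc LD_DEBUG output on stdin and prints each
# library that was loaded at run time (reported as "dynamically loaded by")
# rather than listed as a link-time dependency.
dynamically_loaded() {
  sed -n 's/.*file=\([^ ]*\) \[[0-9]*]; *dynamically loaded by.*/\1/p' | sort -u
}

# Usage (LD_DEBUG writes to stderr, so redirect it into the pipe):
#   LD_DEBUG=files node -e 0 2>&1 | dynamically_loaded
```

LD_DEBUG=files should be enough to show these lines and is far less noisy than LD_DEBUG=all.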

@rocketraman
Author

I found out that libnode.so was owned by atom-libs, which appears to have been a leftover package since I uninstalled atom a while ago. Removing this package has fixed the problem, but I still find it odd that node 10 attempts to dynamically load this shared lib, which node 8 and 9 did not, and only in some cases such as not running it with an absolute path. While this is solved from my perspective, it certainly seems like this could be some bad logic in node 10.

@richardlau
Member

At this point I'd cc @nodejs/platform-linux but we don't appear to have such a group 😞 .

@hmt

hmt commented Sep 20, 2018

On Fedora 28 I had the same issue. I have no clue where my libnode.so came from. But thanks for giving the full debug logs. Learnt something new about strace and coredumps. 👍

@rocketraman
Author

@hmt You can try

rpm -q --whatprovides /path/to/your/libnode.so

The path you can get from the output of LD_DEBUG=all node.

That might tell you something useful about which rpm it came from, and if it isn't a package on your system then it was probably installed manually at some point.

I still think that something is wrong with node's internals though.

@Trott
Member

Trott commented Nov 21, 2018

@nodejs/build Any thoughts/insights on this one?

@rvagg
Member

rvagg commented Nov 22, 2018

Nope, and I agree, Node shouldn't be trying to load libnode if it's a stand-alone build. I'm trying to find something in the commit list for 10 that is not in 8 and might point to the change responsible, but nothing is standing out. There has been a little bit of work to make Node compile as a shared library, so perhaps it's related to that. 🤷‍♂️ @bnoordhuis any thoughts on where to look for this one?

@Trott
Member

Trott commented Nov 26, 2018

If it's specific to Fedora and/or other Red Hat OSes, maybe @danbev or @lance might have an idea?

@danbev
Contributor

danbev commented Nov 27, 2018

If it's specific to Fedora and/or other Red Hat OSes, maybe @danbev or @lance might have an idea?

Nothing I can think of without having a closer look I'm afraid. If I have time I'll try to take a look later this week but it might not be until next week as this is a short week for me (not working Friday).

@richardlau
Member

#25733 looks similar and was reported on Ubuntu.

@gireeshpunathil
Member

@rocketraman - is it still reproducible?

IMO this has something to do with the dynamic linker/loader's symbol lookup being overridden by some system settings.

excerpts from the dump:

    120607:	relocation processing: /lib64/libnode.so
    120607:	symbol=_register_async_wrap;  lookup in file=node [0]
    120607:	symbol=_register_async_wrap;  lookup in file=/lib64/libdl.so.2 [0]
    120607:	symbol=_register_async_wrap;  lookup in file=/lib64/librt.so.1 [0]
    120607:	symbol=_register_async_wrap;  lookup in file=/lib64/libstdc++.so.6 [0]
    120607:	symbol=_register_async_wrap;  lookup in file=/lib64/libm.so.6 [0]
    120607:	symbol=_register_async_wrap;  lookup in file=/lib64/libgcc_s.so.1 [0]
    120607:	symbol=_register_async_wrap;  lookup in file=/lib64/libpthread.so.0 [0]
    120607:	symbol=_register_async_wrap;  lookup in file=/lib64/libc.so.6 [0]
    120607:	symbol=_register_async_wrap;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
    120607:	symbol=_register_async_wrap;  lookup in file=/lib64/libnode.so [0]
    120607:	binding file /lib64/libnode.so [0] to /lib64/libnode.so [0]: normal symbol `_register_async_wrap'

This would mean that the symbol _register_async_wrap, despite being present (statically bound) in node, was superseded by the one in libnode.so!

excerpts from strace:

openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 9
openat(AT_FDCWD, "/lib64/libnode.so", O_RDONLY|O_CLOEXEC) = 9

The library was loaded (soon?) after the ld cache was consulted, which suggests it was selected through some system (ld) configuration?

is there any output from

$ ldconfig -p | grep node

is there any output from

$ env | grep LD_

@rocketraman
Author

rocketraman commented Jan 8, 2020

@rocketraman - is it still reproducible?

@gireeshpunathil Yes, easily reproducible.

is there any output from

$ ldconfig -p | grep node
$ env | grep LD_

$ ldconfig -p | grep node
        libnode.so.64 (libc6,x86-64) => /lib64/libnode.so.64
        libnode.so (libc6,x86-64) => /lib64/libnode.so

$ env | grep LD_
<no output>

As I noted in comment #22319 (comment) the latter of those two comes from atom-libs:

# rpm -q --whatprovides /lib64/libnode.so
atom-libs-1.3.0-1.fc21.x86_64

and when that is removed, the problem goes away.

As also mentioned previously, when node is run as node it crashes with SIGSEGV, but if run with the absolute path, it works fine.
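
The two checks above (what is in the ld cache, and which package owns it) can be chained together. This is only a sketch: cached_lib_paths is a hypothetical helper name, and it assumes the `ldconfig -p` line format shown in the output above.

```shell
# Hypothetical helper: prints the resolved path of every ld.so.cache entry
# whose soname begins with the given prefix.
# Expects `ldconfig -p` output on stdin.
cached_lib_paths() {
  awk -v prefix="$1" '
    index($1, prefix) == 1 && / => / { print $NF }
  '
}

# Usage: list cached libnode entries, then ask rpm which package owns each.
#   ldconfig -p | cached_lib_paths libnode.so | while read -r path; do
#     rpm -q --whatprovides "$path"
#   done
```

On a system that is not rpm-based, the rpm query would need to be swapped for the local package manager's equivalent.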

@gireeshpunathil
Member

@rocketraman - I tried the same steps, but could not reproduce this issue. At the same time, I am unable to mimic the setup that resulted in your output of ldconfig -p | grep node - I installed atom using dnf install. Do you know what system configuration resulted in libnode.so being specially treated by ldconfig?

@rocketraman
Author

@gireeshpunathil I installed atom a long time ago, and I found the rpm for atom-libs which provides that extra libnode.so in my local dnf repo. Here it is:

atom-libs.zip

Give that a shot?

@gireeshpunathil
Member

ah! that did the trick! thanks.

[root@cd2735ca52b7 foo]# node
Segmentation fault
[root@cd2735ca52b7 foo]# 
[root@cd2735ca52b7 foo]# ldconfig -p | grep node
	libnode.so (libc6,x86-64) => /lib64/libnode.so
[root@cd2735ca52b7 foo]# 

same as your case!

I will debug this!

@gireeshpunathil
Member

ok, further insights:

[root@cd2735ca52b7 foo]# rpm -qp --scripts ./atom-libs-1.3.0-1.fc21.x86_64.rpm 
warning: ./atom-libs-1.3.0-1.fc21.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID f4928260: NOKEY
postinstall program: /sbin/ldconfig
postuninstall program: /sbin/ldconfig
[root@cd2735ca52b7 foo]# 

This rpm indeed runs ldconfig, which registers libnode.so in the dynamic linker's run-time bindings cache, effectively making the library's symbols available for lookup by executables built with the runtime linking option (-lrt).

If the object reference (node) does not contain slashes in it, the dynamic linker/loader searches the cache file /etc/ld.so.cache for libraries to be used for symbol lookup that is required in the said object. In this case, the cache was populated by ldconfig.

Up to node v11.6.0, node is built with the -lrt option:

[root@cd2735ca52b7 foo]# ldd /foo/node-v11.6.0-linux-x64/bin/node
	linux-vdso.so.1 (0x00007ffe64b2d000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f9ba9764000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f9ba955c000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f9ba91d4000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f9ba8e89000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f9ba8c72000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f9ba8a54000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f9ba869e000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f9ba9968000)
[root@cd2735ca52b7 foo]# 

pay special attention to librt.so.1.

So this explains:

  • why the issue does not occur with node v11.7.0 and above (specifically with commit 03ec4cea30)
  • why the issue does not occur with full path reference to node (slashes in the path)

Hope this helps.
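
To check whether a particular node binary falls in the affected range by this criterion, one could test its ldd output for librt. A rough sketch follows; links_against is a hypothetical name, and it only inspects the first column of the ldd output format shown above.

```shell
# Hypothetical helper: succeeds (exit status 0) if the ldd output on stdin
# lists a library whose name begins with the given prefix.
links_against() {
  awk -v lib="$1" 'index($1, lib) == 1 { found = 1 } END { exit !found }'
}

# Usage:
#   ldd /path/to/node | links_against librt.so && echo "links librt"
```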

@gireeshpunathil
Member

@rocketraman - any concerns / observations on the above finding? I am closing this as resolved, feel free to re-open if you think there is something outstanding.

@rocketraman
Author

Thanks @gireeshpunathil -- note that node 8 and 9 also did not suffer from this issue, so it looks like the librt stuff was added in 10, and then removed again in 11.7? I can't pretend to understand all the intricacies of this, but it sounds like the behavior has been explained, and good workarounds are available, so I'm good with closing it. Thanks!

@7xt

7xt commented Mar 26, 2020

Can someone explain this issue in layman's terms? I'm new to nodejs.

@rocketraman
Author

rocketraman commented Mar 26, 2020

@od2 Basically node 10 to 11.7 had an alternate library loading mechanism (via librt) which was broken in certain situations, which includes loading an external (and incorrect) libnode when running node from the PATH (i.e. having no slashes in the executable name).

To "solve" the problem do one of the following:

  • use a node version before 10 or after 11.7,
  • run node with an absolute path i.e. /path/to/node instead of just node, or
  • delete the external libnode.so
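
The second workaround can be scripted so the shell never launches node by its bare name. This is a minimal sketch, not a tested fix: resolve_on_path is a hypothetical helper, and it relies on the observation in this thread that invoking node through a path containing slashes avoids the crash.

```shell
# Hypothetical helper: resolves a command name to the absolute path of its
# executable on PATH. Prints nothing and returns non-zero if the name is not
# found or does not resolve to a file path.
resolve_on_path() {
  resolved=$(command -v "$1") || return 1
  case "$resolved" in
    /*) printf '%s\n' "$resolved" ;;
    *)  return 1 ;;   # alias, function, or builtin: not a file path
  esac
}

# Usage: run node by absolute path instead of by bare name.
#   node_bin=$(resolve_on_path node) && "$node_bin" --version
```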
