Repeated calls to LibGit2.fetch segfaults on FreeBSD 11.1 #23328

Closed
ararslan opened this issue Aug 18, 2017 · 39 comments
Labels
kind:bug, system:freebsd

Comments

@ararslan
Member

I upgraded my FreeBSD box from 11.0 (which is what the FreeBSD CI workers are running) to 11.1, and now I'm consistently getting segfaults in the Pkg tests:

$ JULIA_CPU_CORES=2 JULIA_TEST_MAXRSS_MB=600 ./julia test/runtests.jl pkg
Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
INFO: Initializing package repository /tmp/T22S8Ijy/v0.7
INFO: Cloning METADATA from https://github.com/JuliaLang/METADATA.jl
INFO: No packages to install, update or remove
INFO: Cloning cache of Example from notarealprotocol://github.com/JuliaLang/Example.jl.git
INFO: Cloning cache of Example from https://github.com/JuliaLang/Example.jl.git
INFO: Installing Example v0.4.1
INFO: Package database updated
INFO: Checking out Example master...
INFO: Pulling Example latest master...
INFO: No packages to install, update or remove
INFO: Freeing Example
INFO: No packages to install, update or remove
INFO: Checking out Example master...
INFO: Pulling Example latest master...
INFO: No packages to install, update or remove
INFO: Freeing Example
INFO: No packages to install, update or remove
INFO: Removing Example v0.4.1
INFO: Package database updated
INFO: Package Example is not installed
INFO: Cloning Example from https://github.com/JuliaLang/Example.jl.git
INFO: Computing changes...
INFO: No packages to install, update or remove
INFO: Package database updated
INFO: Freeing Example
INFO: No packages to install, update or remove
INFO: Checking out Example master...
INFO: Pulling Example latest master...
INFO: No packages to install, update or remove
INFO: Freeing Example
INFO: No packages to install, update or remove
INFO: Cloning Example2 from /tmp/T22S8Ijy/v0.7/Example
INFO: Computing changes...
INFO: No packages to install, update or remove
INFO: Cloning Example3 from /tmp/T22S8Ijy/v0.7/Example
INFO: Computing changes...
INFO: No packages to install, update or remove
INFO: Checking out Example2 test-branch-1...
INFO: Pulling Example2 latest test-branch-1...
INFO: No packages to install, update or remove
INFO: Checking out Example3 test-branch-1...
INFO: Pulling Example3 latest test-branch-1...
INFO: No packages to install, update or remove
INFO: Checking out Example master...
INFO: Pulling Example latest master...
INFO: No packages to install, update or remove
INFO: Cloning Example4 from /tmp/T22S8Ijy/v0.7/Example
INFO: Computing changes...
INFO: No packages to install, update or remove
INFO: Checking out Example4 test-branch-2...
INFO: Pulling Example4 latest test-branch-2...
INFO: No packages to install, update or remove
[1]    2356 segmentation fault (core dumped)  JULIA_CPU_CORES=2 JULIA_TEST_MAXRSS_MB=600 ./julia test/runtests.jl pkg

Version info:

julia> versioninfo()
Julia Version 0.7.0-DEV.1383
Commit d126c66a9e* (2017-08-18 16:00 UTC)
Platform Info:
  OS: FreeBSD (x86_64-unknown-freebsd11.1)
  CPU: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, sandybridge)
Environment:

cc @iblis17

@ararslan ararslan added system:freebsd Affects only FreeBSD domain:packages Package management and loading test This change adds or pertains to unit tests labels Aug 18, 2017
@ararslan
Member Author

ararslan commented Aug 19, 2017

Running it in GDB, I'm seeing a lot of DWARF errors regarding "wrong version in compilation unit header," as well as

Program received signal SIGSEGV, Segmentation fault.
0x0000000801571c19 in ?? () /usr/local/lib/gcc5/libgcc_s.so.1

at the end. Full log here: https://gist.github.com/ararslan/a9694acc54da13f633edde6aa7230a59

It also left a core file in the package directory where the tests were running. Examining that with GDB, I get

(gdb) core julia.core
Core was generated by `./julia test/runtests.jl pkg'.
Program was terminated with signal 11, Segmentation fault.
#0  0x0000000803e75907 in ?? ()
(gdb) bt
Cannot access memory at address 0x7fffdfbfdd80

@iblislin
Member

iblislin commented Aug 19, 2017 via email

@ararslan
Member Author

Wow, the system GDB is way out of date, no wonder they're going to remove it. Okay, I tried again with the ports GDB. The output is much more informative, but it looks like it's stopping on something else; the test program doesn't get nearly as far before stopping, and it doesn't appear to be hitting a segfault in GDB. The full log is here: https://gist.github.com/ararslan/4347c22e925e98f5fcacda5b438854a9. Could be I'm misusing GDB somehow, so if anything about the session in the log looks fishy let me know and I can try it another way.

@ararslan
Member Author

Using LLDB rather than GDB looks like it hits the right thing: https://gist.github.com/ararslan/acc5e587400affd93dff5e41f5dc1cda. (System LLDB, 4.0.0 in FreeBSD 11.1.)

@ararslan
Member Author

Looks like it's this line that's hitting it: https://github.com/JuliaLang/julia/blob/master/test/pkg.jl#L298

@test_warn "INFO: Package Example: skipping update (pinned)..." Pkg.update()

I haven't been able to minimally reduce the issue yet though.

@iblislin
Member

Found a workaround. Can you confirm?

# sysctl security.bsd.stack_guard_page=0

It appears that this option is enabled in FreeBSD 11.1-Release by default.

@ararslan
Member Author

Interesting, it does appear that stack_guard_page is 0 in 11.0 and 1 in 11.1, though that isn't mentioned in the 11.1 release notes. Disabling the stack guard page allows the Pkg tests to pass without segfaulting.
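
For anyone comparing machines, here is a rough sketch of how the setting could be checked from a script. `describe_guard` is a made-up helper name, not something from FreeBSD or Julia. The script only reads the value, assumes `sysctl -n` prints just the value, and treats a missing or non-numeric result as unknown.

```shell
#!/bin/sh
# describe_guard is a hypothetical helper: it turns the raw sysctl value
# into a short human-readable status line.
describe_guard() {
    case "$1" in
        0) echo "stack guard disabled" ;;
        ''|*[!0-9]*) echo "stack guard setting unknown" ;;
        *) echo "stack guard enabled (value $1)" ;;
    esac
}

# Read the value only; an empty result means the OID could not be read
# (for example on a system that is not FreeBSD).
value=$(sysctl -n security.bsd.stack_guard_page 2>/dev/null || true)
describe_guard "$value"
```

Based on the values reported above, this should print the "disabled" line on 11.0 and the "enabled" line on 11.1.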

@iblislin
Member

iblislin commented Aug 22, 2017

Not in the release notes, but in the wiki: https://wiki.freebsd.org/WhatsNew/FreeBSD11#Security

The stack protector is now set to strong (r288669)

https://svnweb.freebsd.org/base?view=revision&revision=288669

Edit: seems unrelated

--
I found this commit: https://svnweb.freebsd.org/base?view=revision&revision=215307

This commit enables it by default: https://svnweb.freebsd.org/base?view=revision&revision=320317

@ararslan
Member Author

ararslan commented Aug 22, 2017

Okay, even better LLDB backtrace: https://gist.github.com/ararslan/62d5dfb03b529a56dce0ab5d239685d8. In particular, it shows libgit2 calls in thread backtrace all starting on line 112.

Edit: Buuuuuuuuut that call is in the wrong thread. :/ Thread 6 in the above gist hits the SIGSEGV.

@iblislin
Member

Similar issue? https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221127#c12

I'm going to give flang a try

@ararslan
Member Author

I doubt it's an issue with conflicting libgcc_s between Clang and GCC since we have the build system set up on FreeBSD to link everything to GCC's libgcc_s. That's what I implemented in #21788. It must be an issue with the stack guard setting, since the tests pass with it disabled and segfault with it enabled.

@iblislin
Member

On 11.0-RELEASE-p12, test-pkg passed:

root@:~/julia # sysctl security.bsd.stack_guard_page
security.bsd.stack_guard_page: 1
root@:~/julia # uname -a
FreeBSD  11.0-RELEASE-p12 FreeBSD 11.0-RELEASE-p12 #0: Wed Aug  9 10:03:39 UTC 2017     [email protected]:/usr/obj/usr/src/sys/GENERIC  amd64

I think it's related to r320317. It changes more than just enabling the stack guard by default. 11.0-RELEASE doesn't include this patch, so test-pkg passes there even with the stack guard enabled.

@ararslan
Member Author

Excellent catch!

@iblislin
Member

OK, maybe we should create a minimal example and send it to the FreeBSD-CURRENT mailing list.
The problematic part is libgit2 (perhaps plus unwind), right? I'm going to start by building some simple C code.

@ararslan
Member Author

I'm not sure the problem is libgit2; in the LLDB backtrace that contains libgit2 calls, the libgit2 calls are in a different thread than the one that hits the SIGSEGV. Do you know how to reproduce this minimally? I'm still trying to figure out what in Julia is actually triggering it.

@iblislin
Member

😖
I tried removing some test cases in test/pkg.jl, but that makes the tests pass with 11.1's stack guard.
The conditions for triggering the SIGSEGV seem quite tricky.

@ararslan
Member Author

Which tests did you have to remove in test/pkg.jl to get the tests passing? Can you post a diff?

@iblislin
Member

Please check this out: https://gist.github.com/iblis17/e2199735aee9673585da8aa48e5d4984

In the original pkg.jl, the SIGSEGV is triggered near line 300, IIRC.
When I comment out some test cases as in my gist, test-pkg passes. 😕

@iblislin
Member

For the record, the segfault also happens on -CURRENT with the stack guard enabled.

└─[iblis@abeing]% uname -a
FreeBSD abeing 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r323335: Sun Sep 17 00:56:35 CST 2017     root@abeing:/usr/obj/usr/src/sys/GENERIC  amd64

@iblislin
Member

iblislin commented Sep 17, 2017

Another way to reproduce:

  1. git clone https://github.com/JuliaLang/julia.git repo
  2. cd repo && git reset --hard 67731c2c07 (It's just HEAD~500) edit: ignore this
  3. cat test.jl
for i in 1:100
    info(i)
    LibGit2.fetch(LibGit2.GitRepo("./repo"))
end
  4. ./julia test.jl
└─[iblis@abeing]% ./julia test.jl
INFO: 1                           
INFO: 2                           
INFO: 3                           
INFO: 4                           
INFO: 5                           
INFO: 6                           
zsh: segmentation fault (core dumped)  ./julia test.jl

@ararslan
Member Author

Good sleuthing! If you write the same thing in C using the functions from libgit2, do you also get a segfault? Also, why is the git reset necessary? Is it just so that HEAD doesn't point to the latest commit or is there something specific about 67731c2?

@iblislin
Member

I checked it again, and the git reset is unnecessary.
I tried invoking the libgit2 API from C: https://gist.github.com/iblis17/bbc621a78fda6ffcbca077fadba8ecdd#file-git2_fetch-c
but I can't get it to segfault.

@ararslan
Member Author

ararslan commented Sep 19, 2017

I was able to reduce it to a single ccall:

import Base.LibGit2: GitRepo, GitRemote, RemoteCallbacks, CredentialPayload,
                     StrArrayStruct, FetchOptions, get, credentials_cb

repo = GitRepo("./repo")
rmt = get(GitRemote, repo, "origin")
fo = FetchOptions(callbacks=RemoteCallbacks(credentials_cb(), CredentialPayload()))

for i = 1:100
    info(i)
    ccall((:git_remote_fetch, :libgit2), Cint,
          (Ptr{Void}, Ptr{StrArrayStruct}, Ptr{FetchOptions}, Cstring),
          rmt.ptr, C_NULL, Ref(fo), "hi")
end

close(rmt)
close(repo)

For me it consistently faults on the sixth call, just as in your example output above.

@StefanKarpinski
Sponsor Member

Is it possible that one of the structs has the wrong specification?

@ararslan
Member Author

I compared our implementations in base/libgit2/types.jl to the documentation of the corresponding types in libgit2 and they seem to match as far as I can tell, but there may be some subtle difference that I'm missing.

@ararslan
Member Author

I'm going to call this a Julia bug since I haven't been able to reproduce it with C or Rust's libgit2 bindings.

@ararslan ararslan added kind:bug Indicates an unexpected problem or unintended behavior libgit2 The libgit2 library or the LibGit2 stdlib module and removed domain:packages Package management and loading test This change adds or pertains to unit tests labels Sep 19, 2017
@ararslan
Member Author

@omus has noted that this seems only to occur when using multiple cores. That is, setting JULIA_CPU_CORES=1 avoids segfaulting.

@ararslan
Member Author

Hm, setting JULIA_CPU_CORES=1 does not avoid segfaulting for me. I guess it's only when the actual VM is set to only use one core.

@ararslan ararslan changed the title Segfault in Pkg tests on FreeBSD 11.1 Repeated calls to LibGit2.fetch segfaults on FreeBSD 11.1 Sep 23, 2017
@wildart
Member

wildart commented Sep 23, 2017

My guess is that something happens during the credentials_cb call. It would be hard to track down (it's a Julia-C-Julia call), and it would segfault without any relevant information.

@ararslan
Member Author

Why would something inside credentials_cb, which is outside of the loop, cause the repeated ccall to git_remote_fetch to segfault? You mean like the FetchOptions ends up being constructed incorrectly or something, and messes things up when being passed back and forth between Julia and C?

@ararslan
Member Author

Okay, updated LLDB output on latest master with the above git_remote_fetch in a loop: https://gist.github.com/ararslan/3eb7df6f83d21242d5e6d53719ff2efc

@ararslan
Member Author

LLDB backtrace on FreeBSD 12.0-CURRENT with security.bsd.stack_guard_page=1, built with DISABLE_LIBUNWIND=1: https://gist.github.com/iblis17/b9eb213150b1da48a46c460bd310187b

@ararslan
Member Author

Plot twist: libgit2 may be a red herring. Setting USE_SYSTEM_CURL=1 allows things to work fine without segfaulting. I tried applying all of the curl patches in the Ports tree to our curl but that didn't do it. A difference between the system curl and ours is that the system's is built with OpenSSL while ours is built with mbedTLS, so I'm inclined to think that could be related.
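
A rough sketch of one way to check which TLS library a given libcurl is linked against, by scanning `ldd` output. `tls_backend_from_ldd` is a made-up helper, and the two paths are only assumptions based on the locations mentioned in this thread. A match on `libssl` only shows that some libssl is linked; it can't tell OpenSSL apart from a compatible replacement.

```shell
#!/bin/sh
# tls_backend_from_ldd is a hypothetical helper: it reads `ldd` output on
# stdin and reports which TLS library names appear in it.
tls_backend_from_ldd() {
    awk '
        /libmbedtls/ { mbed = 1 }
        /libssl/     { ssl = 1 }
        END {
            if (mbed && ssl) print "mbedtls+libssl"
            else if (mbed)   print "mbedtls"
            else if (ssl)    print "libssl"
            else             print "unknown"
        }
    '
}

# Assumed locations (Julia build tree and ports prefix); adjust as needed.
for lib in usr/lib/libcurl.so /usr/local/lib/libcurl.so; do
    if [ -e "$lib" ]; then
        printf '%s: ' "$lib"
        ldd "$lib" | tls_backend_from_ldd
    fi
done
```

Run from the top of the Julia source tree, this prints one line per libcurl it finds and silently skips paths that don't exist.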

@ararslan ararslan removed the libgit2 The libgit2 library or the LibGit2 stdlib module label Sep 28, 2017
@iblislin
Member

iblislin commented Oct 25, 2017

I tried copying my /usr/local/lib/libcurl.so to julia/usr/lib/, but I still got a segfault. :/

@ararslan
Member Author

ararslan commented Nov 5, 2017

Using the system curl with USE_SYSTEM_CURL=1 no longer fixes this for me.

@ararslan
Member Author

ararslan commented Nov 5, 2017

Never mind, it does—I just had to rebuild libgit2 and mbedTLS.

@ararslan
Member Author

ararslan commented Dec 8, 2017

I'm not getting a segfault with a stock build of Julia on FreeBSD 12.0-CURRENT, built from source at r326614 with the GENERIC-NODEBUG kernel and MALLOC_PRODUCTION. The stack guard page is enabled.

@ararslan
Member Author

I just tried current master on FreeBSD 11.1 and it didn't segfault. I'm doing a git clean -fdx and rebuilding from scratch to make sure it's reproducible.

@ararslan
Member Author

Neither Iblis nor I can reproduce this on 11.1 with current master, so I'm going to close this and call it resolved. I have no idea what changed in Base that made it work, though I might bisect it out of curiosity.
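
If anyone does want to bisect, a minimal sketch of the exit-code handling follows. Since the goal is to find the commit that fixed things, it assumes the bisect is started with reversed terms (`git bisect start --term-old=broken --term-new=fixed`), and that the shell reports a SIGSEGV death as status 139 (128 + 11). `bisect_verdict` is a made-up helper name. The mapping matters because `git bisect run` aborts on exit codes above 127, so the raw segfault status can't be passed through.

```shell
#!/bin/sh
# bisect_verdict is a hypothetical helper that maps the exit status of the
# reproduction script to the exit code `git bisect run` expects, assuming
# "old" means broken (still segfaults) and "new" means fixed.
bisect_verdict() {
    case "$1" in
        139) echo 0 ;;    # 128 + 11: killed by SIGSEGV, so still "old"
        0)   echo 1 ;;    # ran to completion, so already "new"
        *)   echo 125 ;;  # anything else: ask git bisect to skip the commit
    esac
}

# Intended use inside the script handed to `git bisect run`
# (the build step is omitted here):
#   ./julia test.jl; exit "$(bisect_verdict $?)"
```

Any other failure, including a broken build, is reported as 125 so that git skips the commit rather than misclassifying it.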
