CI failure: double free or corruption (fasttop) #1013

Closed
NobodyXu opened this issue Sep 9, 2023 · 12 comments
Labels
acknowledged (an issue is accepted as shortcoming to be fixed), help wanted (extra attention is needed)

Comments

@NobodyXu
Contributor

NobodyXu commented Sep 9, 2023

Current behavior 😯

From the CI:

$ /home/runner/work/gitoxide/gitoxide/target/debug/gix --no-verbose free pack receive -p 2 https://github.com/byron/gitoxide
Expected actual status 134 to be 0
index: 1d595f58aec6a1ae68d5706113d784c475ba3776
pack: 866345ea2794b3096d876fbdf278fa4ae6acde89

1b1fc257d5748c7c41e899bf2d1447ffd9f22d19 HEAD symref-target:refs/heads/main
1b8d9e6a408e480ae1912e919c37a26e5c46639d refs/heads/faster-discovery
43f695a9607f1f85f859f2ef944b785b5b6dd238 refs/heads/fix-823
# I've trimmed anything between there since it's way too long
08fd086e0fe12a8d37fc2894b77c7f8e4f37c269 refs/tags/v0.8.2
d3b20949d887c95e20753658d3e90897f625ccea refs/tags/v0.8.3
effb2a612d5912ea7bd9e7c65465ca3da3797a7a refs/tags/v0.8.4
bdf9fa5874bb72ad158a5e28fd60d95eab78e9cb refs/tags/v0.9.0 object:960eb0e5e5a7df117ed2ae2a8e2ec167b074c332
double free or corruption (fasttop)
error: Recipe `journey-tests` failed on line 188 with exit code 1

Expected behavior 🤔

It should not fail with a double free or corruption error.

Steps to reproduce 🕹

It happens in my fork after I fetched the latest commit from upstream; I'm not sure how to reproduce it.

@Byron added the "acknowledged" label Sep 9, 2023
@Byron
Member

Byron commented Sep 9, 2023

I have noticed this for a while in this CI as well and can say the following:

  • it started appearing without a change to unsafe code in gix
  • it stayed even after removing unnecessary unsafe code in gix
  • it always appears in the max build, which is the last journey test to run and which uses zlib-ng as well as sha1-asm
  • it never appeared in any of the other builds, like max-pure

My best guess is that this is a regression in zlib-ng, or maybe a regression in libz-sys, or some combination of both. I know debugging this can be a time sink, so I chose to ignore it until it disappears by itself. It has been going on for months, though…

@Byron
Member

Byron commented Oct 14, 2023

Since this issue still occurs even with the latest version of flate2, v1.0.28, when cloning gitoxide using the max (default) configuration, which uses C versions of zlib and SHA1 (maybe intermixed with assembly), I thought it was time to see whether the issue is readily observable.

So I ran the following on an ARM Ubuntu VM:

$ valgrind --tool=memcheck --leak-check=full --track-origins=yes -s --log-file=valgrind-memcheck.out ./target/release/gix --no-verbose clone https://github.com/Byron/gitoxide

The log is valgrind-memcheck.out.zip.

Unfortunately, it doesn't show anything concerning; there is just some understandable memory leakage. clap shows some dependence on an uninitialized value for some reason, but I doubt it's the reason for the occasional crash. Maybe the issue is platform-dependent and needs Intel, or maybe it truly needs to run many times to occur.
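For an intermittent crash like this, one way to "run it many times" is a small loop that counts failing runs. This is only a sketch assuming a POSIX shell; `count_failures` is a hypothetical helper and not part of gix or its CI recipes.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly and print how many runs failed.
# usage: count_failures <runs> <command> [args...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Any non-zero exit status counts as a failure; an abort raised by
        # glibc's "double free or corruption" check shows up as status 134.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

It could be pointed at the failing command from the CI log, for example `count_failures 50 ./target/debug/gix --no-verbose free pack receive -p 2 https://github.com/byron/gitoxide`. Status 134 is 128 plus the signal number of SIGABRT (6), which matches the "Expected actual status 134 to be 0" line in the report.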

@Byron
Member

Byron commented Oct 14, 2023

Oh, and I just realized that it runs a different command, trying again. Edit: Same result :/.
valgrind-memcheck.out.zip

I also ran it in debug mode with greatly reduced performance, and even though that makes the clap warnings go away it doesn't change the fact that nothing in terms of memory corruption is visible.

@Byron
Member

Byron commented Oct 14, 2023

Great success!

I re-ran the release version of gix on an Intel VM, Ubuntu 22 LTS this time, just to be sure and it yielded the very same result, but… it showed invalid reads coming from libcrypto and openssl_connect.

==9229== ERROR SUMMARY: 15 errors from 15 contexts (suppressed: 0 from 0)
==9229== 
==9229== 1 errors in context 1 of 15:
==9229== Thread 2:
==9229== Invalid read of size 4
==9229==    at 0x4AC5608: CRYPTO_THREAD_set_local (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AEC92A: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4ABE493: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AC26C6: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4EDF690: __nptl_deallocate_tsd (nptl_deallocate_tsd.c:73)
==9229==    by 0x4EDF690: __nptl_deallocate_tsd (nptl_deallocate_tsd.c:22)
==9229==    by 0x4EE2949: start_thread (pthread_create.c:453)
==9229==    by 0x4F73BF3: clone (clone.S:100)
==9229==  Address 0x515061c is 28 bytes inside a block of size 80 free'd
==9229==    at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==9229==    by 0x4AC01D4: CRYPTO_free_ex_data (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AB8D1E: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AC2928: OPENSSL_cleanup (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4E93494: __run_exit_handlers (exit.c:113)
==9229==    by 0x4E9360F: exit (exit.c:143)
==9229==    by 0x4E77D96: (below main) (libc_start_call_main.h:74)
==9229==  Block was alloc'd at
==9229==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==9229==    by 0x4ABA26D: CRYPTO_zalloc (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AEC761: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AB872D: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4ABE304: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AB8AA8: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AECC94: RAND_get0_primary (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AECFE6: RAND_status (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0xA52A76: ossl_connect_step1 (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229==    by 0xA53A12: ossl_connect_common (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229==    by 0xA4BF83: ssl_cf_connect (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229==    by 0xA57D6F: cf_setup_connect (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229== 
==9229== 
==9229== 1 errors in context 2 of 15:
==9229== Invalid read of size 4
==9229==    at 0x4AC55F4: CRYPTO_THREAD_get_local (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AEC91D: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4ABE493: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AC26C6: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4EDF690: __nptl_deallocate_tsd (nptl_deallocate_tsd.c:73)
==9229==    by 0x4EDF690: __nptl_deallocate_tsd (nptl_deallocate_tsd.c:22)
==9229==    by 0x4EE2949: start_thread (pthread_create.c:453)
==9229==    by 0x4F73BF3: clone (clone.S:100)
==9229==  Address 0x515061c is 28 bytes inside a block of size 80 free'd
==9229==    at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==9229==    by 0x4AC01D4: CRYPTO_free_ex_data (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AB8D1E: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AC2928: OPENSSL_cleanup (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4E93494: __run_exit_handlers (exit.c:113)
==9229==    by 0x4E9360F: exit (exit.c:143)
==9229==    by 0x4E77D96: (below main) (libc_start_call_main.h:74)
==9229==  Block was alloc'd at
==9229==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==9229==    by 0x4ABA26D: CRYPTO_zalloc (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AEC761: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AB872D: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4ABE304: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AB8AA8: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AECC94: RAND_get0_primary (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AECFE6: RAND_status (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0xA52A76: ossl_connect_step1 (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229==    by 0xA53A12: ossl_connect_common (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229==    by 0xA4BF83: ssl_cf_connect (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229==    by 0xA57D6F: cf_setup_connect (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229== 
==9229== 
==9229== 1 errors in context 3 of 15:
==9229== Invalid read of size 4
==9229==    at 0x4AC5608: CRYPTO_THREAD_set_local (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AEC909: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4ABE493: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AC26C6: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4EDF690: __nptl_deallocate_tsd (nptl_deallocate_tsd.c:73)
==9229==    by 0x4EDF690: __nptl_deallocate_tsd (nptl_deallocate_tsd.c:22)
==9229==    by 0x4EE2949: start_thread (pthread_create.c:453)
==9229==    by 0x4F73BF3: clone (clone.S:100)
==9229==  Address 0x5150618 is 24 bytes inside a block of size 80 free'd
==9229==    at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==9229==    by 0x4AC01D4: CRYPTO_free_ex_data (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AB8D1E: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AC2928: OPENSSL_cleanup (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4E93494: __run_exit_handlers (exit.c:113)
==9229==    by 0x4E9360F: exit (exit.c:143)
==9229==    by 0x4E77D96: (below main) (libc_start_call_main.h:74)
==9229==  Block was alloc'd at
==9229==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==9229==    by 0x4ABA26D: CRYPTO_zalloc (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AEC761: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AB872D: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4ABE304: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AB8AA8: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AECC94: RAND_get0_primary (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AECFE6: RAND_status (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0xA52A76: ossl_connect_step1 (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229==    by 0xA53A12: ossl_connect_common (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229==    by 0xA4BF83: ssl_cf_connect (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229==    by 0xA57D6F: cf_setup_connect (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229== 
==9229== 
==9229== 1 errors in context 4 of 15:
==9229== Invalid read of size 4
==9229==    at 0x4AC55F4: CRYPTO_THREAD_get_local (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AEC8FC: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4ABE493: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AC26C6: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4EDF690: __nptl_deallocate_tsd (nptl_deallocate_tsd.c:73)
==9229==    by 0x4EDF690: __nptl_deallocate_tsd (nptl_deallocate_tsd.c:22)
==9229==    by 0x4EE2949: start_thread (pthread_create.c:453)
==9229==    by 0x4F73BF3: clone (clone.S:100)
==9229==  Address 0x5150618 is 24 bytes inside a block of size 80 free'd
==9229==    at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==9229==    by 0x4AC01D4: CRYPTO_free_ex_data (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AB8D1E: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AC2928: OPENSSL_cleanup (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4E93494: __run_exit_handlers (exit.c:113)
==9229==    by 0x4E9360F: exit (exit.c:143)
==9229==    by 0x4E77D96: (below main) (libc_start_call_main.h:74)
==9229==  Block was alloc'd at
==9229==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==9229==    by 0x4ABA26D: CRYPTO_zalloc (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AEC761: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AB872D: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4ABE304: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AB8AA8: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AECC94: RAND_get0_primary (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0x4AECFE6: RAND_status (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==9229==    by 0xA52A76: ossl_connect_step1 (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229==    by 0xA53A12: ossl_connect_common (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229==    by 0xA4BF83: ssl_cf_connect (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229==    by 0xA57D6F: cf_setup_connect (in /Users/byron/dev/github.com/Byron/gitoxide/target/release/gix)
==9229== 

valgrind-memcheck.out.zip

I will see if the same happens on the ARM version of Ubuntu 22, and if not, whether it's possible to switch the Linux test to ARM.

@Byron
Member

Byron commented Oct 14, 2023

Unfortunately, the ARM image shows the same issue (valgrind-memcheck.out.zip), and even if it hadn't, it doesn't look like GitHub Actions provides ARM instances.

It's equally unfortunate that Ubuntu 23 images aren't provided either, so there seems to be no way to circumvent the issue. Unless, of course, there is an update for the affected openssl and/or libcrypto libraries that I am unaware of.

Would definitely love your input here, @NobodyXu .

@NobodyXu
Contributor Author

Maybe you could change the CI to use rustls and reqwest by default?

@Byron
Member

Byron commented Oct 14, 2023

These build-types already run the journey tests prior to the one using openssl.

I mean, you are right, by the looks of it the only fix is to not run the max version of that test, at least on CI, until May 2027 :D. Maybe they'll fix something by then; after all, something has broken it. It's unclear though if it's OpenSSL or libcrypto.

Since it runs with other transports, it's probably fine to disable it in the meantime to get CI stable again.

@NobodyXu
Contributor Author

> Since it runs with other transports, it's probably fine to disable it in the meantime to get CI stable again.

Another option is to build libcurl linked with rustls + hyper or using wolfssl.
The whole compilation can be cached separately so that it's only rebuilt when the libcurl version is bumped.

@Byron
Member

Byron commented Oct 14, 2023

Aaah, this means one could configure curl somehow to use a different backend. Maybe this could even be optional and controlled via a gix feature toggle, so that people who rely on openssl still have working builds, while on CI we could use a different backend.

Yes, this seems like the way to go 🙏.

@Byron
Member

Byron commented Oct 15, 2023

129dc89 shows how to configure gix (CLI) to use rustls as the curl backend, which I'd expect to fix the random crashes on CI.
I validated with valgrind that this indeed fixes the observable memory corruption (valgrind-memcheck.txt.zip).

As a library user, you can now set the blocking-http-transport-curl-rustls feature to avoid openssl, and I would expect this to fix your CI issue as well.
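For library users, enabling that feature might look roughly like the dependency entry below. This is a sketch only: the wildcard version is a placeholder, and whether further features are needed depends on your setup.

```toml
[dependencies]
# Placeholder version: pin this to the gix release you actually use.
# The feature name is the one given in the comment above.
gix = { version = "*", features = ["blocking-http-transport-curl-rustls"] }
```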

@NobodyXu
Contributor Author

Great!

The CI failure is from my fork of the gitoxide repository (to add something), so once you've fixed it here you can close this issue.

@Byron
Member

Byron commented Oct 15, 2023

(Hopefully) closed via #1067.

@Byron closed this as completed Oct 15, 2023