Indefinite blocks on NativeCrypto.SSL_do_handshake #864
Looking at the native stack trace in the referenced bug, I'm a bit surprised that it literally just calls The fact you're not seeing this with other TLS implementations and people are still seeing it with 2.4.0 does suggest it's a Conscrypt bug. However, we're actively migrating away from this implementation of It would be great if you guys could test against the new implementation, but that would mean updating to 2.4.0 or later, and the Dataproc bug suggests you're blocked by #864. On the plus side, we have someone actively working on that. And for testing purposes you could probably use 2.4.0 with some code to install your own
Could you elaborate on how to test a new socket implementation instead of Maybe the bug here could be as simple as Conscrypt not setting a read timeout (or some other timeout) on
Ah sorry, to test the engine-based implementation, call By default I believe the socket timeouts are 0 (i.e. indefinite). There's some complexity here between the implementations... Both implementations support read timeouts using the standard
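To make the point about defaults concrete: with the JDK's plain `java.net.Socket` (not Conscrypt specifically), a freshly created socket reports an `SO_TIMEOUT` of 0, meaning a blocking read waits forever, and `Socket.setSoTimeout` is the standard call that bounds it. This is a minimal sketch; the class `SoTimeoutDefaults` and the 20-second value are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.net.Socket;

// Hypothetical helper illustrating the JDK's default SO_TIMEOUT behavior.
class SoTimeoutDefaults {
    /** Returns {defaultTimeout, timeoutAfterSet} for a new, unconnected socket. */
    static int[] defaultAndConfiguredTimeout(int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            // 0 means "no timeout": a blocking read on this socket can wait forever.
            int before = socket.getSoTimeout();
            // Bound every subsequent blocking read (including handshake reads) on this socket.
            socket.setSoTimeout(timeoutMillis);
            return new int[] {before, socket.getSoTimeout()};
        }
    }
}
```

Calling `defaultAndConfiguredTimeout(20_000)` returns `{0, 20000}`. Note that `SO_TIMEOUT` only bounds reads; it has to be in place before the first blocking read to help, which is why the point at which a provider starts the handshake matters.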
Dataproc was updated to use Conscrypt 2.5.1 by default. I think we can close this issue until we have evidence that it recurs on Conscrypt 2.5.1+.
We saw a recurrence of the same hang with Conscrypt 2.5.1, which uses the new
The issue is still present in Conscrypt 2.5.1:
"pool-7-thread-4" #33 prio=5 os_prio=0 tid=0x00007f3b383f3800 nid=0x35fe runnable [0x00007f3b3ca4e000]
@prbprbprb Maybe this hang is caused by the JDK-8238579 issue?
I think I've found the root cause. It requires a careful reading of the
All of the following stack traces and code walkthrough will refer to OpenJDK 8u322 and Conscrypt 2.5.2.
Following are stack traces from running a simple test harness that uses
The First, here is the test using Conscrypt, and running
Second, here is the test using the default security provider, resulting in an exception after the 20-second read timeout:
It's interesting to note that both processes are performing a TLS handshake, but they are doing the handshake from different points of execution in Execution with Conscrypt proceeds into
Thus, it appears that Conscrypt, unlike the default security provider, is performing the TLS handshake at a time when the read timeout has not yet been applied. It seems like the To summarize:
It doesn't appear that there is any Java upgrade path that would resolve this issue. Reviewing
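The test harness referenced above isn't preserved here, so the following is a separate, JDK-only sketch of the ordering the analysis points to: apply `SO_TIMEOUT` to the socket before the TLS handshake starts. A local `ServerSocket` that completes the TCP connection but never calls `accept()` or sends anything stands in for an unresponsive peer. The class name `HandshakeTimeoutDemo` and the timeout values are hypothetical. With the default JSSE provider, `startHandshake()` should fail with an `IOException` after roughly the read timeout; with `SO_TIMEOUT` left at its default of 0, it would block indefinitely, which is the symptom reported in this issue.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical demo: a read timeout set BEFORE the handshake bounds the handshake.
class HandshakeTimeoutDemo {
    /**
     * Starts a TLS handshake against a local server that never replies to the
     * ClientHello. Returns the milliseconds until the handshake failed, or -1 if
     * it unexpectedly succeeded.
     */
    static long handshakeAgainstSilentServer(int readTimeoutMillis) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The kernel completes the TCP connect via the backlog; nobody ever reads or writes.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback)) {
            int port = silentServer.getLocalPort();
            Socket tcp = new Socket();
            SSLSocket tls = null;
            try {
                tcp.connect(new InetSocketAddress(loopback, port), 5_000);
                // Apply the read timeout on the plain TCP socket first...
                tcp.setSoTimeout(readTimeoutMillis);
                SSLSocketFactory factory = SSLContext.getDefault().getSocketFactory();
                // ...then layer TLS over it; handshake reads go through tcp's input stream.
                tls = (SSLSocket) factory.createSocket(tcp, "localhost", port, true);
                long start = System.nanoTime();
                try {
                    tls.startHandshake(); // blocks waiting for a ServerHello that never comes
                    return -1;
                } catch (IOException expected) {
                    return (System.nanoTime() - start) / 1_000_000;
                }
            } finally {
                closeQuietly(tls);
                closeQuietly(tcp);
            }
        }
    }

    private static void closeQuietly(Socket s) {
        if (s == null) return;
        try {
            s.close();
        } catch (IOException ignored) {
            // Best-effort cleanup only.
        }
    }
}
```

Calling `handshakeAgainstSilentServer(500)` should return a value of roughly 500 ms rather than hanging. The same ordering applies to any provider: if the handshake can begin before `setSoTimeout` is called, the timeout cannot protect it.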
Set socket read timeout (`fs.gs.http.read-timeout`) as early as possible on new sockets returned from the custom `SSLSocketFactory`. This guarantees the timeout is enforced during TLS handshakes when using Conscrypt as the security provider. See also google/conscrypt#864 .
Set socket read timeout (`fs.gs.http.read-timeout`) as early as possible on new sockets returned from the custom `SSLSocketFactory`. This guarantees the timeout is enforced during TLS handshakes when using Conscrypt as the security provider. See also google/conscrypt#864 . (cherry picked from commit 61f8629)
Set socket read timeout (`fs.gs.http.read-timeout`) as early as possible on new sockets returned from the custom `SSLSocketFactory`. This guarantees the timeout is enforced during TLS handshakes when using Conscrypt as the security provider. See also google/conscrypt#864 . (cherry picked from commit 61f8629) (cherry picked from commit 667836b)
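The commit message describes setting the read timeout "as early as possible on new sockets returned from the custom `SSLSocketFactory`". The following is a minimal sketch of that idea, not the connector's actual code: a hypothetical `TimeoutSettingSSLSocketFactory` that wraps any delegate factory (for example, one backed by Conscrypt) and calls `setSoTimeout` on every socket immediately after the delegate returns it, before the caller can trigger a handshake.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical wrapper: applies SO_TIMEOUT to every socket the delegate creates.
class TimeoutSettingSSLSocketFactory extends SSLSocketFactory {
    private final SSLSocketFactory delegate;
    private final int readTimeoutMillis;

    TimeoutSettingSSLSocketFactory(SSLSocketFactory delegate, int readTimeoutMillis) {
        this.delegate = delegate;
        this.readTimeoutMillis = readTimeoutMillis;
    }

    // First thing done to each new socket, so any later handshake read is bounded.
    private Socket configure(Socket socket) throws IOException {
        socket.setSoTimeout(readTimeoutMillis);
        return socket;
    }

    @Override public String[] getDefaultCipherSuites() { return delegate.getDefaultCipherSuites(); }

    @Override public String[] getSupportedCipherSuites() { return delegate.getSupportedCipherSuites(); }

    @Override public Socket createSocket() throws IOException {
        return configure(delegate.createSocket());
    }

    @Override public Socket createSocket(Socket s, String host, int port, boolean autoClose)
            throws IOException {
        return configure(delegate.createSocket(s, host, port, autoClose));
    }

    @Override public Socket createSocket(String host, int port) throws IOException {
        return configure(delegate.createSocket(host, port));
    }

    @Override public Socket createSocket(String host, int port, InetAddress localHost, int localPort)
            throws IOException {
        return configure(delegate.createSocket(host, port, localHost, localPort));
    }

    @Override public Socket createSocket(InetAddress host, int port) throws IOException {
        return configure(delegate.createSocket(host, port));
    }

    @Override public Socket createSocket(InetAddress address, int port, InetAddress localAddress,
            int localPort) throws IOException {
        return configure(delegate.createSocket(address, port, localAddress, localPort));
    }
}
```

`SO_TIMEOUT` does not bound the TCP connect itself; for that, a caller can use the unconnected `createSocket()` overload followed by `Socket.connect(SocketAddress, int)` with an explicit connect timeout.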
I'm seeing the FX File Explorer (network) client on Android get stuck during Samsung Files (network) client also fails here. For my use case, it shows up with Android-to-Android connections and happens on an Android 8 test device. An Android 14 test device isn't seeing this (NativeCrypto.SSL_do_handshake(Native Method)) with the exact same code pushed out to it. If
Update 1: I randomly noticed today that
Update 2: Two clients now working and
Final update:
We have user reports from Google Cloud Dataproc that threads/tasks intermittently hang on NativeCrypto.SSL_do_handshake. Some Java and native thread dumps were shared in GoogleCloudDataproc/hadoop-connectors/issues/153, but so far we have been unable to produce a reliable reproduction.
Dataproc currently uses Conscrypt 1.4.2 on java-8-openjdk-amd64. On the OS side, we have heard from users on both Debian 9 (Debian-provided OpenJDK) and Debian 10 (AdoptOpenJDK). One user told GCP Support that they still saw the same hang even after manually updating to Conscrypt 2.4.0.
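Since the hang has so far only been visible in thread dumps, a small in-process check can help spot it without waiting for a manual `jstack`. This is a minimal diagnostic sketch, not part of Conscrypt or Dataproc: the class `HandshakeHangDetector` is hypothetical, and it simply reports threads whose innermost frame is `NativeCrypto.SSL_do_handshake`, matching on the simple class name so repackaged copies (such as Android's) are also caught.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic: find threads currently inside the native TLS handshake.
class HandshakeHangDetector {
    static boolean isInNativeHandshake(StackTraceElement[] stack) {
        if (stack.length == 0) return false;
        StackTraceElement top = stack[0]; // innermost (currently executing) frame
        String cls = top.getClassName();
        return (cls.equals("NativeCrypto") || cls.endsWith(".NativeCrypto"))
                && top.getMethodName().equals("SSL_do_handshake");
    }

    /** Returns the keys (thread names) whose stacks are in SSL_do_handshake. */
    static List<String> threadsInNativeHandshake(Map<String, StackTraceElement[]> stacksByThread) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, StackTraceElement[]> e : stacksByThread.entrySet()) {
            if (isInNativeHandshake(e.getValue())) stuck.add(e.getKey());
        }
        return stuck;
    }

    /** Samples the live JVM once; threads with duplicate names collapse into one entry. */
    static List<String> liveThreadsInNativeHandshake() {
        Map<String, StackTraceElement[]> byName = new HashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            byName.put(e.getKey().getName(), e.getValue());
        }
        return threadsInNativeHandshake(byName);
    }
}
```

A single sample only shows where a thread is right now; a thread that appears in several samples taken, say, a minute apart is a much stronger hang signal than one caught mid-handshake once.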
Below is an example of the thread dump.