Try to fix deadlock #124

samoconnor · 2018-03-02T12:52:38Z

This call:
nb_available(::SSLContext)->mbedtls_ssl_read->f_recv->readbytes!()
can block holding the datalock and can stop calls write(::SSLContext) calls
(in other tasks) from getting the lock and writing. If a request is not
writen, the response may never come, and the read may block forever.

The forum post below says that the zero-size mbedtls_ssl_read call
used to find out how many bytes are available must be done in
non-blocking mode. This commit addes a nonblocking flag to try to
implement that.

This seems to work to avoid a deadlock in HTTP.jl where:

A request header has been written.
An async task has been schedueled to write the body.
The main task starts reading the response (in some cases the
response headers arrive before the request body is sent).

The expected behaviour is that the reading task will soon block to wait
for data, thereby yeilding to the async task, which will finish writing
the body. Before this commit, it would get stuck.

See: https://esp32.com/viewtopic.php?t=1101#p4945
"After the correct implementation of non-blocking sockets, the things
seem to be working fine"

This call: nb_available(::SSLContext)->mbedtls_ssl_read->f_recv->readbytes!() can block holding the datalock and can stop calls write(::SSLContext) calls (in other tasks) from getting the lock and writing. If a request is not writen, the response may never come, and the read may block forever. The forum post below says that the zero-size mbedtls_ssl_read call used to find out how many bytes are available must be done in non-blocking mode. This commit addes a nonblocking flag to try to implement that. This seems to work to avoid a deadlock in HTTP.jl where: - A request header has been written. - An async task has been schedueled to write the body. - The main task starts reading the response (in some cases the response headers arrive before the request body is sent). The expected behaviour is that the reading task will soon block to wait for data, thereby yeilding to the async task, which will finish writing the body. Before this commit, it would get stuck. See: https://esp32.com/viewtopic.php?t=1101#p4945 "After the correct implementation of non-blocking sockets, the things seem to be working fine"

- Add a txcount field to the Request struct - Increment txcount after the body is written - Modify RetryRequest.isrecoverable to allow retry of non-idempotent requests if txcount is zero. - Modify StreamRequest to Wait for other pipelined reads to complete before sending a non-idempotent request body. This should decrease the chance of sending a POST body that ends up failing and can't be retried. - Remove yeild() after @async writebody. This should increase the chance that the startread() realises that the connection is dead before the body is written. Depends on: JuliaLang/MbedTLS.jl#124

samoconnor · 2018-03-02T13:08:32Z

@Keno @quinnj @malmaud

codecov-io · 2018-03-02T13:08:48Z

Codecov Report

Merging #124 into master will decrease coverage by 3.74%.
The diff coverage is 75%.

@@            Coverage Diff             @@
##           master     #124      +/-   ##
==========================================
- Coverage   68.63%   64.88%   -3.75%     
==========================================
  Files          10       10              
  Lines         424      430       +6     
==========================================
- Hits          291      279      -12     
- Misses        133      151      +18

Impacted Files	Coverage Δ
src/ssl.jl	`66.11% <75%> (-5.19%)`	⬇️
src/error.jl	`44.44% <0%> (-11.12%)`	⬇️
src/x509_crt.jl	`63.15% <0%> (-10.53%)`	⬇️
src/entropy.jl	`78.94% <0%> (-10.53%)`	⬇️
src/ctr_drbg.jl	`66.66% <0%> (-9.53%)`	⬇️
src/pk.jl	`72.34% <0%> (-4.26%)`	⬇️
src/cipher.jl	`60.86% <0%> (-1.45%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cec4191...a67c763. Read the comment docs.

quinnj · 2018-03-02T13:12:04Z

in some cases the response headers arrive before the request body is sent

Is that just in 100-continue situations or are there other cases where this can happen?

quinnj · 2018-03-02T13:14:10Z

src/ssl.jl

-        ccall((:mbedtls_ssl_read, MBED_TLS),
-              Cint, (Ptr{Cvoid}, Ptr{Cvoid}, Csize_t), ctx.data, C_NULL, 0)
+        ctx.nonblocking = true
+        try


is this try-finally really needed? Can ccalls actually throw errors?

I'm not 100% sure.
But it's never good for a block of code to appear to restore the old state, but have a lurking possibility of exiting early.
(& It isn't a try/catch, its a try/finally)
I'm happy to leave a FIXME in the code suggesting that someone could take it out in the future if everything is working nicely...

samoconnor · 2018-03-02T13:14:31Z

in some cases the response headers arrive before the request body is sent

Is that just in 100-continue situations or are there other cases where this can happen?

No, it is quite common for the server to immediately respond with error response headers if it can tell it isn't happy based on the request headers. e.g. unsupported request, or bad credentials.

samoconnor · 2018-03-02T13:15:55Z

In fact there is a specific part in the RFC about how if the server says Connection: Close you must close before sending the remainder of the body. There is a test for this in test/loopback.jl

quinnj · 2018-03-02T13:17:56Z

src/ssl.jl

@@ -112,19 +114,23 @@ end
 function f_send(c_ctx, c_msg, sz)
    jl_ctx = unsafe_pointer_to_objref(c_ctx)
    jl_msg = unsafe_wrap(Array, c_msg, sz)
-    return Cint(write(jl_ctx, jl_msg))
+    return Cint(write(jl_ctx.bio, jl_msg))


is this right? we're calling write on the TCPSocket now instead of on the SSLContext?

Before the void* passed to the library was the socket.
Now it is the SSLContext. This is needed so we can see the nonblocking flag in the callback.
So, the .bio is needed now to get back to the socket.

Oh, that makes sense. 👍

quinnj · 2018-03-02T13:18:38Z

src/ssl.jl

    return Cint(n)
 end

 function set_bio!(ssl_ctx::SSLContext, jl_ctx::T) where {T<:IO}
    ssl_ctx.bio = jl_ctx
-    set_bio!(ssl_ctx, pointer_from_objref(jl_ctx), c_send[], c_recv[])
+    set_bio!(ssl_ctx, pointer_from_objref(ssl_ctx), c_send[], c_recv[])


what effect does this change end up having?

quinnj · 2018-03-02T13:22:47Z

Looks good. Thanks @samoconnor !

- Add a txcount field to the Request struct - Increment txcount after the body is written - Modify RetryRequest.isrecoverable to allow retry of non-idempotent requests if txcount is zero. - Modify StreamRequest to Wait for other pipelined reads to complete before sending a non-idempotent request body. This should decrease the chance of sending a POST body that ends up failing and can't be retried. - Remove yeild() after @async writebody. This should increase the chance that the startread() realises that the connection is dead before the body is written. Depends on: JuliaLang/MbedTLS.jl#124

samoconnor · 2018-03-04T11:35:39Z

Maybe more generally shouldn't be waiting for exactly sz bytes

yeah, I actually had code that did just that, but I thought it was best to keep the change small for now.

@vtjnash, @quinnj, see: #125

samoconnor mentioned this pull request Mar 2, 2018

Try to reduce risk of EOF error with POST requests JuliaWeb/HTTP.jl#211

Merged

quinnj reviewed Mar 2, 2018

View reviewed changes

quinnj merged commit 1835661 into JuliaLang:master Mar 2, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Try to fix deadlock #124

Try to fix deadlock #124

samoconnor commented Mar 2, 2018

samoconnor commented Mar 2, 2018

codecov-io commented Mar 2, 2018 •

edited

Loading

quinnj commented Mar 2, 2018

quinnj Mar 2, 2018 •

edited

Loading

samoconnor Mar 2, 2018

samoconnor commented Mar 2, 2018 •

edited

Loading

samoconnor commented Mar 2, 2018

quinnj Mar 2, 2018

samoconnor Mar 2, 2018

quinnj Mar 2, 2018

quinnj Mar 2, 2018

samoconnor Mar 2, 2018

quinnj commented Mar 2, 2018

samoconnor commented Mar 4, 2018

Try to fix deadlock #124

Try to fix deadlock #124

Conversation

samoconnor commented Mar 2, 2018

samoconnor commented Mar 2, 2018

codecov-io commented Mar 2, 2018 • edited Loading

Codecov Report

quinnj commented Mar 2, 2018

quinnj Mar 2, 2018 • edited Loading

Choose a reason for hiding this comment

samoconnor Mar 2, 2018

Choose a reason for hiding this comment

samoconnor commented Mar 2, 2018 • edited Loading

samoconnor commented Mar 2, 2018

quinnj Mar 2, 2018

Choose a reason for hiding this comment

samoconnor Mar 2, 2018

Choose a reason for hiding this comment

quinnj Mar 2, 2018

Choose a reason for hiding this comment

quinnj Mar 2, 2018

Choose a reason for hiding this comment

samoconnor Mar 2, 2018

Choose a reason for hiding this comment

quinnj commented Mar 2, 2018

samoconnor commented Mar 4, 2018

codecov-io commented Mar 2, 2018 •

edited

Loading

quinnj Mar 2, 2018 •

edited

Loading

samoconnor commented Mar 2, 2018 •

edited

Loading