Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resteasy Reactive first POST with InputStream hang #15479

Closed
masini opened this issue Mar 4, 2021 · 57 comments · Fixed by #15553
Closed

Resteasy Reactive first POST with InputStream hang #15479

masini opened this issue Mar 4, 2021 · 57 comments · Fixed by #15553
Labels
area/rest kind/bug Something isn't working triage/needs-reproducer We are waiting for a reproducer.
Milestone

Comments

@masini
Copy link
Contributor

masini commented Mar 4, 2021

Describe the bug
Using resteasy-reactive the first POST with an input stream hang forever

Expected behavior
Return the payload

Actual behavior
The socket stay connected forever if the client doesn't implement a timeout

To Reproduce
Clone this repo:

https://github.com/masini/test-case-resteasy-reactive-first-post-bug

Steps to reproduce the behavior:

  1. mvn quarkus:dev
  2. attach e debugger
  3. run the curl in the readme.md

Configuration

####HTTP #########
quarkus.http.port=8082
quarkus.http.ssl-port=9081

quarkus.http.access-log.enabled=true
quarkus.http.root-path=/resources

####SECURITY Start#########
quarkus.http.auth.basic=true
quarkus.security.users.embedded.enabled=true
quarkus.security.users.embedded.plain-text=true
quarkus.security.users.embedded.users.admin=admin
quarkus.security.users.embedded.roles.admin=TERMINAL_ROLE,APP_ROLE,TECHNICAL_ROLE

quarkus.http.ssl.certificate.key-store-file=tls/keystore.p12
quarkus.http.ssl.certificate.key-store-password=infogroup
quarkus.http.ssl.certificate.key-store-file-type=PKCS12

Environment (please complete the following information):

  • uname -a: Darwin WMICTLM1P.lan 19.6.0 Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64 x86_64
  • java -version:
    openjdk version "11.0.10" 2021-01-19
    OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9)
    OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.10+9, mixed mode)
  • Quarkus version or git rev: 1.12.0.Final (also tried with 1.12.1.Final, pushed this version)
  • mvnw --version:
    Maven home: /Users/ictlm1/.m2/wrapper/dists/apache-maven-3.6.3-bin/1iopthnavndlasol9gbrbg6bf2/apache-maven-3.6.3
    Java version: 11.0.10, vendor: AdoptOpenJDK, runtime: /Applications/sviluppo/java/jdk-11.0.10+9/Contents/Home
    Default locale: en_GB, platform encoding: UTF-8
    OS name: "mac os x", version: "10.15.7", arch: "x86_64", family: "mac"

Additional context
This is a reproducer of a real problem in a real application.

In the real application this happens also without debugger and with the quarkus-run.jar, the reproducer instead is able to reproduce only with a debugger attached and not always. I think it's a matter of timings, but I'm not sure, I need to debug more.

@masini masini added the kind/bug Something isn't working label Mar 4, 2021
@quarkus-bot
Copy link

quarkus-bot bot commented Mar 4, 2021

/cc @FroMage, @geoand, @stuartwdouglas

@quarkus-bot quarkus-bot bot added the area/rest label Mar 4, 2021
@masini
Copy link
Contributor Author

masini commented Mar 5, 2021

I added a new end-point that call the POST using a JAX-RS client and simulate the problem.

Just launch in debug and call this at first:

curl -v -k https://localhost:9081/resources/test

you will see vertx complaining for a blocked thread.

@geoand
Copy link
Contributor

geoand commented Mar 5, 2021

Thanks for the all information. I'll take a look soon.

@geoand
Copy link
Contributor

geoand commented Mar 5, 2021

I just tried the sample project and I was not able to reproduce the issue. Is there anything special I need to know?

@masini
Copy link
Contributor Author

masini commented Mar 5, 2021

Hi Georgios, pleased that you took charge of this issue.

In the real application is very easy, you can build the jar and run-it, not only on my workstation but all the guys in the team.

The reproducer project is a little trickier, you need to start it in debug (I added the suspended configuration in the plugin) and the start the debug to let the application startup up.

Doing this I'm able to reproduce it quite always.

Now I try to add some context giving you some dumps during debug.

@geoand
Copy link
Contributor

geoand commented Mar 5, 2021

I tried again multiple times and could not reproduce the issue.

Is it possible that in the real world your application is receiving traffic before Quarkus has completely started?

@masini
Copy link
Contributor Author

masini commented Mar 5, 2021 via email

@geoand
Copy link
Contributor

geoand commented Mar 5, 2021

@stuartwdouglas can you try and reproduce this please?

@masini
Copy link
Contributor Author

masini commented Mar 5, 2021

Thanks @geoand , now I ask a colleague out of context if he can reproduce.

@masini
Copy link
Contributor Author

masini commented Mar 5, 2021

I debugged and when I have a problem this method raise the exception:

  private void io.vertx.core.http.impl.HttpServerRequestImpl#checkEnded() {
    if (isEnded()) {
      throw new IllegalStateException("Request has already been read");
    }
  }

but is not true that the request has been fullfilled, the business method has not been called yet.

Then the exception handler inside the org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run() method doesn't print nothing, the thread is ready to process the next task, but the socket is still open !!

@masini
Copy link
Contributor Author

masini commented Mar 5, 2021

More context.

Adding the @Blocking annotation resolve the problem for the test-case and also for the real application.

I don't want to make it blocking, I'll debug a little bit more to understand what is going on.

@masini
Copy link
Contributor Author

masini commented Mar 5, 2021

Hi guys, I pushed a better reproducer.

Now you can run directly the generater quarkus-run.jar and call the reproducer URL:

[GET https://localhost:9081/resources/test](GET https://localhost:9081/resources/test)

You no longer need to attach a debugger to generate the problem.

I did this observing the problem in production, it happens only when the POST is called after a GET, of course without a body.

I'll check further if I understand what is going on, but I admit that I don't understand well what the reactive part is doing under the hood.

@geoand
Copy link
Contributor

geoand commented Mar 5, 2021

Thanks, I'll take a look at the updated reproducer soon

@geoand
Copy link
Contributor

geoand commented Mar 6, 2021

I was able to reproduce the problem (it only happens if you hit https://localhost:9081/resources/test). I'll look at it close on Monday unless @stuartwdouglas beats me to it.

@masini
Copy link
Contributor Author

masini commented Mar 6, 2021

Good @geoand , that is a "fake" resources that simulate what I do in production to have it, a GET and then a POST with a body.

Finger crossed, I'm worried about the stress test, I mean if this really happens only the first time or if can happen under heavy load.

I'll let you know.

@geoand
Copy link
Contributor

geoand commented Mar 8, 2021

There seems to be some kind of race condition in Vert.x or Vert.x + Quarkus because I can reproduce the same problem (sometimes) with plain reactive routes (just add the following class to the reproducer above, and add com.squareup.okhttp3:okhttp dependency):

import io.quarkus.vertx.web.Body;
import io.quarkus.vertx.web.Route;
import io.quarkus.vertx.web.RouteBase;
import io.smallrye.mutiny.Uni;
import io.vertx.ext.web.RoutingContext;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;

import java.io.IOException;
import java.util.Collections;
import java.util.List;

import static io.vertx.core.http.HttpMethod.GET;
import static io.vertx.core.http.HttpMethod.POST;

@RouteBase(path = "/test2", produces = "application/json")
public class GreetingRoute {

    @Route(methods = GET, path = "/1")
    public List<TestDTO> getAll() {
        System.out.println("Returning empty");
        return Collections.emptyList();
    }

    @Route(methods = POST, path = "/2")
    public Uni<TestDTO> save(@Body TestDTO n) {
        System.out.println("Received input");
        Uni<TestDTO> res = Uni.createFrom().item(n);
        System.out.println("Returning uni");
        return res;
    }

    @Route(methods = GET, path = "")
    public void call(RoutingContext rc) throws IOException {
        OkHttpClient client = new OkHttpClient();

        System.out.println("Performing GET call");

        Request request = new Request.Builder()
                .url("http://localhost:8082/resources/test2/1")
                .header("Authorization", "Basic YWRtaW46YWRtaW4=")
                .header("Accept", "application/json")
                .get()
                .build();
        client.newCall(request).execute();

        System.out.println("Done with GET call");

        System.out.println("Performing POST call");

        request = new Request.Builder()
                .url("http://localhost:8082/resources/test2/2")
                .header("Authorization", "Basic YWRtaW46YWRtaW4=")
                .header("Accept", "application/json")
                .post(RequestBody.create(okhttp3.MediaType.get("application/json"), "{}"))
                .build();
        client.newCall(request).execute();

        System.out.println("Done with POST call");

        rc.response().end();
    }

}

Then with the GET call to https://localhost:9081/resources/test2/ , the call hangs after http://localhost:8082/resources/test2/1 is completed and only gets unstuck when the read timeout of the call to http://localhost:8082/resources/test2/2 is hit.
At this point for some reason the POST call is handled by the server (as if something got unstucj) even though of course the client has already received a timeout.

I am not really sure what to make of this, @stuartwdouglas, @cescoffier if you have any ideas, I would be grateful :)

@cescoffier
Copy link
Member

@geoand in your example, you are blocking the event loop, as client.newCall(request).execute() is blocking, and the reactive route is executed on the event loop. Can you try with @Blocking?

@geoand
Copy link
Contributor

geoand commented Mar 8, 2021

@cescoffier yes I know. With @Blocking it always works.

But now that I think about it, what I am seeing might be normal... Because I am blocking the event loop, the request to the application (over the network) could be attempted to be handled by the same event-loop thread - which is why it sometimes works (when not using the same thread) and sometimes doesn't (when the attempt is made to handle it from the same event loop thread).
Does that make sense?

@geoand
Copy link
Contributor

geoand commented Mar 8, 2021

@masini does the reproducer represent the actual scenario you are seeing?
I mean, are you actually attempting to hit your application with HTTP traffic from the application itself?

@cescoffier
Copy link
Member

@geoand exactly, if you block the event loop, it can't make progress. The other endpoint may be called on the same event loop or not, that's random.

@geoand
Copy link
Contributor

geoand commented Mar 8, 2021

Yeah, I was just surprised that because there are multiple event loop threads, I didn't expect to run into the problem easily.

@masini
Copy link
Contributor Author

masini commented Mar 8, 2021

@geoand in the production scenario the HTTP call come from an IOT device, so may be the "test" endpoint is not good.

But I'm able to reproduce it calling in sequence "test/1" and "test/2", without the HTTP client.

@geoand
Copy link
Contributor

geoand commented Mar 8, 2021

But I'm able to reproduce it calling in sequence "test/1" and "test/2", without the HTTP client.

Yes, this is exactly my point. That the use of the client is problematic in itself.
In a normal (meaning one that does not block the event loop) call sequence, I cannot reproduce the problem at all

@masini
Copy link
Contributor Author

masini commented Mar 8, 2021

How can I help you ? This is really a problem from our side.

@geoand
Copy link
Contributor

geoand commented Mar 8, 2021

I understand, but unless we have a reliable and properly representative way to reproduce the problem, there is little we can do

@cescoffier cescoffier added the triage/needs-reproducer We are waiting for a reproducer. label Mar 8, 2021
@antoniomacri
Copy link

antoniomacri commented Mar 8, 2021

Hi @geoand, I also cloned and reproduced the error by following these steps:

  1. ./mvnw quarkus:dev
  2. I attached the debugger
curl -X GET --location "http://localhost:8082/resources/test/1" \
    --basic --user admin:admin
curl -X POST --location "http://localhost:8082/resources/test/2" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    -d "{
          \"property\": \"prova\"
        }" \
    --basic --user admin:admin

@geoand
Copy link
Contributor

geoand commented Mar 8, 2021

I'll try it again, but that is what I was doing and could never reproduce the problem

This was referenced Mar 12, 2021
@geoand
Copy link
Contributor

geoand commented Mar 12, 2021

Here is what I got when running your script:

execution: local
     script: src/test/k6/test_get_and_then_post_with_body.js
     output: -

  scenarios: (100.00%) 1 scenario, 20 max VUs, 3m30s max duration (incl. graceful stop):
           * default: Up to 20 looping VUs for 3m0s over 3 stages (gracefulRampDown: 30s, gracefulStop: 30s)


running (3m00.0s), 00/20 VUs, 499120 complete and 0 interrupted iterations
default ✓ [======================================] 00/20 VUs  3m0s

     data_received..................: 117 MB 649 kB/s
     data_sent......................: 178 MB 990 kB/s
     http_req_blocked...............: avg=102.2µs  min=37.64µs  med=98.85µs  max=4.36ms   p(90)=128.64µs p(95)=136.66µs
     http_req_connecting............: avg=70.52µs  min=25.67µs  med=68.23µs  max=4.32ms   p(90)=89.45µs  p(95)=95.31µs 
   ✓ http_req_duration..............: avg=1.63ms   min=97.07µs  med=261.04µs max=122.49ms p(90)=5.66ms   p(95)=8.71ms  
       { expected_response:true }...: avg=1.63ms   min=97.07µs  med=261.04µs max=122.49ms p(90)=5.66ms   p(95)=8.71ms  
     http_req_failed................: 0.00%  ✓ 0    ✗ 998240
     http_req_receiving.............: avg=1.43ms   min=9.77µs   med=62.8µs   max=30.4ms   p(90)=5.41ms   p(95)=8.48ms  
     http_req_sending...............: avg=37.97µs  min=9.79µs   med=36.1µs   max=3.39ms   p(90)=49.21µs  p(95)=53.64µs 
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s       max=0s       p(90)=0s       p(95)=0s      
     http_req_waiting...............: avg=162.16µs min=50.49µs  med=127.97µs max=105.5ms  p(90)=178.97µs p(95)=208.58µs
     http_reqs......................: 998240 5545.725425/s
     iteration_duration.............: avg=3.6ms    min=442.91µs med=2.16ms   max=207.5ms  p(90)=9.31ms   p(95)=11.11ms 
     iterations.....................: 499120 2772.862712/s
     vus............................: 1      min=1  max=20  
     vus_max........................: 20     min=20 max=20  

@masini
Copy link
Contributor Author

masini commented Mar 12, 2021

Thank you @geoand for trying.

I retried just now and I got a "lot" of timeout, but I notice that your environment is a lot faster than my dev workstation, what are you using ?

I'll try on linux, production server run on that and not Mac OS X.

@geoand
Copy link
Contributor

geoand commented Mar 13, 2021

Thank you @geoand for trying.

I retried just now and I got a "lot" of timeout, but I notice that your environment is a lot faster than my dev workstation, what are you using ?

It uses a Ryzen 3950x CPU

I'll try on linux, production server run on that and not Mac OS X.

👌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/rest kind/bug Something isn't working triage/needs-reproducer We are waiting for a reproducer.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants