Stress test failures in Java client #1812
Comments
Those just mean a full GC started. The only thing that message means is that garbage is being created too fast (perhaps due to lots of small requests). For the stress test we probably need to set Xms and Xmx as well as ParNewGen |
@sreecha @carl-mastrangelo it's actually a minor GC :-), but seems like nothing out of the ordinary? We explicitly turned on GC logging, which is why it's shown here. Pause times seem sufficiently low for us not to worry about them. I think this can be closed? Only thing that seems strange is that the heap grows without a Full GC. I always thought resizing the heap requires a Full GC. |
Ok. I didn't realize these were just ParNew logs. (Note that this happened when I ran about 10 stress clients each with the following settings:
and 4 of the clients failed. I am pasting the error messages from one of the clients..)
|
I think it would be a lot easier if you could try to repro this by running the stress test framework yourself. There is a one-time setup you need to do (set up a Google Cloud Platform account and install some Python libraries on your machine), but I think it is totally worth it for debugging any future stress-related issues. The instructions are here: |
@buchgr, I just realized that you may not have a Google Cloud Platform account, so please ignore my previous comment for now. I'll chat with the Java team here and get back to you. Thanks |
@buchgr Ideally the heap will never change size. All the advice internally is to lock NewSize and MaxNewSize to the same value, and also lock Xmx and Xms. |
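A minimal sketch of how one might check whether a JVM was actually started with the heap and young generation locked as described above. It only inspects the command-line flags reported by the runtime (`-Xms`/`-Xmx` and `-XX:NewSize=`/`-XX:MaxNewSize=`); the class name `HeapFlagCheck` and its helpers are hypothetical and not part of grpc-java, and other ways of sizing the heap are not accounted for.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;

public class HeapFlagCheck {
    /** Parses a JVM size such as "512m", "2g" or "1048576" into bytes. */
    static long parseSize(String s) {
        String v = s.trim().toLowerCase(Locale.ROOT);
        if (v.isEmpty()) {
            throw new NumberFormatException("empty size");
        }
        long mult = 1;
        char last = v.charAt(v.length() - 1);
        if (last == 'k') mult = 1L << 10;
        else if (last == 'm') mult = 1L << 20;
        else if (last == 'g') mult = 1L << 30;
        if (mult != 1) v = v.substring(0, v.length() - 1);
        return Long.parseLong(v) * mult;
    }

    /** Returns the value of the last argument starting with prefix, or -1 if absent. */
    static long flagValue(List<String> args, String prefix) {
        long value = -1;
        for (String a : args) {
            if (a.startsWith(prefix)) {
                value = parseSize(a.substring(prefix.length()));
            }
        }
        return value;
    }

    /** True if both flags are present and resolve to the same number of bytes. */
    static boolean locked(List<String> args, String minPrefix, String maxPrefix) {
        long min = flagValue(args, minPrefix);
        long max = flagValue(args, maxPrefix);
        return min > 0 && min == max;
    }

    public static void main(String[] argv) {
        // Flags passed to this JVM, e.g. -Xms2g -Xmx2g -XX:NewSize=512m -XX:MaxNewSize=512m
        List<String> args = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("heap locked: " + locked(args, "-Xms", "-Xmx"));
        System.out.println("young gen locked: "
            + locked(args, "-XX:NewSize=", "-XX:MaxNewSize="));
    }
}
```

Running this inside the stress client or server JVM would print whether the two pairs of flags match; the sizes in the comment are illustrative only, not recommended values.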
@sreecha I believe the error messages are due to timeouts. The stress test client reuses the interop integration tests under the hood, and these tests verify that, for example, a call ends within 5 seconds. That's where the error seems to come from.
In your opinion, is it possible that for some calls the latency is > 5 seconds? Also, yes, I don't have a GCE account. I used to have one. Who should I talk to in order to get one? I am happy to set up a test env. In the meantime, I'll submit a PR to refactor the interop tests to exclude these timeouts when using the stress test client. |
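To illustrate the kind of check being discussed, here is a rough sketch of a "call must finish within N seconds" assertion using only `java.util.concurrent`. It is not the interop test code; `CallTimeoutSketch` and `completesWithin` are hypothetical names, and the sleeping task merely stands in for a slow RPC.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CallTimeoutSketch {
    /** Runs the call on a worker thread and reports whether it finished in time. */
    static boolean completesWithin(Callable<?> call, long timeout, TimeUnit unit)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = executor.submit(call);
            try {
                future.get(timeout, unit);
                return true;
            } catch (TimeoutException e) {
                // The call overran its budget; interrupt it and report failure.
                future.cancel(true);
                return false;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a slow RPC: sleeps longer than the allowed budget.
        boolean ok = completesWithin(() -> {
            Thread.sleep(500);
            return null;
        }, 100, TimeUnit.MILLISECONDS);
        System.out.println("completed in time: " + ok);
    }
}
```

Under heavy load (many concurrent stress clients, GC pauses), a fixed budget like this can fail even when the call itself is healthy, which is the reason for excluding such checks from the stress client.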
@sreecha can you retry with current master please? |
Steps to repro:
(I repro'ed this using a docker image. This may repro without the docker image too - I am not sure)
Get the grpc repo (you don't have to build it; you just need it for the docker scripts).
This will take a few minutes to build (if it fails in the middle, just retry one more time. Sometimes, it is flaky)
root@b44016cf1c83:/# /var/local/git/grpc-java/interop-testing/build/install/grpc-interop-testing/bin/test-server --port=8080 --use_tls=false &
You will see the following errors: