grpc keepalive: test server-to-client HTTP/2 pings #8645

Closed

gyuho opened this issue Oct 4, 2017 · 14 comments

@gyuho
Contributor

gyuho commented Oct 4, 2017

Need to add tests around #8535.

@xiang90 xiang90 added this to the v3.4.0 milestone Nov 26, 2017
@xiang90
Contributor

xiang90 commented Dec 15, 2017

Can you fill in more details here?

Maybe @spzala can give this a try.

@gyuho
Contributor Author

gyuho commented Dec 15, 2017

Similar to

https://github.com/coreos/etcd/blob/9deaee3ea1b1f0c4119aab865eceff38eb5d5ade/clientv3/integration/black_hole_test.go#L33-L49

By configuring the server-side keepalive parameters in https://godoc.org/google.golang.org/grpc/keepalive#ServerParameters, we want to test (either manually or via integration tests; see the sketch after the list):

  1. the server pings the client after ServerParameters.Time
  2. the client shows no activity within ServerParameters.Timeout
  3. the server closes the connection to the client
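
Something like the following minimal sketch of a plain gRPC server (the durations are arbitrary placeholders, not etcd defaults):

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// newKeepAliveServer builds a gRPC server that pings idle clients (1.) and
// closes the connection when the ping is not acknowledged in time (2., 3.).
// The durations below are placeholders for illustration only.
func newKeepAliveServer() *grpc.Server {
	return grpc.NewServer(
		grpc.KeepaliveParams(keepalive.ServerParameters{
			Time:    5 * time.Second, // ping the client after 5s of inactivity
			Timeout: 1 * time.Second, // close the connection if the ping is not acked within 1s
		}),
		grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
			MinTime: 5 * time.Second, // reject client pings sent more often than this ("too many pings")
		}),
	)
}

func main() {
	_ = newKeepAliveServer()
}
```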

@spzala
Member

spzala commented Jan 7, 2018

WIP.

@spzala spzala self-assigned this Jan 7, 2018
@spzala spzala added the WIP label Jan 10, 2018
@gyuho
Contributor Author

gyuho commented Jan 12, 2018

@spzala We can test this manually first.

I would try

  1. Set --grpc-keepalive-min-time, --grpc-keepalive-interval, and --grpc-keepalive-timeout on the etcd server (ref: configure server keepalive #8535 and the gRPC docs).
  2. Since this is a server-to-client HTTP/2 ping, we should disable client-to-server pings and discard incoming packets from the server on the client side (iptables, tc).
  3. The server closes the connection to the client (confirm by just looking at the logs, maybe?).
  4. The client comes back (blackhole removed), but the connection is closed, so it cannot talk to the server.

Then translate this into integration tests with the wrapper (https://github.com/coreos/etcd/blob/master/clientv3/integration/black_hole_test.go or later #9081); see the sketch below.
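
Rough shape of such a test (only a sketch: the ClusterConfig keepalive fields, GRPCAddr, and the Blackhole/Unblackhole helpers are what I recall from #8535 and the linked black_hole_test.go, so the exact names may differ):

```go
package integration_test

import (
	"context"
	"testing"
	"time"

	"github.com/coreos/etcd/clientv3"
	"github.com/coreos/etcd/integration"
)

// Sketch only: field and helper names are assumed, not verified.
func TestServerKeepAliveClosesBlackholedClient(t *testing.T) {
	clus := integration.NewClusterV3(t, &integration.ClusterConfig{
		Size:                  1,
		GRPCKeepAliveInterval: 100 * time.Millisecond, // server pings an idle client
		GRPCKeepAliveTimeout:  50 * time.Millisecond,  // then closes if the ping is not acked
	})
	defer clus.Terminate(t)

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{clus.Members[0].GRPCAddr()},
		DialTimeout: time.Second,
	})
	if err != nil {
		t.Fatal(err)
	}
	defer cli.Close()

	// Drop traffic to/from the member so the server-to-client HTTP/2 ping is
	// never acknowledged; the server should then close the connection.
	clus.Members[0].Blackhole()
	time.Sleep(300 * time.Millisecond)
	clus.Members[0].Unblackhole()

	// The first request after the blackhole is expected to fail until the
	// client re-establishes the connection.
	ctx, cancel := context.WithTimeout(context.TODO(), time.Second)
	_, err = cli.Get(ctx, "foo")
	cancel()
	t.Logf("Get after blackhole: %v", err)
}
```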

@spzala
Member

spzala commented Jan 12, 2018

@gyuho thanks much!!

@spzala
Member

spzala commented Aug 30, 2018

@gyuho hi, I am trying to observe the behavior by setting --grpc-keepalive-min-time, --grpc-keepalive-interval, and --grpc-keepalive-timeout following the steps you suggested above, but I am probably not doing it right: I am not able to see errors like "too many pings" or the server closing the connection. I have a single-node cluster, and I tried testing with the simple Go program I sent you earlier, run from another machine (i.e. the client pings the server every few seconds, set higher than the timeout or keepalive interval), but without any luck reproducing this behavior. I would appreciate any help going forward with it :) Thanks!

@gyuho
Contributor Author

gyuho commented Aug 31, 2018

@spzala

probably not doing it right: I am not able to see errors like "too many pings" or the server closing the connection.

Have you dropped packets? We want to simulate faulty networks with iptables.

@spzala
Member

spzala commented Aug 31, 2018

@gyuho hi, thanks. I ran iptables -A INPUT -s <serverip> -j DROP on the client machine and saw the "stopped server" message, and when I unblocked it, the connection came back and started receiving messages again. But there was no error message or disconnect from the server.

@gyuho
Contributor Author

gyuho commented Sep 6, 2018

We expect

The server closes the connection to the client (confirm by just looking at the logs, maybe?).

You may tune the gRPC keepalive timeout on the server side. The disconnect may have been too short for the server-side keepalive to kick in.

@gyuho
Contributor Author

gyuho commented Sep 6, 2018

I would also add more debugging lines or adjust the log levels on the gRPC side. The server may not display all logs.
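
For example, in a standalone reproduction you could install a more verbose gRPC-Go logger instead of relying only on the environment variables (the verbosity level here is a guess at what is needed; etcd itself already installs its own logger via grpclog.SetLoggerV2, so this is only for a separate test program):

```go
package main

import (
	"os"

	"google.golang.org/grpc/grpclog"
)

func init() {
	// Verbosity 2 should surface the transport/keepalive messages that are
	// hidden at the default level.
	grpclog.SetLoggerV2(grpclog.NewLoggerV2WithVerbosity(os.Stderr, os.Stderr, os.Stderr, 2))
}

func main() {}
```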

@spzala
Member

spzala commented Oct 18, 2018

@gyuho hi, I am getting back from some vacation and work travel :). I was able to run the tests manually, and I think we should try adding two integration tests: one for MinTime (i.e. the GOAWAY "too many pings" error) and a second for Timeout.
While testing manually, Timeout works as expected for me, with the connection actually being closed. With MinTime I see a couple of things:

  1. I could only see the log messages if I set the environment variables export GRPC_GO_LOG_VERBOSITY_LEVEL=2 and export GRPC_GO_LOG_SEVERITY_LEVEL=info on both the server and client CLI. I also had to comment out this line:
    grpclog.SetLoggerV2(lg)
    So if we want to rely on log messages, we need to think more here.
  2. I see the following log messages after enabling the logging as in step 1 above:
    Server side: ERROR: 2018/10/15 22:35:18 transport: Got too many pings from the client, closing the connection.
    Client side: INFO: 2018/10/15 22:35:18 Client received GoAway with http2.ErrCodeEnhanceYourCalm.

I will be working on creating the integration tests, but I have another related question. You mentioned using a blackhole; can I use Blackhole() or something similar on the client (clientv3.New(ccfg))? Something similar to

clus.Members[1].Blackhole()

From what I see, the method is used for cluster members. Thanks!

@gyuho
Contributor Author

gyuho commented Oct 18, 2018

@spzala Tests would be great. Thanks a lot!

@spzala
Member

spzala commented Oct 18, 2018

@gyuho thanks, and a quick question: can you please help me understand how you are thinking about using the blackhole on the client side? Any thoughts would be helpful. Thanks!

spzala added a commit to spzala/etcd that referenced this issue Oct 24, 2018
spzala added a commit to spzala/etcd that referenced this issue Oct 25, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 6, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 25, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 29, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 30, 2018
spzala added a commit to spzala/etcd that referenced this issue Dec 1, 2018
@gyuho gyuho modified the milestones: etcd-v3.4, etcd-v3.5 Aug 5, 2019
@stale

stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 6, 2020
@stale stale bot closed this as completed May 21, 2020