
Is there any best practise on choosing connection pool-size? #228

Closed
liufuyang opened this issue Mar 24, 2020 · 3 comments
Assignees
Labels
api: bigtable Issues related to the googleapis/java-bigtable API. type: question Request for information or clarification. Not an issue.

Comments

@liufuyang

Hi there, is there any best practice on choosing the connection pool size?

We have a service deployed on K8s as a pod, and the load on it shifts from 200 req/sec down to 20 req/sec at different times of day. Each request generates 2 simultaneous Bigtable read queries, and each read query, as seen from the service side, takes around 15 ms to 25 ms of waiting time (i.e. read latency).

[screenshot: read-latency graph]

So do you think we'd better specify the connection pool size, and is there some way to calculate the best value? I saw some methods mentioned here https://jobs.zalando.com/en/tech/blog/how-to-set-an-ideal-thread-pool-size/?gh_src=4n3gxh1 — do you think it makes sense to use them? I also wonder what the default channel pool size is if we don't set it here. (The k8s pod requests 2 CPUs, and the k8s cluster runs on 32-core virtual machines.)

What we've noticed is that we seem to have latency issues when setting the pool size to 4 or 8, while setting it to 32 gives a much better overall latency graph (we haven't tried 16 yet). We previously kept the pool size low because, when the load was low, latency was very high, and we had to introduce some artificial load to keep those channels warm. If the pool size is large, that warming effect doesn't work well when the load is not very high (something like 10 to 30 req/sec).

BTW, this is how we set up the client settings:

final BigtableDataSettings.Builder settingsBuilder =
    BigtableDataSettings.newBuilder()
        .setProjectId("promotions-targeting")
        .setInstanceId("feature-store")
        .setAppProfileId("default")
        .setRefreshingChannel(true);

settingsBuilder
    .stubSettings()
    .setTransportChannelProvider(
        InstantiatingGrpcChannelProvider.newBuilder().setPoolSize(8).build());

final BigtableDataSettings settings = settingsBuilder.build();

Perhaps a follow-up question:
Do you think we should also try playing with InstantiatingGrpcChannelProvider.newBuilder().setExecutorProvider(xxx) to run the connections on a provided executor? What is the type of the default executor, and how many threads does it have if we don't use this setExecutorProvider setting? Thank you.

@product-auto-label product-auto-label bot added the api: bigtable Issues related to the googleapis/java-bigtable API. label Mar 24, 2020
@liufuyang
Author

Or maybe a shorter question is: when the BT client is created, can I log something on the k8s pod to see all the detailed settings of the client?

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Mar 25, 2020
@kolea2 kolea2 added type: question Request for information or clarification. Not an issue. and removed triage me I really want to be triaged. labels Mar 25, 2020
@igorbernstein2
Contributor

Hi,

We default the channel pool size to 2 × the number of CPUs.

This approach can sometimes backfire in Linux containers (and k8s by extension), which can present the number of CPUs as 1.

Initializing each channel is quite expensive. The channel pool uses round-robin for channel selection, so the first 2 × CPUs requests will spike in latency. I would therefore recommend sending some fake read requests on application startup to warm up the channel pool. Also, if a channel is idle for a while, it will automatically disconnect, causing the next request to pay the initialization cost again; so for low-qps applications, I would recommend sizing the pool to a single connection. In addition, channels are required to reconnect every hour, so you might see periodic latency spikes (they are a bit more evident for low-qps applications). To mitigate this, I would recommend enabling the auto channel refresh:

BigtableDataSettings.newBuilder()
        .setProjectId("...")
        .setInstanceId("...")
        .setRefreshingChannel(true);

Please note that this feature is experimental and might change in the future.
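The warm-up recommendation above can be sketched roughly as follows. This is a hypothetical helper, not an official API: the table name and row key are placeholders, and issuing one cheap point read per channel relies on the pool's round-robin selection to touch each connection. `BigtableDataClient.readRow(tableId, rowKey)` simply returns null for a missing row, so a nonexistent key is a harmless probe.

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;

public class ChannelWarmer {
  // Hypothetical helper: send one read per channel so round-robin
  // selection establishes every connection in the pool up front.
  static void warmUpChannels(BigtableDataClient client, int poolSize, String tableId) {
    for (int i = 0; i < poolSize; i++) {
      try {
        // A point read on a nonexistent key returns null but still
        // forces the underlying channel to connect.
        client.readRow(tableId, "warmup-nonexistent-row-key");
      } catch (RuntimeException e) {
        // Warm-up failures are non-fatal; log and continue.
      }
    }
  }
}
```

Calling this once at application startup (with poolSize matching the configured channel pool) should move the channel initialization cost out of the first real requests.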

There are 2 benefits to using multiple channels:

  1. Each gRPC channel can have at most 100 outstanding requests at a time. Having more connections allows for greater throughput.

  2. Each gRPC channel will be connected to a different Bigtable frontend, which spreads the load of accepting requests amongst more machines.

A colleague of mine found that the default setting of 2 × CPUs for the channel pool size fits high-qps use cases fairly well.
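As a rough sanity check against the 100-streams-per-channel limit mentioned above, Little's law (concurrent requests ≈ qps × latency) can estimate how many streams a workload actually keeps open. The numbers below are taken from this issue (200 req/sec peak, 2 reads per request, ~25 ms latency) and are illustrative only:

```java
public class PoolSizeEstimate {
  public static void main(String[] args) {
    double readQps = 200 * 2;     // 200 req/sec peak, 2 Bigtable reads each
    double latencySec = 0.025;    // ~25 ms per read (upper bound from the question)

    // Little's law: average number of in-flight requests.
    double concurrentStreams = readQps * latencySec;

    // gRPC allows at most 100 concurrent streams per channel.
    int maxStreamsPerChannel = 100;
    int channelsForThroughput =
        (int) Math.ceil(concurrentStreams / maxStreamsPerChannel);

    System.out.println("concurrent streams ~= " + concurrentStreams);           // 10.0
    System.out.println("channels needed for throughput: " + channelsForThroughput); // 1
  }
}
```

By this estimate even a single channel stays well under the 100-stream cap, which suggests the latency differences observed at pool sizes 4 vs 32 come from load-spreading across frontends and channel warm-up effects rather than stream exhaustion.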

Finally, I think that adding a way to easily print the default settings is an excellent feature request, and I will open a separate issue for it.

@igorbernstein2
Copy link
Contributor

I will close this issue for now. If there is anything else I can clarify, please re-open.
