
Is there any best practise on choosing connection pool-size? #228

Closed
liufuyang opened this issue Mar 24, 2020 · 3 comments
Assignees
Labels
api: bigtable Issues related to the googleapis/java-bigtable API. type: question Request for information or clarification. Not an issue.

Comments

@liufuyang

Hi there, is there any best practice on choosing the connection pool size?

We have a service deployed on K8s as a pod, and the load on it shifts from 200 req/sec down to 20 req/sec at different times of day. Each request generates 2 simultaneous Bigtable read queries, and each read query, as seen from the service side, takes around 15 ms to 25 ms of waiting time (i.e. read latency).

[screenshot: read-latency graph]

So do you think we'd better specify the connection pool size, and is there some way to calculate the best value? I saw some methods mentioned here https://jobs.zalando.com/en/tech/blog/how-to-set-an-ideal-thread-pool-size/?gh_src=4n3gxh1 — do you think it makes sense to use them? I also wonder what the default channel pool size is if we don't set it here. (The k8s pod requests 2 CPUs, and the k8s cluster runs on 32-core virtual machines.)

What we've noticed is that we seem to have latency issues when setting the pool size to 4 or 8, while setting it to 32 gives a much better overall latency graph (we haven't tried 16 yet). We previously kept the pool size low because, when the load was low, latency was very high, and we had to introduce some artificial load to keep those channels warm. If the pool size is large, that warming effect doesn't work well when the load is not very high (something like 10 to 30 req/sec).

BTW, this is how we set up the client settings:

final BigtableDataSettings.Builder settingsBuilder =
    BigtableDataSettings.newBuilder()
        .setProjectId("promotions-targeting")
        .setInstanceId("feature-store")
        .setAppProfileId("default")
        .setRefreshingChannel(true);

settingsBuilder
    .stubSettings()
    .setTransportChannelProvider(
        InstantiatingGrpcChannelProvider.newBuilder().setPoolSize(8).build());

final BigtableDataSettings settings = settingsBuilder.build();

Perhaps a follow-up question:
Do you think we should also try playing with InstantiatingGrpcChannelProvider.newBuilder().setExecutorProvider(xxx) to run the connections on a provided executor? What is the type of the default executor, and how many threads does it have if we don't use this setExecutorProvider setting? Thank you.

@product-auto-label product-auto-label bot added the api: bigtable Issues related to the googleapis/java-bigtable API. label Mar 24, 2020
@liufuyang
Author

Or maybe a shorter question is: when the BT client is created, can I log something on the k8s pod to see all the detailed settings of the client?

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Mar 25, 2020
@kolea2 kolea2 added type: question Request for information or clarification. Not an issue. and removed triage me I really want to be triaged. labels Mar 25, 2020
@igorbernstein2
Contributor

Hi,

We default the channel pool size to 2 × the number of CPUs.

This approach can sometimes backfire in Linux containers (and k8s by extension), which can present the number of CPUs as 1.

Initializing each channel is quite expensive. The channel pool uses round-robin for channel selection, so the first 2 × CPUs requests will spike in latency. I would therefore recommend sending some fake read requests on application startup to warm up the channel pool. Also, if a channel is idle for a while, it will automatically disconnect, causing the next request to pay the initialization cost again; so for low-qps applications, I would recommend sizing the pool to a single connection. In addition, channels are required to reconnect every hour, so you might see periodic latency spikes (they are a bit more evident for low-qps applications). To mitigate this, I would recommend enabling the auto channel refresh:

BigtableDataSettings.newBuilder()
        .setProjectId("...")
        .setInstanceId("...")
        .setRefreshingChannel(true);

Please note that this feature is experimental and might change in the future.
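The warm-up recommendation above can be sketched roughly as follows. This is a hypothetical helper, not an official API: the table name and row key are placeholders, and issuing one cheap point read per channel relies on the pool's round-robin selection to touch each connection. `BigtableDataClient.readRow(tableId, rowKey)` simply returns null for a missing row, so a nonexistent key is a harmless probe.

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;

public class ChannelWarmer {
  // Hypothetical helper: send one read per channel so round-robin
  // selection establishes every connection in the pool up front.
  static void warmUpChannels(BigtableDataClient client, int poolSize, String tableId) {
    for (int i = 0; i < poolSize; i++) {
      try {
        // A point read on a nonexistent key returns null but still
        // forces the underlying channel to connect.
        client.readRow(tableId, "warmup-nonexistent-row-key");
      } catch (RuntimeException e) {
        // Warm-up failures are non-fatal; log and continue.
      }
    }
  }
}
```

Calling this once at application startup (with poolSize matching the configured channel pool) should move the channel initialization cost out of the first real requests.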

There are 2 benefits to using multiple channels:

  1. Each gRPC channel can have at most 100 outstanding requests at a time. Having more connections allows for greater throughput.

  2. Each gRPC channel will be connected to a different Bigtable frontend, which spreads the load of accepting requests amongst more machines.

A colleague of mine found that the default setting of 2 × CPUs for the channel pool size fits high-qps use cases fairly well.
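As a rough sanity check against the 100-streams-per-channel limit mentioned above, Little's law (concurrent requests ≈ qps × latency) can estimate how many streams a workload actually keeps open. The numbers below are taken from this issue (200 req/sec peak, 2 reads per request, ~25 ms latency) and are illustrative only:

```java
public class PoolSizeEstimate {
  public static void main(String[] args) {
    double readQps = 200 * 2;     // 200 req/sec peak, 2 Bigtable reads each
    double latencySec = 0.025;    // ~25 ms per read (upper bound from the question)

    // Little's law: average number of in-flight requests.
    double concurrentStreams = readQps * latencySec;

    // gRPC allows at most 100 concurrent streams per channel.
    int maxStreamsPerChannel = 100;
    int channelsForThroughput =
        (int) Math.ceil(concurrentStreams / maxStreamsPerChannel);

    System.out.println("concurrent streams ~= " + concurrentStreams);           // 10.0
    System.out.println("channels needed for throughput: " + channelsForThroughput); // 1
  }
}
```

By this estimate even a single channel stays well under the 100-stream cap, which suggests the latency differences observed at pool sizes 4 vs 32 come from load-spreading across frontends and channel warm-up effects rather than stream exhaustion.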

Finally, I think that adding a way to easily print the default settings is an excellent feature request, and I will open a separate issue for it.

@igorbernstein2
Copy link
Contributor

I will close this issue for now. If there is anything else I can clarify, please re-open.
