
Conversation

@andrew-edgar
Contributor

Add the ability to separately configure the max open and max idle database connections.

@cfdreddbot

Hey andrew-edgar!

Thanks for submitting this pull request! I'm here to inform the recipients of the pull request that you and the commit authors have already signed the CLA.

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/137585203

The labels on this github issue will be updated when the story is started.

@andrew-edgar
Contributor Author

I will create a new PR to support the two values in the bosh manifest.

@andrew-edgar
Contributor Author

Created a PR for diego-release to correspond with this change:

cloudfoundry/diego-release#258

@emalm
Contributor

emalm commented Jan 16, 2017

Thanks, @andrew-edgar. Sounds reasonable as per discussion on https://github.com/cloudfoundry/bbs/issues/15. Prioritizing for the Diego team to pull in.

@jvshahid
Contributor

I'm just curious: what's the use case for splitting the two properties?

@emalm
Contributor

emalm commented Jan 16, 2017

From private Slack communication with @andrew-edgar:

> Not being able to clean up idle connections is an issue. We want an upper bound (say, 200 active connections) but outstanding idle connections capped at a lower number, so that the pool can be managed better from the database's perspective. Creating 500 idle connections while only ever having 3-5 active is a problem; the idle limit should be closer to the number of constantly active connections.

So it sounds like it's valuable for DB performance to be able to tune the idle-connection pool size independently of the maximum number of open connections.
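
For reference, Go's standard database/sql pool exposes exactly these two knobs. Here is a minimal, illustrative sketch of wiring them up separately; the lib/pq driver choice and the DSN are placeholders, and the 200/50 numbers are taken from this discussion purely as examples, not as BBS defaults:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // Postgres driver; illustrative choice
)

func main() {
	// Placeholder DSN; a real deployment reads this from configuration.
	db, err := sql.Open("postgres", "postgres://bbs:secret@localhost/diego?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	// Upper bound on connections the process may hold open concurrently.
	db.SetMaxOpenConns(200)
	// Connections kept alive in the pool while idle; anything beyond this
	// is closed as soon as it is returned to the pool.
	db.SetMaxIdleConns(50)
	// Optional: recycle connections periodically so idle ones do not live forever.
	db.SetConnMaxLifetime(30 * time.Minute)

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}
```

With SetMaxIdleConns lower than SetMaxOpenConns, connections above the idle limit are closed as soon as they are returned to the pool, which is the behavior being asked for here.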

@emalm
Contributor

emalm commented Jan 16, 2017

@andrew-edgar, do you have any more detailed information about the performance degradation you observed with the larger idle-connection pool? That could be useful information for the community in tuning their deployment configuration.

Thanks,
Eric

@andrew-edgar
Contributor Author

Here is a little more information from our DB Team as to the use case and issues seen in our postgres deployment.

From a Postgres perspective we have to minimize the number of unused idle connections.
Connections consume resources by themselves, so they are not free: they consume RAM on the server (some RAM is used per connection even with no workload), and their TCP connections must be kept alive.

Postgres by itself can sustain, with proper hardware, more than 1000 concurrent connections. The problem is: to do what? Why would I need a mainframe just to maintain 1000 open idle connections?

The reason we need to split the min and max numbers is:

  • min represents the number of connections that will ALWAYS be open, whether or not they are used
  • max represents the maximum number of connections the BBS can open under peak load

The min is normally sized to the average workload we expect, while the max is sized to deal with the peak workload we can sustain.
We need the BBS to open the min number of connections at start time, then scale up as the workload increases and back down when it goes away (the basic principle of autoscaling, as UAA does). I would expect the BBS code to use a proper connection-pool layer that allows a higher number of worker threads than DB connections and avoids a 1-to-1 mapping (similar to what is normally done with a Java DataSource and DriverManager), with smart pre-allocation and de-allocation of connections as needed.

Coming to the numbers: we currently limit the BBS to 200 open connections per node, so we have a total of 203 connections against the Diego DB: 200 from the active BBS node and 3 from the passive ones.
The highest number of connections the BBS actually used (i.e. not idle, out of these 203) over the last 3 days in production was 46, and that was a rare peak; on average the number of non-idle connections is 2.83.

Postgres settings, like MySQL settings, must be related to the expected workload and the size of the VMs; it doesn't make sense to size the DB on the default max-connections value provided by the BBS.
We need to identify a scaling formula for Diego as a whole.
Based on the actual production data we have:

To maintain a Diego deployment with 300 cells, we can have the BBS deployed on a VM with 8 vCPUs / 32 GB RAM, using a max of 200 connections and a min of 50.

With that configuration I would expect to see on Postgres:

  • a peak of 200 open BBS connections (under the maximum theoretical supported workload)
  • an average of 50 connections always maintained for the BBS pool

So basically we want to be able to configure the BBS's DB connections to handle both peak and constant load correctly.
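
For anyone wanting to check a sizing like this against real traffic, here is a hedged, illustrative sketch (not BBS code) that uses Go's database/sql Stats() to watch how many pooled connections are open, in use, and idle. The driver choice and DSN are placeholders, and the Idle, InUse, and WaitCount fields of sql.DBStats require Go 1.11 or later:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // Postgres driver; illustrative choice
)

func main() {
	// Placeholder DSN; a real deployment reads this from configuration.
	db, err := sql.Open("postgres", "postgres://bbs:secret@localhost/diego?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	// The 200-open / 50-idle sizing proposed above.
	db.SetMaxOpenConns(200)
	db.SetMaxIdleConns(50)

	// Log a snapshot of the pool every minute so the sizing can be compared
	// against observed traffic.
	for range time.Tick(time.Minute) {
		s := db.Stats()
		log.Printf("open=%d inUse=%d idle=%d waits=%d",
			s.OpenConnections, s.InUse, s.Idle, s.WaitCount)
	}
}
```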

Thanks,

Andrew

@jvshahid
Contributor

Hey @andrew-edgar, thanks for the detailed response. I'm still not convinced that this setting is necessary; responses are inlined below.

> From a Postgres perspective we have to minimize the number of unused idle connections.
> Connections consume resources by themselves, so they are not free: they consume RAM on the server (some RAM is used per connection even with no workload), and their TCP connections must be kept alive.

If the MySQL/Postgres VM can handle a burst of connections (i.e., it has enough RAM), I'm not sure what the harm is in keeping those connections around. Aside from the TCP keep-alives, which I have trouble believing are causing issues, I don't understand the value. It's like using 50% of the RAM allocated to the MySQL/Postgres VM while being able to burst to 90%: unless the infrastructure does some sort of overcommitting, what does that buy you? And even with overcommitting, what if there's a burst of connections and the infrastructure cannot allocate the extra RAM?

> Postgres by itself can sustain, with proper hardware, more than 1000 concurrent connections. The problem is: to do what? Why would I need a mainframe just to maintain 1000 open idle connections?

From my experience running benchmarkbbs, there's a sweet spot somewhere in the range of 500 connections where the performance of the database is great. With more connections you hit some I/O limit; with fewer, you're not utilizing all the resources properly.

Finally, the reason I'm bringing this up is that I don't think it's necessary to add another knob for someone to tweak. Thoughts?

@jvshahid
Contributor

By the way, this PR was already merged in c447d22.

@jvshahid jvshahid closed this Jan 24, 2017
@jvshahid
Contributor

That said, I'm still interested in hearing your thoughts, @andrew-edgar.

@emalm
Contributor

emalm commented Jan 24, 2017

We haven't cut a final release with this configuration exposed yet, so we can back it out if we think that's the appropriate thing to do.

@jvshahid
Contributor

I don't have a strong opinion against the PR; all I'm trying to do is avoid adding another knob unless it is necessary. I had a brief conversation with @jfmyers9 yesterday and he convinced me that there is a use case for it, namely a shared database VM between CC and Diego, but I'm not sure this use case is common enough to justify the change. I'll defer to you, @ematpl, to decide.

@andrew-edgar
Contributor Author

I agree with Jim. In production we are actually sharing the same DB instance between the UAA DB and Diego, so we want to make sure we are not holding too many connections and resources that could affect the other database running alongside.

This was in fact the reason for the PR in the first place: our DB people found that the connections held open for Diego were affecting the performance of the DB machine as a whole, which was serving both Diego and UAA.

I think if we "default" these values correctly it won't be much of a burden, since this is definitely not something people would need to tweak unless their use case requires it. If you want to default both values to 500 to match the current behavior, that would be fine; we would change it for our deployment, but most customers with dedicated DB instances could simply leave the defaults as they are.

Thanks!

Andrew
