
Conversation

@andrew-edgar
Contributor

Add the ability to separately configure the max open and max idle database connections.

@cfdreddbot

Hey andrew-edgar!

Thanks for submitting this pull request! I'm here to inform the recipients of the pull request that you and the commit authors have already signed the CLA.

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/137585203

The labels on this github issue will be updated when the story is started.

@andrew-edgar
Contributor Author

I will create a new PR to support the two values in the bosh manifest.

@andrew-edgar
Contributor Author

Created a PR for diego-release to correspond with this change:

cloudfoundry/diego-release#258

@emalm
Contributor

emalm commented Jan 16, 2017

Thanks, @andrew-edgar. Sounds reasonable as per discussion on https://github.com/cloudfoundry/bbs/issues/15. Prioritizing for the Diego team to pull in.

@jvshahid
Contributor

I'm just curious: what's the use case for splitting the two properties?

@emalm
Contributor

emalm commented Jan 16, 2017

From private Slack communication with @andrew-edgar:

> Not being able to clean up idle connections is an issue. We want an upper bound (say, 200 active connections) but outstanding idle connections capped at a lower number, so that the pool can be managed better from the database's perspective. Creating 500 idle connections while only ever having 3-5 active is a problem; the idle limit should be closer to the number of constantly active connections.

So it sounds like it's valuable for DB performance to be able to tune the idle-connection pool size independently of the maximum number of open connections.
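
For reference, Go's standard database/sql pool exposes exactly these two knobs. Here is a minimal, illustrative sketch of wiring them up separately; the lib/pq driver choice and the DSN are placeholders, and the 200/50 numbers are taken from this discussion purely as examples, not as BBS defaults:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // Postgres driver; illustrative choice
)

func main() {
	// Placeholder DSN; a real deployment reads this from configuration.
	db, err := sql.Open("postgres", "postgres://bbs:secret@localhost/diego?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	// Upper bound on connections the process may hold open concurrently.
	db.SetMaxOpenConns(200)
	// Connections kept alive in the pool while idle; anything beyond this
	// is closed as soon as it is returned to the pool.
	db.SetMaxIdleConns(50)
	// Optional: recycle connections periodically so idle ones do not live forever.
	db.SetConnMaxLifetime(30 * time.Minute)

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}
```

With SetMaxIdleConns lower than SetMaxOpenConns, connections above the idle limit are closed as soon as they are returned to the pool, which is the behavior being asked for here.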

@emalm
Contributor

emalm commented Jan 16, 2017

@andrew-edgar, do you have any more detailed information about the performance degradation you observed with the larger idle-connection pool? That could be useful information for the community in tuning their deployment configuration.

Thanks,
Eric

@andrew-edgar
Contributor Author

Here is a little more information from our DB Team as to the use case and issues seen in our postgres deployment.

From a Postgres perspective we have to minimize the number of unused idle connections.
Connections consume resources by themselves, so they are not free: they consume RAM on the server (some RAM is used per connection even with no workload), and their TCP connections must be kept alive.

Postgres by itself can sustain, with proper hardware, more than 1000 concurrent connections. The problem is: to do what? Why would I need a mainframe just to maintain 1000 open idle connections?

The reason we need to split the min and max numbers is:

  • min represents the number of connections that will ALWAYS be open, whether or not they are used
  • max represents the maximum number of connections the BBS can open under peak load

The min is normally sized to the average workload we expect, while the max is sized to deal with the peak workload we can sustain.
We need the BBS to open the min number of connections at start time, then scale up as the workload increases and back down when it goes away (the basic principle of autoscaling, as UAA does). I would expect the BBS code to use a proper connection-pool layer that allows a higher number of worker threads than DB connections and avoids a 1-to-1 mapping (similar to what is normally done with a Java DataSource and DriverManager), with smart pre-allocation and de-allocation of connections as needed.

Coming to the numbers: we currently limit the BBS to 200 open connections per node, so we have a total of 203 connections against the Diego DB: 200 from the active BBS node and 3 from the passive ones.
The highest number of connections the BBS actually used (i.e. not idle, out of these 203) over the last 3 days in production was 46, and that was a rare peak; on average the number of non-idle connections is 2.83.

Postgres settings, like MySQL settings, must be related to the expected workload and the size of the VMs; it doesn't make sense to size the DB on the default max-connections value provided by the BBS.
We need to identify a scaling formula for Diego as a whole.
Based on the actual production data we have:

To maintain a Diego deployment with 300 cells, we can have the BBS deployed on a VM with 8 vCPUs / 32 GB RAM, using a max of 200 connections and a min of 50.

With that configuration I would expect to see on Postgres:

  • a peak of 200 open BBS connections (under the maximum theoretical supported workload)
  • an average of 50 connections always maintained for the BBS pool

So basically we want to be able to configure the BBS's DB connections to handle both peak and constant load correctly.
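
For anyone wanting to check a sizing like this against real traffic, here is a hedged, illustrative sketch (not BBS code) that uses Go's database/sql Stats() to watch how many pooled connections are open, in use, and idle. The driver choice and DSN are placeholders, and the Idle, InUse, and WaitCount fields of sql.DBStats require Go 1.11 or later:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // Postgres driver; illustrative choice
)

func main() {
	// Placeholder DSN; a real deployment reads this from configuration.
	db, err := sql.Open("postgres", "postgres://bbs:secret@localhost/diego?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	// The 200-open / 50-idle sizing proposed above.
	db.SetMaxOpenConns(200)
	db.SetMaxIdleConns(50)

	// Log a snapshot of the pool every minute so the sizing can be compared
	// against observed traffic.
	for range time.Tick(time.Minute) {
		s := db.Stats()
		log.Printf("open=%d inUse=%d idle=%d waits=%d",
			s.OpenConnections, s.InUse, s.Idle, s.WaitCount)
	}
}
```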

Thanks,

Andrew

@jvshahid
Contributor

Hey @andrew-edgar, thanks for the detailed response. I'm still not convinced that this setting is necessary; responses are inlined below.

> From a Postgres perspective we have to minimize the number of unused idle connections.
> Connections consume resources by themselves, so they are not free: they consume RAM on the server (some RAM is used per connection even with no workload), and their TCP connections must be kept alive.

If the MySQL/Postgres VM can handle a burst of connections (i.e., it has enough RAM), I'm not sure what the harm is in keeping those connections around. Aside from the TCP keep-alives, which I have trouble believing are causing issues, I don't understand the value. It's like using 50% of the RAM allocated to the MySQL/Postgres VM while being able to burst to 90%: unless the infrastructure does some sort of overcommitting, what does that buy you? And even with overcommitting, what if there's a burst of connections and the infrastructure cannot allocate the extra RAM?

> Postgres by itself can sustain, with proper hardware, more than 1000 concurrent connections. The problem is: to do what? Why would I need a mainframe just to maintain 1000 open idle connections?

From my experience running benchmarkbbs, there's a sweet spot somewhere in the range of 500 connections where the performance of the database is great. With more connections you hit some I/O limit; with fewer, you're not utilizing all the resources properly.

Finally, the reason I'm bringing this up is that I don't think it's necessary to add another knob for someone to tweak. Thoughts?

@jvshahid
Contributor

By the way, this PR was already merged in c447d22.

@jvshahid jvshahid closed this Jan 24, 2017
@jvshahid
Contributor

That said, I'm still interested in hearing your thoughts, @andrew-edgar.

@emalm
Contributor

emalm commented Jan 24, 2017

We haven't cut a final release with this configuration exposed yet, so we can back it out if we think that's the appropriate thing to do.

@jvshahid
Contributor

I don't have a strong opinion against the PR; all I'm trying to do is avoid adding another knob unless it is necessary. I had a brief conversation with @jfmyers9 yesterday and he convinced me that there is a use case for it, namely a shared database VM between CC and Diego, but I'm not sure this use case is common enough to justify the change. I'll defer to you, @ematpl, to decide.

@andrew-edgar
Contributor Author

I agree with Jim. In production we are actually sharing the same DB instance between the UAA DB and Diego, so we want to make sure we are not holding too many connections and resources that could affect the other database running alongside.

This was in fact the reason for the PR in the first place: our DB people found that the connections held open for Diego were affecting the performance of the DB machine as a whole, which was serving both Diego and UAA.

I think if we "default" these values correctly it won't be much of a burden, since this is definitely not something people would need to tweak unless their use case requires it. If you want to default both values to 500 to match the current behavior, that would be fine; we would change it for our deployment, but most customers with dedicated DB instances could simply leave the defaults as they are.

Thanks!

Andrew
