Replace StaticPool with QueuePool and add robust connection pooling options #7829
Conversation
… pool_recycle to ensure db connection alive
Thanks for the change @seongsukwon-moreh! I left a couple of minor comments.
One interesting additional improvement to consider (in this PR or a follow-up) is how the engine is created in the async case, which still uses the rather inefficient NullPool:
```python
if async_engine:
    conn_string = conn_string.replace('postgresql://',
                                      'postgresql+asyncpg://')
    # This is an AsyncEngine, instead of a (normal, synchronous) Engine,
    # so we should not put it in the cache. Instead, just return.
    return sqlalchemy_async.create_async_engine(
        conn_string, poolclass=sqlalchemy.NullPool)
```
We won't be able to use QueuePool for it because QueuePool does not support asyncio, but SQLAlchemy does provide AsyncAdaptedQueuePool for this purpose.
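The suggested async variant might look something like the sketch below. This is only an illustration: the helper name is hypothetical, and the actual `create_async_engine` call (shown in comments) would use `sqlalchemy.pool.AsyncAdaptedQueuePool` with pool settings mirroring the sync case.

```python
# Hypothetical sketch of the async branch using AsyncAdaptedQueuePool
# instead of NullPool. The helper only assembles the rewritten URL and
# the pool keyword arguments; the commented-out call shows where they
# would be passed.

def async_engine_config(conn_string: str) -> tuple:
    """Rewrite the URL for asyncpg and pick pool settings (illustrative)."""
    async_url = conn_string.replace('postgresql://',
                                    'postgresql+asyncpg://')
    pool_kwargs = {
        'pool_size': 1,         # small resting pool, as in the sync case
        'max_overflow': 5,      # headroom for short concurrent bursts
        'pool_pre_ping': True,  # validate connections before use
        'pool_recycle': 1800,   # refresh connections every 30 minutes
    }
    # sqlalchemy_async.create_async_engine(
    #     async_url,
    #     poolclass=sqlalchemy.pool.AsyncAdaptedQueuePool,
    #     **pool_kwargs)
    return async_url, pool_kwargs

url, kwargs = async_engine_config('postgresql://user@host/db')
```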
```diff
         sqlalchemy.create_engine(
-            conn_string, poolclass=sqlalchemy.pool.StaticPool))
+            conn_string,
+            poolclass=sqlalchemy.pool.QueuePool,
+            pool_size=1,
+            max_overflow=5,
+            pool_pre_ping=True,
+            pool_recycle=1800))
 else:
     _postgres_engine_cache[conn_string] = (
         sqlalchemy.create_engine(
             conn_string,
             poolclass=sqlalchemy.pool.QueuePool,
-            size=_max_connections,
-            max_overflow=0))
+            pool_size=_max_connections,
+            max_overflow=0,
+            pool_pre_ping=True,
+            pool_recycle=1800))
```
Now that we're using QueuePool for both cases, we should be able to collapse the if statement down to a couple of parameters (pool_size and max_overflow) instead of duplicating the entire statement.
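The suggested collapse could be sketched like this (the helper name is hypothetical; parameter values are taken from the diff above):

```python
# Hypothetical refactor collapsing the duplicated create_engine calls:
# only pool_size and max_overflow differ between the two branches.

def queue_pool_kwargs(max_connections: int) -> dict:
    """Pick QueuePool settings based on the connection limit."""
    if max_connections == 1:
        pool_size, max_overflow = 1, 5  # small pool + burst headroom
    else:
        pool_size, max_overflow = max_connections, 0
    return {
        'pool_size': pool_size,
        'max_overflow': max_overflow,
        'pool_pre_ping': True,  # shared stability options
        'pool_recycle': 1800,
    }

# The single call site would then be:
# sqlalchemy.create_engine(conn_string,
#                          poolclass=sqlalchemy.pool.QueuePool,
#                          **queue_pool_kwargs(_max_connections))
```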
One thing that could be interesting re: the comment above: for the case where _max_connections > 1, we could dynamically adjust max_overflow to max(0, 5 - _max_connections), so the pool is always guaranteed to be able to scale up to a certain number of connections (here 5) without providing unnecessary overflow where none is needed.
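That dynamic-overflow suggestion is a one-liner; a sketch (function name is illustrative):

```python
# Sketch of the dynamic max_overflow suggestion: guarantee the pool can
# grow to a fixed target (here 5) total connections, with no unnecessary
# overflow once the base pool is already that large.

def dynamic_max_overflow(max_connections: int, target: int = 5) -> int:
    """max_overflow so that pool_size + max_overflow >= target."""
    return max(0, target - max_connections)
```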
I've updated the code based on your suggestions. Please review it.
/smoke-test --aws -k basic --postgres (passed)

/smoke-test --aws -k basic --postgres

Thanks! Merging now.
This PR addresses database connection stability issues in `db_utils.py`. The server was frequently experiencing intermittent connection failures, most notably:

`sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly...`

This error also sometimes manifested on the web dashboard.

The changes are as follows:

1. **Replace `StaticPool` with `QueuePool` for `_max_connections == 1`:** Adding `pool_pre_ping=True` to `StaticPool` exposed this issue, immediately causing `psycopg2.ProgrammingError: set_session cannot be used inside a transaction.` Replaced `StaticPool` with `QueuePool(pool_size=1)`. This maintains the intent of using a single connection "at rest" while gaining the critical safety of `QueuePool`'s connection reset (e.g., `rollback()`) logic on check-in.

2. **Introduce `max_overflow=5` for the `_max_connections == 1` case:** `QueuePool(pool_size=1, max_overflow=0)` revealed a new issue: `sqlalchemy.exc.TimeoutError: QueuePool limit of size 1 overflow 0 reached...`. This proves the application does require more than one concurrent connection during brief periods of load. Added `max_overflow=5`. This respects the spirit of `_max_connections=1` (keeping a small resting pool) while allowing the application to handle real-world concurrent bursts (up to 1+5=6 connections) without timing out.

3. **Add stability options (`pool_pre_ping`, `pool_recycle`) to all `QueuePool`s:** Added `pool_pre_ping=True` (to validate connections before use) and `pool_recycle=1800` (to refresh connections every 30 minutes) to all `QueuePool` configurations.

4. **Standardize parameter name (`size` -> `pool_size`):** The `else` block used the parameter `size`. This causes a `TypeError` if mixed with other `pool_`-prefixed arguments (like `pool_recycle`). Changed to `pool_size=_max_connections` for consistency and to prevent parameter-mixing errors.

Tested (run the relevant ones):
- `bash format.sh`
- `/smoke-test` (CI) or `pytest tests/test_smoke.py` (local)
- `/smoke-test -k test_name` (CI) or `pytest tests/test_smoke.py::test_name` (local)
- `/quicktest-core` (CI) or `pytest tests/smoke_tests/test_backward_compat.py` (local)
bash format.sh/smoke-test(CI) orpytest tests/test_smoke.py(local)/smoke-test -k test_name(CI) orpytest tests/test_smoke.py::test_name(local)/quicktest-core(CI) orpytest tests/smoke_tests/test_backward_compat.py(local)