Hi, we've just encountered a problem that seems to occur predictably when a Client instance disconnects. It results in promises returned from Client#query never settling - neither fulfilled nor rejected (when using callbacks, those are never fired, either).
The following gist is a trivial example (with output) that reproduces the behavior: https://gist.github.com/rkaw92/9cbd71fdb98d74f01c624bb2a003cd2a. If you cut off the connection to PostgreSQL at some point (in my case, by stopping a Docker container running postgres:9.6), the next .query() call rejects with an error, but the call after that never settles its promise, as illustrated by the output.
You'll notice that subsequent queries are never executed, and they never reject, either - they just hang.
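For reference, here is a minimal reproduction sketch along the same lines as the gist (not the gist verbatim - the connection string is a placeholder, and it assumes a pg version where connect() and query() return promises when called without a callback):

```js
// Hypothetical repro sketch: stop the PostgreSQL server while the loop is
// running to observe the behavior described above.
const { Client } = require('pg');

const client = new Client({ connectionString: 'postgres://user:pass@localhost:5432/db' });
// Without this handler, the re-emitted 'error' event would crash the process.
client.on('error', (err) => console.error('client error event:', err.message));

async function main() {
  await client.connect();
  for (let i = 0; ; i++) {
    try {
      const res = await client.query('SELECT NOW()');
      console.log(`query #${i}:`, res.rows[0]);
    } catch (err) {
      // The first query after the disconnect rejects here...
      console.error(`query #${i} rejected:`, err.message);
    }
    // ...but the one after that neither resolves nor rejects - the loop
    // silently stops advancing.
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}

main();
```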
The issue has several important consequences:
- It breaks libraries and applications that rely on the promise always being resolved or rejected, such as knex.js, causing eventual client-pool exhaustion. Indeed, if you do suffer a disconnect with knex.js and your connection pool is exhausted (because the .query() hangs the second time, never releasing the client), then restoring connectivity has no effect - the pool is forever exhausted. Note, however, that knex.js uses generic-pool rather than pg's default pool and manages acquiring/releasing clients on its own. (This is another issue that I am currently pursuing - knex's connection-recycling code is buggy at the moment.)
- It leaves promises that never settle, which is never a good thing and may trigger Node monitoring tools (and rightly so).
- It most likely keeps adding queries to the internal Client queue indefinitely, leaking memory.
I think I've pinned down this behavior to the following sequence of events (hold on, it's going to be a bumpy ride over client.js):
1. Assume the Client instance is connected and the connection has signalled readiness - the handler in line 141 has fired. Any connection error events will now be handled by connectedErrorHandler.
2. The connection goes down, thus emitting error (and end shortly after) on the con object.
3. connectedErrorHandler fires. In our example, we will assume that no query is currently active (if there is one, the situation converges to this case later anyway), so the handler has no effect other than re-emitting the error event on the Client object.
4. The user sends a query with Client#query. The Query object is pushed to the internal query queue and we enter _pulseQueryQueue.
5. Since this.readyForQuery === true, the passed query object becomes this.activeQuery and is taken off the queue.
6. this.readyForQuery is set to false (so that subsequent query() calls will queue up rather than be submitted immediately).
7. The query is submitted to the connection. The connection emits an error because it's closed!
8. We go back to connectedErrorHandler, but this time there is an active query - error handling is delegated to activeQuery.handleError. This is how we get our query callback to fire or our promise to be rejected.
9. It's all good for now, right? Not exactly, because even though our failed query has left the queue, this.readyForQuery has not been reset.
10. The user sends another query with Client#query. The query is again added to the queue (which now contains one element - the new query), but this time, in _pulseQueryQueue, we find that readyForQuery === false, so we do not process the queue. We exit the function, doing nothing.
11. We're now in limbo - no new queries are going to get processed because we're not readyForQuery, and we never will be (since the connection is dead). Additionally, new queries keep getting pushed into the queue on every call.
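To summarize the flow above, here is a simplified paraphrase of the queueing logic - this is not the actual client.js source, just a sketch of the state machine as described, with names taken from the issue text:

```js
// Simplified paraphrase of the queueing behavior described above. Names follow
// the issue text, but this is NOT the real client.js - just an illustration.
class SimplifiedClient {
  constructor(connection) {
    this.connection = connection;
    this.queryQueue = [];
    this.activeQuery = null;
    this.readyForQuery = false;
    // The real client sets readyForQuery = true when the connection signals
    // readiness (the server's ReadyForQuery message).
    connection.on('readyForQuery', () => {
      this.readyForQuery = true;
      this._pulseQueryQueue();
    });
  }

  query(queryInstance) {
    this.queryQueue.push(queryInstance);
    this._pulseQueryQueue();
    return queryInstance.promise; // the promise that never settles after a disconnect
  }

  _pulseQueryQueue() {
    if (this.readyForQuery === true) {
      this.activeQuery = this.queryQueue.shift();
      if (this.activeQuery) {
        this.readyForQuery = false; // cleared here, never reset on a connection error
        this.activeQuery.submit(this.connection);
      }
    }
    // readyForQuery === false: do nothing. After a connection error the server
    // will never signal readiness again, so queries pile up here forever.
  }
}
```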
I'm trying to think of a solution right now. Perhaps having a "dead" flag that rejects all subsequent query() calls would do. The other thing I can think of would be resetting readyForQuery every time we get a connection error, but that does not sound like a wise thing to do.
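For illustration, building on the simplified sketch above, the "dead" flag idea might look roughly like this (purely hypothetical - the _dead property and the error message are made up, and this is not a patch against the real client.js):

```js
// Hypothetical sketch of the "dead" flag idea on top of SimplifiedClient.
class FixedClient extends SimplifiedClient {
  constructor(connection) {
    super(connection);
    this._dead = false;
    connection.on('error', (err) => {
      this._dead = true;
      // Fail the active query and everything still waiting in the queue,
      // so that no promise is left hanging forever.
      if (this.activeQuery) {
        this.activeQuery.handleError(err, this.connection);
        this.activeQuery = null;
      }
      this.queryQueue.forEach((q) => q.handleError(err, this.connection));
      this.queryQueue.length = 0;
    });
  }

  query(queryInstance) {
    if (this._dead) {
      // Reject up front instead of queueing onto a dead connection.
      return Promise.reject(new Error('Client has lost its connection'));
    }
    return super.query(queryInstance);
  }
}
```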
In any case, unless the callbacks/promises returned from Client#query() are guaranteed to eventually fire, there is a demonstrable hazard of hung calls in user apps and exhausted client pools in knex.js.