Hi, we've just encountered a problem that seems to occur predictably when a Client instance disconnects. It results in promises returned from Client#query never settling - neither fulfilled nor rejected (when using callbacks, those are never fired, either).
The following gist is a trivial example (with output) that reproduces the behavior: https://gist.github.com/rkaw92/9cbd71fdb98d74f01c624bb2a003cd2a. If you cut off the connection to PostgreSQL at some point (in my case, by stopping a Docker container running postgres:9.6), the next .query() call rejects with an error, but the call after that never settles its promise, as illustrated by the output.
You'll notice that subsequent queries are never executed, and they never reject, either - they just hang.
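For reference, here is a minimal reproduction sketch along the same lines as the gist (not the gist verbatim - the connection string is a placeholder, and it assumes a pg version where connect() and query() return promises when called without a callback):

```js
// Hypothetical repro sketch: stop the PostgreSQL server while the loop is
// running to observe the behavior described above.
const { Client } = require('pg');

const client = new Client({ connectionString: 'postgres://user:pass@localhost:5432/db' });
// Without this handler, the re-emitted 'error' event would crash the process.
client.on('error', (err) => console.error('client error event:', err.message));

async function main() {
  await client.connect();
  for (let i = 0; ; i++) {
    try {
      const res = await client.query('SELECT NOW()');
      console.log(`query #${i}:`, res.rows[0]);
    } catch (err) {
      // The first query after the disconnect rejects here...
      console.error(`query #${i} rejected:`, err.message);
    }
    // ...but the one after that neither resolves nor rejects - the loop
    // silently stops advancing.
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}

main();
```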
The issue has several important consequences:
- It breaks libraries and applications that rely on the promise always being resolved or rejected, such as knex.js, causing eventual client-pool exhaustion. Indeed, if you do suffer a disconnect with knex.js and your connection pool is exhausted (because the .query() hangs the second time, never releasing the client), then restoring connectivity has no effect - the pool is forever exhausted. Note, however, that knex.js uses generic-pool rather than pg's default pool and manages acquiring/releasing clients on its own. (This is another issue that I am currently pursuing - knex's connection-recycling code is buggy at the moment.)
- It leaves promises that never settle, which is never a good thing and may trigger Node monitoring tools (and rightly so).
- It most likely keeps adding queries to the internal Client queue indefinitely, leaking memory.
I think I've pinned down this behavior to the following sequence of events (hold on, it's going to be a bumpy ride over client.js):
1. Assume the Client instance is connected and the connection has signalled readiness - the handler in line 141 has fired. Any connection error events will now be handled by connectedErrorHandler.
2. The connection goes down, thus emitting error (and end shortly after) on the con object.
3. connectedErrorHandler fires. In our example, we will assume that no query is currently active (if there is one, the situation converges to this case later anyway), so the handler has no effect other than re-emitting the error event on the Client object.
4. The user sends a query with Client#query. The Query object is pushed to the internal query queue and we enter _pulseQueryQueue.
5. Since this.readyForQuery === true, the passed query object becomes this.activeQuery and is taken off the queue.
6. this.readyForQuery is set to false (so that subsequent query() calls will queue up rather than be submitted immediately).
7. The query is submitted to the connection. The connection emits an error because it's closed!
8. We go back to connectedErrorHandler, but this time there is an active query - error handling is delegated to activeQuery.handleError. This is how we get our query callback to fire or our promise to be rejected.
9. It's all good for now, right? Not exactly, because even though our failed query has left the queue, this.readyForQuery has not been reset.
10. The user sends another query with Client#query. The query is again added to the queue (which now contains one element - the new query), but this time, in _pulseQueryQueue, we find that readyForQuery === false, so we do not process the queue. We exit the function, doing nothing.
11. We're now in limbo - no new queries are going to get processed because we're not readyForQuery, and we never will be (since the connection is dead). Additionally, new queries keep getting pushed into the queue on every call.
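To summarize the flow above, here is a simplified paraphrase of the queueing logic - this is not the actual client.js source, just a sketch of the state machine as described, with names taken from the issue text:

```js
// Simplified paraphrase of the queueing behavior described above. Names follow
// the issue text, but this is NOT the real client.js - just an illustration.
class SimplifiedClient {
  constructor(connection) {
    this.connection = connection;
    this.queryQueue = [];
    this.activeQuery = null;
    this.readyForQuery = false;
    // The real client sets readyForQuery = true when the connection signals
    // readiness (the server's ReadyForQuery message).
    connection.on('readyForQuery', () => {
      this.readyForQuery = true;
      this._pulseQueryQueue();
    });
  }

  query(queryInstance) {
    this.queryQueue.push(queryInstance);
    this._pulseQueryQueue();
    return queryInstance.promise; // the promise that never settles after a disconnect
  }

  _pulseQueryQueue() {
    if (this.readyForQuery === true) {
      this.activeQuery = this.queryQueue.shift();
      if (this.activeQuery) {
        this.readyForQuery = false; // cleared here, never reset on a connection error
        this.activeQuery.submit(this.connection);
      }
    }
    // readyForQuery === false: do nothing. After a connection error the server
    // will never signal readiness again, so queries pile up here forever.
  }
}
```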
I'm trying to think of a solution right now. Perhaps having a "dead" flag that rejects all subsequent query() calls would do. The other thing I can think of would be resetting readyForQuery every time we get a connection error, but that does not sound like a wise thing to do.
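For illustration, building on the simplified sketch above, the "dead" flag idea might look roughly like this (purely hypothetical - the _dead property and the error message are made up, and this is not a patch against the real client.js):

```js
// Hypothetical sketch of the "dead" flag idea on top of SimplifiedClient.
class FixedClient extends SimplifiedClient {
  constructor(connection) {
    super(connection);
    this._dead = false;
    connection.on('error', (err) => {
      this._dead = true;
      // Fail the active query and everything still waiting in the queue,
      // so that no promise is left hanging forever.
      if (this.activeQuery) {
        this.activeQuery.handleError(err, this.connection);
        this.activeQuery = null;
      }
      this.queryQueue.forEach((q) => q.handleError(err, this.connection));
      this.queryQueue.length = 0;
    });
  }

  query(queryInstance) {
    if (this._dead) {
      // Reject up front instead of queueing onto a dead connection.
      return Promise.reject(new Error('Client has lost its connection'));
    }
    return super.query(queryInstance);
  }
}
```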
In any case, unless the callbacks/promises returned from Client#query() are guaranteed to eventually fire, there is a demonstrable hazard of hung calls in user apps and exhausted client pools in knex.js.