Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Queued query errors #1503

Merged
merged 11 commits into from
Oct 3, 2018
Merged

Queued query errors #1503

merged 11 commits into from
Oct 3, 2018

Conversation

charmander
Copy link
Collaborator

@charmander charmander commented Nov 15, 2017

Separates socket errors from error messages, sends socket errors to all queries in the queue, marks clients as unusable after socket errors.

This is not very pleasant but should maintain backwards compatibility…?

I don’t know if this is how you want to handle the issue, but it should address #1454 and #632.

Maybe in a future version the query queue and clients pending connect() won’t need to exist at all?

@charmander charmander requested a review from brianc November 15, 2017 07:06
@charmander
Copy link
Collaborator Author

(Tests are failing because this is missing the changes for lib/native. Does it still offer a significant performance advantage, by the way?)

@rkaw92
Copy link

rkaw92 commented Nov 15, 2017

Hi, just read your test case. Can you please test with two or more queries after the connection is downed? As reported in #1500 , the first query after the connection fails was already emitting an error. It was the second query that was locked forever.

https://gist.github.com/rkaw92/9cbd71fdb98d74f01c624bb2a003cd2a#file-client-query-disconnect-js

connection.emit('error', err)
connection.emit('readyForQuery')
return
return new Error('A query must have either text or a name. Supplying neither is unsupported.')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning an error doesn't seem like a widely-used practice. Why are we not throwing it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have to be caught to be passed to the callback and it wouldn’t be easy to distinguish from other errors at that point.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you mention it, do we need to distinguish it from other errors? Reporting an error that occurred in the stack when calling Query.submit as a "query error" sounds rather intuitive.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn’t be any other errors so I prefer not to treat those like they’re expected if they do occur.

@charmander
Copy link
Collaborator Author

@rkaw92 In your case, the connection was terminated between queries. In the test, the connection is terminated during the first query, and there is no subsequent readyForQuery message.

lib/client.js Outdated
@@ -389,25 +422,36 @@ Client.prototype.query = function (config, values, callback) {
query._result._getTypeParser = this._types.getTypeParser.bind(this._types)
}

if (!this._queryable) {
query.handleError(new Error('Client has encountered a connection error and is not queryable'), this.connection)
return
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning undefined causes breakage for Promise users when calling .query() after the connection has gone down. Should return result instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@rkaw92
Copy link

rkaw92 commented Nov 15, 2017

Something is not quite right, still. Tested with my original gist:

  • The first query after the failing connection does reject with an error - however:
  • The second query after the failing connection encounters !this._queryable and this causes an invalid return (which you've now fixed just a moment ago).

It appears I was wrong in my initial assessment of the sequence of events in case the connection goes down (ends) between queries (activeQuery === null) - at the time the connection ends (assuming it goes down gracefully, i.e. TCP RST, no timeout), there is no immediate error event, just an end. Here's what happens with the code from your branch:

  • Connection ends, emitting an end event
  • The con.once('end', ...) handler fires
    • error event is emitted on the Client object
  • Note that there was no error event on the actual connection object, nor an errorMessage (because the connection just went away without sending us anything)
    • We still have not entered connectedErrorMessageHandler nor connectedErrorHandler
  • User fires query1 - this does not trip the !this.queryable check so we go forward and post the query to the connection stream
  • Now we get a real error from the stream (because we tried to write() to it), so we run connectedErrorHandler and it sets this.queryable to false. Since a query is active, it gets a legit promise rejection.
  • User fires query2 - this time, the !this.queryable check causes an early exit.

After your return fix, it seems like the whole thing should work, but it may be a bit surprising to users that the first query after the connection goes down behaves differently than the second one (i.e. different error). What do you think?

@charmander
Copy link
Collaborator Author

I think a user should be checking error codes if they expect certain types of errors and not sending a second query after the connection failed. The end listener for the Connection not setting _queryable = false is an oversight, though.

@rkaw92
Copy link

rkaw92 commented Nov 15, 2017

Nice. All queries are now receiving the same error in the scenario from my gist.

@charmander
Copy link
Collaborator Author

@brianc, is the right way to implement this for pg-native to change https://github.com/brianc/node-pg-native/blob/25d12eb2a26d10a8c32036acf19253e131c266f5/index.js#L208 to a query error and all other errors to fatal errors?

@brianc
Copy link
Owner

brianc commented Nov 17, 2017

@charmander - yah I think so. I know it's kinda a pain to have to go change something over in pg-native and the over here as well. :( Maybe one day we can combine things into a monorepo and use yarn workspaces or something to manage it. Thanks for doing this by the way - this is awesome. 🤗

…-level errors

Separates socket errors from error messages, sends socket errors to all queries in the queue, marks clients as unusable after socket errors.

This is not very pleasant but should maintain backwards compatibility…?
This doesn’t match the original behaviour of the type errors, but it’s correct.
@rkaw92
Copy link

rkaw92 commented Feb 1, 2018

Guys, where are we on this? Any roadblocks?

brianc pushed a commit to brianc/node-pg-native that referenced this pull request May 4, 2018
@brianc
Copy link
Owner

brianc commented May 4, 2018

@charmander I updated pg-native w/ the breaking change to emit errors more regularly here. Looks like the tests are still failing here for some reason.

@charmander charmander force-pushed the queued-query-errors branch from 705d89c to 913a7e2 Compare May 5, 2018 03:31
@charmander
Copy link
Collaborator Author

@brianc Tests fixed. Thanks for all the work today!

@brianc
Copy link
Owner

brianc commented May 7, 2018

Ah nice! I'll merge this tonight, put together a final release for the 7.x branch, and the major version bump for the small change to not being able to call pg.Pool() without the new keyword from #1541

@aelath
Copy link

aelath commented Jul 13, 2018

👋Wanted to check in on progress of this PR. I've been following an issue around (#1105) and I think this PR might fix our issues we are seeing in production. When do you expect to merge this? Thanks!

Conflicts:
	test/integration/connection-pool/error-tests.js
@charmander
Copy link
Collaborator Author

@brianc ^?

@vitaly-t
Copy link
Contributor

@brianc What happened here? Why wasn't it merged?

@charmander

Copy link
Owner

@brianc brianc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes I fall off the edge of the earth. SO many github notification emails it's too hard to keep up. I apologize for letting this slip for so long. 😬 Merging & pushing a new minor version today.

@vitaly-t
Copy link
Contributor

vitaly-t commented Oct 9, 2018

Something makes me scratch my head in these changes....

It removed the callback verification inside Client.prototype._connect, presuming that the value is always set, since the method is always called from Client.prototype.connect:

process.nextTick(() => {
      callback(err)
    })

However, the internal method sets callback to null, just as it receives readyForQuery notification:

callback = null

And since the direct call was replaced to use nextTick, it is now entirely possible that when we get inside that tick, event readyForQuery has already fired, and then calling unverified callback() inside the tick will throw an error, pending a race condition.

@brianc
Copy link
Owner

brianc commented Oct 9, 2018

yeah that might be possible - could you write a failing test for this? I'll fix it in a PR if you can reproduce.

@vitaly-t
Copy link
Contributor

vitaly-t commented Oct 9, 2018

@brianc That's the known specific of race conditions, they are almost impossible to test in a controlled environment. I don't think I can create one, sorry.

I can only logically deduct that such a situation can occur under a heavy load. Time will tell, just as people start upgrading...

It would be a safe bet to return the check, that's for sure...

process.nextTick(() => {
    if(callback) {
          callback(err)
        }
    })

https://github.com/brianc/node-postgres/blob/master/lib/client.js#L77

This would also make internal method Client.prototype._connect consistent, as it does this verification in all other instances where the value of callback is used.

@brianc
Copy link
Owner

brianc commented Oct 9, 2018

I thought about this a bit more - I don't think its possible. Each call to client.connect creates a new function scope. If this 'error you have already used this client' error fires it will never reach the point in the code where readyForQuery is fired, and vise versa. It's also not reproducible even through manually firing readyForQuery or manipulating the event listener collection on client.connection.

@brianc
Copy link
Owner

brianc commented Oct 9, 2018

and checking for if(callback) is no longer actually needed in the _connect function because if the client.connect function is called without a callback it now returns a promise, which rejects, and either way _connect is always called with a callback function (either user supplied or the callbackified promise handler)

@vitaly-t
Copy link
Contributor

vitaly-t commented Oct 9, 2018

Ok, I trust you this, but time will tell, if there is any issue there 😉

And in the second post you referred to what I was describing myself, but the issue I mentioned was that internally callback is set to null, and since nextTick is asynchronous, a race condition seemed possible.

@rkaw92
Copy link

rkaw92 commented Oct 10, 2018

@brianc @vitaly-t Just did a quick reading of the function and it looks OK - generating the error and registering the handler for readyForQuery that modifies the function-scoped callback are mutually-exclusive within a given function call. This exclusion is achieved by the return statement in line 80.

On the weird occasion that handlers for both readyForQuery and error are fired in the same tick one after another, it could happen that the connectingErrorHandler will be missing a callback to call. I think leaving the if(callback) check in connectingErrorHandler should be left as a defensive measure for now and against future changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants