Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libpq: another command is already in progress #69

Open
codygman opened this issue Mar 5, 2021 · 9 comments
Open

libpq: another command is already in progress #69

codygman opened this issue Mar 5, 2021 · 9 comments

Comments

@codygman
Copy link

codygman commented Mar 5, 2021

My team and others in the comments are still affected by this issue from lpsmith/postgresql-simple#177, so I'm copying it over here.

I hope to dig into the issue some more today and post back here but we'll see what happens.

@codygman
Copy link
Author

codygman commented Mar 5, 2021

Shouldn't queries that receive an asynchronous exception do something like call Database.PostgreSQL.LibPQ.cancel on the connection?

Also it looks like hasql has a regression test for this issue.

@codygman
Copy link
Author

codygman commented Mar 5, 2021

The minimal repro from the comments returns timeout as expected now and seems to be fixed with versions:

j$ stack exec -- ghc-pkg list | grep postgresql
    HDBC-postgresql-2.5.0.0
    persistent-postgresql-2.11.0.1
    postgresql-libpq-0.9.4.3
    postgresql-simple-0.6.4

The minimal repro was:

{-# LANGUAGE OverloadedStrings, ScopedTypeVariables #-}
import Database.PostgreSQL.Simple
import System.Timeout

main :: IO ()
main = do
  conn <- connectPostgreSQL "dbname='foo' user='bar' password='xxxx'"
  res <- timeout 1000000 $ withTransaction conn (query_ conn "select pg_sleep(5)")
  case res of
    Just [Only x] -> putStrLn "Done." >> return x
    Just _ -> error "Impossible."
    Nothing -> putStrLn "Timed out."

I'll post a minimal repro of my problem that still exists which is connections being returned to Data.Pool while a transaction has not been committed or rolled back as expected.

@codygman
Copy link
Author

codygman commented Mar 5, 2021

Actually I'm not able to reproduce my issue either with just postgresql-simple:

      let connectionInfo :: ConnectInfo =
            defaultConnectInfo
            { .. }
      pool <- createPool (connect connectionInfo) close 1 10 1
      threadId <- liftIO . Concurrent.forkIO $ withResource pool $ \conn -> do
        _ :: [Only ()] <- query_ conn "select pg_sleep(20)"
        pure ()

      void $ Concurrent.killThread threadId
      withResource pool $ \conn -> do
        _ :: [Only ()] <- query_ conn "select pg_sleep(2)"
        pure ()

For now I think this rules out postgresql-simple as the culprit for my bug, but perhaps that's not similar enough to my Persistent streaming code that exhibits this issue.

TODO post stripped down persistent code

Given that I'll create an issue upstream there but I'll leave this open in case others who commented experience this issue using postgresql-simple directly.

Here is the related issue on the Persistent side for SqlBackend having issues that also seems related: yesodweb/persistent#981

mzabani added a commit to mzabani/postgresql-simple that referenced this issue Mar 16, 2021
@mzabani
Copy link

mzabani commented Mar 16, 2021

Isn't it possible that in your sample above killThread kills the thread before the query gets sent to the server?
We have this problem at the company I work for as well. I investigated this a bit and pushed a PR that might fix it 🤞

@brandon-leapyear
Copy link

brandon-leapyear commented Jun 27, 2022

This is an old work account. Please reference @brandonchinn178 for all future communication


Are there any updates here? I'm still getting this issue. Adapting the initial repro gets me the same error:

{-# LANGUAGE OverloadedStrings, TypeApplications #-}
import Database.PostgreSQL.Simple
import System.Timeout

main :: IO ()
main = do
  conn <- connectPostgreSQL "..."
  res <- timeout 1000000 $ withTransaction conn (query_ conn "select pg_sleep(5)")
  case res of
    Just [Only ()] -> putStrLn "Done."
    Just _ -> error "Impossible."
    Nothing -> putStrLn "Timed out."
  withTransaction conn (query_ conn "select 1") >>= print @[Only Int]
Timed out.
Test.hs: libpq: failed (another command is already in progress
)

I just tried #71, and this no longer errors. Maybe we can merge it sometime soon? @phadej @mzabani

@mzabani
Copy link

mzabani commented Jun 27, 2022

It's been a while, but as it is I think we shouldn't merge #71. IIUC it is not possible to reliably fix connection state for new queries to run when an exception interrupts a query, in general.

My last comment in the PR suggests merging just the cancelAndClear function, which could then be used judiciously by users trying to reuse a connection whose query was interrupted.
But I believe this alone wouldn't allow users to fix at least some cases of "libpq: another command is already in progress" errors we see around, since for example transactions in postgresql-simple run queries in exception handlers .
For that, we'd have to do what @Yuras suggested in this comment and "fix existing code not to reuse connections on failure". That looks like just a one line removal fix, but it is not backwards compatible if users were previously relying on rolled back transactions on exceptions, so it may be more complicated than that.

I don't know the best way forward. Merging cancelAndClear independently seems like a possibility, and maybe then adding new withTransaction... functions that don't rollback on exceptions, document them and deprecate the old ones?

@brandon-leapyear
Copy link

brandon-leapyear commented Jun 27, 2022

This is an old work account. Please reference @brandonchinn178 for all future communication


ah wait never mind. We use persistent-postgresql, and it looks like persistent uses its own transaction logic, so postgresql-simple's issues with transactions shouldn't be causing our error. I'll post a separate issue with persistent-postgresql. Thanks!

@codygman
Copy link
Author

codygman commented Aug 23, 2022

For that, we'd have to do what @Yuras suggested #71 (comment) and "fix existing code not to reuse connections on failure". That looks like just a one line removal fix, but it is not backwards compatible if users were previously relying on rolled back transactions on exceptions, so it may be more complicated than that.

So users that don't use pools but depend on transaction rollback?

Merging cancelAndClear independently seems like a possibility, and maybe then adding new withTransaction... functions that don't rollback on exceptions, document them and deprecate the old ones?

Please do this. I'd like my team to be able to avoid maintain a fork with that PR.

Then hopefully the deprecation warning will solicit feedback from affected users. Maybe linking to the create new issue page in the deprecation warning will increase chances of feedback.

@mzabani
Copy link

mzabani commented Aug 24, 2022

I thought a bit about this and I think there is another solution to this problem.

We could add a new IORef or MVar field to Connection called something like connectionMayHaveRunningStatement :: IORef Bool. We would then catch exceptions with an uninterruptible mask when running queries and set that to True, but do nothing else in the exception handling block.

Then, before running any new queries with the same connection, we would cancel any running statements iff that value is set to True. We can be careful and give out good error messages in exceptions if cancelling the running statement fails (e.g. showing what query we would run next) so that users understand the context a bit better.

This may even enable us to use concurrency and timeout more freely, unless I'm missing something (definitely not my area of expertise here).

Existing withTransaction.. functions would remain as they are, and we might still want to deprecate and replace them, but they'll be far less likely to run into "libpq: another command is already in progress" when running ROLLBACK or if they fail they'll do so with a much more informative error. In any case, we'll be able to think about them separately from this discussion.

I can implement this (in fact I hacked it quickly and it passes the tests I created), but I'd like to hear from others if this sounds reasonable first. @Yuras @phadej

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants