-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KeyError on releasing resources of a Connection when closing the DB. #208
Comments
@pbauer same error as in coredev Jenkins 5.2 pull reguest build here: |
In #181, I wrote the following comment about the change from
But the call to |
@jmuchemb Thanks, I will take a look! |
Hi @jmuchemb , I did some debugging, and things are beginning to make sense. Context: There are two threads:
In some cases, depending on timing, machine speed, etc., we get the failure described above in the previous comments. A quick walkthrough (I skip over some bits) of the teardown: _release_resources calls connection.close() Meanwhile, with really bad timing ;) Dummy-1 is finishing the request (see 2.) and closing its connection. t1: MainThread: DB.Close -> get connections in pool and begin the iteration of all the DB connections To put it short: Pattern:
Raw debug output:
Stack of Dummy-1
Stack of MainThread
|
I can reproduce it outside of a testing environment.
This is the output incl. my debugging statements:
|
my debug statements:
|
@jmuchemb Hi, did you have time to take a look at my results? |
So to sum up, Connection.close() is not thread-safe and from your report, it looks legitimate to close a connection at the same time as the DB is being closed. And it worked before because DB.close didn't actually close connections. It should be noted also that the bug is not limited to unregisterSynch(). We could also have race conditions while processing __onCloseCallbacks or else. For this kind of issue, the first idea that comes to mind is to protect Connection.close() with a new lock on the Connection object. There may be better solutions so I keep thinking about this. I doubt that reverting my change is correct: leaving connections registered to a transaction was really bad. |
@jmuchemb And yes, there could be other similar race conditions. Sorry for the overload of data in my report 😄, but I was reporting while digging deeper. And I agree with not reverting your commit, it is just making the issue surface, it is not real cause of the issue. Maybe @jimfulton has an opinion on the protection of Connection.close() ? |
any news on this one? |
I have investigated a bit more: We are in a zope4 setting, running the application normally, not as part of a test suite. (The test suite runs mentioned above made the issue surface more frequently). There is a main thread and a thread per request. A) A new request thread opens a connection, that registers with the transaction manager (in Connection.open() - calls transaction/_manager.py - registerSynch). B) The application is shut down - perhaps with SIGINT or SIGTERM. The request of A) is long running, so is still active while the application begins to shutdown. This means the request thread has not yet closed its connection. C) DB.close() iterates over all connections in the pool (with _connectionMap): For each connection: D) But the transaction_manager is a ThreadTransactionManager, a subclass of threading.local, so the I'm not sure what a good solution would be. |
Here's an update. @sunew thanks for the test, that helped a lot. Your initial assessment was correct and mine was wrong. :) I'm pondering a fix. |
I've added a fix on https://github.com/zopefoundation/ZODB/tree/sunew-failing-test-for-208 that depends on zopefoundation/transaction#68. When the transaction change is released, then I can release a ZODB fix. |
I tested that against Plone and found that the fix also requires changes in Zope where the transaction is used like this (I already added the additional @contextmanager
def load_app(module_info):
app_wrapper, realm, debug_mode = module_info
# Loads the 'OFS.Application' from ZODB.
app = app_wrapper()
try:
yield (app, realm, debug_mode)
finally:
if transaction.manager.manager._txn is not None:
# Only abort a transaction, if one exists. Otherwise the
# abort creates a new transaction just to abort it.
transaction.abort()
app._p_jar.close() |
That becomes quite invasive, no ? (actually, on performance side, I'm not fond of all the extra method calls in transaction) |
Sorry for being so late here, but great to see a fix. Thanks! |
…every connection reopen This continues c7c01ce (bigfile/zodb: ZODB.Connection can migrate between threads on close/open and we have to care): Until now we were retrieving zconn.transaction_manager on _ZBigFileH init, and further using that transaction manager for every connection reopen. However that is not correct because on every reopen connection can be given new transaction manager. We were not practically hitting the bug because until recently ZODB was, by default, using the same ThreadTransactionManager manager instance as Connection.transaction_manager for all connections, and not doing all steps needed to keep _ZBigFileH.transaction_manager in sync to Connection was forgiven - a particular transaction manager that was used was TransactionManager instance implicitly associated with current thread by global threading.Local transaction.manager . However starting from ZODB 5.5.0 Connection code was changed to remember as .transaction_manager the particular TransactionManager instance without any threading.Local games: zopefoundation/ZODB@b6ac40f153 zopefoundation/ZODB#208 zopefoundation/ZODB#226 Given that we were not syncing properly that broke wendelin.core tests, for example: bigfile/tests/test_filezodb.py::test_bigfile_filezodb_vs_conn_migration Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner self.run() File "/usr/lib/python2.7/threading.py", line 754, in run self.__target(*self.__args, **self.__kwargs) File "/home/kirr/src/wendelin/wendelin.core/bigfile/tests/test_filezodb.py", line 401, in T11 transaction.commit() # should be nothing File "/home/kirr/src/wendelin/venv/z-dev/local/lib/python2.7/site-packages/transaction/_manager.py", line 252, in commit return self.manager.commit() File "/home/kirr/src/wendelin/venv/z-dev/local/lib/python2.7/site-packages/transaction/_manager.py", line 131, in commit return self.get().commit() File "/home/kirr/src/wendelin/venv/z-dev/local/lib/python2.7/site-packages/transaction/_transaction.py", line 298, in commit self._synchronizers.map(lambda s: s.beforeCompletion(self)) File "/home/kirr/src/wendelin/venv/z-dev/local/lib/python2.7/site-packages/transaction/weakset.py", line 61, in map f(elt) File "/home/kirr/src/wendelin/venv/z-dev/local/lib/python2.7/site-packages/transaction/_transaction.py", line 298, in <lambda> self._synchronizers.map(lambda s: s.beforeCompletion(self)) File "/home/kirr/src/wendelin/wendelin.core/bigfile/file_zodb.py", line 671, in beforeCompletion assert txn is zconn.transaction_manager.get() AssertionError What is happening here is that one thread used the connection and ZBigFile/_ZBigFileH associated with it, then the connection was closed and released to DB pool. Then the connection was reopened but by another thread and thus with different TransactionManager instance and oops - _ZBigFileH.transaction_manager is different because it is TransactionManager instance that was used by the first thread. Fix it by resyncing _ZBigFileH.transaction_manager on every connection reopen. No new test as existing tests already cover the problem when run with ZODB >= 5.5.0 .
UPDATE: see this comment for an up to date description: #208 (comment)
When running tests (in plone.restapi) we sometimes get the following KeyError. It is related to timing issues - inserting a time.sleep(2) in the test setup causes the problem to occur more frequently.
It can also be reproduced outside of a testing environment, when zope is shutting down in the middle of ongoing requests. See a comment below for a method of easy reproduction.
ZODB 5.4.0
It does not happen with ZODB 5.3.0
It is related to this commit: 1b9475d
In line 948 we see:
c.close(False)
This is part of the traceback:
File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 948, in _release_resources c.close(False)
In 5.3.0 the corresponding line just sets the transaction manager to None:
c.transaction_manager = None
The text was updated successfully, but these errors were encountered: