KeyError on releasing resources of a Connection when closing the DB. #208

sunew · 2018-06-24T10:02:19Z

UPDATE: see this comment for an up to date description: #208 (comment)

When running tests (in plone.restapi) we sometimes get the following KeyError. It is related to timing issues - inserting a time.sleep(2) in the test setup causes the problem to occur more frequently.

It can also be reproduced outside of a testing environment, when zope is shutting down in the middle of ongoing requests. See a comment below for a method of easy reproduction.

ZODB 5.4.0
It does not happen with ZODB 5.3.0

It is related to this commit: 1b9475d

In line 948 we see: c.close(False)

This is part of the traceback:
File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 948, in _release_resources c.close(False)
In 5.3.0 the corresponding line just sets the transaction manager to None:
c.transaction_manager = None

  Set up plone.restapi.testing.PloneRestApiDXLayer:Functional in 0.000 seconds.
  Running:
    1/77 (1.3%) test_comments_add_root (plone.restapi.tests.test_documentation.TestCommenting)/work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/PubCore/ZServerPublisher.py:22: DeprecationWarning: publish_module is deprecated. Please import from ZServer.ZPublisher.Publish.
  from ZPublisher.Publish import publish_module
    8/77 (10.4%) test_documentation_expansion (plone.restapi.tests.test_documentation.TestCommenting)/work/buildout_cache/eggs/zope.deprecation-4.3.0-py2.7.egg/zope/deprecation/deprecation.py:88: DeprecationWarning: isDefaultPage is deprecated. Import from Products.CMFPlone.defaultpage instead
  name)
    22/77 (28.6%) test_documentation_file (plone.restapi.tests.test_documentation.TestDocumentation)/work/buildout_cache/eggs/Products.PortalTransforms-3.1.3-py2.7.egg/Products/PortalTransforms/libtransforms/commandtransform.py:105: DeprecationWarning: os.popen4 is deprecated.  Use the subprocess module.
  cin, couterr = os.popen4(command, 'b')
    32/77 (41.6%) test_documentation_jwt_login (plone.restapi.tests.test_documentation.TestDocumentation) (1.199 s)Traceback (most recent call last):
  File "./bin/test", line 442, in <module>
    '--test-path', '/work/buildout_cache/eggs/repoze.xmliter-0.6-py2.7.egg',
  File "/work/buildout_cache/eggs/collective.xmltestreport-1.3.4-py2.7.egg/collective/xmltestreport/runner.py", line 60, in run
    failed = run_internal(defaults, args, script_parts=script_parts)
  File "/work/buildout_cache/eggs/collective.xmltestreport-1.3.4-py2.7.egg/collective/xmltestreport/runner.py", line 73, in run_internal
    runner.run()
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 189, in run
    self.run_tests()
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 291, in run_tests
    self.skipped, self.import_errors)
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 460, in run_layer
    import_errors)
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 383, in run_tests
    test(result)
  File "/usr/lib/python2.7/unittest/case.py", line 393, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/case.py", line 370, in run
    result.stopTest(self)
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 897, in stopTest
    self.testTearDown()
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 821, in testTearDown
    layer.testTearDown()
  File "/work/playground/plone/plone52devel/src/plone.app.testing/plone/app/testing/layers.py", line 288, in testTearDown
    super(PloneTestLifecycle, self).testTearDown()
  File "/work/playground/plone/plone52devel/src/plone.testing/src/plone/testing/z2.py", line 945, in testTearDown
    self['zodbDB'].close()
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 651, in close
    @self._connectionMap
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 518, in _connectionMap
    self.pool.map(f)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 213, in map
    self.all.map(f)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/weakset.py", line 62, in map
    f(elt)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 659, in _
    conn._release_resources()
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 948, in _release_resources
    c.close(False)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 304, in close
    self.transaction_manager.unregisterSynch(self)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/_manager.py", line 106, in unregisterSynch
    self._synchs.remove(synch)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/weakset.py", line 51, in remove
    del self.data[id(obj)]
  File "/usr/lib/python2.7/weakref.py", line 86, in __delitem__
    del self.data[key]
KeyError: 139902031255440

The text was updated successfully, but these errors were encountered:

sunew · 2018-06-24T14:15:37Z

@pbauer same error as in coredev Jenkins 5.2 pull reguest build here:
https://jenkins.plone.org/job/pull-request-5.2/891/consoleFull

jmuchemb · 2018-06-24T15:53:58Z

In #181, I wrote the following comment about the change from c.transaction_manager = None to c.close(False):

I didn't plan to change this at the beginning but while running tests, I discovered that closing a DB left Connections registered to the global transaction manager. Before, these Connections were ignored by patching their afterCompletion/newTransaction.

But the call to unregisterSynch depends on the condition self.opened and self.transaction_manager is not None. Can you debug to see why your connection is in an inconsistent state ? For example, Is it possible that the call to Connection.open was interrupted between self.opened = time.time() and transaction_manager.registerSynch(self) ?

sunew · 2018-06-25T09:05:58Z

@jmuchemb Thanks, I will take a look!

sunew · 2018-06-28T20:50:24Z

Hi @jmuchemb , I did some debugging, and things are beginning to make sense.

Context:
It happens when running tests with plone.testing, using the FUNCTIONAL_TESTING layer.

There are two threads:

There is a thread (MainThread), starting the application (Zope 4.0b5), opening the DB.
A http request (from a testcase) is made to Zope, resulting in a new thread (Dummy-1) to process the request.
The http request succeeds.
The testcase is done, testTeardown is called.
On tear down of the test, the database is closed, with ZODB.DB.DB.close()

In some cases, depending on timing, machine speed, etc., we get the failure described above in the previous comments.

A quick walkthrough (I skip over some bits) of the teardown:
MainThread:
DB.close(), (lines 636-668 ZODB.DB.py):
iterates over connections in the pool,
calls conn._release_resources for each connection.

_release_resources calls connection.close()
close() calls self.transaction_manager.unregisterSynch(self)
and this calls self._synchs.remove(synch)

Meanwhile, with really bad timing ;) Dummy-1 is finishing the request (see 2.) and closing its connection.

t1: MainThread: DB.Close -> get connections in pool and begin the iteration of all the DB connections
t2: Dummy-1: connection.close() and unregisterSynch
t3: MainThread: during close() on the connection of Dummy-1 - The synch will be gone from the transactionmanager._synchs
Causing the KeyError.

To put it short:
The problem happens when the DB is being closed and is closing all the registered connections, while another thread is finishing a request and closing its connection.

Pattern:
1 mainthread registers 7f16efd94550
2 dummy1 registers 7f16efd73110
3 dummy1 unregisters 7f16efd73110
4 mainthread unregisters 7f16efd94550
5 mainthread tries to unregister the connection of dummy1 7f16efd73110

and we have failure

Raw debug output:

registerSynch: <Connection at 7f16efd94550> id: 139736490001744 thread:MainThread
synchs before register: [] 

registerSynch: <Connection at 7f16efd73110> id: 139736489865488 thread:Dummy-1
synchs before register: [] 

unregisterSynch: <Connection at 7f16efd73110> id: 139736489865488 thread:Dummy-1
synchs before unregister: [<Connection at 7f16efd73110>] 

unregisterSynch: <Connection at 7f16efd94550> id: 139736490001744 thread:MainThread
synchs before unregister: [<Connection at 7f16efd94550>] 

unregisterSynch: <Connection at 7f16efd73110> id: 139736489865488 thread:MainThread
synchs before unregister: [] 

Traceback (most recent call last):
  File "./bin/test", line 436, in <module>
    '--test-path', '/work/buildout_cache/eggs/zope.globalrequest-1.4-py2.7.egg',
  File "/work/buildout_cache/eggs/collective.xmltestreport-1.3.4-py2.7.egg/collective/xmltestreport/runner.py", line 60, in run
    failed = run_internal(defaults, args, script_parts=script_parts)
  File "/work/buildout_cache/eggs/collective.xmltestreport-1.3.4-py2.7.egg/collective/xmltestreport/runner.py", line 73, in run_internal
    runner.run()
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 189, in run
    self.run_tests()
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 291, in run_tests
    self.skipped, self.import_errors)
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 460, in run_layer
    import_errors)
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 383, in run_tests
    test(result)
  File "/usr/lib/python2.7/unittest/case.py", line 393, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/case.py", line 370, in run
    result.stopTest(self)
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 897, in stopTest
    self.testTearDown()
  File "/work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py", line 821, in testTearDown
    layer.testTearDown()
  File "/work/playground/plone/plone52devel/src/plone.app.testing/plone/app/testing/layers.py", line 288, in testTearDown
    super(PloneTestLifecycle, self).testTearDown()
  File "/work/playground/plone/plone52devel/src/plone.testing/src/plone/testing/z2.py", line 950, in testTearDown
    self['zodbDB'].close()
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 651, in close
    @self._connectionMap
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 518, in _connectionMap
    self.pool.map(f)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 213, in map
    self.all.map(f)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/weakset.py", line 62, in map
    f(elt)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 659, in _
    conn._release_resources()
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 948, in _release_resources
    c.close(False)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 304, in close
    self.transaction_manager.unregisterSynch(self)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/_manager.py", line 120, in unregisterSynch
    self._synchs.remove(synch)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/weakset.py", line 51, in remove
    del self.data[id(obj)]
  File "/usr/lib/python2.7/weakref.py", line 86, in __delitem__
    del self.data[key]
KeyError: 139736489865488

Stack of Dummy-1

(Pdb)   /work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/PubCore/ZServerPublisher.py(32)__init__()
-> response=b)
  /work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/ZPublisher/Publish.py(405)publish_module()
-> environ, debug, request, response)
  /work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/ZPublisher/Publish.py(291)publish_module_standard()
-> request.close()
  /work/buildout_cache/eggs/Zope-4.0b5-py2.7.egg/ZPublisher/BaseRequest.py(224)close()
-> self.clear()
  /work/buildout_cache/eggs/Zope-4.0b5-py2.7.egg/ZPublisher/HTTPRequest.py(211)clear()
-> BaseRequest.clear(self)
  /work/buildout_cache/eggs/Zope-4.0b5-py2.7.egg/ZPublisher/BaseRequest.py(216)clear()
-> self._held = None
  /work/buildout_cache/eggs/Zope-4.0b5-py2.7.egg/App/ZApplication.py(90)__del__()
-> self._jar.close()
> /work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py(304)close()
-> self.transaction_manager.unregisterSynch(self)
  /work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/_manager.py(123)unregisterSynch()

Stack of MainThread

(Pdb)  /work/buildout_cache/eggs/collective.xmltestreport-1.3.4-py2.7.egg/collective/xmltestreport/runner.py(60)run()
-> failed = run_internal(defaults, args, script_parts=script_parts)
  /work/buildout_cache/eggs/collective.xmltestreport-1.3.4-py2.7.egg/collective/xmltestreport/runner.py(73)run_internal()
-> runner.run()
  /work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py(189)run()
-> self.run_tests()
  /work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py(291)run_tests()
-> self.skipped, self.import_errors)
  /work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py(460)run_layer()
-> import_errors)
  /work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py(383)run_tests()
-> test(result)
  /usr/lib/python2.7/unittest/case.py(393)__call__()
-> return self.run(*args, **kwds)
  /usr/lib/python2.7/unittest/case.py(370)run()
-> result.stopTest(self)
  /work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py(897)stopTest()
-> self.testTearDown()
  /work/buildout_cache/eggs/zope.testrunner-4.8.1-py2.7.egg/zope/testrunner/runner.py(821)testTearDown()
-> layer.testTearDown()
  /work/playground/plone/plone52devel/src/plone.app.testing/plone/app/testing/layers.py(288)testTearDown()
-> super(PloneTestLifecycle, self).testTearDown()
  /work/playground/plone/plone52devel/src/plone.testing/src/plone/testing/z2.py(950)testTearDown()
-> self['zodbDB'].close()
  /work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py(651)close()
-> @self._connectionMap
  /work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py(518)_connectionMap()
-> self.pool.map(f)
  /work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py(213)map()
-> self.all.map(f)
  /work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/weakset.py(62)map()
-> f(elt)
  /work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py(659)_()
-> conn._release_resources()
  /work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py(948)_release_resources()
-> c.close(False)
  /work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py(304)close()
-> self.transaction_manager.unregisterSynch(self)
> /work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/_manager.py(126)unregisterSynch()
-> self._synchs.remove(synch)

sunew · 2018-06-28T22:01:05Z

I can reproduce it outside of a testing environment.

insert time.sleep(10) somewhere in code run by a http request
start the request from a browser
before the 10 seconds have passed, stop zope (gracefully with "kill `cat var/instance.pid`")

This is the output incl. my debugging statements:

registerSynch: <Connection at 7f419b096810> id: 139919750686736 thread:Dummy-1
synchs before register: [] 

/work/buildout_cache/eggs/zope.deprecation-4.3.0-py2.7.egg/zope/deprecation/deprecation.py:88: DeprecationWarning: isDefaultPage is deprecated. Import from Products.CMFPlone.defaultpage instead
  name)
/work/buildout_cache/eggs/Chameleon-3.2-py2.7.egg/chameleon/tal.py:471: DeprecationWarning: Using len() is deprecated. Use the `length` attribute for the size of the current page, which is what we return now. Use the `sequence_length` attribute for the size of the entire sequence. 
  iterable = list(iterable) if iterable is not None else ()
2018-06-28 23:57:02 INFO SignalHandler Caught signal SIGTERM
2018-06-28 23:57:02 INFO Z2 Shutting down fast
2018-06-28 23:57:02 INFO ZServer closing HTTP to new connections

unregisterSynch: <Connection at 7f419b096810> id: 139919750686736 thread:MainThread
synchs before unregister: [] 

Traceback (most recent call last):
  File "/work/playground/plone/plone52devel/parts/instance/bin/interpreter", line 281, in <module>
    exec(compile(__file__f.read(), __file__, "exec"))
  File "/work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/Zope2/Startup/run.py", line 64, in <module>
    run()
  File "/work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/Zope2/Startup/run.py", line 29, in run
    starter.run()
  File "/work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/Zope2/Startup/starter.py", line 98, in run
    self.shutdown()
  File "/work/buildout_cache/eggs/ZServer-4.0b1-py2.7.egg/ZServer/Zope2/Startup/starter.py", line 103, in shutdown
    db.close()
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 651, in close
    @self._connectionMap
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 518, in _connectionMap
    self.pool.map(f)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 213, in map
    self.all.map(f)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/weakset.py", line 62, in map
    f(elt)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/DB.py", line 659, in _
    conn._release_resources()
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 948, in _release_resources
    c.close(False)
  File "/work/buildout_cache/eggs/ZODB-5.4.0-py2.7.egg/ZODB/Connection.py", line 304, in close
    self.transaction_manager.unregisterSynch(self)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/_manager.py", line 126, in unregisterSynch
    self._synchs.remove(synch)
  File "/work/buildout_cache/eggs/transaction-2.2.1-py2.7.egg/transaction/weakset.py", line 51, in remove
    del self.data[id(obj)]
  File "/usr/lib/python2.7/weakref.py", line 86, in __delitem__
    del self.data[key]
KeyError: 139919750686736

sunew · 2018-06-28T22:39:32Z

my debug statements:
(transaction/_manager.py)

    def registerSynch(self, synch):
        """ See ITransactionManager.
        """
        import threading
        # if threading.currentThread().getName() == 'Dummy-1':
        #     from pdb import set_trace; set_trace()
        print '\n\x1b[1;33;40m' + \
            'registerSynch: ' + str(synch) + \
            ' id: ' + str(id(synch)) + \
            ' thread:' + threading.currentThread().getName() + \
            '\n' + 'synchs before register: ' + \
            str([ref() for ref in self._synchs.as_weakref_list()]) + '\x1b[0m \n'
        self._synchs.add(synch)
        if self._txn is not None:
            synch.newTransaction(self._txn)

    def unregisterSynch(self, synch):
        """ See ITransactionManager.
        """
        import threading
        # if threading.currentThread().getName() == 'Dummy-1':
        #     from pdb import set_trace; set_trace()
        print '\n\x1b[1;33;40m' + \
            'unregisterSynch: ' + str(synch) + \
            ' id: ' + str(id(synch)) + \
            ' thread:' + threading.currentThread().getName() + \
            '\n' + 'synchs before unregister: ' + \
            str([ref() for ref in self._synchs.as_weakref_list()]) + '\x1b[0m \n'
        # if synch not in self._synchs:
        #     from pdb import set_trace; set_trace()
        self._synchs.remove(synch)

sunew · 2018-07-04T05:38:46Z

@jmuchemb Hi, did you have time to take a look at my results?

jmuchemb · 2018-07-06T18:09:38Z

So to sum up, Connection.close() is not thread-safe and from your report, it looks legitimate to close a connection at the same time as the DB is being closed. And it worked before because DB.close didn't actually close connections.

It should be noted also that the bug is not limited to unregisterSynch(). We could also have race conditions while processing __onCloseCallbacks or else.

For this kind of issue, the first idea that comes to mind is to protect Connection.close() with a new lock on the Connection object. There may be better solutions so I keep thinking about this. I doubt that reverting my change is correct: leaving connections registered to a transaction was really bad.

sunew · 2018-07-06T18:50:55Z

@jmuchemb
Your summary: Yes, I agree with your conclusion.

And yes, there could be other similar race conditions. Sorry for the overload of data in my report 😄, but I was reporting while digging deeper.

And I agree with not reverting your commit, it is just making the issue surface, it is not real cause of the issue.

Maybe @jimfulton has an opinion on the protection of Connection.close() ?

sunew · 2018-08-14T17:01:40Z

any news on this one?

sunew · 2018-10-02T23:57:26Z

I have investigated a bit more:

We are in a zope4 setting, running the application normally, not as part of a test suite. (The test suite runs mentioned above made the issue surface more frequently).

There is a main thread and a thread per request.

A) A new request thread opens a connection, that registers with the transaction manager (in Connection.open() - calls transaction/_manager.py - registerSynch).
The connection is added to _synchs WeakSet of the transaction_manager.

B) The application is shut down - perhaps with SIGINT or SIGTERM. The request of A) is long running, so is still active while the application begins to shutdown. This means the request thread has not yet closed its connection.
The connection is still in DB pool, while shutdown calls ZODB.DB.close()

C) DB.close() iterates over all connections in the pool (with _connectionMap): For each connection:
conn._release_resources() -> conn.close() -> self.transaction_manager.unregisterSynch(self)

D) But the transaction_manager is a ThreadTransactionManager, a subclass of threading.local, so the _synchs WeakSet is local for each thread - this means the main thread sees a different _synchs than the request thread, causing the keyerror when trying to unregister the connection with the transaction manager from the main thread during db.close()

I'm not sure what a good solution would be.
@jimfulton any ideas or hints?

sunew · 2018-10-03T17:34:15Z

@jimfulton Failing test here: https://github.com/zopefoundation/ZODB/tree/sunew-failing-test-for-208

jimfulton · 2018-10-14T16:03:46Z

Here's an update. @sunew thanks for the test, that helped a lot. Your initial assessment was correct and mine was wrong. :) I'm pondering a fix.

jimfulton · 2018-10-14T17:53:03Z

I've added a fix on https://github.com/zopefoundation/ZODB/tree/sunew-failing-test-for-208 that depends on zopefoundation/transaction#68. When the transaction change is released, then I can release a ZODB fix.

pbauer · 2018-10-17T13:27:15Z

I tested that against Plone and found that the fix also requires changes in Zope where the transaction is used like this (I already added the additional .manager):

@contextmanager
def load_app(module_info):
    app_wrapper, realm, debug_mode = module_info
    # Loads the 'OFS.Application' from ZODB.
    app = app_wrapper()

    try:
        yield (app, realm, debug_mode)
    finally:
        if transaction.manager.manager._txn is not None:
            # Only abort a transaction, if one exists. Otherwise the
            # abort creates a new transaction just to abort it.
            transaction.abort()
        app._p_jar.close()

jmuchemb · 2018-10-17T13:48:39Z

That becomes quite invasive, no ?

(actually, on performance side, I'm not fond of all the extra method calls in transaction)

sunew · 2019-04-01T21:12:06Z

Sorry for being so late here, but great to see a fix. Thanks!

…every connection reopen This continues c7c01ce (bigfile/zodb: ZODB.Connection can migrate between threads on close/open and we have to care): Until now we were retrieving zconn.transaction_manager on _ZBigFileH init, and further using that transaction manager for every connection reopen. However that is not correct because on every reopen connection can be given new transaction manager. We were not practically hitting the bug because until recently ZODB was, by default, using the same ThreadTransactionManager manager instance as Connection.transaction_manager for all connections, and not doing all steps needed to keep _ZBigFileH.transaction_manager in sync to Connection was forgiven - a particular transaction manager that was used was TransactionManager instance implicitly associated with current thread by global threading.Local transaction.manager . However starting from ZODB 5.5.0 Connection code was changed to remember as .transaction_manager the particular TransactionManager instance without any threading.Local games: zopefoundation/ZODB@b6ac40f153 zopefoundation/ZODB#208 zopefoundation/ZODB#226 Given that we were not syncing properly that broke wendelin.core tests, for example: bigfile/tests/test_filezodb.py::test_bigfile_filezodb_vs_conn_migration Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner self.run() File "/usr/lib/python2.7/threading.py", line 754, in run self.__target(*self.__args, **self.__kwargs) File "/home/kirr/src/wendelin/wendelin.core/bigfile/tests/test_filezodb.py", line 401, in T11 transaction.commit() # should be nothing File "/home/kirr/src/wendelin/venv/z-dev/local/lib/python2.7/site-packages/transaction/_manager.py", line 252, in commit return self.manager.commit() File "/home/kirr/src/wendelin/venv/z-dev/local/lib/python2.7/site-packages/transaction/_manager.py", line 131, in commit return self.get().commit() File "/home/kirr/src/wendelin/venv/z-dev/local/lib/python2.7/site-packages/transaction/_transaction.py", line 298, in commit self._synchronizers.map(lambda s: s.beforeCompletion(self)) File "/home/kirr/src/wendelin/venv/z-dev/local/lib/python2.7/site-packages/transaction/weakset.py", line 61, in map f(elt) File "/home/kirr/src/wendelin/venv/z-dev/local/lib/python2.7/site-packages/transaction/_transaction.py", line 298, in <lambda> self._synchronizers.map(lambda s: s.beforeCompletion(self)) File "/home/kirr/src/wendelin/wendelin.core/bigfile/file_zodb.py", line 671, in beforeCompletion assert txn is zconn.transaction_manager.get() AssertionError What is happening here is that one thread used the connection and ZBigFile/_ZBigFileH associated with it, then the connection was closed and released to DB pool. Then the connection was reopened but by another thread and thus with different TransactionManager instance and oops - _ZBigFileH.transaction_manager is different because it is TransactionManager instance that was used by the first thread. Fix it by resyncing _ZBigFileH.transaction_manager on every connection reopen. No new test as existing tests already cover the problem when run with ZODB >= 5.5.0 .

sunew mentioned this issue Jun 24, 2018

Plone 5.2 / ZODB 5.4.0 - KeyError on Connection when running tests intermittently (timing issue) plone/plone.restapi#558

Closed

sunew mentioned this issue Jun 25, 2018

Plone 5.2: Problem with ZODB 5.4.0: Intermittent KeyError on releasing resources of a Connection when running tests. plone/Products.CMFPlone#2455

Closed

sunew changed the title ~~intermittent KeyError on releasing resources of a Connection~~ KeyError on releasing resources of a Connection when closing the DB. Jul 5, 2018

jmuchemb added the bug label Jul 6, 2018

davisagli mentioned this issue Aug 8, 2018

Plone 5.2 tests on Jenkins are hanging sometimes plone/Products.CMFPlone#2493

Closed

sunew added a commit to sunew/transaction that referenced this issue Sep 15, 2018

debug statements for zopefoundation/ZODB#208

d19ad1b

sunew added a commit that referenced this issue Oct 3, 2018

failing test for #208

5a760e8

jimfulton mentioned this issue Oct 14, 2018

Changed the implementation of ThreadTransactionManager zopefoundation/transaction#68

Merged

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 17, 2018

test fix for zopefoundation/ZODB#208

d118e9d

pbauer mentioned this issue Oct 17, 2018

try Zope 4.0b6 with newest ZODB before release plone/buildout.coredev#537

Closed

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 17, 2018

test fix for zopefoundation/ZODB#208

86bf71f

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 18, 2018

test fix for zopefoundation/ZODB#208

9b7f1ac

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 20, 2018

test fix for zopefoundation/ZODB#208

eb57224

icemac mentioned this issue Oct 23, 2018

Fix KeyError on releasing resources of a Connection when closing the DB. #226

Merged

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 24, 2018

test fix for zopefoundation/ZODB#208

7c4a54d

icemac pushed a commit to zopefoundation/Zope that referenced this issue Oct 24, 2018

Use new transaction release to get the fix for zopefoundation/ZODB#208.

19d3d07

icemac mentioned this issue Oct 24, 2018

Include fix for ZODB issue 208 zopefoundation/Zope#388

Merged

icemac pushed a commit to zopefoundation/Zope that referenced this issue Oct 24, 2018

Test against ZODB which contains fix for zopefoundation/ZODB#208.

bea835c

icemac closed this as completed in #226 Oct 25, 2018

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 26, 2018

test fix for zopefoundation/ZODB#208

d60621b

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 28, 2018

test fix for zopefoundation/ZODB#208

f0f9c3e

pbauer added a commit to plone/buildout.coredev that referenced this issue Oct 29, 2018

test fix for zopefoundation/ZODB#208

a514b3e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

KeyError on releasing resources of a Connection when closing the DB. #208

KeyError on releasing resources of a Connection when closing the DB. #208

sunew commented Jun 24, 2018 •

edited

Loading

sunew commented Jun 24, 2018

jmuchemb commented Jun 24, 2018 •

edited

Loading

sunew commented Jun 25, 2018

sunew commented Jun 28, 2018 •

edited

Loading

sunew commented Jun 28, 2018 •

edited

Loading

sunew commented Jun 28, 2018 •

edited

Loading

sunew commented Jul 4, 2018

jmuchemb commented Jul 6, 2018

sunew commented Jul 6, 2018

sunew commented Aug 14, 2018

sunew commented Oct 2, 2018

sunew commented Oct 3, 2018 •

edited

Loading

jimfulton commented Oct 14, 2018

jimfulton commented Oct 14, 2018

pbauer commented Oct 17, 2018 •

edited

Loading

jmuchemb commented Oct 17, 2018

sunew commented Apr 1, 2019

KeyError on releasing resources of a Connection when closing the DB. #208

KeyError on releasing resources of a Connection when closing the DB. #208

Comments

sunew commented Jun 24, 2018 • edited Loading

sunew commented Jun 24, 2018

jmuchemb commented Jun 24, 2018 • edited Loading

sunew commented Jun 25, 2018

sunew commented Jun 28, 2018 • edited Loading

sunew commented Jun 28, 2018 • edited Loading

sunew commented Jun 28, 2018 • edited Loading

sunew commented Jul 4, 2018

jmuchemb commented Jul 6, 2018

sunew commented Jul 6, 2018

sunew commented Aug 14, 2018

sunew commented Oct 2, 2018

sunew commented Oct 3, 2018 • edited Loading

jimfulton commented Oct 14, 2018

jimfulton commented Oct 14, 2018

pbauer commented Oct 17, 2018 • edited Loading

jmuchemb commented Oct 17, 2018

sunew commented Apr 1, 2019

sunew commented Jun 24, 2018 •

edited

Loading

jmuchemb commented Jun 24, 2018 •

edited

Loading

sunew commented Jun 28, 2018 •

edited

Loading

sunew commented Jun 28, 2018 •

edited

Loading

sunew commented Jun 28, 2018 •

edited

Loading

sunew commented Oct 3, 2018 •

edited

Loading

pbauer commented Oct 17, 2018 •

edited

Loading