
Another data corruption due to concurrency bug (ZEO5 regression; no external invalidations) #166

Closed
navytux opened this issue Apr 2, 2021 · 7 comments · Fixed by #174


@navytux
Contributor

navytux commented Apr 2, 2021

( moving out from #155 (comment) since this seems to be a different problem )

Something really wrong is going on with ZEO5.

In addition to the test program provided in #155, where 8 simultaneous workers each open their own storage-level connection, I've tried to run the test program from zopefoundation/ZODB#290 with a minimal edit to use ZEO instead of MappingStorage, and it fails too.

In that test program there is only one storage-level connection and two simultaneous users that are using that single storage-level connection. In other words there are no external invalidations involved at all.

https://lab.nexedi.com/kirr/wendelin.core/blob/e2dc8651/lib/tests/testprog/zopenrace.py#L102-203

This failure indicates data corruption. Quoting zopefoundation/ZODB#290:

---- 8< ----
That in turn can lead to loss of Consistency of the database if a mix of current
and old data is used to process a transaction. A classic example would be bank
accounts A, B and C with A<-B and A<-C transfer transactions. If the transaction
that handles A<-C sees stale data for A when starting its processing, it
results in A losing what it should have received from B.

...

The program simulates two clients: one (T2) constantly modifies two integer
objects, preserving the invariant that their values stay equal. The other client
(T1) constantly opens the database and verifies the invariant. T1 forces access
to one of the objects to always go through loading from the database; this
way, if the live cache becomes stale, the bug is observed as invariant breakage.
---- 8< ----
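The failure mode described above can be illustrated with a stdlib-only sketch (hypothetical, not the actual test program): a writer keeps two values equal under a lock, while a reader that mixes a fresh read of one value with a stale cached copy of the other, i.e. reads taken from two different snapshots, would be the analogue of the stale live cache. A reader that snapshots both values under the same lock never observes the invariant broken:

```python
import threading

# Shared "database": two values kept equal by the writer.
db = {'obj1': 0, 'obj2': 0}
lock = threading.Lock()

def writer(n):
    # T2: increment both values atomically, preserving obj1 == obj2.
    for _ in range(n):
        with lock:
            db['obj1'] += 1
            db['obj2'] += 1

def consistent_reader(n):
    # T1 (correct): read both values inside one critical section,
    # i.e. from a single consistent snapshot.
    mismatches = 0
    for _ in range(n):
        with lock:
            i1, i2 = db['obj1'], db['obj2']
        if i1 != i2:
            mismatches += 1
    return mismatches

# A buggy reader would read obj1 fresh but keep a stale cached obj2,
# taking the two reads from *different* snapshots -- that is the
# pattern that breaks the invariant in the ZEO/ZODB race.

t = threading.Thread(target=writer, args=(10000,))
t.start()
mismatches = consistent_reader(10000)
t.join()
print(mismatches)  # 0: a consistent snapshot never sees obj1 != obj2
```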

Here is how it fails with ZEO:

.../wendelin.core$ WENDELIN_CORE_TEST_DB="<zeo>" py.test -vs -k openrace lib/tests/test_zodb.py
============================== test session starts ==============================
platform linux2 -- Python 2.7.18, pytest-4.6.11, py-1.9.0, pluggy-0.13.1 -- /home/kirr/src/wendelin/venv/z-dev/bin/python2
cachedir: .pytest_cache
rootdir: /home/kirr/src/wendelin/wendelin.core
collected 8 items / 6 deselected / 2 selected

lib/tests/test_zodb.py::test_zodb_zopenrace_basic OK
PASSED
lib/tests/test_zodb.py::test_zodb_zopenrace FAILED

=================================== FAILURES ====================================
______________________________ test_zodb_zopenrace ______________________________

    def test_zodb_zopenrace():
        # exercises ZODB.Connection + particular storage implementation
>       zopenrace.main()

lib/tests/test_zodb.py:382:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
<decorator-gen-2>:2: in main
    ???
../../tools/go/pygolang/golang/__init__.py:103: in _
    return f(*argv, **kw)
lib/tests/testprog/zopenrace.py:115: in main
    test(zstor)
<decorator-gen-3>:2: in test
    ???
../../tools/go/pygolang/golang/__init__.py:103: in _
    return f(*argv, **kw)
lib/tests/testprog/zopenrace.py:201: in test
    wg.wait()
golang/_sync.pyx:246: in golang._sync.PyWorkGroup.wait
    ???
golang/_sync.pyx:226: in golang._sync.PyWorkGroup.go.pyrunf
    ???
lib/tests/testprog/zopenrace.py:165: in T1
    t1()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def t1():
        transaction.begin()
        zconn = db.open()

        root = zconn.root()
        obj1 = root['obj1']
        obj2 = root['obj2']

        # obj1 - reload it from zstor
        # obj2 - get it from zconn cache
        obj1._p_invalidate()

        # both objects must have the same values
        i1 = obj1.i
        i2 = obj2.i
        if i1 != i2:
>           raise AssertionError("T1: obj1.i (%d)  !=  obj2.i (%d)" % (i1, i2))
E           AssertionError: T1: obj1.i (16)  !=  obj2.i (15)

lib/tests/testprog/zopenrace.py:156: AssertionError

P.S. The patch suggested in #155 (comment) does not help here: even though it seemingly fixes #155, it does not fix zopefoundation/ZODB#290 with ZEO5.

P.P.S. ZEO 5.2.2-1-g3d90ed42, ZODB 5.6.0-18-g75c703fcf, CPython 2.7.18

@navytux
Contributor Author

navytux commented Apr 2, 2021

/cc @jimfulton

@d-maurer
Contributor

d-maurer commented Apr 5, 2021 via email

@navytux
Contributor Author

navytux commented Apr 5, 2021

@d-maurer, thanks for feedback.

I checked a bit more from scratch and, contrary to the problem from #155 (which fails on both py2 and py3), the test program in this issue fails on CPython2 but does not fail on CPython3. The difference might thus be somehow related to trollius.

To verify, I used zopenrace-zeo.py, which is zopenrace.py from zopefoundation/ZODB#290 plus the following change that you suggested:

--- a/zopenrace.py.kirr0
+++ b/zopenrace.py
@@ -87,8 +87,7 @@
 """
 
 from __future__ import print_function
-from ZODB import DB
-from ZODB.MappingStorage import MappingStorage
+from ZEO import server, DB
 import transaction
 from persistent import Persistent
 
@@ -110,8 +109,8 @@ def __init__(self, i):
 
 
 def main():
-    zstor = MappingStorage()
-    db = DB(zstor)
+    address, stop = server()
+    db = DB(address)
 
 
     # init initializes the database with two integer objects - obj1/obj2 that are set to 0.
@@ -194,6 +193,7 @@ def t2():
     t2 = go(T2, N)
     t1.join()
     t2.join()
+    stop()
 
     if not all(okv):
         raise AssertionError('FAIL')

For me it fails with CPython 2.7.18 on two different machines. For completeness I attach the full log of one such verification.

Please note that I hit this ZEO bug only accidentally, while trying to find a place for my ZODB test programs in the wendelin.core test suite, so that those test programs are run wrt all ZODB versions we care about (ZODB4, ZODB5, and ZODB4 with the MVCC backport from ZODB5, see https://lab.nexedi.com/nexedi/ZODB/commit/8e7eab33?expanded=1), and wrt all storages (FileStorage, ZEO and NEO), as explained in this patch.

It is possible that the bug is indeed in trollius. It is also possible that the bug is in ZEO5 and is only more likely to be hit with trollius. So far I have not looked into ZEO5 to verify its concurrency correctness at all.

Thanks again for feedback,
Kirill

Log of verification with CPython 2.7.18 (+ ZODB-5.6.0, ZEO-5.2.2)
kirr@teco:~/tmp/trashme/Z$ virtualenv -p python2 py2.venv
created virtual environment CPython2.7.18.final.0-64 in 713ms
  creator CPython2Posix(dest=/home/kirr/tmp/trashme/Z/py2.venv, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/kirr/.local/share/virtualenv)
    added seed packages: pip==20.3.4, pkg_resources==0.0.0, setuptools==44.1.1, wheel==0.34.2
  activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator

kirr@teco:~/tmp/trashme/Z$ . py2.venv/bin/activate

(py2.venv) kirr@teco:~/tmp/trashme/Z$ python
Python 2.7.18 (default, Mar  9 2021, 11:09:26) 
[GCC 10.2.1 20210110] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 

(py2.venv) kirr@teco:~/tmp/trashme/Z$ pip install ZODB ZEO
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
Collecting ZODB
  Downloading ZODB-5.6.0-py2.py3-none-any.whl (413 kB)
     |████████████████████████████████| 413 kB 2.3 MB/s 
Collecting ZEO
  Downloading ZEO-5.2.2-py2.py3-none-any.whl (288 kB)
     |████████████████████████████████| 288 kB 13.7 MB/s 
Collecting zc.lockfile
  Downloading zc.lockfile-2.0-py2.py3-none-any.whl (9.7 kB)
Collecting persistent>=4.4.0
  Downloading persistent-4.6.4-cp27-cp27mu-manylinux2010_x86_64.whl (240 kB)
     |████████████████████████████████| 240 kB 15.2 MB/s 
Collecting zope.interface
  Downloading zope.interface-5.3.0-cp27-cp27mu-manylinux2010_x86_64.whl (243 kB)
     |████████████████████████████████| 243 kB 14.6 MB/s 
Collecting transaction>=2.4
  Downloading transaction-3.0.1-py2.py3-none-any.whl (47 kB)
     |████████████████████████████████| 47 kB 4.4 MB/s 
Collecting BTrees>=4.2.0
  Downloading BTrees-4.7.2-cp27-cp27mu-manylinux2010_x86_64.whl (2.9 MB)
     |████████████████████████████████| 2.9 MB 15.5 MB/s 
Collecting ZConfig
  Downloading ZConfig-3.5.0-py2.py3-none-any.whl (134 kB)
     |████████████████████████████████| 134 kB 22.8 MB/s 
Collecting zodbpickle>=1.0.1
  Downloading zodbpickle-2.0.0-cp27-cp27mu-manylinux2010_x86_64.whl (309 kB)
     |████████████████████████████████| 309 kB 22.2 MB/s 
Collecting six
  Downloading six-1.15.0-py2.py3-none-any.whl (10 kB)
Processing /home/kirr/.cache/pip/wheels/94/b0/5b/1448d37a25d9b08c1495306d43197e935f1740da4076849e15/zdaemon-4.3-cp27-none-any.whl
Collecting trollius; python_version == "2.7"
  Downloading trollius-2.2.post1.tar.gz (274 kB)
     |████████████████████████████████| 274 kB 16.5 MB/s 
Collecting futures; python_version == "2.7"
  Downloading futures-3.3.0-py2-none-any.whl (16 kB)
Requirement already satisfied: setuptools in ./py2.venv/lib/python2.7/site-packages (from zc.lockfile->ZODB) (44.1.1)
Collecting cffi; platform_python_implementation == "CPython"
  Downloading cffi-1.14.5-cp27-cp27mu-manylinux1_x86_64.whl (389 kB)
     |████████████████████████████████| 389 kB 25.9 MB/s 
Collecting pycparser
  Downloading pycparser-2.20-py2.py3-none-any.whl (112 kB)
     |████████████████████████████████| 112 kB 21.0 MB/s 
Building wheels for collected packages: trollius
  Building wheel for trollius (setup.py) ... done
  Created wheel for trollius: filename=trollius-2.2.post1-py2-none-any.whl size=122121 sha256=a442356ce1ee9f0165d6c2ee4678e52d5c489d8e4f00e6945e18d62c017e6e37
  Stored in directory: /home/kirr/.cache/pip/wheels/d7/3a/73/2eedf02f638d8dc7dbcac4dd151f26a331d1a1053d49a85544
Successfully built trollius
Installing collected packages: zc.lockfile, zope.interface, pycparser, cffi, persistent, transaction, BTrees, ZConfig, zodbpickle, six, ZODB, zdaemon, futures, trollius, ZEO
Successfully installed BTrees-4.7.2 ZConfig-3.5.0 ZEO-5.2.2 ZODB-5.6.0 cffi-1.14.5 futures-3.3.0 persistent-4.6.4 pycparser-2.20 six-1.15.0 transaction-3.0.1 trollius-2.2.post1 zc.lockfile-2.0 zdaemon-4.3 zodbpickle-2.0.0 zope.interface-5.3.0

(py2.venv) kirr@teco:~/tmp/trashme/Z$ wget https://lab.nexedi.com/kirr/misc/raw/b9092e3/zodb/zopenrace-zeo.py
--2021-04-05 10:51:33--  https://lab.nexedi.com/kirr/misc/raw/b9092e3/zodb/zopenrace-zeo.py
Resolving lab.nexedi.com (lab.nexedi.com)… 176.31.129.213, 161.97.166.226
Connecting to lab.nexedi.com (lab.nexedi.com)|176.31.129.213|:443... connected.
HTTP request sent, awaiting response… 200 OK
Length: 7100 (6.9K) [text/plain]
Saving to: ‘zopenrace-zeo.py’

zopenrace-zeo.py                100%[=======================================================>]   6.93K  --.-KB/s    in 0s      

2021-04-05 10:51:33 (17.1 MB/s) - ‘zopenrace-zeo.py’ saved [7100/7100]

(py2.venv) kirr@teco:~/tmp/trashme/Z$ ll
total 12
drwxr-xr-x 5 kirr kirr 4096 Apr  5 10:51 py2.venv
-rw-r--r-- 1 kirr kirr 7100 Apr  5 10:51 zopenrace-zeo.py

(py2.venv) kirr@teco:~/tmp/trashme/Z$ chmod a+x zopenrace-zeo.py 

(py2.venv) kirr@teco:~/tmp/trashme/Z$ ./zopenrace-zeo.py 
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "./zopenrace-zeo.py", line 160, in T1
    t1()
  File "./zopenrace-zeo.py", line 153, in t1
    raise AssertionError("T1: obj1.i (%d)  !=  obj2.i (%d)" % (i1, i2))
AssertionError: T1: obj1.i (21)  !=  obj2.i (20)

Traceback (most recent call last):
  File "./zopenrace-zeo.py", line 205, in <module>
    main()
  File "./zopenrace-zeo.py", line 200, in main
    raise AssertionError('FAIL')
AssertionError: FAIL

@d-maurer
Contributor

d-maurer commented Apr 6, 2021 via email

@d-maurer
Contributor

d-maurer commented Apr 6, 2021 via email

@d-maurer
Contributor

d-maurer commented Apr 7, 2021 via email

navytux added a commit to navytux/pygolang that referenced this issue Apr 7, 2021
Otherwise sys.modules['__main__'] points to the module of gpython
itself, not the user program. This difference can be important when e.g.
the user defines classes in the main module and tries to pickle them. Such
pickling will fail if gpython is not careful to run the user's code in its
own main.

Without this patch added test fails as

      File "check_main.py", line 51, in <module>
        main()
      File "check_main.py", line 39, in main
        assert hasattr(mainmod, 'MyUniqueClassXYZ'),        dir(mainmod)
    AssertionError: ['__builtins__', '__doc__', '__file__', '__name__', '__package__', 'main']

The problem was discovered while trying to run test program from
zopefoundation/ZEO#166 :

    $ gpython zopenrace-zeo.py
    Traceback (most recent call last):
      File "/home/kirr/tmp/trashme/Z/py2.venv/bin/gpython", line 8, in <module>
        sys.exit(main())
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/gpython/__init__.py", line 395, in main
        pymain(argv, init)
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/gpython/__init__.py", line 266, in pymain
        run()
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/gpython/__init__.py", line 172, in run
        _execfile(filepath, g)
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/gpython/__init__.py", line 275, in _execfile
        six.exec_(code, globals, locals)
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/six.py", line 719, in exec_
        exec("""exec _code_ in _globs_, _locs_""")
      File "<string>", line 1, in <module>
      File "zopenrace-zeo.py", line 205, in <module>
        main()
      File "zopenrace-zeo.py", line 190, in main
        init()
      File "zopenrace-zeo.py", line 126, in init
        transaction.commit()
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/transaction/_manager.py", line 257, in commit
        return self.manager.commit()
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/transaction/_manager.py", line 134, in commit
        return self.get().commit()
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/transaction/_transaction.py", line 282, in commit
        reraise(t, v, tb)
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/transaction/_transaction.py", line 273, in commit
        self._commitResources()
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/transaction/_transaction.py", line 456, in _commitResources
        reraise(t, v, tb)
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/transaction/_transaction.py", line 430, in _commitResources
        rm.commit(self)
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/ZODB/Connection.py", line 497, in commit
        self._commit(transaction)
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/ZODB/Connection.py", line 546, in _commit
        self._store_objects(ObjectWriter(obj), transaction)
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/ZODB/Connection.py", line 578, in _store_objects
        p = writer.serialize(obj)  # This calls __getstate__ of obj
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/ZODB/serialize.py", line 430, in serialize
        return self._dump(meta, obj.__getstate__())
      File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/ZODB/serialize.py", line 439, in _dump
        self._p.dump(state)
    PicklingError: Can't pickle <class '__main__.PInt'>: attribute lookup __main__.PInt failed

/reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/pygolang/merge_requests/15
@navytux
Contributor Author

navytux commented Apr 8, 2021

@d-maurer, thanks a lot for investigating.

@jmuchemb I am not sure that the requirement above is already sufficient:
at various interfaces, the tid and the associated invalidations
are tightly coupled. The requirement wants lastTransaction updated
after the processing of the invalidations. What happens when
invalidations for a new transaction are processed in between?

Since it was me who originally pushed for this interface change, let me say that you are right.
If a follow-up invalidation is processed in between, we get into the scenario explained in zopefoundation/ZODB#290, which leads to a corrupted view of the data when updating to the next transaction.

The interface should state not only that "process invalidations" is ordered with lastTransaction via "happens before" (Inv_x < L_y for x ≤ y), but also that none of the further invalidations are processed yet (L_y < Inv_z for y < z).
This removes the possibility of a future invalidation getting in between invalidate and lastTransaction.
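The two ordering requirements can be sketched with stdlib threading (hypothetical names, not ZEO's actual API): applying the invalidations for a transaction and updating lastTransaction happen in one critical section, so Inv_x happens-before L_y for x ≤ y, and no later Inv_z with z > y can interleave between them.

```python
import threading

class ClientCache:
    """Sketch of the required ordering between invalidation processing
    and lastTransaction (names are hypothetical, not ZEO's API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_tid = 0
        self._invalidated = set()

    def process_invalidations(self, tid, oids):
        # Server push: the invalidations and the lastTransaction update
        # are applied atomically; a later push for tid' > tid must wait
        # on the same lock, so it cannot slip in between them.
        with self._lock:
            assert tid > self._last_tid, "invalidations must arrive in order"
            self._invalidated.update(oids)
            self._last_tid = tid

    def last_transaction(self):
        # Any reader that observes last_transaction() == tid is thereby
        # guaranteed to also observe all invalidations with tid' <= tid,
        # and none of the later ones.
        with self._lock:
            return self._last_tid

c = ClientCache()
c.process_invalidations(1, {'oid1'})
c.process_invalidations(2, {'oid2'})
print(c.last_transaction())  # 2
```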

By default I let @jmuchemb comment on and handle the interface amendment, since it was his version of the text that was merged into interfaces instead of zopefoundation/ZODB#313.


Looking at #167, I'm also sorry to say that I missed that ZEO5 does not conform to the requested interface amendment. I verified it for ZEO4 (zopefoundation/ZODB#307 (comment)), and from zopefoundation/ZODB#307 (comment) got the impression that it was also the case for ZEO5. But I never looked inside to verify that myself. Let me apologize...

Regarding the #167 changes, I would also suggest moving RaceConditionTests to ZODB itself so that all storages are exercised for this bug, similarly to e.g. the tests in PackableStorage:

https://github.com/zopefoundation/ZODB/blob/5.6.0-19-gdad778016/src/ZODB/tests/PackableStorage.py#L103
https://github.com/zopefoundation/ZODB/blob/5.6.0-19-gdad778016/src/ZODB/tests/testFileStorage.py#L41-L57
https://github.com/zopefoundation/ZODB/blob/5.6.0-19-gdad778016/src/ZODB/tests/testDemoStorage.py#L45-L56

(see also zopefoundation/ZODB#290 (comment) for similar call)
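A storage-generic race test of the kind suggested above could look roughly like this stdlib-only sketch (hypothetical names and a toy in-memory storage, not ZODB's actual test API): subclasses per storage would override make_storage(), in the spirit of the PackableStorage mixins.

```python
import threading
import unittest

class InMemoryStorage:
    """Toy stand-in for a real storage; hypothetical minimal API."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {'obj1': 0, 'obj2': 0}

    def commit(self, updates):
        # Apply a whole transaction atomically.
        with self._lock:
            self._data.update(updates)

    def load_snapshot(self):
        # Return a consistent snapshot of the current state.
        with self._lock:
            return dict(self._data)

class RaceConditionTests(unittest.TestCase):
    """Storage-generic open-vs-invalidate race test: a writer keeps
    obj1 == obj2 invariant while a reader repeatedly checks it."""

    def make_storage(self):
        # Subclasses would return FileStorage, ZEO, NEO, ... here.
        return InMemoryStorage()

    def test_open_vs_invalidate(self):
        stor = self.make_storage()
        failures = []

        def writer():
            for i in range(1000):
                stor.commit({'obj1': i, 'obj2': i})

        def reader():
            for _ in range(1000):
                snap = stor.load_snapshot()
                if snap['obj1'] != snap['obj2']:
                    failures.append(snap)

        t = threading.Thread(target=writer)
        t.start()
        reader()
        t.join()
        self.assertEqual(failures, [])

suite = unittest.defaultTestLoader.loadTestsFromTestCase(RaceConditionTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```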

Thanks again for looking into this issue,
Kirill

navytux added a commit to navytux/ZODB that referenced this issue Apr 14, 2021
Add a test that exercises the open vs invalidation race condition that,
if it happens, leads to data corruption. We are seeing such a race
happening at the storage level in ZEO (zopefoundation/ZEO#166),
and previously we've seen it also happen at the Connection level
(zopefoundation#290). By adding this test
to be exercised wrt all storages, we make sure that all storages stay
free from this race.

And it paid off. Besides catching the original problems from
zopefoundation#290 and
zopefoundation/ZEO#166, this test also
discovered a concurrency bug in MVCCMappingStorage:

    Failure in test check_race_open_vs_invalidate (ZODB.tests.testMVCCMappingStorage.MVCCMappingStorageTests)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/unittest/case.py", line 329, in run
        testMethod()
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/BasicStorage.py", line 492, in check_race_open_vs_invalidate
        self.fail(failure[0])
      File "/usr/lib/python2.7/unittest/case.py", line 410, in fail
        raise self.failureException(msg)
    AssertionError: T1: obj1.value (24)  !=  obj2.value (23)

The problem with MVCCMappingStorage was that instance.poll_invalidations
correctly took main_lock with the intention of making sure main data is
not mutated during analysis, but instance.tpc_finish and
instance.tpc_abort did _not_ take the main lock, which allowed
committed data to propagate into the main storage in a non-atomic way.

This bug was also observable if both obj1 and obj2 in the added test
were always loaded from the storage (by adding obj2._p_invalidate after
obj1._p_invalidate).

-> Fix MVCCMappingStorage by correctly locking main MVCCMappingStorage
instance when processing transaction completion.
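The essence of the fix can be sketched with stdlib threading (simplified, hypothetical names, not MVCCMappingStorage's real code): both the committer (tpc_finish) and the poller (poll_invalidations) take the same main lock, so a commit propagates to the main storage atomically and a concurrent poll can never observe it half-applied.

```python
import threading

class MainStorage:
    """Simplified model of the fix: commit and poll share main_lock."""

    def __init__(self):
        self.main_lock = threading.Lock()
        self.data = {}   # oid -> value
        self.tid = 0

    def tpc_finish(self, updates):
        # Before the fix, the equivalent of this mutated main data
        # without holding main_lock, so a concurrent poll could see
        # a half-applied commit (obj1 updated, obj2 not yet).
        with self.main_lock:
            self.data.update(updates)
            self.tid += 1

    def poll_invalidations(self):
        # The snapshot is taken while no commit is in flight.
        with self.main_lock:
            return self.tid, dict(self.data)

s = MainStorage()
s.tpc_finish({'obj1': 1, 'obj2': 1})
tid, snap = s.poll_invalidations()
print(tid, snap['obj1'] == snap['obj2'])  # 1 True
```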

/cc @d-maurer, @jamadden, @jmuchemb
navytux added a commit to navytux/ZODB that referenced this issue Apr 21, 2021
/reviewed-on zopefoundation#345
navytux added a commit to zopefoundation/ZODB that referenced this issue Apr 21, 2021
/reviewed-on #345
@navytux navytux mentioned this issue Aug 3, 2022