-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Another data corruption due to concurrency bug (ZEO5 regression; no external invalidations) #166
Comments
/cc @jimfulton |
Kirill Smelkov wrote at 2021-4-1 21:59 -0700:
( moving out from #155 (comment) since this seems to be a different problem )
Something really wrong is going on with ZEO5.
I tried to reproduce your observation but failed:
ZEO==5.2.2; ZODB==5.6.0; CPython==3.6.9
I used your `zopenrace.py` with the following changes:
```
...
from ZEO import server, DB
...
def main():
address, stop = server()
db = DB(address)
...
N = 10000
t1 = go(T1, N)
t2 = go(T2, N)
t1.join()
t2.join()
stop()
Apparently, the problem depends on further details.
|
@d-maurer, thanks for feedback. I checked a bit more from scratch and, contrary to the problem from #155 (which fails on both py2 and py3), the test program in this issue fails on CPython2, but does not fail on CPython3. The difference might be thus somehow related to trollius. To verify I used zoperace-zeo.py which is --- a/zopenrace.py.kirr0
+++ b/zopenrace.py
@@ -87,8 +87,7 @@
"""
from __future__ import print_function
-from ZODB import DB
-from ZODB.MappingStorage import MappingStorage
+from ZEO import server, DB
import transaction
from persistent import Persistent
@@ -110,8 +109,8 @@ def __init__(self, i):
def main():
- zstor = MappingStorage()
- db = DB(zstor)
+ address, stop = server()
+ db = DB(address)
# init initializes the database with two integer objects - obj1/obj2 that are set to 0.
@@ -194,6 +193,7 @@ def t2():
t2 = go(T2, N)
t1.join()
t2.join()
+ stop()
if not all(okv):
raise AssertionError('FAIL') For me it fails with CPython 2.7.18 on two different machines. For completeness I attach full log of one such verification. Please note, that I hit this ZEO bug only accidentally, while trying to find a place for my ZODB test programs in wendelin.core test suite, so that those test programs are run wrt all ZODB versions we care about (ZODB4, ZODB5 and ZODB4 with MVCC backport from ZODB5 (see https://lab.nexedi.com/nexedi/ZODB/commit/8e7eab33?expanded=1), and wrt all storages (FileStorage, ZEO and NEO) as explained in this patch. It is possible that the bug is indeed in trollius. It is also possible that the bug is in ZEO5 and only being more likely to hit with trollius. For now I did not looked into ZEO5 to verify it for concurrency-correctness at all. Thanks again for feedback, Log of verification with CPython 2.7.18 (+ ZODB-5.6.0, ZEO-5.2.2)kirr@teco:~/tmp/trashme/Z$ virtualenv -p python2 py2.venv
created virtual environment CPython2.7.18.final.0-64 in 713ms
creator CPython2Posix(dest=/home/kirr/tmp/trashme/Z/py2.venv, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/kirr/.local/share/virtualenv)
added seed packages: pip==20.3.4, pkg_resources==0.0.0, setuptools==44.1.1, wheel==0.34.2
activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator
kirr@teco:~/tmp/trashme/Z$ . py2.venv/bin/activate
(py2.venv) kirr@teco:~/tmp/trashme/Z$ python
Python 2.7.18 (default, Mar 9 2021, 11:09:26)
[GCC 10.2.1 20210110] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
(py2.venv) kirr@teco:~/tmp/trashme/Z$ pip install ZODB ZEO
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
Collecting ZODB
Downloading ZODB-5.6.0-py2.py3-none-any.whl (413 kB)
|████████████████████████████████| 413 kB 2.3 MB/s
Collecting ZEO
Downloading ZEO-5.2.2-py2.py3-none-any.whl (288 kB)
|████████████████████████████████| 288 kB 13.7 MB/s
Collecting zc.lockfile
Downloading zc.lockfile-2.0-py2.py3-none-any.whl (9.7 kB)
Collecting persistent>=4.4.0
Downloading persistent-4.6.4-cp27-cp27mu-manylinux2010_x86_64.whl (240 kB)
|████████████████████████████████| 240 kB 15.2 MB/s
Collecting zope.interface
Downloading zope.interface-5.3.0-cp27-cp27mu-manylinux2010_x86_64.whl (243 kB)
|████████████████████████████████| 243 kB 14.6 MB/s
Collecting transaction>=2.4
Downloading transaction-3.0.1-py2.py3-none-any.whl (47 kB)
|████████████████████████████████| 47 kB 4.4 MB/s
Collecting BTrees>=4.2.0
Downloading BTrees-4.7.2-cp27-cp27mu-manylinux2010_x86_64.whl (2.9 MB)
|████████████████████████████████| 2.9 MB 15.5 MB/s
Collecting ZConfig
Downloading ZConfig-3.5.0-py2.py3-none-any.whl (134 kB)
|████████████████████████████████| 134 kB 22.8 MB/s
Collecting zodbpickle>=1.0.1
Downloading zodbpickle-2.0.0-cp27-cp27mu-manylinux2010_x86_64.whl (309 kB)
|████████████████████████████████| 309 kB 22.2 MB/s
Collecting six
Downloading six-1.15.0-py2.py3-none-any.whl (10 kB)
Processing /home/kirr/.cache/pip/wheels/94/b0/5b/1448d37a25d9b08c1495306d43197e935f1740da4076849e15/zdaemon-4.3-cp27-none-any.whl
Collecting trollius; python_version == "2.7"
Downloading trollius-2.2.post1.tar.gz (274 kB)
|████████████████████████████████| 274 kB 16.5 MB/s
Collecting futures; python_version == "2.7"
Downloading futures-3.3.0-py2-none-any.whl (16 kB)
Requirement already satisfied: setuptools in ./py2.venv/lib/python2.7/site-packages (from zc.lockfile->ZODB) (44.1.1)
Collecting cffi; platform_python_implementation == "CPython"
Downloading cffi-1.14.5-cp27-cp27mu-manylinux1_x86_64.whl (389 kB)
|████████████████████████████████| 389 kB 25.9 MB/s
Collecting pycparser
Downloading pycparser-2.20-py2.py3-none-any.whl (112 kB)
|████████████████████████████████| 112 kB 21.0 MB/s
Building wheels for collected packages: trollius
Building wheel for trollius (setup.py) ... done
Created wheel for trollius: filename=trollius-2.2.post1-py2-none-any.whl size=122121 sha256=a442356ce1ee9f0165d6c2ee4678e52d5c489d8e4f00e6945e18d62c017e6e37
Stored in directory: /home/kirr/.cache/pip/wheels/d7/3a/73/2eedf02f638d8dc7dbcac4dd151f26a331d1a1053d49a85544
Successfully built trollius
Installing collected packages: zc.lockfile, zope.interface, pycparser, cffi, persistent, transaction, BTrees, ZConfig, zodbpickle, six, ZODB, zdaemon, futures, trollius, ZEO
Successfully installed BTrees-4.7.2 ZConfig-3.5.0 ZEO-5.2.2 ZODB-5.6.0 cffi-1.14.5 futures-3.3.0 persistent-4.6.4 pycparser-2.20 six-1.15.0 transaction-3.0.1 trollius-2.2.post1 zc.lockfile-2.0 zdaemon-4.3 zodbpickle-2.0.0 zope.interface-5.3.0
(py2.venv) kirr@teco:~/tmp/trashme/Z$ wget https://lab.nexedi.com/kirr/misc/raw/b9092e3/zodb/zopenrace-zeo.py
--2021-04-05 10:51:33-- https://lab.nexedi.com/kirr/misc/raw/b9092e3/zodb/zopenrace-zeo.py
Распознаётся lab.nexedi.com (lab.nexedi.com)… 176.31.129.213, 161.97.166.226
Подключение к lab.nexedi.com (lab.nexedi.com)|176.31.129.213|:443... соединение установлено.
HTTP-запрос отправлен. Ожидание ответа… 200 OK
Длина: 7100 (6,9K) [text/plain]
Сохранение в: «zopenrace-zeo.py»
zopenrace-zeo.py 100%[=======================================================>] 6,93K --.-KB/s за 0s
2021-04-05 10:51:33 (17,1 MB/s) - «zopenrace-zeo.py» сохранён [7100/7100]
(py2.venv) kirr@teco:~/tmp/trashme/Z$ ll
итого 12
drwxr-xr-x 5 kirr kirr 4096 апр 5 10:51 py2.venv
-rw-r--r-- 1 kirr kirr 7100 апр 5 10:51 zopenrace-zeo.py
(py2.venv) kirr@teco:~/tmp/trashme/Z$ chmod a+x zopenrace-zeo.py
(py2.venv) kirr@teco:~/tmp/trashme/Z$ ./zopenrace-zeo.py
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "./zopenrace-zeo.py", line 160, in T1
t1()
File "./zopenrace-zeo.py", line 153, in t1
raise AssertionError("T1: obj1.i (%d) != obj2.i (%d)" % (i1, i2))
AssertionError: T1: obj1.i (21) != obj2.i (20)
Traceback (most recent call last):
File "./zopenrace-zeo.py", line 205, in <module>
main()
File "./zopenrace-zeo.py", line 200, in main
raise AssertionError('FAIL')
AssertionError: FAIL |
Kirill Smelkov wrote at 2021-4-5 01:50 -0700:
***@***.***, thanks for feedback.
I checked a bit more from scratch and, contrary to the problem from #155 (which fails on both py2 and py3), the test program in this issue fails on CPython2, but does not fail on CPython3. The difference might be thus somehow related to trollius.
I could reproduce the problem here with Python 2 -- investigating.
|
Dieter Maurer wrote at 2021-4-6 07:22 +0200:
Kirill Smelkov wrote at 2021-4-5 01:50 -0700:
***@***.***, thanks for feedback.
>
>I checked a bit more from scratch and, contrary to the problem from #155 (which fails on both py2 and py3), the test program in this issue fails on CPython2, but does not fail on CPython3. The difference might be thus somehow related to trollius.
I could reproduce the problem here with Python 2 -- investigating.
The problem is obviously associated with a race condition: the failure
happens after different rounds, rarely not all.
To understand what happens, I have activated logging
(`address, stop = server(log="/tmp/zeo.log")`)
and instrumented some methods in `ZEO.ClientStorage` and
`ZODB.mvccadapter`:
```
from decorator import decorator
@decorator
def trace(f, *args, **kw):
logger.debug("%r %r %r", f, args, kw)
return f(*args, **kw)
```
I also stop the test as soon as the first failure is observed.
One run has shown the error already for the first modification.
Below, I show the corresponding log file.
The `_invalidate_finish` comes from the modifying client,
the `loadBefore` from the verifying client. The serial in the
last `loadBefore` shows that it already sees the new transaction.
A possible explanation could be that the new transaction was
observable before `_invalidate_finish` has been completed.
```
2021-04-06T09:46:48 INFO ZEO.runzeo (3694) opening storage '1' using MappingStorage
------
2021-04-06T09:46:48 INFO ZEO.StorageServer StorageServer created RW with storages: 1:RW:Mapping Storage
------
2021-04-06T09:46:48 DEBUG trollius Using selector: EpollSelector
------
2021-04-06T09:46:48 INFO ZEO.asyncio.server listening on (u'127.0.0.1', 0)
------
2021-04-06T09:46:48 DEBUG ZEO.tests.forker SERVER CREATED
------
2021-04-06T09:46:48 DEBUG ZEO.tests.forker ADDRESS SENT
------
2021-04-06T09:46:48 INFO ZEO.ClientStorage ('127.0.0.1', 37011) ClientStorage (pid=3694) created RW/normal for storage: '1'
------
2021-04-06T09:46:48 INFO ZEO.cache created temporary cache file '<fdopen>'
------
2021-04-06T09:46:48 DEBUG trollius Using selector: EpollSelector
------
2021-04-06T09:46:48 DEBUG ZEO.asyncio.client disconnected <ZEO.asyncio.client.Client object at 0x7f5503226f90> None
------
2021-04-06T09:46:48 DEBUG ZEO.asyncio.client try_connecting
------
2021-04-06T09:46:48 DEBUG ZEO.asyncio.server Accepted connection
------
2021-04-06T09:46:48 INFO ZEO.asyncio.base Connected server protocol
------
2021-04-06T09:46:48 INFO ZEO.asyncio.base Connected Protocol(('127.0.0.1', 37011), '1', 0)
------
2021-04-06T09:46:48 INFO ZEO.asyncio.server received handshake 'Z5'
------
2021-04-06T09:46:48 INFO ZEO.ClientStorage ('127.0.0.1', 37011) Connected to storage: ('localhost', 37011)
------
2021-04-06T09:46:48 DEBUG txn.140003321820992 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x01') {}
------
2021-04-06T09:46:48 DEBUG txn.140003321820992 commit <Connection at 7f55027b3210>
------
2021-04-06T09:46:48 DEBUG ZEO.StorageServer (127.0.0.1:37011) ('1') lock: transactions waiting: 0
------
2021-04-06T09:46:48 BLATHER ZEO.StorageServer (127.0.0.1:37011) Preparing to commit transaction: 1 objects, 96 bytes
------
2021-04-06T09:46:48 DEBUG ZEO.StorageServer (127.0.0.1:37011) ('1') unlock: transactions waiting: 0
------
2021-04-06T09:46:48 DEBUG ZODB.mvccadapter <function _invalidate_finish at 0x7f550c9ea8d0> (<ZODB.mvccadapter.MVCCAdapter object at 0x7f55027b31d0>, '\x03\xdf1\x92\xd0\x88\x02\xdd', set(['\x00\x00\x00\x00\x00\x00\x00\x00']), <ZODB.mvccadapter.MVCCAdapterInstance object at 0x7f55027b32d0>) {}
------
2021-04-06T09:46:48 DEBUG txn.140003321820992 commit
------
2021-04-06T09:46:48 DEBUG txn.140003321820992 new transaction
------
2021-04-06T09:46:48 DEBUG txn.140003321820992 commit <Connection at 7f55027b3210>
------
2021-04-06T09:46:48 DEBUG ZEO.StorageServer (127.0.0.1:37011) ('1') lock: transactions waiting: 0
------
2021-04-06T09:46:48 BLATHER ZEO.StorageServer (127.0.0.1:37011) Preparing to commit transaction: 3 objects, 291 bytes
------
2021-04-06T09:46:48 DEBUG ZEO.StorageServer (127.0.0.1:37011) ('1') unlock: transactions waiting: 0
------
2021-04-06T09:46:48 DEBUG ZODB.mvccadapter <function _invalidate_finish at 0x7f550c9ea8d0> (<ZODB.mvccadapter.MVCCAdapter object at 0x7f55027b31d0>, '\x03\xdf1\x92\xd0\x8c\xaf\xaa', set(['\x00\x00\x00\x00\x00\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x01', '\x00\x00\x00\x00\x00\x00\x00\x02']), <ZODB.mvccadapter.MVCCAdapterInstance object at 0x7f55027b32d0>) {}
------
2021-04-06T09:46:48 DEBUG txn.140003321820992 commit
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG txn.140003071555328 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x00', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x00', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x02', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003071555328 commit <Connection at 7f55027b3dd0>
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG ZEO.StorageServer (127.0.0.1:37011) ('1') lock: transactions waiting: 0
------
2021-04-06T09:46:48 BLATHER ZEO.StorageServer (127.0.0.1:37011) Preparing to commit transaction: 2 objects, 194 bytes
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG ZEO.StorageServer (127.0.0.1:37011) ('1') unlock: transactions waiting: 0
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x8c\xaf\xab') {}
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 abort
------
2021-04-06T09:46:48 DEBUG txn.140003079948032 new transaction
------
2021-04-06T09:46:48 DEBUG ZODB.mvccadapter <function _invalidate_finish at 0x7f550c9ea8d0> (<ZODB.mvccadapter.MVCCAdapter object at 0x7f55027b31d0>, '\x03\xdf1\x92\xd0\x93\xdc\xbb', set(['\x00\x00\x00\x00\x00\x00\x00\x01', '\x00\x00\x00\x00\x00\x00\x00\x02']), <ZODB.mvccadapter.MVCCAdapterInstance object at 0x7f55027b3e10>) {}
------
2021-04-06T09:46:48 DEBUG ZEO.ClientStorage <function loadBefore at 0x7f550322e7d0> (<ZEO.ClientStorage.ClientStorage object at 0x7f550426f610>, '\x00\x00\x00\x00\x00\x00\x00\x01', '\x03\xdf1\x92\xd0\x93\xdc\xbc') {}
------
2021-04-06T09:46:48 DEBUG txn.140003071555328 commit
```
|
Dieter Maurer wrote at 2021-4-6 10:10 +0200:
Dieter Maurer wrote at 2021-4-6 07:22 +0200:
>Kirill Smelkov wrote at 2021-4-5 01:50 -0700:
***@***.***, thanks for feedback.
>>
>>I checked a bit more from scratch and, contrary to the problem from #155 (which fails on both py2 and py3), the test program in this issue fails on CPython2, but does not fail on CPython3. The difference might be thus somehow related to trollius.
>
>I could reproduce the problem here with Python 2 -- investigating.
The problem is understood and related to PR #307:
during the development of this PR, it was recognized that it is vital
when the value returned by `lastTransaction` changes.
The `lastTransaction` interface now specifies that `lastTransaction`
must return the last transaction for which processing of
invalidations is complete.
@jmuchemb I am not sure that the requirement above is already sufficient:
at various interfaces, the tid and the associated invalidations
are tightly coupled. The requirement wants `lastTransaction` updated
after the processing of the invalidations. What happens when between
this update invalidations for a new transaction are processed?
For `ZEO.ClientStorage.ClientStorage`,
the `lastTransaction` value actually comes from
its cache. Its `LastTid` is regularly set before invalidation processing.
Thus, `ClientStorage` does not satisfy the requirement above.
|
Because else sys.modules['__main__'] points to the module of gpython itself, not user program. This difference can be important when e.g. user defines classes in main module and tries to pickle them. Such pickling will fail if gpython is not careful to run user's code in its own main. Without this patch added test fails as File "check_main.py", line 51, in <module> main() File "check_main.py", line 39, in main assert hasattr(mainmod, 'MyUniqueClassXYZ'), dir(mainmod) AssertionError: ['__builtins__', '__doc__', '__file__', '__name__', '__package__', 'main'] The problem was discovered while trying to run test program from zopefoundation/ZEO#166 : $ gpython zopenrace-zeo.py Traceback (most recent call last): File "/home/kirr/tmp/trashme/Z/py2.venv/bin/gpython", line 8, in <module> sys.exit(main()) File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/gpython/__init__.py", line 395, in main pymain(argv, init) File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/gpython/__init__.py", line 266, in pymain run() File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/gpython/__init__.py", line 172, in run _execfile(filepath, g) File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/gpython/__init__.py", line 275, in _execfile six.exec_(code, globals, locals) File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/six.py", line 719, in exec_ exec("""exec _code_ in _globs_, _locs_""") File "<string>", line 1, in <module> File "zopenrace-zeo.py", line 205, in <module> main() File "zopenrace-zeo.py", line 190, in main init() File "zopenrace-zeo.py", line 126, in init transaction.commit() File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/transaction/_manager.py", line 257, in commit return self.manager.commit() File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/transaction/_manager.py", line 134, in commit return self.get().commit() File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/transaction/_transaction.py", line 282, in commit reraise(t, v, tb) File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/transaction/_transaction.py", line 273, in commit self._commitResources() File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/transaction/_transaction.py", line 456, in _commitResources reraise(t, v, tb) File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/transaction/_transaction.py", line 430, in _commitResources rm.commit(self) File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/ZODB/Connection.py", line 497, in commit self._commit(transaction) File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/ZODB/Connection.py", line 546, in _commit self._store_objects(ObjectWriter(obj), transaction) File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/ZODB/Connection.py", line 578, in _store_objects p = writer.serialize(obj) # This calls __getstate__ of obj File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/ZODB/serialize.py", line 430, in serialize return self._dump(meta, obj.__getstate__()) File "/home/kirr/tmp/trashme/Z/py2.venv/lib/python2.7/site-packages/ZODB/serialize.py", line 439, in _dump self._p.dump(state) PicklingError: Can't pickle <class '__main__.PInt'>: attribute lookup __main__.PInt failed /reviewed-by @jerome /reviewed-on https://lab.nexedi.com/nexedi/pygolang/merge_requests/15
@d-maurer, thanks a lot for investigating.
Since it was me to originally push for this interface change, let me say that you are right. The interface should state not only that "process invalidations" is ordered with lastTransaction via "happens before" ( By default I let @jmuchemb comment and handle interface amendment since it was his version of text to be merged into interfaces instead of zopefoundation/ZODB#313. Looking at #167 I'm also sorry to say that I missed that ZEO5 does not confirm to requested interface amendment. I verified it for ZEO4 (zopefoundation/ZODB#307 (comment)), and from zopefoundation/ZODB#307 (comment) got the impression it is also the case for ZEO5. But I never looked inside to verifiy that by myself. Let me appologize... Regarding #167 changes I would also suggest to move https://github.com/zopefoundation/ZODB/blob/5.6.0-19-gdad778016/src/ZODB/tests/PackableStorage.py#L103 (see also zopefoundation/ZODB#290 (comment) for similar call) Thanks again for looking into this issue, |
Add test that exercises open vs invalidation race condition that, if happen, leads to data corruption. We are seeing such race happening on storage level in ZEO (zopefoundation/ZEO#166), and previously we've seen it also to happen on Connection level (zopefoundation#290). By adding this test to be exercised wrt all storages we make sure that all storages stay free from this race. And it payed out. Besides catching original problems from zopefoundation#290 and zopefoundation/ZEO#166 , this test also discovered a concurrency bug in MVCCMappingStorage: Failure in test check_race_open_vs_invalidate (ZODB.tests.testMVCCMappingStorage.MVCCMappingStorageTests) Traceback (most recent call last): File "/usr/lib/python2.7/unittest/case.py", line 329, in run testMethod() File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/BasicStorage.py", line 492, in check_race_open_vs_invalidate self.fail(failure[0]) File "/usr/lib/python2.7/unittest/case.py", line 410, in fail raise self.failureException(msg) AssertionError: T1: obj1.value (24) != obj2.value (23) The problem with MVCCMappingStorage was that instance.poll_invalidations was correctly taking main_lock with intention to make sure main data is not mutated during analysis, but instance.tpc_finish and instance.tpc_abort did _not_ taken main lock, which was leading to committed data to be propagating into main storage in non-atomic way. This bug was also observable if both obj1 and obj2 in the added test were always loaded from the storage (added obj2._p_invalidate after obj1._p_invalidate). -> Fix MVCCMappingStorage by correctly locking main MVCCMappingStorage instance when processing transaction completion. /cc @d-maurer, @jamadden, @jmuchemb
Add test that exercises open vs invalidation race condition that, if happen, leads to data corruption. We are seeing such race happening on storage level in ZEO (zopefoundation/ZEO#166), and previously we've seen it also to happen on Connection level (zopefoundation#290). By adding this test to be exercised wrt all storages we make sure that all storages stay free from this race. And it payed out. Besides catching original problems from zopefoundation#290 and zopefoundation/ZEO#166 , this test also discovered a concurrency bug in MVCCMappingStorage: Failure in test check_race_open_vs_invalidate (ZODB.tests.testMVCCMappingStorage.MVCCMappingStorageTests) Traceback (most recent call last): File "/usr/lib/python2.7/unittest/case.py", line 329, in run testMethod() File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/BasicStorage.py", line 492, in check_race_open_vs_invalidate self.fail(failure[0]) File "/usr/lib/python2.7/unittest/case.py", line 410, in fail raise self.failureException(msg) AssertionError: T1: obj1.value (24) != obj2.value (23) The problem with MVCCMappingStorage was that instance.poll_invalidations was correctly taking main_lock with intention to make sure main data is not mutated during analysis, but instance.tpc_finish and instance.tpc_abort did _not_ taken main lock, which was leading to committed data to be propagating into main storage in non-atomic way. This bug was also observable if both obj1 and obj2 in the added test were always loaded from the storage (added obj2._p_invalidate after obj1._p_invalidate). -> Fix MVCCMappingStorage by correctly locking main MVCCMappingStorage instance when processing transaction completion. /cc @d-maurer, @jamadden, @jmuchemb /reviewed-on zopefoundation#345
Add test that exercises open vs invalidation race condition that, if happen, leads to data corruption. We are seeing such race happening on storage level in ZEO (zopefoundation/ZEO#166), and previously we've seen it also to happen on Connection level (#290). By adding this test to be exercised wrt all storages we make sure that all storages stay free from this race. And it payed out. Besides catching original problems from #290 and zopefoundation/ZEO#166 , this test also discovered a concurrency bug in MVCCMappingStorage: Failure in test check_race_open_vs_invalidate (ZODB.tests.testMVCCMappingStorage.MVCCMappingStorageTests) Traceback (most recent call last): File "/usr/lib/python2.7/unittest/case.py", line 329, in run testMethod() File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/BasicStorage.py", line 492, in check_race_open_vs_invalidate self.fail(failure[0]) File "/usr/lib/python2.7/unittest/case.py", line 410, in fail raise self.failureException(msg) AssertionError: T1: obj1.value (24) != obj2.value (23) The problem with MVCCMappingStorage was that instance.poll_invalidations was correctly taking main_lock with intention to make sure main data is not mutated during analysis, but instance.tpc_finish and instance.tpc_abort did _not_ taken main lock, which was leading to committed data to be propagating into main storage in non-atomic way. This bug was also observable if both obj1 and obj2 in the added test were always loaded from the storage (added obj2._p_invalidate after obj1._p_invalidate). -> Fix MVCCMappingStorage by correctly locking main MVCCMappingStorage instance when processing transaction completion. /cc @d-maurer, @jamadden, @jmuchemb /reviewed-on #345
( moving out from #155 (comment) since this seems to be a different problem )
Something really wrong is going on with ZEO5.
In addition to the test program provided in #155, where there are 8 simultaneous workers each of which opens its own storage-level connection, I've tried to run the test program from zopefoundation/ZODB#290 with minimal edit to use ZEO instead of
MappingStorage
, and it fails too.In that test program there is only one storage-level connection and two simultaneous users that are using that single storage-level connection. In other words there are no external invalidations involved at all.
https://lab.nexedi.com/kirr/wendelin.core/blob/e2dc8651/lib/tests/testprog/zopenrace.py#L102-203
This failure indicates data corruption. Quoting zopefoundation/ZODB#290:
---- 8< ----
That in turn can lead to loose of Consistency of the database if mix of current
and old data is used to process a transaction. A classic example would be bank
accounts A, B and C with A<-B and A<-C transfer transactions. If transaction
that handles A<-C sees stale data for A when starting its processing, it
results in A loosing what it should have received from B.
...
The program simulates two clients: one (T2) constantly modifies two integer
objects preserving invariant that their values stay equal. The other client
(T1) constantly opens the database and verifies the invariant. T1 forces access
to one of the object to always go through loading from the database, and this
way if live cache becomes stale the bug is observed as invariant breakage.
---- 8< ----
Here is how it fails with ZEO:
P.S. Suggested patch from #155 (comment) does not help here - even though it seemingly fixes #155, it does not help to fix zopefoundation/ZODB#290 with ZEO5.
P.P.S. ZEO
5.2.2-1-g3d90ed42
, ZODB5.6.0-18-g75c703fcf
, CPython2.7.18
The text was updated successfully, but these errors were encountered: