Data corruption due to race condition with external invalidations (ZEO5 regression) #155

Closed
jmuchemb opened this issue Jan 21, 2020 · 10 comments · Fixed by #169
Comments

@jmuchemb
Member

jmuchemb commented Jan 21, 2020

I submitted zopefoundation/ZODB#291 to fix a similar issue (zopefoundation/ZODB#290), but one of the scripts written by @navytux still fails with ZEO5 when each worker thread has its own connection to the ZEO server. I was also unable to reproduce this issue with NEO, so I think it is a different bug, and that it is in ZEO5.

The script is https://gist.github.com/jmuchemb/b082dadd28c8a6a55c68e510b49873fd

@jamadden Can you try it with RelStorage?

@jamadden
Member

I ran the script with two different RelStorage backends and didn't see the AssertionError after several minutes. I also tried gevent instead of native threads, because that can change timings, and also saw no AssertionError.

@jmuchemb
Member Author

(I edited the issue description because it's not only about Connection.open())

I think there's a race condition between Client.loadBefore and Client.invalidateTransaction. That's what the cache trace shows:

                   loads    hits  inv(h)  writes hitrate
Jan 21 20:26:41 00 '' 0000000000000000 0000000000000000 -
Jan 21 20:26-26       0       0       0       0     n/a
Jan 21 20:26:41 ==================== Restart ====================
Jan 21 20:26:41 20                0 0000000000000000 0000000000000000 -
Jan 21 20:26:41 24                0 0000000000000000 03d5510eaf3e4145 -
Jan 21 20:26:41 52                0 03d5510eaf3e4144 0000000000000000 - 122
Jan 21 20:26:41 22                0 03d5510eaf3e4144 0000000000000000 - 122
Jan 21 20:26:41 22                0 03d5510eaf3e4144 0000000000000000 - 122
# Load obj1
Jan 21 20:26:41 20                1 0000000000000000 0000000000000000 -
Jan 21 20:26:41 24                1 0000000000000000 03d5510eaf3e4145 -
Jan 21 20:26:41 52                1 03d5510eaf3e4144 0000000000000000 - 32
# Load obj2
Jan 21 20:26:41 20                2 0000000000000000 0000000000000000 -
Jan 21 20:26:41 24                2 0000000000000000 03d5510eaf3e4145 -
# Waiting for obj2 to be loaded from the server.
# Process invalidations from a recently committed transaction.
Jan 21 20:26:41 1c                1 03d5510eaf7847cc 0000000000000000 -
Jan 21 20:26:41 10                2 03d5510eaf7847cc 0000000000000000 -
# obj2 received from server: end_tid is wrong.
Jan 21 20:26:41 52                2 03d5510eaf3e4144 0000000000000000 - 32
[...]
        Count Code Function (action)
            1  00  _setup_trace (initialization)
            1  10  invalidate (miss)
            1  1c  invalidate (hit, saving non-current)
            4  20  load (miss)
            3  22  load (hit)
            4  24  load (non-current, miss)
            4  52  store (current, non-version)

Looking more closely at the client code, I have 2 hypotheses:

  • the IO thread does not process packets in the same order they were received, in which case there would be many other race conditions
  • there's a race condition at the server side

Note that ClientCache.loadBefore may call ClientCache.load, and on a cache miss this results in 2 cache trace records. But ClientCache.loadBefore is also called from both ClientStorage.loadBefore and Client.load_before_threadsafe: if I hadn't commented out the former during my tests, there would be up to 4 cache trace records before loading from the server.

@jmuchemb
Member Author

Same issue with Python 3.7 so it wouldn't be a bug in trollius.

Here is a new test:

Jan 22 16:42:00 24                1 0000000000000000 03d555ce02153934 -
Jan 22 16:42:00 10                1 03d555ce023c9211 0000000000000000 -
Jan 22 16:42:00 10                2 03d555ce023c9211 0000000000000000 -
Jan 22 16:42:00 52                1 03d555ce02153933 0000000000000000 - 37

And at server side, I have logs showing the following events (processed in the same thread):

  1. loadBefore(oid=1) completed for the failing client, with end_tid=None
  2. ZEOStorage._invalidate called for 03d555ce023c9211

At this level, for this run, the server worked as expected, and somewhere else the 2 messages to the client were swapped.

I stop here for now. I have enough evidence that my ZODB PR can be merged, and we're still far from using ZEO5 in our company.

While working on this issue, cache traces showed something that seems to be another ZEO5 bug: I find 062cbec suspicious because it does not handle the case where several oids conflict. Or maybe this commit can simply be reverted.

@navytux
Contributor

navytux commented Jan 23, 2020

Here is original cleaned test program that exposes this bug: https://lab.nexedi.com/kirr/wendelin.core/blob/ecd0e7f0/zloadrace5.py

Let me quote its description for completeness:

---- 8< ----
Program zloadrace5.py demonstrates concurrency bug in ZODB5/ZEO5 that leads to data corruption.

The bug was not fully analyzed, but offhand it looks like ZEO5 does not properly synchronize on-client loads with invalidations, which leads to a stale live cache in the ZODB connection and corrupt data provided by ZODB to the application, similarly to zopefoundation/ZODB#290.

The program simulates eight clients: every client once in a while modifies two integer objects preserving invariant that their values stay equal. At every iteration each client also verifies the invariant with access to one of the objects always going through loading from the database. This way if live cache becomes stale the bug is observed as invariant breakage.

Here is example failure:

$ ./zloadrace5.py     
No handlers could be found for logger "ZEO.asyncio.server"     
Traceback (most recent call last):     
  File "./zloadrace5.py", line 139, in <module>     
    main()     
  File "</home/kirr/src/tools/py/decorator/src/decorator.pyc:decorator-gen-1>", line 2, in main
  File "/home/kirr/src/tools/go/pygolang/golang/__init__.py", line 100, in _     
    return f(*argv, **kw)     
  File "./zloadrace5.py", line 123, in main     
    wg.wait()     
  File "golang/_sync.pyx", line 198, in golang._sync.PyWorkGroup.wait     
    pyerr_reraise(pyerr)     
  File "golang/_sync.pyx", line 178, in golang._sync.PyWorkGroup.go.pyrunf     
    f(pywg._pyctx, *argv, **kw)     
  File "</home/kirr/src/tools/py/decorator/src/decorator.pyc:decorator-gen-3>", line 2, in T  
  File "/home/kirr/src/tools/go/pygolang/golang/__init__.py", line 100, in _     
    return f(*argv, **kw)     
  File "./zloadrace5.py", line 112, in T     
    t1()     
  File "</home/kirr/src/tools/py/decorator/src/decorator.pyc:decorator-gen-8>", line 2, in t1 
  File "/home/kirr/src/tools/go/pygolang/golang/__init__.py", line 100, in _     
    return f(*argv, **kw)     
  File "./zloadrace5.py", line 94, in t1     
    raise AssertionError("T%s: obj1.i (%d)  !=  obj2.i (%d)" % (name, i1, i2))     
AssertionError: T7: obj1.i (1)  !=  obj2.i (0)     

NOTE: ZODB4/ZEO4 do not expose this load vs invalidation race because there the issue was specifically taken care of. Here is the relevant part of Connection._setstate() in ZODB4:

https://github.com/zopefoundation/ZODB/blob/4.4.5-4-g7a1a49111/src/ZODB/Connection.py#L949-L964

Basically it is

  1. check if the oid was already invalidated and, if yes, load from storage with before=._txn_time, i.e. the transaction ID of the first invalidated transaction this connection received after its transaction began;
  2. if the oid was not invalidated - call zstor.load(oid) (i.e. before=None), loading the latest data for that oid;
  3. check again after the load whether the oid was invalidated, and if yes - reload the data with before=._txn_time.
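
For illustration, the 1-2-3 above roughly corresponds to the following sketch (not the actual ZODB4 code; "invalidated" and "txn_time" here are just stand-ins for the real Connection internals):

    def load_with_invalidation_check(conn, storage, oid):
        if oid in conn.invalidated:
            # 1. oid already invalidated: load as of the connection's view
            data, serial, _end = storage.loadBefore(oid, conn.txn_time)
        else:
            # 2. not invalidated: load the latest revision
            data, serial = storage.load(oid, '')
            if oid in conn.invalidated:
                # 3. invalidation arrived during the load: reload as of txn_time
                data, serial, _end = storage.loadBefore(oid, conn.txn_time)
        return data, serial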

One can suppose that there could be a race window in which a new transaction was committed before the load in 2, but the corresponding invalidation messages were not yet sent by the server, or not yet processed on the client. If any of that were true, it would result in a breakage of the Isolation property, so that an in-progress transaction would observe changes from transactions running simultaneously with it.

After some analysis it looks like it should not be the case:

The ZEO4 server explicitly guarantees that it does not interleave the processing of load requests with tpc_finish + sending invalidations. This way, if a load is processed after a new commit, the load reply is guaranteed to reach the client after the invalidation message. This was explicitly fixed by 71eb1456 (search for callAsyncNoPoll there) and later again by 94f275c3.

HOWEVER

ZODB5 shifts MVCC handling into the storage layer, and this way there is no 1-2-3 from the above in Connection.setstate(). The ZEO5 server was also significantly reworked compared to ZEO4, and it looks like that rework reintroduced some concurrency bugs that lead to corrupt data.
---- 8< ----

/cc @jimfulton

P.S. I would also correct the subject of this issue from "Date corruption ..." to "Data corruption ...", as date corruption is something different and more innocent.

@jmuchemb jmuchemb changed the title Date corruption due to race condition with external invalidations Data corruption due to race condition with external invalidations Jan 23, 2020
@jmuchemb
Member Author

jmuchemb commented Feb 1, 2020

I found some time to look at it further. By dumping packets between the client and the server, I found that they're properly ordered, which means that the bug is on the client side. So I had less code to read, and I found that the bug is in Protocol.message_received (ZEO.asyncio.client):

  • server notifications like invalidations are processed immediately
  • replies are just set on the related future object: some futures process callbacks immediately (the special Fut class in ZEO), others don't (e.g. asyncio.Future), and it's asyncio.Future in the case of loadBefore

The consequence is that the load_before callback (which stores the result in the cache) is called after the next received message has been processed. At the same time, I found that in the case of concurrent loads of the same oid/serial, the result could be saved several times into the cache: maybe not buggy, but not ideal for performance.
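
To illustrate why this ordering happens, here is a small standalone asyncio sketch (not ZEO code): asyncio.Future.set_result() only schedules done callbacks via loop.call_soon(), so the next message handled synchronously by the I/O loop runs before the callback that would store the loadBefore result in the cache:

    import asyncio

    async def main():
        loop = asyncio.get_running_loop()
        events = []

        fut = loop.create_future()
        # stand-in for the callback that stores the loadBefore reply in the cache
        fut.add_done_callback(lambda f: events.append('cache.store(oid, tid1, end=None)'))

        # simulate the I/O loop decoding two consecutive messages from the server
        fut.set_result('loadBefore reply')           # callback is only *scheduled* here
        events.append('invalidateTransaction tid2')  # notification handled immediately

        await asyncio.sleep(0)  # let the scheduled callback run
        print(events)
        # -> ['invalidateTransaction tid2', 'cache.store(oid, tid1, end=None)']

    asyncio.run(main())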

What I suggest is to do the same as for the other RPCs, except that Fut is too simple in that only 1 callback can be added. Here is a patch that fixes the issue (without regressions in the test suite):

--- a/src/ZEO/asyncio/client.py
+++ b/src/ZEO/asyncio/client.py
@@ -252,10 +252,19 @@ def load_before(self, oid, tid):
         message_id = (oid, tid)
         future = self.futures.get(message_id)
         if future is None:
-            future = asyncio.Future(loop=self.loop)
+            future = FutureManyImmediateCallbacks()
             self.futures[message_id] = future
             self._write(
                 self.encode(message_id, False, 'loadBefore', (oid, tid)))
+            @future.add_done_callback
+            def _(future):
+                try:
+                    data = future.result()
+                except Exception:
+                    return
+                if data:
+                    data, start, end = data
+                    self.client.cache.store(oid, start, end, data)
         return future
 
     # Methods called by the server.
@@ -608,9 +617,6 @@ def load_before_threadsafe(self, future, wait_ready, oid, tid):
                 future.set_exception(exc)
             else:
                 future.set_result(data)
-                if data:
-                    data, start, end = data
-                    self.cache.store(oid, start, end, data)
         elif wait_ready:
             self._when_ready(
                 self.load_before_threadsafe, future, wait_ready, oid, tid)
@@ -908,3 +914,21 @@ def result(self):
             raise self.exc
         else:
             return self._result
+
+class FutureManyImmediateCallbacks(Fut):
+
+    def __init__(self):
+        self.cb = []
+
+    def add_done_callback(self, cb):
+        self.cb.append(cb)
+
+    def set_exception(self, exc):
+        self.exc = exc
+        for cb in self.cb:
+            cb(self)
+
+    def set_result(self, result):
+        self._result = result
+        for cb in self.cb:
+            cb(self)

Not sure it's useful to keep an optimized class (Fut) for the case where there can only be one callback. Or we should rename Fut.
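
For comparison, a future that runs its callbacks synchronously inside set_result() - which is what the FutureManyImmediateCallbacks class in the patch above does - keeps the cache update ordered before any subsequently processed message. A simplified standalone illustration (not the actual ZEO class):

    class ImmediateFuture:
        # simplified stand-in for FutureManyImmediateCallbacks, for illustration only
        def __init__(self):
            self._callbacks = []

        def add_done_callback(self, cb):
            self._callbacks.append(cb)

        def set_result(self, result):
            self._result = result
            for cb in self._callbacks:  # run right away, before returning to the I/O loop
                cb(self)

        def result(self):
            return self._result

    events = []
    fut = ImmediateFuture()
    fut.add_done_callback(lambda f: events.append('cache.store(oid, tid1, end=None)'))
    fut.set_result('loadBefore reply')           # callback runs here, synchronously
    events.append('invalidateTransaction tid2')  # processed only afterwards
    print(events)
    # -> ['cache.store(oid, tid1, end=None)', 'invalidateTransaction tid2']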

navytux added a commit to navytux/ZODB that referenced this issue Nov 30, 2020
This backports to ZODB4 Connection ZODB5's approach to handling MVCC via always calling storage.loadBefore() instead of the "load the latest version + loadBefore if we were notified of database changes" approach.

Why?
----

Short answer: because Wendelin.core 2 needs to know at which particular
database state application-level ZODB connection is viewing the
database, and it is hard to implement such functionality correctly
without this backport. Please see appendix for the explanation.

What
----

This backports to ZODB4 the minimum necessary part of upstream commit 227953b
(Simplify MVCC by determining transaction start time using lastTransaction) +
follow-up correctness fixes:

zopefoundation#50
zopefoundation#56
zopefoundation#291
zopefoundation#307

In short:

- a Connection is always opened explicitly corresponding to a particular database revision
- Connection uses only loadBefore with that revision to load objects
- every time a Connection is (re)opened, the result of queued invalidations and
  explicit query to storage.lastTransaction is carefully merged to refresh
  Connection's idea about which database state it corresponds to.

The "careful" in the last point is important. Historically, ZODB5 was first reworked in commit 227953b (zopefoundation#56) to always call lastTransaction to refresh the state of the Connection view. Since there was no proper synchronisation with respect to the process of handling invalidations, that led to a data corruption issue due to a race in Connection.open() vs invalidations:

zopefoundation#290

That race and data corruption was fixed in commit b5895a5
(zopefoundation#291) by way of avoiding
lastTransaction call and relying only on invalidations channel when
refreshing Connection view.

This fix in turn led to another data corruption issue because in
presence of client-server reconnections, ZODB storage drivers can partly
skip notifying client with detailed invalidation messages:

zopefoundation#291 (comment)

A fix to that issue (zopefoundation#307)
proposed to change back to query storage for lastTransaction on every
Connection refresh, but to implement careful merging of lastTransaction
result and data from invalidation channel. However it was found that the
"careful merging" can work correctly only if we require from storage
drivers a particular ordering of invalidation events wrt lastTransaction
return and result:

zopefoundation#307 (comment)

While ZEO was already complying with those requirements, NEO had to be fixed to support them:

zopefoundation#307 (comment)
https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec
https://lab.nexedi.com/nexedi/neoppod/commit/96a5c01f

Patch details
-------------

We change Connection._txn_time to be a "before" for the database state
to which Connection view corresponds. This state is hooked to be
initialized and updated in Connection._flush_invalidations - the
function that is called from both explicit Connection (re)open and at
transaction boundaries via Connection.afterCompletion hook.

Objects loading is factored into Connection._load which replaces old
"load + check invalidated + fallback to loadBefore" game in
Connection._setstate.

Connection.open now calls Connection._flush_invalidations
unconditionally - even if it was global cache reset event - because
besides invalidation flushes the latter is now responsible for querying
storage lastTransaction.

TmpStore - a "storage" that provides runtime support for savepoints - is
refactored correspondingly to delegate loading of original objects back
to underlying Connection.

DB.close is modified - similarly to ZODB5 - to release DB's Connections
carefully with preventing connections from DB poll from implicitly
starting new transactions via afterCompletion hook.

ZODB.nxd_patches is introduced to indicate to client software that this
particular patch is present and can be relied upon.

Tests are updated correspondingly. In 227953b Jim talks about
converting many tests - because

	"Lots of tests didn't clean up databases and connections properly"

and because new MVCC approach

	"makes database and connection hygiene a bit more important,
	especially for tests, because a connection will continue to interact
	with storages if it isn't properly closed, which can lead to errors if
	the storage is closed."

but finally implementing automatic cleanup at transaction boundaries
because there are too many tests to fix. We backport only automatic
cleanup + necessary explicit test fixes to keep the diff minimal.

All tests pass. This includes tests for ZODB itself, ZEO and NEO tests run over the hereby modified ZODB(*), my test programs from

zopefoundation#290	and
zopefoundation/ZEO#155

and ERP5 tests. Upcoming wendelin.core 2 also works with this change.

(*) ZEO, NEO and ERP5 tests fail sometimes, but there is no regression
here because ZEO, NEO and ERP5 tests are failing regularly, and in the
same way, even with unmodified ZODB.

Appendix. zconn_at
------------------

This appendix provides motivation for the backport:

For wendelin.core v2 we need a way to know at which particular database
state application-level ZODB connection is viewing the database. Knowing
that state, WCFS client library interacts with WCFS filesystem server
and, in simple terms, requests the server to provide data as of that
particular database state. Let us call the function that for a client
ZODB connection returns database state corresponding to its database
view zconn_at.

Now here is the problem: zconn_at is relatively easy to implement for
ZODB5 - see e.g. here:

https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.13-54-ga6a8f5b/lib/zodb.py#L142-181
https://lab.nexedi.com/nexedi/wendelin.core/commit/3bd82127

however, for ZODB4, since its operational model is not directly MVCC, it is not that straightforward. Still, even for older ZODB4, for every client connection there _is_ such an "at" that corresponds to that connection's view of the database.

We need ZODB4 support, because ZODB4 is currently the version that Nexedi uses, and my understanding is that it will stay like this for quite some time. I have the feeling that ZODB5 was reworked in a better direction, but without caring enough about quality, which resulted in concurrency bugs with data corruption effects like

zopefoundation#290
zopefoundation/ZEO#155
etc.

Even though the first one is now fixed (but it broke other parts, and so both ZODB had to be fixed again _and_ NEO had to be fixed for that ZODB fix to work correctly), I feel that upgrading to ZODB5 for Nexedi will require a non-negligible amount of QA work, and thus it is better if we move step by step - even if we eventually upgrade to ZODB5, it is better that we first migrate wendelin.core 1 -> wendelin.core 2 while keeping the current version of ZODB.

Now please note what would happen if zconn_at gave an even slightly wrong answer: the wcfs client would ask the wcfs server to provide array data as of a different database state than the one of the current on-client ZODB connection. This would result in data accessed via ZBigArray _not_ corresponding to all other data accessed via the regular ZODB mechanism. It is, in other words, data corruption. In some scenarios it can be harmless, but generally it is the worst thing that can happen to a database.

It is good to keep ZODB issue290 in mind when imagining corner cases that zconn_at has to deal with. Even though that issue is ZODB5-only, it shows what kind of bugs there can be in a zconn_at implementation for ZODB4.

Just for the reference: in Wendelin-based systems there is usually a constant stream of database ingestions coming from many places simultaneously, plus many activities crunching on the DB at the same time as well. And the more clients a system handles, the higher the level of concurrency becomes. This means that the problem of correctly handling concurrency issues in zconn_at is not purely theoretical, but has a direct relation to our systems.

--------

With this backport, zconn_at for ZODB4 becomes trivial and robust to implement:

https://lab.nexedi.com/kirr/wendelin.core/blob/484071b3/lib/zodb.py#L183-195

I would like to thank Joshua Wölfel whose internship helped this topic
to shape up:

https://www.erp5.com/project_section/wendelin-ia-project/forum/Joshua-internship-D8b7NNhWfz

/cc @Nexedi, @jwolf083
Signed-off-by: Kirill Smelkov <[email protected]>
@navytux
Contributor

navytux commented Nov 30, 2020

For the reference: I've backported how ZODB.Connection implements MVCC from ZODB5 to ZODB4 and my crash test works correctly with that ZODB4 version and ZEO4:

https://lab.nexedi.com/nexedi/ZODB/merge_requests/1

So the bug indeed looks to be in ZEO5.

NexediGitlab pushed a commit to Nexedi/wendelin.core that referenced this issue Apr 1, 2021
… will depend on

The tests verify that there is no concurrency bugs around load,
Connection.open and invalidations. See e.g.

zopefoundation/ZODB#290
zopefoundation/ZEO#155

By including the tests into wendelin.core, we will have CI coverage for
all supported storages (FileStorage, ZEO, NEO), and for all supported
ZODB (currently ZODB4, ZODB4-wc2 and ZODB5).

ZEO5 is known to currently fail zloadrace.
However, even though ZODB#290 was fixed, ZEO5 turned out to also fail on zopenrace:

        def test_zodb_zopenrace():
            # exercises ZODB.Connection + particular storage implementation
    >       zopenrace.main()

    lib/tests/test_zodb.py:382:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    <decorator-gen-1>:2: in main
        ???
    ../../tools/go/pygolang/golang/__init__.py:103: in _
        return f(*argv, **kw)
    lib/tests/testprog/zopenrace.py:115: in main
        test(zstor)
    <decorator-gen-2>:2: in test
        ???
    ../../tools/go/pygolang/golang/__init__.py:103: in _
        return f(*argv, **kw)
    lib/tests/testprog/zopenrace.py:201: in test
        wg.wait()
    golang/_sync.pyx:246: in golang._sync.PyWorkGroup.wait
        ???
    golang/_sync.pyx:226: in golang._sync.PyWorkGroup.go.pyrunf
        ???
    lib/tests/testprog/zopenrace.py:165: in T1
        t1()
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

        def t1():
            transaction.begin()
            zconn = db.open()

            root = zconn.root()
            obj1 = root['obj1']
            obj2 = root['obj2']

            # obj1 - reload it from zstor
            # obj2 - get it from zconn cache
            obj1._p_invalidate()

            # both objects must have the same values
            i1 = obj1.i
            i2 = obj2.i
            if i1 != i2:
    >           raise AssertionError("T1: obj1.i (%d)  !=  obj2.i (%d)" % (i1, i2))
    E           AssertionError: T1: obj1.i (3)  !=  obj2.i (2)

    lib/tests/testprog/zopenrace.py:156: AssertionError
@navytux
Contributor

navytux commented Apr 1, 2021

Something really wrong is going on with ZEO5.

In addition to the test program provided in this issue, where there are 8 simultaneous workers, each of which opens its own storage-level connection, I've tried to run the test program from zopefoundation/ZODB#290 with a minimal edit to use ZEO instead of MappingStorage, and it fails too.

In that test program there is only one storage-level connection and two simultaneous users that are using that single storage-level connection. In other words there are no external invalidations involved at all.

https://lab.nexedi.com/kirr/wendelin.core/blob/e2dc8651/lib/tests/testprog/zopenrace.py#L102-203

Here is how it fails:

.../wendelin.core$ WENDELIN_CORE_TEST_DB="<zeo>" py.test -vs -k openrace lib/tests/test_zodb.py
============================== test session starts ==============================
platform linux2 -- Python 2.7.18, pytest-4.6.11, py-1.9.0, pluggy-0.13.1 -- /home/kirr/src/wendelin/venv/z-dev/bin/python2
cachedir: .pytest_cache
rootdir: /home/kirr/src/wendelin/wendelin.core
collected 8 items / 6 deselected / 2 selected

lib/tests/test_zodb.py::test_zodb_zopenrace_basic OK
PASSED
lib/tests/test_zodb.py::test_zodb_zopenrace FAILED

=================================== FAILURES ====================================
______________________________ test_zodb_zopenrace ______________________________

    def test_zodb_zopenrace():
        # exercises ZODB.Connection + particular storage implementation
>       zopenrace.main()

lib/tests/test_zodb.py:382:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
<decorator-gen-2>:2: in main
    ???
../../tools/go/pygolang/golang/__init__.py:103: in _
    return f(*argv, **kw)
lib/tests/testprog/zopenrace.py:115: in main
    test(zstor)
<decorator-gen-3>:2: in test
    ???
../../tools/go/pygolang/golang/__init__.py:103: in _
    return f(*argv, **kw)
lib/tests/testprog/zopenrace.py:201: in test
    wg.wait()
golang/_sync.pyx:246: in golang._sync.PyWorkGroup.wait
    ???
golang/_sync.pyx:226: in golang._sync.PyWorkGroup.go.pyrunf
    ???
lib/tests/testprog/zopenrace.py:165: in T1
    t1()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def t1():
        transaction.begin()
        zconn = db.open()

        root = zconn.root()
        obj1 = root['obj1']
        obj2 = root['obj2']

        # obj1 - reload it from zstor
        # obj2 - get it from zconn cache
        obj1._p_invalidate()

        # both objects must have the same values
        i1 = obj1.i
        i2 = obj2.i
        if i1 != i2:
>           raise AssertionError("T1: obj1.i (%d)  !=  obj2.i (%d)" % (i1, i2))
E           AssertionError: T1: obj1.i (16)  !=  obj2.i (15)

lib/tests/testprog/zopenrace.py:156: AssertionError

P.S. The suggested patch from #155 (comment) does not fully help here: even though it fixes the original problem, it does not help to fix zopefoundation/ZODB#290 with ZEO5.

P.P.S. ZEO 5.2.2-1-g3d90ed42, ZODB 5.6.0-18-g75c703fcf

@navytux
Contributor

navytux commented Apr 1, 2021

P.S. Suggested patch from #155 (comment) does not help - both issues stay unfixed.

Correction. The original problem from this issue is fixed. The problem from zopefoundation/ZODB#290 stays unfixed. Apologies for providing misleading information in the first place.

@navytux navytux changed the title Data corruption due to race condition with external invalidations Data corruption due to race condition with external invalidations (ZEO5 regression) Apr 2, 2021
@navytux
Contributor

navytux commented Apr 2, 2021

P.S. Suggested patch from #155 (comment) does not help - both issues stay unfixed.

Correction. The original problem from this issue is fixed. The problem from zopefoundation/ZODB#290 stays unfixed. Apologies for providing misleading information in the first place.

Filed #155 (comment) as #166 since this seems to be a different problem.

navytux added a commit to navytux/ZEO that referenced this issue Apr 14, 2021
Currently loadBefore and prefetch spawn async protocol.load_before task,
and, after waking up on its completion, populate the cache with received
data. But in between the time when protocol.load_before handler is run
and the time when protocol.load_before caller wakes up, there is a
window in which event loop might be running some other code, including
the code that handles invalidateTransaction messages from the server.

This means that cache updates and cache invalidations can be processed on the client not in the order that the server sent them. And such a difference in order can lead to data corruption if, e.g., the server sent

	<- loadBefore oid serial=tid1 next_serial=None
	<- invalidateTransaction tid2 oid

and client processed it as

	invalidateTransaction tid2 oid
	cache.store(oid, tid1, next_serial=None)

because here the end effect is that invalidation for oid@tid2 is not
applied to the cache.

The fix is simple: perform cache updates right after loadBefore reply is
received.

Fixes: zopefoundation#155

The fix is based on analysis and initial patch by @jmuchemb:

zopefoundation#155 (comment)

For tests, similarly to zopefoundation/ZODB#345, I wanted to include a general test for this issue in ZODB, so that all storages - not just ZEO - are exercised for this race scenario. However, in the ZODB test infrastructure there is currently no established general way to open several client storage connections to one storage server. For this reason, the test for this issue currently lives in the wendelin.core repository (and exercises both NEO and ZEO there):

https://lab.nexedi.com/nexedi/wendelin.core/commit/c37a989d

I understand there is room for improvement. For the reference, my original ZEO-specific data corruption reproducer is here:

zopefoundation#155 (comment)
https://lab.nexedi.com/kirr/wendelin.core/blob/ecd0e7f0/zloadrace5.py

/cc @d-maurer, @jamadden, @dataflake, @jimfulton
@navytux
Contributor

navytux commented Apr 14, 2021

A PR with a fix based on @jmuchemb's analysis and initial patch is here: #169.

navytux added a commit to navytux/ZEO that referenced this issue Apr 16, 2021
…neric _new_storage_client()

This allows ZODB tests to recognize ZEO as client-server storage and so
make "load vs external invalidate" test added in
zopefoundation/ZODB#345 to reproduce data
corruption problem reported at
zopefoundation#155.

For the reference: that problem should be fixed by
zopefoundation#169.

We drop

    # It's hard to find the actual address.
    # The rpc mgr addr attribute is a list.  Each element in the
    # list is a socket domain (AF_INET, AF_UNIX, etc.) and an
    # address.

note because at the time it was added (81f586c) it came with

    addr = self._storage._rpc_mgr.addr[0][1]

but nowadays, after 0386718, getting the server address is simply via ClientStorage._addr.
navytux added a commit to navytux/ZODB that referenced this issue Apr 16, 2021
For ZEO this data corruption bug was reported at
zopefoundation/ZEO#155 and fixed at
zopefoundation/ZEO#169.

Without that fix the failure shows e.g. as follows when running ZEO test
suite:

    Failure in test check_race_load_vs_external_invalidate (ZEO.tests.testZEO.BlobAdaptedFileStorageTests)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/unittest/case.py", line 329, in run
        testMethod()
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/BasicStorage.py", line 621, in check_race_load_vs_external_invalidate
        self.fail([_ for _ in failure if _])
      File "/usr/lib/python2.7/unittest/case.py", line 410, in fail
        raise self.failureException(msg)
    AssertionError: ['T1: obj1.value (7)  !=  obj2.value (8)']

Even though the added test is somewhat similar to check_race_loadopen_vs_local_invalidate, it is added anew without trying to unify code. The reason is that the probability of catching the load vs external invalidation race is significantly reduced when there is only 1 modify and 1 verify worker. Unification while preserving both tests' semantics would make the test for "load vs local invalidate" harder to follow. Sometimes a little copying is better than trying to unify too much.

For the test to work, test infrastructure is amended with
._new_storage_client() method that complements ._storage attribute:
client-server storages like ZEO, NEO and RelStorage allow several
storage clients to be connected to single storage server. For
client-server storages test subclasses should implement
_new_storage_client to return new storage client that is connected to
the same storage server self._storage is connected to.

For ZEO ._new_storage_client() is added by zopefoundation/ZEO#170

Other client-server storages can follow to implement ._new_storage_client()
and this way automatically activate this "load vs external invalidation"
test when their testsuite is run.

Contrary to test for "load vs local invalidate" N is set to lower value (100),
because with 8 workers the bug is usually reproduced at not-so-high iteration
number (5-10-20).

/cc @d-maurer, @jamadden, @jmuchemb
navytux added a commit that referenced this issue Apr 20, 2021
…neric _new_storage_client() (#170)

This allows ZODB tests to recognize ZEO as client-server storage and so
make "load vs external invalidate" test added in
zopefoundation/ZODB#345 to reproduce data
corruption problem reported at
#155.

For the reference: that problem should be fixed by
#169.

We drop

    # It's hard to find the actual address.
    # The rpc mgr addr attribute is a list.  Each element in the
    # list is a socket domain (AF_INET, AF_UNIX, etc.) and an
    # address.

note because at the time it was added (81f586c) it came with

    addr = self._storage._rpc_mgr.addr[0][1]

but nowadays, after 0386718, getting the server address is simply via ClientStorage._addr.

/reviewed-on #170
/reviewed-by @d-maurer
navytux added a commit to navytux/ZEO that referenced this issue Apr 20, 2021
Currently loadBefore and prefetch spawn async protocol.load_before task,
and, after waking up on its completion, populate the cache with received
data. But in between the time when protocol.load_before handler is run
and the time when protocol.load_before caller wakes up, there is a
window in which event loop might be running some other code, including
the code that handles invalidateTransaction messages from the server.

This means that cache updates and cache invalidations can be processed on the client not in the order that the server sent them. And such a difference in order can lead to data corruption if, e.g., the server sent

	<- loadBefore oid serial=tid1 next_serial=None
	<- invalidateTransaction tid2 oid

and client processed it as

	invalidateTransaction tid2 oid
	cache.store(oid, tid1, next_serial=None)

because here the end effect is that invalidation for oid@tid2 is not
applied to the cache.

The fix is simple: perform cache updates right after loadBefore reply is
received.

Fixes: zopefoundation#155

The fix is based on analysis and initial patch by @jmuchemb:

zopefoundation#155 (comment)

A test corresponding to the fix is coming through

zopefoundation#170  and
zopefoundation/ZODB#345

For the reference, my original ZEO-specific data corruption reproducer
is here:

zopefoundation#155 (comment)
https://lab.nexedi.com/kirr/wendelin.core/blob/ecd0e7f0/zloadrace5.py

/cc @jamadden, @dataflake, @jimfulton
/reviewed-by @jmuchemb, @d-maurer
/reviewed-on zopefoundation#169
navytux added a commit that referenced this issue Apr 20, 2021
Currently loadBefore and prefetch spawn async protocol.load_before task,
and, after waking up on its completion, populate the cache with received
data. But in between the time when protocol.load_before handler is run
and the time when protocol.load_before caller wakes up, there is a
window in which event loop might be running some other code, including
the code that handles invalidateTransaction messages from the server.

This means that cache updates and cache invalidations can be processed on the client not in the order that the server sent them. And such a difference in order can lead to data corruption if, e.g., the server sent

	<- loadBefore oid serial=tid1 next_serial=None
	<- invalidateTransaction tid2 oid

and client processed it as

	invalidateTransaction tid2 oid
	cache.store(oid, tid1, next_serial=None)

because here the end effect is that invalidation for oid@tid2 is not
applied to the cache.

The fix is simple: perform cache updates right after loadBefore reply is
received.

Fixes: #155

The fix is based on analysis and initial patch by @jmuchemb:

#155 (comment)

A test corresponding to the fix is coming through

#170  and
zopefoundation/ZODB#345

For the reference, my original ZEO-specific data corruption reproducer
is here:

#155 (comment)
https://lab.nexedi.com/kirr/wendelin.core/blob/ecd0e7f0/zloadrace5.py

/cc @jamadden, @dataflake, @jimfulton
/reviewed-by @jmuchemb, @d-maurer
/reviewed-on #169
navytux added a commit to navytux/ZODB that referenced this issue Apr 21, 2021
For ZEO this data corruption bug was reported at
zopefoundation/ZEO#155 and fixed at
zopefoundation/ZEO#169.

Without that fix the failure shows e.g. as follows when running ZEO test
suite:

    Failure in test check_race_load_vs_external_invalidate (ZEO.tests.testZEO.BlobAdaptedFileStorageTests)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/unittest/case.py", line 329, in run
        testMethod()
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/BasicStorage.py", line 621, in check_race_load_vs_external_invalidate
        self.fail([_ for _ in failure if _])
      File "/usr/lib/python2.7/unittest/case.py", line 410, in fail
        raise self.failureException(msg)
    AssertionError: ['T1: obj1.value (7)  !=  obj2.value (8)']

Even though the added test is somewhat similar to check_race_loadopen_vs_local_invalidate, it is added anew without trying to unify code. The reason is that the probability of catching the load vs external invalidation race is significantly reduced when there is only 1 modify and 1 verify worker. Unification while preserving both tests' semantics would make the test for "load vs local invalidate" harder to follow. Sometimes a little copying is better than trying to unify too much.

For the test to work, test infrastructure is amended with
._new_storage_client() method that complements ._storage attribute:
client-server storages like ZEO, NEO and RelStorage allow several
storage clients to be connected to single storage server. For
client-server storages test subclasses should implement
_new_storage_client to return new storage client that is connected to
the same storage server self._storage is connected to.

For ZEO ._new_storage_client() is added by zopefoundation/ZEO#170

Other client-server storages can follow to implement ._new_storage_client()
and this way automatically activate this "load vs external invalidation"
test when their testsuite is run.

Contrary to test for "load vs local invalidate" N is set to lower value (100),
because with 8 workers the bug is usually reproduced at not-so-high iteration
number (5-10-20).

/cc @d-maurer, @jamadden, @jmuchemb
/reviewed-on zopefoundation#345
navytux added a commit to zopefoundation/ZODB that referenced this issue Apr 21, 2021
For ZEO this data corruption bug was reported at
zopefoundation/ZEO#155 and fixed at
zopefoundation/ZEO#169.

Without that fix the failure shows e.g. as follows when running ZEO test
suite:

    Failure in test check_race_load_vs_external_invalidate (ZEO.tests.testZEO.BlobAdaptedFileStorageTests)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/unittest/case.py", line 329, in run
        testMethod()
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/BasicStorage.py", line 621, in check_race_load_vs_external_invalidate
        self.fail([_ for _ in failure if _])
      File "/usr/lib/python2.7/unittest/case.py", line 410, in fail
        raise self.failureException(msg)
    AssertionError: ['T1: obj1.value (7)  !=  obj2.value (8)']

Even though the added test is somewhat similar to check_race_loadopen_vs_local_invalidate, it is added anew without trying to unify code. The reason is that the probability of catching the load vs external invalidation race is significantly reduced when there is only 1 modify and 1 verify worker. Unification while preserving both tests' semantics would make the test for "load vs local invalidate" harder to follow. Sometimes a little copying is better than trying to unify too much.

For the test to work, test infrastructure is amended with
._new_storage_client() method that complements ._storage attribute:
client-server storages like ZEO, NEO and RelStorage allow several
storage clients to be connected to single storage server. For
client-server storages test subclasses should implement
_new_storage_client to return new storage client that is connected to
the same storage server self._storage is connected to.

For ZEO ._new_storage_client() is added by zopefoundation/ZEO#170

Other client-server storages can follow to implement ._new_storage_client()
and this way automatically activate this "load vs external invalidation"
test when their testsuite is run.

Contrary to test for "load vs local invalidate" N is set to lower value (100),
because with 8 workers the bug is usually reproduced at not-so-high iteration
number (5-10-20).

/cc @d-maurer, @jamadden, @jmuchemb
/reviewed-on #345
@navytux navytux mentioned this issue Aug 3, 2022