Provide a way to retrieve raw extensions data #183

navytux · 2017-11-02T16:19:48Z

Currently when client has IStorageTransactionMetaData instance (e.g. on
storage iteration) it is possible to read transaction's .user and
.description in raw form, but .extension is always returned unpickled.

This creates several problems:

tools like zodb dump [1] cannot dump data exactly as stored on a
storage. This makes database potentially not bit-to-bit identical to
its original after restoring from such dump.
zodb dump output could be changing from run to run on the same
database. This comes from the fact that e.g. python dictionaries are
unordered and so when pickling a dict back to bytes the result could
be not the same as original.

( this problem can be worked-around partly to work reliably for e.g.
dict with str keys - by always emitting items in key sorted order,
but it is hard to make it work reliably for arbitrary types )

Both issues make it hard to verify integrity of database at the lowest
possible level after restoration, and make it hard to verify bit-to-bit
compatibility with non-python ZODB implementations.

To fix we provide a way to retrieve raw extension from transaction
metadata. This is done in backward-compatible way by introducing new
interface

IStorageTransactionInformationRaw

which can be optionally provided by an IStorageTransactionInformation to
indicate that transaction metadata can be also read in raw from.

Then BaseStorage.TransactionRecord ctor is extended with optional
raw_extension argument which if != None is remembered and makes
IStorageTransactionInformationRaw to be also provided by constructed
TransactionRecord instance.

We also adapt FileStorage & friends to provide/use new functionality.

Similarly to e.g. undo, storages should be indicating they will return
iterated transactions provided with raw metadata via implementing

supportsTransactionInformationRaw() -> True

[1] https://lab.nexedi.com/nexedi/zodbtools

See zopefoundation/ZODB#183 for context.

navytux · 2017-11-02T16:26:41Z

Related ZEO change: zopefoundation/ZEO#100

navytux · 2017-11-02T17:29:33Z

https://lab.nexedi.com/nexedi/zodbtools/commit/2634c956 shows how new functionality is used in zodb dump.

Please see https://lab.nexedi.com/nexedi/zodbtools/commit/75c03368#04505d86ceff8e0a7a413120968ef91d0842bc29_67_114 about all the hops one needs to jump without this support and still it cannot work fully reliably to dump extension in raw form as stored on disk.

navytux · 2017-11-17T19:44:15Z

Ping anyone? @jimfulton ?

navytux · 2018-01-11T16:57:34Z

@jimfulton, I understand everyone is busy, but it is pity there is no feedback on this at all for 2 months.

jimfulton · 2018-01-13T18:33:39Z

Sorry for the slow response. Thanks for squeaking. :)

I imagine that part of the lack of response is that this is an area where we (I) got things wrong fixing it is fraught with subtle opportunities to fail, as I discovered a year or two ago when I tried to clean up transaction metadata.

jimfulton · 2018-01-13T20:05:32Z

I appreciate why you want this. The solution seems overly complex to me. It adds a lot of code for a subtle benefit.

I'd like to see this implemented differently. I'd also like @jamadden's thoughts.

Here's an alternative approach.

Change ZODB.Connection.TransactionMetaData to save both extension and a new extension_bytes attribute. Currently, if given bytes, it creates a dict. Change this:

save as both extension and extension_bytes, deserializing or serializing as needed, depending on what's passed in.
Change callers that now deserialize before creating transaction records to stop doing that, at least for cases in ZODB. This pre-deserialization hasn't been needed for over a year. So for anything that iterates over raw data, the raw data will be preserved.

I would also avoid creating a new interface for now. For now, callers that want extsnion_bytes can do duck typing ro deal with implementations that don't have this. We can even add a note to the current interface. Pragmatically, you can address your needs by implementing this in ZODB proper and NEO.

I think my suggestion addresses your need and leads to a small net reduction in code rather than a significant increase.

Thoughts?

navytux · 2018-01-14T15:42:15Z

@jimfulton thanks for feedback. If I understood you right in essence what you propose is that we extend IStorageTransactionMetaData with optional extension_bytes attribute. If that could be done it would be indeed simpler.

However when I was working on the topic, if I recall correctly, I thought changing IStorageTransactionMetaData would be not good fit because:

if we just add new non-optional attribute it will change the interface thus rendering its all older implementers now non-complying to it thus breaking backward compatibility,
so far (to my knowledge) there is no "optional attribute" precedent in ZODB,
and in terms of interfaces speak optionally providing something extra is optionally implementing some extra interface.

Though practically speaking on my side I would be ok if IStorageTransactionMetaData is extended with such optional extension_bytes attribute with documentation explicitly saying it might be missing. If so the patch could be as follows (https://lab.nexedi.com/kirr/ZODB/commit/d5d43ba0):

diff --git a/src/ZODB/Connection.py b/src/ZODB/Connection.py
index 9468a3b6f..b651f664e 100644
--- a/src/ZODB/Connection.py
+++ b/src/ZODB/Connection.py
@@ -1312,6 +1312,12 @@ def __init__(self, user=u'', description=u'', extension=b''):
             description = description.encode('utf-8')
         self.description = description
 
+        # Provide .extension_bytes if we are given extension in its raw form.
+        # If not - leave created TransactionMetaData instance without
+        # .extension_bytes attribute at all.
+        if isinstance(extension, bytes):
+            self.extension_bytes = extension
+
         if not isinstance(extension, dict):
             extension = _compat.loads(extension) if extension else {}
         self.extension = extension
diff --git a/src/ZODB/FileStorage/FileStorage.py b/src/ZODB/FileStorage/FileStorage.py
index af3857ea5..e86e6ea20 100644
--- a/src/ZODB/FileStorage/FileStorage.py
+++ b/src/ZODB/FileStorage/FileStorage.py
@@ -1988,15 +1988,8 @@ def __next__(self):
 
             if h.status != "u":
                 pos = tpos + h.headerlen()
-                e = {}
-                if h.elen:
-                    try:
-                        e = loads(h.ext)
-                    except:
-                        pass
-
                 result = TransactionRecord(h.tid, h.status, h.user, h.descr,
-                                           e, pos, tend, self._file, tpos)
+                                           h.ext, pos, tend, self._file, tpos)
 
             # Read the (intentionally redundant) transaction length
             self._file.seek(tend)
diff --git a/src/ZODB/fsrecover.py b/src/ZODB/fsrecover.py
index 7bdc47fca..81359a5d6 100644
--- a/src/ZODB/fsrecover.py
+++ b/src/ZODB/fsrecover.py
@@ -143,12 +143,14 @@ def read_txn_header(f, pos, file_size, outp, ltid):
     pos = tpos+(23+ul+dl+el)
     user = f.read(ul)
     description = f.read(dl)
-    if el:
-        try: e = loads(f.read(el))
-        except: e = {}
-    else: e = {}
+    ext = f.read(el)
+    # reset extension to {} if it cannot be unpickled
+    try:
+        loads(ext)
+    except:
+        ext = b''
 
-    result = TransactionRecord(tid, status, user, description, e, pos, tend,
+    result = TransactionRecord(tid, status, user, description, ext, pos, tend,
                                f, tpos)
     pos = tend
 
diff --git a/src/ZODB/interfaces.py b/src/ZODB/interfaces.py
index 113e71824..fb2c785c3 100644
--- a/src/ZODB/interfaces.py
+++ b/src/ZODB/interfaces.py
@@ -546,6 +546,12 @@ class IStorageTransactionMetaData(Interface):
     extension = Attribute(
         "A dictionary carrying a transaction's extended_info data")
 
+    extension_bytes = Attribute(
+        """Transaction's extension data as raw bytes.
+
+        This attribute is optional and is not always present.
+        """)
+
 
     def set_data(ob, data):
         """Hold data on behalf of an object
diff --git a/src/ZODB/tests/IteratorStorage.py b/src/ZODB/tests/IteratorStorage.py
index 48fcbc9ee..de814ca24 100644
--- a/src/ZODB/tests/IteratorStorage.py
+++ b/src/ZODB/tests/IteratorStorage.py
@@ -22,6 +22,7 @@
 from ZODB.tests.MinPO import MinPO
 from ZODB.tests.StorageTestBase import zodb_pickle, zodb_unpickle
 from ZODB.utils import U64, p64, load_current
+from ZODB._compat import loads
 
 import ZODB.blob
 
@@ -49,6 +50,10 @@ def iter_verify(self, txniter, revids, val0):
 
 class IteratorStorage(IteratorCompare):
 
+    # override this in tests for storages that have support for
+    # .extension_bytes in iterated transactions.
+    supportsExtensionBytes = False
+
     def checkSimpleIteration(self):
         # Store a bunch of revisions of a single object
         self._oid = oid = self._storage.new_oid()
@@ -86,11 +91,20 @@ def checkUndoZombie(self):
 
     def checkTransactionExtensionFromIterator(self):
         oid = self._storage.new_oid()
-        revid = self._dostore(oid, data=MinPO(1))
+        ext_dict = {"hello": "world"}
+        revid = self._dostore(oid, data=MinPO(1), extension=ext_dict)
         iter = self._storage.iterator()
         count = 0
         for txn in iter:
-            self.assertEqual(txn.extension, {})
+            self.assertEqual(txn.extension, ext_dict)
+            # missing: not None - something unique - not to allow
+            # implementations to erroneously return None in .extension_bytes .
+            missing = object()
+            ext_bytes = getattr(txn, "extension_bytes", missing)
+            self.assertEqual(self.supportsExtensionBytes, ext_bytes is not missing)
+            if ext_bytes is not missing:
+                ext_dict_fromraw = loads(ext_bytes)
+                self.assertEqual(ext_dict_fromraw, ext_dict)
             count +=1
         self.assertEqual(count, 1)
 
@@ -215,6 +229,9 @@ def compare(self, storage1, storage2):
             eq(txn1.user,        txn2.user)
             eq(txn1.description, txn2.description)
             eq(txn1.extension,  txn2.extension)
+            rawext1 = getattr(txn1, "extension_bytes", None)
+            rawext2 = getattr(txn2, "extension_bytes", None)
+            eq(rawext1, rawext2)
             itxn1 = iter(txn1)
             itxn2 = iter(txn2)
             for rec1, rec2 in zip(itxn1, itxn2):
diff --git a/src/ZODB/tests/StorageTestBase.py b/src/ZODB/tests/StorageTestBase.py
index 1488e2330..275404a94 100644
--- a/src/ZODB/tests/StorageTestBase.py
+++ b/src/ZODB/tests/StorageTestBase.py
@@ -124,7 +124,7 @@ def tearDown(self):
         ZODB.tests.util.TestCase.tearDown(self)
 
     def _dostore(self, oid=None, revid=None, data=None,
-                 already_pickled=0, user=None, description=None):
+                 already_pickled=0, user=None, description=None, extension=None):
         """Do a complete storage transaction.  The defaults are:
 
          - oid=None, ask the storage for a new oid
@@ -149,6 +149,8 @@ def _dostore(self, oid=None, revid=None, data=None,
             t.user = user
         if description is not None:
             t.description = description
+        if extension is not None:
+            t.extension = extension
         try:
             self._storage.tpc_begin(t)
             # Store an object
diff --git a/src/ZODB/tests/testFileStorage.py b/src/ZODB/tests/testFileStorage.py
index 273304817..7f3bc9933 100644
--- a/src/ZODB/tests/testFileStorage.py
+++ b/src/ZODB/tests/testFileStorage.py
@@ -56,6 +56,8 @@ class FileStorageTests(
     ReadOnlyStorage.ReadOnlyStorage
     ):
 
+    supportsExtensionBytes = True
+
     def open(self, **kwargs):
         self._storage = ZODB.FileStorage.FileStorage('FileStorageTests.fs',
                                                      **kwargs)
@@ -77,7 +79,14 @@ def checkLongMetadata(self):
         except POSException.StorageError:
             pass
         else:
-            self.fail("expect long user field to raise error")
+            self.fail("expect long description field to raise error")
+        try:
+            self._dostore(extension={s: 1})
+        except POSException.StorageError:
+            pass
+        else:
+            self.fail("expect long extension field to raise error")
+
 
     def check_use_fsIndex(self):
 
-- 
2.15.1.716.g6967081ed5

Would that be ok?

Thanks,
Kirill

jimfulton · 2018-01-14T16:40:42Z

On Sun, Jan 14, 2018 at 8:42 AM, Kirill Smelkov ***@***.***> wrote: @jimfulton <https://github.com/jimfulton> thanks for feedback. If I understood you right in essence what you propose is that we extend IStorageTransactionMetaData with optional extension_bytes attribute.

No. I suggested maybe adding a note to that effect.

If that could be done it would be indeed simpler.

I suggested using duck typing instead.

However when I was working on the topic, if I recall correctly, I thought changing IStorageTransactionMetaData would be not good fit because: - if we just add new non-optional attribute it will change the interface thus rendering its all older implementers now non-complying to it thus breaking backward compatibility, - so far (to my knowledge) there is no "optional attribute" precedent in ZODB, - and in terms of interfaces speak optionally providing something extra is optionally implementing some extra interface. Though practically speaking on my side I would be ok if IStorageTransactionMetaData is extended with such optional extension_bytes attribute with documentation explicitly saying it might be missing. If so the patch could be as follows (https://lab.nexedi.com/kirr/ ZODB/commit/d528f451): diff --git a/src/ZODB/Connection.py b/src/ZODB/Connection.py index 9468a3b6f..b651f664e 100644--- a/src/ZODB/Connection.py+++ b/src/ZODB/Connection.py@@ -1312,6 +1312,12 @@ def __init__(self, user=u'', description=u'', extension=b''): description = description.encode('utf-8') self.description = description + # Provide .extension_bytes if we are given extension in its raw form.+ # If not - leave created TransactionMetaData instance without+ # .extension_bytes attribute at all.+ if isinstance(extension, bytes):+ self.extension_bytes = extension+

I would go further. If the input is bytes, save it, and save the deserialization, otherwise save the serialization and the dict value. IOW, save both extension_bytes and extension, serializing or deserializing as necessary. The choice of the name "extension_bytes" rather than "raw_extension" was conscious.

if not isinstance(extension, dict): extension = _compat.loads(extension) if extension else {} self.extension = extensiondiff --git a/src/ZODB/FileStorage/FileStorage.py b/src/ZODB/FileStorage/FileStorage.py index af3857ea5..e86e6ea20 100644--- a/src/ZODB/FileStorage/FileStorage.py+++ b/src/ZODB/FileStorage/FileStorage.py@@ -1988,15 +1988,8 @@ def __next__(self): if h.status != "u": pos = tpos + h.headerlen()- e = {}- if h.elen:- try:- e = loads(h.ext)- except:- pass- result = TransactionRecord(h.tid, h.status, h.user, h.descr,- e, pos, tend, self._file, tpos)+ h.ext, pos, tend, self._file, tpos)

Deleting code is awesome. :)

# Read the (intentionally redundant) transaction length self._file.seek(tend)diff --git a/src/ZODB/fsrecover.py b/src/ZODB/fsrecover.py index 7bdc47fca..81359a5d6 100644--- a/src/ZODB/fsrecover.py+++ b/src/ZODB/fsrecover.py@@ -143,12 +143,14 @@ def read_txn_header(f, pos, file_size, outp, ltid): pos = tpos+(23+ul+dl+el) user = f.read(ul) description = f.read(dl)- if el:- try: e = loads(f.read(el))- except: e = {}- else: e = {}

The original code is flawed. IDK (IDR) why we have a bare except. To any tests fail wo it?

+ ext = f.read(el)+ # reset extension to {} if it cannot be unpickled+ try:+ loads(ext)+ except:+ ext = b'' The meta-data object will unpickle when it gets bytes.

More importantly, I want the unpickling to not be done in a try/except. (Much less a bare except.) (And if there's some weird edge case that try/except addressed, let's re-confront it. :-))

- result = TransactionRecord(tid, status, user, description, e, pos, tend,+ result = TransactionRecord(tid, status, user, description, ext, pos, tend, f, tpos) pos = tend diff --git a/src/ZODB/interfaces.py b/src/ZODB/interfaces.py index 113e71824..fb2c785c3 100644--- a/src/ZODB/interfaces.py+++ b/src/ZODB/interfaces.py@@ -546,6 +546,12 @@ class IStorageTransactionMetaData(Interface): extension = Attribute( "A dictionary carrying a transaction's extended_info data") + extension_bytes = Attribute(+ """Transaction's extension data as raw bytes.++ This attribute is optional and is not always present.+ """)+

No. I'd avoid this for now.

def set_data(ob, data): """Hold data on behalf of an objectdiff --git a/src/ZODB/tests/IteratorStorage.py b/src/ZODB/tests/IteratorStorage.py index 48fcbc9ee..de814ca24 100644--- a/src/ZODB/tests/IteratorStorage.py+++ b/src/ZODB/tests/IteratorStorage.py@@ -22,6 +22,7 @@ from ZODB.tests.MinPO import MinPO from ZODB.tests.StorageTestBase import zodb_pickle, zodb_unpickle from ZODB.utils import U64, p64, load_current+from ZODB._compat import loads import ZODB.blob @@ -49,6 +50,10 @@ def iter_verify(self, txniter, revids, val0): class IteratorStorage(IteratorCompare): + # override this in tests for storages that have support for+ # .extension_bytes in iterated transactions.+ supportsExtensionBytes = False+

Nope.

def checkSimpleIteration(self): # Store a bunch of revisions of a single object self._oid = oid = self._storage.new_oid()@@ -86,11 +91,20 @@ def checkUndoZombie(self): def checkTransactionExtensionFromIterator(self): oid = self._storage.new_oid()- revid = self._dostore(oid, data=MinPO(1))+ ext_dict = {"hello": "world"}+ revid = self._dostore(oid, data=MinPO(1), extension=ext_dict) iter = self._storage.iterator() count = 0 for txn in iter:- self.assertEqual(txn.extension, {})+ self.assertEqual(txn.extension, ext_dict)+ # missing: not None - something unique - not to allow+ # implementations to erroneously return None in .extension_bytes .+ missing = object()+ ext_bytes = getattr(txn, "extension_bytes", missing)+ self.assertEqual(self.supportsExtensionBytes, ext_bytes is not missing)+ if ext_bytes is not missing:+ ext_dict_fromraw = loads(ext_bytes)+ self.assertEqual(ext_dict_fromraw, ext_dict)

Drop the supportsExtensionBytes. If code wants to work with code that might not support extension_bytes, it can try/except on the attribute access, at least for now. BTW, what's up with the camelCase? :)

count +=1 self.assertEqual(count, 1) diff --git a/src/ZODB/tests/StorageTestBase.py b/src/ZODB/tests/StorageTestBase.py index 1488e2330..275404a94 100644--- a/src/ZODB/tests/StorageTestBase.py+++ b/src/ZODB/tests/StorageTestBase.py@@ -124,7 +124,7 @@ def tearDown(self): ZODB.tests.util.TestCase.tearDown(self) def _dostore(self, oid=None, revid=None, data=None,- already_pickled=0, user=None, description=None):+ already_pickled=0, user=None, description=None, extension=None): """Do a complete storage transaction. The defaults are: - oid=None, ask the storage for a new oid@@ -149,6 +149,8 @@ def _dostore(self, oid=None, revid=None, data=None, t.user = user if description is not None: t.description = description+ if extension is not None:+ t.extension = extension try: self._storage.tpc_begin(t) # Store an objectdiff --git a/src/ZODB/tests/testFileStorage.py b/src/ZODB/tests/testFileStorage.py index 273304817..7f3bc9933 100644--- a/src/ZODB/tests/testFileStorage.py+++ b/src/ZODB/tests/testFileStorage.py@@ -56,6 +56,8 @@ class FileStorageTests( ReadOnlyStorage.ReadOnlyStorage ): + supportsExtensionBytes = True+ No def open(self, **kwargs): self._storage = ZODB.FileStorage.FileStorage('FileStorageTests.fs', **kwargs)@@ -77,7 +79,14 @@ def checkLongMetadata(self): except POSException.StorageError: pass else:- self.fail("expect long user field to raise error")+ self.fail("expect long description field to raise error")+ try:+ self._dostore(extension={s: 1})+ except POSException.StorageError:+ pass+ else:+ self.fail("expect long extension field to raise error")+

What is this testing? FTR. I hate reading diffs? You went through the trouble to create changes. Why not just make a new PR?

navytux · 2018-01-14T16:50:20Z

FTR. I hate reading diffs? You went through the trouble to create changes.
Why not just make a new PR?

Well, some of as are, yes, a bit old style. I've updated the diff in this PR and will comment on you feedback in the diff context.

navytux · 2018-01-14T17:44:35Z

@jimfulton thanks again for feedback. I went through your comments and put my replies there in the diff context.

I strongly suggest not to use only duck-typing and specify interface for the functionality first. Imho duck-typing, while convenient for quick-hacks, hurts the codebase in the longer term. And especially when in ZODB we have ZODB/interfaces.py as the specification it would be a pity to workaround it.

navytux · 2018-01-14T17:08:18Z

src/ZODB/fsrecover.py

-    if el:
-        try: e = loads(f.read(el))
-        except: e = {}
-    else: e = {}


@jimfulton says:

The original code is flawed. IDK (IDR) why we have a bare except.

To any tests fail wo it?

I don't know whether any test fails without it, but since it is FS recover to me it looks logical that if e.g. extension is corrupt, it is better to drop it, but still try to recover rest of transaction metadata.

In any way I would preserve current semantic within this patch and try to do one step at a time but well.

BTW this code is from you from 2001:

2d0d8c08#diff-b18ad2268d655f3885a3b21941d8f8f9R209

navytux · 2018-01-14T17:12:18Z

src/ZODB/interfaces.py

+
+        This attribute is optional and is not always present.
+        """)
+


@jimfulton says:

No. I'd avoid this for now.

Why? If we don't expose the functionality via interface it will be very obscure about what is going on when tools start to depend on internal behaviour of some internal implementation.

What is wrong via explicity specifying the interface?

navytux · 2018-01-14T17:17:34Z

src/ZODB/tests/IteratorStorage.py

+            self.assertEqual(self.supportsExtensionBytes, ext_bytes is not missing)
+            if ext_bytes is not missing:
+                ext_dict_fromraw = loads(ext_bytes)
+                self.assertEqual(ext_dict_fromraw, ext_dict)


@jimfulton says:

Drop the supportsExtensionBytes.

If code wants to work with code that might not support extension_bytes, it
can try/except on the attribute access, at least for now.

This is test code which tests that particular storage provides .extension_bytes as expected. What is expected is set via .supportsExtensionBytes and it is not universally there - i.e. FileStorage provides it while MappingStorage - does not.

BTW, what's up with the camelCase? :)

Maybe my intertia to name it similar to supportsUndo() function.
I'm ok to redo if you insist it should be supports_extension_bytes.

navytux · 2018-01-14T17:21:42Z

src/ZODB/tests/testFileStorage.py

+            pass
+        else:
+            self.fail("expect long extension field to raise error")
+


@jimfulton says:

What is this testing?

There is already-existing FileStorage test which checks if a commit with long metadata fails:

https://github.com/zopefoundation/ZODB/blob/de1f24ca/src/ZODB/tests/testFileStorage.py#L67

Since the commits in FileStorage tests are performed via _dostore() and my patch extends _dostore() to also accept extension, I natrally extend this test as well. (I grepped all places where _dostore was used and found it this way).

navytux · 2018-01-14T17:39:50Z

src/ZODB/Connection.py

+        # .extension_bytes attribute at all.
+        if isinstance(extension, bytes):
+            self.extension_bytes = extension
+


@jimfulton says:

I would go further. If the input is bytes, save it, and save the
deserialization, otherwise save the serialization and the dict value.

IOW, save both extension_bytes and extension, serializing or deserializing
as necessary. The choice of the name "extension_bytes" rather than
"raw_extension" was conscious.

While this seem to be possible and maybe I'm wrong, my feeling is that this brings too much magic and we'll be bitten by it later.

My preference would be we get the interface for this functionality right at first and only then improve implementation step by step.

Currently when client has IStorageTransactionMetaData instance (e.g. on storage iteration) it is possible to read transaction's .user and .description in raw form, but .extension is always returned unpickled. This creates several problems: - tools like `zodb dump` [1] cannot dump data exactly as stored on a storage. This makes database potentially not bit-to-bit identical to its original after restoring from such dump. - `zodb dump` output could be changing from run to run on the same database. This comes from the fact that e.g. python dictionaries are unordered and so when pickling a dict back to bytes the result could be not the same as original. ( this problem can be worked-around partly to work reliably for e.g. dict with str keys - by always emitting items in key sorted order, but it is hard to make it work reliably for arbitrary types ) Both issues make it hard to verify integrity of database at the lowest possible level after restoration, and make it hard to verify bit-to-bit compatibility with non-python ZODB implementations. To fix we provide a way to retrieve raw extension from transaction metadata. This is done in backward-compatible way by introducing new IStorageTransactionMetaDataRaw interface, which may be optionally provided by an IStorageTransactionMetaData instance, and if so there will be additional .extension_bytes attribute. Then ZODB.Connection.TransactionMetaData (the only in-tree IStorageTransactionMetaData implementer) constructor is extended to see whether it was given extension in raw form, and if yes, remember it as .extension_bytes in addition to deserialized .extension dict. This way storages that now pass raw extension to TransactionMetaData will automatically provide .extension_bytes in their iterated transactions. Teach FileStorage & friends to provide/use new functionality. /helped-by: Jim Fulton. [1] https://lab.nexedi.com/nexedi/zodbtools

navytux · 2018-04-05T18:07:08Z

@jimfulton, @jamadden I've updated the patch and personally I think it should be now all OK to go:

there is no more "optional .extension_bytes attribute" in IStorageTransactionMetaData -> the optional functionality that an instance of IStorageTransactionMetaData might also provide is now represented by IStorageTransactionMetaDataRaw.
this way all the interfaces are now clearly defined without anything not covered by zope.interfaces.
I've also corrected some minor style issues caught last time by Jim's review.

Once again, the patch is now not complex like it was in the beginning, and the access interface is now all well defined. I understand everyone is busy and getting this "raw" thing could be "that this is an area where we (I) got things wrong fixing it is fraught with subtle opportunities to fail" (#183 (comment)).

However let's please get this resolved.

It's been already 5 months while this patch is here and going through circles creates only frustrations and things like that. ZODB ecosystem is already not big, like it used to be some time ago, and at this point fragmenting it even more, I think, would be not good for anyone.

On my side I care first about getting interfaces well defined and only as a second step about implementation.

If needed I'm ready to open new PR or do whatever it makes sense for people to make reviewing easier.

Thanks beforehand,
Kirill

The patch in this PR is already updated with latest version. Full interdiff to previous patch revision follows:

diff --git a/src/ZODB/Connection.py b/src/ZODB/Connection.py
index b651f664e..d4df30bea 100644
--- a/src/ZODB/Connection.py
+++ b/src/ZODB/Connection.py
@@ -32,7 +32,7 @@
 from transaction.interfaces import ISavepointDataManager
 from transaction.interfaces import IDataManagerSavepoint
 from transaction.interfaces import ISynchronizer
-from zope.interface import implementer
+from zope.interface import implementer, alsoProvides
 
 import transaction
 
@@ -1317,6 +1317,7 @@ def __init__(self, user=u'', description=u'', extension=b''):
         # .extension_bytes attribute at all.
         if isinstance(extension, bytes):
             self.extension_bytes = extension
+            alsoProvides(self, ZODB.interfaces.IStorageTransactionMetaDataRaw)
 
         if not isinstance(extension, dict):
             extension = _compat.loads(extension) if extension else {}
diff --git a/src/ZODB/interfaces.py b/src/ZODB/interfaces.py
index fb2c785c3..e396a2c58 100644
--- a/src/ZODB/interfaces.py
+++ b/src/ZODB/interfaces.py
@@ -540,18 +540,15 @@ class IStorageTransactionMetaData(Interface):
 
     Note that unlike transaction.interfaces.ITransaction, the ``user``
     and ``description`` attributes are bytes, not text.
+
+    An instance of IStorageTransactionMetaData may also optionally provide
+    IStorageTransactionMetaDataRaw with raw form of transaction meta data.
     """
     user = Attribute("Bytes transaction user")
     description = Attribute("Bytes transaction Description")
     extension = Attribute(
         "A dictionary carrying a transaction's extended_info data")
 
-    extension_bytes = Attribute(
-        """Transaction's extension data as raw bytes.
-
-        This attribute is optional and is not always present.
-        """)
-
 
     def set_data(ob, data):
         """Hold data on behalf of an object
@@ -573,6 +570,21 @@ def data(ob):
         See set_data.
         """
 
+class IStorageTransactionMetaDataRaw(IStorageTransactionMetaData):
+    """Provide storage transaction meta data in raw form.
+
+    IStorageTransactionMetaDataRaw is additional interface which may
+    be optionally provided by an instance of IStorageTransactionMetaData.
+    If so, it provides information about raw form of transaction's meta data as
+    saved on storage.
+
+    NOTE as IStorageTransactionMetaData already provides ``user`` and
+    ``description`` as raw bytes, not text, IStorageTransactionMetaDataRaw
+    amends it only for ``extension`` part.
+    """
+
+    extension_bytes = Attribute("Transaction's extension data as raw bytes.")
+
 
 class IStorage(Interface):
     """A storage is responsible for storing and retrieving data of objects.
diff --git a/src/ZODB/tests/IteratorStorage.py b/src/ZODB/tests/IteratorStorage.py
index d87623f72..8a4e82b9b 100644
--- a/src/ZODB/tests/IteratorStorage.py
+++ b/src/ZODB/tests/IteratorStorage.py
@@ -22,6 +22,7 @@
 from ZODB.tests.MinPO import MinPO
 from ZODB.tests.StorageTestBase import zodb_pickle, zodb_unpickle
 from ZODB.utils import U64, p64, load_current
+from ZODB.interfaces import IStorageTransactionMetaDataRaw
 from ZODB._compat import loads
 
 import ZODB.blob
@@ -52,7 +53,7 @@ class IteratorStorage(IteratorCompare):
 
     # override this in tests for storages that have support for
     # .extension_bytes in iterated transactions.
-    supportsExtensionBytes = False
+    supports_extension_bytes = False
 
     def checkSimpleIteration(self):
         # Store a bunch of revisions of a single object
@@ -101,7 +102,9 @@ def checkTransactionExtensionFromIterator(self):
             # implementations to erroneously return None in .extension_bytes .
             missing = object()
             ext_bytes = getattr(txn, "extension_bytes", missing)
-            self.assertEqual(self.supportsExtensionBytes, ext_bytes is not missing)
+            self.assertEqual(self.supports_extension_bytes, ext_bytes is not missing)
+            self.assertEqual(self.supports_extension_bytes,
+                    IStorageTransactionMetaDataRaw.providedBy(txn))
             if ext_bytes is not missing:
                 ext_dict_fromraw = loads(ext_bytes)
                 self.assertEqual(ext_dict_fromraw, ext_dict)
@@ -221,6 +224,7 @@ class IteratorDeepCompare(object):
 
     def compare(self, storage1, storage2):
         eq = self.assertEqual
+        missing = object()
         iter1 = storage1.iterator()
         iter2 = storage2.iterator()
         for txn1, txn2 in zip(iter1, iter2):
@@ -229,8 +233,8 @@ def compare(self, storage1, storage2):
             eq(txn1.user,        txn2.user)
             eq(txn1.description, txn2.description)
             eq(txn1.extension,  txn2.extension)
-            rawext1 = getattr(txn1, "extension_bytes", None)
-            rawext2 = getattr(txn2, "extension_bytes", None)
+            rawext1 = getattr(txn1, "extension_bytes", missing)
+            rawext2 = getattr(txn2, "extension_bytes", missing)
             eq(rawext1, rawext2)
             itxn1 = iter(txn1)
             itxn2 = iter(txn2)
diff --git a/src/ZODB/tests/testFileStorage.py b/src/ZODB/tests/testFileStorage.py
index 7f3bc9933..90493fa16 100644
--- a/src/ZODB/tests/testFileStorage.py
+++ b/src/ZODB/tests/testFileStorage.py
@@ -56,7 +56,7 @@ class FileStorageTests(
     ReadOnlyStorage.ReadOnlyStorage
     ):
 
-    supportsExtensionBytes = True
+    supports_extension_bytes = True
 
     def open(self, **kwargs):
         self._storage = ZODB.FileStorage.FileStorage('FileStorageTests.fs',

Since zodbdump start (c0a6299 "zodbdump - Tool to dump content of a ZODB database (draft)") and up till now zodbdump output format was not good. For example user and description transaction properties were output without proper quoting, which in situation when there would be fancy characters in there would break the output. So start the format stabilization: - user and description are output as quoted, so now they are guaranteed to be on one line. The quoting character is always " (instead of e.g. smartly quoting either by ' or " as python does) for easier compatibility with ZODB implementations in other languages. - transaction extension is now printed as raw bytes, not as dict. The idea here is that `zodb dump` * should perform dump of raw data as stored inside ZODB so that later `zodb restore` could restore the database identically to the same state. * we should dump raw data instead of unpickled ones because generally on-disk extension's format can be any raw bytes and this information should be preserved. - transaction status is now also output as quoted to preserve line breakage on fancy status codes. - it is documented that sha1 is not the only allowed hash function that might be used. - in hashonly mode we add trailing " -" to obj string so that it is possible to distinguish outputs of `zodb dump` and `zodb dump -hashonly` without knowing a-priory the way it was produced. The reason to do so is that it would be not good to e.g. by accident feed hashonly output to (future) `zodb restore`, which, without having a way to see it should not consume object data would read following transaction information as raw object data with confusing later errors (and a small chance to restore completely different database without reporting error at all). Because ZODB iteration API gives us already unpickled extension and only that, for now to dump it as raw we get a long road to pickle it back also caring to try to pickle in stable order. Hopefully this will be only a fallback because of zopefoundation/ZODB#183 and next zodbtools patch. ~~~~ For testing purposes (currently only quoting function unit test) py.test usage is introduced. The code is also generally polished here and there.

…upport is available This patch test at runtime whether used storage can provide transaction metadata in raw form and if so used thi form directly without going through long fragile way of stable pickling extension back to bytes. Corresponding ZODB support is at zopefoundation/ZODB#183

We start to stabilize output format of `zodb dump`. It is actually now robust and the only thing I would contemplate to potentially change is to also cover transaction metadata by hash checksum. So please take a look at updated format (details in patch 1) to provide feedback because it is likely close to what it will be in its final form. We also add a program to generate test database which uses various fancy ZODB features and check `zodb dump` output on it to golden one (patch 3). To be able to dump transaction metadata in raw form ZODB is patched a bit: zopefoundation/ZODB#183 and we try to detect whether appropriate support is there at runtime and if yes use it to streamline obtaining transaction extension as raw (patch 2). Pleae see patch 1 (second half of `zodbdump.py` about what has to be taken on without such support and that it still can't work fully reliably). /cc @Nexedi /reviewed-on https://lab.nexedi.com/nexedi/zodbtools/merge_requests/3

IStorage implementations used to do this task themselves which leads to code duplication and sometimes bugs (one was fixed recently in NEO). Like for object serialization, this should be done by the upper layer (Connection). This commit also provides a way to get raw extensions data while iterating over transactions (this is actually the original purpose[2]). So far, extension data could only be retrieved unpickled, which caused several problems: - tools like `zodb dump` [1] cannot dump data exactly as stored on a storage. This makes database potentially not bit-to-bit identical to its original after restoring from such dump. - `zodb dump` output could be changing from run to run on the same database. This comes from the fact that e.g. python dictionaries are unordered and so when pickling a dict back to bytes the result could be not the same as original. ( this problem can be worked-around partly to work reliably for e.g. dict with str keys - by always emitting items in key sorted order, but it is hard to make it work reliably for arbitrary types ) Both issues make it hard to verify integrity of database at the lowest possible level after restoration, and make it hard to verify bit-to-bit compatibility with non-python ZODB implementations. For this, TransactionMetaData gets a new 'extension_bytes' attribute and and common usage becames: * Application committing a transaction: - 'extension' is set with a dictionary - the storage gets the bytes via 'extension_bytes' * Iteration: - the storage passes bytes as 'extension' parameter of TransactionMetaData - the application can get extension data either as bytes ('extension_bytes') or deserialized ('extension'): in the former case, no deserialization happens and the returned value is exactly what was passed by the storage Conversion between 'extension' and 'extension_bytes' is automatic and lazy, so that these attributes remain writable. [1] https://lab.nexedi.com/nexedi/zodbtools [2] #183 Co-Authored-By: Kirill Smelkov <[email protected]>

IStorage implementations used to do this task themselves which leads to code duplication and sometimes bugs (one was fixed recently in NEO). Like for object serialization, this should be done by the upper layer (Connection). This commit also provides a way to get raw extensions data while iterating over transactions (this is actually the original purpose[2]). So far, extension data could only be retrieved unpickled, which caused several problems: - tools like `zodb dump` [1] cannot dump data exactly as stored on a storage. This makes database potentially not bit-to-bit identical to its original after restoring from such dump. - `zodb dump` output could be changing from run to run on the same database. This comes from the fact that e.g. python dictionaries are unordered and so when pickling a dict back to bytes the result could be not the same as original. ( this problem can be worked-around partly to work reliably for e.g. dict with str keys - by always emitting items in key sorted order, but it is hard to make it work reliably for arbitrary types ) Both issues make it hard to verify integrity of database at the lowest possible level after restoration, and make it hard to verify bit-to-bit compatibility with non-python ZODB implementations. For this, TransactionMetaData gets a new 'extension_bytes' attribute and and common usage becomes: * Application committing a transaction: - 'extension' is set with a dictionary - the storage gets the bytes via 'extension_bytes' * Iteration: - the storage passes bytes as 'extension' parameter of TransactionMetaData - the application can get extension data either as bytes ('extension_bytes') or deserialized ('extension'): in the former case, no deserialization happens and the returned value is exactly what was passed by the storage Conversion between 'extension' and 'extension_bytes' is automatic and lazy, so that these attributes remain writable. [1] https://lab.nexedi.com/nexedi/zodbtools [2] #183 Co-Authored-By: Kirill Smelkov <[email protected]>

navytux · 2018-12-06T18:18:21Z

@jimfulton, @jamadden, it is sad to see no progress neither here, nor in #207. Personally I think this PR should be ok, but I would accept it to be any way. Whatever way it is the worst would be to remain with current status quo - to not being able to load raw data from the database at all.

Kirill

jmuchemb · 2018-12-06T18:22:02Z

#207 is almost ready to be merged. I just have to decide. So busy currently.

Currently we exercise zodbdump and zodbcommit+zodbdump with non-empty extensions, which works if ZODB is patched for txn.extension_bytes support, but fails on pristine ZODB. Support for txn.extension_bytes cannot get into upstream ZODB for more than a year: zopefoundation/ZODB#183 zopefoundation/ZODB#207 and even if it somehow will make it, it will likely be only in ZODB5, while we still care to support ZODB4 and ZODB3. Skipping zodbdump / zodbcommit tests, if a ZODB does not have txn.extension_bytes support, would result in significant reduction of zodbtools test coverage, because practically that is the current situation with all upstream ZODB{3,4,5}. Dropping test coverage for non-empty extensions is neither a good option. For those reason, let's rework the tests and test both zodbdump and zodbcommit with two scenarios: 1. on a test database where transactions extensions are always empty. This should work on all ZODB irregardless of whether txn.extension_bytes patch is there or not. 2. on a test database where transactions extensions are present. This should work if ZODB has txn.extension_bytes support, but if not, we can mark this case as xfail, since the failure is expected. This way we make the testsuite pass irregardless of whether txn.extension_bytes support is there, and we don't abandon dump/commit testing coverage. /helped-by Jérome Perrin <[email protected]>

until zopefoundation/ZODB#183 gets merged, let's run also the tests for this, since we have support for this extension.

IStorage implementations used to do this task themselves which leads to code duplication and sometimes bugs (one was fixed recently in NEO). Like for object serialization, this should be done by the upper layer (Connection). This commit also provides a way to get raw extensions data while iterating over transactions (this is actually the original purpose[2]). So far, extension data could only be retrieved unpickled, which caused several problems: - tools like `zodb dump` [1] cannot dump data exactly as stored on a storage. This makes database potentially not bit-to-bit identical to its original after restoring from such dump. - `zodb dump` output could be changing from run to run on the same database. This comes from the fact that e.g. python dictionaries are unordered and so when pickling a dict back to bytes the result could be not the same as original. ( this problem can be worked-around partly to work reliably for e.g. dict with str keys - by always emitting items in key sorted order, but it is hard to make it work reliably for arbitrary types ) Both issues make it hard to verify integrity of database at the lowest possible level after restoration, and make it hard to verify bit-to-bit compatibility with non-python ZODB implementations. For this, TransactionMetaData gets a new 'extension_bytes' attribute and and common usage becomes: * Application committing a transaction: - 'extension' is set with a dictionary - the storage gets the bytes via 'extension_bytes' * Iteration: - the storage passes bytes as 'extension' parameter of TransactionMetaData - the application can get extension data either as bytes ('extension_bytes') or deserialized ('extension'): in the former case, no deserialization happens and the returned value is exactly what was passed by the storage Conversion between 'extension' and 'extension_bytes' is automatic and lazy, so that these attributes remain writable. [1] https://lab.nexedi.com/nexedi/zodbtools [2] #183 Co-Authored-By: Kirill Smelkov <[email protected]>

IStorage implementations used to do this task themselves which leads to code duplication and sometimes bugs (one was fixed recently in NEO). Like for object serialization, this should be done by the upper layer (Connection). This commit also provides a way to get raw extensions data while iterating over transactions (this is actually the original purpose[2]). So far, extension data could only be retrieved unpickled, which caused several problems: - tools like `zodb dump` [1] cannot dump data exactly as stored on a storage. This makes database potentially not bit-to-bit identical to its original after restoring from such dump. - `zodb dump` output could be changing from run to run on the same database. This comes from the fact that e.g. python dictionaries are unordered and so when pickling a dict back to bytes the result could be not the same as original. ( this problem can be worked-around partly to work reliably for e.g. dict with str keys - by always emitting items in key sorted order, but it is hard to make it work reliably for arbitrary types ) Both issues make it hard to verify integrity of database at the lowest possible level after restoration, and make it hard to verify bit-to-bit compatibility with non-python ZODB implementations. For this, TransactionMetaData gets a new 'extension_bytes' attribute and and common usage becomes: * Application committing a transaction: - 'extension' is set with a dictionary - the storage gets the bytes via 'extension_bytes' * Iteration: - the storage passes bytes as 'extension' parameter of TransactionMetaData - the application can get extension data either as bytes ('extension_bytes') or deserialized ('extension'): in the former case, no deserialization happens and the returned value is exactly what was passed by the storage [1] https://lab.nexedi.com/nexedi/zodbtools [2] #183 Co-Authored-By: Kirill Smelkov <[email protected]>

jamadden · 2019-09-26T11:32:59Z

I believe this was resolved with #207

navytux added a commit to navytux/ZEO that referenced this pull request Nov 2, 2017

Add support to retrieve raw extension data to ZEO

a895e3f

See zopefoundation/ZODB#183 for context.

navytux mentioned this pull request Nov 2, 2017

Add support to retrieve raw extension data to ZEO zopefoundation/ZEO#100

Closed

icemac requested a review from jimfulton December 13, 2017 07:09

navytux force-pushed the y/rawext branch from 738647d to d5d43ba Compare January 14, 2018 16:49

navytux commented Jan 14, 2018

View reviewed changes

navytux force-pushed the y/rawext branch from d5d43ba to 7f5a67b Compare April 5, 2018 17:16

jmuchemb mentioned this pull request May 31, 2018

Make TransactionMetaData in charge of (de)serializing extension data #207

Merged

NexediGitlab pushed a commit to Nexedi/zodbtools that referenced this pull request Jan 31, 2019

tox: run tests against ZODB #183

c50bfb0

until zopefoundation/ZODB#183 gets merged, let's run also the tests for this, since we have support for this extension.

jamadden closed this Sep 26, 2019

navytux deleted the y/rawext branch July 7, 2020 14:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide a way to retrieve raw extensions data #183

Provide a way to retrieve raw extensions data #183

navytux commented Nov 2, 2017

navytux commented Nov 2, 2017

navytux commented Nov 2, 2017

navytux commented Nov 17, 2017

navytux commented Jan 11, 2018

jimfulton commented Jan 13, 2018

jimfulton commented Jan 13, 2018

navytux commented Jan 14, 2018 •

edited

Loading

jimfulton commented Jan 14, 2018 via email

navytux commented Jan 14, 2018

navytux commented Jan 14, 2018

navytux Jan 14, 2018

navytux Jan 14, 2018

navytux Jan 14, 2018

navytux Jan 14, 2018 •

edited

Loading

navytux Jan 14, 2018

navytux commented Apr 5, 2018

navytux commented Dec 6, 2018

jmuchemb commented Dec 6, 2018

jamadden commented Sep 26, 2019

Provide a way to retrieve raw extensions data #183

Provide a way to retrieve raw extensions data #183

Conversation

navytux commented Nov 2, 2017

navytux commented Nov 2, 2017

navytux commented Nov 2, 2017

navytux commented Nov 17, 2017

navytux commented Jan 11, 2018

jimfulton commented Jan 13, 2018

jimfulton commented Jan 13, 2018

navytux commented Jan 14, 2018 • edited Loading

jimfulton commented Jan 14, 2018 via email

navytux commented Jan 14, 2018

navytux commented Jan 14, 2018

navytux Jan 14, 2018

Choose a reason for hiding this comment

navytux Jan 14, 2018

Choose a reason for hiding this comment

navytux Jan 14, 2018

Choose a reason for hiding this comment

navytux Jan 14, 2018 • edited Loading

Choose a reason for hiding this comment

navytux Jan 14, 2018

Choose a reason for hiding this comment

navytux commented Apr 5, 2018

navytux commented Dec 6, 2018

jmuchemb commented Dec 6, 2018

jamadden commented Sep 26, 2019

navytux commented Jan 14, 2018 •

edited

Loading

navytux Jan 14, 2018 •

edited

Loading