bytes_terminate(): speed up by avoiding unnecessary memory copying · kaitai-io/kaitai_struct_python_runtime@92a2d71

Commit

bytes_terminate(): speed up by avoiding unnecessary memory copying

This rewrite of the bytes_terminate() function fixes performance issues
of the old version without any known performance regressions.

`bytes.partition()` used before is quite fast on its own for what it's
supposed to do. However, there were two measurable performance issues in
our use of it, if we consider that the `data` argument is an immutable
`bytes` object (which is true in practice, since our read_bytes() method
returns `bytes` objects):

1. See https://docs.python.org/3/library/stdtypes.html#bytes.partition:

   > Split the sequence at the first occurrence of sep, and return a
   > 3-tuple containing the part before the separator, the separator
   > itself or its bytearray copy, **and the part after the separator**.

   The important part is that it creates and returns the part _after the
   separator_ even though we don't need it. Unfortunately, as of Python
   3.11.5, slicing an immutable `bytes` object incurs a memory copy,
   i.e. it is not a zero-copy operation. In other words, the
   bytes_terminate() function unnecessarily copied all bytes after the
   terminator, only to not use it afterward. Of course, it's faster to
   copy only the part up to the terminator - the earlier the terminator
   occurs, the faster the operation.

2. The `include_term=True` was significantly slower than
   `include_term=False`. This is because `bytes.partition()` always
   returns the part before the separator, but if we have a non-empty
   terminator byte to include, we have no other option than to
   concatenate two `bytes` objects. This copies memory because `bytes`
   objects are immutable.

See the benchmark at
https://gist.github.com/generalmimon/dcaec4d005d0d1a87bd237d94199203a
(results from Python 3.10.12 in WSL 2 on my computer are at the bottom) -
we can clearly see both mentioned issues. For `include_term=False`, if
the terminator is in the middle of a 128 KiB byte array, the new
implementation is 1.8x faster due to the issue 1) above. If the
terminator is near the end of these 128 KiB or is missing, there's no
measurable difference between the implementations. For
`include_term=True`, the new version is 2.0x faster than the old one if
the terminator is near the end due to 2) - the new impl has the same
performance regardless of `include_term`, whereas in the old impl
`include_term=True` was twice as slow as `include_term=False`.

Loading branch information

generalmimon committed Sep 28, 2023

1 parent d457617 commit 92a2d71

kaitaistruct.py

            
                      Original file line number
                      Diff line number
                      Diff line change
                  
    @@ -445,10 +445,10 @@ def bytes_strip_right(data, pad_byte):
  
        @staticmethod

        def bytes_terminate(data, term, include_term):

            new_data, term_byte, _ = data.partition(KaitaiStream.byte_from_int(term))

            if include_term:

                new_data += term_byte

            return new_data

            term_index = KaitaiStream.byte_array_index_of(data, term)

            if term_index == -1:

                return data[:]

            return data[:term_index + (1 if include_term else 0)]

        # endregion

0 comments on commit `92a2d71`

Please sign in to comment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Commit

There are no files selected for viewing

0 comments on commit `92a2d71`

Commit

There are no files selected for viewing

0 comments on commit 92a2d71

0 comments on commit `92a2d71`