
Bytes type in Python 2 and Python 3

ebranca edited this page Jun 14, 2014 · 1 revision

Classification

  • Affected Components : builtins

  • Operating System : Linux

  • Python Versions : 2.6.x, 2.7.x

  • Reproducible : Yes

Source code

ba = bytearray(range(8))
print(repr("ba=%s" % ba))

b = bytes(range(8))
print(repr("b=%s" % b))

if ba == b:
    print(repr("PASS => bytearray(range(8) == bytes(range(8)"))
else:
    print(repr("FAIL => bytearray(range(8) /= bytes(range(8)"))

print(ba.partition(b'\x03'))
print(ba.partition(3))

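# Compare the two partition results (the original re-tested ba == b here).
if ba.partition(b'\x03') == ba.partition(3):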
    print(repr("PASS => ba.partition(b'\x03')) == ba.partition(3)"))
else:
    print(repr("FAIL =>ba.partition(b'\x03')) /= ba.partition(3)"))

Steps to Produce/Reproduce

To reproduce the problem, copy the source code into a file and run the script with the following command:

$ python -OOBRtt test.py

Alternatively you can open python in interactive mode:

$ python -OOBRtt <press enter>

Then copy the lines of code into the interpreter.

Description

All methods that use bytearray should behave consistently across all versions of Python, but this is not the case.

As we can see from the output of the code used for testing:

python -OOBRttu 'test.py' 
'ba=\x00\x01\x02\x03\x04\x05\x06\x07'
'b=[0, 1, 2, 3, 4, 5, 6, 7]'
'FAIL => bytearray(range(8) /= bytes(range(8)'
(bytearray(b'\x00\x01\x02'), bytearray(b'\x03'), bytearray(b'\x04\x05\x06\x07'))
(bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07'), bytearray(b''), bytearray(b''))
"FAIL =>ba.partition(b'\x03')) /= ba.partition(3)"

In Python 2.x, the builtins bytes and bytearray use different logic to build a value from the same argument.

Python 2.x
Case 1 - Allocate 8 bytes using bytearray
ba = bytearray(range(8))
print(repr("ba=%s" % ba))

Result: 'ba=\x00\x01\x02\x03\x04\x05\x06\x07'

Case 2 - Allocate 8 bytes using bytes
b = bytes(range(8))
print(repr("b=%s" % b))

Result: 'b=[0, 1, 2, 3, 4, 5, 6, 7]'

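From the way Python executes the code, it seems that if a is not a bytearray or a buffer object, ba.partition(a) is actually executed as ba.partition(bytearray(a)). With a = 3, the separator becomes bytearray(3), which is three zero bytes rather than the single byte \x03. That sequence does not occur in ba, so partition returns the whole array followed by two empty bytearrays.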
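The following is a minimal sketch of that reading, assuming the implicit conversion really is bytearray(a). It builds both separators explicitly, so it does not depend on how the interpreter treats an integer argument and runs on Python 2.6+ as well as Python 3 (where ba.partition(3) raises a TypeError instead of converting). The variable names are illustrative only.

```python
ba = bytearray(range(8))

# What Python 2.x appears to search for when given the integer 3:
# a zero-filled bytearray of length 3, not the single byte 0x03.
implicit_sep = bytearray(3)
head, sep, tail = ba.partition(implicit_sep)

# The separator is not found, so the whole array comes back as the head.
print(head == ba)
print(len(sep) == 0 and len(tail) == 0)

# Building the one-byte separator explicitly gives the intended split.
explicit_sep = bytearray([3])
head2, sep2, tail2 = ba.partition(explicit_sep)
print(list(head2))
print(list(sep2))
print(list(tail2))
```

Passing the separator as bytearray([3]) or b'\x03' avoids the ambiguity entirely, since both name the byte value rather than a length.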

In Python 3.x, bytes and bytearray interpret the same argument the same way: both hold the eight bytes 0 through 7, and only the type prefix in the printed form differs.

Python 3.x
Case 1 - Allocate 8 bytes using bytearray

Result: "ba=bytearray(b'\\x00\\x01\\x02\\x03\\x04\\x05\\x06\\x07')"

Case 2 - Allocate 8 bytes using bytes

Result: "b=b'\\x00\\x01\\x02\\x03\\x04\\x05\\x06\\x07'"

The reason is that Python 2.x has no separate bytes type: the bytes builtin is just an alias for str, and the b'' literal is just another literal syntax for str.
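The alias can be checked directly. The sketch below is only an illustration with made-up variable names; it records whether bytes and str are the same object and whether bytes(range(8)) matches bytearray(range(8)), without assuming which interpreter runs it.

```python
import sys

ints = list(range(8))

# True on Python 2.6/2.7, where bytes is the very same object as str;
# False on Python 3, where bytes is its own type.
bytes_is_str_alias = bytes is str

# On Python 2.x, bytes(ints) is str(ints), the text of the list, so it
# does not match the bytearray. On Python 3 both hold the same 8 bytes.
same_as_bytearray = bytes(ints) == bytearray(ints)

print(sys.version_info[0])
print(bytes_is_str_alias)
print(same_as_bytearray)
```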

The Python [documentation][04] contains a more detailed explanation. Here is the relevant snippet from the documentation:

Python 3.0 adopts Unicode as the language’s fundamental string type and denotes 8-bit literals differently, either as b'string' or using a bytes constructor. For future compatibility, Python 2.6 adds bytes as a synonym for the str type, and it also supports the b'' notation.

The 2.6 str differs from 3.0’s bytes type in various ways; most notably, the constructor is completely different. In 3.0, bytes([65, 66, 67]) is 3 elements long, containing the bytes representing ABC; in 2.6, bytes([65, 66, 67]) returns the 12-byte string representing the str() of the list.

The primary use of bytes in 2.6 will be to write tests of object type such as isinstance(x, bytes). This will help the 2to3 converter, which can’t tell whether 2.x code intends strings to contain either characters or 8-bit bytes; you can now use either bytes or str to represent your intention exactly, and the resulting code will also be correct in Python 3.0.

At the C level, Python 3.0 will rename the existing 8-bit string type, called PyStringObject in Python 2.x, to PyBytesObject. Python 2.6 uses #define to support using the names PyBytesObject(), PyBytes_Check(), PyBytes_FromStringAndSize(), and all the other functions and macros used with strings.
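The constructor difference described in the quoted passage can be sidestepped for one narrow case, building a byte string from integers, by going through bytearray, whose constructor treats an iterable of integers the same way on Python 2.6+ and Python 3. The helper name bytes_from_ints below is hypothetical and not part of any library. This is a sketch of that single case, not a general compatibility layer.

```python
def bytes_from_ints(values):
    """Hypothetical helper: build a byte string from integers in 0..255.

    bytearray() accepts an iterable of integers on Python 2.6+ and on
    Python 3 alike, so it is used as the intermediate step.
    """
    return bytes(bytearray(values))


abc = bytes_from_ints([65, 66, 67])
print(len(abc))       # 3 on both Python 2.6+ and Python 3
print(abc == b"ABC")  # True on both

# The bare constructor is what differs: 3 bytes on Python 3, the
# 12-character text "[65, 66, 67]" on Python 2.6/2.7.
print(len(bytes([65, 66, 67])))
```

Other differences remain after construction: for example, abc[0] is the one-character string 'A' on Python 2.x and the integer 65 on Python 3.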

Workaround

We are not aware of any solution or package that safely implements both the bytes type and bytearray across Python versions.

Secure Implementation

WORK IN PROGRESS

References

[Python builtin functions][01]

[Python String/Bytes][02]

[Python bug 20047][03]

[Python PEP 3112][04]

[01]: https://docs.python.org/2/library/functions.html
[02]: https://docs.python.org/2/c-api/string.html
[03]: http://bugs.python.org/issue20047
[04]: https://docs.python.org/2/whatsnew/2.6.html#pep-3112-byte-literals
