Unicode string silently truncated
- Affected Components: codecs
- Operating System: Linux
- Python Versions: 2.6.x, 2.7.x
- Reproducible: Yes
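# -*- coding: utf-8 -*-
import codecs
import io
import sys

# Python 2 has no ascii() builtin; fall back to repr there.
try:
    ascii
except NameError:
    ascii = repr

# 0xF5 is never valid in UTF-8; the trailing 0xF4 opens a 4-byte sequence that is cut short.
b = b'\x41\xF5\x42\x43\xF4'

# Reference result: decode the whole byte string in one call.
print("Correct-String %r" % ascii(b.decode('utf8', 'replace')))

with open('temp.bin', 'wb') as fout:
    fout.write(b)

# TEST1: read the file through codecs.open.
with codecs.open('temp.bin', encoding='utf8', errors='replace') as fin:
    print("TEST1-String %r" % ascii(fin.read()))

# TEST2: read the file through io.open.
with io.open('temp.bin', 'rt', encoding='utf8', errors='replace') as fin:
    print("TEST2-String %r" % ascii(fin.read()))

sys.exit(0)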
To reproduce the problem, copy the source code into a file and run the script with the following command:
$ python -OOBRtt test.py
Alternatively, you can start Python in interactive mode:
$ python -OOBRtt <press enter>
Then copy the lines of code into the interpreter.
Execution of the test script produces the following results:
Correct-String "u'A\\ufffdBC\\ufffd'"
TEST1-String "u'A\\ufffdBC'"
TEST2-String "u'A\\ufffdBC\\ufffd'"
The cause is a flaw in the codecs module: its reader sees the byte 0xF4, treats it as the lead byte of a four-byte UTF-8 sequence, and waits for the remaining three bytes. The stream ends before they arrive, so the pending byte is never reported and the resulting string is silently truncated:
Correct-String "u'A\\ufffdBC\\ufffd'"
TEST1-String "u'A\\ufffdBC'"
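The buffering behaviour described above can be observed directly with the standard library's incremental decoder interface. This is a minimal sketch for illustration, not the exact code path taken by `codecs.open`; the variable names are assumptions. It shows that the trailing 0xF4 is held back until the decoder is told that the input has ended.

```python
import codecs

data = b'\x41\xF5\x42\x43\xF4'

# Build an incremental UTF-8 decoder that replaces malformed input.
decoder = codecs.getincrementaldecoder('utf-8')(errors='replace')

# Without final=True the decoder keeps the lone 0xF4 in its buffer,
# because later bytes could still complete the sequence.
partial = decoder.decode(data, final=False)

# Signalling the end of input flushes the buffered byte as U+FFFD.
tail = decoder.decode(b'', final=True)

print(repr(partial))
print(repr(tail))
```

A reader that never signals the end of input would return only `partial`, which corresponds to the truncated TEST1 result.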
A better and safer approach is to read the entire stream and only then decode it, as the io module does:
TEST2-String "u'A\\ufffdBC\\ufffd'"
We are not aware of any easy solution other than avoiding 'codecs' in cases like the one examined.
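One way to avoid 'codecs' here is to read the raw bytes and decode them in a single call. The sketch below assumes the file fits in memory; `read_text_replacing` is a hypothetical helper name, not part of any library.

```python
import io
import os
import tempfile


def read_text_replacing(path, encoding='utf-8'):
    """Hypothetical helper: read all bytes first, then decode them in one call."""
    with io.open(path, 'rb') as fin:
        raw = fin.read()
    # A single decode call sees the end of the data, so a truncated
    # trailing sequence becomes U+FFFD instead of being dropped.
    return raw.decode(encoding, 'replace')


# Demonstration with the byte string from the test script.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as fout:
        fout.write(b'\x41\xF5\x42\x43\xF4')
    result = read_text_replacing(path)
finally:
    os.remove(path)

print(repr(result))
```

For large files, `io.open` in text mode with `errors='replace'`, as in TEST2, avoids holding the raw bytes and the decoded text in memory at the same time.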
- [Python module io](https://docs.python.org/2/library/io.html)
- [Python module codecs](https://docs.python.org/2/library/codecs.html)
- [Python bug 12508](http://bugs.python.org/issue12508)
Main site: pythonsecurity.org
OWASP Page: owasp.org/index.php/OWASP_Python_Security_Project