Unicode Encoding Error in Python #701

haollhao · 2015-12-19T07:10:12Z

When I was trying to print some unicode strings, an error occurred:

 print(u'"Abstract":"' + printAbstract + u'",')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b5' in position 513: ordinal not in range(128)

So it seems that strings are always encoded in ascii, even though I have declared them as unicode with the u prefix. However, the same code runs fine in the Python 2.7.10 Shell. Is there a convention I should follow in my program, or is this a bug?

Thanks 😄

rgbkrk · 2015-12-29T16:51:18Z

That is indeed a tricky bit of Python 2 to wrap our heads around and common enough that lots of people experience it.

Basically, it comes down to the fact that printAbstract is likely a str, not a unicode string. I'm going to declare x to equal 😱 and get a similar error:

In [1]: x = "😱"

In [2]: u'test' + x
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-2-b42a30d7afc1> in <module>()
----> 1 u'test' + x

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)

What happens here is that x is implicitly coerced to unicode even though it is a byte string (a str). Python 2 decodes it with the default ascii codec, and the conversion fails on the first non-ASCII byte. We can reproduce this simply with:

In [3]: unicode(x)
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-3-c268b90adfa4> in <module>()
----> 1 unicode(x)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)

In [4]: x.encode('utf-8')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-4-562856162c51> in <module>()
----> 1 x.encode('utf-8')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
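
The odd part above is that calling .encode() raises a decode error: Python 2 first has to turn the byte string into unicode, and it uses the ascii codec to do so. Here is a minimal sketch of that hidden step, written with explicit bytes so it should behave the same on Python 2 and Python 3. The variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch: spell out the decode step that Python 2 performs implicitly.
raw = u"\U0001f631".encode("utf-8")  # the four UTF-8 bytes of the emoji, like x = "😱"

try:
    raw.decode("ascii")  # what the implicit coercion attempts
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True  # byte 0xf0 is outside the ASCII range

text = raw.decode("utf-8")  # naming the actual encoding succeeds
combined = u"test" + text   # unicode + unicode, no implicit coercion needed
```

Decoding explicitly with the encoding the bytes were actually written in avoids the implicit ascii step entirely.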

If instead we started with a unicode string and ended with a unicode string, it just works:

In [5]: x = u"😱"

In [6]: u'test' + x
Out[6]: u'test\U0001f631'

In [7]: print(u'test' + x)
test😱

As pointed out in unicode frustrations, this also depends on the terminal encoding reported to Python, which for Atom's script package is determined by the environment variables in effect when Atom was launched.

Long story short, you'll want to figure out how to handle unicode directly in your code because this will bite you in production far worse than it will in your editor.
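
One common way to do that is to decode bytes once, at the point where they enter the program, and work with unicode everywhere else. The sketch below assumes the incoming data is UTF-8; to_text is a hypothetical helper written for this example, not something from a library, and abstract_bytes is a stand-in for whatever printAbstract holds.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not part of any library): decode bytes once, at the boundary.
def to_text(value, encoding="utf-8"):
    """Return value as a unicode string, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Stand-in for data read from a file or socket (assumed to be UTF-8).
abstract_bytes = u"\u03b5-greedy".encode("utf-8")

# Every operand is unicode now, so no implicit ascii coercion takes place.
line = u'"Abstract":"' + to_text(abstract_bytes) + u'",'
```

If the data turns out to be in a different encoding, pass it explicitly, for example to_text(value, "latin-1").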

catroll · 2016-06-22T00:59:45Z

CODE

# -*- coding: utf-8 -*-

print u'中国'

PS: 中国 means China.

OUTPUT

When running with Ctrl + Alt + B:

Traceback (most recent call last):
  File "/home/catroll/test.py", line 3, in <module>
    print u'中国'
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
[Finished in 0.059s]

dbolton · 2016-12-01T19:28:33Z

It looks like atom-script on Windows uses cp1252 (Windows 1252) encoding by default instead of utf-8.

As a work-around you can specify the encoding as utf-8 in your code for the system out and error streams:

import sys
import io

# Python 3 only: re-wrap the underlying binary streams so output is encoded as UTF-8
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding='utf-8')

print('汉语/漢語')
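
To see what the wrapper's encoding argument changes without touching the real streams, here is a small sketch that writes the same text through a TextIOWrapper over an in-memory buffer. encode_through is a made-up name for this example, and the in-memory buffer stands in for the process's actual stdout.

```python
import io

def encode_through(text, encoding):
    """Write text through a TextIOWrapper and return the bytes it produced."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()

sample = u"\u6c49\u8bed"  # the characters 汉语, written as escapes

# With utf-8 the text is encoded without trouble.
utf8_bytes = encode_through(sample, "utf-8")

# With cp1252 the same write fails, because the codec has no mapping for these characters.
try:
    encode_through(sample, "cp1252")
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True
```

The failure comes from the stream's encoding, not from the string itself, which is why swapping the wrapper is enough to make print work.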

ghost · 2018-11-10T21:21:06Z

Hello, I am having the same problem here, but I found a few alternatives:
1- download package: (if you are using the Atom text editor)

2- use .encode('utf-8'); this works depending on your case, but give it a try.
3- open your .py file with the Python shell (same as method 1).

drvid · 2018-11-16T18:50:17Z

I was able to fix these issues in my Atom script output by simply adding the environment variable PYTHONIOENCODING=utf8
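
For anyone who wants to check the effect of that variable outside Atom, here is a rough sketch that launches a child Python process with PYTHONIOENCODING set and captures what it prints. It only sets the variable for the child process; how you set it for Atom itself depends on your system and is not shown here.

```python
import os
import subprocess
import sys

# Copy the current environment and add the variable for the child process only.
env = dict(os.environ, PYTHONIOENCODING="utf8")

# The child prints a Greek epsilon, the character from the original traceback.
child_code = "print(u'\\u03b5')"
output = subprocess.check_output([sys.executable, "-c", child_code], env=env)

# The captured bytes decode cleanly as UTF-8.
decoded = output.decode("utf-8").strip()
```

Because the output goes to a pipe rather than a terminal, this is close to the situation the script package is in when it runs your file.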

ghost · 2018-11-16T19:03:31Z

Hello, thank you for the help! I appreciate it a lot because I sometimes had to run my code outside of Atom with the Python interpreter, so thank you so much.


nixel2007 mentioned this issue Mar 17, 2016

UnicodeEncodeError when launching python2 script which contains non-ascii output #840

Open

nixel2007 mentioned this issue Nov 30, 2016

The Atom script package requires script execution screen encoding in Java #1166

Closed
