Unicode Encoding Error in Python #701

Open
haollhao opened this issue Dec 19, 2015 · 6 comments

Comments

@haollhao

When I was trying to print some unicode strings, an error occurred:

 print(u'"Abstract":"' + printAbstract + u'",')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b5' in position 513: ordinal not in range(128)

So it seems that strings are always encoded as ASCII, even though I have declared them as unicode by prefixing them with u. However, the same code ran fine in the Python 2.7.10 shell. Is there a convention I should follow in my program, or is this a bug?

Thanks 😄

@rgbkrk
Member

rgbkrk commented Dec 29, 2015

That is indeed a tricky bit of Python 2 to wrap our heads around and common enough that lots of people experience it.

Basically, it comes down to the fact that printAbstract is likely a str, not a unicode string. I'm going to declare x to equal 😱 and get a similar error:

In [1]: x = "😱"

In [2]: u'test' + x
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-2-b42a30d7afc1> in <module>()
----> 1 u'test' + x

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)

What happens here is that x is implicitly coerced to unicode using the default ascii codec, even though it's technically a byte string containing non-ASCII bytes. The conversion fails on the first such byte. We can reproduce this simply with:

In [3]: unicode(x)
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-3-c268b90adfa4> in <module>()
----> 1 unicode(x)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)

In [4]: x.encode('utf-8')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-4-562856162c51> in <module>()
----> 1 x.encode('utf-8')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
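For readers on Python 3, where mixing bytes and text raises a TypeError instead of coercing silently, the implicit step Python 2 takes above can be spelled out by hand. This is only a minimal sketch, not the interpreter's actual code path: `raw` is a hypothetical name standing in for the byte string x, and it assumes the bytes are UTF-8 (what a UTF-8 terminal would have produced).

```python
# Python 3 sketch of what Python 2 does implicitly in u'test' + x.
# `raw` is a hypothetical stand-in for the Python 2 str `x`; UTF-8 is assumed.
raw = "\U0001F631".encode("utf-8")  # the bytes of the emoji used above

# Python 2 decodes the byte string with the ascii codec before concatenating.
try:
    "test" + raw.decode("ascii")
    implicit_error = None
except UnicodeDecodeError as exc:
    implicit_error = exc

# Decoding explicitly with the codec the bytes were written in succeeds.
combined = "test" + raw.decode("utf-8")

print(implicit_error)
print(ascii(combined))  # ascii() keeps the output printable on any terminal
```

The point of the sketch is that the failure comes from the choice of codec, not from the data: the same bytes decode cleanly once the right encoding is named.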

If instead we started with a unicode string and ended with a unicode string, it just works:

In [5]: x = u"😱"

In [6]: u'test' + x
Out[6]: u'test\U0001f631'

In [7]: print(u'test' + x)
test😱

As pointed out in Unicode Frustrations, this also depends on the encoding of the terminal reported to Python, which for Atom script is determined by the environment variables Atom was launched with.

Long story short, you'll want to figure out how to handle unicode directly in your code because this will bite you in production far worse than it will in your editor.
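The dependence on the output encoding can be checked without touching the real terminal. The following is a rough Python 3 sketch under stated assumptions: `write_with_encoding` is a hypothetical helper, and the in-memory stream only imitates a stdout whose encoding is ascii, which is presumably what happened in the original report.

```python
import io
import sys

# The encoding Python picked for standard output in this environment.
print(getattr(sys.stdout, "encoding", None))


def write_with_encoding(text, encoding):
    """Write text to an in-memory stream that uses the given encoding."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


abstract = "\u03b5"  # the character from the original traceback (Greek epsilon)

# An ascii stream cannot represent the character, as in the reported error.
try:
    write_with_encoding(abstract, "ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# A utf-8 stream accepts the same text.
utf8_bytes = write_with_encoding(abstract, "utf-8")

print(ascii_error)
print(utf8_bytes)
```

If the first line printed is an ASCII-only encoding such as ANSI_X3.4-1968 or US-ASCII, printing non-ASCII text from that environment is likely to fail the same way.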

@catroll

catroll commented Jun 22, 2016

CODE

# -*- coding: utf-8 -*-

print u'中国'

PS: 中国 means "China".

OUTPUT

When run with Ctrl + Alt + B:

Traceback (most recent call last):
  File "/home/catroll/test.py", line 3, in <module>
    print u'中国'
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
[Finished in 0.059s]

@dbolton

dbolton commented Dec 1, 2016

It looks like atom-script on Windows uses cp1252 (Windows 1252) encoding by default instead of utf-8.

As a workaround, you can set the encoding of the standard output and error streams to utf-8 in your code (this relies on detach(), so it is Python 3 only):

import sys
import io

# Re-wrap the underlying binary streams so text is encoded as UTF-8.
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding='utf-8')

print('汉语/漢語')
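On Python 3.7 and later, text streams also offer a reconfigure() method, which may be a lighter alternative to re-wrapping. The sketch below applies it to an in-memory stream so the result can be inspected. The names `raw`, `stream`, and `written` are illustrative, and the cp1252 starting point is an assumption taken from the observation above.

```python
import io

# In-memory stand-in for a console stream that defaults to cp1252.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp1252")

# Switch the existing wrapper to UTF-8 in place (Python 3.7+).
stream.reconfigure(encoding="utf-8")

stream.write("汉语/漢語")
stream.flush()

written = raw.getvalue()
print(written)

# For the real streams the equivalent call would be:
#     sys.stdout.reconfigure(encoding="utf-8")
```

Unlike the detach() approach, this keeps the same stream object, so other code that already holds a reference to sys.stdout sees the change too.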

@ghost

ghost commented Nov 10, 2018

Hello, I am having the same problem here, but I found a few alternatives:
1- Download the git_help package (if you are using the Atom text editor).
2- Use .encode('utf-8'). Whether this works depends on your case, but give it a try.
3- Open your .py file with pythonshell (same as method 1).

@drvid

drvid commented Nov 16, 2018

I was able to fix these issues in my Atom script output by simply setting the environment variable PYTHONIOENCODING=utf8.
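One way to see the effect of this variable without changing any editor settings is to launch a child interpreter with it set. This is a hedged Python 3 sketch: `run_child` is a hypothetical helper, and it shows only how the variable changes the child's stdout encoding, not how Atom script itself passes the environment along.

```python
import codecs
import os
import subprocess
import sys


def run_child(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and capture its raw output."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    # The child prints its stdout encoding, then a non-ASCII character
    # (written as an escape so the command line itself stays ASCII).
    code = r"import sys; print(sys.stdout.encoding); print('\u03b5')"
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


# With utf8 the child can print the character.
ok = run_child("utf8")
reported, printed = ok.stdout.decode("utf-8").split()

# With ascii the same print fails inside the child.
failed = run_child("ascii")

print(codecs.lookup(reported).name)
print(ok.returncode, failed.returncode)
```

The variable has to be present in the environment of the process that starts Python, which is why it must be visible to Atom when it launches rather than set from inside the script.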

@ghost

ghost commented Nov 16, 2018 via email

5 participants