UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 970: character maps to <undefined> #188

Open
Looki2000 opened this issue Jan 5, 2025 · 3 comments

Comments

@Looki2000

After installing the library with pip and trying to initialize it, I'm getting the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\elect\AppData\Local\Programs\Python\Python312\Lib\site-packages\epitran\_epitran.py", line 39, in __init__
    self.epi = SimpleEpitran(code, preproc, postproc, ligatures, rev, rev_preproc, rev_postproc, tones=tones)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\elect\AppData\Local\Programs\Python\Python312\Lib\site-packages\epitran\simple.py", line 46, in __init__
    self.ft = panphon.FeatureTable()
              ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\elect\AppData\Local\Programs\Python\Python312\Lib\site-packages\panphon\featuretable.py", line 62, in __init__
    self.segments, self.seg_dict, self.names = self._read_bases(bases_fn, self.weights)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\elect\AppData\Local\Programs\Python\Python312\Lib\site-packages\panphon\featuretable.py", line 81, in _read_bases
    header = next(reader)
             ^^^^^^^^^^^^
  File "C:\Users\elect\AppData\Local\Programs\Python\Python312\Lib\encodings\cp1250.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 970: character maps to <undefined>

epi = epitran.Epitran("eng-Latn")
The error occurs regardless of which language code is used.

Python 3.12.5
epitran 1.25.1

@nicloay

nicloay commented Jan 7, 2025

Just to confirm: I have exactly the same issue with "deu-Latn".

@tlemangen

Has this problem been solved?

@tlemangen

My error is: UnicodeDecodeError: 'gbk' codec can't decode byte 0xa3 in position 7832: illegal multibyte sequence.
I worked around it by modifying panphon's code so that the file is always read as UTF-8:

  1. Open the panphon/featuretable.py file.
  2. Find the _read_bases function; it should be at line 76.
  3. Modify the open() call to specify UTF-8 encoding, like this:
with open(fn, encoding='utf-8') as f:
    reader = csv.reader(f)
    header = next(reader)
    ...

This works for me, but I think it is only a temporary workaround.
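
For anyone who wants to see the mechanism in isolation, here is a minimal sketch of why passing an explicit encoding helps. It does not use panphon at all: the sample file and the helper names make_sample_csv and read_header are hypothetical stand-ins, and the assumption is that panphon's data file is UTF-8 text containing IPA characters. The character U+0250 is stored as the bytes C9 90 in UTF-8, and byte 0x90 has no mapping in cp1250, which matches the error in the original report.

```python
import csv
import os
import tempfile


def make_sample_csv():
    """Write a tiny UTF-8 CSV whose header contains an IPA character.

    Hypothetical stand-in for a UTF-8 data file such as the one panphon reads.
    U+0250 is encoded as bytes C9 90 in UTF-8; 0x90 is undefined in cp1250.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
        f.write("ipa,\u0250\n")
    return path


def read_header(path, encoding=None):
    """Return the first CSV row; encoding=None means the locale default."""
    with open(path, newline="", encoding=encoding) as f:
        return next(csv.reader(f))


if __name__ == "__main__":
    path = make_sample_csv()
    try:
        # Explicit UTF-8 decodes the file on any platform.
        print(read_header(path, encoding="utf-8"))
        # Forcing cp1250 mimics a Windows locale default and fails.
        try:
            read_header(path, encoding="cp1250")
        except UnicodeDecodeError as exc:
            print("cp1250 failed:", exc)
    finally:
        os.remove(path)
```

If the cause really is the locale default encoding, enabling Python's UTF-8 mode (setting the environment variable PYTHONUTF8=1, or running python -X utf8) may avoid the error without editing files in site-packages. I have not verified this against epitran itself.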
