Troubles with reading cyrillic chars #2

Open
tsouvarev opened this issue Dec 1, 2014 · 4 comments
Comments

@tsouvarev

Hello, and first of all, thanks for this port of dbfread.

I have a DBF file with Cyrillic fields. When I try to read these fields, I get:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
The error is raised at fields.py:213, which is:
return value.rstrip(b' ').decode('utf-8')

When I change utf-8 to cp866 (the usual DOS code page for Russian DBF files), everything works fine.
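
For anyone trying to reproduce this without the original file, here is a minimal sketch of the failure. The helper `decode_field` and the sample text are hypothetical; only the `rstrip`/`decode` expression comes from the line quoted above.

```python
def decode_field(value: bytes, encoding: str) -> str:
    # Same expression as the quoted fields.py line, with the codec as a parameter.
    return value.rstrip(b" ").decode(encoding)


# Hypothetical field content: three Cyrillic letters stored as cp866,
# padded with spaces the way fixed-width DBF character fields are.
raw = "Абв".encode("cp866") + b"   "
assert raw[0] == 0x80  # the byte named in the traceback

try:
    decode_field(raw, "utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    # 0x80 is a continuation byte in UTF-8, so it cannot start a character.
    utf8_failed = True

decoded = decode_field(raw, "cp866")
```

With these inputs the utf-8 attempt raises and the cp866 attempt returns the original three letters, which matches the behaviour described above.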

phargogh added a commit that referenced this issue Jun 1, 2015
@phargogh
Owner

phargogh commented Jun 1, 2015

Hi @tsouvarev, thanks for your feedback! I've changed a couple of things in fields.py so that instead of always decoding as utf-8, we decode with your system's encoding. This isn't an ideal solution, since the encoding of the file need not match the machine reading it, but I think it might at least solve your issue for the moment.

Could you try a fresh clone and install and see if that fixes the issue for you?
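
As a rough illustration of why the system encoding is only a stopgap, the sketch below decodes the byte from the traceback with two different DOS code pages. It assumes the "system's encoding" is what `locale.getpreferredencoding` reports; that is an assumption about the fix, not a quote from the commit.

```python
import locale

raw = b"\x80"  # the byte from the traceback in the first comment

as_cp866 = raw.decode("cp866")  # Cyrillic capital letter A
as_cp850 = raw.decode("cp850")  # Latin capital letter C with cedilla

# What a locale-based default would pick on the machine running this code.
system_encoding = locale.getpreferredencoding(False)

print("system encoding:", system_encoding)
print("cp866 and cp850 give the same text:", as_cp866 == as_cp850)
```

The same byte yields different characters depending on the codec, so a file only decodes correctly when the reader's locale happens to match the code page the file was written with.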

@tsouvarev
Author

@phargogh Unfortunately, I have no DBF files on hand right now to check your fix. Feel free to close this issue if you think it is solved.

@phargogh
Owner

phargogh commented Jun 4, 2015

No worries! I'll leave it open for the time being until I can verify that it will work as expected.

@nmset

nmset commented Apr 1, 2017

On Linux, I had to change 'locale.getpreferredencoding' to 'cp850' to fully import a DBF file created on Windows. Otherwise, fields with accented characters are dropped. Could it be made to look for a user-defined environment variable, something like 'DBFPY3_DECODE_FROM', that names the source encoding? If none is set, fall back to 'locale.getpreferredencoding'. I know it's hacky and not smart, but I think it would be pragmatic. Thanks for this useful tool.
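
A minimal sketch of what that lookup could look like, in case it helps the discussion. `DBFPY3_DECODE_FROM` is only the name proposed above, and `source_encoding` and `decode_field` are hypothetical helpers; none of this is existing dbfpy3 API.

```python
import codecs
import locale
import os

ENV_VAR = "DBFPY3_DECODE_FROM"  # proposed name, not an existing setting


def source_encoding(environ=os.environ) -> str:
    """Return the codec named in the environment, else the locale default."""
    name = environ.get(ENV_VAR)
    if name:
        codecs.lookup(name)  # raises LookupError early for an unknown codec
        return name
    return locale.getpreferredencoding(False)


def decode_field(value: bytes, environ=os.environ) -> str:
    # Strip the space padding, then decode with the selected codec.
    return value.rstrip(b" ").decode(source_encoding(environ))


# Usage with an explicit mapping standing in for the real environment.
sample = "été".encode("cp850") + b"  "
text = decode_field(sample, {ENV_VAR: "cp850"})
```

Passing the environment as a parameter keeps the sketch easy to test. From a shell, the equivalent would be setting the variable before running the import script.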
