File encoding is always the system one and cannot be changed #189

Closed
reconman opened this issue Feb 6, 2022 · 6 comments · May be fixed by #190

Comments

@reconman

reconman commented Feb 6, 2022

In this line, open() is called without an encoding argument, so the system's default encoding is always used: https://github.com/23andMe/Yamale/blob/master/yamale/readers/yaml_reader.py#L34

This leads to users not being able to read UTF-8 encoded files on Windows.
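The failure can be reproduced without Yamale: write a small UTF-8 file and decode it with cp1252, the ANSI code page that Western European Windows installations use by default. This is a sketch. The sample YAML text is made up, and on a machine with a different code page the garbling (or the error) will look different.

```python
import os
import tempfile

# Made-up sample; any non-ASCII text shows the same effect.
SAMPLE = "name: Zoë\ncity: Kraków\n"


def write_utf8_sample(text: str) -> str:
    """Write text to a temporary .yaml file as UTF-8 and return its path."""
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".yaml", delete=False
    ) as f:
        f.write(text)
        return f.name


def read_as(path: str, encoding: str) -> str:
    """Read the file back, decoding it with the given codec."""
    with open(path, encoding=encoding) as f:
        return f.read()


path = write_utf8_sample(SAMPLE)
try:
    correct = read_as(path, "utf-8")   # the encoding the file was written in
    garbled = read_as(path, "cp1252")  # simulates a Western European Windows default
finally:
    os.remove(path)

print(repr(correct))  # 'name: Zoë\ncity: Kraków\n'
print(repr(garbled))  # 'name: ZoÃ«\ncity: KrakÃ³w\n'

# Some UTF-8 byte sequences have no cp1252 mapping at all, so decoding
# fails outright instead of producing mojibake.
try:
    "Łódź".encode("utf-8").decode("cp1252")
except UnicodeDecodeError as exc:
    print("cp1252 cannot decode:", exc.reason)
```

Depending on the text, a UTF-8 file read with the wrong codec either loads silently with corrupted values or raises UnicodeDecodeError.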

It should be possible to pass the encoding as an optional parameter to make_schema() and make_data().

I would advocate for the encoding being UTF-8 by default.

@mildebrandt

Hi @reconman , thanks for your interest in Yamale!

We cannot change the default encoding to UTF-8 without a major version release, since that may break existing users. We could default it to the user's locale encoding instead. But before we look at doing that, I'd like to find a way to solve this without changing Yamale.
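For illustration only, the locale-default option could look roughly like the sketch below. It is not Yamale's code: the function name read_file_text is hypothetical, and the sketch assumes the current reader simply calls open(path) with no encoding. Because open(path, encoding=None) behaves exactly like open(path), existing callers would see no change, while callers who know their files are UTF-8 could opt in explicitly.

```python
import os
import tempfile
from typing import Optional


def read_file_text(path: str, encoding: Optional[str] = None) -> str:
    """Hypothetical reader with an optional encoding parameter.

    encoding=None lets open() fall back to the locale's default encoding,
    which matches calling open(path) with no encoding at all.
    """
    with open(path, encoding=encoding) as f:
        return f.read()


if __name__ == "__main__":
    # Create a UTF-8 file containing non-ASCII text.
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".yaml", delete=False
    ) as f:
        f.write("city: Kraków\n")
        path = f.name
    try:
        # Opting in to UTF-8 reads the file correctly on any platform.
        print(read_file_text(path, encoding="utf-8"))
    finally:
        os.remove(path)
```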

Have you tried enabling UTF-8 mode when running Python? https://www.python.org/dev/peps/pep-0540/

You can either run python -X utf8 or set the environment variable PYTHONUTF8=1.
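To confirm that UTF-8 mode actually took effect on a user's machine, a short diagnostic script along these lines may help. It is a sketch that uses only the standard library; the file name check_encoding.py mentioned below is hypothetical.

```python
import codecs
import locale
import sys
import tempfile


def describe_default_encoding() -> dict:
    """Report whether UTF-8 mode is on and which codec a bare open() uses."""
    # Open a scratch text file without an encoding argument, the same way
    # an open(path) call with no encoding would.
    with tempfile.TemporaryFile("w+") as f:
        bare_open = codecs.lookup(f.encoding).name
    # What Python reports as the preferred encoding; in UTF-8 mode this is
    # always utf-8, otherwise it comes from the locale.
    preferred = codecs.lookup(locale.getpreferredencoding(False)).name
    return {
        "utf8_mode": bool(sys.flags.utf8_mode),
        "preferred": preferred,
        "bare_open": bare_open,
    }


if __name__ == "__main__":
    print(describe_default_encoding())
```

Running python -X utf8 check_encoding.py, or setting PYTHONUTF8=1 first, should report utf8_mode as True and utf-8 for both encodings. Without either, a typical Western European Windows setup reports its ANSI code page (cp1252) instead.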

Let me know if that works for your use case.

@reconman
Author

reconman commented Feb 9, 2022

The project I'm maintaining is using Yamale as a library. Most of my users won't read the instructions and always miss the part where they have to set either of those.

I'd like to avoid that and set the encoding directly in the function call. When users clone my project, the files they receive are already UTF-8 encoded, and some community-provided files are UTF-8 as well.

And it's easier to tell all users "use UTF-8 encoded YAML files" than to ask them if they're on Windows.

@mildebrandt

Thanks for outlining your use case. I agree setting the encoding in yamale will work best for you.

@mildebrandt

Since you're using Yamale as a library, would the following work for you?

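```python
import yamale

# Read each file yourself with an explicit encoding (any codec Python
# supports works; here the schema is UTF-16 and the data UTF-8) and pass
# the decoded text to Yamale via content= instead of a path.
with open('./189.schema', 'r', encoding='utf-16') as f:
    schema = yamale.make_schema(content=f.read())

with open('./189.yaml', 'r', encoding='utf-8') as f:
    data = yamale.make_data(content=f.read())

# Raises an error if the data does not match the schema
yamale.validate(schema, data)
```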

I'm trying to be careful about each additional parameter we add since it does increase the use cases we need to support.

@reconman
Author

Yes, that works. If you don't want to change Yamale, then you can close this issue.

@mildebrandt

I'm glad that solution works for you. Thanks!
