Explicitly specify a file encoding of UTF-8 everywhere #560

Merged: 1 commit, May 31, 2020

tsibley (Member) commented May 29, 2020

Augur mostly assumes that the default file encoding is UTF-8, but this is
only true on systems where the system default or default locale uses UTF-8.
On systems which use the POSIX "C" locale, for example, Python's
default file encoding is ASCII, which can cause encoding failures like
the one observed with `augur traits` in #559. UTF-8 is a near-universal
standard for encodings these days.
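
A minimal sketch of the failure mode and the fix described above, not Augur's actual code. The file name `traits.tsv` and the sample value are hypothetical, and the ASCII default of the "C" locale is simulated here by passing `encoding="ascii"` explicitly.

```python
import os
import tempfile

# Hypothetical trait value containing non-ASCII characters.
text = "Côte d'Ivoire\n"

# Hypothetical output file in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "traits.tsv")

# Simulate a system whose default file encoding is ASCII.
try:
    with open(path, "w", encoding="ascii") as fh:
        fh.write(text)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Specifying UTF-8 explicitly behaves the same regardless of locale.
with open(path, "w", encoding="utf-8") as fh:
    fh.write(text)

with open(path, "r", encoding="utf-8") as fh:
    round_trip = fh.read()

print("ASCII write failed:", ascii_failed)
print("UTF-8 round trip intact:", round_trip == text)
```

Passing `encoding="utf-8"` on every `open()` call removes the dependence on the locale of the machine running the code.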

Note that Python 3.7 includes PEP-0538 and PEP-0540 to help address the
difference between this common assumption and the reality of default
encodings, but a) they do not allow application code to reliably avoid
specifying encodings and b) Augur supports 3.6 anyway.
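
To see which default a given interpreter would pick, one can inspect the locale's preferred encoding and, on Python 3.7+, whether UTF-8 mode (PEP 540) is active. This is only an illustrative sketch; `describe_default_encoding` is a hypothetical helper and not part of Augur.

```python
import locale
import sys


def describe_default_encoding():
    """Report the encoding open() would use when none is specified."""
    return {
        # Encoding used by open() when the encoding argument is omitted.
        "preferred_encoding": locale.getpreferredencoding(False),
        # sys.flags.utf8_mode only exists on Python 3.7+, so fall back to 0.
        "utf8_mode": bool(getattr(sys.flags, "utf8_mode", 0)),
    }


info = describe_default_encoding()
print(info)
```

The reported values vary by platform, Python version, and environment, which is why application code that needs UTF-8 is safer stating it explicitly.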

Resolves #559.

codecov bot commented May 29, 2020

Codecov Report

Merging #560 into master will not change coverage.
The diff coverage is 25.00%.

@@           Coverage Diff           @@
##           master     #560   +/-   ##
=======================================
  Coverage   20.86%   20.86%           
=======================================
  Files          31       31           
  Lines        5138     5138           
  Branches     1305     1305           
=======================================
  Hits         1072     1072           
  Misses       4014     4014           
  Partials       52       52           
| Impacted Files | Coverage Δ |
|---|---|
| augur/align.py | 38.46% <0.00%> (ø) |
| augur/export_v2.py | 8.00% <0.00%> (ø) |
| augur/frequencies.py | 9.56% <0.00%> (ø) |
| augur/import_beast.py | 6.75% <0.00%> (ø) |
| augur/lbi.py | 12.32% <0.00%> (ø) |
| augur/parse.py | 10.60% <0.00%> (ø) |
| augur/reconstruct_sequences.py | 17.24% <0.00%> (ø) |
| augur/traits.py | 7.82% <0.00%> (ø) |
| augur/translate.py | 16.99% <0.00%> (ø) |
| augur/tree.py | 9.74% <0.00%> (ø) |

... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 42a26ed...0e52323.

rneher (Member) left a comment

this looks good to me. since this touches many files, probably good to merge sooner rather than later...

rneher (Member) commented May 31, 2020

I made an equivalent change in TreeTime:
neherlab/treetime@5b8cc38

@rneher rneher merged commit b5dc7ff into master May 31, 2020
@rneher rneher deleted the explicit-encoding branch May 31, 2020 08:27

Successfully merging this pull request may close these issues.

Ancestral trait reconstruction can fail for traits with more than 62 unique states