Explicitly specify a file encoding of UTF-8 everywhere #560

tsibley · 2020-05-29T19:38:45Z

Augur mostly assumes that the default file encoding is UTF-8, but this is
only true on systems where the system default or default locale uses
UTF-8. On systems which use the POSIX "C" locale, for example, Python's
default file encoding is ASCII, which can cause encoding failures like
the one observed with `augur traits` in #559. UTF-8 is a near-universal
standard for encodings these days.
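
As a minimal sketch of the idea (not the actual diff in this PR), passing `encoding="utf-8"` to `open()` makes reads and writes independent of the locale. The sample string, the file name, and the helper names `ascii_would_fail` and `round_trip_utf8` are hypothetical and chosen only for illustration.

```python
import os
import tempfile

# Hypothetical sample value containing a non-ASCII character (written as an
# escape so that this snippet itself stays pure ASCII).
SAMPLE = "C\u00f4te d'Ivoire"


def ascii_would_fail(text):
    """Return True if `text` cannot be encoded as ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return True
    return False


def round_trip_utf8(text):
    """Write and re-read `text` with an explicit UTF-8 encoding."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "example.tsv")  # hypothetical file name
        # Explicit encoding: does not depend on the locale's default.
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()


if __name__ == "__main__":
    print(ascii_would_fail(SAMPLE))
    print(round_trip_utf8(SAMPLE) == SAMPLE)
```

Without the explicit argument, the same `open()` calls would fall back to the locale-dependent default, which is where an ASCII default can raise a `UnicodeEncodeError` or `UnicodeDecodeError`.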

Note that Python 3.7 includes PEP-0538 and PEP-0540 to help address the
difference between this common assumption and the reality of default
encodings, but a) they do not allow application code to reliably avoid
specifying encodings and b) Augur supports Python 3.6 anyway.
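
To see which default a given environment would apply, a small diagnostic along the following lines may help. This is a sketch under the assumption that text-mode `open()` without an `encoding` argument follows `locale.getpreferredencoding(False)`, as documented for the Python versions current at the time. The helper name `describe_default_encoding` is hypothetical, and `sys.flags.utf8_mode` only exists on Python 3.7+, so it is read defensively.

```python
import locale
import sys


def describe_default_encoding():
    """Collect the settings that influence Python's default text encoding."""
    return {
        # Encoding used by open() in text mode when none is specified.
        "preferred_encoding": locale.getpreferredencoding(False),
        # UTF-8 mode flag from PEP 540; absent on Python 3.6, so default to 0.
        "utf8_mode": getattr(sys.flags, "utf8_mode", 0),
        # Encoding used for file names, shown for comparison.
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in describe_default_encoding().items():
        print(key, value)
```

Running this once normally and once with `LC_ALL=C` set in the environment shows whether the interpreter in use changes its default under the "C" locale.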

Resolves #559.

codecov · 2020-05-29T19:43:11Z

Codecov Report

Merging #560 into master will not change coverage.
The diff coverage is 25.00%.

@@           Coverage Diff           @@
##           master     #560   +/-   ##
=======================================
  Coverage   20.86%   20.86%           
=======================================
  Files          31       31           
  Lines        5138     5138           
  Branches     1305     1305           
=======================================
  Hits         1072     1072           
  Misses       4014     4014           
  Partials       52       52

Impacted Files	Coverage Δ
augur/align.py	`38.46% <0.00%> (ø)`
augur/export_v2.py	`8.00% <0.00%> (ø)`
augur/frequencies.py	`9.56% <0.00%> (ø)`
augur/import_beast.py	`6.75% <0.00%> (ø)`
augur/lbi.py	`12.32% <0.00%> (ø)`
augur/parse.py	`10.60% <0.00%> (ø)`
augur/reconstruct_sequences.py	`17.24% <0.00%> (ø)`
augur/traits.py	`7.82% <0.00%> (ø)`
augur/translate.py	`16.99% <0.00%> (ø)`
augur/tree.py	`9.74% <0.00%> (ø)`
... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 42a26ed...0e52323.

rneher

This looks good to me. Since this touches many files, it is probably good to merge sooner rather than later.

rneher · 2020-05-31T08:27:01Z

I made an equivalent change in TreeTime:
neherlab/treetime@5b8cc38

tsibley mentioned this pull request May 29, 2020

Ancestral trait reconstruction can fail for traits with more than 62 unique states #559

Closed

rneher approved these changes May 29, 2020

rneher merged commit b5dc7ff into master May 31, 2020

rneher deleted the explicit-encoding branch May 31, 2020 08:27
