Adds support for compressed strain name files #730

Merged
huddlej merged 1 commit into master from support-compressed-strain-files
Jun 2, 2021

@benjaminotter benjaminotter commented May 28, 2021

Description of proposed changes

Replaces the open call in the read_strains function with augur.io.open_file to support compressed strain name files.
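As a rough illustration of the idea, the sketch below reads strain names from plain or compressed files. It does not reproduce `augur.io.open_file`; `open_maybe_compressed` is a hypothetical stand-in that picks a standard-library opener by file suffix, and the loop body of `read_strains` (comment stripping, blank-line skipping) is an assumption rather than the code in `augur/utils.py`.

```python
import gzip
import lzma
from pathlib import Path


def open_maybe_compressed(path, mode="rt"):
    """Hypothetical stand-in for augur.io.open_file: choose an opener by suffix."""
    suffix = Path(path).suffix
    if suffix == ".gz":
        return gzip.open(path, mode)
    if suffix == ".xz":
        return lzma.open(path, mode)
    return open(path, mode)


def read_strains(*files, comment_char="#"):
    """Collect strain names from one or more files, one name per line."""
    strains = set()
    for input_file in files:
        with open_maybe_compressed(input_file) as ifile:
            for line in ifile:
                # Keep only the text before the comment character (assumed behavior).
                strain = line.split(comment_char)[0].strip()
                if strain:
                    strains.add(strain)
    return strains
```

Because the opener is the only part that knows about compression, the reading loop stays the same for `include.txt` and `include.txt.gz`.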

Related issue(s)

Fixes #722

Testing

Tested with compressed include.txt and exclude.txt files from the ncov workflow.

@benjaminotter benjaminotter requested a review from huddlej May 28, 2021 11:01

@huddlej huddlej left a comment

Thanks, @benjaminotter! There were a couple of subtle bugs that I noted in the comments below. I'll add fixes for these and push a rebased version of all changes into a single commit. Then I think we're good to merge.

augur/utils.py Outdated
@@ -25,18 +27,6 @@ class AugurException(Exception):


@contextmanager

We need to delete this line, too, so we don't turn is_vcf into a context manager.
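To see why a leftover decorator matters, here is a minimal sketch. It assumes `is_vcf` is a plain predicate that returns a bool; the body shown is a hypothetical stand-in, not the function from `augur/utils.py`.

```python
from contextlib import contextmanager


def is_vcf(fname):
    """Hypothetical stand-in: report whether a filename looks like a VCF."""
    return fname.lower().endswith((".vcf", ".vcf.gz"))


# Simulates a decorator left behind after the function it belonged to was deleted.
decorated_is_vcf = contextmanager(is_vcf)

result = decorated_is_vcf("strains.txt")

# The plain function returns a bool; the decorated one returns a
# context-manager object, which is truthy regardless of the filename.
print(is_vcf("strains.txt"))
print(type(result).__name__, bool(result))
```

Under this assumption, any `if is_vcf(path):` check would silently take the VCF branch for every input, which is the kind of subtle bug the comment is guarding against.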

augur/utils.py Outdated
@@ -713,7 +703,7 @@ def read_strains(*files, comment_char="#"):
     """
     strains = set()
     for input_file in files:
-        with open(input_file, 'r', encoding='utf-8') as ifile:
+        with open_file(input_file, 'r', encoding='utf-8') as ifile:

We have to drop the encoding keyword argument here because xopen does not support this. This is more evidence in support of implementing our own compression backend.
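For context, one general way to keep an explicit encoding when an opener only hands back binary data is to wrap the handle in `io.TextIOWrapper`. The sketch below uses the standard library's `gzip` as a stand-in; `open_text_gzip` is a hypothetical helper, this is not the change made in this pull request, and whether the same wrapping suits xopen's handles is not something the discussion here confirms.

```python
import gzip
import io


def open_text_gzip(path, encoding="utf-8"):
    """Open a gzip file in binary mode and decode it with an explicit encoding."""
    binary_handle = gzip.open(path, "rb")
    # TextIOWrapper decodes bytes lazily and closes the wrapped handle on close().
    return io.TextIOWrapper(binary_handle, encoding=encoding)
```

A caller would use it like any text file: `with open_text_gzip("include.txt.gz") as ifile:` followed by iterating over lines.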

Uses the new `open_file` from the `io` module to open strains files and
thereby support compressed inputs. Removes the unused older `open_file`
function in `utils.py`.

Fixes #722
@huddlej huddlej force-pushed the support-compressed-strain-files branch from 38e43c8 to 9246285 Compare May 28, 2021 19:12
@huddlej huddlej marked this pull request as ready for review May 28, 2021 19:12
@codecov

codecov bot commented May 28, 2021

Codecov Report

Merging #730 (9246285) into master (15dfc8b) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #730      +/-   ##
==========================================
+ Coverage   31.51%   31.53%   +0.02%     
==========================================
  Files          41       41              
  Lines        5674     5666       -8     
  Branches     1373     1371       -2     
==========================================
- Hits         1788     1787       -1     
+ Misses       3812     3805       -7     
  Partials       74       74              
| Impacted Files | Coverage Δ |
|---|---|
| augur/utils.py | 41.48% <100.00%> (+0.70%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 15dfc8b...9246285.

@huddlej huddlej added this to the Feature release 12.1.0 milestone Jun 2, 2021
@huddlej huddlej merged commit bb6fcdf into master Jun 2, 2021
@huddlej huddlej deleted the support-compressed-strain-files branch June 2, 2021 22:53
Successfully merging this pull request may close these issues.

Support compressed strain name files
2 participants