Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent Bad Error if 'vcftools' not Present #321

Merged
merged 2 commits into from
Jul 25, 2019
Merged

Prevent Bad Error if 'vcftools' not Present #321

merged 2 commits into from
Jul 25, 2019

Conversation

emmahodcroft
Copy link
Member

Brought to attention by issue #320

If working with VCF data (ex: TB tutorial), if vcftools is not installed, a file with one blank line is output from augur filter (first step of tutorial) as filtered.vcf.gz. augur mask then tries to process this. However, the error that's produced is:
UnboundLocalError: local variable 'chromName' referenced before assignment

Which gives absolutely no information as to what the original problem was (in the previous step!).

I added some checks for existing and empty files in mask.py but this actually doesn't solve this problem, as the file isn't 'empty' (size 0).

The problem is caused because the call is something like:
vcftools --remove-indv G22696 --gzvcf data/lee_2015.vcf.gz --recode --stdout | gzip -c > results/filtered.vcf.gz

When the vcftools part fails, the gzip part goes ahead and zips up nothing, creating the one-blank-line file.

Interestingly, this is not caught by utils.py function run_shell_command (which is what puts the call through). This function uses subprocess.check_call to check the call is successful - but apparently this does not catch /bin/sh: 1: vcftools: not found. (I could not find much info about this online; it proved hard to google.)

It seemed the simplest way to address this was just to check if vcftools was installed (if using VCF data) at the beginning of filter and give the user a useful error message about it. I did this with which from shutil.

It might be preferable to implement this somehow in run_shell_command in a more general way, so that any external program call is checked - but I was unsure whether we could reliably break off the first bit of the command sent to run_shell_command as the part to check.

@emmahodcroft
Copy link
Member Author

I've seen this chromName error with people before, and while I'm pretty sure it's usually called by vcftools not being installed, it's a terrible error message that makes no sense.
So I went ahead and also added something around this in mask.py so it checks and throws a less-obscures error message.

@trvrb
Copy link
Member

trvrb commented Jul 24, 2019

This looks good to me @emmahodcroft. I don't have a smarter way to check for vcftools than to what you do by checking if which("vcftools") is None. Robust error checking is a great thing to be pushing for.

@emmahodcroft
Copy link
Member Author

Happy to merge this, but wonder if there might be a way to implement this better by including some kind of check in run_shell_command - so that sensible errors are generated if any third-party-program we call isn't present.
@tsibley might be best positioned to advise? (I seem to remember you implemented run_shell_command initially, a very good idea!) Don't have to fix here, but could make an issue someone could address later, if you think this is possible?

@trvrb
Copy link
Member

trvrb commented Jul 24, 2019

@emmahodcroft --- Thomas is out through Aug 2. I don't know if this changes your plans here.

@emmahodcroft
Copy link
Member Author

emmahodcroft commented Jul 25, 2019

Ah! Ok, I'll make an issue for this and tag him, and we can leave it or close it as he thinks best :)
And in the meantime, can merge this, which solves at least one problem!

@emmahodcroft emmahodcroft merged commit 66c02b7 into master Jul 25, 2019
@jameshadfield jameshadfield deleted the maskErr branch July 25, 2019 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants