Zero coverage for all samples #17
Comments
Hello. The "scaffold_stats" command has not changed between v0.0.21 and v0.0.23. Can you run the different dataset you have through v0.0.23 to confirm that it isn't a RefineM issue? Assuming this is fine it would suggest something unique about your current dataset which would help narrow down the issue. |
OK. I think this has something to do with my dataset. The headers of the BAM files and the contigs are "SPAdes"-style but were "fixed" with sed and samtools to be compatible with Anvio: sed 's/\./_/g' all_spades_contigs.fasta > all_spades_contigs_fixed.fasta I also confirmed that the bins have matching headers. |
I believe the problem is that my reads are single reads, not paired end. Example output: total reads: 25707472
|
Hello. That would do it. If you pass the "--all_reads" flag, CheckM will use singleton reads in the coverage estimate. |
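To illustrate why single-end data can yield zero coverage when only paired reads are counted, here is a minimal sketch based on the standard SAM flag bits (0x2 = properly paired, 0x4 = unmapped). It is not CheckM's or RefineM's actual code; the function name `count_usable` and the example flag values are hypothetical.

```python
PROPER_PAIR = 0x2  # SAM flag bit: read mapped in a proper pair
UNMAPPED = 0x4     # SAM flag bit: read is unmapped

def count_usable(flags, all_reads=False):
    """Count mapped reads that would contribute to coverage.

    By default only properly paired reads are counted; with
    all_reads=True every mapped read is counted, singletons included.
    """
    usable = 0
    for flag in flags:
        if flag & UNMAPPED:
            continue
        if all_reads or (flag & PROPER_PAIR):
            usable += 1
    return usable

# Hypothetical flags from a single-end mapping: forward (0),
# reverse (16), and one unmapped read (4).
single_end_flags = [0, 16, 0, 4, 16]
print(count_usable(single_end_flags))                  # no proper pairs
print(count_usable(single_end_flags, all_reads=True))  # all mapped reads
```

With single-end reads the proper-pair bit is never set, so the default count is zero even though most reads mapped.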
OK that worked, but the error in refinem outliers did not disappear. [2018-02-07 23:33:01] INFO: RefineM v0.0.23 |
Hello. If you can send me your scaffold_stats.tsv file I can look into what is causing the issue. donovan[dot]parks[at]gmail.com |
I've sent you a Google Drive link since it's 91 MB compressed. |
Hello. The issue is that one of your contigs has zero coverage across all your samples (namely, c_000000311451). This probably speaks to an underlying issue with your assembly and/or read mapping files as lots of the contigs have extremely low coverage. I will add a warning in the next release of RefineM so the program continues to run and reports such problematic contigs, but it is probably best to try and determine why the coverage is so low for some contigs. If you wish to move forward with the current data, you need to remove c_000000311451 from the scaffold_stats.tsv file. |
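A minimal sketch of how contigs with zero coverage in every sample could be located and dropped before rerunning refinem outliers. It assumes a tab-separated file with a header row, the contig ID in the first column, and coverage columns that share a name prefix. The prefix `Coverage`, the function name, and the example table are hypothetical, so check the header of the real scaffold_stats.tsv first.

```python
import csv
import io

def split_zero_coverage(tsv_text, cov_prefix="Coverage"):
    """Return (kept_rows, dropped_ids) for a tab-separated stats table.

    cov_prefix is an assumption: adjust it to the real column names.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = reader.fieldnames or []
    cov_cols = [c for c in fieldnames if c.startswith(cov_prefix)]
    if not cov_cols:
        raise ValueError("no coverage columns found; check cov_prefix")
    id_col = fieldnames[0]
    kept, dropped = [], []
    for row in reader:
        # A contig is dropped only if it has zero coverage in every sample.
        if all(float(row[c]) == 0.0 for c in cov_cols):
            dropped.append(row[id_col])
        else:
            kept.append(row)
    return kept, dropped

# Hypothetical two-sample table, for illustration only.
example = (
    "Scaffold id\tCoverage: s1\tCoverage: s2\n"
    "c_1\t12.5\t3.0\n"
    "c_2\t0\t0.0\n"
    "c_3\t0\t4.1\n"
)
kept, dropped = split_zero_coverage(example)
print("dropped:", dropped)
```

Reporting the dropped IDs rather than silently discarding them makes it easier to check whether the low coverage points to an assembly or mapping problem.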
Hi, I'm just getting back to this problem. I'm still getting the error after exploring some things as described below. First, these bins were from 10 samples, 5 depths in a lake, total sample coassembly.
1,2,3 may not be relevant, but 4 is - the bottom line is that for calculating "bin outliers" which is what I want to do, I don't need coverage for these contigs because they aren't in any bins. |
Hi. Can I consider this issue closed? It appears the zero coverage contigs are real and not an issue of RefineM. If you wish to use RefineM, you will need to remove all zero coverage contigs from any files provided to the program. |
OK. Thanks! |
Reopening this issue, though it is also something of a follow-up to #11. After removing the zero-coverage contigs, remapping, and recalculating scaffold stats, as well as removing single-contig bins (as suggested in #11), I was getting a new stats.py error: prob = _betai(0.5*df, 0.5, df/(df+t_squared)) After looking around in the code and trying to figure things out, I came across this scipy issue, which seemed relevant because I'm using two BAM files, so I manually changed the stats.py code to the fix mentioned in that issue. I'm still struggling to figure this one out and am not sure what is going on here, but I'm also wondering whether this Pearson calculation is necessary for my data, since on the front page you state that only the "mean absolute error criteria is used"? |
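One possible reason a Pearson calculation misbehaves with only two BAM files: with two samples, any two points lie on a line, so r is +1 or -1 and the degrees of freedom n - 2 are zero. The sketch below uses the textbook t-statistic for Pearson's r and is not scipy's or RefineM's code; the function names and coverage values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Plain Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def t_squared(r, n):
    """Textbook t^2 = r^2 * df / (1 - r^2), with df = n - 2.

    Returns NaN when the statistic is undefined (df <= 0 or |r| >= 1),
    which is the 0/0 case that arises with only two samples.
    """
    df = n - 2
    denom = (1.0 - r) * (1.0 + r)
    if df <= 0 or denom <= 0:
        return float("nan")
    return r * r * df / denom

# Hypothetical coverage of two contigs across two samples.
r = pearson_r([1.0, 3.0], [2.0, 5.0])
print(r, t_squared(r, 2))
```

Any p-value derived from such a t-statistic is undefined, so a correlation-based filter carries no information with two samples.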
Can you confirm you are using the latest version of RefineM? If you can send me a simple example that produces this problem I can take a look. Ideally, a single genome and the exact RefineM commands you ran that result in the issue. |
Hi, I have encountered the same issue. But after I filtered out the contigs whose Genome Id is "unbinned" or "bin.unbinned", this issue was gone. |
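The filtering described above can be sketched as follows. This assumes a tab-separated table whose genome column is headed `Genome Id`, as quoted in the comment; the function name and the example rows are hypothetical, and the real file may use different column names.

```python
import csv
import io

UNBINNED = {"unbinned", "bin.unbinned"}

def drop_unbinned(tsv_text, genome_col="Genome Id"):
    """Return the table as text with unbinned rows removed."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = reader.fieldnames or []
    if genome_col not in fieldnames:
        raise KeyError(f"column {genome_col!r} not found")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep only contigs assigned to a real bin.
        if row[genome_col] not in UNBINNED:
            writer.writerow(row)
    return out.getvalue()

# Hypothetical table, for illustration only.
table = (
    "Scaffold Id\tGenome Id\tLength\n"
    "c_1\tbin.1\t5000\n"
    "c_2\tunbinned\t1200\n"
    "c_3\tbin.unbinned\t900\n"
)
filtered = drop_unbinned(table)
print(filtered)
```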
I am having an issue with "refinem outliers" where I'm getting an error: "invalid value encountered in double_scalars." This originally happened in v0.0.22, but after looking through the GitHub issues page I updated to v0.0.23 and still got the same error.
I examined my scaffold_stats output, and found that the coverage values in coverage.tsv were all zero (in both v0.0.22 and v0.0.23). I looked at some output on a different dataset run on v0.0.21 and did not see this.
It is possible this is an issue with how I am calling "refinem scaffold_stats", but I'm not getting a warning that it isn't reading my files correctly (refinem scaffold_stats appears to run without error).