too many open files #118
Comments
Hi Richard, the large number (509) indicates that many paired reads are extremely far apart in the file (or there are no mates at all, e.g. all reads have the same direction or are not properly named).
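One workaround not mentioned at this point in the thread, assuming the error is the OS per-process file-descriptor cap (which is what "Too many open files" usually means): raise the shell's soft limit before running the pipeline. The value below is illustrative; check your system's hard limit first.

ulimit -n        # show the current soft limit on open file descriptors
ulimit -n 4096   # raise it for this shell session (illustrative value)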
Hi. Since this is a cancer sample, there are lots of regions with very high coverage. The average coverage is just over 100X, but there are lots of regions higher than 1000X. In total there are over 3 billion reads. The alignment rate is above 95%, and over 98% of the aligned reads are mapped in proper pairs with a mean insert size of 415bp. By our standards, these are pretty good stats for a human genome library. Any ideas?
Try to increase --hash-table-size as well.
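A sketch of what that suggestion amounts to, assuming it was --hash-table-size that needed raising (the flag Richard adds in his next attempt); input.bam and output.bam are placeholder names:

sambamba markdup --overflow-list-size 1000000 --hash-table-size 1000000 \
    --tmpdir sambamba_testing input.bam output.bam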
When we ran Picard, we capped the RAM usage at 25 Gigs. It used all of that and likely would have used a lot more had we allowed it. Using the sambamba pipe like the one in my original command, I've seen each process use 50 Gigs. If we move this to our production pipeline, we'll need to be able to merge and mark duplicates in 60 Gigs total. Do you think that is possible?
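For reference, capping Picard's RAM is presumably done through the JVM heap flag; a minimal sketch, assuming MarkDuplicates was the Picard tool in question and with placeholder file names:

java -Xmx25g -jar picard.jar MarkDuplicates \
    INPUT=merged.bam OUTPUT=marked.bam METRICS_FILE=dup_metrics.txt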
Even 30 Gigs total should be possible. I've fixed a few leaks recently (#116), so peak memory consumption of the latest binary build should be significantly lower.
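One way to verify the lower peak usage empirically (not part of the thread's advice) is GNU time, which reports the maximum resident set size; file names here are placeholders:

/usr/bin/time -v sambamba markdup input.bam output.bam 2> markdup_time.log
grep "Maximum resident set size" markdup_time.log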
Thanks, I'll give it a whirl and let you know what I find.
Hi again. Looks like I got farther this time, but ended up with a different error:

time ./sambamba_02_02_2015 merge /dev/stdout P*bam | ./sambamba_02_02_2015 markdup --overflow-list-size 1000000 --hash-table-size 1000000 --tmpdir sambamba_testing /dev/stdin sambamba_marked.bam
Ouch. Streaming input is not supported by this tool: it makes a list of file offsets and then reads the file again. Sorry for wasting 15h of computational time. I'm closing this issue and opening another one regarding the documentation.
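Since markdup records file offsets and then seeks back into its input, the fix implied here is two passes over a real intermediate file rather than a pipe; a minimal rework of the failing command under that assumption (merged.bam is a placeholder name):

sambamba merge merged.bam P*bam
sambamba markdup --overflow-list-size 1000000 --hash-table-size 1000000 \
    --tmpdir sambamba_testing merged.bam sambamba_marked.bam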
Original issue:
Hi.
I've been trying to merge and duplicate-mark a large data set. Once merged, the coverage will be about 120X.
Each time I try, I get the following error (or something similar):
"sambamba-markdup: sambamba_testing/sambamba-pid23155-nwfz/sorted.509.bam.vo: Too many open files"
I noticed in the help that I could reduce the number of open files by specifying a larger value for "--overflow-list-size". However, I still get the same error.
Here is the command I've been using - can you point out anything I can change to get past the error?
"sambamba_v0.5.1 merge /dev/stdout P*bam | sambamba_v0.5.1 markdup --overflow-list-size 1000000 --tmpdir sambamba_testing /dev/stdin sambamba_marked.bam
finding positions of the duplicate reads in the file...
sambamba-markdup: sambamba_testing/sambamba-pid23155-nwfz/sorted.509.bam.vo: Too many open files
"
Thanks,
Richard