Lower evidence for fusions with Arriba #41
Hi Prateek, STAR has an issue with the alignment of split reads when the paired-end mates overlap. You should run STAR with In addition, you can run STAR with Lastly, you can increase the value of I should note that the columns Regards,
Hi Sebastian, Thanks for your input here. I started several experiments based on your recommendations, but I am running into an issue where the STAR (2.7.3a) process does not signal to the system that it has finished. I have posted this issue on the STAR repository as well and will get back to you once it is resolved. Prateek
It seems the issue I was running into was caused by using a pre-packaged binary instead of building STAR from source. New experiments are underway.
Hi Sebastian, while I observe a clear boost in the signal of true-positive fusions when using --peOverlapNbasesMin 10 and --alignSplicedMateMapLminOverLmate 0.5, a few fusion events lose evidence. Do you happen to know why that might be happening?
fusions.discarded.tsv:SLC34A2 ROS1 +/+ -/- 4:25665939 6:117645578 CDS CDS translocation downstream downstream 0 0 1 18 63 low . . duplicates(1),min_support . . . .
fusions.tsv:SLC34A2 ROS1 +/+ -/- 4:25665952 6:117645578 splice-site splice-site translocation downstream downstream 0 4 1 15 63 high . . duplicates(1) ACCCACTCTTCAGGACTC-GGGATCAAGTGGTCAG___AGAGAGACACCAAAGGGAAGATTCTCTGTTTCTTCCAAGGGATTGGGAGATTGATTTTACTTCTCGGATTTCTCTACTTTTTCGTGTGCTCCCTGGATATTCTTAGTAGCGCCTTCCAGCTGGTTGGAG|ATGATTTTTGGATACCAGAAACAAGTTTCATACTTACTATTATAGTTGGAATATTTCTGGTTGTTACAATCCCACTGACCTTTG___TCTGGCATAGAAGATTAAAGAATCAAAAAAGTGCCAAGGAAGGGGTGACAGTGCTTATAAACGAAGACAAAGAGTTGGCTGAGCTGCGAGGTCTGGC out-of-frame PTLQDsgssgqretpkgrfsvsskglgdfyfsdfstfscapwiflvapsswle|mifgyqkqvsylll
When I turn those two flags on, I lose all evidence for this fusion.
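To find every fusion that loses evidence between two runs (not just SLC34A2-ROS1), one could sum split_reads1, split_reads2 and discordant_mates per fusion and compare. This is a minimal sketch, assuming the tab-separated fusions.tsv layout shown above with a header line starting with `#gene1`; column names may differ in other Arriba versions, and `supporting_reads`/`lost_support` are hypothetical helpers.

```python
import csv
import io

# Assumed Arriba fusions.tsv column names (from the header of the version
# used in this thread; other versions may differ).
KEY_COLUMNS = ("#gene1", "gene2", "breakpoint1", "breakpoint2")
SUPPORT_COLUMNS = ("split_reads1", "split_reads2", "discordant_mates")


def supporting_reads(tsv_text):
    """Map each fusion (genes + breakpoints) to split reads + discordant mates."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return {
        tuple(row[c] for c in KEY_COLUMNS): sum(int(row[c]) for c in SUPPORT_COLUMNS)
        for row in reader
    }


def lost_support(default_tsv, experiment_tsv):
    """Fusions whose total support drops in the experiment (missing counts as 0)."""
    before = supporting_reads(default_tsv)
    after = supporting_reads(experiment_tsv)
    return {k: (v, after.get(k, 0)) for k, v in before.items() if after.get(k, 0) < v}
```

Keying on exact breakpoints is a design choice: as the two rows above show, the same gene pair can be reported at slightly shifted coordinates (4:25665939 vs. 4:25665952), so keying on the gene pair alone may be more forgiving.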
STAR logs from the default run vs. the experiment. There seems to be a substantial difference in the number of chimeric reads: sample_pipeline_params_arriba_opt.txt
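To put numbers on that difference, the two Log.final.out files can be compared label by label. A minimal sketch, assuming the `label | value` layout STAR uses in Log.final.out; exact labels such as "Number of chimeric reads" may vary between STAR versions, and the helper names are hypothetical.

```python
def parse_star_log(text):
    """Parse STAR's Log.final.out into {label: raw value string}."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "CHIMERIC READS:" carry no value
        label, value = line.split("|", 1)
        stats[label.strip()] = value.strip()
    return stats


def as_number(value):
    """Convert '1.50%' -> 1.5 and '100' -> 100; None stays None."""
    if value is None:
        return None
    if value.endswith("%"):
        return float(value[:-1])
    try:
        return int(value)
    except ValueError:
        return float(value)


def compare_logs(default_text, experiment_text, labels):
    """Return {label: (default value, experiment value)} for the requested labels."""
    before = parse_star_log(default_text)
    after = parse_star_log(experiment_text)
    return {label: (as_number(before.get(label)), as_number(after.get(label)))
            for label in labels}
```

Only the requested labels are converted to numbers, so non-numeric entries such as timestamps never reach `as_number`.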
Looking at your log files, the parameter I have tested both parameters independently on some of my own test data and noticed that both of them occasionally cause chimeric reads to be lost (although there is a big net gain). The reasons why these reads are lost are not fully clear to me. The lost chimeric alignments seem to follow this pattern: STAR still aligns the non-chimeric segment of the read. The alignment is a few bases longer, but a substantial number of bases are still clipped. However, STAR does not seem to search for a chimeric alignment using the clipped segment, even though it maps uniquely to the genome (according to BLAT). If I find some time tomorrow, I can do some more tests. If you have the time, it would be helpful if you could send me a pair of reads that gets lost for the ROS1 fusion. When you run Arriba with Thanks a lot for bringing this to my attention. Perhaps I will need to reconsider whether and how to change the default parameters of STAR.
I am running a master parameter sweep across the following STAR parameters to find the best set (including the defaults) for a given sample with known fusions. Criteria I am evaluating on:
--chimSegmentMin
This will probably take a couple of weeks before I can summarize the results.
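A sweep like this amounts to the Cartesian product of candidate values per parameter. The sketch below only generates STAR argument lists to append to a base STAR command; the parameter names come from this thread, but the candidate values are placeholders, and `GRID`/`star_argument_sets` are hypothetical names.

```python
import itertools

# Hypothetical grid: parameter names from this thread, values are placeholders.
# 0 and 0.66 are meant as STAR's defaults for the last two parameters
# (verify against the STAR manual for your version).
GRID = {
    "--chimSegmentMin": [10, 12, 15],
    "--peOverlapNbasesMin": [0, 10],
    "--alignSplicedMateMapLminOverLmate": [0.5, 0.66],
}


def star_argument_sets(grid):
    """Yield one flat argument list per combination of parameter values."""
    names = sorted(grid)  # fixed order so runs are reproducible
    for values in itertools.product(*(grid[name] for name in names)):
        args = []
        for name, value in zip(names, values):
            args += [name, str(value)]
        yield args
```

Each yielded list would be appended to the otherwise unchanged STAR command, one alignment job per combination.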
Wow, this is very useful. Thanks a lot, very much appreciated! To save some computation, you might want to limit the repeated realignment to reads that are either unmapped or have clipped bases. It probably makes no sense to realign reads that map perfectly to the reference genome, because they would likely be mapped to the same locus regardless of the chimeric parameters. Just a thought.
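The suggested pre-filter can be expressed as a check on each SAM record: the unmapped flag (0x4) or a soft clip (`S`) in the CIGAR string. A minimal sketch operating on plain SAM text lines, assuming only soft clips matter and that `min_clip` is a hypothetical threshold; `needs_realignment` is not part of STAR or Arriba.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
UNMAPPED_FLAG = 0x4  # SAM FLAG bit: segment unmapped


def needs_realignment(sam_line, min_clip=1):
    """True if the record is unmapped or has at least min_clip soft-clipped bases."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & UNMAPPED_FLAG or cigar == "*":
        return True
    soft_clipped = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")
    return soft_clipped >= min_clip
```

For paired-end data, both mates would presumably be kept whenever either one passes the check, so that STAR can still pair them.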
Hi Sebastian, Thanks for the recommendation. Yes, I have limited the analysis to reads that are unmapped or have clipped bases. Just wanted to let you know that our cluster is extremely busy, so this will have to wait for at least 2-3 months. Here is the set of reads that gets lost:
Hi Prateek, Thanks for following up with the sequences. Can you also send me the full STAR command? When I run the latest version of STAR, the reads are aligned regardless of whether I use the old or the optimized parameters. Moreover, can you confirm that the reads are indeed not aligned with the optimized parameters? I want to rule out that Arriba simply ignores them for some reason. So far I have made only moderate progress in investigating this issue. Here is what I have found so far: Apparently, some reads are not aligned when the Regards,
Yes, that's what I have been seeing as well in the parameter sweep for STAR completed so far. I am not moving forward with the My current STAR command is:
I have a separate step that sorts the BAMs before submitting them to Arriba. This is because we have four FASTQs (two R1 and two R2 from different lanes): two STAR jobs are run, followed by sort+merge, and the merged BAM is then submitted to Arriba.
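The sort+merge step could be scripted roughly as follows. This sketch only builds samtools command lines (each could be run with `subprocess.run(cmd, check=True)`); file names, the thread count and the helper name `sort_merge_commands` are hypothetical. Each lane BAM is coordinate-sorted first because `samtools merge` expects sorted inputs.

```python
from pathlib import Path


def sort_merge_commands(lane_bams, merged_bam, threads=4):
    """Build (not run) samtools commands: sort each lane BAM, then merge them."""
    commands, sorted_bams = [], []
    for bam in map(Path, lane_bams):
        sorted_bam = bam.with_name(bam.stem + ".sorted.bam")
        commands.append(["samtools", "sort", "-@", str(threads),
                         "-o", str(sorted_bam), str(bam)])
        sorted_bams.append(str(sorted_bam))
    # Merging coordinate-sorted inputs yields a coordinate-sorted output.
    commands.append(["samtools", "merge", "-@", str(threads),
                     str(merged_bam), *sorted_bams])
    return commands
```

As an alternative, STAR accepts comma-separated lists of lane files in `--readFilesIn` (all R1 files, then all R2 files), which would produce one BAM per sample and skip the merge; whether that suits this pipeline is an open question.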
I spot-checked a few of the reads I sent in the STAR output files Unmapped.out.mate*, and I cannot find them there.
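Instead of spot-checking, all reported read names can be looked up in the Unmapped.out.mate* files at once. A minimal sketch, assuming these files are 4-line FASTQ records (unwrapped sequences) and that mate suffixes like `/1` or a trailing comment field may follow the read name; `find_reads_in_fastq` is a hypothetical helper.

```python
def find_reads_in_fastq(lines, wanted_ids):
    """Return the subset of wanted_ids whose name appears as a FASTQ header."""
    wanted = set(wanted_ids)
    found = set()
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # only lines 0, 4, 8, ... are headers; quality lines may start with '@'
        fields = line[1:].split()  # drop '@' and split off any comment field
        if not fields:
            continue
        read_id = fields[0]
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        if read_id in wanted:
            found.add(read_id)
    return found
```

Usage: `with open("Unmapped.out.mate1") as fh: find_reads_in_fastq(fh, read_ids)`, repeated for mate2.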
For completeness, here are the values I observe for unmapped reads with and without peOverlapNbasesMin:
Hi Prateek, I cannot reproduce the alignment behavior with the example reads you sent me. Even when I use the STAR parameters you stated, the reads always align the same way. But after closely inspecting the fusion predictions of two of my own samples (one with 50 nt read length, the other with 100 nt), I now have a better overview of the negative effects of the parameters When the above-mentioned parameters are adjusted as I suggested in the beginning, STAR prefers to extend the alignment by a couple of bases (3-10) instead of clipping the read at the breakpoint and making a chimeric alignment. The extended alignment usually contains one or two mismatches and is eventually clipped as well (when too many mismatches accumulate), but the clipped segment is shorter. I am not sure why this shorter clipped segment is not used for a chimeric alignment. In my impression, it was often long enough to be mappable. There is no single parameter to blame for this, Probably, only the developer of STAR can explain this behavior and what to do about it. I'll collect some example reads and consult him.
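To collect example reads of the kind described (an extended alignment whose terminal clipped segment is still long enough to be mappable), one could first flag alignments whose leading or trailing soft clip reaches a minimum segment length. A minimal sketch: `min_segment` stands in for whatever `--chimSegmentMin` value was used, hard clips are ignored because they are not part of the stored sequence, and both helper names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def end_clips(cigar):
    """Return (leading, trailing) soft-clip lengths of a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    leading = ops[0][0] if ops and ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing


def clip_long_enough(cigar, min_segment):
    """True if either terminal soft clip could form a chimeric segment of min_segment bases."""
    return max(end_clips(cigar)) >= min_segment
```

For instance, an alignment extended from 60M40S to 67M33S still leaves a 33-base clip; reads flagged this way that have no accompanying chimeric alignment would be candidates to send to the STAR developer.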
Hi Prateek, I reported the loss of some chimeric alignments caused by Regards,
Hi Prateek, Arriba 2.0.0 has been released and the optimized alignment parameters ( Regards,
Hi,
We have been consistently observing lower evidence (split_reads) for fusions detected in cell line and solid tumor samples compared with FusionCatcher or Pizzly. Is this expected, or is there parameter tuning that can regulate this behavior?
Thanks,
Prateek