How are secondary assignments chosen? #1285
Replies: 3 comments
-
Hi Matthew, by default, there is no specific algorithmic logic to choose the primary alignment.
If you want to make it truly random, you would need to use Cheers
-
Is the alignment detection by STAR done in a sequential manner, i.e. does it start matching at the beginning of chr1 and search through until it finds a hit? Or is it more random than that? Take for instance the primary alignment referenced above:
Say the exact same sequence existed on chr1 (or any of the other chromosomes before chr21); would this be found first?
-
Hi Matthew, it's "more random": it actually depends on the order in which anchor seeds are found, which in turn depends on the ordering of genome suffixes in the suffix array. If you are interested in all the top alignments, you cannot rely on STAR's ordering, and you need to reorder them according to your needs. Cheers
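For anyone who wants a reproducible order, here is a minimal Python sketch of the "reorder them according to your needs" step. It is not STAR's logic: the function names (`sam_tag`, `reorder_alignments`) and the sort key (highest `AS` first, then reference name, then position) are assumptions chosen for illustration. It takes the SAM lines of a single read, sorts them, and sets or clears the secondary bit (0x100) so that the first record becomes the primary.

```python
def sam_tag(fields, name, default=None):
    """Return the first value of an optional SAM tag (e.g. AS), or default."""
    for field in fields[11:]:
        if field.startswith(name + ":"):
            return field.split(":", 2)[2]
    return default


def reorder_alignments(sam_lines):
    """Sort one read's alignments deterministically and re-mark the primary.

    Hypothetical ordering: best AS first, then RNAME (plain string order,
    so "chr10" sorts before "chr2"), then POS.
    """
    records = [line.rstrip("\n").split("\t") for line in sam_lines]
    records.sort(key=lambda r: (-int(sam_tag(r, "AS", 0)), r[2], int(r[3])))
    for index, record in enumerate(records):
        flag = int(record[1])
        if index == 0:
            flag &= ~0x100  # clear the secondary bit: this record is primary
        else:
            flag |= 0x100   # set the secondary bit on every other record
        record[1] = str(flag)
    return ["\t".join(record) for record in records]
```

The sketch leaves the `HI` tags untouched, so they will no longer match the new order; renumber them as well if downstream tools rely on them.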
-
I am currently dealing with a lot of multimappers, and I would just like to better understand how the primary alignment is chosen over others, when the score is identical, as is (seemingly) everything else about the mapping.
I will give an example. This is the alignment assigned as "primary":
D00364:490:HKLHCBCX3:1:1108:18109:34469 16 chr21 8214838 0 7S105M * 0 0 GGGATCGGGGATCGAGACCCGTCGCCGCGCTCTCCCCCCTCCCGGCGCCCACCCCCGCGGGGAATCCCCCGCGAGGGGGGTCTCCCCGGCGGGGGCGCGCCGGCGTCTCCTC HHIHHIIIIHHIIHHHHHDEHEIIHIIHGHEDDHHHFD</EDDHIHGDIHIHHEHHDHHFGFHHCIIHHDGIIIIIHHEG@?E<HHHIIIIIIIIHIHDDDEGIHEEGF@F< NH:i:22 HI:i:1 AS:i:99 nM:i:2 NH:i:0 HI:i:0 AS:i:99 nM:i:2 uT:A:3
(Note, this is a remap from a previous mapping, hence the duplicate tags)
There are lots of other alignments assigned as "secondary", but in particular I was interested in these six, which have not only the same score but also identical values for all the other tags (the primary alignment is repeated first for comparison):
D00364:490:HKLHCBCX3:1:1108:18109:34469 16 chr21 8214838 0 7S105M * 0 0 GGGATCGGGGATCGAGACCCGTCGCCGCGCTCTCCCCCCTCCCGGCGCCCACCCCCGCGGGGAATCCCCCGCGAGGGGGGTCTCCCCGGCGGGGGCGCGCCGGCGTCTCCTC HHIHHIIIIHHIIHHHHHDEHEIIHIIHGHEDDHHHFD</EDDHIHGDIHIHHEHHDHHFGFHHCIIHHDGIIIIIHHEG@?E<HHHIIIIIIIIHIHDDDEGIHEEGF@F< NH:i:22 HI:i:1 AS:i:99 nM:i:2 NH:i:0 HI:i:0 AS:i:99 nM:i:2 uT:A:3
D00364:490:HKLHCBCX3:1:1108:18109:34469 272 chr21 8259067 0 7S105M * 0 0 GGGATCGGGGATCGAGACCCGTCGCCGCGCTCTCCCCCCTCCCGGCGCCCACCCCCGCGGGGAATCCCCCGCGAGGGGGGTCTCCCCGGCGGGGGCGCGCCGGCGTCTCCTC HHIHHIIIIHHIIHHHHHDEHEIIHIIHGHEDDHHHFD</EDDHIHGDIHIHHEHHDHHFGFHHCIIHHDGIIIIIHHEG@?E<HHHIIIIIIIIHIHDDDEGIHEEGF@F< NH:i:22 HI:i:3 AS:i:99 nM:i:2 NH:i:0 HI:i:0 AS:i:99 nM:i:2 uT:A:3
D00364:490:HKLHCBCX3:1:1108:18109:34469 272 chr21 8442102 0 7S105M * 0 0 GGGATCGGGGATCGAGACCCGTCGCCGCGCTCTCCCCCCTCCCGGCGCCCACCCCCGCGGGGAATCCCCCGCGAGGGGGGTCTCCCCGGCGGGGGCGCGCCGGCGTCTCCTC HHIHHIIIIHHIIHHHHHDEHEIIHIIHGHEDDHHHFD</EDDHIHGDIHIHHEHHDHHFGFHHCIIHHDGIIIIIHHEG@?E<HHHIIIIIIIIHIHDDDEGIHEEGF@F< NH:i:22 HI:i:2 AS:i:99 nM:i:2 NH:i:0 HI:i:0 AS:i:99 nM:i:2 uT:A:3
D00364:490:HKLHCBCX3:1:1108:18109:34469 272 chr22_KI270733v1_random 131160 0 7S105M * 0 0 GGGATCGGGGATCGAGACCCGTCGCCGCGCTCTCCCCCCTCCCGGCGCCCACCCCCGCGGGGAATCCCCCGCGAGGGGGGTCTCCCCGGCGGGGGCGCGCCGGCGTCTCCTC HHIHHIIIIHHIIHHHHHDEHEIIHIIHGHEDDHHHFD</EDDHIHGDIHIHHEHHDHHFGFHHCIIHHDGIIIIIHHEG@?E<HHHIIIIIIIIHIHDDDEGIHEEGF@F< NH:i:22 HI:i:13 AS:i:99 nM:i:2 NH:i:0 HI:i:0 AS:i:99 nM:i:2 uT:A:3
D00364:490:HKLHCBCX3:1:1108:18109:34469 272 chr22_KI270733v1_random 176239 0 7S105M * 0 0 GGGATCGGGGATCGAGACCCGTCGCCGCGCTCTCCCCCCTCCCGGCGCCCACCCCCGCGGGGAATCCCCCGCGAGGGGGGTCTCCCCGGCGGGGGCGCGCCGGCGTCTCCTC HHIHHIIIIHHIIHHHHHDEHEIIHIIHGHEDDHHHFD</EDDHIHGDIHIHHEHHDHHFGFHHCIIHHDGIIIIIHHEG@?E<HHHIIIIIIIIHIHDDDEGIHEEGF@F< NH:i:22 HI:i:14 AS:i:99 nM:i:2 NH:i:0 HI:i:0 AS:i:99 nM:i:2 uT:A:3
D00364:490:HKLHCBCX3:1:1108:18109:34469 272 chrUn_GL000220v1 114304 0 7S105M * 0 0 GGGATCGGGGATCGAGACCCGTCGCCGCGCTCTCCCCCCTCCCGGCGCCCACCCCCGCGGGGAATCCCCCGCGAGGGGGGTCTCCCCGGCGGGGGCGCGCCGGCGTCTCCTC HHIHHIIIIHHIIHHHHHDEHEIIHIIHGHEDDHHHFD</EDDHIHGDIHIHHEHHDHHFGFHHCIIHHDGIIIIIHHEG@?E<HHHIIIIIIIIHIHDDDEGIHEEGF@F< NH:i:22 HI:i:18 AS:i:99 nM:i:2 NH:i:0 HI:i:0 AS:i:99 nM:i:2 uT:A:3
D00364:490:HKLHCBCX3:1:1108:18109:34469 272 chrUn_GL000220v1 158276 0 7S105M * 0 0 GGGATCGGGGATCGAGACCCGTCGCCGCGCTCTCCCCCCTCCCGGCGCCCACCCCCGCGGGGAATCCCCCGCGAGGGGGGTCTCCCCGGCGGGGGCGCGCCGGCGTCTCCTC HHIHHIIIIHHIIHHHHHDEHEIIHIIHGHEDDHHHFD</EDDHIHGDIHIHHEHHDHHFGFHHCIIHHDGIIIIIHHEG@?E<HHHIIIIIIIIHIHDDDEGIHEEGF@F< NH:i:22 HI:i:19 AS:i:99 nM:i:2 NH:i:0 HI:i:0 AS:i:99 nM:i:2 uT:A:3
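As a quick check of what distinguishes these records, the FLAG column can be decoded bit by bit. The sketch below is only an illustration (the `describe_flag` helper is a hypothetical name); it relies on the SAM specification's definitions of bit 0x10 (read is reverse complemented) and bit 0x100 (secondary alignment).

```python
REVERSE = 0x10     # SAM flag bit: SEQ is reverse complemented
SECONDARY = 0x100  # SAM flag bit: secondary alignment


def describe_flag(flag):
    """Report the two flag bits that matter in the records above."""
    return {
        "reverse_strand": bool(flag & REVERSE),
        "secondary": bool(flag & SECONDARY),
    }


for flag in (16, 272):
    print(flag, describe_flag(flag))
```

Flag 16 is a reverse-strand primary alignment, and 272 (256 + 16) is the same orientation with the secondary bit set, so the flag is the only field besides the locus and the `HI` index that differs between the records.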
How does STAR decide that one is primary and the others aren't? Is it random, or is there some logic to it? I thought it might have to do with the indexing, i.e. whichever comes first in the index is marked as primary, but in other cases involving the same loci, the primary alignment isn't always the same. If there is no reason for one being primary over the others, I know I can use the
--outSAMprimaryFlag AllBestScore
option to keep them all; I just wondered whether there was some reason I am missing.
Thanks in advance for the advice!
Matthew