I hope this message finds you well. I am currently exploring the smoothxg algorithm. I came across a statement in the paper that raised some questions for me:
"A key issue is that pairwise alignments derived across our input are not mutually normalized, leading to different representations of small variants like indels in low-complexity sequences, which in turn generate complex looping motifs that are difficult to process."
I am particularly interested in understanding what is meant by "complex looping motifs" in this context. Could you provide a simple example or elaborate on the nature of these motifs? I am eager to gain a deeper insight into this aspect of the algorithm.
Thank you for your time and assistance. I appreciate the work you have put into smoothxg, and I look forward to hearing from you.
You can see this directly by saving the output of pggb after each step and comparing the seqwish graph with the final graph. In the seqwish graph you will find motifs in short tandem repeats that are extremely dense and complex. For instance, a single C node might represent an entire 20 bp homopolymer with many diverse alleles. This isn't necessarily wrong, but it's a representation that can be hard to reason about and work with. It's hard to visualize, and it doesn't match the MSA models that most people are used to.
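To make the collapsing concrete, here is a minimal toy sketch in Python. It is not seqwish's implementation: the three sequences, the hand-written match lists, and the union-find helpers are all assumptions made up for illustration. It merges every pair of matched characters into one node, and shows how two pairwise alignments that place the same 1 bp deletion at different ends of a homopolymer cause the whole run of C's to end up in a single node with a self-loop.

```python
# Toy sequences (hypothetical): A and C carry CCC, B carries CC.
seqs = {"A": "GCCCT", "B": "GCCT", "C": "GCCCT"}

# Pairwise matches as (name1, pos1, name2, pos2). These are written by hand
# to mimic alignments that are not mutually normalized.
matches = [
    # A vs B: the gap is placed at the LEFT end of the C run.
    ("A", 0, "B", 0), ("A", 2, "B", 1), ("A", 3, "B", 2), ("A", 4, "B", 3),
    # A vs C: identical sequences, matched position by position.
    *[("A", i, "C", i) for i in range(5)],
    # B vs C: the gap is placed at the RIGHT end of the C run.
    ("B", 0, "C", 0), ("B", 1, "C", 1), ("B", 2, "C", 2), ("B", 3, "C", 4),
]

parent = {}

def find(x):
    """Return the representative of x's equivalence class (union-find)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    """Merge the classes of a and b."""
    parent[find(a)] = find(b)

# Transitive closure: every matched pair of characters shares a node.
for n1, p1, n2, p2 in matches:
    assert seqs[n1][p1] == seqs[n2][p2]
    union((n1, p1), (n2, p2))

# Graph nodes are the equivalence classes of sequence positions.
nodes = {find((name, i)) for name, s in seqs.items() for i in range(len(s))}

# Graph edges follow each sequence from one character to the next.
edges = set()
for name, s in seqs.items():
    for i in range(len(s) - 1):
        edges.add((find((name, i)), find((name, i + 1))))

self_loops = {e for e in edges if e[0] == e[1]}
print(len(nodes), "nodes,", len(self_loops), "self-loop")
```

With a consistent gap placement in both alignments the C's would stay in separate nodes; it is the disagreement between the two alignments that chains A's three C positions together and produces the loop.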
Smoothxg's optimization goal is to ensure that the graph has a local partial order, where "local" is defined by the length parameter given to -G. This defaults to ~1 kbp. You can increase or decrease it, with the caveat that very large values are computationally prohibitive, because the partial order alignment algorithms we use are quadratic in sequence and graph length.
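As a rough illustration of what "has a partial order" means here, the sketch below checks whether a small directed graph admits a topological order, using Kahn's algorithm. This is not how smoothxg works internally: the function `is_partial_order` and both edge lists are hypothetical, chosen only to contrast a collapsed homopolymer node that loops on itself with an MSA-like layout of the same region.

```python
from collections import defaultdict, deque

def is_partial_order(edges):
    """Return True if the directed graph is acyclic (has a topological order)."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    # Any node left unvisited sits on a cycle (including a self-loop).
    return visited == len(nodes)

# Collapsed homopolymer: one C node that loops back onto itself.
looped = [("G", "C"), ("C", "C"), ("C", "T")]

# MSA-like layout: one node per column, and the shorter allele skips C1.
msa_like = [("G", "C1"), ("C1", "C2"), ("C2", "C3"), ("C3", "T"), ("G", "C2")]

print(is_partial_order(looped))
print(is_partial_order(msa_like))
```

The looped graph fails the check and the MSA-like one passes, which is the property smoothing aims for within each window of roughly the -G length.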