How to understand the optimisation goals of smoothxg #204

YX-Xiang · 2024-01-19T07:28:47Z

I hope this message finds you well. I am currently exploring the smoothxg algorithm, and I came across a statement in the paper that raised some questions for me:

"A key issue is that pairwise alignments derived across our input are not mutually normalized, leading to different representations of small variants like indels in low-complexity sequences, which in turn generate complex looping motifs that are difficult to process."

I am particularly interested in understanding what is meant by "complex looping motifs" in this context. Could you provide a simple example or elaborate on the nature of these motifs? I am eager to gain a deeper insight into this aspect of the algorithm.

Thank you for your time and assistance. I appreciate the work you have put into smoothxg, and I look forward to hearing from you.

ekg · 2024-01-19T10:03:14Z

You can see this directly by saving the output of pggb after each step and comparing the seqwish graph with the final graph. There will be motifs in short tandem repeats which become extremely dense and complex. For instance, a single C node might represent an entire 20 bp homopolymer with many diverse alleles. This isn't necessarily wrong, but it's a representation that can be hard to reason about, work with, and visualize, and it doesn't match the MSA models that people tend to understand.
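As a rough illustration of how this collapse can arise, here is a toy sketch in Python. It is not seqwish's implementation: the three sequences, the hand-written alignments, and the helper names (`find`, `union`, `node_path`) are all made up for the example. It assumes only that the graph is induced by merging every pair of bases that some pairwise alignment matches (a transitive closure), and that two alignments place the same one-base deletion at opposite ends of a homopolymer.

```python
# Toy model of transitive closure over pairwise alignments (illustrative only).
seqs = {"a": "GAAAAT", "b": "GAAAT", "c": "GAAAT"}

# Each alignment lists matched (offset_in_first, offset_in_second) pairs.
alignments = [
    # a vs b: the deleted A is placed at the left end of the run (a[1] unmatched)
    ("a", "b", [(0, 0), (2, 1), (3, 2), (4, 3), (5, 4)]),
    # a vs c: the same deletion placed at the right end of the run (a[4] unmatched)
    ("a", "c", [(0, 0), (1, 1), (2, 2), (3, 3), (5, 4)]),
    # b vs c: identical sequences, aligned base for base
    ("b", "c", [(i, i) for i in range(5)]),
]

# Union-find over every (sequence name, offset) base.
parent = {(name, i): (name, i) for name, s in seqs.items() for i in range(len(s))}

def find(x):
    """Return the representative of the set containing base x."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(x, y):
    """Merge the sets containing bases x and y."""
    parent[find(x)] = find(y)

# Merge every pair of matched bases; the resulting sets are the graph nodes.
for first, second, pairs in alignments:
    for i, j in pairs:
        assert seqs[first][i] == seqs[second][j]  # only identical bases are matched
        union((first, i), (second, j))

def node_path(name):
    """Sequence `name` expressed as a walk over graph nodes."""
    return [find((name, i)) for i in range(len(seqs[name]))]

nodes = {find(p) for p in parent}
path_a = node_path("a")
run_nodes = set(path_a[1:5])  # nodes visited by the AAAA run of sequence a
has_self_loop = any(u == v for u, v in zip(path_a, path_a[1:]))

print(len(nodes), len(run_nodes), has_self_loop)
```

In this toy case the whole graph has three nodes (G, A, T), the four-base A run of sequence `a` maps onto a single node, and the path through it needs a self-loop. Each alignment is valid on its own; the loop appears only because the two gap placements disagree.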

Smoothxg's optimization goal is to ensure that the graph has a local partial order, where "local" is defined by the length parameter given to -G. This defaults to ~1 kbp, but you can increase or decrease it, with the caveat that very large values are computationally prohibitive because the partial order alignment algorithms we use are quadratic in sequence and graph length.
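To get a feel for the quadratic cost, here is a back-of-the-envelope sketch in Python. It assumes a plain dynamic-programming model with one cell per (sequence base, graph base) pair, and that the sequence and the block's graph are both roughly as long as the block. The function name `poa_cells` and the block lengths are hypothetical, and smoothxg's actual aligners may behave differently.

```python
def poa_cells(seq_len: int, graph_len: int) -> int:
    """DP cells needed to align one sequence to a graph under the simple model above."""
    return seq_len * graph_len

for block_len in (1_000, 2_000, 10_000):
    # Assumption: sequence length and graph length are both about block_len.
    cells = poa_cells(block_len, block_len)
    print(f"{block_len:>6} bp block: {cells:,} cells per aligned sequence")
```

Under this model, doubling the block length quadruples the work per aligned sequence, and a tenfold increase costs a hundred times more.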
