Question over code #35

Closed
RenzoTale88 opened this issue May 11, 2023 · 3 comments

@RenzoTale88

Hi, I was having a look at the code in the script nanomonsv/count_sread_by_alignment.py. At lines 320-323 there is the following code:

        supporting_reads = [rname for rname in supporting_reads if \
            (alignment_info_var_1[rname][0] >= alignment_info_ref_1[rname][0] + self.var_ref_margin_thres and \
            alignment_info_var_1[rname][0] >= alignment_info_ref_2[rname][0] + self.var_ref_margin_thres) or \
            (alignment_info_var_2[rname][0] >= alignment_info_ref_1[rname][0] + self.var_ref_margin_thres and \
             alignment_info_var_2[rname][0] >= alignment_info_ref_1[rname][0] + self.var_ref_margin_thres)]

Lines 322 and 323 seem to perform the same check twice. Is this a mistake, and should the code instead be:

        supporting_reads = [rname for rname in supporting_reads if \
            (alignment_info_var_1[rname][0] >= alignment_info_ref_1[rname][0] + self.var_ref_margin_thres and \
            alignment_info_var_1[rname][0] >= alignment_info_ref_2[rname][0] + self.var_ref_margin_thres) or \
            (alignment_info_var_2[rname][0] >= alignment_info_ref_1[rname][0] + self.var_ref_margin_thres and \
             alignment_info_var_2[rname][0] >= alignment_info_ref_2[rname][0] + self.var_ref_margin_thres)]

Otherwise, perhaps line 323 should be dropped.
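
For illustration only (the numbers below are made up, not taken from the repository): with the duplicated alignment_info_ref_1 comparison, the second parenthesised clause collapses to a single check, so variant segment 2 is never required to beat reference segment 2:

    # Hypothetical scores for one read, just to show the effect of the duplicated conjunct.
    var_2, ref_1, ref_2, margin = 150, 100, 160, 20

    # As written on lines 322-323: both conjuncts compare against ref_1,
    # so the clause reduces to a single "var_2 >= ref_1 + margin" test.
    as_written = var_2 >= ref_1 + margin and var_2 >= ref_1 + margin   # True

    # With the proposed fix, ref_2 must also be cleared by the margin.
    proposed = var_2 >= ref_1 + margin and var_2 >= ref_2 + margin     # False

    print(as_written, proposed)  # True False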

Thanks
Andrea

@friend1ws
Owner

First, I appreciate your interest in Nanomonsv and your detailed examination of the code. I agree that we should modify the code at line 323. After implementing this adjustment, I've noticed a minor alteration in the results.

At the same time, I'm considering whether a more comprehensive reassessment of the entire logic is necessary. Please allow me some time before my next response. Anyway, thank you very much!

@RenzoTale88
Author

@friend1ws thanks for confirming this. Would it be possible to have an explanation of what these thresholds filter for?
thanks in advance for any help :)

@friend1ws
Owner

Hi, sorry for the very late response.
We have now thoroughly investigated this point for version 0.7.0.

During the validation step, we extract segments (200bp each) from around the breakpoints of structural variations (SVs) and check whether they exist in the sequence data of both the tumor and normal samples. However, some SVs come with insertions, so we generate two variant segments, using either the point before or the point after the insertion as the starting point, and perform alignment. This information is contained in alignment_info_var_1 and alignment_info_var_2.

At the same time, we also extract segments of approximately 200bp from the reference sequence around the breakpoint (reference segments) and check whether these reference segments exist in the sequence data of both the tumor and normal samples. This information is stored in alignment_info_ref_1 and alignment_info_ref_2.

For example, the following condition:
alignment_info_var_1[rname][0] >= alignment_info_ref_1[rname][0] + self.var_ref_margin_thres
implies "variant segment 1 has aligned with the read with "rname" ID, scoring higher than reference segment 1 with a sufficient margin (self.var_ref_margin_thres)".

If we were to implement the aforementioned conditions strictly in this procedure, I believe the correction you pointed out would be the right one.

However, we discovered that this condition can give a higher matching rate to unrelated, distant regions around the breakpoint, particularly when the breakpoint involves repeats, which in turn lowers sensitivity. On the other hand, for smaller deletions and insertions, applying this condition often proved necessary.

Thus, as a final decision, in version 0.7.0, we decided to apply this condition for small insertions and deletions (<= 300bp) and to completely remove it for other cases. We may further alter these conditions in the future, but with this method, I believe we have achieved a slight increase in sensitivity compared to before.
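
As a rough sketch of that policy (my paraphrase of the description above, not the actual version 0.7.0 code; the function name, SV-type strings, and size argument are illustrative):

    def use_var_ref_margin_filter(sv_type, sv_size):
        """Return True if the variant-vs-reference margin condition should be applied.

        Illustrative paraphrase only: the condition is kept for insertions and
        deletions of 300bp or less, and skipped entirely for other SVs, where
        repeats around the breakpoint can otherwise reduce sensitivity.
        """
        return sv_type in ("insertion", "deletion") and sv_size <= 300

    print(use_var_ref_margin_filter("deletion", 120))        # True: margin filter applied
    print(use_var_ref_margin_filter("translocation", 5000))  # False: filter skipped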

Anyway, thank you very much again!
