Question over code #35

Closed
RenzoTale88 opened this issue May 11, 2023 · 3 comments

@RenzoTale88

Hi, I was having a look at the code in the script nanomonsv/count_sread_by_alignment.py. At lines 320-323 there is the following code:

        supporting_reads = [rname for rname in supporting_reads if \
            (alignment_info_var_1[rname][0] >= alignment_info_ref_1[rname][0] + self.var_ref_margin_thres and \
            alignment_info_var_1[rname][0] >= alignment_info_ref_2[rname][0] + self.var_ref_margin_thres) or \
            (alignment_info_var_2[rname][0] >= alignment_info_ref_1[rname][0] + self.var_ref_margin_thres and \
             alignment_info_var_2[rname][0] >= alignment_info_ref_1[rname][0] + self.var_ref_margin_thres)]

Lines 322 and 323 seem to perform the same check twice. Is this a mistake, and should the code instead be:

        supporting_reads = [rname for rname in supporting_reads if \
            (alignment_info_var_1[rname][0] >= alignment_info_ref_1[rname][0] + self.var_ref_margin_thres and \
            alignment_info_var_1[rname][0] >= alignment_info_ref_2[rname][0] + self.var_ref_margin_thres) or \
            (alignment_info_var_2[rname][0] >= alignment_info_ref_1[rname][0] + self.var_ref_margin_thres and \
             alignment_info_var_2[rname][0] >= alignment_info_ref_2[rname][0] + self.var_ref_margin_thres)]

Otherwise, perhaps line 323 should be dropped.
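
For illustration only (the numbers below are made up, not taken from the repository): with the duplicated alignment_info_ref_1 comparison, the second parenthesised clause collapses to a single check, so variant segment 2 is never required to beat reference segment 2:

    # Hypothetical scores for one read, just to show the effect of the duplicated conjunct.
    var_2, ref_1, ref_2, margin = 150, 100, 160, 20

    # As written on lines 322-323: both conjuncts compare against ref_1,
    # so the clause reduces to a single "var_2 >= ref_1 + margin" test.
    as_written = var_2 >= ref_1 + margin and var_2 >= ref_1 + margin   # True

    # With the proposed fix, ref_2 must also be cleared by the margin.
    proposed = var_2 >= ref_1 + margin and var_2 >= ref_2 + margin     # False

    print(as_written, proposed)  # True False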

Thanks
Andrea

@friend1ws
Owner

First, I appreciate your interest in Nanomonsv and your detailed examination of the code. I agree that we should modify the code at line 323. After implementing this adjustment, I've noticed a minor alteration in the results.

At the same time, I'm considering whether a more comprehensive reassessment of the entire logic is necessary. Please allow me some time before my next response. Anyway, thank you very much!

@RenzoTale88
Author

@friend1ws thanks for confirming this. Would it be possible to have an explanation of what these thresholds filter for?
thanks in advance for any help :)

@friend1ws
Owner

Hi, sorry for the very late response.
We have now thoroughly investigated this point for version 0.7.0.

During the validation step, we extract segments (200bp each) from around the breakpoints of structural variations (SVs) and check whether they exist in the sequence data of both the tumor and normal samples. However, some SVs come with insertions, so we generate two variant segments, using either the point before or the point after the insertion as the starting point, and perform alignment. This information is contained in alignment_info_var_1 and alignment_info_var_2.

At the same time, we also extract segments of approximately 200bp from the reference sequence around the breakpoint (reference segments) and check whether these reference segments exist in the sequence data of both the tumor and normal samples. This information is stored in alignment_info_ref_1 and alignment_info_ref_2.

For example, the following condition:
alignment_info_var_1[rname][0] >= alignment_info_ref_1[rname][0] + self.var_ref_margin_thres
implies "variant segment 1 has aligned with the read with "rname" ID, scoring higher than reference segment 1 with a sufficient margin (self.var_ref_margin_thres)".

If we were to implement the aforementioned conditions strictly in this procedure, I believe the correction you pointed out would be the right one.

However, we discovered that this condition can give a higher matching rate to unrelated, distant regions around the breakpoint, particularly when the breakpoint involves repeats, which in turn lowers sensitivity. On the other hand, for smaller deletions and insertions, applying this condition often proved necessary.

Thus, as a final decision, in version 0.7.0, we decided to apply this condition for small insertions and deletions (<= 300bp) and to completely remove it for other cases. We may further alter these conditions in the future, but with this method, I believe we have achieved a slight increase in sensitivity compared to before.
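
As a rough sketch of that policy (my paraphrase of the description above, not the actual version 0.7.0 code; the function name, SV-type strings, and size argument are illustrative):

    def use_var_ref_margin_filter(sv_type, sv_size):
        """Return True if the variant-vs-reference margin condition should be applied.

        Illustrative paraphrase only: the condition is kept for insertions and
        deletions of 300bp or less, and skipped entirely for other SVs, where
        repeats around the breakpoint can otherwise reduce sensitivity.
        """
        return sv_type in ("insertion", "deletion") and sv_size <= 300

    print(use_var_ref_margin_filter("deletion", 120))        # True: margin filter applied
    print(use_var_ref_margin_filter("translocation", 5000))  # False: filter skipped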

Anyway, thank you very much again!
