[BUG] Wrong coordinates in results #114

Closed
1 of 3 tasks
fmalmeida opened this issue Feb 20, 2024 · 13 comments

@fmalmeida

fmalmeida commented Feb 20, 2024

Describe the bug
Hi,
First, thanks for the nice work on this tool. I have been using it in a pipeline of mine and it has been working very well.

Recently I tried using it with some Vibrio genomes, and it has shown problems with the annotation of integrons that occur at the very start of the sequences.

It often fails because it sometimes reports a result at the very first base and writes its start as 0 (0-based). In other cases it reports negative start positions, as below:

13      Integron_Finder integron        69515   74987   .       +       1       ID=integron_01;integron_type=complete
24      Integron_Finder integron        25      12675   .       +       1       ID=integron_01;integron_type=CALIN
25      Integron_Finder integron        19      9958    .       +       1       ID=integron_01;integron_type=CALIN
27      Integron_Finder integron        6936    9536    .       +       1       ID=integron_01;integron_type=complete
31      Integron_Finder integron        478     4564    .       +       1       ID=integron_01;integron_type=CALIN
32      Integron_Finder integron        66      4604    .       +       1       ID=integron_01;integron_type=CALIN
33      Integron_Finder integron        117     4047    .       +       1       ID=integron_01;integron_type=CALIN
37      Integron_Finder integron        -2      3108    .       +       1       ID=integron_01;integron_type=CALIN
38      Integron_Finder integron        2       2804    .       +       1       ID=integron_01;integron_type=CALIN
44      Integron_Finder integron        70      1709    .       +       1       ID=integron_01;integron_type=CALIN
46      Integron_Finder integron        -17     1603    .       +       1       ID=integron_01;integron_type=CALIN

I am therefore sharing the gbk files that integron_finder itself generated during the analysis, so that you can see the results and also have the contig sequences to reproduce the problem.

gbk_37_and_46.zip

To Reproduce

integron_finder --local-max --func-annot --pdf --gbk --cpu 4 vibrio31.fna

Expected behavior

The minimum allowed start position should be 1, never 0 or negative.
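
As a rough illustration of this expectation, the check below flags GFF rows whose start is below 1. It is only a sketch: `invalid_start_rows` is a hypothetical helper, not part of Integron_Finder, and it assumes the start is in the fourth column and that no field contains whitespace. The sample rows are taken from the output above.

```python
def invalid_start_rows(gff_lines):
    """Return (seqid, start) for every feature row whose start is below 1."""
    bad = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comments/directives
        fields = line.split()  # assumption: no field contains whitespace
        seqid, start = fields[0], int(fields[3])
        if start < 1:
            bad.append((seqid, start))
    return bad


rows = [
    "33\tIntegron_Finder\tintegron\t117\t4047\t.\t+\t1\tID=integron_01;integron_type=CALIN",
    "37\tIntegron_Finder\tintegron\t-2\t3108\t.\t+\t1\tID=integron_01;integron_type=CALIN",
    "46\tIntegron_Finder\tintegron\t-17\t1603\t.\t+\t1\tID=integron_01;integron_type=CALIN",
]
print(invalid_start_rows(rows))  # [('37', -2), ('46', -17)]
```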

OS:

  • Linux
  • Windows
  • Mac

Integron_Finder Version:

version 2.0.1

@jeanrjc
Contributor

jeanrjc commented Feb 20, 2024

Hello,

Could you share vibrio31.fna?

Thanks

@fmalmeida
Author

Hello hello,
Are the two problematic contigs shared in the two GenBank files (output of integron_finder) in the zip file not sufficient?
I am not sure I can share the whole genome (I can ask, if they are not sufficient).
Cheers.

@fmalmeida
Author

Here is the fna file of the genome, containing the two contigs (37 and 46).
vibrio31_subset.fna.gz

@jeanrjc
Contributor

jeanrjc commented Feb 20, 2024

Ah ok, I found the bug: there is a hit at the very first position, but the attC model is truncated. When a model is truncated, we correct the position so that the real start of the attC site is placed a bit earlier. Here that pushes the start before the beginning of the contig.
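
To make the mechanism concrete, here is a minimal sketch of how extending a truncated hit to the full model length can push the start before the first base. The function and parameter names (`extend_truncated_hit`, `model_start`, `model_end`, `model_len`) and the numbers are hypothetical and chosen for illustration only; this is not the code in infernal.py.

```python
def extend_truncated_hit(hit_start, hit_end, model_start, model_end, model_len):
    """Extend a hit that matched only model positions model_start..model_end
    (1-based, inclusive) so that it spans the whole model.

    No bounds check is done here, which is the point of the illustration.
    """
    missing_left = model_start - 1         # model positions not matched upstream
    missing_right = model_len - model_end  # model positions not matched downstream
    return hit_start - missing_left, hit_end + missing_right


# Hypothetical hit at the very first base of a contig that matched only
# positions 4..47 of a 47-position model: three positions are missing upstream.
print(extend_truncated_hit(1, 60, 4, 47, 47))  # (-2, 60)
```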

The bug is around L95 in infernal.py I think.

I don't have much time to fix that now, so feel free to propose a PR if you can. Otherwise, @bneron or I might try to fix it when we can.

Best

@bneron
Contributor

bneron commented Feb 21, 2024

I'm going to work on it

@bneron
Contributor

bneron commented Feb 21, 2024

If I understand the problem correctly, the position should be 0 in this case, shouldn't it?

@fmalmeida
Author

Actually, I believe it should be 1.

As far as I know, GenBank and GFF files use 1-based coordinates.
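
For reference, a small sketch of the two conventions involved: 0-based half-open intervals (as used by Python slicing) and 1-based inclusive intervals (as used by GFF and GenBank). The helper names are hypothetical.

```python
def to_one_based(start0, end0):
    """0-based half-open [start0, end0) -> 1-based inclusive [start, end]."""
    return start0 + 1, end0


def to_zero_based(start1, end1):
    """1-based inclusive [start1, end1] -> 0-based half-open [start0, end0)."""
    return start1 - 1, end1


seq = "ACGTACGT"
start1, end1 = to_one_based(0, 4)   # the first four bases
print(start1, end1)                 # 1 4
s0, e0 = to_zero_based(start1, end1)
print(seq[s0:e0])                   # ACGT
```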

@jeanrjc
Contributor

jeanrjc commented Feb 21, 2024

Yes, and we should also check the same case where the attC model is truncated at the end of a contig (not only at the start, as in this issue).
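
One way to cover both ends is to clamp the corrected coordinates to the contig boundaries. The sketch below assumes 1-based inclusive coordinates and a known contig length; `clamp_to_contig` is a hypothetical helper and the contig length of 5000 is made up, so this is not necessarily how the fix was implemented.

```python
def clamp_to_contig(start, end, contig_len):
    """Clamp 1-based inclusive coordinates to the range [1, contig_len]."""
    if contig_len < 1:
        raise ValueError("contig_len must be at least 1")
    return max(start, 1), min(end, contig_len)


print(clamp_to_contig(-17, 1603, 5000))   # (1, 1603)     start ran past the beginning
print(clamp_to_contig(4950, 5012, 5000))  # (4950, 5000)  end ran past the end
print(clamp_to_contig(70, 1709, 5000))    # (70, 1709)    already inside the contig
```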

bneron added a commit that referenced this issue Feb 22, 2024
when attC site starts at the first position and the model is truncated
coordinates are wrong (negative)

see #114
@bneron
Contributor

bneron commented Feb 22, 2024

@jeanrjc could you check the fix I just made

df.loc[idx, "pos_beg"] = df.loc[idx].apply(lambda x: max(x["pos_end_tmp"] - (len_model_attc - x["cm_fin"]),

bneron added a commit that referenced this issue Feb 22, 2024
test when attC site model is found on first or last position and
model is truncated
check that seq beg and end are correct
see #114
@jeanrjc
Contributor

jeanrjc commented Feb 22, 2024

> @jeanrjc could you check the fix I just made
>
> df.loc[idx, "pos_beg"] = df.loc[idx].apply(lambda x: max(x["pos_end_tmp"] - (len_model_attc - x["cm_fin"]),

It works for me! Thanks.

@jeanrjc
Contributor

jeanrjc commented Apr 25, 2024

Is this merged, @bneron?

@Ales-ibt

Ales-ibt commented May 7, 2024

Hello, I am getting the same error with IntegronFinder v2.0.2. I can see the bug was fixed, but the fix is not yet in the current release. Could you please add it to main?

@bneron
Contributor

bneron commented Jun 6, 2024

Fixed in integron_finder version 2.0.5.

@bneron bneron closed this as completed Jun 6, 2024
4 participants