Error GFF file #24
Comments
I have removed the last line with the word "(continue)" from F1021_gff_file.txt and then
Then AMRFinder worked:
For the format of the GFF file see https://github.com/ncbi/amr/wiki/Running-AMRFinderPlus#input-file-formats. How was this GFF file created? |
Hi Irene, We've heard of this issue before with regards to Prokka appending the assembly to the GFF. So I'm guessing you're using Prokka. As you discerned, you have to chop it off before passing the GFF to amrfinder. If I've guessed right about Prokka or the annotation output you're dealing with is in the same format, here's a couple of perl one liners that have worked for Prokka output before. This should get you a GFF file that will work for AMRFinderPlus:
If you don't have the nucleotide FASTA you can use this to get it from the Prokka GFF
Then you can run AMRFinderPlus in full combined mode like:
Thanks for posting the issue, and please let us know if the above one-liners work for you. If they do I'll add them to the documentation so other people won't have the same issue. There's an example GFF file distributed with the software if that will help (https://github.com/ncbi/amr/blob/master/test_prot.gff), I will see if I can improve the documentation of the GFF file format. Arjun |
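The split described above (annotation lines before the `##FASTA` marker, assembly sequence after it) can be sketched in Python. This is not the perl one-liner from the thread; it is a minimal equivalent, assuming the Prokka convention of a single `##FASTA` line separating the two parts. The function name `split_prokka_gff` and the example records are hypothetical.

```python
def split_prokka_gff(lines):
    """Split a Prokka-style GFF into (annotation_lines, fasta_lines).

    Assumes a single '##FASTA' marker line separates the annotation
    from the embedded assembly, as in Prokka output.
    """
    annotation, fasta = [], []
    in_fasta = False
    for line in lines:
        if not in_fasta and line.startswith("##FASTA"):
            in_fasta = True  # drop the marker line itself
            continue
        (fasta if in_fasta else annotation).append(line)
    return annotation, fasta


# Hypothetical miniature Prokka GFF, for illustration only.
example = [
    "##gff-version 3\n",
    "contig_1\tProdigal\tCDS\t1\t90\t.\t+\t0\tID=ABC_00001\n",
    "##FASTA\n",
    ">contig_1\n",
    "ATGAAACGCATTAGC\n",
]
annotation, fasta = split_prokka_gff(example)
print("".join(fasta), end="")
```

Note that a GFF whose `##FASTA` section has already been removed yields an empty FASTA list, so the nucleotide sequence has to be extracted from the untrimmed Prokka file.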
Hi Arjun,
Yes, you were right: the genome was annotated with Prokka. The first perl command worked for me, and so did AMRFinderPlus in full combined mode, but the second perl command just created an empty .fna file. I encourage you to add this to the documentation of the GFF file format to help other users.
Now I want to identify known and unknown mutations. Does AMRFinder find both, or just known mutations? The report generated with --mutation_all shows many mutations that are not shown in the output file. I don't understand how it works, as CmeR mutations are not shown in the output file, even in the point mutation report for the specific organism. Coverage of the reference sequence is 100% and identity to the reference sequence is 99.05%; could they be unknown mutations?
Thanks for your help and for keeping this tool updated!
Irene |
Hi Irene,
I'm not sure why the second perl script didn't work, but possibly it's because you stripped out the assembly before passing the GFF file to the one-liner.
AMRFinderPlus does not report mutations that are not in its database. Because it is specifically designed to probe for sets of curated genes and curated known resistance mutations, it will only probe for genes and mutations in the PD Reference Gene Database. The --mutation_all option is designed to distinguish four outcomes at each screened site: a known resistance-associated mutation, the database variant, an alternate residue at that site, or the site not being found at all. It does not call mutations at other sites in the gene or in other genes in the genome.
I will discuss with the team about possibly adding an option to identify all differences from the reference protein, but until now AMRFinderPlus has been very focused on only identifying resistance-associated elements known in wild bacteria. At this point we are not including laboratory-induced mutations in the database. If there is a published account of a resistance-associated mutation that we do not include, we may have missed it; please let us know if that's the case.
You can see which genes and mutations are probed by AMRFinderPlus by looking in the PD Reference Gene Catalog at https://www.ncbi.nlm.nih.gov/pathogens/isolates#/refgene/ In addition to what's in the Reference Gene Catalog, the AMRFinderPlus database includes HMMs and a tree structure for the genes, but those aren't relevant for point mutation identification.
We only have one mutation in cmeR for Campylobacter: cmeR_G86A, so if your assembly has that mutation and AMRFinderPlus is not detecting it, then we may have a bug. Other than that, AMRFinderPlus shouldn't be reporting novel sites.
Thanks for your interest and let us know if you have more questions. |
Hi Arjun, Up to now, AMRFinder satisfies my desires and I will use it in combination with other tools to find unknown mutations. Thanks!! Irene |
Hi, I changed my Prokka GFF file using this command:
perl -pe '/^##FASTA/ && exit; s/(\W)Name=/$1OldName=/i; s/ID=([^;]+)/ID=$1;Name=$1/' <prokka_output.gff> > <for_amrfinder.gff>
but I am still getting the same error. What should I do?
GFF file mismatch. HOSTNAME: ?
Thank you! |
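For readers who want to see what the perl one-liner above does step by step, here is a rough Python equivalent. It is a sketch, not a replacement tested against AMRFinderPlus: the function name `prokka_gff_for_amrfinder` and the example record are hypothetical, and unlike the perl pattern it stops the `ID=` value at a newline as well as at `;`.

```python
import re


def prokka_gff_for_amrfinder(lines):
    """Mimic the perl one-liner: stop at ##FASTA, rename any existing
    Name= attribute to OldName=, then copy the ID= value into a new Name=."""
    converted = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # everything from here on is the embedded assembly
        # First occurrence only, case-insensitive, like s/(\W)Name=/$1OldName=/i
        line = re.sub(r"(\W)Name=", r"\1OldName=", line, count=1, flags=re.IGNORECASE)
        # First occurrence only, like s/ID=([^;]+)/ID=$1;Name=$1/
        line = re.sub(r"ID=([^;\n]+)", r"ID=\1;Name=\1", line, count=1)
        converted.append(line)
    return converted


# Hypothetical Prokka-style record, for illustration only.
example = [
    "contig_1\tProdigal\tCDS\t1\t90\t.\t+\t0\tID=ABC_00001;Name=dnaA;locus_tag=ABC_00001\n",
    "##FASTA\n",
    ">contig_1\n",
    "ATGAAACGCATTAGC\n",
]
for line in prokka_gff_for_amrfinder(example):
    print(line, end="")
```

The net effect is that every feature's `Name=` attribute carries the same value as its `ID=`, which only helps if that value also matches the identifiers used in the protein FASTA file.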
Could you attach the files?
|
Hi, I am attaching the files here; please have a look. |
Thank you! The goal of a .gff-file is to link the .faa- and .fna-files. I have done this:
Then this worked:
|
Hi, first of all, thank you for your time. It works when I run the above command on each file individually, but I get errors when I run it in a loop over multiple files, and each GFF gives a different error.
*** ERROR *** HOSTNAME: ?
*** ERROR *** HOSTNAME: ?
GFF file mismatch. Thank you! |
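The thread does not show the loop that failed, so the cause cannot be confirmed here. One plausible source of a "GFF file mismatch" in batch runs is pairing a GFF with protein or nucleotide files from a different sample. The sketch below assumes a hypothetical naming convention (`<sample>.gff`, `<sample>.faa`, `<sample>.fna` in one directory) and only builds the command lines, using the `-p`, `-n`, `-g`, and `-o` options of `amrfinder`; the output file name is also an assumption.

```python
from pathlib import Path
import tempfile


def build_amrfinder_commands(sample_dir):
    """Build one amrfinder command per sample, using only files that
    share the same base name; samples with missing files are skipped."""
    commands, skipped = [], []
    for gff in sorted(Path(sample_dir).glob("*.gff")):
        faa = gff.with_suffix(".faa")  # protein FASTA for the same sample
        fna = gff.with_suffix(".fna")  # nucleotide FASTA for the same sample
        if not (faa.is_file() and fna.is_file()):
            skipped.append(gff.stem)
            continue
        output = gff.with_name(gff.stem + ".amrfinder.tsv")  # hypothetical name
        commands.append(
            ["amrfinder", "-p", str(faa), "-n", str(fna), "-g", str(gff), "-o", str(output)]
        )
    return commands, skipped


# Demonstration with empty placeholder files; nothing is executed.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sampleA.gff", "sampleA.faa", "sampleA.fna", "sampleB.gff"):
        Path(tmp, name).touch()
    commands, skipped = build_amrfinder_commands(tmp)

for command in commands:
    print(" ".join(command))
print("skipped:", skipped)
```

Each command list can then be passed to `subprocess.run(command, check=True)` once the pairing looks right.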
It should be
|
I am having trouble with the GFF file. First, it says "line 1924: 9 fields are expected in each line". I think the problem is that this file contains the contig sequences, so I deleted them and kept only the annotation information. Could that have been the problem?
But then it says "Protein id xxxxxx is not in the .gff file".
F1021_gff_file.txt
(this is a fragment of the GFF file, as this format is not supported as an attached file)
F1021_protein.txt
I read the information in "Known issues", but I still don't know what the .gff file should look like. Could you please tell me what the 9th field should be in the attached .gff example file?
Thanks!
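Both error messages quoted in this post can be checked for before running AMRFinderPlus. The following Python sketch flags lines that do not have 9 tab-separated fields, flags an embedded `##FASTA` section, and lists protein FASTA identifiers that have no matching `Name=` attribute in the 9th field. The assumption that the protein identifier must appear as `Name=` in the 9th field follows the linked input-format documentation and the one-liner in this thread; the function names and example data are hypothetical.

```python
def fasta_ids(fasta_lines):
    """Return the identifier (first word after '>') of each FASTA record."""
    return [line[1:].split()[0] for line in fasta_lines
            if line.startswith(">") and line[1:].strip()]


def check_gff(gff_lines, protein_ids):
    """Return a list of human-readable problems found in the GFF."""
    problems, names = [], set()
    for number, line in enumerate(gff_lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            problems.append(f"line {number}: embedded ##FASTA section starts here")
            break  # the rest is sequence, not annotation
        if not line or line.startswith("#"):
            continue  # blank lines, comments, and directives
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append(f"line {number}: {len(fields)} fields, expected 9")
            continue
        for attribute in fields[8].split(";"):
            key, _, value = attribute.partition("=")
            if key == "Name":
                names.add(value)
    for pid in sorted(set(protein_ids) - names):
        problems.append(f"protein id {pid} has no Name= entry in the GFF")
    return problems


# Hypothetical inputs, for illustration only.
gff = [
    "##gff-version 3\n",
    "contig_1\tProkka\tCDS\t1\t90\t.\t+\t0\tID=P1;Name=P1\n",
    "contig_1\tProkka\tCDS\t100\t190\n",
    "##FASTA\n",
    ">contig_1\n",
    "ATGAAACGCATTAGC\n",
]
proteins = [">P1 hypothetical protein\n", "MKRIS\n", ">P2\n", "MSTNP\n"]
for problem in check_gff(gff, fasta_ids(proteins)):
    print(problem)
```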