Error GFF file #24

ireneortega · 2020-04-17T19:32:59Z

I am having trouble with my GFF file. First, AMRFinder says "line 1924: 9 fields are expected in each line". I think the problem is that the file also contains the contig sequences, so I deleted them and kept only the annotation information. Could that have been the problem?

But, then it says "Protein id xxxxxx is not in the .gff file"
F1021_gff_file.txt
(this is a fragment of the GFF file, as the .gff format is not supported for attached files)

F1021_protein.txt

I read the information in "Known issues" but I still don't know what the .gff file should look like. Could you please tell me what the 9th field should look like in the attached .gff example file?

Thanks!

vbrover · 2020-04-17T21:20:04Z

I have removed the last line with the word "(continue)" from F1021_gff_file.txt and then

cat F1021_gff_file.txt | sed 's/;locus_tag=/;Name=/1' > aa.gff

Then AMRFinder worked:

amrfinder -p F1021_protein.txt -g aa.gff

For the format of the GFF file see https://github.com/ncbi/amr/wiki/Running-AMRFinderPlus#input-file-formats.
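To illustrate what that sed substitution does to the ninth field, here is a small sketch on a made-up feature line. The contig name, coordinates, and the ID `F1021_00001` are hypothetical; the assumption, based on the fix above, is that the `Name=` value has to equal the identifier used in the protein FASTA header.

```shell
# Hypothetical one-line GFF feature: the contig name, coordinates, and IDs are made up.
# Fields are tab-separated; the ninth field holds the attributes.
gff_line=$(printf 'contig_1\tProdigal\tCDS\t100\t400\t.\t+\t0\tID=cds1;locus_tag=F1021_00001;product=hypothetical protein')

# Same substitution as the sed command above, then keep only the ninth field.
ninth_field=$(printf '%s\n' "$gff_line" | sed 's/;locus_tag=/;Name=/' | cut -f9)

echo "$ninth_field"
# prints: ID=cds1;Name=F1021_00001;product=hypothetical protein
```

With this toy line, the protein FASTA would then need a header starting with `>F1021_00001` for the lookup to succeed.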

How was this GFF file created?

evolarjun · 2020-04-18T02:02:06Z

Hi Irene,

We've heard of this issue before: Prokka appends the assembly to the GFF, so I'm guessing you're using Prokka. As you discerned, you have to chop it off before passing the GFF to amrfinder. If I've guessed right about Prokka, or if the annotation output you're dealing with is in the same format, here are a couple of perl one-liners that have worked for Prokka output before.

This should get you a GFF file that will work for AMRFinderPlus:

perl -pe '/^##FASTA/ && exit; s/(\W)Name=/$1OldName=/i; s/ID=([^;]+)/ID=$1;Name=$1/' <prokka_output.gff>  > <for_amrfinder.gff>

If you don't have the nucleotide FASTA you can use this to get it from the Prokka GFF

perl -ne 'print if ($p); /^##FASTA/ && $p++' <prokka_output.gff> > <for_amrfinder.fna>

Then you can run AMRFinderPlus in full combined mode like:

amrfinder -p <protein.fa> -n <for_amrfinder.fna> -g <for_amrfinder.gff> > amrfinder_output.tsv
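For readers who would rather script the two steps, the following is a rough sed/awk rendering of them, tried on a toy Prokka-style file. It is a sketch and not the perl above verbatim: the function names, file names, and toy records are hypothetical, and unlike the perl version the `Name=` match is case-sensitive.

```shell
# Sketch only: sed/awk stand-ins for the two perl one-liners above.
# Function names and file names are hypothetical.

# Keep the annotation part of a Prokka GFF and copy each ID into a Name= attribute.
prokka_gff_for_amrfinder() {
    sed -e '/^##FASTA/,$d' \
        -e 's/\([^A-Za-z0-9_]\)Name=/\1OldName=/' \
        -e 's/ID=\([^;][^;]*\)/ID=\1;Name=\1/' "$1"
}

# Print the nucleotide FASTA that Prokka appends after the ##FASTA line.
prokka_gff_fasta() {
    awk 'found { print } /^##FASTA/ { found = 1 }' "$1"
}

# Toy Prokka-style input: contig name, coordinates, and IDs are made up.
tmp=$(mktemp -d)
printf '##gff-version 3\ncontig_1\tProdigal\tCDS\t1\t9\t.\t+\t0\tID=ABC_00001;Name=dnaA;locus_tag=ABC_00001\n##FASTA\n>contig_1\nATGAAATAA\n' > "$tmp/toy_prokka.gff"

prokka_gff_for_amrfinder "$tmp/toy_prokka.gff" > "$tmp/toy_for_amrfinder.gff"
prokka_gff_fasta "$tmp/toy_prokka.gff" > "$tmp/toy_for_amrfinder.fna"

cat "$tmp/toy_for_amrfinder.gff" "$tmp/toy_for_amrfinder.fna"
```

On the toy feature line the attributes become `ID=ABC_00001;Name=ABC_00001;OldName=dnaA;locus_tag=ABC_00001`, so the `Name=` value now carries the Prokka ID while the gene name is kept under `OldName=`.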

Thanks for posting the issue, and please let us know if the above one-liners work for you.

If they do, I'll add them to the documentation so other people won't have the same issue. There's an example GFF file distributed with the software if that will help (https://github.com/ncbi/amr/blob/master/test_prot.gff). I will see if I can improve the documentation of the GFF file format.

Arjun

ireneortega · 2020-04-18T10:50:37Z

Hi Arjun,

Yes, you were right, the genome was annotated with Prokka. The first perl command worked for me, and so did AMRFinderPlus in full combined mode. But the second perl command just created an empty .fna file. I encourage you to add these commands to the documentation of the GFF file format to help other users.

But now I want to identify known and unknown mutations. Does AMRFinder find both, or just known mutations? The report generated with --mutation_all shows many mutations that are not shown in the output file. I don't understand how it works, as CmeR mutations are not shown in the output file, even though in the point mutation report for the specific organism the coverage of the reference sequence is 100% and the identity to the reference sequence is 99.05%. Could they be unknown mutations?

Thanks for your help and for keeping this tool updated!

Irene

evolarjun · 2020-04-19T11:27:36Z

Hi Irene,

I'm not sure why the second perl script didn't work, but possibly it's because you stripped out the assembly before passing the GFF file to the one liner.
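A quick way to test that guess is to check whether the GFF handed to the second one-liner still contains a `##FASTA` line. The sketch below does this on two toy files; the helper name and the file names are hypothetical.

```shell
# Sketch: report whether a GFF still carries the embedded assembly.
# has_fasta_section is a hypothetical helper name.
has_fasta_section() {
    grep -q '^##FASTA' "$1"
}

# Two toy files: one with the appended assembly, one already stripped.
tmp=$(mktemp -d)
printf '##gff-version 3\n##FASTA\n>contig_1\nATGAAATAA\n' > "$tmp/with_assembly.gff"
printf '##gff-version 3\n' > "$tmp/stripped.gff"

for f in "$tmp/with_assembly.gff" "$tmp/stripped.gff"; do
    if has_fasta_section "$f"; then
        echo "$(basename "$f"): ##FASTA section present"
    else
        echo "$(basename "$f"): no ##FASTA section, nothing to extract"
    fi
done
```

If the check reports no `##FASTA` section, the extraction has nothing to print, which would explain an empty .fna file.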

AMRFinderPlus does not report mutations that are not in its database. Because it is specifically designed to probe for sets of curated genes and curated known resistance mutations, it will only probe for genes and mutations in the PD Reference Gene Database. The --mutation_all option is designed to distinguish, for each curated site, between finding a known resistance-associated mutation, finding the database variant, finding an alternate residue at that site, and not finding the site at all. It does not call mutations at other sites in the gene or in other genes in the genome.

I will discuss with the team about possibly adding an option to identify all differences from the reference protein, but until now AMRFinderPlus has been very focused on only identifying resistance-associated elements known in wild bacteria. At this point we are not including laboratory induced mutations in the database.

If there is a published account of a resistance-associated mutation that we do not include, we could have missed it; please let us know if that's the case.

You can see what genes/mutations are probed by AMRFinderPlus by looking in the PD Reference Gene Catalog at https://www.ncbi.nlm.nih.gov/pathogens/isolates#/refgene/

In addition to what's in the Reference Gene Catalog, the AMRFinderPlus database includes HMMs and a tree structure for the genes, but those aren't relevant for point mutation identification. We only have one mutation in cmeR for Campylobacter: cmeR_G86A, so if your assembly has that mutation and AMRFinderPlus is not detecting it, then we may have a bug. Other than that, AMRFinderPlus shouldn't be reporting novel sites.

Thanks for your interest and let us know if you have more questions.
Arjun

ireneortega · 2020-04-21T10:29:42Z

Hi Arjun,

So far, AMRFinder meets my needs, and I will use it in combination with other tools to find unknown mutations. Thanks!!

Irene

neelam19051 · 2022-06-24T09:24:19Z

Hi, I changed my Prokka GFF file using this command: perl -pe '/^##FASTA/ && exit; s/(\W)Name=/$1OldName=/i; s/ID=([^;]+)/ID=$1;Name=$1/' <prokka_output.gff> > <for_amrfinder.gff> but I am still getting the same error. What should I do?

amrfinder -p P_aeruginosa_ZPPH33.faa -g P_aeruginosa_ZPPH33_a.gff -n P_aeruginosa_ZPPH33.fna -O Pseudomonas_aeruginosa --plus

Running: amrfinder -p P_aeruginosa_ZPPH33.faa -g P_aeruginosa_ZPPH33_a.gff -n P_aeruginosa_ZPPH33.fna -O Pseudomonas_aeruginosa --plus
Software directory: '/home/bvs/anaconda3/envs/myenv/bin/'
Software version: 3.10.30
Database directory: '/home/bvs/anaconda3/envs/myenv/share/amrfinderplus/data/2022-05-26.1'
Database version: 2022-05-26.1
AMRFinder combined translated and protein and mutation search

GFF file mismatch.
*** ERROR ***
gff_check.cpp: Protein id "JMCBFLMO_00001_Chromosomal_replication_initiator_protein_DnaA" is not in the .gff-file

HOSTNAME: ?
SHELL: /bin/bash
PWD: /home/bvs/neelam/annotated/annotated/amrfinder_all
PATH: /home/bvs/anaconda3/envs/myenv/bin:/home/bvs/.local/bin:/home/bvs/bin:/home/bvs/anaconda3/condabin:/home/bvs/perl5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin
Progam name: gff_check
Command line: /home/bvs/anaconda3/envs/myenv/bin/gff_check P_aeruginosa_ZPPH33_a.gff -prot P_aeruginosa_ZPPH33.faa -dna P_aeruginosa_ZPPH33.fna -log /tmp/amrfinder.UxeqP3.log

Thank you!

vbrover · 2022-06-24T15:28:54Z

Could you attach the files?

P_aeruginosa_ZPPH33.faa 
P_aeruginosa_ZPPH33_a.gff
P_aeruginosa_ZPPH33.fna

neelam19051 · 2022-06-27T04:07:32Z

Hi, I am attaching the files here. Please have a look.

P_aeruginosa_ZPPH33.zip

vbrover · 2022-06-27T14:41:14Z

Thank you!

The goal of a .gff-file is to link the .faa- and .fna-files.
The software creating the .gff-files must use the sequence identifiers from the .faa- and .fna-files.

I have done this:

sed 's/^>\([^_]\+_[^_]\+\)_/>\1 /1'  P_aeruginosa_ZPPH33.faa > aa

Then this worked:

amrfinder  -p aa  -g P_aeruginosa_ZPPH33.gff  -n P_aeruginosa_ZPPH33.fna
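To spot this kind of mismatch before running amrfinder, one can list the protein identifiers that have no counterpart in the GFF. The sketch below assumes, as the fixes in this thread suggest, that the identifier is the header text up to the first whitespace and that it is matched against the `Name=` attribute in the ninth field. The helper names and toy records are hypothetical, and the header fix is written in a portable form of the sed command above.

```shell
# Sketch: find protein FASTA identifiers that are missing from the GFF.
# All function names are hypothetical helpers.

# Identifiers in a protein FASTA: header text up to the first whitespace.
faa_ids() {
    sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$1" | sort -u
}

# Name= values found in the ninth field of the GFF feature lines.
gff_names() {
    awk -F'\t' '!/^#/ && NF == 9 {
        n = split($9, attr, ";")
        for (i = 1; i <= n; i++)
            if (attr[i] ~ /^Name=/) print substr(attr[i], 6)
    }' "$1" | sort -u
}

# Print every FASTA identifier that has no matching Name= in the GFF.
missing_in_gff() {
    names=$(mktemp)
    gff_names "$2" > "$names"
    faa_ids "$1" | grep -Fvx -f "$names" || true
    rm -f "$names"
}

# Toy data: the header is the ID from the error message; the GFF line is made up.
tmp=$(mktemp -d)
printf '>JMCBFLMO_00001_Chromosomal_replication_initiator_protein_DnaA\nMSV\n' > "$tmp/toy.faa"
printf 'contig_1\tProdigal\tCDS\t1\t9\t.\t+\t0\tID=JMCBFLMO_00001;Name=JMCBFLMO_00001\n' > "$tmp/toy.gff"

echo "missing before header fix:"
missing_in_gff "$tmp/toy.faa" "$tmp/toy.gff"

# Portable form of the header fix: turn the second underscore into a space.
sed 's/^>\([^_][^_]*_[^_][^_]*\)_/>\1 /' "$tmp/toy.faa" > "$tmp/toy_fixed.faa"

echo "missing after header fix:"
missing_in_gff "$tmp/toy_fixed.faa" "$tmp/toy.gff"
```

Before the fix the whole header is treated as the identifier and is reported as missing; after the fix the identifier is `JMCBFLMO_00001` and nothing is reported.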

neelam19051 · 2022-07-02T11:20:21Z

Hi, first of all thank you for your time. It works when I run a single sample with the above command, but I get errors when I try to run it in a loop over multiple files, and each file gives a different error.

*** ERROR ***
Protein sequence looks like a nucleotide sequence

HOSTNAME: ?
SHELL: /bin/bash
PWD: /home/bvs/neelam/annotated/annotated/amrfinder_all
PATH: /home/bvs/anaconda3/envs/myenv/bin:/home/bvs/.local/bin:/home/bvs/bin:/home/bvs/anaconda3/condabin:/home/bvs/perl5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin
Progam name: fasta_check
Command line: /home/bvs/anaconda3/envs/myenv/bin/fasta_check P_aeruginosa_KBP_PA_F19.fna -aa -log /tmp/amrfinder.fH4Vy1.log
P_aeruginosa_KBP_PA_F19.gff
Running: amrfinder -p P_aeruginosa_KBP_PA_F19.gff -g P_aeruginosa_KBP_PA_F19.gff -n P_aeruginosa_KBP_PA_F19.gff
Software directory: '/home/bvs/anaconda3/envs/myenv/bin/'
Software version: 3.10.30
Database directory: '/home/bvs/anaconda3/envs/myenv/share/amrfinderplus/data/2022-05-26.1'
Database version: 2022-05-26.1
AMRFinder combined translated and protein search

include -O ORGANISM, --organism ORGANISM option to add mutation searches and suppress common proteins

*** ERROR ***
File P_aeruginosa_KBP_PA_F19.gff, line 1: FASTA should start with '>'

HOSTNAME: ?
SHELL: /bin/bash
PWD: /home/bvs/neelam/annotated/annotated/amrfinder_all
PATH: /home/bvs/anaconda3/envs/myenv/bin:/home/bvs/.local/bin:/home/bvs/bin:/home/bvs/anaconda3/condabin:/home/bvs/perl5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin
Progam name: fasta_check
Command line: /home/bvs/anaconda3/envs/myenv/bin/fasta_check P_aeruginosa_KBP_PA_F19.gff -aa -log /tmp/amrfinder.2EXvS5.log
P_aeruginosa_KCP_1.faa
Running: amrfinder -p P_aeruginosa_KCP_1.faa -g P_aeruginosa_KCP_1.faa -n P_aeruginosa_KCP_1.faa
Software directory: '/home/bvs/anaconda3/envs/myenv/bin/'
Software version: 3.10.30
Database directory: '/home/bvs/anaconda3/envs/myenv/share/amrfinderplus/data/2022-05-26.1'
Database version: 2022-05-26.1
AMRFinder combined translated and protein search

include -O ORGANISM, --organism ORGANISM option to add mutation searches and suppress common proteins

GFF file mismatch.
*** ERROR ***
File P_aeruginosa_KCP_1.faa, line 1: 9 fields are expected in each line

Thank you!

vbrover · 2022-07-02T15:35:08Z

Running: amrfinder -p P_aeruginosa_KBP_PA_F19.gff -g P_aeruginosa_KBP_PA_F19.gff -n P_aeruginosa_KBP_PA_F19.gff

It should be

amrfinder  -p P_aeruginosa_KBP_PA_F19.faa  -g P_aeruginosa_KBP_PA_F19.gff  -n P_aeruginosa_KBP_PA_F19.fna
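For the loop, one way to avoid passing the same file to all three options is to derive the three names from a single base name. The following is a dry-run sketch that only prints the commands; the helper name is hypothetical, and it assumes each sample has `.faa`, `.gff`, and `.fna` files sharing one base name.

```shell
# Dry-run sketch: print one amrfinder command per sample instead of running it.
# print_amrfinder_cmd is a hypothetical helper; drop the "echo" to execute for real.
print_amrfinder_cmd() {
    base=${1%.gff}    # strip the .gff extension to get the sample name
    echo amrfinder -p "$base.faa" -g "$base.gff" -n "$base.fna"
}

# Sample names taken from this thread; in a real directory this could be: for gff in *.gff
for gff in P_aeruginosa_ZPPH33.gff P_aeruginosa_KBP_PA_F19.gff; do
    print_amrfinder_cmd "$gff"
done
```

Looping over the `.gff` files only, rather than over every file in the directory, keeps the `.faa` and `.fna` files from being picked up as loop items.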

evolarjun closed this as completed Apr 21, 2020

ShresthaRima mentioned this issue Feb 19, 2021

redundant gff files #49

Closed
