Steve Bond edited this page Sep 17, 2016 · 4 revisions

--annotate, -ano

Description

Add feature annotations to sequences. The GenBank and EMBL flat file specifications provide for rich annotation, so the default output format from this tool is genbank; this can be overridden with the -o flag.

Arguments

The first two arguments are required and their order matters. The remaining arguments are optional and can be passed in any order; annotate automatically detects which is which.
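
One way to picture how order-independent optional arguments could be told apart is sketched below. This is not SeqBuddy's actual detection logic; the function name `sort_optional_args` and the three rules it applies (a lone '+' or '-' is a strand, anything containing '=' is a qualifier, everything else is a record pattern) are assumptions made for illustration.

```python
def sort_optional_args(args):
    """Group optional annotate-style arguments by their apparent role.

    Hypothetical heuristic, not taken from SeqBuddy:
      '+' or '-'        -> strand
      contains '='      -> qualifier
      anything else     -> regular expression selecting records
    """
    strand = None
    qualifiers = []
    patterns = []
    for arg in args:
        if arg in ("+", "-"):
            strand = arg
        elif "=" in arg:
            qualifiers.append(arg)
        else:
            patterns.append(arg)
    return strand, qualifiers, patterns


# The same optional arguments as usage example 3, deliberately shuffled.
print(sort_optional_args(["foo=bar", "Panxα4", "-", "hello=world"]))
```

A heuristic like this would misread a regular expression that happens to contain '=', so treat it only as a mental model of why the order of the optional arguments does not matter.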

New feature type ( str )

Any feature type (or 'key') can be specified, but only 16 characters will be printed to GenBank/EMBL format. Furthermore, to comply with the strict GenBank/EMBL specification, you must select from a set of approved feature keys. A warning is printed if you choose a key outside of the specification; it can be silenced with the -q flag.

Location ( str )

The start and end positions of the new feature should be passed in as a single string in the format 'start-end'. If the feature is compound (e.g., the coding sequence within a whole gene), multiple locations can be combined into a single string in the format 'start1-end1,start2-end2,start3-end3,...'.
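
The location format described above can be broken down with a few lines of Python. This is a minimal sketch of how such a string might be split into numeric spans; the function name `parse_location` and its validation rules are illustrative assumptions, not SeqBuddy's own parser.

```python
def parse_location(location):
    """Split 'start1-end1,start2-end2' into a list of (start, end) tuples.

    Illustrative helper only. Positions are treated as 1-based and
    inclusive, matching the way they appear in the GenBank output.
    """
    spans = []
    for part in location.split(","):
        start_text, end_text = part.strip().split("-")
        start, end = int(start_text), int(end_text)
        if start < 1 or end < start:
            raise ValueError(f"Invalid span: {part!r}")
        spans.append((start, end))
    return spans


print(parse_location("1-10"))        # simple feature
print(parse_location("1-10,20-30"))  # compound feature
```

Each tuple corresponds to one 'start..end' segment in the FEATURES table of the examples further down.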

Strand ( str )

Optional. If working with DNA, you can specify which strand the feature is on with the '+' (sense) or '-' (anti-sense) characters.
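
The effect of the strand argument on the written location can be sketched as follows. The helper `render_location` is hypothetical; it reproduces the location strings shown in usage examples 2 and 3 (including the reversed segment order on the anti-sense strand), but whether SeqBuddy builds them this way internally is an assumption.

```python
def render_location(spans, strand=None):
    """Render (start, end) spans as a GenBank-style location string.

    Hypothetical helper: compound features are wrapped in order(...),
    and anti-sense features in complement(...).
    """
    if strand == "-":
        # Usage example 2 lists anti-sense segments in reverse order.
        spans = list(reversed(spans))
    pieces = [f"{start}..{end}" for start, end in spans]
    if len(pieces) == 1:
        location = pieces[0]
    else:
        location = "order(" + ",".join(pieces) + ")"
    if strand == "-":
        location = f"complement({location})"
    return location


print(render_location([(1, 10)]))                 # 1..10
print(render_location([(1, 10), (20, 30)], "+"))  # order(1..10,20..30)
print(render_location([(1, 10), (20, 30)], "-"))  # complement(order(20..30,1..10))
```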

Qualifiers ( str )

Optional. Qualifiers are additional information or sub-features. While qualifiers are completely free-form in SeqBuddy, each feature key in the GenBank/EMBL specification has a fixed set of approved qualifiers. Represent each qualifier in the form 'qualifier_name=information'; there is no restriction on how many are specified.
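
As a rough illustration of how 'qualifier_name=information' strings relate to the qualifier lines in the output, consider the sketch below. The function names are hypothetical, and the choice to split on the first '=' only is an assumption rather than documented SeqBuddy behavior.

```python
def parse_qualifiers(args):
    """Turn 'name=value' strings into a dict mapping name -> list of values."""
    qualifiers = {}
    for arg in args:
        # partition() splits on the first '=' only, so values may contain '='.
        name, _, value = arg.partition("=")
        qualifiers.setdefault(name, []).append(value)
    return qualifiers


def qualifier_lines(qualifiers):
    """Format qualifiers the way they appear under a GenBank feature."""
    return [
        f'/{name}="{value}"'
        for name, values in qualifiers.items()
        for value in values
    ]


parsed = parse_qualifiers(["foo=bar", "hello=world"])
print(parsed)
print(qualifier_lines(parsed))
```

The two formatted lines match the '/foo="bar"' and '/hello="world"' entries shown in usage example 2.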

Specify records ( regular expression )

Optional. Specify which sequence(s) the new feature should be applied to. The pull_recs function is used to get the subset of sequences that will be affected, and regular expressions are understood. Multiple regular expressions can be passed in if desired.
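
The record selection step can be approximated with Python's re module. This sketch does not call pull_recs itself; `select_ids` is a hypothetical stand-in that assumes patterns are searched anywhere within the record ID, and that a record is kept if any one pattern matches.

```python
import re


def select_ids(record_ids, patterns):
    """Return the IDs matched by at least one regular expression."""
    return [
        rec_id
        for rec_id in record_ids
        if any(re.search(pattern, rec_id) for pattern in patterns)
    ]


ids = ["Mle-Panxα1", "Mle-Panxα4", "Mle-Panxα12"]

print(select_ids(ids, ["Panxα4"]))            # only Mle-Panxα4
print(select_ids(ids, ["Panxα1"]))            # matches Mle-Panxα1 and Mle-Panxα12
print(select_ids(ids, ["Panxα1$"]))           # anchored: only Mle-Panxα1
print(select_ids(ids, ["Panxα4", "Panxα12"])) # several patterns at once
```

Under these assumptions an unanchored pattern such as 'Panxα1' also selects 'Mle-Panxα12', so anchoring with '$' is worth considering when record names share a prefix.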

Examples

Input file 1: Mle-Panxα4_cds.gb

LOCUS       Mle-Panxα4              1275 bp    DNA              UNA 02-JAN-2015
DEFINITION  cDNA and genomic - ML129317a.
ACCESSION   Mle-Panxα4
VERSION     Mle-Panxα4
KEYWORDS    .
SOURCE
  ORGANISM  . . .
            .
FEATURES             Location/Qualifiers
     TMD1            82..144
     TMD2            391..453
     TMD3            643..705
     TMD4            913..1005
ORIGIN
        1 atggttattg agctgctagc tggatacaaa ggtctgtccc cgtttaaaga cgcgactgtt
       61 gacgactcat gggaccaaat aaaccgatgt tacgtgttca tcgccatggt ggtgatgggt
      121 gctgtgacta caatgaggca atactctgga acattgattg catgtgacgg gttcacgaag
      181 ttccaccctc agtttgcaga agattactgc tggagcatag gaatgtacac ggtacgcgag
      241 gcctatgact tgcccagcag tatggttgca taccccggag tgataccctg ggatatgcct
      301 gcatgtgttc cacgtctcct gaagaacgga accaggacca aatgtggcag tgagaaggac
      361 gttatgccct cagagaaaat ctaccacttg tggtaccagt gggcaagttt ctacttctgg
      421 atagtggcta tactgtacta cgcgccgtat ataatgttca aacagttggg agggggagag
      481 tacaagcccc tgatcaagct actttgtctt gcgtctggat ctcctgaaca acagatgcag
      541 gacatccagg agcgtgtcgt caagtggctt ttcttcaggt ttaagaccta catattcgct
      601 aagggttact acgcgtggct acgtaaaaac agtttcagta tcgctatcgg cgtgacaaaa
      661 ttgtcctatc tcctgataac tatccttgtg ttctacttaa caggcttcat gttcgaatat
      721 ggctctaaca cgtggtaccg gtacggtgct gactggtacg gtaccagatt ctcctcgtac
      781 cacgaaacta acaactcaat cacactcaca aaggacatca tcttcccaaa gatggtagcg
      841 tgtgagatca agcgatgggg tccctcaggg attgaggttg agaccgctca gtgcgtactt
      901 gccccgaatg tgctctacca gtaccttttc ctctttactt ggtacctcct gatcgcggta
      961 ttcttcacta acctcatcag ttgtttcctc cacatttctg agatgttctt ctctaacggt
     1021 acgtacaaca ggatgataga tcaaggaatg ttgccagaca agcccagtta tcggtacgtc
     1081 ttcatgaaca ttggcgccgg tggcagagag atagtccaga ttctaacaga caattccaac
     1141 cccctcttgt ttagcaagat atttgacgat cttaccaatt tactaatcac tacttccaaa
     1201 aacgctgacg tcattgaaaa cctgtcgaag ttggattcct ccgtaattga actaggcagc
     1261 aaagactcaa tctaa
//

Usage example 1

$: sb Mle-Panxα4_cds.gb -ano 'misc_feature' '1-10'

Output

LOCUS       Mle-Panxα4              1275 bp    DNA              UNA 02-JAN-2015
DEFINITION  cDNA and genomic - ML129317a.
ACCESSION   Mle-Panxα4
VERSION     Mle-Panxα4
KEYWORDS    .
SOURCE
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     misc_feature    1..10
     TMD1            82..144
     TMD2            391..453
     TMD3            643..705
     TMD4            913..1005
ORIGIN
        1 atggttattg agctgctagc tggatacaaa ggtctgtccc cgtttaaaga cgcgactgtt
       61 gacgactcat gggaccaaat aaaccgatgt tacgtgttca tcgccatggt ggtgatgggt
      121 gctgtgacta caatgaggca atactctgga acattgattg catgtgacgg gttcacgaag
      181 ttccaccctc agtttgcaga agattactgc tggagcatag gaatgtacac ggtacgcgag
      241 gcctatgact tgcccagcag tatggttgca taccccggag tgataccctg ggatatgcct
      301 gcatgtgttc cacgtctcct gaagaacgga accaggacca aatgtggcag tgagaaggac
      361 gttatgccct cagagaaaat ctaccacttg tggtaccagt gggcaagttt ctacttctgg
      421 atagtggcta tactgtacta cgcgccgtat ataatgttca aacagttggg agggggagag
      481 tacaagcccc tgatcaagct actttgtctt gcgtctggat ctcctgaaca acagatgcag
      541 gacatccagg agcgtgtcgt caagtggctt ttcttcaggt ttaagaccta catattcgct
      601 aagggttact acgcgtggct acgtaaaaac agtttcagta tcgctatcgg cgtgacaaaa
      661 ttgtcctatc tcctgataac tatccttgtg ttctacttaa caggcttcat gttcgaatat
      721 ggctctaaca cgtggtaccg gtacggtgct gactggtacg gtaccagatt ctcctcgtac
      781 cacgaaacta acaactcaat cacactcaca aaggacatca tcttcccaaa gatggtagcg
      841 tgtgagatca agcgatgggg tccctcaggg attgaggttg agaccgctca gtgcgtactt
      901 gccccgaatg tgctctacca gtaccttttc ctctttactt ggtacctcct gatcgcggta
      961 ttcttcacta acctcatcag ttgtttcctc cacatttctg agatgttctt ctctaacggt
     1021 acgtacaaca ggatgataga tcaaggaatg ttgccagaca agcccagtta tcggtacgtc
     1081 ttcatgaaca ttggcgccgg tggcagagag atagtccaga ttctaacaga caattccaac
     1141 cccctcttgt ttagcaagat atttgacgat cttaccaatt tactaatcac tacttccaaa
     1201 aacgctgacg tcattgaaaa cctgtcgaag ttggattcct ccgtaattga actaggcagc
     1261 aaagactcaa tctaa
//

Usage example 2 (specify anti-sense strand and set two qualifiers)

$: sb Mle-Panxα4_cds.gb -ano 'misc_feature' '1-10,20-30' - 'foo=bar' 'hello=world'

Output

LOCUS       Mle-Panxα4              1275 bp    DNA              UNA 02-JAN-2015
DEFINITION  cDNA and genomic - ML129317a.
ACCESSION   Mle-Panxα4
VERSION     Mle-Panxα4
KEYWORDS    .
SOURCE      
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     misc_feature    complement(order(20..30,1..10))
                     /foo="bar"
                     /hello="world"
     TMD1            82..144
     TMD2            391..453
     TMD3            643..705
     TMD4            913..1005
ORIGIN
        1 atggttattg agctgctagc tggatacaaa ggtctgtccc cgtttaaaga cgcgactgtt
       61 gacgactcat gggaccaaat aaaccgatgt tacgtgttca tcgccatggt ggtgatgggt
      121 gctgtgacta caatgaggca atactctgga acattgattg catgtgacgg gttcacgaag
      181 ttccaccctc agtttgcaga agattactgc tggagcatag gaatgtacac ggtacgcgag
      241 gcctatgact tgcccagcag tatggttgca taccccggag tgataccctg ggatatgcct
      301 gcatgtgttc cacgtctcct gaagaacgga accaggacca aatgtggcag tgagaaggac
      361 gttatgccct cagagaaaat ctaccacttg tggtaccagt gggcaagttt ctacttctgg
      421 atagtggcta tactgtacta cgcgccgtat ataatgttca aacagttggg agggggagag
      481 tacaagcccc tgatcaagct actttgtctt gcgtctggat ctcctgaaca acagatgcag
      541 gacatccagg agcgtgtcgt caagtggctt ttcttcaggt ttaagaccta catattcgct
      601 aagggttact acgcgtggct acgtaaaaac agtttcagta tcgctatcgg cgtgacaaaa
      661 ttgtcctatc tcctgataac tatccttgtg ttctacttaa caggcttcat gttcgaatat
      721 ggctctaaca cgtggtaccg gtacggtgct gactggtacg gtaccagatt ctcctcgtac
      781 cacgaaacta acaactcaat cacactcaca aaggacatca tcttcccaaa gatggtagcg
      841 tgtgagatca agcgatgggg tccctcaggg attgaggttg agaccgctca gtgcgtactt
      901 gccccgaatg tgctctacca gtaccttttc ctctttactt ggtacctcct gatcgcggta
      961 ttcttcacta acctcatcag ttgtttcctc cacatttctg agatgttctt ctctaacggt
     1021 acgtacaaca ggatgataga tcaaggaatg ttgccagaca agcccagtta tcggtacgtc
     1081 ttcatgaaca ttggcgccgg tggcagagag atagtccaga ttctaacaga caattccaac
     1141 cccctcttgt ttagcaagat atttgacgat cttaccaatt tactaatcac tacttccaaa
     1201 aacgctgacg tcattgaaaa cctgtcgaag ttggattcct ccgtaattga actaggcagc
     1261 aaagactcaa tctaa
//

Input file 2: Mle-Panx_pep.fa

>Mle-Panxα1 cDNA - ML078817.
mywifeicqeikraqscrkfaidgpfdwtnriimptlmviccflqtftfmfgsniscigf
eklernfveeycwtqgiytskaaynmplhtpypgiapcvpeydpvtqkywlpcgveeedk
ayhlwyqwvpfyflavavgyylpflilkgsklhqvkplitylmnqrnletdpnhlvgkls
hwifrqlvysrfaatstirmywhdwglvllvcsvkilyltvslihlfatakmfhignwft
ygimfarrsnshtthvkdvffpkmvackietwsftgknhlhgmcvlalnvmnqylflivw
yvnviiiflnsisciytivkfcspnivhhrivnssslddhhdftrmfgyvgpsgriilak
msehmpgymlkqvakkvtekidieneknrgraptikftkvngqpselarqplmhlnalml
gmvpqnlpepkiqniqrsqkkvrflv*
>Mle-Panxα4 cDNA and genomic - ML129317a.
mviellagykglspfkdatvddswdqinrcyvfiamvvmgavttmrqysgtliacdgftk
fhpqfaedycwsigmytvreaydlpssmvaypgvipwdmpacvprllkngtrtkcgsekd
vmpsekiyhlwyqwasfyfwivailyyapyimfkqlgggeykplikllclasgspeqqmq
diqervvkwlffrfktyifakgyyawlrknsfsiaigvtklsyllitilvfyltgfmfey
gsntwyrygadwygtrfssyhetnnsitltkdiifpkmvaceikrwgpsgievetaqcvl
apnvlyqylflftwylliavfftnliscflhisemffsngtynrmidqgmlpdkpsyryv
fmnigaggreivqiltdnsnpllfskifddltnllittsknadvienlskldssvielgs
kdsi*
>Mle-Panxα12 cDNA - ML25997a.
mvidilsgfkgitpfkgitlddgwdqinrsfmfvlcvlmgtvvtvrqyaggiiscdgftk
ysgsfsedycwtqglytikeaydlltmnvpypgvipedmptcierelinggrvscpdpet
vkpptrvyhlwyqwvpfyfwlaaaafffpyliykhfgvgdlkpliqmlhnpivdegdqnc
maekasmwlfyklnvfmnentifailtekhrlffivmlvkvlyliisilalyltdemfhi
gsfvsygsewatslpegdnettlvkdklfpkmvaceikrwgptgleeeqgmcvlapnvin
qylflilwfaiifciacnclsvlfaltklvfvlgsykrllasaflkdelhykhmffnigt
sgrvllqivatnvsprvfesimanlatkliaerlkgngkgsv*

Usage example 3 (only modify specific sequence)

$: sb Mle-Panx_pep.fa -ano 'misc_feature' '1-10,20-30' - 'foo=bar' 'hello=world' 'Panxα4'

Output

LOCUS       Mle-Panxα1               447 aa                     UNK 01-JAN-1980
DEFINITION  Mle-Panxα1 cDNA - ML078817.
ACCESSION   Mle-Panxα1
VERSION     Mle-Panxα1
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 mywifeicqe ikraqscrkf aidgpfdwtn riimptlmvi ccflqtftfm fgsniscigf
       61 eklernfvee ycwtqgiyts kaaynmplht pypgiapcvp eydpvtqkyw lpcgveeedk
      121 ayhlwyqwvp fyflavavgy ylpflilkgs klhqvkplit ylmnqrnlet dpnhlvgkls
      181 hwifrqlvys rfaatstirm ywhdwglvll vcsvkilylt vslihlfata kmfhignwft
      241 ygimfarrsn shtthvkdvf fpkmvackie twsftgknhl hgmcvlalnv mnqylflivw
      301 yvnviiifln sisciytivk fcspnivhhr ivnssslddh hdftrmfgyv gpsgriilak
      361 msehmpgyml kqvakkvtek idieneknrg raptikftkv ngqpselarq plmhlnalml
      421 gmvpqnlpep kiqniqrsqk kvrflv*
//
LOCUS       Mle-Panxα12              403 aa                     UNK 01-JAN-1980
DEFINITION  Mle-Panxα12 cDNA - ML25997a.
ACCESSION   Mle-Panxα12
VERSION     Mle-Panxα12
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 mvidilsgfk gitpfkgitl ddgwdqinrs fmfvlcvlmg tvvtvrqyag giiscdgftk
       61 ysgsfsedyc wtqglytike aydlltmnvp ypgvipedmp tciereling grvscpdpet
      121 vkpptrvyhl wyqwvpfyfw laaaafffpy liykhfgvgd lkpliqmlhn pivdegdqnc
      181 maekasmwlf yklnvfmnen tifailtekh rlffivmlvk vlyliisila lyltdemfhi
      241 gsfvsygsew atslpegdne ttlvkdklfp kmvaceikrw gptgleeeqg mcvlapnvin
      301 qylflilwfa iifciacncl svlfaltklv fvlgsykrll asaflkdelh ykhmffnigt
      361 sgrvllqiva tnvsprvfes imanlatkli aerlkgngkg sv*
//
LOCUS       Mle-Panxα4               425 aa                     UNK 01-JAN-1980
DEFINITION  Mle-Panxα4 cDNA and genomic - ML129317a.
ACCESSION   Mle-Panxα4
VERSION     Mle-Panxα4
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     misc_feature    order(1..10,20..30)
                     /foo="bar"
                     /hello="world"
ORIGIN
        1 mviellagyk glspfkdatv ddswdqinrc yvfiamvvmg avttmrqysg tliacdgftk
       61 fhpqfaedyc wsigmytvre aydlpssmva ypgvipwdmp acvprllkng trtkcgsekd
      121 vmpsekiyhl wyqwasfyfw ivailyyapy imfkqlggge ykplikllcl asgspeqqmq
      181 diqervvkwl ffrfktyifa kgyyawlrkn sfsiaigvtk lsyllitilv fyltgfmfey
      241 gsntwyryga dwygtrfssy hetnnsitlt kdiifpkmva ceikrwgpsg ievetaqcvl
      301 apnvlyqylf lftwylliav fftnliscfl hisemffsng tynrmidqgm lpdkpsyryv
      361 fmnigaggre ivqiltdnsn pllfskifdd ltnllittsk nadvienlsk ldssvielgs
      421 kdsi*
//
