AB Generate alignment

Steve Bond edited this page Feb 1, 2017 · 28 revisions

--generate_alignment, -ga

Description

Generate a multiple sequence alignment using third-party alignment tools. Basic default parameters are built into the wrapper for 'quick-and-dirty' alignments with any of the supported tools, or you can specify further parameters as desired. AlignBuddy handles all necessary format conversions and returns the output in the same format as the input (unless overridden with the -o flag, as usual). This is particularly useful when aligning sequences in a richly annotated format like GenBank, because the annotations are re-mapped onto the new alignment at the end of the job.

As the job runs, any output the tool normally generates will be streamed to stderr for your reference (suppressible with the '-q' flag). If the program generates files as part of its normal operation, these are sent to a temporary directory and deleted once AlignBuddy finishes the job. To save these files, specify a directory with the '-k' flag (example 4).
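
Because the wrapped tool's messages go to stderr, the alignment itself can be captured separately by redirecting stdout. The sketch below illustrates that split with a stand-in shell function; `fake_aligner` and its messages are hypothetical and are not part of AlignBuddy.

```shell
# Hypothetical stand-in for a wrapped aligner run (not AlignBuddy itself):
# progress messages go to stderr, the finished alignment goes to stdout.
fake_aligner() {
  echo "Progressive alignment 1/2..." >&2
  echo ">seq1"
  echo "MLLLGS-"
}

# Capture only the alignment; the progress message still reaches the terminal.
alignment="$(fake_aligner)"

# Capture only the progress messages by pointing stderr at the capture
# and then discarding stdout.
messages="$(fake_aligner 2>&1 >/dev/null)"
```

Assuming AlignBuddy behaves as described above, the same redirections should apply to a real run, for example `alb Mnemiopsis_Panxs.gb -ga > aligned.gb 2> aligner.log` (both file names are illustrative).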

Alignment tools currently supported

The alignment programs listed below are currently supported by AlignBuddy. The default binary names that AlignBuddy will search for in your PATH are the exact names listed below, except all in lower case (e.g., 'mafft', instead of 'MAFFT'). If your version of the software has a different name or is not in your system PATH, explicitly set the name or path as the first positional argument.

Note that the binaries for these programs are not included with BuddySuite, so they must be installed separately. Let us know if you regularly use an unsupported tool, because we can probably add support for it!

Arguments

Alignment tool ( str )

Optional. If not set, AlignBuddy will try to find an alignment program on your system and will execute the first one it detects. Otherwise, specify the name of the alignment tool in your PATH or the path to the binary on your system; the actual name of the program is not important, as AlignBuddy will determine which program you are calling automatically.
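
The "first one it detects" behavior can be pictured as a simple search over candidate binary names. This is a minimal sketch under stated assumptions: `find_first_tool` is a hypothetical helper, the candidate names are the lower-case binaries that appear in the examples on this page, and the order shown is assumed rather than taken from AlignBuddy's source.

```shell
# Hypothetical sketch of "use the first tool that can be found";
# the real search order used by AlignBuddy is not documented on this page.
find_first_tool() {
  for tool in "$@"; do
    # 'command -v' succeeds if the name resolves to something runnable.
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  return 1   # nothing from the candidate list is available
}

# Candidate names taken from the examples on this page, in an assumed order.
find_first_tool mafft prank clustalomega clustalw2 || echo "no aligner found" >&2
```

Running a check like this before calling `alb ... -ga` is one way to predict which program an argument-free call is likely to pick up, or to confirm that none is installed.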

Program specific parameters ( str )

Optional. Each alignment tool accepts many optional parameters of its own (see its documentation for details). This argument injects further options into the final call that AlignBuddy makes to the wrapped program. It can only be used if an alignment tool is specified as the first argument. Enclose all tool-specific arguments in double quotes so that AlignBuddy doesn't try to interpret them itself. See example 3 for a demonstration of the proper syntax.
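
The effect of the double quotes can be checked directly in the shell. The sketch below uses a hypothetical helper, `show_args`, to print each argument a program would receive. `--iter=2` comes from example 3; `--threads=4` is added only for illustration and is assumed to be accepted by your Clustal Omega build.

```shell
# Print each received argument on its own line, in brackets, to show
# how the shell split the command line before the program saw it.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

show_args -ga clustalomega "--iter=2 --threads=4"
# Prints:
# [-ga]
# [clustalomega]
# [--iter=2 --threads=4]
```

The quoted string arrives as a single argument, which is what lets it be handed on to the wrapped tool as one unit. The quotes matter as soon as the parameter string contains a space; without them, the shell would deliver `--iter=2` and `--threads=4` as two separate arguments.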

Examples

Input file: Mnemiopsis_Panxs.gb

LOCUS       Mle-Panxα3               200 aa                     UNA 02-JAN-2015
DEFINITION  cDNA - ML036514a.
ACCESSION   Mle-Panxα3
VERSION     Mle-Panxα3
KEYWORDS    .
SOURCE      
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             order(1..50,51..111,112..152,153..183,184..200)
                     /created_by="User"
                     /label="ML036514a"
                     /modified_by="User"
     TMD1            29..49
     TMD2            132..152
ORIGIN
        1 mlllgslgti knlsifkdls lddwldqmnr tfmflllcfm gtivavsqyt gkniscdgft
       61 kfgedfsqdy cwtqglytik eaydlpesqi pypgiipenv pacrehalkn ggkivcpped
      121 qvkpltrarh lwyqwipfyf wviapvfylp ymfvkrmgld rmkpllkims dyyhcttetp
      181 seeiivkcad wvynsivdrl
//
LOCUS       Mle-Panxα4               200 aa                     UNA 02-JAN-2015
DEFINITION  cDNA and genomic - ML129317a.
ACCESSION   Mle-Panxα4
VERSION     Mle-Panxα4
KEYWORDS    .
SOURCE      
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     TMD1            28..48
     TMD2            131..151
ORIGIN
        1 mviellagyk glspfkdatv ddswdqinrc yvfiamvvmg avttmrqysg tliacdgftk
       61 fhpqfaedyc wsigmytvre aydlpssmva ypgvipwdmp acvprllkng trtkcgsekd
      121 vmpsekiyhl wyqwasfyfw ivailyyapy imfkqlggge ykplikllcl asgspeqqmq
      181 diqervvkwl ffrfktyifa
//
LOCUS       Mle-Panxα6               200 aa                     UNA 02-JAN-2015
DEFINITION  cDNA - ML25993a.
ACCESSION   Mle-Panxα6
VERSION     Mle-Panxα6
KEYWORDS    .
SOURCE      
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             order(1..42,43..92,93..125,126..171,172..200)
                     /created_by="User"
                     /label="ML25993"
                     /modified_by="User"
     TMD1            28..48
     TMD2            131..151
ORIGIN
        1 mlleilanfk gatpfkeivl ddkwdqinrc ymfllcvifg tvvtfrqytg giiacdgltk
       61 fsaafaedyc wtqglytike aydivdnslp ypgllpedap pclsrrlvsg griecppadl
      121 yleptrvhht wyqwipfyfw visiafigpy ivykqlgvne lkpilamlhn pvdgddvtkd
      181 qiskvsrwla iklnifiqek
//
LOCUS       Mle-Panxα5               200 aa                     UNA 02-JAN-2015
DEFINITION  cDNA - ML223536a.
ACCESSION   Mle-Panxα5
VERSION     Mle-Panxα5
KEYWORDS    .
SOURCE      
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             order(1..49,50..94,95..135,136..200)
                     /created_by="User"
                     /label="ML223536a"
                     /modified_by="User"
     TMD1            28..48
     TMD2            133..153
ORIGIN
        1 miywvwavfk rmapfkvvtl ddrwdqmnrs fmmpltmsfa ylidygiiag stikctgfed
       61 sfrseafvde ycwtqgiytl reaydlentk ipypgiipeg fpncmpyerw dgmkvecpke
      121 eqylkptrvy hlyyqhiqly fwlvctlfyl pymvgiclgf nytkplinll hnpltrdeee
      181 lealldkaar slrlrldiys
//

Usage example 1

If no arguments are passed in, AlignBuddy will try to find an alignment program on your system. In this example, MAFFT is found.

$: alb Mnemiopsis_Panxs.gb -ga

Output

nseq =  4
distance =  ktuples
iterate =  0
cycle =  2
nguidetree = 2
nthread = 0
sueff_global = 0.100000
done.
scoremtx = 1
Gap Penalty = -1.53, +0.00, +0.00

tuplesize = 6, dorp = p


Making a distance matrix ..
    1 / 4
done.

Constructing a UPGMA tree ...
    0 / 4
done.

Progressive alignment 1/2...
STEP     1 / 3 f
Reallocating..done. *alloclen = 1404
STEP     3 / 3 d
done.

Constructing a UPGMA tree ...
    0 / 4
done.

Progressive alignment 2/2...
STEP     1 / 3 f
Reallocating..done. *alloclen = 1404
STEP     3 / 3 d
done.

disttbfast (aa) Version 7.186 alg=A, model=BLOSUM62, 1.53, -0.00, -0.00, noshift, amax=0.0
0 thread(s)


Strategy:
 FFT-NS-2 (Fast but rough)
 Progressive method (guide trees were built 2 times.)

If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.

The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --legacygappenalty option.


Returning to AlignBuddy...

LOCUS       Mle-Panxα3               212 aa                     UNK 01-JAN-1980
DEFINITION
ACCESSION   Mle-Panxα3
VERSION     Mle-Panxα3
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             order(1..50,51..113,114..154,155..192,193..209)
                     /created_by="User"
                     /label="ML036514a"
                     /modified_by="User"
     TMD1            29..49
     TMD2            134..154
ORIGIN
        1 mlllgslgti knlsifkdls lddwldqmnr tfmflllcfm gtivavsqyt gkniscdgft
       61 k--fgedfsq dycwtqglyt ikeaydlpes qipypgiipe nvpacrehal knggkivcpp
      121 edqvkpltra rhlwyqwipf yfwviapvfy lpymfvkrmg ldrmkpllki msdyyhctte
      181 tp-------s eeiivkcadw vynsivdrl- --
//
LOCUS       Mle-Panxα4               212 aa                     UNK 01-JAN-1980
DEFINITION
ACCESSION   Mle-Panxα4
VERSION     Mle-Panxα4
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     TMD1            29..49
     TMD2            134..154
ORIGIN
        1 -mviellagy kglspfkdat vddswdqinr cyvfiamvvm gavttmrqys gtliacdgft
       61 k--fhpqfae dycwsigmyt vreaydlpss mvaypgvipw dmpacvprll kngtrtkcgs
      121 ekdvmpseki yhlwyqwasf yfwivailyy apyimfkqlg ggeykplikl lc----lasg
      181 sp----eqqm qdiqervvkw lffrfktyif a-
//
LOCUS       Mle-Panxα6               212 aa                     UNK 01-JAN-1980
DEFINITION
ACCESSION   Mle-Panxα6
VERSION     Mle-Panxα6
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             order(2..43,44..95,96..128,129..182,183..212)
                     /created_by="User"
                     /label="ML25993"
                     /modified_by="User"
     TMD1            29..49
     TMD2            134..154
ORIGIN
        1 -mlleilanf kgatpfkeiv lddkwdqinr cymfllcvif gtvvtfrqyt ggiiacdglt
       61 k--fsaafae dycwtqglyt ikeaydivdn slpypgllpe dappclsrrl vsggriecpp
      121 adlyleptrv hhtwyqwipf yfwvisiafi gpyivykqlg vnelkpilam l--------h
      181 npv-dgddvt kdqiskvsrw laiklnifiq ek
//
LOCUS       Mle-Panxα5               212 aa                     UNK 01-JAN-1980
DEFINITION
ACCESSION   Mle-Panxα5
VERSION     Mle-Panxα5
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             order(2..50,51..95,96..136,137..209)
                     /created_by="User"
                     /label="ML223536a"
                     /modified_by="User"
     TMD1            29..49
     TMD2            134..154
ORIGIN
        1 -miywvwavf krmapfkvvt lddrwdqmnr sfmmpltmsf aylidygiia gstikctgfe
       61 dsfrseafvd eycwtqgiyt lreaydlent kipypgiipe gfpncmpyer wdgmkvecpk
      121 eeqylkptrv yhlyyqhiql yfwlvctlfy lpymvgiclg fnytkplinl l--------h
      181 npltrdeeel ealldkaars lrlrldiys- --
//

Usage example 2

Use a particular version of PRANK that is not in your system PATH by giving the path to its binary.

$: alb Mnemiopsis_Panxs.gb -ga /path/to/prank_v140603 -o fasta

Output

-----------------
 PRANK v.140603:
-----------------

Input for the analysis
 - aligning sequences in '/Volumes/Zippy/.sysTemp/tmpfn_qudo0/tmp.fa'
 - using inferred alignment guide tree
 - option '+F' is not used; it can be enabled with '+F'
 - external tools available:
    MAFFT for initial alignment
    Exonerate for alignment anchoring
    BppAncestor for ancestral state reconstruction

Warning: sequence names changed.


Generating multiple alignment: iteration 1.
#3#(3/3): 97% aligned
Alignment score: 341

Generating multiple alignment: iteration 2.
#3#(3/3): 97% aligned
Alignment score: 343

Generating multiple alignment: iteration 3.
#3#(3/3): 99% computed
Alignment score: 343

Generating multiple alignment: iteration 4.
#3#(3/3): 99% computed
Alignment score: 343

Generating multiple alignment: iteration 5.
#3#(3/3): 99% computed
Alignment score: 343


Writing
 - alignment to '/Volumes/Zippy/.sysTemp/tmpfn_qudo0/result.best.fas'

Analysis done. Total time 5s

Returning to AlignBuddy...

>Mle-Panxα5_cDNA_-_ML223536a.
-MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAGSTIKCTGFE
DSFRSEAFVDEYCWTQGIYTLREAYDLENTKIPYPGIIPEGFPNCMPYERWDGMKVECPK
EEQYLKPTRVYHLYYQHIQLYFWLVCTLFYLPYMVGICLGFNYTKPLINLLHNPLT-RDE
EELEALLDKAARSLRLRLDIY---S
>Mle-Panxα4_cDNA_and_genomic_-_ML129317a.
-MVIELLAGYKGLSPFKDATVDDSWDQINRCYVFIAMVVMGAVTTMRQYSGTLIACDGFT
KF--HPQFAEDYCWSIGMYTVREAYDLPSSMVAYPGVIPWDMPACVPRLLKNGTRTKCGS
EKDVMPSEKIYHLWYQWASFYFWIVAILYYAPYIMFKQLGGGEYKPLIKLLCLASG-SPE
QQMQDIQERVVKWLFFRFKTYIFA-
>Mle-Panxα6_cDNA_-_ML25993a.
-MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTGGIIACDGLT
KF--SAAFAEDYCWTQGLYTIKEAYDIVDNSLPYPGLLPEDAPPCLSRRLVSGGRIECPP
ADLYLEPTRVHHTWYQWIPFYFWVISIAFIGPYIVYKQLGVNELKPILAMLHNPVD-GDD
VTKDQIS-KVSRWLAIKLNIFIQEK
>Mle-Panxα3_cDNA_-_ML036514a.
MLLLGSLGTIKNLSIFKDLSLDDWLDQMNRTFMFLLLCFMGTIVAVSQYTGKNISCDGFT
KF--GEDFSQDYCWTQGLYTIKEAYDLPESQIPYPGIIPENVPACREHALKNGGKIVCPP
EDQVKPLTRARHLWYQWIPFYFWVIAPVFYLPYMFVKRMGLDRMKPLLKIMSDYYHCTTE
TPSEEIIVKCADWVY---NSIVDRL

Usage example 3

Pass in extra parameters to further refine your alignment.

$: alb Mnemiopsis_Panxs.gb -ga clustalomega "--iter=2" -o clustal

Output

Using 24 threads
Read 4 sequences (type: Protein) from /Volumes/Zippy/.sysTemp/tmpijrd7orz/tmp.fa
not more sequences (4) than cluster-size (100), turn off mBed
Calculating pairwise ktuple-distances...
Ktuple-distance calculation progress done. CPU time: 0.00u 0.01s 00:00:00.01 Elapsed: 00:00:00
Guide-tree computation done.
Progressive alignment progress done. CPU time: 0.02u 0.00s 00:00:00.02 Elapsed: 00:00:00
Iteration step 1 out of 2
Computing new guide tree (iteration step 1032320)
Calculating pairwise aligned identity distances...
Pairwise identity calculation progress done. CPU time: 0.00u 0.00s 00:00:00.00 Elapsed: 00:00:00
Guide-tree computation done.
Computing HMM from alignment
Progressive alignment progress done. CPU time: 0.06u 0.01s 00:00:00.06 Elapsed: 00:00:00
Iteration step 2 out of 2
Computing new guide tree (iteration step 1032320)
Calculating pairwise aligned identity distances...
Pairwise identity calculation progress done. CPU time: 0.00u 0.00s 00:00:00.00 Elapsed: 00:00:00
Guide-tree computation done.
Computing HMM from alignment
Progressive alignment progress done. CPU time: 0.07u 0.00s 00:00:00.07 Elapsed: 00:00:00
Alignment written to /Volumes/Zippy/.sysTemp/tmpijrd7orz/result

Returning to AlignBuddy...

CLUSTAL X (1.81) multiple sequence alignment


Mle-Panxα3                          MLLLGSLGTIKNLSIFKDLSLDDWLDQMNRTFMFLLLCFMGTIVAVSQYT
Mle-Panxα4                          -MVIELLAGYKGLSPFKDATVDDSWDQINRCYVFIAMVVMGAVTTMRQYS
Mle-Panxα6                          -MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYT
Mle-Panxα5                          -MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIA

Mle-Panxα3                          GKNISCDGFTK--FGEDFSQDYCWTQGLYTIKEAYDLPESQIPYPGIIPE
Mle-Panxα4                          GTLIACDGFTK--FHPQFAEDYCWSIGMYTVREAYDLPSSMVAYPGVIPW
Mle-Panxα6                          GGIIACDGLTK--FSAAFAEDYCWTQGLYTIKEAYDIVDNSLPYPGLLPE
Mle-Panxα5                          GSTIKCTGFEDSFRSEAFVDEYCWTQGIYTLREAYDLENTKIPYPGIIPE

Mle-Panxα3                          NVPACREHALKNGGKIVCPPEDQVKPLTRARHLWYQWIPFYFWVIAPVFY
Mle-Panxα4                          DMPACVPRLLKNGTRTKCGSEKDVMPSEKIYHLWYQWASFYFWIVAILYY
Mle-Panxα6                          DAPPCLSRRLVSGGRIECPPADLYLEPTRVHHTWYQWIPFYFWVISIAFI
Mle-Panxα5                          GFPNCMPYERWDGMKVECPKEEQYLKPTRVYHLYYQHIQLYFWLVCTLFY

Mle-Panxα3                          LPYMFVKRMGLDRMKPLLKIMSDYYHCTTETPSEEIIVKCADWVYNSIVD
Mle-Panxα4                          APYIMFKQLGGGEYKPLIKLLCLAS-GSPEQQMQDIQERVVKWLFFRFKT
Mle-Panxα6                          GPYIVYKQLGVNELKPILAMLHNPVDGDD--VTKDQISKVSRWLAIKLNI
Mle-Panxα5                          LPYMVGICLGFNYTKPLINLLHNPLTRDE-EELEALLDKAARSLRLRLDI

Mle-Panxα3                          RL---
Mle-Panxα4                          YIFA-
Mle-Panxα6                          FIQEK
Mle-Panxα5                          YS---

Usage example 4

Keep all temporary files

$: alb Mnemiopsis_Panxs.gb -ga clustalw2 -o phylip-sequential -k ~/alignment_files

Output


Returning to AlignBuddy...

 4 205
Mle-Panxα4  -MVIELLAGYKGLSPFKDATVDDSWDQINRCYVFIAMVVMGAVTTMRQYSGTLIACDGFTK--FHPQFAEDYCWSIGMYTVREAYDLPSSMVAYPGVIPWDMPACVPRLLKNGTRTKCGSEKDVMPSEKIYHLWYQWASFYFWIVAILYYAPYIMFKQLGGGEYKPLIKLLCLAS-GSPEQQMQDIQERVVKWLFFRFKTYIFA-
Mle-Panxα6  -MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTGGIIACDGLTK--FSAAFAEDYCWTQGLYTIKEAYDIVDNSLPYPGLLPEDAPPCLSRRLVSGGRIECPPADLYLEPTRVHHTWYQWIPFYFWVISIAFIGPYIVYKQLGVNELKPILAMLHNPVDGDD--VTKDQISKVSRWLAIKLNIFIQEK
Mle-Panxα3  MLLLGSLGTIKNLSIFKDLSLDDWLDQMNRTFMFLLLCFMGTIVAVSQYTGKNISCDGFTK--FGEDFSQDYCWTQGLYTIKEAYDLPESQIPYPGIIPENVPACREHALKNGGKIVCPPEDQVKPLTRARHLWYQWIPFYFWVIAPVFYLPYMFVKRMGLDRMKPLLKIMSDYYHCTTETPSEEIIVKCADWVYNSIVDRL---
Mle-Panxα5  -MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAGSTIKCTGFEDSFRSEAFVDEYCWTQGIYTLREAYDLENTKIPYPGIIPEGFPNCMPYERWDGMKVECPKEEQYLKPTRVYHLYYQHIQLYFWLVCTLFYLPYMVGICLGFNYTKPLINLLHNPLTRDE-EELEALLDKAARSLRLRLDIYS---

A new directory was created:

$: ls ~/alignment_files
>>> result  tmp.dnd tmp.fa
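
The keep-or-discard behavior of the '-k' flag can be sketched as a temporary-directory lifecycle. This is a hypothetical illustration, not AlignBuddy's actual code: `run_in_tmp` is an invented helper, and the two empty files merely stand in for whatever the wrapped tool writes.

```shell
# Hypothetical sketch of the temp-directory lifecycle; not AlignBuddy's code.
run_in_tmp() {
  keep_dir="$1"                        # empty string means "discard everything"
  work_dir="$(mktemp -d)" || return 1

  # Stand-ins for the files a wrapped aligner might write.
  : > "$work_dir/tmp.fa"
  : > "$work_dir/result"

  # Copy the intermediate files out only if a keep directory was requested.
  if [ -n "$keep_dir" ]; then
    mkdir -p "$keep_dir" && cp -R "$work_dir"/. "$keep_dir"/
  fi

  # The temporary directory is always removed at the end of the job.
  rm -rf "$work_dir"
}
```

Calling `run_in_tmp ""` leaves nothing behind, while `run_in_tmp "$HOME/alignment_files"` leaves copies of the intermediate files in that directory, mirroring the listing above.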
