Multi-objective formulation of MSA for phylogeny estimation (Do application-aware measures guide towards better phylogenetic tree?)
This is a Java Netbeans project built on top of jMetalMSA, an open source software tool, which we extended to compute multiple sequence problem (MSA) using evolutionary multi-objective (EMO) algorithms guided by an application-aware measure and added necessary datasets. All dependencies (i.e., required modules) of this project are managed by Apache Maven, so it is very easy to install and run. Below we describe its different aspects.
To use this project, the following software tools are required:
To download this project, just clone the Git repository hosted in GitHub:
git clone https://github.com/ali-nayeem/MSA.git
Once cloned, you can compile the software and generate a jar file with the following command:
mvn package
This sentence will generate a directory called target
which will contain a file called jmetalmsa-1.0-SNAPSHOT-jar-with-dependencies.jar
The object-oriented architecture of jMetalMSA is shown in Figure above, is composed of four core classes (Java interfaces). Three of them (MSAProblem, MSAAlgorithm, and MSASolution) inherits from their counterparts in jMetal (the inheritance relationships are omitted in the diagram), and there is a class Score to represent a given MSA scoring function.
Here we summarize different features that we used in this study
In our study, we used the following algorithms:
- NSGA-II (
org.uma.jmetalmsa.algorithm.nsgaii
) - NSGA-III (
org.uma.jmetalmsa.algorithm.algoyy.NSGAIIIYY
)
For NSGA-III, we adopt the Java implementation of Dr. Yuan Yuan avilable at https://github.com/yyxhdy/ManyEAs
The crossover operator (org.uma.jmetalmsa.crossover.SPXMSACrossover
) is the Single-Point Crossover adapted to alignments, randomly selects a position from the parent A by splitting it into two blocks and the parent B is tailored so that the right piece can be joined to the left piece of
the first parent (PA1) and vice versa. Selected blocks are crossed between these two parents
The list of mutation operators included here in package org.uma.jmetalmsa.mutation
are:
- Shift-closed gaps: Closed gaps are randomly chosen and shifted to another position.
- Non-gap group splitting: a non-gap group is selected randomly, and it is split into two groups.
- One gap insertion: Inserts a gap in a random position for each sequence.
- Two adjacent gap groups merging: Selects a random group of gaps and merge with its nearest group of gaps.
- Multiple mutation
To conduct this study we added the following objectives in package org.uma.jmetalmsa.score.impl
- Entropy
- Similarity based on gap containing columns
- Similarity based on non-gap columns
- Concentration of gaps
We used three datasets listed below:
- 100-taxon simulated dataset (inside
dataset/100S/
) - Biological rRNA datasets (
dataset/100S/23S.E
anddataset/100S/23S.E.aa_ag
) - BAliBASE 3.0 Benchmark (inside
example/bb3_release/
)
To experiment with NSGA-II and NSGA-III on three datasets we implemented the three Java classes in package org.uma.jmetalmsa.experiment
as follows:
- NSGAIIStudy
- NSGAIIStudyBalibase
- NSGAIIIStudy
To execute the class named NSGAIIStudy
, run the following command in terminal from the project root:
java -cp target/jmetalmsa-1.0-SNAPSHOT-jar-with-dependencies.jar org.uma.jmetalmsa.experiment.NSGAIIStudy