MutMap User Guide

version 2.3.6

What is MutMap?

Bulked segregant analysis, as implemented in MutMap (Abe et al., 2012), is a powerful and efficient method to identify agronomically important loci in crop plants. MutMap requires whole-genome resequencing of a single individual from the original cultivar and the pooled sequences of F2 progeny from a cross between the original cultivar and mutant. MutMap uses the sequence of the original cultivar to polarize the site frequencies of neighbouring markers and identifies loci with an unexpected site frequency, simulating the genotype of F2 progeny. The updated pipeline is approximately 5-8 times faster than the previous pipeline, is easier for novice users to use, and can be easily installed through bioconda with all dependencies.
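The simulation step mentioned above can be pictured with a short sketch. This is not MutMap's own code: it assumes a simple null model in which each F2 individual in the bulk carries 0, 1 or 2 mutant alleles in a 1:2:1 ratio and reads are sampled at random from the pooled DNA. The function name simulate_null_snp_index and its parameters are hypothetical.

```python
import random


def simulate_null_snp_index(n_bulk, depth, n_rep=5000, seed=0):
    """Return (p95, p99) of SNP-index values simulated for an unlinked site.

    Assumed null model (an illustration, not MutMap's implementation):
    1:2:1 genotype segregation in F2 and random sampling of reads.
    """
    rng = random.Random(seed)
    simulated = []
    for _ in range(n_rep):
        # Count mutant alleles carried by the individuals in the bulk.
        mutant_alleles = sum(rng.choice((0, 1, 1, 2)) for _ in range(n_bulk))
        freq = mutant_alleles / (2 * n_bulk)
        # Draw `depth` reads; each read shows the mutant allele with prob. freq.
        mutant_reads = sum(rng.random() < freq for _ in range(depth))
        simulated.append(mutant_reads / depth)
    simulated.sort()
    p95 = simulated[int(0.95 * (n_rep - 1))]
    p99 = simulated[int(0.99 * (n_rep - 1))]
    return p95, p99


p95, p99 = simulate_null_snp_index(n_bulk=20, depth=30, n_rep=2000)
print(p95, p99)
```

Under this sketch, a variant whose observed SNP-index lies above the simulated p99 is unlikely to be unlinked to the causal mutation.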

Citation

Yu Sugihara, Lester Young, Hiroki Yaegashi, Satoshi Natsume, Daniel J. Shea, Hiroki Takagi, Helen Booker, Hideki Innan, Ryohei Terauchi, Akira Abe (2022). High performance pipeline for MutMap and QTL-seq. PeerJ, 10:e13170.
Akira Abe, Shunichi Kosugi, Kentaro Yoshida, Satoshi Natsume, Hiroki Takagi, Hiroyuki Kanzaki, Hideo Matsumura, Kakoto Yoshida, Chikako Mitsuoka, Muluneh Tamiru, Hideki Innan, Liliana Cano, Sophien Kamoun & Ryohei Terauchi (2012). Genome sequencing reveals agronomically important loci in rice using MutMap. Nature Biotechnol. 30:174-179.

Installation

Dependencies

Software

Python (>=3.5) libraries

matplotlib
numpy
pandas
seaborn (optional)

Installation via bioconda

You can install MutMap via bioconda.

conda create -c bioconda -n mutmap mutmap
conda activate mutmap

Manual installation

If you encounter an error during installation, you can install MutMap manually.

git clone https://github.com/YuSugihara/MutMap.git
cd MutMap
pip install -e .

You will then need to install other dependencies manually. We highly recommend installing SnpEff and Trimmomatic via bioconda.

conda install -c bioconda snpeff
conda install -c bioconda trimmomatic

After installation, please check whether SnpEff and Trimmomatic are working by using the commands below.

snpEff --help
trimmomatic --help

Usage

If your reference genome contains more than 50 contigs, only the 50 longest contigs will be plotted.

mutmap -h

usage: mutmap -r <FASTA> -c <BAM|FASTQ> -b <BAM|FASTQ>
              -n <INT> -o <OUT_DIR> [-T] [-e <DATABASE>]

MutMap version 2.3.6

options:
  -h, --help         show this help message and exit
  -r , --ref         Reference FASTA file.
  -c , --cultivar    FASTQ or BAM file of cultivar. If specifying
                     FASTQ, separate paired-end files with a comma,
                     e.g., -c fastq1,fastq2. This option can be
                     used multiple times.
  -b , --bulk        FASTQ or BAM file of mutant bulk. If specifying
                     FASTQ, separate paired-end files with a comma,
                     e.g., -b fastq1,fastq2. This option can be
                     used multiple times.
  -n , --N-bulk      Number of individuals in the mutant bulk.
  -o , --out         Output directory. The specified directory must not
                     already exist.
  -t , --threads     Number of threads. If a value less than 1 is specified,
                     MutMap will use the maximum available threads. [2]
  -w , --window      Window size in kilobases (kb). [2000]
  -s , --step        Step size in kilobases (kb). [100]
  -D , --max-depth   Maximum depth of variants to be used. This cutoff
                     applies to both the cultivar and the bulk. [250]
  -d , --min-depth   Minimum depth of variants to be used. This cutoff
                     applies to both the cultivar and the bulk. [8]
  -N , --N-rep       Number of replicates for simulations to generate
                     null distribution. [5000]
  -T, --trim         Trim FASTQ files using Trimmomatic.
  -a , --adapter     FASTA file containing adapter sequences. This option
                     is used when "-T" is specified for trimming.
  --trim-params      Parameters for Trimmomatic. Input parameters
                     must be comma-separated in the following order:
                     Phred score, ILLUMINACLIP, LEADING, TRAILING,
                      SLIDINGWINDOW, MINLEN. To remove Illumina adapters,
                     specify the adapter FASTA file with "--adapter".
                     If not specified, adapter trimming will be skipped.
                     [33,<ADAPTER_FASTA>:2:30:10,20,20,4:15,75]
  -e , --snpEff      Predict causal variants using SnpEff. Check
                     available databases in SnpEff.
  --line-colors      Colors for threshold lines in plots. Specify a
                     comma-separated list in the order of SNP-index,
                     p95, and p99. [red,lime,orange]
  --dot-color        Color of the dots in plots. [navy]
  --mem              Maximum memory per thread when sorting BAM files;
                     suffixes K/M/G are recognized. [1G]
  -q , --min-MQ      Minimum mapping quality for mpileup. [40]
  -Q , --min-BQ      Minimum base quality for mpileup. [18]
  -C , --adjust-MQ   Adjust the mapping quality for mpileup. The default
                     setting is optimized for BWA. [50]
  -v, --version      show program's version number and exit

MutMap can be run from FASTQ (with or without trimming) or from BAM. If you want to run MutMap from VCF, please use MutPlot (example 5). Once started, MutMap runs all of its subprocesses automatically.

Example 1 : run MutMap from FASTQ without trimming
Example 2 : run MutMap from FASTQ with trimming
Example 3 : run MutMap from BAM
Example 4 : run MutMap from multiple FASTQs and BAMs
Example 5 : run MutPlot from VCF

Example 1 : run MutMap from FASTQ without trimming

mutmap -r reference.fasta \
       -c cultivar.1.fastq.gz,cultivar.2.fastq.gz \
       -b bulk.1.fastq.gz,bulk.2.fastq.gz \
       -n 20 \
       -o example_dir

-r : reference fasta

-c : FASTQs of cultivar. Please input paired-end reads separated by a comma. FASTQ files can be gzipped.

-b : FASTQs of bulk. Please input paired-end reads separated by a comma. FASTQ files can be gzipped.

-n : number of individuals in mutant bulk.

-o : name of output directory. The specified directory should not already exist.

Example 2 : run MutMap from FASTQ with trimming

mutmap -r reference.fasta \
       -c cultivar.1.fastq.gz,cultivar.2.fastq.gz \
       -b bulk.1.fastq.gz,bulk.2.fastq.gz \
       -n 20 \
       -o example_dir \
       -T \
       -a adapter.fasta

-r : reference fasta

-c : FASTQs of cultivar. Please input paired-end reads separated by a comma. FASTQ files can be gzipped.

-b : FASTQs of mutant bulk. Please input paired-end reads separated by a comma. FASTQ files can be gzipped.

-n : number of individuals in mutant bulk.

-o : name of output directory. The specified directory should not already exist.

-T : trim your reads using Trimmomatic.

-a : FASTA of adapter sequences for Trimmomatic.

If you are using TruSeq3, you can find the adapter sequences on the GitHub page of Trimmomatic. This thread is also helpful for preparing the adapter file.
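To make the field order of --trim-params concrete, here is a minimal sketch that splits the comma-separated string into Trimmomatic step arguments. The function build_trimmomatic_steps is hypothetical and only illustrates the documented order (Phred score, ILLUMINACLIP, LEADING, TRAILING, SLIDINGWINDOW, MINLEN); how MutMap itself assembles the Trimmomatic command is not shown here.

```python
DEFAULT_TRIM_PARAMS = "33,<ADAPTER_FASTA>:2:30:10,20,20,4:15,75"


def build_trimmomatic_steps(trim_params=DEFAULT_TRIM_PARAMS, adapter_fasta=None):
    """Split a --trim-params string into a Phred flag and Trimmomatic steps."""
    fields = trim_params.split(",")
    if len(fields) != 6:
        raise ValueError("expected 6 comma-separated fields, got %d" % len(fields))
    phred, clip, leading, trailing, window, minlen = fields
    steps = []
    # Adapter clipping is skipped when no adapter FASTA is given.
    if adapter_fasta is not None:
        steps.append("ILLUMINACLIP:" + clip.replace("<ADAPTER_FASTA>", adapter_fasta))
    steps.append("LEADING:" + leading)
    steps.append("TRAILING:" + trailing)
    steps.append("SLIDINGWINDOW:" + window)
    steps.append("MINLEN:" + minlen)
    return "-phred" + phred, steps


phred_flag, steps = build_trimmomatic_steps(adapter_fasta="adapter.fasta")
print(phred_flag, " ".join(steps))
```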

Example 3 : run MutMap from BAM

mutmap -r reference.fasta \
       -c cultivar.bam \
       -b bulk.bam \
       -n 20 \
       -o example_dir

-r : reference fasta

-c : BAM of cultivar.

-b : BAM of mutant bulk.

-n : number of individuals in mutant bulk.

-o : name of output directory. The specified directory should not already exist.

Example 4 : run MutMap from multiple FASTQs and BAMs

mutmap -r reference.fasta \
       -c cultivar_1.1.fastq.gz,cultivar_1.2.fastq.gz \
       -c cultivar_1.bam \
       -b bulk_1.1.fastq.gz,bulk_1.2.fastq.gz \
       -b bulk_2.bam \
       -b bulk_3.bam \
       -n 20 \
       -o example_dir

MutMap automatically merges multiple FASTQ and BAM files. You can also merge FASTQ or BAM files yourself with cat or samtools merge before passing them to MutMap. If you specify -c multiple times, make sure that all of those files come from a single individual. In contrast, -b can include more than one individual because it is a bulked sample. MutMap classifies each input as FASTQ or BAM based on whether it contains a comma.
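The comma-based classification can be sketched as follows. This is an illustration of the rule as described, not MutMap's source; the function name classify_inputs is hypothetical.

```python
def classify_inputs(values):
    """Split repeated -c/-b values into FASTQ pairs and BAM files.

    A value containing a comma is treated as a paired-end FASTQ entry
    ("read1,read2"); a value without a comma is treated as a BAM file.
    """
    fastq_pairs = []
    bams = []
    for value in values:
        if "," in value:
            parts = value.split(",")
            if len(parts) != 2:
                raise ValueError("expected exactly two FASTQ files: " + value)
            fastq_pairs.append((parts[0], parts[1]))
        else:
            bams.append(value)
    return fastq_pairs, bams


pairs, bams = classify_inputs(
    ["bulk_1.1.fastq.gz,bulk_1.2.fastq.gz", "bulk_2.bam", "bulk_3.bam"]
)
print(pairs, bams)
```

One consequence of this rule is that a file name containing a comma would be misread as a FASTQ pair, so such names are best avoided.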

Example 5 : run MutPlot from VCF

usage: mutplot -v <VCF> -o <OUT_DIR> -n <INT> [-w <INT>] [-s <INT>]
               [-D <INT>] [-d <INT>] [-N <INT>] [-m <FLOAT>]
               [-S <INT>] [-e <DATABASE>] [--igv] [--indel]

MutPlot version 2.3.6

options:
  -h, --help            show this help message and exit
  -v , --vcf            VCF file which contains cultivar and mutant bulk
                        in this order. This VCF file must have AD field.
  -o , --out            Output directory. The specified directory can already
                        exist.
  -n , --N-bulk         Number of individuals in the mutant bulk.
  -w , --window         Window size in kilobases (kb). [2000]
  -s , --step           Step size in kilobases (kb). [100]
  -D , --max-depth      Maximum depth of variants to be used. This cutoff
                        applies to both the cultivar and the bulk. [250]
  -d , --min-depth      Minimum depth of variants to be used. This cutoff
                        applies to both the cultivar and the bulk. [8]
  -N , --N-rep          Number of replicates for simulations to generate
                        null distribution. [5000]
  -m , --min-SNPindex   Cutoff of minimum SNP-index for clear results. [0.3]
  -S , --strand-bias    Filter out spurious homozygous genotypes in the cultivar
                        based on strand bias. If ADF (or ADR) is higher than
                        this cutoff when ADR (or ADF) is 0, that SNP will be
                        filtered out. If you want to disable this filtering,
                        set this cutoff to 0. [7]
  -e , --snpEff         Predict causal variants using SnpEff. Check
                        available databases in SnpEff.
  --igv                 Output IGV format file to check results on IGV.
  --indel               Plot SNP-index with INDEL.
  --line-colors         Colors for threshold lines in plots. Specify a
                        comma-separated list in the order of SNP-index,
                        p95, and p99. [red,lime,orange]
  --dot-color           Color for dot in plot.
                        [navy]
  --fig-width           Width allocated in chromosome figure. [7.5]
  --fig-height          Height allocated in chromosome figure. [4.0]
  --white-space         White space between figures. (This option
                        only affects vertical direction.) [0.6]
  -f , --format         Specify the format of an output image.
                        eps/jpeg/jpg/pdf/pgf/png/rgba/svg/svgz/tif/tiff
  --version             show program's version number and exit
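The -S (--strand-bias) rule in the help text above can be read as the following check. This is a sketch of the documented rule only: the function passes_strand_bias is hypothetical, and exactly which allele's ADF and ADR values MutPlot inspects is not stated here, so the two counts are simply taken as arguments.

```python
def passes_strand_bias(adf, adr, cutoff=7):
    """Return False when reads come from one strand only and exceed the cutoff.

    adf / adr: forward- and reverse-strand read counts for the cultivar.
    cutoff=0 disables the filter, as described for the -S option.
    """
    if cutoff == 0:
        return True
    if adr == 0 and adf > cutoff:
        return False
    if adf == 0 and adr > cutoff:
        return False
    return True


print(passes_strand_bias(adf=8, adr=0))   # only forward reads, above cutoff
print(passes_strand_bias(adf=5, adr=5))   # reads on both strands
```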

MutPlot is included in MutMap. MutMap runs MutPlot with default parameters after generating the VCF. To change parameters, rerun the plotting step on the VCF in OUT_DIR/30_vcf/mutmap.vcf.gz as shown below.

mutplot -v OUT_DIR/30_vcf/mutmap.vcf.gz \
        -o ANOTHER_DIR_NAME \
        -n 20 \
        -w 2000 \
        -s 100
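As an illustration of what -w and -s control, the sketch below averages SNP-index values in overlapping windows. It is a simplified stand-in, not MutPlot's implementation: the function sliding_window_means is hypothetical, and details such as window boundaries and the handling of chromosome ends are assumptions.

```python
def sliding_window_means(positions, snp_indexes, chrom_len,
                         window_kb=2000, step_kb=100):
    """Return [(window_center, mean_snp_index), ...] for one chromosome."""
    window = window_kb * 1000
    step = step_kb * 1000
    results = []
    start = 0
    while start < chrom_len:
        end = start + window
        # Collect SNP-index values of variants located in (start, end].
        values = [v for p, v in zip(positions, snp_indexes) if start < p <= end]
        if values:
            results.append((start + window // 2, sum(values) / len(values)))
        start += step
    return results


# Toy example: 2 kb windows moved in 1 kb steps along a 3 kb chromosome.
print(sliding_window_means([500, 1500, 2500], [0.25, 0.75, 1.0],
                           chrom_len=3000, window_kb=2, step_kb=1))
```

A larger window smooths the curve at the cost of resolution, while a smaller step only increases how densely the curve is sampled.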

Use MutPlot for a VCF which was made by yourself

In this case:

- Ensure that your VCF includes the AD format.
- Ensure that your VCF includes two columns of cultivar and mutant bulk in this order.

If you encounter an error, please try running MutMap from FASTQ or BAM before reporting it in the issues.
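Before passing your own VCF to MutPlot, the two requirements above can be checked with a small sketch like this one. The function check_vcf_header is hypothetical and only inspects the header text; it cannot verify that the first sample really is the cultivar, so the column order still has to be confirmed by eye.

```python
def check_vcf_header(vcf_text):
    """Return a list of problems found in the header of a VCF given as text."""
    has_ad = False
    samples = None
    for line in vcf_text.splitlines():
        if line.startswith("##FORMAT=<ID=AD,"):
            has_ad = True
        elif line.startswith("#CHROM"):
            # Sample names follow the nine fixed VCF columns.
            samples = line.split("\t")[9:]
            break
    problems = []
    if not has_ad:
        problems.append("AD is not declared in the FORMAT header lines")
    if samples is None:
        problems.append("no #CHROM header line found")
    elif len(samples) != 2:
        problems.append("expected 2 sample columns, found %d" % len(samples))
    return problems


example = (
    "##fileformat=VCFv4.2\n"
    '##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths">\n'
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tcultivar\tbulk\n"
)
print(check_vcf_header(example))
```

For a compressed file, the text can be obtained with Python's gzip.open(path, "rt") before calling the function.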

Outputs

The contents of OUT_DIR look like this:

|-- 10_ref
|   |-- reference.fasta
|   |-- reference.fasta.amb
|   |-- reference.fasta.ann
|   |-- reference.fasta.bwt
|   |-- reference.fasta.fai
|   |-- reference.fasta.pac
|   `-- reference.fasta.sa
|-- 20_bam
|   |-- bulk.filt.bam
|   |-- bulk.filt.bam.bai
|   |-- cultivar.filt.bam
|   `-- cultivar.filt.bam.bai
|-- 30_vcf
|   |-- mutmap.vcf.gz
|   `-- mutmap.vcf.gz.tbi
|-- 40_mutmap
|   |-- snp_index.tsv
|   |-- snp_index.p95.tsv
|   |-- snp_index.p99.tsv
|   |-- sliding_window.tsv
|   |-- sliding_window.p95.tsv
|   |-- sliding_window.p99.tsv
|   `-- mutmap_plot.png
`-- log
    |-- bcftools.log
    |-- bgzip.log
    |-- bwa.log
    |-- mutplot.log
    |-- samtools.log
    `-- tabix.log

If you run MutMap with trimming, MutMap also creates a 00_fastq directory containing the trimmed FASTQ files.
You can check the results in 40_mutmap.
- snp_index.tsv : columns in this order.
  - CHROM : chromosome name
  - POSI : position in chromosome
  - VARIANT : SNP or INDEL
  - DEPTH : depth of bulk
  - p99 : 99% confidence interval of simulated SNP-index
  - p95 : 95% confidence interval of simulated SNP-index
  - SNP-index : real SNP-index
- sliding_window.tsv : columns in this order.
  - CHROM : chromosome name
  - POSI : central position of window
  - MEAN p99 : mean of p99
  - MEAN p95 : mean of p95
  - MEAN SNP-index : mean SNP-index
- mutmap_plot.png : resulting plot
  - BLUE dot : variant
  - RED line : mean SNP-index
  - ORANGE line : mean p99
  - GREEN line : mean p95
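The column layout of snp_index.tsv described above can be used to pull out candidate variants, for example those whose SNP-index exceeds the simulated p99. The sketch below assumes that the file is tab-separated and has no header row; both are assumptions to check against your own output, and the function variants_above_p99 is hypothetical.

```python
import csv
import io

# Column order as listed for snp_index.tsv.
COLUMNS = ["CHROM", "POSI", "VARIANT", "DEPTH", "p99", "p95", "SNP-index"]


def variants_above_p99(handle):
    """Return (chrom, position, snp_index) for rows with SNP-index > p99."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, row))
        snp_index = float(record["SNP-index"])
        if snp_index > float(record["p99"]):
            hits.append((record["CHROM"], int(record["POSI"]), snp_index))
    return hits


# Made-up rows, used only to show the expected layout.
toy = io.StringIO(
    "chr01\t1200\tSNP\t25\t0.84\t0.76\t1.0\n"
    "chr01\t5300\tSNP\t30\t0.83\t0.73\t0.4\n"
)
print(variants_above_p99(toy))
```

With a real run, the same function can be applied to an open file handle for OUT_DIR/40_mutmap/snp_index.tsv.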

About multiple testing correction

This function has been deprecated since v2.3.5. We highly recommend running MutMap without this function. However, if you would like to use this function, you can use it with versions of MutMap older than v2.3.5.

Build and use your own database for snpEff

If you want to use your own database for snpEff, additional steps are required. Here we assume that you installed MutMap via the Anaconda distribution and created a new environment with conda create.

1. Find the snpEff directory that contains the snpEff script, configuration file and database. You can find it in /home/anaconda3/envs/{your_env_name_installed_mutmap}/share/snpeff-5.0-0/. Depending on your setup, anaconda3 may be miniconda3, and the snpeff version may be different.
2. Go to this directory and follow the snpEff manual to build the database. Don't forget to add your database info to the snpEff configuration file. https://pcingola.github.io/SnpEff/se_buildingdb/#add-a-genome-to-the-configuration-file
3. Run MutMap with the option -e {your_database_name}.
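Because the snpEff directory name depends on the installed version, it can help to search for it rather than type the path by hand. The sketch below is a hypothetical helper, not part of MutMap: it assumes the conda layout described above (a share/snpeff-* directory inside the environment) and that the configuration file is named snpEff.config.

```python
import sys
from pathlib import Path


def find_snpeff_dirs(env_prefix=sys.prefix):
    """Return snpEff directories under <env_prefix>/share that hold a config file."""
    share = Path(env_prefix) / "share"
    return sorted(
        path for path in share.glob("snpeff-*")
        if (path / "snpEff.config").is_file()
    )


# When run with the Python of the conda environment in which MutMap is
# installed, sys.prefix points at that environment's root directory.
for directory in find_snpeff_dirs():
    print(directory)
```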
