LongTrack is a novel framework that combines long-read metagenomic assemblies with reliable informatics tailored for strain tracking after fecal microbiota transplantation (FMT). It works in three steps: (1) long-read metagenomic sequencing data from the donors' and recipients' pre-FMT samples are used to construct de novo metagenome-assembled genomes (long-read MAGs); (2) strain-specific unique k-mers are selected from the long-read MAGs; and (3) the unique k-mers are combined with short-read metagenomic data for precise strain tracking.
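To make step (2) concrete, here is a minimal sketch of unique k-mer selection. It is not the LongTrack implementation: it assumes that "unique" means "found in exactly one genome of the supplied set", that k-mers are compared in canonical form (a k-mer and its reverse complement count as the same k-mer), and that genomes fit in memory as plain strings. The function names `canonical`, `kmers`, and `unique_kmers` are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = "".join(_COMPLEMENT[base] for base in reversed(kmer))
    return min(kmer, rc)


def kmers(seq, k):
    """Return the set of canonical k-mers in seq, skipping any with non-ACGT bases."""
    seq = seq.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in _COMPLEMENT for base in kmer):
            found.add(canonical(kmer))
    return found


def unique_kmers(genomes, k=31):
    """Map each genome name to the k-mers found in that genome and in no other.

    genomes: dict of name -> sequence string (assumption: one string per genome).
    """
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    unique = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(kmer)
    return unique


if __name__ == "__main__":
    # Toy example with k=3; the demo data use k=31.
    toy = {"strain1": "AAAC", "strain2": "AAAG"}
    result = unique_kmers(toy, k=3)
    print(sorted(result["strain1"]))  # ['AAC']
    print(sorted(result["strain2"]))  # ['AAG']
```

In the toy example the shared k-mer AAA is discarded because both genomes contain it, which is the property that lets a unique k-mer act as a marker for a single strain.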
Dependencies
- Python version 2.7.16
- Bowtie version 2.2.8
Python packages
- numpy >=1.7.1
- HTSeq >=0.5.3p9
- matplotlib >= 1.0.0
- seaborn >= 0.5.0
- pandas >= 0.7.3
To showcase the toolbox, we provide the following demonstration (which takes ~5 minutes in total). It integrates two major steps: (1) an illustrative run that performs strain tracking for 5 long-read MAGs across 3 post-FMT samples, and (2) a summary of the strain tracking results.
The example data, LongTrack_test_data.zip, can be downloaded from Zenodo: https://zenodo.org/records/14765650
Prepare for LongTrack_demo.sh
unzip LongTrack_test_data.zip
mv Data LongTrack/
Add the directories containing python2 and bowtie2 to your $PATH environment variable
module load python/2.7.16 bowtie2
or
export PATH=$PATH:[bowtie2_path]
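Before running the demo, it can help to confirm that both tools are actually visible on `$PATH`. The snippet below is a small sketch of such a check. The helper name `missing_tools` is hypothetical, and the executable names `python2` and `bowtie2` are assumptions taken from the instruction above; on some systems the Python 2 interpreter is installed as `python`.

```python
try:
    from shutil import which  # Python 3.3+
except ImportError:
    # Python 2.7 fallback with the same "path or None" behaviour
    from distutils.spawn import find_executable as which


def missing_tools(names):
    """Return the executables from names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["python2", "bowtie2"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("python2 and bowtie2 are both on PATH")
```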
Running LongTrack_demo.sh
cd LongTrack/code
sh LongTrack_demo.sh
Explanation of inputs in LongTrack_demo.sh
Inputs
- MAG: This folder includes the long-read MAGs (.fna) that were de novo assembled from the donors, along with the k-mer (k=31) database for each MAG (_kmcdb) generated by KMC v3.1.0
Akkermansia_muciniphila_D1.fna
Akkermansia_muciniphila_D1_kmcdb_dump
Akkermansia_muciniphila_D1_kmcdb.kmc_pre
Akkermansia_muciniphila_D1_kmcdb.kmc_suf
…
- metagenome: This folder includes the short-read metagenomic data of post-FMT recipients at 3 time points, plus unrelated samples used as negative controls (NC1 and NC2). (Paired-end data: *_sample_PE1.fasta *_sample_PE2.fasta)
NC1_sample_PE1.fasta
NC1_sample_PE2.fasta
postFMT1W4_sample_PE1.fasta
postFMT1W4_sample_PE2.fasta
…
- unique_kmer: This folder includes the unique k-mers from each long-read MAG
Akkermansia_muciniphila_D1_kmcdb_dump_withpos
…
- conflict_table: This file lists, for each sample, the samples it conflicts with (its no-relationship samples). For example, the negative controls conflict with every other sample, so they are used as no-relationship samples when calculating confidence scores.
postFMT1W4 NC1,NC2
postFMT1W8 NC1,NC2
postFMT1Y5 NC1,NC2
NC1 NC2,postFMT1W4,postFMT1W8,postFMT1Y5
NC2 NC1,postFMT1W4,postFMT1W8,postFMT1Y5
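The conflict table shown above can be read with a few lines of Python. This is a sketch rather than LongTrack's own parser: it assumes each line holds a sample name, whitespace, and then a comma-separated list of no-relationship samples, exactly as in the example. The function name `parse_conflict_table` is hypothetical.

```python
def parse_conflict_table(lines):
    """Map each sample name to the set of samples it conflicts with.

    Assumed line layout (from the example): <sample> <s1>,<s2>,...
    Blank lines are skipped.
    """
    conflicts = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        sample = fields[0]
        partners = fields[1].split(",") if len(fields) > 1 else []
        conflicts[sample] = set(p for p in partners if p)
    return conflicts


if __name__ == "__main__":
    example = [
        "postFMT1W4 NC1,NC2",
        "postFMT1W8 NC1,NC2",
        "postFMT1Y5 NC1,NC2",
        "NC1 NC2,postFMT1W4,postFMT1W8,postFMT1Y5",
        "NC2 NC1,postFMT1W4,postFMT1W8,postFMT1Y5",
    ]
    table = parse_conflict_table(example)
    print(sorted(table["postFMT1W4"]))  # ['NC1', 'NC2']
    print(len(table["NC1"]))            # 4
```

To read the real file, pass an open file handle instead of the list, for example `parse_conflict_table(open(path))`, where `path` points at your conflict table.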
Outputs
Once the above script completes, the following files and figures will be generated in the folders described below.
- Strain tracking table: Tracking_results/results_readdistribution_actualreads_confidencescores. Presence (1) or absence (0) of each long-read MAG across the post-FMT samples collected at different time points and the negative controls.
strain NC1 NC2 postFMT1W4 postFMT1W8 postFMT1Y5
Akkermansia_muciniphila_D1 0 0 1 1 1
Alistipes_onderdonkii_D1 0 0 1 1 1
Bifidobacterium_longum.D1.str1 0 0 1 1 1
Bifidobacterium_longum.D1.str2 0 0 1 1 1
Gemmiger_formicilis_D1 0 0 1 1 1
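The presence/absence table above can be summarized with a short script. The sketch below assumes the whitespace-separated layout shown in the excerpt (a header row starting with `strain`, then one row per MAG with 0/1 values); the actual output file may contain additional fields that this sketch does not handle. The function names `parse_tracking_table` and `samples_with_strain` are hypothetical.

```python
def parse_tracking_table(lines):
    """Parse a whitespace-separated presence/absence table.

    Returns (samples, table): samples is the list of column names in header
    order, and table maps each strain to a dict of sample -> 0 or 1.
    """
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    samples = header[1:]
    table = {}
    for row in body:
        table[row[0]] = dict(zip(samples, (int(v) for v in row[1:])))
    return samples, table


def samples_with_strain(samples, table, strain):
    """List, in header order, the samples in which the strain is marked present."""
    return [s for s in samples if table[strain][s] == 1]


if __name__ == "__main__":
    example = [
        "strain NC1 NC2 postFMT1W4 postFMT1W8 postFMT1Y5",
        "Akkermansia_muciniphila_D1 0 0 1 1 1",
        "Alistipes_onderdonkii_D1 0 0 1 1 1",
    ]
    samples, table = parse_tracking_table(example)
    print(samples_with_strain(samples, table, "Akkermansia_muciniphila_D1"))
    # ['postFMT1W4', 'postFMT1W8', 'postFMT1Y5']
```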