trying the analyses on other public (or private) data;
doing other analyses (overlap, containment, etc.) with the k-mer sketches;
Setup (do these once)
Create a working directory
mkdir ~/smash-data/
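# sketch each genome FASTA into its own signature file;
# --name-from-first names each signature after its first FASTA header
for i in /opt/shared/sourmash-data/10sketches/*.fa
do
    sourmash sketch dna "$i" -o "genome-$(basename "$i" .fa).sig.zip" \
        --name-from-first
done

# combine the per-genome signatures into one collection
sourmash sig cat genome*.sig.zip -o 10sketches.sig.zip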
sourmash tax metagenome -g *.x.gtdb-rs214.csv \
-t /opt/shared/sourmash-data/gtdb-rs214.lineages.sqldb \
-r order -F human
Getting % of metagenome covered from gather
If you've run a fastgather above, you will still need to run sourmash gather to get nice human-readable output; you can do this quickly by using the fastgather output as a picklist to limit the results to the previously calculated output. You should then see at the end:
found 849 matches total;
the recovered matches hit 83.1% of the abundance-weighted query.
the recovered matches hit 68.7% of the query k-mers (unweighted).
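As far as we can tell from the output format, these two summary lines are sums over per-match columns of gather's CSV output: f_unique_weighted for the abundance-weighted number and f_unique_to_query for the unweighted one. Below is a minimal sketch of that arithmetic with awk, run on a hypothetical three-row toy CSV (real gather CSVs have many more columns). It looks columns up by header name and assumes no quoted commas appear in fields before them.

```sh
set -eu

# Sum two per-match fraction columns of a gather CSV and print both as %.
# Assumption: columns are named f_unique_weighted and f_unique_to_query,
# and no quoted field containing commas precedes them (naive comma split).
sum_fractions() {
    awk -F, '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        {
            w += $(col["f_unique_weighted"])
            u += $(col["f_unique_to_query"])
        }
        END { printf "%.1f %.1f\n", 100 * w, 100 * u }
    ' "$1"
}

# Hypothetical toy gather output, for illustration only.
tmp=$(mktemp -d)
cat > "$tmp/toy-gather.csv" <<'EOF'
intersect_bp,f_unique_to_query,f_unique_weighted,name
500000,0.40,0.50,genomeA
300000,0.20,0.25,genomeB
100000,0.05,0.05,genomeC
EOF

result=$(sum_fractions "$tmp/toy-gather.csv")
weighted=${result% *}
unweighted=${result#* }
echo "the recovered matches hit ${weighted}% of the abundance-weighted query."
echo "the recovered matches hit ${unweighted}% of the query k-mers (unweighted)."
rm -r "$tmp"
```

If the column assumption holds, pointing sum_fractions at your own gather CSV should give the same two numbers gather prints, up to rounding.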
Running abundhist to generate metagenome abundance histograms
After activating the smash conda environment, you can install the abundhist plugin like so:
pip install sourmash_plugin_abundhist
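An abundance histogram counts how many k-mer hashes in an abundance-tracking sketch fall into each abundance range. The snippet below is not the abundhist plugin; it is a minimal awk illustration of the binning idea, assuming a hypothetical list with one abundance value per hash and an arbitrary bin width of 5.

```sh
set -eu

# Hypothetical per-hash abundances (one value per k-mer hash), toy values.
abunds='1
1
2
3
7
8
12
1'

# Count hashes per abundance bin of width 5 (1-5, 6-10, 11-15, ...),
# printing empty bins too, up to the largest non-empty one.
hist=$(printf '%s\n' "$abunds" | awk -v width=5 '
    BEGIN { max = 0 }
    {
        bin = int(($1 - 1) / width)
        n[bin]++
        if (bin > max) max = bin
    }
    END {
        for (b = 0; b <= max; b++)
            printf "%d-%d %d\n", b * width + 1, (b + 1) * width, n[b] + 0
    }')
printf '%s\n' "$hist"
```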
https://hackmd.io/9ORFRJGaTOiOdEAY-Aih2A?view
open lab / Sat + Monday, Jul 20 & 22, 2024 / STAMPS 2024
Suggestion for open lab:
sourmash sketch dna creates files containing DNA k-mers;

Getting started
Start up RStudio on your instance, and click on Terminal.
Setup (do these once)
Create a working directory
Install sourmash and various plugins with conda (also see a conda tutorial; you'll need to install miniforge if you're not running conda/mamba on your Jetstream computer.)
Then run:
This installs:
Change directory and activate environment (each time you log in)
Change to sourmash working directory and activate sourmash software environment:
Might as well check for updates; sourmash is fast-moving ;)
Convert some metagenomes into k-mer signatures
Use sourmash sketch to turn metagenomes into k-mers:
This will take about 5 minutes and create three SRR*.sig.zip files. (You can grab any metagenome you want from ENA, e.g. SRR7947181; use curl -JLO <read URL> to download the FASTQ file(s) and sketch them as above!)

Sketch some genomes into k-mer signatures
Use sourmash sketch to turn genomes into k-mers:
Run various analyses on the k-mer sketches!
Genome comparisons via venn and upset
Run:
to produce a file genomes-venn.png.
Run:
to produce a file genomes-upset.png.
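The venn and upset plots show how k-mers are shared between sketches. To make the regions concrete, here is a minimal sketch using plain shell tools on exact k-mer sets of two made-up sequences. It only illustrates the set logic: it uses k=4 rather than sourmash's default k=31, and it skips the canonical (reverse-complement) k-mers and hash subsampling that sourmash sketches use.

```sh
set -eu
export LC_ALL=C   # sort and comm must agree on ordering

# Print the set of k-mers of a sequence, one per line, sorted and unique.
kmers() {
    printf '%s\n' "$1" | awk -v k="$2" '{
        for (i = 1; i <= length($0) - k + 1; i++) print substr($0, i, k)
    }' | sort -u
}

tmp=$(mktemp -d)
kmers ACGTACGTTGCA 4 > "$tmp/a.txt"   # hypothetical sequence A
kmers ACGTTGCATTTT 4 > "$tmp/b.txt"   # hypothetical sequence B

# comm on sorted sets: -23 = only in A, -13 = only in B, -12 = shared.
only_a=$(comm -23 "$tmp/a.txt" "$tmp/b.txt" | wc -l | tr -d ' ')
only_b=$(comm -13 "$tmp/a.txt" "$tmp/b.txt" | wc -l | tr -d ' ')
shared=$(comm -12 "$tmp/a.txt" "$tmp/b.txt" | wc -l | tr -d ' ')
echo "only A: $only_a  shared: $shared  only B: $only_b"
rm -r "$tmp"
```

The three counts correspond to the three regions of a two-set Venn diagram; an upset plot shows the same intersection sizes as bars, which scales better beyond three or four sets.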
Genome distance matrix, dendrograms, and ordination
First compare the genomes:
Now build a matrix+dendrogram view:
to produce 10sketches.mat.png.
You can produce a metric MDS plot too, colored by species:
to produce 10sketches.mds.png.
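By default sourmash compare computes the Jaccard similarity, |A ∩ B| / |A ∪ B|, between every pair of sketches, and the matrix view displays that all-by-all table. Here is a minimal sketch of the same calculation on exact k-mer sets, with three hypothetical sequences (seqA, seqB, seqC) and k=4; sourmash works on subsampled hashes, so its values are estimates of this quantity.

```sh
set -eu
export LC_ALL=C

# Print the set of k-mers of a sequence, one per line, sorted and unique.
kmers() {
    printf '%s\n' "$1" | awk -v k="$2" '{
        for (i = 1; i <= length($0) - k + 1; i++) print substr($0, i, k)
    }' | sort -u
}

# Jaccard similarity of two sorted k-mer set files: shared / union.
jaccard() {
    inter=$(comm -12 "$1" "$2" | wc -l)
    union=$(sort -u "$1" "$2" | wc -l)
    awk -v i="$inter" -v u="$union" 'BEGIN { printf "%.3f", i / u }'
}

tmp=$(mktemp -d)
kmers ACGTACGTTGCA 4 > "$tmp/seqA"
kmers ACGTTGCATTTT 4 > "$tmp/seqB"
kmers GGGGCCCCAAAA 4 > "$tmp/seqC"

# Build the symmetric similarity matrix, one row per sequence.
matrix=$(for x in seqA seqB seqC; do
    row=$x
    for y in seqA seqB seqC; do
        row="$row $(jaccard "$tmp/$x" "$tmp/$y")"
    done
    echo "$row"
done)
printf '%s\n' "$matrix"
rm -r "$tmp"
```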
Flat (no abund) metagenome comparisons via venn and upset
Run a venn comparison of metagenomes => metag-venn.png.
Run an upset comparison of metagenomes => metag-upset.png.
Genome presence/absence
Calculate which GTDB genomes are in a metagenome:
(will take ~3 minutes)
Make the detection/abundance plot:
=> SRR7947178.detection.png
=> SRR7947178.ani.png
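Detection (f_match_orig) is a containment rather than a similarity: the fraction of a genome's k-mers that also appear in the metagenome, |G ∩ M| / |G|. Containment is asymmetric, which is why a small genome can be fully detected in a much larger metagenome. A minimal sketch with exact k-mer sets of two made-up sequences (k=4, no canonical k-mers or hash subsampling, unlike real sourmash sketches):

```sh
set -eu
export LC_ALL=C

# Print the set of k-mers of a sequence, one per line, sorted and unique.
kmers() {
    printf '%s\n' "$1" | awk -v k="$2" '{
        for (i = 1; i <= length($0) - k + 1; i++) print substr($0, i, k)
    }' | sort -u
}

tmp=$(mktemp -d)
kmers ACGTTGCA 4 > "$tmp/genome"                # hypothetical genome
kmers ACGTACGTTGCATTTT 4 > "$tmp/metagenome"    # hypothetical metagenome

shared=$(comm -12 "$tmp/genome" "$tmp/metagenome" | wc -l)
n_genome=$(wc -l < "$tmp/genome")
n_metag=$(wc -l < "$tmp/metagenome")

# Containment of X in Y = shared / |X|; the two directions differ.
genome_in_metag=$(awk -v s="$shared" -v n="$n_genome" 'BEGIN { printf "%.3f", s / n }')
metag_in_genome=$(awk -v s="$shared" -v n="$n_metag" 'BEGIN { printf "%.3f", s / n }')
echo "genome in metagenome: $genome_in_metag"
echo "metagenome in genome: $metag_in_genome"
rm -r "$tmp"
```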
The three columns being plotted from SRR7947178.x.gtdb-rs214.csv are:
f_match_orig - the detection / containment of the match in the metagenome;
match_containment_ani - the containment-based ANI of the match;
average_abund - the average abundance of the matching k-mers in the metagenome;

Generating taxonomic classifications for metagenomes with sourmash
The basic workflow is to first run sourmash gather as above, and then run sourmash tax metagenome.
Run gather for each metagenome:
Then you can run taxonomic analyses like so:
See the sourmash tax documentation for various output options!
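Roughly speaking, sourmash tax metagenome takes the fraction of the metagenome assigned to each gather match, looks up that match's lineage, and sums the fractions at the requested rank (-r order in the example command); whatever gather did not match is reported as unclassified. The awk sketch below only illustrates that summing step. It is not the sourmash implementation, and its two-column input (an order name and a fraction per match) and all values are hypothetical.

```sh
set -eu
export LC_ALL=C

# Hypothetical per-match input: order-level lineage, fraction of the query.
matches='o__Bacteroidales 0.30
o__Lachnospirales 0.20
o__Bacteroidales 0.15
o__Oscillospirales 0.05'

# Sum fractions per order; the remainder of the query is "unclassified".
summary=$(printf '%s\n' "$matches" | awk '
    { total[$1] += $2; sum += $2 }
    END {
        for (o in total) printf "%s %.2f\n", o, total[o]
        printf "unclassified %.2f\n", 1 - sum
    }' | sort)
printf '%s\n' "$summary"
```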
Getting % of metagenome covered from gather
If you've run a fastgather above, you will still need to run sourmash gather to get nice human-readable output; you can do this quickly by using the fastgather output as a picklist to limit the results to the previously calculated output. You should then see at the end:
Running abundhist to generate metagenome abundance histograms
After activating the smash conda environment, you can install the abundhist plugin like so:
and run it on any metagenome or mixture that has been sketched with -p abund like so:

Other topics we can discuss!
Appendix 1: UNIX Command Line!
Totally new to the command-line, or want to strengthen your foundation? Do this Unix Crash Course.
If you want some experience in other aspects, consider going through these:
Appendix 2: Conda tutorial
Please see my conda tutorial; you'll need to install miniforge if you're not running things on your Jetstream computer.
Appendix 3: examining assembly overlap with k-mers
See my in-class notes.
Contact info
Contact Titus at [email protected].