Tutorials
To get started, please follow the Installation instructions to install STIM either through Conda or by building it from source. There are two examples, distinguished by storage layout: one with a single slice and one with multiple slices. We therefore first explain the basics of our storage layout.
For the tutorials, please download the example Visium data by clicking here and navigate to the folder where the data is stored. We assume you installed STIM using Conda and have the appropriate Conda environment active. If you compiled STIM from source, the executables may not be in your $PATH. In this case, call them with the full path (e.g., ./st-explorer if you installed them in the current directory).
Note: your browser might automatically unzip the data; we cover both cases in the resaving step of the tutorials below.
A spatial transcriptomics dataset can consist of a single 2-dimensional (2d) slice, or a container that contains several 2d slices and thereby forms a 3d volume. Note that for any 3d volume (container-dataset), each 2d slice can also be addressed as an individual dataset (slice-dataset). Most commands support both types of datasets, while some require a container (e.g. alignment).
Slice-datasets can either be saved in an anndata-conforming layout, where the expression values, locations and annotations are stored in /X, /obsm/spatial and /obs, respectively; or in a generic hierarchical layout, where the arrays are stored in /expressionValues, /locations and /annotations, respectively. The N5 API is used to read and write these layouts using the N5, Zarr, or HDF5 backend. If your slice(s) are stored in .csv files, you can use the st-resave command (see below) to resave your data into one of the supported formats by specifying the extension of the output as .h5 (generic HDF5), .n5 (generic N5), or .zarr (generic Zarr); an additional suffix ad is used to indicate the AnnData-conforming layout (e.g. h5ad for HDF5-backed AnnData).
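The extension rule above can be sketched as a small lookup. This is only an illustration of the naming convention, not part of STIM: the function name `describe_output` is hypothetical, and the text shows only `h5ad` explicitly, so treating `.n5ad` and `.zarrad` the same way is an assumption derived from the stated "additional suffix ad" rule.

```python
from pathlib import PurePath

# Backends named in the text, keyed by the generic output extension.
GENERIC_BACKENDS = {".h5": "HDF5", ".n5": "N5", ".zarr": "Zarr"}


def describe_output(path):
    """Return (backend, layout) implied by the extension of an output path.

    Hypothetical helper: it only mirrors the naming rule described above.
    """
    suffix = PurePath(path).suffix.lower()
    if suffix in GENERIC_BACKENDS:
        return GENERIC_BACKENDS[suffix], "generic"
    # An extra "ad" suffix marks the AnnData-conforming layout, e.g. ".h5ad".
    # Assumption: the same rule applies to ".n5ad" and ".zarrad".
    if suffix.endswith("ad") and suffix[:-2] in GENERIC_BACKENDS:
        return GENERIC_BACKENDS[suffix[:-2]], "anndata"
    raise ValueError(f"unsupported output extension: {suffix!r}")


for name in ("slice1.h5ad", "slice2.n5", "slice3.zarr"):
    print(name, describe_output(name))
```

Such a check can be useful before calling st-resave from a script, so that a typo in the output name fails early instead of producing an unexpected layout.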
For a slice-dataset, you can:
- interactively view it using st-explorer (explore all genes & annotations) or st-bdv-view (view multiple genes in parallel);
- render the dataset in ImageJ/Fiji and save the rendering, e.g., as TIFF, using st-render;
- normalize the dataset using st-normalize;
- add annotations such as, e.g., celltypes, using st-add-annotations;
- create a container-dataset from one or more slice-datasets (see below).
For alignment of several slices, slices have to be grouped into an N5-container to allow additional annotations to be stored. In addition to all commands listed above for slice-datasets, the subsequent commands can be used for container-datasets:
- create a container-dataset containing one or more existing slice-datasets using st-add-slice;
- add a slice-dataset to a pre-existing container-dataset using st-add-slice;
- perform pairwise alignment of slices using st-align-pairs (pre-processing);
- visualize aligned pairs of slices using st-align-pairs-view (optional user verification);
- perform global alignment of all slices using st-align-global (yielding the actual transformation for each slice-dataset);
- visualize globally aligned data in BigDataViewer using st-bdv-view.
- First, we need to convert the data we just downloaded as CSV into one of the supported formats for efficient storage and access to the dataset. We want the first slice of the data to be saved in an anndata file called slice1.h5ad. Assuming the data are in the downloaded visium.zip file in the same directory as the executables, execute the following:
st-resave -i visium.zip/section1_locations.csv,visium.zip/section1_reads.csv,slice1.h5ad
This will automatically load the *.csv files from within the zipped file and create a slice1.h5ad file in the current directory (alternatively, you could extract the *.csv files as well and link them). The entire resaving process should take about 10 seconds on a modern notebook with an SSD. Note: if your browser automatically unzipped the data, just change visium.zip to the respective folder name, most likely visium.
- Next, we will simply take a look at the slice-dataset directly:
st-explorer -i slice1.h5ad -c '0,110'
First, type calm2 into the 'search gene' box. Using -c '0,110' we already set the display range to more or less match this dataset. You can manually change it by clicking in the BigDataViewer window and pressing s to bring up the brightness dialog. Feel free to play with the Visualization Options in the explorer, e.g. move Gauss Rendering to 0.5 to get a sharper image and then play with the Median Filter radius to filter the data.
- Now, we will create TIFF images for the genes Calm2 and Mbp:
st-render -i slice1.h5ad -g 'Calm2,Mbp' -rf 0.5
You can now, for example, overlay both images into a two-channel image using Image > Color > Merge Channels, selecting Calm2 as magenta and Mbp as green. You could then convert this image to RGB using Image > Type > RGB Color and save it as TIFF, JPEG or AVI (e.g., with JPEG compression). These can be added to your presentation or paper; check out our beautiful AVI here (you need to click download at the top right). You could render a bigger image by setting -s 0.1. Note: Please check the documentation of ImageJ and Fiji for help on how to further process images.
- Make sure you followed the previous tutorial such that you've already resaved the first slice of the Visium dataset as the anndata file slice1.h5ad.
- In order to perform the alignment of the whole dataset (would work identically for more than two slices), we need to create a container-dataset containing the already resaved slice-dataset:
st-add-slice -c visium.n5 -i slice1.h5ad
This will create an N5 container visium.n5 and link the first slice to it. If you don't want the slice to be linked but moved instead, you can use the -m flag. Also, custom storage locations for the location, expression values, and annotations arrays within the slice can be given by -l, -e, and -a, respectively.
- Now we resave the second slice of the data as an N5 slice-dataset. Assuming the data are in the downloaded visium.zip file in the same directory as the executables:
st-resave \
-i visium.zip/section2_locations.csv,visium.zip/section2_reads.csv,slice2.n5 \
-c visium.n5
This will automatically load the *.csv files from within the zipped file and add the resulting slice-dataset to the visium.n5 container-dataset, which already contains the first slice. The entire resaving process should take about 10 seconds on a modern notebook with an SSD. Note: if your browser automatically unzipped the data, just change visium.zip to the respective folder name, most likely visium.
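When resaving several sections from a script, the `-i locations,reads,name` triple can be assembled programmatically. The sketch below only builds the argument list and does not run anything; the helper name `resave_command` and its parameters are hypothetical, while the `-i` and `-c` options and the file names are the ones used in the commands above.

```python
import shlex


def resave_command(data_root, section, output_name, container=None):
    """Build (but do not run) an st-resave call for one section of the example data.

    Hypothetical helper; file naming follows the example Visium download.
    """
    locations = f"{data_root}/section{section}_locations.csv"
    reads = f"{data_root}/section{section}_reads.csv"
    # st-resave expects the comma-separated triple: locations, reads, output name.
    command = ["st-resave", "-i", f"{locations},{reads},{output_name}"]
    if container is not None:
        # Add the resaved slice to an existing container-dataset.
        command += ["-c", container]
    return command


# Use "visium" instead of "visium.zip" if the browser already unzipped the data.
print(shlex.join(resave_command("visium.zip", 1, "slice1.h5ad")))
print(shlex.join(resave_command("visium.zip", 2, "slice2.n5", container="visium.n5")))
```

The returned list can be passed to `subprocess.run(command, check=True)` once the STIM executables are on your `$PATH`.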
- Next, we can again take a look at the data, which now includes two slice-datasets. We can do this interactively:
st-explorer -i visium.n5
by rendering images for all desired genes:
st-render -i visium.n5 -g 'Calm2,Mbp' -rf 0.5
or by looking at one of the datasets in the container:
st-bdv-view -i visium.n5 -g 'Calm2,Mbp' -rf 0.5 -d slice1.h5ad
st-bdv-view -i visium.n5 -g 'Calm2,Mbp' -rf 0.5 -d slice2.n5
Selecting genes and adjusting visualization options work exactly as in the first tutorial.
We can now overlay both images into a two-channel image again using Image > Color > Merge Channels, selecting Calm2 as magenta and Mbp as green. By flipping through the slices (slice1 and slice2) you will realize that they are not aligned.
- To remedy this, we will perform alignment of the two slices. We will use 15 automatically selected genes (-n), a maximum error of 100 (--maxEpsilon, in units of the sequenced locations) and require at least 30 inliers per gene (--minNumInliersGene; this dataset is more robust than the SlideSeq one). The alignment process takes around 1-2 minutes on a modern notebook. Note: at this point no transformations are stored within the container-dataset, but only the list of corresponding points between all pairs of slices.
st-align-pairs -c visium.n5 -n 15 -rf 0.5 --maxEpsilon 100 --minNumInliersGene 30
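To make the roles of --maxEpsilon and --minNumInliersGene more concrete, here is a minimal sketch of a generic RANSAC-style inlier test. It is not STIM's implementation: the function names, the toy matches and the identity transform are illustrative assumptions, and only the two thresholds (100 and 30) are taken from the command above.

```python
import math


def count_inliers(matches, transform, max_epsilon):
    """Count matches whose transformed point lands within max_epsilon of its partner.

    matches: list of ((x1, y1), (x2, y2)) candidate correspondences.
    transform: function mapping coordinates of slice 1 into slice 2.
    """
    return sum(1 for p, q in matches if math.dist(transform(p), q) <= max_epsilon)


def gene_is_accepted(matches, transform, max_epsilon=100.0, min_num_inliers=30):
    """A gene contributes only if enough of its matches agree with the model."""
    return count_inliers(matches, transform, max_epsilon) >= min_num_inliers


def identity(point):
    # Placeholder model; a real alignment would estimate this from the data.
    return point


# Toy data: 40 matches that are off by 50 units, and 40 that are off by 150 units.
close_matches = [((float(i), 0.0), (float(i) + 50.0, 0.0)) for i in range(40)]
far_matches = [((float(i), 0.0), (float(i) + 150.0, 0.0)) for i in range(40)]

print(gene_is_accepted(close_matches, identity))  # True: 40 inliers >= 30
print(gene_is_accepted(far_matches, identity))    # False: no match within 100
```

In this picture, a larger --maxEpsilon tolerates less precise matches, while a larger --minNumInliersGene discards genes with few consistent matches.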
For your dataset, the optimal choice of parameters may vary. A good baseline for the --maxEpsilon parameter is ten times the average distance between the sequenced points. If the --maxEpsilon option is not given, this value is computed and used automatically. For the number of selected genes (-n), higher values yield better results at the cost of a slower alignment. Increasing the minimal number of inliers per gene (--minNumInliersGene) can also increase alignment quality, but can cause the alignment to fail.
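The baseline for --maxEpsilon can be estimated from the spot coordinates. The sketch below interprets "average distance between the sequenced points" as the mean nearest-neighbour distance, which is an assumption; how STIM computes its automatic default is not specified here. The function names and the toy grid are hypothetical.

```python
import math


def mean_nearest_neighbor_distance(points):
    """Average, over all points, of the distance to the closest other point.

    Brute force, O(n^2): fine for a quick estimate on a subsample of locations.
    """
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def baseline_max_epsilon(points, factor=10.0):
    """Rule of thumb from the text: ten times the average point distance."""
    return factor * mean_nearest_neighbor_distance(points)


# Toy example: a regular 4 x 4 grid with a spacing of 100 units.
grid = [(x * 100.0, y * 100.0) for x in range(4) for y in range(4)]
print(baseline_max_epsilon(grid))  # 1000.0
```

For a real dataset, the coordinates would come from the locations array of the slice-dataset (for example /obsm/spatial in the AnnData layout).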
The st-align-pairs command will precompute and store the standard deviation values as gene annotations in the container. You can compute these values separately with the tool st-add-entropy, as:
st-add-entropy -i visium.n5/slice1.h5ad
st-add-entropy -i visium.n5/slice2.n5
# then, compute the pairwise alignment
st-align-pairs -c visium.n5 -n 15 -rf 0.5 --maxEpsilon 100 --minNumInliersGene 30 --entropyPath "stdev"
- Now we will visualize this pair of slices before and after alignment. To this end, we create two independent images, one using st-render (see above) and one using st-align-pairs-view, both on the automatically selected gene mt-Nd4. st-render will display the slices unaligned, while st-align-pairs-view will show them aligned.
st-render -i visium.n5 -rf 0.5 -g mt-Nd4
st-align-pairs-view -c visium.n5 -rf 0.5 -g mt-Nd4
Note: to create the GIF shown I saved both images independently, opened them in Fiji, cropped them, combined them, converted them to 8-bit color, set framerate to 1 fps, and saved it as one GIF.
- Finally, we perform the global alignment. In this particular case, it is identical to the pairwise alignment process as we only have two slices. However, we still need to do it so the final transformations for the slices are stored in the slice-datasets. After that, st-explorer, st-bdv-view and st-render will take these transformations into account when displaying the data. This final processing step usually only takes a few seconds.
st-align-global -c visium.n5 --absoluteThreshold 100 -rf 0.5 --lambda 0.0 --skipICP
- The final dataset can, for example, be visualized and interactively explored using BigDataViewer. To do so, we specify three genes (-g Calm2,Mbp,mt-Nd4), a crisper rendering (-rf 0.5), and a relative z-spacing between the two planes that shows them close to each other (-z 2). Of course, the same data can be visualized using st-explorer and st-render, and visualization options such as color or contrast per gene can be adjusted manually. This will display all sections in the container:
st-bdv-view3d -i visium.n5 -g Calm2,Mbp,mt-Nd4 -rf 0.5 -z 2
It is also possible to visualize a single slice, with interactive controls for rendering strategy, render factor, and filters:
st-bdv-view -i visium.n5 -d slice1.h5ad -g Calm2,Mbp,mt-Nd4 -rf 0.5
We encourage you to use this small two-slice dataset as a starting point for playing with and extending STIM. If you have any questions, feature requests or concerns, please open an issue here on GitHub. Thanks so much!
You can align spatial transcriptomics data interactively using st-align-interactive, which provides a GUI based on BigDataViewer. Here you'll learn more about the GUI, how to navigate data, and how to perform alignment manually, with SIFT, or with ICP.
First, launch the interactive alignment tool:
st-align-interactive -c visium.n5 -d1 slice1.h5ad -d2 slice2.n5 -n 10 -rf 1.5
This will load the datasets slice1.h5ad and slice2.n5 from the visium.n5 container, compute the standard deviation, store it in the datasets (so it is not recomputed later), and select the 10 genes (-n 10) with the highest standard deviation for plotting. Upon loading, you will see the pair of ST data rendered in one color per section. Initially, the two sections are shown in their unaligned coordinates, and the first of the 10 automatically selected genes is used for rendering.
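The gene selection described above amounts to ranking genes by the standard deviation of their expression values and keeping the top n. A minimal sketch on toy data follows; the gene names and values are made up, and the use of the population standard deviation is an assumption, since the text does not say which estimator STIM uses.

```python
import statistics


def top_genes_by_stdev(expression, n):
    """Return the n gene names with the highest standard deviation.

    expression: dict mapping gene name -> list of expression values per location.
    """
    # Assumption: population standard deviation; STIM's exact estimator may differ.
    stdev = {gene: statistics.pstdev(values) for gene, values in expression.items()}
    return sorted(stdev, key=stdev.get, reverse=True)[:n]


# Hypothetical toy expression values for three genes at four locations.
toy_expression = {
    "geneA": [0, 0, 0, 0],    # constant: carries no spatial signal
    "geneB": [0, 10, 0, 10],  # strongly varying
    "geneC": [1, 2, 1, 2],    # weakly varying
}
print(top_genes_by_stdev(toy_expression, 2))  # ['geneB', 'geneC']
```

Genes that vary strongly across locations tend to show visible spatial structure, which is what makes them useful for judging an alignment by eye.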
The GUI consists of two main parts: the viewport (A), and the sidebar (B). The viewport is a standard BigDataViewer, where you can zoom, translate and rotate the data. In the sidebar, you will find the following cards (more on cards 6-8 below):
1. Display Modes: how the data is visualized (e.g., single/fused, type of interpolation...)
2. Sources: e.g., change colors for the sources
3. Groups: to select what is displayed in BDV
4. STIM Display Options: some rendering settings (factor, brightness limits)
5. STIM Filtering Options: to apply on-the-fly filters to the rendered image (Gaussian, Median...)
6. Manual Alignment: one can perform pairwise alignment by dragging and scrolling with the mouse over the viewport
7. SIFT Alignment: automatic interactive alignment using SIFT/RANSAC
8. ICP Alignment: once SIFT alignment is performed, an additional round of ICP refinement is possible.
You can change the gene used for visualization under Groups, by pressing the (circular) radio button for the gene of interest.
You can add more genes to BDV under the STIM Display Options card by pressing Genes(+). A window will open where you can select any gene in the data (tip: you can search for it using the textbox at the top). Click on the gene, then press the Add & Close button.
If you are not familiar with the controls in BigDataViewer, here are some basic navigation instructions that you can follow to zoom, drag, or rotate the displayed data.
To perform manual alignment, go to Manual Alignment and press Start. Then, you can scale, translate and rotate one section (moving) with respect to the other (fixed) by using the mouse. Refer to the basic navigation instructions for BigDataViewer. In general, you can:
- scale with the mouse wheel
- rotate with left drag anywhere on the canvas
- translate with right or middle drag
The transformation matrix, displayed above the Reset/Cancel/Start buttons, is updated dynamically as you rotate, translate or scale the moving image. Once you are done, you can press Finish to keep the transformation. This will be used when rendering any other gene (thus, all will be aligned upon display).
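To get a feeling for what such a transformation matrix encodes, the sketch below composes a 2D scaling, rotation and translation into a single 2x3 affine matrix and applies it to a point. The matrix layout and the order of operations (scale and rotate about the origin, then translate) are illustrative assumptions and may differ from what the GUI displays.

```python
import math


def similarity_matrix(scale, angle_deg, tx, ty):
    """2x3 affine matrix: scale and rotate about the origin, then translate."""
    angle = math.radians(angle_deg)
    c = scale * math.cos(angle)
    s = scale * math.sin(angle)
    return [[c, -s, tx],
            [s, c, ty]]


def apply_transform(matrix, point):
    """Map an (x, y) point of the moving section into the fixed section's frame."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


# Scale by 2, rotate by 90 degrees counter-clockwise, shift by 10 units in x.
m = similarity_matrix(scale=2.0, angle_deg=90.0, tx=10.0, ty=0.0)
x, y = apply_transform(m, (1.0, 0.0))
print(round(x, 6), round(y, 6))  # 10.0 2.0
```

Every location of the moving section is mapped through the same matrix, which is why all genes appear aligned once the transformation is kept.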
To perform alignment with SIFT, go to the SIFT Alignment card. We provide several presets for SIFT matching (from fast to very thorough), and more advanced SIFT options that can be used to tweak these presets and improve the automatic alignment. These parameters are briefly described in the GUI and are shared with st-align-pairs. One possible use case of this card is to explore the parameters interactively and build an intuition for which are most suitable for the data at hand, i.e., before proceeding with pairwise alignment of all the sections (when more than two are available). Once you have chosen some parameters, click Run SIFT alignment; a progress bar will be updated in real time.
Once SIFT alignment has finished, the alignment is previewed along with all the matches per gene. You can navigate all genes by going back to the Groups card, and selecting the gene of interest as described above. When adding new genes, these will appear aligned, as they will be automatically transformed using the estimated model. It is possible to store the transformation to the container by clicking on the save button (floppy disk icon), or reset the transformation.
Optionally, it is possible to refine the result from SIFT alignment using ICP. Navigate to the ICP Alignment card, adjust the parameters (similarly to SIFT alignment), and click Run ICP alignment. It is possible to store the transformation to the container by clicking on the save button (floppy disk icon), or reset the transformation.
As an alternative to the command-line interface, we also provide the stimwrap Python package, which provides an API to call STIM programs from Python, e.g., via Jupyter Notebooks. This, together with the support for AnnData-backed n5 containers, allows seamless integration of STIM into Python-based workflows (or, more specifically, scverse-based workflows).
In practice, this means that you can have a single notebook where you can run preliminary data QC, 3D alignment, and any other downstream analysis such as cell typing, neighborhood analysis, differential expression, among others - without needing to convert data formats or use different languages.
In a nutshell, you can install the stimwrap package via pip:
pip install stimwrap
From Python, the workflow above (for aligning a multi-slice dataset) can be replicated as:
import stimwrap as st
# convert visium expression matrix to AnnData and n5 (we can mix&match!)
st.resave(input="visium.zip/section1_locations.csv,visium.zip/section1_reads.csv,slice1.h5ad", container="visium.n5")
st.resave(input="visium.zip/section2_locations.csv,visium.zip/section2_reads.csv,slice2.n5", container="visium.n5")
# pairwise and global alignment
st.align_pairs(container="visium.n5", num_genes=15, rendering_factor=0.5, max_epsilon=100, min_num_inliers_gene=30)
st.align_global(container="visium.n5", absolute_threshold=100, rendering_factor=0.5, lmbda=0.0, skip_icp=True)
# you can visualize the results using BDV or any other function from STIM
# you will need to run Python from a session with a window server (e.g., local or remote X11)
st.bdv_view3d(input="visium.n5", genes=['Calm2,Mbp,mt-Nd4'], rendering_factor=0.5, z_spacing_factor=2)
Above, we saved one slice as h5ad and another as n5. Storing both as h5ad would ensure that we do not need to convert the data more than once to perform later downstream analysis, e.g., with scverse tools.
You can refer to the documentation or the example notebooks for more information about the general workflow. For instance, we provide Jupyter Notebooks for the 3D registration of Open-ST data, plus some additional cases of downstream analysis, to showcase the interoperability of STIM with the Python ecosystem.