Commit 1e60aae

scNiche v1.1.0
1 parent 719fc18 commit 1e60aae

22 files changed: +2293 −494 lines changed

README.md

+58 −12

@@ -1,4 +1,4 @@
-# scNiche v1.0.0
+# scNiche v1.1.0
 
 ## Identification and characterization of cell niches in tissue from spatial omics data at single-cell resolution
 
@@ -11,10 +11,12 @@ scNiche is a computational framework to identify and characterize cell niches fr
 ## Requirements and Installation
 [![anndata 0.10.1](https://img.shields.io/badge/anndata-0.10.1-success)](https://pypi.org/project/anndata/) [![pandas 1.5.0](https://img.shields.io/badge/pandas-1.5.0-important)](https://pypi.org/project/pandas/) [![squidpy 1.2.3](https://img.shields.io/badge/squidpy-1.2.3-critical)](https://pypi.org/project/squidpy/) [![scanpy 1.9.1](https://img.shields.io/badge/scanpy-1.9.1-informational)](https://github.com/scverse/scanpy) [![dgl 1.1.0+cu113](https://img.shields.io/badge/dgl-1.1.0%2Bcu113-blueviolet)](https://www.dgl.ai/) [![torch 1.12.1+cu113](https://img.shields.io/badge/torch-1.12.1%2Bcu113-%23808080)](https://pytorch.org/get-started/locally/) [![matplotlib 3.6.2](https://img.shields.io/badge/matplotlib-3.6.2-ff69b4)](https://pypi.org/project/matplotlib/) [![seaborn 0.13.0](https://img.shields.io/badge/seaborn-0.13.0-9cf)](https://pypi.org/project/seaborn/)
 
-### Create and activate Python environment
+### Create and activate a conda environment with the requirements installed
 For scNiche, the Python version need is over 3.9. If you have already installed a lower version of Python, consider installing Anaconda, and then you can create a new environment.
 ```
-conda create -n scniche python=3.9
+cd scNiche-main
+
+conda env create -f scniche_dev.yaml -n scniche
 conda activate scniche
 ```
 
@@ -29,24 +31,65 @@ pip install dgl==1.1.0+cu113 -f https://data.dgl.ai/wheels/cu113/repo.html
 ```
 The version of PyTorch and DGL should be suitable to the CUDA version of your machine. You can find the appropriate version on the [PyTorch](https://pytorch.org/get-started/locally/) and [DGL](https://www.dgl.ai/) website.
 
-### Install other requirements
-```
-cd scNiche-main
-pip install -r requirements.txt
-```
+
 ### Install scNiche
 ```
 python setup.py build
 python setup.py install
 ```
 
 ## Tutorials (identify cell niches)
-scNiche requires the single-cell spatial omics data (stored as `.h5ad` format) as input, where cell population label of each cell needs to be provided.
+#### - Spatial proteomics data or single-cell spatial transcriptomics data
+
+By default, scNiche requires single-cell spatial omics data (stored in `.h5ad` format) as input, where the cell population label of each cell needs to be provided.
 
 Here are examples of scNiche on simulated and biological datasets:
 * [Demonstration of scNiche on the simulated data](tutorial/tutorial_simulated.ipynb)
-* [Demonstration of scNiche on the mouse spleen CODEX data](tutorial/tutorial_spleen.ipynb)
-* [Demonstration of scNiche on the human upper tract urothelial carcinoma (UTUC) IMC data](tutorial/tutorial_utuc.ipynb)
+* [Demonstration of scNiche on the mouse V1 neocortex STARmap data](tutorial/tutorial_STARmap.ipynb)
+
+
+scNiche also provides a subgraph-based batch training strategy to scale to large datasets and multiple slices:
+
+1. Batch training strategy of scNiche for a single slice:
+* [Demonstration of scNiche on the mouse spleen CODEX data](tutorial/tutorial_spleen.ipynb) (over 80,000 cells per slice)
+
+2. Batch training strategy of scNiche for multiple slices:
+* [Demonstration of scNiche on the human upper tract urothelial carcinoma (UTUC) IMC data](tutorial/tutorial_utuc.ipynb) (containing 115,060 cells from 16 slices)
+* [Demonstration of scNiche on the mouse frontal cortex and striatum MERFISH data](tutorial/tutorial_MERFISH.ipynb) (containing 376,107 cells from 31 slices)
+
+
+#### - Low-resolution spatial transcriptomics data
+Here we take 4 slices from the same donor of the [human DLPFC 10X Visium data](http://spatial.libd.org/spatialLIBD/) as an example.
+
+In contrast to spatial proteomics data, which usually contain only a few dozen proteins, these spatial transcriptomics data can often measure tens of thousands of genes,
+with potential batch effects commonly present across tissue slices from different samples.
+Therefore, dimensionality reduction and batch effect removal need to be performed on the molecular profiles of the cells and their neighborhoods before running scNiche.
+We used [scVI](https://github.com/scverse/scvi-tools) by default; however, simple PCA dimensionality reduction or other deep learning-based integration methods like [scArches](https://github.com/theislab/scarches) are also applicable.
+
+Furthermore, cell type labels are usually unavailable for these spatial transcriptomics data. As alternatives,
+we can:
+1. Use the `deconvolution results of spots` as a substitute view to replace the `cellular compositions of neighborhoods`.
+We used the human middle temporal gyrus (MTG) scRNA-seq data by [Hodge et al.](https://doi.org/10.1038/s41586-019-1506-7) as the single-cell reference, and deconvoluted the spots using [Cell2location](https://github.com/BayraktarLab/cell2location):
+
+* [Demonstration of scNiche on Slice 151673 (with deconvolution results)](tutorial/tutorial_dlpfc151673.ipynb)
+
+2. Only use the molecular profiles of cells and neighborhoods as input:
+
+* [Demonstration of scNiche on Slice 151673 (without deconvolution results)](tutorial/tutorial_dlpfc151673-2view.ipynb)
+
+
+Multi-slice analysis of 4 slices based on the batch training strategy of scNiche:
+
+* [Demonstration of scNiche on 4 slices from the same donor (with deconvolution results)](tutorial/tutorial_DLPFC.ipynb)
+
+#### - Spatial multi-omics data
+The strategy of scNiche for modeling features from different views of the cell offers more possible avenues for expansion,
+such as application to spatial multi-omics data. Here we ran scNiche on a postnatal day (P)22 mouse brain coronal section
+dataset generated by [Zhang et al.](https://doi.org/10.1038/s41586-023-05795-1), which includes RNA-seq and CUT&Tag (acetylated histone H3 Lys27 (H3K27ac) histone modification) modalities.
+The dataset can be downloaded [here](https://zenodo.org/records/10362607).
+
+* [Demonstration of scNiche on the mouse brain spatial CUT&Tag–RNA-seq data](tutorial/tutorial_multi-omics.ipynb)
+
 
 ## Tutorials (characterize cell niches)
 scNiche also offers a downstream analytical framework for characterizing cell niches more comprehensively.
@@ -56,6 +99,9 @@ Here are examples of scNiche on two biological datasets:
 * [Demonstration of scNiche on the mouse liver Seq-Scope data](tutorial/tutorial_liver.ipynb)
 
 
+## Acknowledgements
+The scNiche model is developed based on the [multi-view clustering framework (CMGEC)](https://github.com/wangemm/CMGEC-TMM-2021). We thank the authors for releasing the code.
+
 ## About
-scNiche was developed by Jingyang Qian. Should you have any questions, please contact Jingyang Qian at [email protected].
+scNiche is developed by Jingyang Qian. Should you have any questions, please contact Jingyang Qian at [email protected].
 
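Note: the tutorials above expect an AnnData `.h5ad` file that already carries a per-cell population label and spatial coordinates. Below is a minimal, hypothetical sketch of such an input; the `cell_type` column name and the toy values are placeholders (the actual label column is whatever is later passed as `celltype_key`), not something defined by this commit.

```python
# Minimal sketch (not repo code): build a toy .h5ad input of the shape scNiche expects.
import numpy as np
import pandas as pd
import anndata as ad

n_cells, n_markers = 500, 30
rng = np.random.default_rng(0)

adata = ad.AnnData(X=rng.random((n_cells, n_markers), dtype=np.float32))  # expression / protein matrix
adata.obs["cell_type"] = pd.Categorical(                                  # per-cell population label (placeholder key)
    rng.choice(["T cell", "B cell", "Tumor"], n_cells)
)
adata.obsm["spatial"] = rng.random((n_cells, 2)) * 1000.0                 # x/y coordinates used for neighborhood construction

adata.write_h5ad("my_slice.h5ad")                                         # scNiche reads .h5ad input
```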
images/workflow.jpg

-59.3 KB

requirements.txt

-7
This file was deleted.

scniche/datasets/_dataset.py

+43
@@ -42,6 +42,49 @@ def human_utuc_imc():
     return adata
 
 
+def mouse_v1_starmap():
+    """
+    Raw mouse V1 neocortex dataset from Wang et al. (Science, 2018), containing 1 slice replicate with the layer labels.
+
+    This downloads 9.6 MB of data upon the first call of the function and stores it in `./scniche_data/STARmap.h5ad`.
+    :return: AnnData
+    """
+    url = "https://figshare.com/ndownloader/files/50249244"
+    datasetdir = './scniche_data/STARmap.h5ad'
+    adata = sc.read(datasetdir, backup_url=url)
+    return adata
+
+
+def human_dlpfc_visium():
+    """
+    Raw human DLPFC dataset from Maynard et al. (Nat Neurosci., 2021),
+    containing 4 slices (Slice 151673, 151674, 151675, and 151676) from the same donor with the layer labels.
+    The scVI (Nat Methods., 2018) embedding as well as the Cell2location (Nat Biotechnol., 2022) deconvolution results
+    are also provided.
+
+    This downloads 71.93 MB of data upon the first call of the function and stores it in `./scniche_data/DLPFC.h5ad`.
+    :return: AnnData
+    """
+    url = "https://figshare.com/ndownloader/files/50249673"
+    datasetdir = './scniche_data/DLPFC.h5ad'
+    adata = sc.read(datasetdir, backup_url=url)
+    return adata
+
+
+def mouse_aging_merfish():
+    """
+    Processed mouse aging brain dataset from Allen et al. (Cell, 2023), containing 31 slices with the tissue labels.
+    The data has been normalized and scaled by the original authors, and the PCA results are also provided.
+
+    This downloads 281.56 MB of data upon the first call of the function and stores it in `./scniche_data/MERFISH_Aging.h5ad`.
+    :return: AnnData
+    """
+    url = "https://figshare.com/ndownloader/files/50251680"
+    datasetdir = './scniche_data/MERFISH_Aging.h5ad'
+    adata = sc.read(datasetdir, backup_url=url)
+    return adata
+
+
 def human_tnbc_mibi_tof():
     """
     Processed human triple-negative breast cancer (TNBC) MIBI-TOF dataset from Keren et al. (Cell, 2018),
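Note: a hedged usage sketch of the three loaders added above. It assumes they are re-exported as `scniche.datasets.<name>` (the re-export is not shown in this diff); each call downloads the `.h5ad` from the figshare `backup_url` on first use and caches it under `./scniche_data/`.

```python
# Hedged sketch: assumes these loaders are exposed as scniche.datasets.<name>.
import scniche as scn

adata_starmap = scn.datasets.mouse_v1_starmap()      # ~9.6 MB   -> ./scniche_data/STARmap.h5ad
adata_dlpfc   = scn.datasets.human_dlpfc_visium()    # ~71.93 MB -> ./scniche_data/DLPFC.h5ad
adata_merfish = scn.datasets.mouse_aging_merfish()   # ~281.56 MB -> ./scniche_data/MERFISH_Aging.h5ad

print(adata_starmap)                                 # AnnData summary with layer labels in .obs
```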

scniche/preprocess/__init__.py

+1
@@ -9,6 +9,7 @@
     "process_multi_slices",
     "construct_graph",
     "random_split",
+    "random_split2",
     "myDataset",
     "prepare_data",
     "prepare_data_batch",

scniche/preprocess/_build.py

+37 −28

@@ -5,20 +5,25 @@
 
 def prepare_data(
     adata: AnnData,
+    choose_views: Optional[list] = None,
     k_cutoff_graph: int = 20,
     mik_graph: int = 5,
     verbose: bool = True
 ):
-
-    feat1 = adata.obsm['X_cn_norm']
-    feat2 = adata.obsm['X_data']
-    feat3 = adata.obsm['X_data_nbr']
-
     if verbose:
         print("-------Constructing graph for each view...")
-    for view, feat in zip(['g1', 'g2', 'g3'], [feat1, feat2, feat3]):
+    if choose_views is None:
+        choose_views = ['X_cn_norm', 'X_data', 'X_data_nbr']
+    else:
+        missing_views = [view for view in choose_views if view not in adata.obsm.keys()]
+        if missing_views:
+            raise ValueError(f"The following views are missing in adata.obsm: {', '.join(missing_views)}")
+
+    for view in choose_views:
+        feat = adata.obsm[view]
         g = construct_graph(np.array(feat), k_cutoff_graph, mik_graph)
-        adata.uns[view] = g
+        graph_name = 'g_' + view
+        adata.uns[graph_name] = g
     if verbose:
         print("Constructing done.")
 
@@ -27,23 +32,25 @@ def prepare_data(
 
 def prepare_data_batch(
     adata: AnnData,
+    choose_views: Optional[list] = None,
     batch_num: int = 4,
     k_cutoff_graph: int = 20,
     mik_graph: int = 5,
     verbose: bool = True
 ):
-    feat1 = adata.obsm['X_cn_norm']
-    feat2 = adata.obsm['X_data']
-    feat3 = adata.obsm['X_data_nbr']
-
-    # TODO: batch idx
+    # create batch idx
     random.seed(123)
     batch_size = adata.shape[0] // batch_num
     left_cell_num = adata.shape[0] % batch_num
     add_cell_num = batch_num - left_cell_num
     add_cell = random.choices(range(adata.shape[0]), k=add_cell_num)
 
-    batch_idx = random_split(adata.shape[0], batch_size)
+    # bug fixed
+    if left_cell_num < batch_size:
+        batch_idx = random_split(adata.shape[0], batch_size)
+    else:
+        batch_idx = random_split2(adata.shape[0], batch_num)
+
     if left_cell_num > 0:
         for i in range(left_cell_num):
             batch_idx[i].append(batch_idx[len(batch_idx) - 1][i])
@@ -57,28 +64,30 @@ def prepare_data_batch(
 
     adata.uns['batch_idx'] = batch_idx_new
 
-    g1_list = []
-    g2_list = []
-    g3_list = []
+    # check
+    if choose_views is None:
+        choose_views = ['X_cn_norm', 'X_data', 'X_data_nbr']
+    else:
+        missing_views = [view for view in choose_views if view not in adata.obsm.keys()]
+        if missing_views:
+            raise ValueError(f"The following views are missing in adata.obsm: {', '.join(missing_views)}")
+
+    feat = [adata.obsm[view] for view in choose_views]
+    g_list = [[] for _ in range(len(feat))]
+
     if verbose:
         print("-------Constructing batch-graph for each view...")
-    for i in tqdm(range(batch_num)):
-        feat1_tmp = feat1[batch_idx_new[i]]
-        feat2_tmp = feat2[batch_idx_new[i]]
-        feat3_tmp = feat3[batch_idx_new[i]]
-
-        g1_tmp = construct_graph(np.array(feat1_tmp), k_cutoff_graph, mik_graph)
-        g2_tmp = construct_graph(np.array(feat2_tmp), k_cutoff_graph, mik_graph)
-        g3_tmp = construct_graph(np.array(feat3_tmp), k_cutoff_graph, mik_graph)
 
-        g1_list.append(g1_tmp)
-        g2_list.append(g2_tmp)
-        g3_list.append(g3_tmp)
+    for i in tqdm(range(batch_num)):
+        for j in range(len(feat)):
+            feat_tmp = feat[j][batch_idx_new[i]]
+            g_tmp = construct_graph(np.array(feat_tmp), k_cutoff_graph, mik_graph)
+            g_list[j].append(g_tmp)
 
     if verbose:
         print("Constructing done.")
 
-    mydataset = myDataset(g1_list, g2_list, g3_list)
+    mydataset = myDataset(g_list)
     dataloader = GraphDataLoader(mydataset, batch_size=1, shuffle=False, pin_memory=True)
     adata.uns['dataloader'] = dataloader

scniche/preprocess/_utils.py

+24 −13

@@ -7,6 +7,7 @@
 from anndata import AnnData
 from tqdm import tqdm
 from sklearn.decomposition import PCA
+from scipy.sparse import issparse
 from torch.utils.data import Dataset, DataLoader
 from sklearn.neighbors import NearestNeighbors
 from typing import Optional, Union
@@ -38,8 +39,8 @@ def cal_spatial_neighbors(
 
     # CNs
     meta = adata.obs.copy()
-    meta['x_new'] = adata.obsm['spatial'][:, 0]
-    meta['y_new'] = adata.obsm['spatial'][:, 1]
+    meta['x_new'] = list(adata.obsm['spatial'][:, 0])
+    meta['y_new'] = list(adata.obsm['spatial'][:, 1])
 
     if celltype_order is None:
         celltype_order = sorted(meta[celltype_key].unique())
@@ -109,7 +110,10 @@ cal_spatial_exp(
     if layer_key is not None:
         data_raw = adata.obsm[layer_key].copy()
     else:
-        data_raw = adata.X.copy()
+        if issparse(adata.X):
+            data_raw = adata.X.toarray().copy()
+        else:
+            data_raw = adata.X.copy()
     data_nbr = []
     for i in range(indices.shape[0]):
         data_nbr_tmp = data_raw[indices[i]].mean(axis=0)
@@ -192,12 +196,25 @@
     return g
 
 
+# left_cell_num < batch_size
 def random_split(n, m):
     nums = list(range(n))
     random.shuffle(nums)
     return [nums[i:i + m] for i in range(0, n, m)]
 
 
+# left_cell_num > batch_size
+def random_split2(n, batch_num):
+    nums = list(range(n))
+    random.shuffle(nums)
+
+    batch_size = n // (batch_num + 1)
+    result = [nums[i * batch_size: (i + 1) * batch_size] for i in range(batch_num)]
+    result.append(nums[batch_num * batch_size:])
+
+    return result
+
+
 def set_seed():
     # seed
     seed = 123
@@ -210,21 +227,15 @@ def set_seed():
 
 
 class myDataset(Dataset):
-    def __init__(self, g1, g2, g3):
-        self.g1 = g1
-        self.g2 = g2
-        self.g3 = g3
+    def __init__(self, g_list):
+        self.g_list = g_list
 
     def __getitem__(self, idx):
 
-        tmp_g1 = self.g1[idx]
-        tmp_g2 = self.g2[idx]
-        tmp_g3 = self.g3[idx]
-
-        return tmp_g1, tmp_g2, tmp_g3
+        return tuple(g[idx] for g in self.g_list)
 
     def __len__(self):
-        return len(self.g1)
+        return len(self.g_list[0])
 
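Note: a standalone illustration (not repo code) of why `random_split2` was added. With `random_split` alone, whenever `left_cell_num >= batch_size` the shuffled index list is chopped into far more chunks than `batch_num`; the new helper fixes the number of chunks instead. The toy numbers below are arbitrary.

```python
# Standalone demo of the two split helpers defined in scniche/preprocess/_utils.py.
import random

def random_split(n, m):                         # original helper: chunks of fixed size m
    nums = list(range(n))
    random.shuffle(nums)
    return [nums[i:i + m] for i in range(0, n, m)]

def random_split2(n, batch_num):                # new helper: fixed number of chunks + remainder
    nums = list(range(n))
    random.shuffle(nums)
    batch_size = n // (batch_num + 1)
    result = [nums[i * batch_size:(i + 1) * batch_size] for i in range(batch_num)]
    result.append(nums[batch_num * batch_size:])
    return result

random.seed(123)
n, batch_num = 7, 4                             # batch_size = 1, left_cell_num = 3 >= batch_size
print(len(random_split(n, n // batch_num)))     # 7 chunks -- far more than batch_num
print(len(random_split2(n, batch_num)))         # 5 chunks: batch_num pieces plus a remainder chunk
```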
scniche/trainer/__init__.py

+1 −1

@@ -3,7 +3,7 @@
 from ._utils import *
 
 __all__ = [
-    "GAE",
+    "MGAE",
     "FeatureFusion",
     "InnerProductDecoder",
     "GFN",
