scRNA-seq_Pr

This repository includes:

- Five folders, each containing Jupyter notebooks that reproduce the results in the preprint "Normalization and gene selection for single-cell RNA-seq UMI data using sampling-adjusted sums of squares of Pearson residuals with a Poisson model" and its Supplement 1. There is one folder for each data set.
- The file nru_DE.py, which contains the functions that implement the algorithms. The examples in the notebooks call three functions:
  - nru: normalizes and ranks UMI counts; returns statistics, residuals, and intermediate results
  - DE_H_stats: performs Kruskal-Wallis tests to analyze differential expression
  - mean_SSQ_Pearson_residuals:
    - called by nru in the notebooks 01_compute_Mg_Ag_data_prep_for_Lg_Sg
    - called in the notebooks 02_Fig_1_and_Table to calculate the mean SSQ of Pearson residuals for all genes in the input count matrix. In the paper, these are plotted in Figure 1A and summarized in Table 2.
- The file plot_tab_utilities.py
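
For orientation, the quantity named by mean_SSQ_Pearson_residuals can be illustrated with a minimal NumPy sketch. This is not the code in nru_DE.py: the function name, the cells-by-genes orientation, and the plain Poisson expectation (cell total times gene total divided by the grand total) are assumptions made for illustration, and the sampling adjustment described in the preprint is not reproduced here.

```python
import numpy as np


def mean_ssq_pearson_residuals_sketch(counts):
    """Mean squared Poisson Pearson residual per gene (illustrative only).

    counts: 2-D array of UMI counts, assumed to be cells x genes, with
    every cell and every gene having a nonzero total.
    """
    counts = np.asarray(counts, dtype=float)
    cell_totals = counts.sum(axis=1, keepdims=True)  # total UMIs per cell
    gene_totals = counts.sum(axis=0, keepdims=True)  # total UMIs per gene
    grand_total = counts.sum()

    # Expected count under a Poisson model with no biological variation.
    mu = cell_totals * gene_totals / grand_total

    # Pearson residual: (observed - expected) / sqrt(Poisson variance).
    residuals = (counts - mu) / np.sqrt(mu)

    # Average the squared residuals over cells, one value per gene.
    return (residuals ** 2).mean(axis=0)


# Two cells, two genes, each gene expressed in only one cell.
demo_counts = np.array([[4, 0],
                        [0, 4]])
demo_mean_ssq = mean_ssq_pearson_residuals_sketch(demo_counts)
```

A gene whose counts are exactly proportional to the cell totals gets a value of zero in this sketch; larger values indicate more variation than the Poisson expectation.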

Functions in nru_DE.py are described in function_documentation.md.
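
For readers unfamiliar with the Kruskal-Wallis test mentioned for DE_H_stats, the following minimal sketch shows the test applied to one gene across clusters using scipy.stats.kruskal. The helper name kruskal_by_cluster and the toy data are hypothetical; the actual interface of DE_H_stats is the one given in function_documentation.md.

```python
import numpy as np
from scipy.stats import kruskal


def kruskal_by_cluster(values, labels):
    """Kruskal-Wallis H test for one gene, grouping cells by cluster label.

    Hypothetical helper for illustration; not the signature of DE_H_stats.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    # One group of values per distinct cluster label.
    groups = [values[labels == k] for k in np.unique(labels)]
    h_stat, p_value = kruskal(*groups)
    return h_stat, p_value


# Toy data: 12 cells in 3 clusters, with the gene most expressed in "b".
toy_values = [0, 1, 0, 1, 5, 7, 6, 8, 2, 3, 2, 3]
toy_clusters = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
toy_h, toy_p = kruskal_by_cluster(toy_values, toy_clusters)
```

The test is rank-based, so a gene whose values are identical in every cell carries no information and would need to be filtered out before applying it.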

Before running the Jupyter notebooks, please create folders for programs and data. For example, we use D:/analyze_Pearson_residuals/ for programs, with a subfolder for each data set.

Then copy the files nru_DE.py and plot_tab_utilities.py to the program folder.

For the data sets discussed here, the subfolders are:

- 33k_PBMC
- 10k_heart
- 10k_brain
- lupus
- retinal
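
The folder setup described above could be scripted along these lines. This is only a sketch: the function name set_up_program_folder and its parameters are hypothetical, it handles the program folder only (the data folder location is left to the reader), and it assumes the two .py files sit at the top level of a local copy of the repository.

```python
from pathlib import Path
import shutil

# Subfolder names and shared files as listed in this README.
DATA_SETS = ["33k_PBMC", "10k_heart", "10k_brain", "lupus", "retinal"]
SHARED_FILES = ["nru_DE.py", "plot_tab_utilities.py"]


def set_up_program_folder(program_dir, repo_dir, data_sets=DATA_SETS):
    """Create the program folder with one subfolder per data set, then
    copy the shared Python files into it.

    Returns the names of the files that were found and copied.
    """
    program_dir = Path(program_dir)
    program_dir.mkdir(parents=True, exist_ok=True)
    for name in data_sets:
        (program_dir / name).mkdir(exist_ok=True)

    copied = []
    for file_name in SHARED_FILES:
        source = Path(repo_dir) / file_name
        if source.is_file():  # skip quietly if the file is not there
            shutil.copy2(source, program_dir / file_name)
            copied.append(file_name)
    return copied


# Example call (paths are placeholders to adapt):
# set_up_program_folder("D:/analyze_Pearson_residuals", "path/to/local/copy")
```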

Folders are specified near the top of each notebook in the cell headed with the comment #### user specified.

Instructions for downloading the data are in data_download_instructions.md. For the retinal data, they are taken directly from https://github.com/berenslab/umi-normalization. For the 33k PBMC data set from 10x Genomics, they follow that web page very closely; modifications were necessary due to changes made on the 10x Genomics website after the berenslab page was last updated.

Each subfolder contains 11 notebooks:

One is data-specific (e.g. prep_33k_PBMC); it prepares two pandas data frames:
- a sparse data frame containing UMI counts
- a clustering (provided with the data)
The remaining 10 notebooks perform the reported analyses; there are two versions of notebooks 06 and 10, depending on whether or not genes are notated in the plots:
- 01_compute_Mg_Ag_data_prep_for_Lg_Sg
- 02_Fig_1_and_Table
- 03_compute_L_g
- 04_compute_S_g
- 05_Figs_3_5_and_Tables
- 06_Fig_7_and_Table or 06_Fig_7_notated_and_Table
- 07_compute_Ag_for_complementary_samples
- 08_compute_Lg_for_complementary_samples
- 09_compute_Sg_for_complementary_samples
- 10_Figs_2_4_6_and_Tables or 10_Figs_2_notated_4_6_and_Tables
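
To give a sense of the two data frames that the data-specific notebook prepares, here is a minimal pandas sketch with toy values. The cells-by-genes orientation, the column name "cluster", and all cell, gene, and cluster labels are assumptions for illustration; the prep notebooks define the actual layout.

```python
import numpy as np
import pandas as pd

# Toy UMI counts: 4 cells x 3 genes, mostly zeros as in real UMI data.
toy_dense = np.array([[0, 3, 0],
                      [1, 0, 0],
                      [0, 0, 2],
                      [0, 5, 0]])
toy_cells = ["cell_1", "cell_2", "cell_3", "cell_4"]
toy_genes = ["gene_A", "gene_B", "gene_C"]

# Sparse data frame: only the nonzero counts are stored explicitly.
counts_df = pd.DataFrame(toy_dense, index=toy_cells, columns=toy_genes)
counts_df = counts_df.astype(pd.SparseDtype("int64", 0))

# Clustering: one label per cell, indexed like the count data frame.
clusters_df = pd.DataFrame({"cluster": ["T", "B", "T", "B"]}, index=toy_cells)

# Fraction of entries that are stored (nonzero) in the sparse frame.
toy_density = counts_df.sparse.density
```

Keeping the counts sparse matters because most entries of a UMI count matrix are zero, so a dense frame of the same shape would use far more memory.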

For the lupus data there is an additional notebook, extract_Kang_Lupus_data_from_ExperimentHub, as explained in data_download_instructions.md.
