Author: Feng Zhang
Email: [email protected]
Demo list: http://omics.fudan.edu.cn/static/demo/index.html (including Human oligodendrogliomas)
Operating system: Mac or Linux, python=2.7, R=3.5
Python packages: scipy
R packages: stringr, pastecs, mixtools, igraph
Running time (GSE70630): ~15 min (from step1 to step4), 4 threads, Centos7, Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
-
Get normalized expression matrix: sup1_normData.R
-
Downstream analysis (cluster & regulatory graph) of regulatory feature binary matrix: sup2_Downstream.R
i) Build expression index ( Python )
python | step1_BuildIndex.py | PAIR_LIST | EXP | OUTPUT_INDEX | RATE
PAIR_LIST: a list of tab-delimited gene pairs
EXP: normalized expression matrix (without quotation marks)
OUTPUT_INDEX: the index file name, given by the user
RATE: the cutoff for the non-zero rate. For each gene pair, non-zero rate equals to the proportion of cells of which two genes both are detected.
ii) Generate Z-value matrix ( Python )
python | step2_CalZmat.py | OUTPUT_INDEX | OUTPUT_ZMAT | CPU
OUTPUR_INDEX: the index file generated in the first step
OUTPUT_ZMAT: the z-value matrix file name, given by the user
CPU: the number of threads to run GCA
iii) Mixture models analysis ( R )
Rscript | step3_deMix.R | EXP | OUTPUT_ZMAT | OUTPUT | CPU | SEED | CUTOFF
EXP: the expression matrix used in the first step
OUTPUT_ZMAT: the z-value matrix generated in the second step
OUTPUT: the name of the output directory, given by the user
CPU: the number of threads to run GCA
SEED: seed for random function in R
CUTOFF: the cutoff of edge score (HCS) to draw the gene graph
iv) Get regulatory feature binary matrix ( Python )
python | step4_getRegFeature.py | MODE | GCA_OUTPUT | OUTPUT
MODE: a list of tab-delimited gene pairs with regulatory mode information (TRRUST file or your own file in the same format)
GCA_OUTPUT: the OUTPUT directory of GCA
OUTPUT: the regulatory feature binary matrix
The result summary HTML file named “index.html” locates in the output directory.
The left side of this page records the contribution score of each gene, and the right side presents the edge score of each gene pair.
Links:
“Graph” links to the gene graph that shows the path of genes that might contribute to the expression heterogeneity.
“Args” links to the txt file that records GCA running options
“txt” in “MixInfo” links to the txt file that records the parameters of ‘mixtools’ (Benaglia, et al., 2010).
“txt” in “Cluster” links to the txt file that records the cell labels, z-values, sub-group labels, color keys, and gene’s expression values.
“pdf” in “Figure” links to the pdf file that shows all the figure result for each gene pair.
Users can simply adjust the figure by loading the “OUTPUT.saved_RData” into R, and generate different figures following the scripts in: step3_deMix.R