Installation

remotes::install_github("Sage-Bionetworks/sageseqr")

RNA-seq normalization workflow in R

The sageseqr package integrates the drake R package, the config package for R, and Synapse. drake tracks dependency relationships in the workflow and rebuilds a step only when the step itself or something it depends on has changed. A config file allows inputs and parameters to be explicitly defined in one location. Synapse is a data repository that allows sensitive data to be stored and shared responsibly.
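As a rough illustration of the config pattern described above, the sketch below writes a small YAML file and reads it back with config::get. The keys counts_id and outlier_sd_threshold and the placeholder Synapse ID are hypothetical and chosen only for this example; they are not the keys sageseqr expects, so refer to the package's own config.yml for the real ones.

```r
# Write a throwaway config file. The keys and values are hypothetical.
cfg_path <- tempfile(fileext = ".yml")
writeLines(
  c(
    "default:",
    "  counts_id: syn00000000",
    "  outlier_sd_threshold: 3"
  ),
  cfg_path
)

# config::get() returns the "default" configuration as a named list.
# It is called with the namespace prefix so base::get() is not masked.
cfg <- config::get(file = cfg_path)

cfg$counts_id
cfg$outlier_sd_threshold
```

Keeping every input and parameter in one file like this means a run can be reproduced by sharing the config file alone.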

The workflow takes RNA-seq gene counts and sample metadata as inputs, normalizes counts by conditional quantile normalization (CQN), removes outliers based on a user-defined threshold, empirically selects meaningful covariates, and returns differential expression analysis results. The data is also visualized in several ways to help you understand meaningful trends. The visualizations include a heatmap identifying highly correlated covariates, a sample-specific X and Y marker gene check, boxplots visualizing the distribution of continuous variables, and a principal component analysis (PCA) to visualize sample distribution.

The Targets

The steps that make up the workflow are called targets. The target objects are stored in a cache. The drake function readd returns a target's value, and loadd loads a target into your environment under its own name. To see the source code that produced a target, set show_source = TRUE in readd or loadd.
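The difference between readd and loadd can be seen in a small, self-contained sketch. The two-target plan below is a hypothetical stand-in for the targets that rnaseq_plan() defines, and it is built in a temporary cache so that it does not touch a real project.

```r
library(drake)

# Hypothetical toy plan; the real targets come from sageseqr's rnaseq_plan().
plan <- drake_plan(
  raw_counts = c(geneA = 10, geneB = 0, geneC = 25),
  filtered = raw_counts[raw_counts > 0]
)

# Build the targets in a throwaway cache.
cache <- new_cache(path = tempfile())
make(plan, cache = cache)

# readd() returns the stored value, so it can be assigned to any name.
filtered_value <- readd(filtered, cache = cache)

# loadd() assigns the target into the calling environment as `raw_counts`.
loadd(raw_counts, cache = cache)

# show_source = TRUE also reports the command that produced the target.
readd(filtered, cache = cache, show_source = TRUE)
```

In a sageseqr project the cache argument can usually be left out, since drake looks for its default cache in the working directory.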

Importantly, running clean will remove the data stored as targets (but the data is never completely gone: by default, clean does not run garbage collection on the cache). You may remove specific targets by name by passing them to the clean function.
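A minimal sketch of removing a single target by name follows. The toy plan is hypothetical and stands in for the sageseqr targets, and it uses a temporary cache so nothing in a real project is affected.

```r
library(drake)

# Hypothetical toy plan; the real targets come from sageseqr's rnaseq_plan().
plan <- drake_plan(
  raw_counts = c(geneA = 10, geneB = 0, geneC = 25),
  filtered = raw_counts[raw_counts > 0]
)

cache <- new_cache(path = tempfile())
make(plan, cache = cache)

# Remove only `filtered`; calling clean() with no target names
# would remove every target in the cache.
clean(filtered, cache = cache)

# List the targets still available in the cache.
remaining <- cached(cache = cache)

# A later make() rebuilds the removed target and skips up-to-date ones.
make(plan, cache = cache)
```

Naming targets explicitly is the safer habit, because the upstream targets stay in place and do not need to be rebuilt.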

The targets are called by the sageseqr rnaseq_plan() function and are:

Raw data:

import_metadata - imports the raw metadata directly from Synapse.
import_counts - imports the raw counts directly from Synapse.
biomart_results - the complete list of genes with biomaRt annotations.

Exploratory data visualizations:

gene_coexpression - the distribution of correlated gene counts.
boxplots - the distribution of continuous variables.
sex_plot - the distribution of samples by X and Y marker genes.
sex_plot_pca - a PCA of sex-specific expression to visualize more dimensionality than sex_plot.
correlation_plot - the correlation of covariates.
significant_covariates_plot - the correlation of covariates to gene expression.
outliers - the clustering of samples by PCA.
plot_de_volcano - volcano plot of differentially expressed genes.

Transformed or normalized data:

clean_md - metadata with factor and numeric types.
filtered_counts - counts matrix with low gene expression removed.
biotypes - gene proportions summarized by biotype.
cqn_counts - CQN normalized counts.
model - model selected by multivariate forward stepwise regression, evaluated by the Bayesian information criterion (BIC).
de - differential expression results including adjusted p-values and gene list.
report - output markdown report rendered as HTML.

Access to Data

Anyone can create a Synapse account and access public data in a variety of disciplines, for example through the Alzheimer's Disease Knowledge Portal and the CommonMind Consortium.
