This repository contains functions to model the per-gene variance in expression from a gene-by-cell matrix of (log-transformed) expression values. Genes with high variance are considered to be more interesting and are prioritized for further analyses. The code itself was originally derived from the scran R package and factored out into a separate C++ library for easier re-use.
Given a tatami::Matrix of log-expression values for each gene in each cell,
we can compute the per-gene variances and model the trend with respect to the mean across genes:
#include "scran_variances/scran_variances.hpp"
std::shared_ptr<tatami::Matrix<double, int> > mat = some_data_source();
scran_variances::ModelGeneVariancesOptions opt;
auto res = scran_variances::model_gene_variances(*mat, opt);
res.means; // vector of means across genes.
res.variances; // vector of variances across genes.
res.fitted; // vector of fitted values of the mean-variance trend for each gene.
res.residuals; // vector of residuals from the trend.

Typically, the residuals are used for feature selection, as these account for non-trivial mean-variance trends in transformed count data.
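To make the idea of residual-based selection concrete, here is a minimal standard-library sketch that picks the indices of the largest residuals. The helper name `top_residual_indices` and its tie-breaking rule are assumptions made for illustration; this is not the library's implementation and it ignores any additional options the library may support.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of scran_variances): returns the indices of
// the `top` largest values in `residuals`, sorted in increasing index order.
std::vector<std::size_t> top_residual_indices(const std::vector<double>& residuals, std::size_t top) {
    std::vector<std::size_t> order(residuals.size());
    std::iota(order.begin(), order.end(), static_cast<std::size_t>(0));
    top = std::min(top, order.size());

    // Move the `top` largest residuals to the front of `order`.
    // Ties are broken by the smaller index so that the result is deterministic.
    std::partial_sort(
        order.begin(),
        order.begin() + static_cast<std::ptrdiff_t>(top),
        order.end(),
        [&](std::size_t a, std::size_t b) {
            if (residuals[a] != residuals[b]) {
                return residuals[a] > residuals[b];
            }
            return a < b;
        }
    );

    // Keep only the selected genes and report them in matrix order.
    order.resize(top);
    std::sort(order.begin(), order.end());
    return order;
}
```

The library call that follows performs the selection on `res.residuals` through its own options object, so a helper like this is only useful for understanding what "top" genes means.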
scran_variances::ChooseHighlyVariableGenesOptions copt;
copt.top = 5000;
auto chosen = scran_variances::choose_highly_variable_genes_index(
res.residuals.size(),
res.residuals.data(),
copt
);
// Create the HVG submatrix for downstream analysis.
auto hvg_subset = tatami::make_DelayedSubset(mat, chosen, /* by_row = */ true);

Users can also fit a trend directly to their own statistics.
scran_variances::FitVarianceTrendOptions fopt;
fopt.span = 0.5;
fopt.minimum_mean = 1;
auto fit = scran_variances::fit_variance_trend(100, means, variances, fopt);
fit.fitted; // fitted values for all genes.
fit.residuals; // residual values for all genes.

Check out the reference documentation for more details.
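When fitting a trend to your own statistics, the means and variances have to come from somewhere. The sketch below shows one way to compute them with only the standard library, assuming a dense row-major array with one row per gene. The `GeneStats` struct and `compute_gene_stats` function are hypothetical names introduced here, not part of scran_variances.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical container for per-gene statistics (not part of scran_variances).
struct GeneStats {
    std::vector<double> means;
    std::vector<double> variances;
};

// Computes the mean and sample variance of each gene with a two-pass algorithm.
// `values` is assumed to be row-major with `ngenes` rows and `ncells` columns.
GeneStats compute_gene_stats(const std::vector<double>& values, std::size_t ngenes, std::size_t ncells) {
    GeneStats out;
    out.means.resize(ngenes);
    out.variances.resize(ngenes);

    for (std::size_t g = 0; g < ngenes; ++g) {
        const double* row = values.data() + g * ncells;

        // First pass: mean across cells.
        double mean = 0;
        for (std::size_t c = 0; c < ncells; ++c) {
            mean += row[c];
        }
        if (ncells > 0) {
            mean /= static_cast<double>(ncells);
        }

        // Second pass: sum of squared deviations from the mean.
        double ss = 0;
        for (std::size_t c = 0; c < ncells; ++c) {
            const double delta = row[c] - mean;
            ss += delta * delta;
        }

        out.means[g] = mean;
        // The sample variance is undefined for fewer than two cells, so report NaN.
        out.variances[g] = (ncells > 1)
            ? ss / static_cast<double>(ncells - 1)
            : std::numeric_limits<double>::quiet_NaN();
    }

    return out;
}
```

The resulting `means` and `variances` could then stand in for the inputs in the trend-fitting call above. Whether the function expects vectors or raw pointers (e.g., via `data()`) should be checked against the reference documentation.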
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
scran_variances
GIT_REPOSITORY https://github.com/libscran/scran_variances
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(scran_variances)

Then you can link to scran_variances to make the headers available during compilation:
# For executables:
target_link_libraries(myexe libscran::scran_variances)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_variances)

Alternatively, if the library is already installed, it can be located with find_package():

find_package(libscran_scran_variances CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_variances)

To install the library, use:
mkdir build && cd build
cmake .. -DSCRAN_VARIANCES_TESTS=OFF
cmake --build . --target install

By default, this will use FetchContent to fetch all external dependencies.
If you want to install them manually, use -DSCRAN_VARIANCES_FETCH_EXTERN=OFF.
See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.
If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I.
This requires the external dependencies listed in extern/CMakeLists.txt, which also need to be made available during compilation.