scverse · adamgayoso · Oct 2, 2021 · Oct 1, 2021 · Oct 1, 2021 · Oct 1, 2021
diff --git a/docs/user_guide/background/differential_expression.rst b/docs/user_guide/background/differential_expression.rst
@@ -5,11 +5,11 @@ Differential Expression
 Under construction.
 
 Problem statement
-==========================================
+==================
 
 Differential expression analyses aim to quantify and detect expression differences of some quantity between conditions, e.g., cell types.
 In single-cell experiments, this quantity can correspond to transcripts, protein expression, or chromatin accessibility.
-A central notion when comparing expression levels of two cell states 
+A central notion when comparing expression levels of two cell states
 is the log fold-change
 
 .. math::
@@ -19,18 +19,18 @@ is the log fold-change
       \beta_g := \log h_{g}^B - \log h_{g}^A,
    \end{align}
 
-where 
+where
 :math:`\log h_{g}^A, \log h_{g}^B`
 respectively denote the mean expression levels in subpopulations :math:`A`
 and
 :math:`B`.
 
 
 
-Motivations to use scVI-tools for differential expression 
-======================================================================
+Motivation
+==========
 
-In the particular case of single-cell RNA-seq data, existing differential expression models often model that the mean expression level 
+In the particular case of single-cell RNA-seq data, existing differential expression models often model the mean expression level
 :math:`\log h_{g}^C`
 as a linear function of the cell-state and batch assignments.
 These models face two notable limitations in detecting differences in expression between cell states in large-scale scRNA-seq datasets.
@@ -49,11 +49,9 @@ This guide has two objectives.
 First, it aims to provide insight as to how scVI-tools' differential expression module works for transcript expression (``scVI``), surface protein expression (``TOTALVI``), or chromatin accessibility (``PeakVI``).
 More precisely, we explain how it can:
 
-    + approximate population-specific normalized expression levels
-
-    + detect biologically relevant features
-
-    + provide easy-to-interpret predictions
+- approximate population-specific normalized expression levels
+- detect biologically relevant features
+- provide easy-to-interpret predictions
 
 More importantly, this guide explains the function of the hyperparameters of the ``differential_expression`` method.
 
@@ -70,64 +68,42 @@ More importantly, this guide explains the function of the hyperparameters of the
    * - ``idx1``, ``idx2``
      - Mask or queries for the compared populations :math:`A` and :math:`B`.
      - yes
-     - 
-     - 
+     -
+     -
    * - ``mode``
      - Characterizes the null hypothesis.
-     - 
+     -
      - yes
-     - 
+     -
    * - ``delta``
      - composite hypothesis characteristics (when ``mode="change"``).
-     - 
+     -
      - yes
-     - 
+     -
    * - ``fdr_target``
      - desired FDR significance level
-     - 
-     - 
+     -
+     -
      - yes
    * - ``importance_sampling``
      - Specifies whether expression levels are estimated using importance sampling
      - yes
-     - 
-     - 
+     -
+     -
 
 Notations and model assumptions
-======================================================================
+================================
 While considering different modalities, scVI, TOTALVI, and PeakVI share similar properties, allowing us to perform differential expression of transcripts, surface proteins, or chromatin accessibility, similarly.
 We first introduce some notations that will be useful in the remainder of this guide.
 In particular, we consider a deep generative model where a latent variable with prior :math:`z_n \sim \mathcal{N}_d(0, I_d)` represents cell :math:`n`'s identity.
 In turn, a neural network :math:`f^h_\theta` maps this low-dimensional representation to normalized expression levels.
-The following table recaps which names the scVI-tools codebase uses.
-
-.. list-table::
-   :widths: 20 50 15 15
-   :header-rows: 1
-
-   * - Model
-     - Type of expression
-     - latent variable name
-     - Normaled expression name
-   * - scVI
-     - Gene expression.
-     - ``z``
-     - ``px_scale``
-   * - TOTALVI
-     - Gene & surface protein expression.
-     - ``z``
-     - ``px_scale`` (gene) and ``py_scale`` (surface protein)
-   * - PEAKVI
-     - Chromatin accessibility.
-     - ``z``
-     - ``p``
 
 
 Approximating population-specific normalized expression levels
-====================================================================================
+===============================================================
 
 A first step to characterize differences in expression consists in estimating state-specific expression levels.
-For several reasons, most ``scVI-tools`` models do not explicitly model discrete cell types. 
+For several reasons, most ``scVI-tools`` models do not explicitly model discrete cell types.
 A given cell's state is often unknown in the first place and is inferred with ``scvi-tools``.
 In some cases, states may also have an intricate structure that would be difficult to model.
 The class of models we consider here assumes that a latent variable :math:`z` characterizes cells' biological identity.
@@ -142,7 +118,7 @@ In particular, we will represent state :math:`C` latent representation with the
    \begin{align}
       \hat P^C(
         Z
-      ) = 
+      ) =
       \frac
       {1}
       {
@@ -161,7 +137,7 @@ We note :math:`h^A_f, h^B_f` the respective expression levels in states :math:`A
 
 
 Detecting biologically relevant features
-========================================================
+========================================
 Once we have expression level distributions for each condition, scvi-tools constructs an effect size, which will characterize expression differences.
 When considering gene or surface protein expression, log-normalized counts are a traditional choice to characterize expression levels.
 Consequently, the canonical effect size for feature :math:`f` is the log fold-change, defined as the difference in log expression between conditions,
@@ -171,22 +147,23 @@ When considering gene or surface protein expression, log-normalized counts are a
 
    \begin{align}
       \beta_f
-      = 
-      \log_2 h_^B{f} - \log_2 h_^A{f}.
+      =
+      \log_2 h_{f}^B - \log_2 h_{f}^A.
    \end{align}
-As chromatin accessibility cannot be interpreted in the same way, we take :math:`\beta_f = h_^B{f}- h_^A{f}` instead.
+
+As chromatin accessibility cannot be interpreted in the same way, we take :math:`\beta_f = h_{f}^B- h_{f}^A` instead.
 
 scVI-tools provides several ways to formulate the competing hypotheses from the effect sizes to detect DE features.
 When ``mode = "vanilla"``, we consider point null hypotheses of the form :math:`\mathcal{H}_{0f}: \beta_f = 0`.
 To avoid detecting features of little practical interest, e.g., when expression differences between conditions are significant but very subtle, we recommend that users use ``mode = "change"`` instead.
-In this formulation, we consider null hypotheses instead, such that 
+In this formulation, we instead consider composite null hypotheses of the form
 
 .. math::
    :nowrap:
 
    \begin{align}
       \lvert \beta_f \rvert
-      \leq 
+      \leq
       \delta.
    \end{align}
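
The composite hypothesis above can be evaluated from posterior samples of the normalized expression levels. The following is a minimal sketch, not the scvi-tools implementation: the function name ``prob_de_change``, the ``eps`` offset, the value ``delta=0.25``, and the simulated samples are all illustrative assumptions. It estimates, for each feature, the fraction of samples for which :math:`\lvert \beta_f \rvert > \delta`.

```python
import numpy as np


def prob_de_change(h_a, h_b, delta=0.25, eps=1e-8):
    """Monte Carlo estimate of P(|beta_f| > delta) for each feature.

    h_a, h_b: arrays of shape (n_samples, n_features) holding positive
    samples of normalized expression in populations A and B (hypothetical inputs).
    """
    # Log fold-change for every sample and feature; eps guards against log(0).
    lfc = np.log2(h_b + eps) - np.log2(h_a + eps)
    # Fraction of samples whose effect size falls outside [-delta, delta].
    return np.mean(np.abs(lfc) > delta, axis=0)


# Simulated samples (assumption): feature 0 is unchanged, feature 1 is 4x higher in B.
rng = np.random.default_rng(0)
h_a = rng.lognormal(mean=0.0, sigma=0.1, size=(2000, 2))
h_b = rng.lognormal(mean=np.log([1.0, 4.0]), sigma=0.1, size=(2000, 2))
p_de = prob_de_change(h_a, h_b)
```

With these simulated inputs, the strongly shifted feature receives a probability close to 1, while the unchanged feature receives a much lower one.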
 
@@ -196,7 +173,7 @@ A straightforward decision consists in detecting genes for which the posterior d
 
 
 Providing easy-to-interpret predictions
-========================================================
+=======================================
 The obtained gene sets may be difficult to interpret for some users.
 For this reason, we provide a data-supported way to select :math:`\epsilon`, such that the posterior expected False Discovery Proportion (FDP) is below a significance level :math:`\alpha`.
 To clarify how to compute the posterior expectation, we introduce two notations.
@@ -217,7 +194,7 @@ the decision rule tagging :math:`k` features of highest :math:`p_f` as DE.
 We also note :math:`d^f` the binary random variable taking value 1 if feature :math:`f` is differentially expressed.
 
 The False Discovery Proportion is a random variable corresponding to the ratio of the number of false positives over the total number of predicted positives.
-For the specific family of decision rules :math:`\mu^k, k` that we consider here, the FDP can be written as 
+For the specific family of decision rules :math:`\mu^k, k` that we consider here, the FDP can be written as
 
 .. math::
    :nowrap:
@@ -230,20 +207,15 @@ For the specific family of decision rules :math:`\mu^k, k` that we consider here
       {\sum_f \mu_f^k}
       .
    \end{align}
-  
-However, note that the posterior expectation of :math:`d^f`, denoted as :math:`\mathbb{E}_{post}[]`, verifies :math:`\mathbb{E}_{post}[FDP_{d^f}] = p^f`.
-Hence, by linearity of the expectation, we can estimate the false discovery rate corresponding to :math:`k` detected features as 
+
+However, note that the posterior expectation of :math:`d^f`, where :math:`\mathbb{E}_{post}[\cdot]` denotes the expectation under the posterior, satisfies :math:`\mathbb{E}_{post}[d^f] = p^f`.
+Hence, by linearity of the expectation, we can estimate the false discovery rate corresponding to :math:`k` detected features as
 
 .. math::
    :nowrap:
 
    \begin{align}
-      \mathbb{E}_{post}[FDP_{\mu^k}]
-      =
-      \frac
-      {\sum_f (1 - p^f) \mu_f^k}
-      {\sum_f \mu_f^k}
-      .
+      \mathbb{E}_{post}[FDP_{\mu^k}] = \frac{\sum_f (1 - p^f) \mu_f^k}{\sum_f \mu_f^k}.
    \end{align}
 
  For a given significance level :math:`\alpha`, we therefore select the maximum number of detections :math:`k^*` such that :math:`\mathbb{E}_{post}[FDP_{\mu^k}] \leq \alpha`, as illustrated below.
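
The selection of :math:`k^*` can be sketched in a few lines. This is an illustrative example under stated assumptions, not the library's code: ``select_k_star`` is a hypothetical helper, and ``p_de`` is assumed to hold the posterior DE probabilities :math:`p^f`. Features are ranked by decreasing :math:`p^f`, the posterior expected FDP is computed for every :math:`k`, and the largest :math:`k` below the target is kept.

```python
import numpy as np


def select_k_star(p_de, alpha=0.05):
    """Return (k_star, is_de) for posterior DE probabilities p_de (hypothetical helper)."""
    p_de = np.asarray(p_de, dtype=float)
    # Rank features from highest to lowest posterior probability of DE.
    order = np.argsort(-p_de)
    sorted_p = p_de[order]
    # Posterior expected FDP when tagging the top k features, for k = 1..F.
    expected_fdp = np.cumsum(1.0 - sorted_p) / np.arange(1, p_de.size + 1)
    # Largest k whose expected FDP stays at or below alpha (0 if none qualifies).
    valid = np.nonzero(expected_fdp <= alpha)[0]
    k_star = int(valid[-1]) + 1 if valid.size else 0
    # Boolean mask of the features tagged as DE, in the original feature order.
    is_de = np.zeros(p_de.size, dtype=bool)
    is_de[order[:k_star]] = True
    return k_star, is_de


p_de = [0.99, 0.20, 0.97, 0.60, 0.95]
k_star, is_de = select_k_star(p_de, alpha=0.05)
```

Because the features are sorted by decreasing :math:`p^f`, the expected FDP is non-decreasing in :math:`k`, so the last index satisfying the constraint is the maximum number of detections.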

diff --git a/docs/user_guide/index.rst b/docs/user_guide/index.rst
@@ -15,6 +15,9 @@ scRNA-seq analysis
    * - :doc:`/user_guide/models/scvi`
      - [Lopez18]_
      - Dimensionality reduction, removal of unwanted variation, integration across replicates, donors, and technologies, differential expression, imputation, normalization of other cell- and sample-level confounding factors
+   * - :doc:`/user_guide/models/scanvi`
+     - [Xu21]_
+     - scVI tasks with cell type transfer from reference, seed labeling
    * - :doc:`/user_guide/models/linearscvi`
      - [Svensson20]_
      - scVI tasks with linear decoder

diff --git a/docs/user_guide/models/cellassign.rst b/docs/user_guide/models/cellassign.rst
@@ -10,15 +10,13 @@ such that CellAssign will scale to very large datasets.
 
 The advantages of CellAssign are:
 
-    + Lightweight model that can be fit quickly.
-
-    + Ability to control for nuisance factors.
+- Lightweight model that can be fit quickly.
+- Ability to control for nuisance factors.
 
 The limitations of CellAssign include:
 
-    + Requirement for a cell types by gene markers binary matrix.
-
-    + The simple linear model may not handle non-linear batch effects.
+- Requirement for a cell types by gene markers binary matrix.
+- The simple linear model may not handle non-linear batch effects.
 
 
 .. topic:: Tutorials:

diff --git a/docs/user_guide/models/destvi.rst b/docs/user_guide/models/destvi.rst
@@ -8,13 +8,12 @@ can be used to explore the spatial organization of a tissue and understanding ge
 
 The advantages of DestVI are:
 
-    + Can stratify cells into discrete cell types and model continuous sub-cell-type variation.
-
-    + Scalable to very large datasets (>1 million cells).
+- Can stratify cells into discrete cell types and model continuous sub-cell-type variation.
+- Scalable to very large datasets (>1 million cells).
 
 The limitations of DestVI include:
 
-    + Effectively requires a GPU for fast inference.
+- Effectively requires a GPU for fast inference.
 
 .. topic:: Tutorial:
 
@@ -240,4 +239,4 @@ can be found in the DestVI paper.
         `bioRxiv <https://doi.org/10.1101/2021.05.10.443517>`__.
     .. [#ref2] Jakub Tomczak, Max Welling (2018),
         *VAE with a VampPrior*,
-        `International Conference on Artificial Intelligence and Statistics <http://proceedings.mlr.press/v84/tomczak18a/tomczak18a.pdf`__.
+        `International Conference on Artificial Intelligence and Statistics <http://proceedings.mlr.press/v84/tomczak18a/tomczak18a.pdf>`__.
diff --git a/docs/user_guide/models/figures/fdr_control.png b/docs/user_guide/models/figures/fdr_control.png
diff --git a/docs/user_guide/models/figures/scanvi_pgm.png b/docs/user_guide/models/figures/scanvi_pgm.png
diff --git a/docs/user_guide/models/linearscvi.rst b/docs/user_guide/models/linearscvi.rst
@@ -7,15 +7,13 @@ is a flavor of scVI with a linear decoder.
 
 The advantages of LDVAE are:
 
-    + Can be used to interpret latent dimensions with factor loading matrix.
-
-    + Scalable to very large datasets (>1 million cells).
+- Can be used to interpret latent dimensions with factor loading matrix.
+- Scalable to very large datasets (>1 million cells).
 
 The limitations of LDVAE include:
 
-    + Less capacity than scVI, which uses a neural network decoder.
-
-    + Less capable of integrating data with complex batch effects.
+- Less capacity than scVI, which uses a neural network decoder.
+- Less capable of integrating data with complex batch effects.
 
 
 .. topic:: Tutorials: