Autozi userguide (#1204)
* add autozi userguide

* add ref

* fix ref

* add tutorials

* typo

* edits
galenxing authored Oct 7, 2021
1 parent d6a6a5f commit 122c4aa
Showing 2 changed files with 76 additions and 0 deletions.
3 changes: 3 additions & 0 deletions docs/user_guide/index.rst
@@ -21,6 +21,9 @@ scRNA-seq analysis
   * - :doc:`/user_guide/models/linearscvi`
     - [Svensson20]_
     - scVI tasks with linear decoder
   * - :doc:`/user_guide/models/autozi`
     - [Clivio19]_
     - for assessing gene-specific levels of zero-inflation in scRNA-seq data
   * - :doc:`/user_guide/models/cellassign`
     - [Zhang19]_
     - Marker-based automated annotation
73 changes: 73 additions & 0 deletions docs/user_guide/models/autozi.rst
@@ -0,0 +1,73 @@
======
AUTOZI
======

**AUTOZI** [#ref1]_ (Python class :class:`scvi.model.AUTOZI`)
is a model for assessing gene-specific levels of zero-inflation in scRNA-seq data.

.. topic:: Tutorials:

   - :doc:`/tutorials/notebooks/AutoZI_tutorial`

Generative process
==================
AUTOZI is very similar to scVI, but it places a spike-and-slab prior on the zero-inflation rate of each gene.
Whether the zero-inflation rate (:math:`\pi_{ng}` in the original scVI model) is drawn from a set of
non-negligible values (the "slab" component) or from a set of negligible values (the "spike" component) is determined by
:math:`m_g \sim Bernoulli(\delta_g)`, where :math:`\delta_g \sim Beta(\alpha^g, \beta^g)`.
Thus, for each gene :math:`g`, the zero-inflation rate is defined as
:math:`\pi_{ng} = (1-m_g)\pi_{ng}^{slab} + m_g \pi_{ng}^{spike}`,
so :math:`m_g = 1` selects the spike (negligible zero-inflation) and :math:`m_g = 0` selects the slab.
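
The mixture assignment described above can be sketched in a few lines of NumPy. This is only an illustration, not the scvi-tools implementation: the array sizes, the uniform slab rates, and the choice of an exactly-zero spike are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cells, n_genes = 5, 4          # hypothetical sizes, for illustration only
alpha, beta = 0.5, 0.5           # prior hyperparameters (0.5 is the default quoted in this guide)

# delta_g ~ Beta(alpha, beta): per-gene probability of picking the spike
delta = rng.beta(alpha, beta, size=n_genes)
# m_g ~ Bernoulli(delta_g): 1 selects the spike, 0 selects the slab
m = rng.binomial(1, delta)

# Stand-ins for the two components (assumptions): in the model the slab
# rates come from a neural network; here they are random placeholders,
# and the spike is taken to be exactly zero.
pi_slab = rng.uniform(0.1, 0.9, size=(n_cells, n_genes))
pi_spike = np.zeros((n_cells, n_genes))

# pi_ng = (1 - m_g) * pi_slab + m_g * pi_spike, broadcast over cells
pi = (1 - m) * pi_slab + m * pi_spike
```

Because :math:`m_g` is shared by all cells, every column of ``pi`` is either entirely slab values or entirely spike values.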

The full generative model is as follows:

.. math::
   :nowrap:

   \begin{align}
      z_n &\sim N(0, I)\\
      l_n &\sim LogNormal(l_\mu, l_\sigma^2)\\
      \delta_g &\sim Beta(\alpha^g, \beta^g)\\
      m_g &\sim Bernoulli(\delta_g)\\
      \pi_{ng} &\sim (1 - m_g)\,\delta_{\{h^g(z_n)\}} + m_g\,\delta_{\{0\}}\\
      x_{ng} \mid z_n, l_n, m_g &\sim ZINB(l_n w^g(z_n), \theta_g, \pi_{ng})
   \end{align}

where :math:`w^g` and :math:`h^g` are neural networks that take :math:`z_n` as input and output
the expected expression frequency (which is scaled by the library size :math:`l_n`) and the dropout rate, respectively.
The prior parameters :math:`l_\mu` and :math:`l_\sigma^2` are the empirical mean and variance of the log library size
per batch, respectively. The hyperparameters of the prior on :math:`\delta_g` are :math:`\alpha^g` and :math:`\beta^g`, which
by default are both set to 0.5 to enforce sparsity while maintaining symmetry. Finally,
:math:`\delta_{\{x\}}` denotes the Dirac distribution on :math:`x`.
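
To see why setting both Beta hyperparameters to 0.5 encourages sparsity while staying symmetric, one can compare that prior with a uniform Beta(1, 1). The sketch below uses ``scipy.stats.beta``; the helper ``tail_mass`` and the cutoff ``eps=0.1`` are hypothetical choices made for this illustration.

```python
from scipy.stats import beta

sparse_prior = beta(0.5, 0.5)  # hyperparameter values quoted as the default in this guide
flat_prior = beta(1.0, 1.0)    # uniform prior, used here only for comparison


def tail_mass(dist, eps=0.1):
    """Prior mass within eps of either extreme (0 or 1)."""
    return dist.cdf(eps) + dist.sf(1 - eps)


sparse_tails = tail_mass(sparse_prior)
flat_tails = tail_mass(flat_prior)
median_check = sparse_prior.cdf(0.5)  # symmetry puts half of the mass below 0.5

print(f"Beta(0.5, 0.5) tail mass: {sparse_tails:.3f}")
print(f"Beta(1, 1) tail mass:     {flat_tails:.3f}")
```

The Beta(0.5, 0.5) prior puts noticeably more mass near 0 and 1 than the uniform prior, which pushes each :math:`\delta_g` toward a clear-cut assignment without favoring either component.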

Inference procedure
===================

To learn the parameters, we employ variational inference (see :doc:`/user_guide/background/variational_inference`) with the following approximate posterior
distribution:

.. math::
   :nowrap:

   \begin{align*}
      \bar{q} &= \prod^{G}_{g=1} q(\delta_g) \prod^{N}_{n=1} q(z_n \mid x_n)\, q(l_n \mid x_n)
   \end{align*}

Tasks
=====
To classify whether a gene :math:`g` is zero-inflated,
we first retrieve the parameters of the Beta posterior :math:`q(\delta_g)`::

    >>> outputs = model.get_alpha_betas()
    >>> alpha_posterior = outputs['alpha_posterior']
    >>> beta_posterior = outputs['beta_posterior']

Bayesian decision theory then suggests using :math:`q(\delta_g < 0.5)` as the posterior probability of zero-inflation
and classifying a gene as zero-inflated when this probability exceeds a threshold::

    >>> from scipy.stats import beta
    >>> threshold = 0.5
    >>> zi_probs = beta.cdf(0.5, alpha_posterior, beta_posterior)
    >>> is_zi_pred = zi_probs > threshold
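
For readers without a trained model at hand, the decision rule can be tried on made-up posterior parameters. The values below are hypothetical and chosen only to show the three typical cases; they are not output from AUTOZI.

```python
import numpy as np
from scipy.stats import beta

# Hypothetical Beta posterior parameters for three genes (illustrative only)
alpha_posterior = np.array([0.3, 5.0, 2.0])
beta_posterior = np.array([4.0, 0.4, 3.0])

threshold = 0.5
# q(delta_g < 0.5): posterior probability that gene g is zero-inflated
zi_probs = beta.cdf(0.5, alpha_posterior, beta_posterior)
# Classify a gene as zero-inflated when that probability exceeds the threshold
is_zi_pred = zi_probs > threshold

for g, (p, flag) in enumerate(zip(zi_probs, is_zi_pred)):
    print(f"gene {g}: q(delta < 0.5) = {p:.3f}, zero-inflated = {flag}")
```

A posterior concentrated near 0 (first gene) yields a probability close to 1, a posterior concentrated near 1 (second gene) yields a probability close to 0, and a posterior that leans slightly toward 0 (third gene) lands just above the threshold.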

.. topic:: References:

   .. [#ref1] Oscar Clivio, Romain Lopez, Jeffrey Regier, Adam Gayoso, Michael I. Jordan, Nir Yosef (2019),
      *Detecting zero-inflated genes in single-cell transcriptomics data*,
      `Machine Learning in Computational Biology (MLCB) <https://www.biorxiv.org/content/biorxiv/early/2019/10/10/794875.full.pdf>`__.
