Cellxgene Desktop and Data Portal explorers have a common UI that enables exploration and interrogation of single cell data. This section describes cellxgene's data exploration capabilities.
You can follow any of these tutorials by launching an instance of the cellxgene Explorer with the Tabula Sapiens dataset using this Hosted Explorer link. For more information on Tabula Sapiens, refer to the preprint on bioRxiv.
Description: cellxgene's user interface organizes single cell data similarly to how it is organized in single cell data formats. The left-hand side displays categorical and numerical sample metadata. The right-hand side is a space for displaying features such as genes and gene sets. The center displays the embedding, where each cell is a point. UMAP and t-SNE are common embeddings, which place cells based on their local distances in gene expression space. Cells from spatial data can also be displayed using each cell's (x, y) coordinates. Here is a more complete overview of the cellxgene interface in both the hosted and desktop explorer.
Key Concepts: user interface explanation
https://docs.google.com/presentation/d/1DzuwuzLLCcy4hGCCekYwjPpwsWMJth0qt_DJOYc9avs/edit?usp=sharing
Description: Categorical metadata (such as tissue of origin or cell type) can be used in a number of ways within cellxgene, including coloring embedding plots (e.g. coloring the UMAP by cell type), looking at cell counts within a categorical metadata field, making selections of cells, or viewing the interaction between different categorical metadata fields.
Key Concepts: categorical metadata, selecting cells by category (e.g. cell type), interaction between categorical metadata fields
https://docs.google.com/presentation/d/1fxuuzhiaYbMdG2NiHgN5-vxeQHwEjWVOSi4VmCAP1L0/edit?usp=sharing
Description: Numerical metadata (for example, gene expression features or QC metrics such as the number of genes) can be examined on the embedding plot and used to filter and select cells. Additionally, tools such as the clip tool give us control over how these attributes are displayed on the embedding plot.
Key Concepts: numerical metadata, cell filtering and selection, interaction between numerical and categorical metadata fields
https://docs.google.com/presentation/d/13c0Nj_kR32j0hNL0uZG_4cl646N_TH3ADaQzZFqDPBI/edit?usp=sharing
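To make the idea of clipping concrete, here is a minimal sketch of percentile-based clipping in plain Python. It assumes that clipping means capping values at a lower and an upper percentile so that a few extreme cells do not dominate the color scale. The function names and the example values are hypothetical, and this is not cellxgene's own implementation.

```python
def percentile(sorted_values, pct):
    """Linear-interpolation percentile of an already sorted list (pct in 0..100)."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = rank - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def clip_to_percentiles(values, lower_pct=0.0, upper_pct=100.0):
    """Cap every value at the given lower and upper percentiles."""
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return [min(max(v, low), high) for v in values]


# Hypothetical "number of genes" values with one extreme outlier.
n_genes = [200, 850, 900, 950, 1000, 1100, 9000]
clipped = clip_to_percentiles(n_genes, lower_pct=0, upper_pct=90)
print(clipped)
```

After clipping, the outlier is pulled down to the 90th percentile, so the remaining cells spread across more of the color range.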
Description: Cellxgene allows for the complex selection of cells by combining selection directly on the embedding, gene expression cutoffs, and categorical metadata attributes.
Key Concepts: categorical metadata selection, numerical metadata selection, complex selection (combining selection methods)
https://docs.google.com/presentation/d/1T5fLkecZziuo6qUfAam1FmSYKhx-Ln9ytpqccjX8j2o/edit?usp=sharing
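Conceptually, a complex selection can be thought of as the intersection of several simpler filters. The sketch below illustrates that idea in plain Python, assuming the filters are combined with a logical AND. The cell records, field names, and the `select_cells` helper are hypothetical and only stand in for what the explorer does interactively.

```python
# Hypothetical cells: a category, one gene's expression, and embedding coordinates.
cells = [
    {"id": "c1", "cell_type": "T cell", "CD3E": 2.1, "umap": (1.0, 2.0)},
    {"id": "c2", "cell_type": "T cell", "CD3E": 0.2, "umap": (1.2, 2.1)},
    {"id": "c3", "cell_type": "B cell", "CD3E": 2.5, "umap": (1.1, 1.9)},
    {"id": "c4", "cell_type": "T cell", "CD3E": 3.0, "umap": (8.0, 9.0)},
]


def select_cells(cells, cell_types, gene, min_expr, box):
    """Return ids of cells that pass all three filters at once."""
    (x_min, y_min), (x_max, y_max) = box
    chosen = []
    for cell in cells:
        x, y = cell["umap"]
        in_category = cell["cell_type"] in cell_types           # categorical filter
        above_cutoff = cell[gene] >= min_expr                    # expression cutoff
        in_region = x_min <= x <= x_max and y_min <= y <= y_max  # region on the embedding
        if in_category and above_cutoff and in_region:
            chosen.append(cell["id"])
    return chosen


selected = select_cells(
    cells, cell_types={"T cell"}, gene="CD3E", min_expr=1.0,
    box=((0.0, 0.0), (2.0, 3.0)),
)
print(selected)
```

Each filter alone would keep several cells here; only `c1` satisfies all three.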
Description: Cellxgene allows you to compare the expression of multiple genes via bivariate plots.
Key Concepts: gene expression, co-expression, cell selection, subsetting
https://docs.google.com/presentation/d/13k5imBj40lMOHiWLvmdmoTI2F-ur79-f1X5wKW1tkRE/edit?usp=sharing
Description: Cellxgene allows you to examine groups of genes via the gene sets feature.
Key Concepts: gene expression, co-expression, cell selection, subsetting
Resources: comma-separated gene set list for use with this tutorial
ACAA1, ACAA2, ACADL, ACADM, ACADS, ACADSB, ACADVL, ACAT1, ACAT2, ACOX1, ACOX3, ACSL1, ACSL3, ACSL4, ACSL5, ACSL6, ADH1A, ADH1B, ADH1C, ADH4, ADH5, ADH6, ADH7, ALDH1B1, ALDH2, ALDH3A2, ALDH7A1, ALDH9A1, CPT1A, CPT1B, CPT1C, CPT2, CYP4A11, CYP4A22, ECHS1, ECI1, ECI2, EHHADH, GCDH, HADH, HADHA, HADHB
https://docs.google.com/presentation/d/1ihFrKRSxnfFNexctOZ0EkUow9SWgEr-9TOVKvVLE8ms/edit?usp=sharing
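If you want to work with the comma-separated list above outside the explorer, the sketch below shows one way to parse such a string into gene symbols and to summarize a gene set per cell. Using the mean expression of the set's genes as the summary is an assumption made for illustration. The helper names and the expression values are hypothetical, and only the first few genes of the list are used.

```python
from statistics import mean


def parse_gene_set(text):
    """Split a comma-separated string into unique gene symbols, keeping order."""
    seen = []
    for token in text.split(","):
        symbol = token.strip()
        if symbol and symbol not in seen:
            seen.append(symbol)
    return seen


def gene_set_score(expression, genes):
    """Mean expression over the genes of the set that are present for this cell."""
    present = [expression[g] for g in genes if g in expression]
    if not present:
        raise ValueError("none of the genes in the set were measured")
    return mean(present)


genes = parse_gene_set("ACAA1, ACAA2, ACADL, ACADM")

# Hypothetical normalized expression values for a single cell.
cell_expression = {"ACAA1": 1.0, "ACAA2": 3.0, "ACADL": 0.0, "CPT2": 5.0}
score = gene_set_score(cell_expression, genes)
print(genes, score)
```

Genes from the set that are missing for a cell (`ACADM` here) are skipped rather than counted as zero; whether that is the right choice depends on the dataset.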
Description: Cellxgene allows you to find marker genes between selected cell populations.
Key Concepts: gene expression, differential expression, cell selection, subsetting
https://docs.google.com/presentation/d/1J0QbINEeHWXNGLwj0dV7PaKCoZn3TRjE3Z2qYydr960/edit?usp=sharing
Note: In the hosted explorer on the cellxgene Data Portal, the differential expression feature has a limit of 50,000 cells (i.e. the sum of the number of cells in group 1 and group 2 cannot exceed 50,000). You can work around this by selecting only subsamples of large clusters. We are currently working on ways to scale this calculation in the hosted setting.
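The arithmetic behind the subsampling workaround can be sketched as follows. In the explorer the selection is made interactively; this plain-Python example only illustrates how two groups could be reduced proportionally so that their combined size stays within the limit. The `subsample_groups` helper, the proportional allocation, and the fixed seed are assumptions made for illustration.

```python
import random

MAX_CELLS = 50_000  # combined limit for group 1 + group 2 stated in the note above


def subsample_groups(group1, group2, max_total=MAX_CELLS, seed=0):
    """Randomly subsample two groups so that their combined size fits max_total."""
    group1, group2 = list(group1), list(group2)
    total = len(group1) + len(group2)
    if total <= max_total:
        return group1, group2  # already within the limit, nothing to do
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    # Keep each group's share of the budget proportional to its original size.
    n1 = min(len(group1), max(1, max_total * len(group1) // total))
    n2 = min(len(group2), max_total - n1)
    return rng.sample(group1, n1), rng.sample(group2, n2)


# Hypothetical cell ids: 80,000 cells in group 1 and 20,000 in group 2.
group1_ids = range(80_000)
group2_ids = range(80_000, 100_000)
sub1, sub2 = subsample_groups(group1_ids, group2_ids)
print(len(sub1), len(sub2))
```

Proportional allocation keeps the relative sizes of the two groups; an equal split is another reasonable choice when one group is much smaller than the other.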
Note: You can find more information here about how our differential expression is calculated. In brief, we use a Welch's t-test. While we are aware that single cell data does not always meet the assumptions imposed by this test, we utilize it because it performs well at identifying the most differentially expressed genes. This enables rapid exploration without sacrificing accuracy.
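For readers unfamiliar with Welch's t-test, the sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom for a single gene in plain Python. It is a generic textbook version of the test, not cellxgene's own code. The function name and the expression values are hypothetical, and any preprocessing the explorer applies before testing is not shown.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples.

    Unlike Student's t-test, the two groups are not assumed to have equal variance.
    Each group needs at least two values, and at least one must have nonzero variance.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical expression values of one gene in two selected cell populations.
group_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
group_2 = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_1, group_2)
print(round(t_stat, 3), round(dof, 3))
```

Repeating this for every gene and ranking by the magnitude of the statistic is one simple way to surface the most differentially expressed genes.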