added tutorials to docs again

rsinghlab · Feb 8, 2024 · 6face6b
1 parent a1849ce
commit 6face6b
Showing 9 changed files with 7,367 additions and 1 deletion.
diff --git a/docs/source/tutorials/tutorial_atacseq.ipynb b/docs/source/tutorials/tutorial_atacseq.ipynb
diff --git a/docs/source/tutorials/tutorial_bloodchemistry.ipynb b/docs/source/tutorials/tutorial_bloodchemistry.ipynb
diff --git a/docs/source/tutorials/tutorial_dnam_illumina_human_array.ipynb b/docs/source/tutorials/tutorial_dnam_illumina_human_array.ipynb
diff --git a/docs/source/tutorials/tutorial_dnam_illumina_mammalian_array.ipynb b/docs/source/tutorials/tutorial_dnam_illumina_mammalian_array.ipynb
diff --git a/docs/source/tutorials/tutorial_dnam_rrbs.ipynb b/docs/source/tutorials/tutorial_dnam_rrbs.ipynb
diff --git a/docs/source/tutorials/tutorial_histonemarkchipseq.ipynb b/docs/source/tutorials/tutorial_histonemarkchipseq.ipynb
@@ -0,0 +1,298 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "a76ae282-3b11-4246-8292-a9276267832d",
+   "metadata": {},
+   "source": [
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rsinghlab/pyaging/blob/main/tutorials/tutorial_histonemarkchipseq.ipynb) [![Open In nbviewer](https://img.shields.io/badge/View%20in-nbviewer-orange)](https://nbviewer.jupyter.org/github/rsinghlab/pyaging/blob/main/tutorials/tutorial_histonemarkchipseq.ipynb)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d444a24e-6a98-4db1-8688-7f3f80ed2876",
+   "metadata": {},
+   "source": [
+    "# Bulk histone mark ChIP-Seq"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "186154f3-1c8d-4284-a5a4-01f28d4db533",
+   "metadata": {},
+   "source": [
+    "This tutorial is a brief guide to using the seven histone-mark-specific clocks and the pan-histone-mark clock that we developed. See the [preprint](https://www.biorxiv.org/content/10.1101/2023.08.21.554165v3) for details."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "270379c1-9159-4677-92fa-10b08aa9f703",
+   "metadata": {},
+   "source": [
+    "We just need two packages for this tutorial."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "dd281360-7e16-45d9-ae2b-8f8f3fff809d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import pyaging as pya"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b6893601-615e-449b-829b-c144276f402f",
+   "metadata": {},
+   "source": [
+    "## Download and load example data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fd3e80a9-5361-40f0-bf3e-6f6057181594",
+   "metadata": {},
+   "source": [
+    "Let's download an example H3K4me3 ChIP-Seq bigWig file from the ENCODE project."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "85c15bf3-6cf1-4f71-abf2-d0d7ee81b86b",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "|-----> 🏗️ Starting download_example_data function\n",
+      "|-----------> Downloading data to pyaging_data/ENCFF386QWG.bigWig\n",
+      "|-----------> in progress: 24.0057%"
+     ]
+    }
+   ],
+   "source": [
+    "pya.data.download_example_data('ENCFF386QWG')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3880246a-471e-4f75-bd2f-ed2623458a48",
+   "metadata": {},
+   "source": [
+    "To show that multiple bigWig files can be converted into a single DataFrame at once, let's simply pass the same file path twice."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f65f5cc7-4c42-45a5-a04e-83e0520eccff",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = pya.pp.bigwig_to_df(['pyaging_data/ENCFF386QWG.bigWig', 'pyaging_data/ENCFF386QWG.bigWig'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1a24e0a5-f97f-4f01-95a7-dd96246d9eb2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.index = ['sample1', 'sample2'] # give the two rows distinct names so anndata does not warn about duplicate sample names"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "769858ac-9d6d-43f8-9c53-0f4a88c5484c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e303dc0f-9e77-4524-9c04-90540e9ee75d",
+   "metadata": {},
+   "source": [
+    "## Convert data to AnnData object"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ae8e44bc-67fc-4508-9623-faea44301fa8",
+   "metadata": {},
+   "source": [
+    "AnnData objects are highly flexible and are thus our preferred method of organizing data for age prediction."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6c167be6-1bd3-407c-ae12-771739189c3c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adata = pya.preprocess.df_to_adata(df)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3f82813b-3db2-4570-9e4c-3dce08dc5108",
+   "metadata": {},
+   "source": [
+    "Note that the original DataFrame is stored under `adata.layers['X_original']`. This is what the `adata` object looks like:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "641a61a6-46fc-4d47-b176-eb39524ce94f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adata"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c72aa719-efd3-4094-90f5-bffcaea76a34",
+   "metadata": {},
+   "source": [
+    "## Predict age"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "aff9395b-4954-4148-9cbb-6681e7217cf3",
+   "metadata": {},
+   "source": [
+    "We can predict with one clock at a time or with all of them at once. For convenience, let's pass a few clocks of interest together. The function is insensitive to the capitalization of clock names."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c02455b4-06dd-44c2-b4b3-a2bb434eae7d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pya.pred.predict_age(adata, ['CamilloH3K4me3', 'CamilloH3K9me3', 'CamilloPanHistone'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f64fb182-937b-4f67-b58e-5fffb0e2fad0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adata.obs.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bbaa2243-e380-4020-bf04-f7aa7da83cd4",
+   "metadata": {},
+   "source": [
+    "Having so much information printed can be overwhelming, particularly when running several clocks at once. In such cases, set `verbose=False`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e8dd3457-8983-41a4-aaab-41563b91a866",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pya.data.download_example_data('ENCFF386QWG', verbose=False)\n",
+    "df = pya.pp.bigwig_to_df(['pyaging_data/ENCFF386QWG.bigWig', 'pyaging_data/ENCFF386QWG.bigWig'], verbose=False)\n",
+    "df.index = ['sample1', 'sample2']\n",
+    "adata = pya.preprocess.df_to_adata(df, verbose=False)\n",
+    "pya.pred.predict_age(adata, ['CamilloH3K4me3', 'CamilloH3K9me3', 'CamilloPanHistone'], verbose=False)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8192ab67-a1cc-4728-8ca0-f81a56940fbf",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adata.obs.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9832aa0b-99a8-4938-a2a2-5e9b484a3353",
+   "metadata": {},
+   "source": [
+    "After age prediction, each clock's predictions are added to `adata.obs`. In addition, the percentage of missing values for each clock and other metadata are stored in `adata.uns`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a4b22bf1-116f-456f-82d2-58b300f863f1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adata"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c08ff758-675c-4136-9fb8-c19f0e05fefd",
+   "metadata": {},
+   "source": [
+    "## Get citation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8407c418-6251-4b08-9d29-166f9a4339d2",
+   "metadata": {},
+   "source": [
+    "The DOI, citation, and other metadata are automatically added to the AnnData object under `adata.uns['<clockname>_metadata']`, where the clock name is written in lowercase."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2946393e-a199-46ba-a9dd-80bc8fa88787",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adata.uns['camilloh3k4me3_metadata']"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.17"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
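
Note on the metadata key used in the notebook's last cell: the clocks are requested as `'CamilloH3K4me3'`, but the metadata is read back from `adata.uns['camilloh3k4me3_metadata']`. The sketch below shows that naming pattern as inferred from the notebook alone. `metadata_key` is a hypothetical helper, not part of pyaging, and the lowercase convention is an assumption based on that single example.

```python
def metadata_key(clock_name: str) -> str:
    """Hypothetical helper (not a pyaging function).

    Builds the adata.uns key for a clock's metadata, assuming the
    lowercase '<clockname>_metadata' pattern seen in the tutorial.
    """
    return f"{clock_name.lower()}_metadata"


# Clock names exactly as they are passed to predict_age in the tutorial.
clocks = ["CamilloH3K4me3", "CamilloH3K9me3", "CamilloPanHistone"]

# Derive the assumed metadata key for each clock.
keys = [metadata_key(name) for name in clocks]
print(keys)
```

With a helper like this, the citation for every clock run in the tutorial could be looked up in a loop over `keys` instead of typing each lowercase key by hand.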