added tutorials to docs again
Lucas Camillo authored and Lucas Camillo committed Feb 8, 2024
1 parent a1849ce commit 6face6b
Showing 9 changed files with 7,367 additions and 1 deletion.
772 changes: 772 additions & 0 deletions docs/source/tutorials/tutorial_atacseq.ipynb

Large diffs are not rendered by default.

633 changes: 633 additions & 0 deletions docs/source/tutorials/tutorial_bloodchemistry.ipynb

Large diffs are not rendered by default.

1,186 changes: 1,186 additions & 0 deletions docs/source/tutorials/tutorial_dnam_illumina_human_array.ipynb

Large diffs are not rendered by default.

1,616 changes: 1,616 additions & 0 deletions docs/source/tutorials/tutorial_dnam_illumina_mammalian_array.ipynb

Large diffs are not rendered by default.

1,656 changes: 1,656 additions & 0 deletions docs/source/tutorials/tutorial_dnam_rrbs.ipynb

Large diffs are not rendered by default.

298 changes: 298 additions & 0 deletions docs/source/tutorials/tutorial_histonemarkchipseq.ipynb
@@ -0,0 +1,298 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a76ae282-3b11-4246-8292-a9276267832d",
"metadata": {},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rsinghlab/pyaging/blob/main/tutorials/tutorial_histonemarkchipseq.ipynb) [![Open In nbviewer](https://img.shields.io/badge/View%20in-nbviewer-orange)](https://nbviewer.jupyter.org/github/rsinghlab/pyaging/blob/main/tutorials/tutorial_histonemarkchipseq.ipynb)"
]
},
{
"cell_type": "markdown",
"id": "d444a24e-6a98-4db1-8688-7f3f80ed2876",
"metadata": {},
"source": [
"# Bulk histone mark ChIP-Seq"
]
},
{
"cell_type": "markdown",
"id": "186154f3-1c8d-4284-a5a4-01f28d4db533",
"metadata": {},
"source": [
"This tutorial is a brief guide to using the seven histone-mark-specific clocks and the pan-histone-mark clock that we developed. Link to [preprint](https://www.biorxiv.org/content/10.1101/2023.08.21.554165v3)."
]
},
{
"cell_type": "markdown",
"id": "270379c1-9159-4677-92fa-10b08aa9f703",
"metadata": {},
"source": [
"We just need two packages for this tutorial."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "dd281360-7e16-45d9-ae2b-8f8f3fff809d",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import pyaging as pya"
]
},
{
"cell_type": "markdown",
"id": "b6893601-615e-449b-829b-c144276f402f",
"metadata": {},
"source": [
"## Download and load example data"
]
},
{
"cell_type": "markdown",
"id": "fd3e80a9-5361-40f0-bf3e-6f6057181594",
"metadata": {},
"source": [
"Let's download an example H3K4me3 ChIP-Seq bigWig file from the ENCODE project."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "85c15bf3-6cf1-4f71-abf2-d0d7ee81b86b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"|-----> 🏗️ Starting download_example_data function\n",
"|-----------> Downloading data to pyaging_data/ENCFF386QWG.bigWig\n",
"|-----------> in progress: 24.0057%"
]
}
],
"source": [
"pya.data.download_example_data('ENCFF386QWG')"
]
},
{
"cell_type": "markdown",
"id": "3880246a-471e-4f75-bd2f-ed2623458a48",
"metadata": {},
"source": [
"To show that multiple bigWig files can be converted into a single DataFrame at once, let's simply pass the same file path twice."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f65f5cc7-4c42-45a5-a04e-83e0520eccff",
"metadata": {},
"outputs": [],
"source": [
"df = pya.pp.bigwig_to_df(['pyaging_data/ENCFF386QWG.bigWig', 'pyaging_data/ENCFF386QWG.bigWig'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a24e0a5-f97f-4f01-95a7-dd96246d9eb2",
"metadata": {},
"outputs": [],
"source": [
"df.index = ['sample1', 'sample2'] # give the samples unique names to avoid an AnnData warning about duplicate sample names"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "769858ac-9d6d-43f8-9c53-0f4a88c5484c",
"metadata": {},
"outputs": [],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "e303dc0f-9e77-4524-9c04-90540e9ee75d",
"metadata": {},
"source": [
"## Convert data to AnnData object"
]
},
{
"cell_type": "markdown",
"id": "ae8e44bc-67fc-4508-9623-faea44301fa8",
"metadata": {},
"source": [
"AnnData objects are highly flexible and are thus our preferred method of organizing data for age prediction."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c167be6-1bd3-407c-ae12-771739189c3c",
"metadata": {},
"outputs": [],
"source": [
"adata = pya.preprocess.df_to_adata(df)"
]
},
{
"cell_type": "markdown",
"id": "3f82813b-3db2-4570-9e4c-3dce08dc5108",
"metadata": {},
"source": [
"Note that the original DataFrame is stored in `adata.layers` under the key `X_original`. This is what the `adata` object looks like:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "641a61a6-46fc-4d47-b176-eb39524ce94f",
"metadata": {},
"outputs": [],
"source": [
"adata"
]
},
{
"cell_type": "markdown",
"id": "c72aa719-efd3-4094-90f5-bffcaea76a34",
"metadata": {},
"source": [
"## Predict age"
]
},
{
"cell_type": "markdown",
"id": "aff9395b-4954-4148-9cbb-6681e7217cf3",
"metadata": {},
"source": [
"We can predict age with one clock at a time or with several at once. For convenience, let's pass a few clocks of interest together. The function is case-insensitive with respect to clock names."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c02455b4-06dd-44c2-b4b3-a2bb434eae7d",
"metadata": {},
"outputs": [],
"source": [
"pya.pred.predict_age(adata, ['CamilloH3K4me3', 'CamilloH3K9me3', 'CamilloPanHistone'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f64fb182-937b-4f67-b58e-5fffb0e2fad0",
"metadata": {},
"outputs": [],
"source": [
"adata.obs.head()"
]
},
{
"cell_type": "markdown",
"id": "bbaa2243-e380-4020-bf04-f7aa7da83cd4",
"metadata": {},
"source": [
"Having so much information printed can be overwhelming, particularly when running several clocks at once. In such cases, just set `verbose` to `False`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e8dd3457-8983-41a4-aaab-41563b91a866",
"metadata": {},
"outputs": [],
"source": [
"pya.data.download_example_data('ENCFF386QWG', verbose=False)\n",
"df = pya.pp.bigwig_to_df(['pyaging_data/ENCFF386QWG.bigWig', 'pyaging_data/ENCFF386QWG.bigWig'], verbose=False)\n",
"df.index = ['sample1', 'sample2']\n",
"adata = pya.preprocess.df_to_adata(df, verbose=False)\n",
"pya.pred.predict_age(adata, ['CamilloH3K4me3', 'CamilloH3K9me3', 'CamilloPanHistone'], verbose=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8192ab67-a1cc-4728-8ca0-f81a56940fbf",
"metadata": {},
"outputs": [],
"source": [
"adata.obs.head()"
]
},
{
"cell_type": "markdown",
"id": "9832aa0b-99a8-4938-a2a2-5e9b484a3353",
"metadata": {},
"source": [
"After age prediction, the predicted age for each clock is added to `adata.obs`. Moreover, the percentage of missing values for each clock and other metadata are included in `adata.uns`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a4b22bf1-116f-456f-82d2-58b300f863f1",
"metadata": {},
"outputs": [],
"source": [
"adata"
]
},
{
"cell_type": "markdown",
"id": "c08ff758-675c-4136-9fb8-c19f0e05fefd",
"metadata": {},
"source": [
"## Get citation"
]
},
{
"cell_type": "markdown",
"id": "8407c418-6251-4b08-9d29-166f9a4339d2",
"metadata": {},
"source": [
"The DOI, citation, and some metadata are automatically added to the AnnData object under `adata.uns['<clockname>_metadata']`, with the clock name written in lowercase."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2946393e-a199-46ba-a9dd-80bc8fa88787",
"metadata": {},
"outputs": [],
"source": [
"adata.uns['camilloh3k4me3_metadata']"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
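
The final cell of the notebook looks up metadata with the key `'camilloh3k4me3_metadata'`, which is the clock name `CamilloH3K4me3` in lowercase followed by `_metadata`. Below is a minimal sketch of building such keys for several clocks. The helper `metadata_key` is hypothetical and not part of pyaging, and the assumption that every clock follows the same lowercase pattern is inferred from the single key shown in the notebook.

```python
def metadata_key(clock_name: str) -> str:
    """Build the assumed adata.uns key for a clock's metadata.

    Hypothetical helper (not a pyaging function): assumes the key is the
    lowercased clock name followed by '_metadata', as in the notebook's
    'camilloh3k4me3_metadata' example.
    """
    return f"{clock_name.lower()}_metadata"


# Clock names exactly as passed to predict_age in the notebook.
clocks = ["CamilloH3K4me3", "CamilloH3K9me3", "CamilloPanHistone"]

# One assumed metadata key per clock.
keys = [metadata_key(name) for name in clocks]
print(keys)
```

If the pattern holds, each key could then be used as `adata.uns[key]` after running `predict_age`; checking `adata.uns.keys()` first would confirm the actual names.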