Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Lucas Camillo authored and Lucas Camillo committed Feb 8, 2024
1 parent a1849ce, commit 6face6b
Showing 9 changed files with 7,367 additions and 1 deletion.
1,186 changes: 1,186 additions & 0 deletions
docs/source/tutorials/tutorial_dnam_illumina_human_array.ipynb
1,616 changes: 1,616 additions & 0 deletions
docs/source/tutorials/tutorial_dnam_illumina_mammalian_array.ipynb
298 changes: 298 additions & 0 deletions
docs/source/tutorials/tutorial_histonemarkchipseq.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,298 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "a76ae282-3b11-4246-8292-a9276267832d", | ||
"metadata": {}, | ||
"source": [ | ||
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rsinghlab/pyaging/blob/main/tutorials/tutorial_histonemarkchipseq.ipynb) [![Open In nbviewer](https://img.shields.io/badge/View%20in-nbviewer-orange)](https://nbviewer.jupyter.org/github/rsinghlab/pyaging/blob/main/tutorials/tutorial_histonemarkchipseq.ipynb)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "d444a24e-6a98-4db1-8688-7f3f80ed2876", | ||
"metadata": {}, | ||
"source": [ | ||
"# Bulk histone mark ChIP-Seq" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "186154f3-1c8d-4284-a5a4-01f28d4db533", | ||
"metadata": {}, | ||
"source": [ | ||
"This tutorial is a brief guide to using the seven histone-mark-specific clocks and the pan-histone-mark clock that we developed. See the [preprint](https://www.biorxiv.org/content/10.1101/2023.08.21.554165v3) for details." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "270379c1-9159-4677-92fa-10b08aa9f703", | ||
"metadata": {}, | ||
"source": [ | ||
"We just need two packages for this tutorial." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"id": "dd281360-7e16-45d9-ae2b-8f8f3fff809d", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import pandas as pd\n", | ||
"import pyaging as pya" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "b6893601-615e-449b-829b-c144276f402f", | ||
"metadata": {}, | ||
"source": [ | ||
"## Download and load example data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "fd3e80a9-5361-40f0-bf3e-6f6057181594", | ||
"metadata": {}, | ||
"source": [ | ||
"Let's download an example H3K4me3 ChIP-Seq bigWig file from the ENCODE project." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "85c15bf3-6cf1-4f71-abf2-d0d7ee81b86b", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"|-----> 🏗️ Starting download_example_data function\n", | ||
"|-----------> Downloading data to pyaging_data/ENCFF386QWG.bigWig\n", | ||
"|-----------> in progress: 24.0057%" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"pya.data.download_example_data('ENCFF386QWG')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "3880246a-471e-4f75-bd2f-ed2623458a48", | ||
"metadata": {}, | ||
"source": [ | ||
"To show that multiple bigWig files can be converted into a single DataFrame at once, let's simply pass the same file path twice." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f65f5cc7-4c42-45a5-a04e-83e0520eccff", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"df = pya.pp.bigwig_to_df(['pyaging_data/ENCFF386QWG.bigWig', 'pyaging_data/ENCFF386QWG.bigWig'])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "1a24e0a5-f97f-4f01-95a7-dd96246d9eb2", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"df.index = ['sample1', 'sample2'] # give each sample a unique name to avoid the anndata warning about duplicate sample names" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "769858ac-9d6d-43f8-9c53-0f4a88c5484c", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"df.head()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "e303dc0f-9e77-4524-9c04-90540e9ee75d", | ||
"metadata": {}, | ||
"source": [ | ||
"## Convert data to AnnData object" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "ae8e44bc-67fc-4508-9623-faea44301fa8", | ||
"metadata": {}, | ||
"source": [ | ||
"AnnData objects are highly flexible and are thus our preferred method of organizing data for age prediction." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "6c167be6-1bd3-407c-ae12-771739189c3c", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"adata = pya.preprocess.df_to_adata(df)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "3f82813b-3db2-4570-9e4c-3dce08dc5108", | ||
"metadata": {}, | ||
"source": [ | ||
"Note that the original DataFrame is stored under `layers` as `X_original`. This is what the `adata` object looks like:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "641a61a6-46fc-4d47-b176-eb39524ce94f", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"adata" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c72aa719-efd3-4094-90f5-bffcaea76a34", | ||
"metadata": {}, | ||
"source": [ | ||
"## Predict age" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "aff9395b-4954-4148-9cbb-6681e7217cf3", | ||
"metadata": {}, | ||
"source": [ | ||
"We can run one clock at a time or several in a single call. For convenience, let's pass a few clocks of interest at once. Clock names are case-insensitive." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "c02455b4-06dd-44c2-b4b3-a2bb434eae7d", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"pya.pred.predict_age(adata, ['CamilloH3K4me3', 'CamilloH3K9me3', 'CamilloPanHistone'])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f64fb182-937b-4f67-b58e-5fffb0e2fad0", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"adata.obs.head()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "bbaa2243-e380-4020-bf04-f7aa7da83cd4", | ||
"metadata": {}, | ||
"source": [ | ||
"Having so much information printed can be overwhelming, particularly when running several clocks at once. In such cases, set `verbose=False`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "e8dd3457-8983-41a4-aaab-41563b91a866", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"pya.data.download_example_data('ENCFF386QWG', verbose=False)\n", | ||
"df = pya.pp.bigwig_to_df(['pyaging_data/ENCFF386QWG.bigWig', 'pyaging_data/ENCFF386QWG.bigWig'], verbose=False)\n", | ||
"df.index = ['sample1', 'sample2']\n", | ||
"adata = pya.preprocess.df_to_adata(df, verbose=False)\n", | ||
"pya.pred.predict_age(adata, ['CamilloH3K4me3', 'CamilloH3K9me3', 'CamilloPanHistone'], verbose=False)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "8192ab67-a1cc-4728-8ca0-f81a56940fbf", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"adata.obs.head()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "9832aa0b-99a8-4938-a2a2-5e9b484a3353", | ||
"metadata": {}, | ||
"source": [ | ||
"After age prediction, the clock predictions are added to `adata.obs`. In addition, the percentage of missing values for each clock and other metadata are stored in `adata.uns`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "a4b22bf1-116f-456f-82d2-58b300f863f1", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"adata" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c08ff758-675c-4136-9fb8-c19f0e05fefd", | ||
"metadata": {}, | ||
"source": [ | ||
"## Get citation" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "8407c418-6251-4b08-9d29-166f9a4339d2", | ||
"metadata": {}, | ||
"source": [ | ||
"The DOI, citation, and some metadata are automatically added to the AnnData object under `adata.uns['CLOCKNAME_metadata']`, where `CLOCKNAME` is the clock name in lowercase." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "2946393e-a199-46ba-a9dd-80bc8fa88787", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"adata.uns['camilloh3k4me3_metadata']" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.9.17" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
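
The last notebook cell reads `adata.uns['camilloh3k4me3_metadata']` after the clock was requested as `'CamilloH3K4me3'`, which suggests the metadata key is the lowercased clock name followed by `_metadata`. A minimal sketch of that lookup is shown here. It assumes this naming pattern holds for every clock, `metadata_key` is a hypothetical helper that is not part of pyaging, and a plain dict with placeholder values stands in for `adata.uns`.

```python
def metadata_key(clock_name: str) -> str:
    """Build the assumed adata.uns key for a clock's metadata.

    Hypothetical helper (not part of pyaging). It assumes the key is the
    lowercased clock name plus '_metadata', as in the notebook's last cell.
    """
    return f"{clock_name.lower()}_metadata"


# A plain dict standing in for adata.uns; the value is a placeholder,
# not real pyaging output.
uns = {"camilloh3k4me3_metadata": {"placeholder": True}}

key = metadata_key("CamilloH3K4me3")
print(key)
print(uns[key])
```

In a real session the same key could be used as `adata.uns[metadata_key('CamilloH3K4me3')]`, provided the naming assumption above holds.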