From 6a979e359f85421a593ab7a51e97e536f8a49fe3 Mon Sep 17 00:00:00 2001 From: John Turner Date: Thu, 31 Oct 2024 15:02:51 -0400 Subject: [PATCH 1/2] remove-module --- reusable-workflow-repo | 1 - 1 file changed, 1 deletion(-) delete mode 160000 reusable-workflow-repo diff --git a/reusable-workflow-repo b/reusable-workflow-repo deleted file mode 160000 index e24998d..0000000 --- a/reusable-workflow-repo +++ /dev/null @@ -1 +0,0 @@ -Subproject commit e24998da8b63cab46f4422abd12571a8ede8b040 From ac509deae83e67aeb7b77de70e83d42cac484ab3 Mon Sep 17 00:00:00 2001 From: github-action Date: Thu, 31 Oct 2024 19:03:19 +0000 Subject: [PATCH 2/2] Github Action: Lint Notebooks --- .../RNA-seq-checkpoint.ipynb | 782 --------------- .../rnaseq-gcb-checkpoint.config | 20 - 01-RNA-Seq/RNA-seq.ipynb | 44 +- .../RRBS-downstream-checkpoint.ipynb | 934 ------------------ .../rrbs-gcb-checkpoint.config | 15 - 02-RRBS/RRBS-downstream.ipynb | 32 +- 03-Integration/Integration.ipynb | 31 +- .../New-Data-checkpoint.ipynb | 248 ----- 04-New-Data/New-Data.ipynb | 26 +- docs/quiz_files/methylation.ipynb | 588 +---------- docs/quiz_files/rna-pre_module.ipynb | 624 +----------- 11 files changed, 19 insertions(+), 3325 deletions(-) delete mode 100644 01-RNA-Seq/.ipynb_checkpoints/RNA-seq-checkpoint.ipynb delete mode 100644 01-RNA-Seq/.ipynb_checkpoints/rnaseq-gcb-checkpoint.config delete mode 100644 02-RRBS/.ipynb_checkpoints/RRBS-downstream-checkpoint.ipynb delete mode 100644 02-RRBS/.ipynb_checkpoints/rrbs-gcb-checkpoint.config delete mode 100644 04-New-Data/.ipynb_checkpoints/New-Data-checkpoint.ipynb diff --git a/01-RNA-Seq/.ipynb_checkpoints/RNA-seq-checkpoint.ipynb b/01-RNA-Seq/.ipynb_checkpoints/RNA-seq-checkpoint.ipynb deleted file mode 100644 index 12160b6..0000000 --- a/01-RNA-Seq/.ipynb_checkpoints/RNA-seq-checkpoint.ipynb +++ /dev/null @@ -1,782 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "9204cae7-66d3-4dce-9a1d-ca007c666bbc", - "metadata": {}, - "source": [ - "# 
Module 1: RNA-seq analysis \n", - "## Module Overview \n", - " \n", - "+ [Introduction](#GS)\n", - "+ [Raw Reads to Gene Count Table](#RR)\n", - "+ [Gene counts to Differential Expression](#GC)" - ] - }, - { - "cell_type": "markdown", - "id": "0d86bb1c-96d9-495a-9382-c380ae075451", - "metadata": { - "tags": [] - }, - "source": [ - "## Pre-Module Flashcards to Revise the Basics " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f0e9ac26-e308-471a-83a4-113f767524b0", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "IRdisplay::display_html('')" - ] - }, - { - "cell_type": "markdown", - "id": "b491749f-1501-40c9-ac0e-188ce48d8935", - "metadata": { - "tags": [] - }, - "source": [ - "## **1. Introduction** \n", - "### Central Dogma of Molecular Biology. \n", - " \n", - "
\n", - "\n", - "
Fig 1: Different omes and corresponding omics technologies. [1]
\n", - "
\n", - " \n", - "
\n", - " \n", - " [1] Reference: Virkud, Y. V., Kelly, R. S., Wood, C., & Lasky-Su, J. A. (2019). The nuts and bolts of omics for the clinical allergist. Annals of Allergy, Asthma and Immunology, 123(6), 558-563.\n", - "
\n", - "\n", - "The central dogma of molecular biology is the representation of omes and omics. Omic data of various kinds are frequently employed in human medical research. The fields of omic research include the omic data produced by the central dogma, which includes the fields of genomics (DNA), transcriptomics (RNA), proteomics (proteins), and metabolomics (small molecules, including amino acids, fatty acids, carbohydrates, vitamins, lipids, and nucleotides); however, new types of omic data have emerged, including the fields of epigenomics (methyl tags and histones), exposomics (allergens, toxins, diet (bacteria and microorganisms). As a result, the majority of early scientific efforts were devoted to describing the genome, transcriptome, and proteome. However, seven major omics disciplines are currently being investigated in great detail: the genome (DNA), transcriptome (RNA), proteome (proteins), epigenome (DNA modifications that influence expression), metabolome (metabolites), microbiome (microbiota), and exposome (exposures).\n", - "\n", - "### Next Generation Sequencing Technique for RNA-Seq. \n", - "\n", - "
\n", - "\n", - "
Fig 2: High level workflow of RNA sequencing. [2]
\n", - "
\n", - " \n", - "
\n", - " \n", - " [2] Reference: Van den Berge, K., Hembach, K. M., Soneson, C., Tiberi, S., Clement, L., Love, M. I., ... & Robinson, M. D. (2019). RNA sequencing data: Hitchhiker's guide to expression analysis. Annual Review of Biomedical Data Science, 2(1), 139-173.\n", - "
\n", - " \n", - "The above figure displays an overview of an RNA-seq protocol's experimental phases. The sequenced reads from the cDNA library are mapped to a reference genome or transcriptome after being created from isolated RNA targets. Depending on the objective of the experiment, downstream data analysis may entail, among other things, evaluating differential expression, variant calling, or genome annotation.\n", - "\n", - "### **Gene Expression** \n", - "The phrase \"gene expression\" refers to how a gene affects a cell's overall phenotype and functions through the activity of the molecular products that are encoded in a given nucleotide sequence of the gene.\n", - "\n", - "Knowing how much gene expression levels vary from the norm can help identify the genes that are genuinely crucial for things like disease prognosis or cell/tissue identity. In order to determine whether a single gene is expressed at all, low-throughput techniques such using a reporter gene and fluorescent protein product have been replaced by high-throughput techniques.\n", - " \n", - "### **RNA-Seq** \n", - "RNA sequencing has proven to be a ground breaking tool in the study of transcriptomics in the last decade. The accuracy, throughput, and the resolution produced with RNA-seq analysis has provided phenomenal results. There is a variety of applications available for transcriptomic sequencing. Currently, RNA-seq is considered to be the most effective, reliable, and flexible method to determine gene expression and transcription activation at genome-wide level.\n", - " \n", - "### **Differential Expression Analysis** \n", - "Differential expression analysis is the process of statistically analyzing the normalized read count data to identify quantifiable differences in expression levels between experimental groups. 
A gene is said to be differentially expressed if there is a statistically significant difference or change in read counts or expression levels between two experimental conditions. Understanding the biological differences between healthy and diseased states depends on differential gene expression. With the use of this approach, researchers can choose specific gene expression targets for further investigation and pinpoint the molecular causes of phenotypic variations.\n", - " \n", - "### General Roadmap for RNA-Seq Experiment \n", - " \n", - "
\n", - "\n", - "
Fig 3: Extensive roadmap for types of computational RNA-Seq methods. [3]
\n", - "
\n", - " \n", - "
\n", - " \n", - " [3] Reference: Conesa, A., Madrigal, P., Tarazona, S., Gomez-Cabrero, D., Cervera, A., McPherson, A., Szcześniak, M. W., Gaffney, D. J., Elo, L. L., Zhang, X., & Mortazavi, A. (2016). A survey of best practices for RNA-seq data analysis. Genome biology, 17, 13. https://doi.org/10.1186/s13059-016-0881-8
\n", - " \n", - "This figure depicts a generic strategy for computing analysis using RNA-seq. The primary analytical steps for pre-analysis, core analysis, and advanced analysis are listed above the lines. The text discusses the major analysis points for each stage that are mentioned below the lines. (a) Experimental design, sequencing design, and quality control are all processes in preprocessing. (b) Transcriptome profiling, differential gene expression, and functional profiling are fundamental analyses. (c) Advanced analysis involves data integration, visualization, and other RNA-seq technologies.\n", - " \n", - "### Some Handy Abbreviations. \n", - "+ **ChIP-seq:** Chromatin immunoprecipitation sequencing\n", - "+ **eQTL:** Expression quantitative loci\n", - "+ **sQTL:** Splicing quantitative trait loci\n", - "+ **TPM:** Transcripts per million\n", - "+ **FPKM:** Fragments per kilobase of exon model per million mapped reads\n", - "+ **RPKM:** Reads per kilobase of exon model per million reads\n", - "+ **GSEA:** Gene set enrichment analysis\n", - "+ **PCA:** Principal component analysis\n", - "+ **TF:** Transcription factor\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "id": "1c59a399-9632-4cf0-bd12-dc52c5a23396", - "metadata": {}, - "source": [ - "### **Analysis Architecture for this Module** \n", - "\n", - "
\n", - "\n", - "
Fig 4: Analysis workflow for RNA-Seq module.
\n", - "
\n", - " \n", - "This figure represents the analysis architecture followed in this module. The module has been designed according to the resources and the availability of data. The blue box represents the pipeline that can be implemented using the Nextflow nf-core/rnaseq module. The purple box represents the data that can be directly extracted from GEO. Both the blue and purple boxes generate gene counts, and the user can implement either of the methods to generate gene counts and feed them to perform the further downstream analysis. However, Nextflow would take a lot of storage and processing power, so it is recommended to extract the data from GEO if available. If the required data is not available from GEO, then the Nextflow pipeline can be used to extract the gene counts. The downstream analysis is carried through using the R kernel of a Jupyter notebook, and all the steps are discussed in detail in this module." - ] - }, - { - "cell_type": "markdown", - "id": "d6d83b99-a263-4335-a480-cc5e7c1a5175", - "metadata": {}, - "source": [ - "## **2. Raw Reads to Gene Counts Table (Optional)** " - ] - }, - { - "cell_type": "markdown", - "id": "52c3191d-ac11-4a7e-b5b3-a4ad71dce613", - "metadata": {}, - "source": [ - "### Preprocessing Raw reads to generate gene count table using nextflow \n", - "\n", - "
\n", - "\n", - "
Fig 5: Summary of different methods under nf-core RNA-Seq pipeline. [4]
\n", - "
\n", - "\n", - "
\n", - " \n", - " [4] Reference: Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
\n", - "\n", - "\n", - "nf-core/rnaseq is a bioinformatics pipeline for analyzing RNA sequencing data from organisms with an annotated reference genome. The figure shows the stages of the analysis. The pipeline covers all the analysis steps, starting with preprocessing of the fastq data, followed by genome alignment and quantification. Gene expression levels are generated from mRNA and miRNA sequencing data using RNA-Seq quantification. The next stage is pseudo-alignment and quantification, followed by post-processing of the data and a final quality control step. The different colors in the figure represent the different methods of processing the fastq files. For example, the black line represents processing the files with STAR and Salmon for alignment and quantification. Users can choose whichever method they prefer when processing their files. \n", - "\n", - "This step is **optional**; it is a preprocessing step that lets you experience generating your own gene counts table. To save on computational and storage resources, we have already provided the gene count table with this module; it will be copied from our bucket in step 3. The gene counts can also be downloaded from NCBI's GEO website, under the supplementary files section of the same data accession. \n", - "\n", - "If, however, you want to try the Nextflow analysis, here are a few tips to help you along. First, if you are not using NIH Cloud Lab as your environment, you need to configure the Nextflow Service Account following [this guide](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToCreateNextflowServiceAccount.md). Second, you will need to configure your config file to point to the Google Life Sciences API. We provide a template that you can modify with your GCP bucket (you need to create one with `gsutil mb gs://UNIQUE-BUCKET-NAME`) and your project ID. 
Read more about Nextflow and the Life Sciences API in [our guide](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToUseNextflowandGCPLifeSciences.ipynb). Again, you only need to create the service account if you are not using NIH Cloud Lab. For further details on using Nextflow for RNA-Seq analysis and pre-processing, see [nf-core/rnaseq](https://nf-co.re/rnaseq) or the [rnaAssemblyMDI](https://github.com/NIGMS/rnaAssemblyMDI) module." - ] - }, - { - "cell_type": "markdown", - "id": "1a6b4a8a-732c-4cb6-a99c-934a0f9253f7", - "metadata": {}, - "source": [ - "### Install Nextflow" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "6da16610-a83f-42b5-be52-f8116db3fb03", - "metadata": {}, - "outputs": [], - "source": [ - "Sys.setenv(NXF_MODE = 'google') \n", - "#Install Nextflow, make it executable, and update it\n", - "system('curl https://get.nextflow.io | bash' , intern=TRUE)\n", - "system('chmod +x nextflow' , intern=TRUE)\n", - "system('./nextflow self-update' , intern=TRUE)" - ] - }, - { - "cell_type": "markdown", - "id": "3852204f-a42a-40a2-8b3f-4c153a9b0416", - "metadata": {}, - "source": [ - "**Nextflow generates a large amount of output data. We can mitigate that by storing the temporary and output files in a bucket, setting 'workDir' and 'params.outdir' to paths in an existing bucket. Make sure you modify the file called rnaseq-gcb.config**\n", - " \n", - "``` \n", - " workDir = 'gs://your_bucket_name/rna-tmp'\n", - " params.outdir = 'gs://your_bucket_name/rna-outputs'\n", - " ```\n", - " " - ] - }, - { - "cell_type": "markdown", - "id": "7b289373-fda0-483d-86ea-984b38fee3af", - "metadata": {}, - "source": [ - "This next step can take about 45 min, but since it runs serverlessly using the Life Sciences API, we recommend pasting the command into a terminal and then running the rest of the notebook. 
You can then review the output at the end and see how the different jobs were processed. Nextflow's output also renders poorly in `R`, so the terminal is the better place to run it. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "bb2e1f61-eea1-4464-9842-76fabae4c39a", - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "system('./nextflow run nf-core/rnaseq -c rnaseq-gcb.config -profile test,gcb', intern=TRUE)" - ] - }, - { - "cell_type": "markdown", - "id": "a2b8d218-4302-4090-9bb1-b619d1b0f178", - "metadata": {}, - "source": [ - "
\n", - " \n", - " Tip: If you don't immediately see any output on your screen, check the output directory you pointed to in your config file to ensure that Nextflow is running. You should see some output directories/files.\n", - "
" - ] - }, - { - "cell_type": "markdown", - "id": "3d97711b-544c-4562-a7c5-4fafa4a69315", - "metadata": {}, - "source": [ - "## **3. Gene Counts Table to Differential Expression** " - ] - }, - { - "cell_type": "markdown", - "id": "92edbc62-6497-429b-a278-a1f5f364896e", - "metadata": {}, - "source": [ - "### Install and Load required packages " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "75f2c0da-332b-4b12-8705-897a58fc0e9f", - "metadata": {}, - "outputs": [], - "source": [ - "# Make installation paths writable so the packages install correctly.\n", - "system(\"sudo chmod -R 777 /usr/lib/R/library\", intern=TRUE)\n", - "system(\"sudo chmod -R 777 /usr/local/lib/R/site-library\", intern=TRUE)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "4ea2df9e-b6c8-4da4-bbde-4659e28b42f7", - "metadata": {}, - "outputs": [], - "source": [ - "# Intall r-base packages\n", - "packages <- c(\"hexbin\", \"tidyverse\")\n", - "\n", - "if (!requireNamespace(\"Biobase\", quietly = TRUE))\n", - " BiocManager::install(\"Biobase\")\n", - "install.packages(\"NMF\")\n", - "\n", - "# Install packages not yet installed\n", - "installed_packages <- packages %in% rownames(installed.packages())\n", - "if (any(installed_packages == FALSE)) {\n", - " install.packages(packages[!installed_packages])\n", - "}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e84050fd-7b86-4d23-8d16-20479e6040e1", - "metadata": {}, - "outputs": [], - "source": [ - "# Install BiocManager if not already installed\n", - "if (!require(\"BiocManager\", quietly = TRUE))\n", - " install.packages(\"BiocManager\")\n", - "\n", - "# Install Bioconductor packages.\n", - "packages <- c(\"BSgenome\", \"DESeq2\", \"vsn\", \"genomation\")\n", - "\n", - "# Install packages not yet installed\n", - "installed_packages <- packages %in% rownames(installed.packages())\n", - "if (any(installed_packages == FALSE)) {\n", - " 
BiocManager::install(packages[!installed_packages])\n", - "}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a5ea70df-98ba-4a69-aeec-2baed34a72fc", - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "library(DESeq2)\n", - "library(magrittr)" - ] - }, - { - "cell_type": "markdown", - "id": "54112688-a0dc-4cb8-8680-4d4f3da306d2", - "metadata": {}, - "source": [ - "An RNA-Seq experiment analysis involves a number of phases. Sequencing reads are analyzed first (FASTQ files). Usually, they are aligned to a reference genome. The number of reads that were mapped to each gene may then be determined. We execute statistical studies on a table of counts as a result to identify differentially expressed genes and pathways.\n", - "\n", - "### Importing RNA_seq raw counts and annotation file \n", - "\n", - "The importing file can be generated locally. It combines several samples from multiple runs. We will annotate all of these runs and file names as control since, for instance, control might have four different names for runs and files." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e49c7007-61b3-4057-89de-3f3fb1085800", - "metadata": {}, - "outputs": [], - "source": [ - "# download data files from storage bucket\n", - "system(\"gsutil cp gs://nigms-sandbox/nosi-und/RNA-Seq/GSE173380_RNAseq_counts.csv.gz .\", intern=TRUE)\n", - "system(\"gsutil cp gs://nigms-sandbox/nosi-und/RNA-Seq/sample_info.txt .\", intern=TRUE)\n", - "\n", - "readcounts <- read.table(\"GSE173380_RNAseq_counts.csv.gz\", sep = \",\", header = T, row.names = 1)\n", - "sample_info <- read.table(\"sample_info.txt\", sep = \"\\t\")" - ] - }, - { - "cell_type": "markdown", - "id": "bba34d55-a676-46e4-884f-15f1fecec1de", - "metadata": {}, - "source": [ - "
\n", - " \n", - " Note: If you used Nextflow to produce your own gene counts table and would like to use it for the downstream analysis instead of the provided counts table, copy salmon.merged.gene_counts.tsv from the salmon subdirectory of your Nextflow output directory and substitute it for the provided file in the code above.\n", - "
" - ] - }, - { - "cell_type": "markdown", - "id": "672cae94-ddd5-44c5-9f36-f0393253e63f", - "metadata": {}, - "source": [ - "### Store the information in a dataframe \n", - "The next step is to make a data frame. We will store the raw read counts, sample informations, and the conditions data into the dataframe here named as DESeq.ds." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "5fce0d5b-740b-4f9c-8479-fa60a0566532", - "metadata": {}, - "outputs": [], - "source": [ - "DESeq.ds <- DESeqDataSetFromMatrix(countData = round(readcounts), colData = sample_info, design = ~condition)\n" - ] - }, - { - "cell_type": "markdown", - "id": "18800b8f-e68a-4993-9ca6-ca95e0f4267a", - "metadata": {}, - "source": [ - "### Explore the dataframe \n", - "\n", - "It is important to investigate the dataframe to see if the raw counts uploaded correctly and all the required information for the analysis is available. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "cb7c8095-8d5c-4165-a0bf-9acfec7d97fe", - "metadata": {}, - "outputs": [], - "source": [ - "colData(DESeq.ds) %>% head\n", - "assay(DESeq.ds, \"counts\") %>% head\n", - "rowData(DESeq.ds) %>% head\n", - "counts(DESeq.ds) %>% str\n", - "rowSums(counts(DESeq.ds)) %>% head" - ] - }, - { - "cell_type": "markdown", - "id": "e4acc357-e06b-4b52-a34d-6028ae56b2e3", - "metadata": {}, - "source": [ - "### Improving the quality by removing the genes with no gene count \n", - "Enhancing the quality of the input reads is the following stage in the RNA-seq analysis workflow. When the sequencing quality is very high, this step may be viewed as optional. However, this step may still enhance the quality of the input sequences even with the highest-quality sequencing datasets. 
The most frequent technical artifacts that can be filtered out are the adapter sequences that contaminate the sequenced reads and the low-quality nucleotides that are typically present at the ends of the sequences.\n", - "\n", - "Before moving on to the downstream analysis, the sequencing quality control and read pre-processing steps can be repeated several times until the sequence data reach a suitable level of quality." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "4fc2ab39-8608-435b-a8f9-fb6f379be65a", - "metadata": {}, - "outputs": [], - "source": [ - "DESeq.ds <- DESeq.ds[ rowSums(counts(DESeq.ds)) > 0, ]\n", - "#Inspect data after manipulation\n", - "rowSums(counts(DESeq.ds)) %>% head\n", - "colSums(readcounts)\n", - "colSums(counts(DESeq.ds)) \n", - "#colSums(counts(DESeq.ds)) and colSums(readcounts) should agree (up to rounding), as we only removed the genes with zero counts in every sample. " - ] - }, - { - "cell_type": "markdown", - "id": "575d6a2d-2ae4-4d50-826c-c286c3d2f2b8", - "metadata": {}, - "source": [ - "### Normalizing the read counts\n", - "The read counts are normalized by computing size factors, which account for differences not only in library size but also in library composition.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "40aa5366-4c12-4443-99a9-39ac707639d3", - "metadata": {}, - "outputs": [], - "source": [ - "# Estimate the size factors using estimateSizeFactors from DESeq2.\n", - "DESeq.ds <- estimateSizeFactors(DESeq.ds)\n", - "# Check the size factors that were added to the DESeqDataSet (the raw counts are left unchanged).\n", - "sizeFactors(DESeq.ds)\n", - "# colData now contains an additional sizeFactor column\n", - "colData(DESeq.ds)\n", - "# We can also retrieve the normalized read counts using the counts() function\n", - "counts.sf_normalized <- counts(DESeq.ds, normalized = TRUE)" - ] - }, - { - "cell_type": "markdown", - "id": "ae15406a-d20b-4a9f-9d1c-9878980fa99e", - 
"metadata": {}, - "source": [ - "The next step is to transform the size-factor normalized read counts to log2 scale. If the read counts are further changed to log scale after normalization, most downstream analyses perform better. This is a result of the RNA-seq data's unusually wide range of expression values, which may be explored and visualized in various ways." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "38523c0a-b9af-49ca-b7e8-2dbb4604fe5c", - "metadata": {}, - "outputs": [], - "source": [ - "# transform size-factor normalized read counts to log2 scale using pseudocount of 1\n", - "log.norm.counts <- log2(counts.sf_normalized + 1)\n", - "par(mfrow=c(2,1)) # to plot the following two images parallel.\n", - "\n", - "# first, boxplots of non-transformed read counts (one per sample)\n", - "boxplot(counts.sf_normalized, notch = TRUE,\n", - " main = \"Untransformed Read Counts\", ylab = \"read counts\")\n", - "\n", - "# box plots of log2-transformed read counts\n", - "boxplot(log.norm.counts, notch = TRUE,\n", - " main = \"log2-Transformed Read Counts\",\n", - " ylab = \"log2(read counts)\")" - ] - }, - { - "cell_type": "markdown", - "id": "af43d13e-3e90-452a-ab28-5235b53ccbd7", - "metadata": {}, - "source": [ - "Numerous statistical tests and analyses make the assumption that the data is homoskedastic, or that the variance of each variable is identical. But heteroskedastic behavior frequently appears in data with significant variations in the sizes of the individual observations. Plotting the mean vs. the standard deviation is one method for visually examining heteroskedasticity. Some variability is expected, but if there is a hump as in these data, it means that the variance is influenced by the mean, which goes against the homoskedasticity assumption." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a283f6d7-3370-41e7-a85c-c54c695f721f", - "metadata": {}, - "outputs": [], - "source": [ - "# mean-sd plot\n", - "library(vsn)\n", - "library(ggplot2)\n", - "library(hexbin)\n", - "\n", - "msd_plot <- meanSdPlot(log.norm.counts, \n", - " ranks=FALSE, # show the data on the original scale\n", - " plot = FALSE)\n", - "msd_plot$gg + \n", - " ggtitle(\"Sequencing Depth Normalized log2(read counts)\") +\n", - " ylab(\"standard deviation\")" - ] - }, - { - "cell_type": "markdown", - "id": "0f9019fd-d1ee-46a5-a25a-e85e1b6f2212", - "metadata": {}, - "source": [ - "Utilizing DESeq's rlog() method, we will lower the heteroskedasticity. Rlog() translates numbers to log2 scale and normalizes for sequencing depth." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8f444ca6-b525-4e2e-9f9e-4363663b45fc", - "metadata": {}, - "outputs": [], - "source": [ - "# Regularized log-transformed values\n", - "DESeq.rlog <- rlog(DESeq.ds, blind = TRUE)\n", - "rlog.norm.counts <- assay(DESeq.rlog)\n", - "# Mean-SD plot for rlog-transformed data\n", - "msd_plot <- meanSdPlot(rlog.norm.counts, \n", - " ranks=FALSE, \n", - " plot = FALSE)\n", - "msd_plot$gg + \n", - " ggtitle(\"rlog-Transformed Read Counts\") +\n", - " ylab(\"standard deviation\")" - ] - }, - { - "cell_type": "markdown", - "id": "df199c4c-b6e2-4a09-8e8e-8546a4f89d81", - "metadata": {}, - "source": [ - "### Hierarchical clustering \n", - "As an exploratory tool, clustering RNA-seq data enables the user to arrange and visualize correlations between groups of genes and to choose particular genes for further analysis. Based on chosen traits, this approach aims to isolate reasonably homogeneous gene groupings. Here, we are performing Pearson method of clustering." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8ca292e8-27a2-4336-b047-07751cef499d", - "metadata": {}, - "outputs": [], - "source": [ - "# cor() calculates the correlation between columns of a matrix\n", - "distance.m_rlog <- as.dist(1 - cor(rlog.norm.counts, method = \"pearson\" ))\n", - "# plot() can directly interpret the output of hclust()\n", - "plot( hclust(distance.m_rlog), \n", - " labels = colnames(rlog.norm.counts),\n", - " main = \"rlog Transformed Read Counts\\nDistance: Pearson Correlation\")" - ] - }, - { - "cell_type": "markdown", - "id": "dde86016-3ede-49a2-8226-e3c19942b757", - "metadata": {}, - "source": [ - "The results of differential gene expression analysis are frequently visualized as volcano plots, MA plots, and heatmaps, although many other types of analysis can be carried out as well. These plots let you look more closely at the list of differentially expressed genes and find or further explore candidate genes.\n", - "\n", - "### PCA using DESeq \n", - "Let's generate a PCA plot to visualize how the replicates cluster in a two-dimensional scatter plot.\n", - "\n", - "A PCA plot can serve as a final check of the biological reproducibility of the sample replicates. To plot the PCA results, we use the transformed counts derived from the DESeqDataSet object. Coloring the points in the scatter plot by the variable of interest shows whether the replicates cluster properly." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b38f5fab-4b48-4e74-9e23-793e202312dc", - "metadata": {}, - "outputs": [], - "source": [ - "P <- plotPCA(DESeq.rlog)\n", - "P <- P + theme_bw() + ggtitle(\"Rlog Transformed Counts\")\n", - "print(P)" - ] - }, - { - "cell_type": "markdown", - "id": "9c417748-487c-43fa-9fa4-a7f839617c78", - "metadata": {}, - "source": [ - "### Differential Gene Expression Analysis (DGE) \n", - "Differential gene expression (DGE) analysis is one of the most popular uses of RNA-sequencing (RNA-seq) data. It is widely used because it identifies genes that are differentially expressed across two or more conditions.\n", - "\n", - "DESeq2 performs an internal normalization in which the geometric mean of each gene's counts across all samples is computed. The counts for that gene in each sample are then divided by this mean. The size factor for a sample is the median of these ratios in that sample." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f36513db-22f6-417f-aaa8-39ba1837ca35", - "metadata": {}, - "outputs": [], - "source": [ - "# DESeq2 uses the levels of the condition to determine the order of the comparison\n", - "str(colData(DESeq.ds)$condition)\n", - "# set the first-level-factor\n", - "colData(DESeq.ds)$condition <- relevel(colData(DESeq.ds)$condition , \"control\")\n", - "# Finally run the DESeq analysis\n", - "DESeq.ds <- DESeq(DESeq.ds)" - ] - }, - { - "cell_type": "markdown", - "id": "9b7a4189-9a0e-4f68-b23a-4f042633290d", - "metadata": {}, - "source": [ - "### Explore the DGE analysis results " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "17827bd4-0938-49e8-9f3a-b4e979fedb3a", - "metadata": {}, - "outputs": [], - "source": [ - "#Check the results of deseq analysis\n", - "DGE.results <- results(DESeq.ds, independentFiltering = TRUE, alpha = 0.05)\n", - "summary(DGE.results)" - ] - }, - { - "cell_type": "markdown", - "id": "c91efec4-cdd8-4fd2-8ddf-f9b87c0bcab8", - "metadata": {}, - "source": [ - "### P-value Histogram \n", - "Our DE results can be quickly and easily \"sanity checked\" by creating a p-value histogram. A high bar between 0 and 0.05 should be seen, followed by a somewhat uniform tail to the right." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "39a986b0-3ee2-436a-927b-37f539e7c627", - "metadata": {}, - "outputs": [], - "source": [ - "#Histogram\n", - "hist(DGE.results$pvalue, \n", - " col = \"grey\", border = \"white\", xlab = \"\", ylab = \"\",\n", - " main = \"Frequencies of P-values (all genes)\")" - ] - }, - { - "cell_type": "markdown", - "id": "63fb1f85-9820-4725-9298-149f57c82a23", - "metadata": {}, - "source": [ - "### MA Plot\n", - "MA plot is helpful to determine whether the data normalization was successful. 
The MA plot is a scatter plot where the y-axis represents the log fold change in the specified contrast and the x-axis represents the average of normalized counts across samples. Since most genes are not anticipated to have differential expression, the majority of points are predicted to be on the horizontal 0 line." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "079404d0-d179-4bdf-8a2a-63e135b68c45", - "metadata": {}, - "outputs": [], - "source": [ - "#MA plot\n", - "plotMA(DGE.results, alpha = 0.05, main = \"Control vs Test\", ylim = c(-4,4))" - ] - }, - { - "cell_type": "markdown", - "id": "09b12790-7e7f-4331-8671-831e5f347d0c", - "metadata": {}, - "source": [ - "### Heatmap\n", - "
\n", - " \n", - " Note: Heatmap will be saved as a PDF file in the current working directory.
\n", - "\n", - "This step shows how to make a heatmap of the top genes in an RNA-seq dataset that have differential expression. To do this we need to extract the differentially expressed genes from the DE results. Heat Maps help viewers focus on the parts of data visualizations that matter most by helping to better visualize the volume of locations and events inside a dataset." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "5bdda266-08e2-47fd-bdd8-f06cd4dd7f0c", - "metadata": {}, - "outputs": [], - "source": [ - "#HEATMAP\n", - "# load the library with the aheatmap() function\n", - "library(NMF)\n", - "\n", - "# aheatmap needs a matrix of values, e.g., a matrix of DE genes with the transformed read counts for each replicate\n", - "# sort the results according to the adjusted p-value\n", - "DGE.results.sorted <- DGE.results[order(DGE.results$padj), ]\n", - "# identify genes with the desired adjusted p-value cut-off\n", - "DGEgenes <- rownames(subset(DGE.results.sorted, padj < 0.05))\n", - "# extract the normalized read counts for DE genes into a matrix\n", - "hm.mat_DGEgenes <- log.norm.counts[DGEgenes , ]\n", - "#plot the normalized read counts of DE genes sorted by the adjusted p-value\n", - "pdf(\"plot.pdf\")\n", - "aheatmap(hm.mat_DGEgenes, Rowv = NA, Colv = NA)\n", - "# combine the heatmap with hierarchical clustering\n", - "image1 <- aheatmap(hm.mat_DGEgenes, Rowv = TRUE, Colv = TRUE, # add dendrograms to rows and columns \n", - " distfun = \"euclidean\", hclustfun = \"average\")\n", - "\n", - "\n", - "# scale the read counts per gene to emphasize the sample-type-specific differences\n", - "aheatmap(hm.mat_DGEgenes ,\n", - " Rowv = TRUE , Colv = TRUE ,\n", - " distfun = \"euclidean\", hclustfun = \"average\",\n", - " scale = \"row\") \n", - "dev.off()\n", - "#values are transformed into distances from the center of the \n", - "#row-specific average: (actual value - mean of the group) / standard deviation" - ] - }, - { - "cell_type": 
"markdown", - "id": "f869e5f6-8f42-4c1a-aad8-e8ff3fec2eda", - "metadata": {}, - "source": [ - "### Write the DGE results to a text file " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "38f62bcf-3150-417e-bebb-787763db04aa", - "metadata": {}, - "outputs": [], - "source": [ - "write.table(DGE.results.sorted, file=\"rna-seq_dge-results.txt\", sep = \"\\t\")" - ] - }, - { - "cell_type": "markdown", - "id": "5fcae12b-9eec-4a07-ac0c-094c2c673ffe", - "metadata": {}, - "source": [ - "
" - ] - }, - { - "cell_type": "markdown", - "id": "919d00c4-2247-4f94-aa4f-b7cf6c0334d3", - "metadata": {}, - "source": [ - "### References and useful links " - ] - }, - { - "cell_type": "markdown", - "id": "3c6057e8-88ec-4e2e-ab95-01de6ed74d0d", - "metadata": {}, - "source": [ - "- #### https://bioinformatics-core-shared-training.github.io/RNAseq_May_2020_remote/html/05_Annotation_and_Visualisation.html\n", - "- #### https://bioinformatics-core-shared-training.github.io/cruk-summer-school-2018/RNASeq2018/html/02_Preprocessing_Data.nb.html \n", - "- #### https://girke.bioinformatics.ucr.edu/GEN242/tutorials/sprnaseq/sprnaseq/ \n", - "- #### https://compgenomr.github.io/book/rnaseqanalysis.html" - ] - } - ], - "metadata": { - "environment": { - "kernel": "ir", - "name": "common-cpu.m109", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m109" - }, - "kernelspec": { - "display_name": "R", - "language": "R", - "name": "ir" - }, - "language_info": { - "codemirror_mode": "r", - "file_extension": ".r", - "mimetype": "text/x-r-source", - "name": "R", - "pygments_lexer": "r", - "version": "4.2.3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/01-RNA-Seq/.ipynb_checkpoints/rnaseq-gcb-checkpoint.config b/01-RNA-Seq/.ipynb_checkpoints/rnaseq-gcb-checkpoint.config deleted file mode 100644 index 659154f..0000000 --- a/01-RNA-Seq/.ipynb_checkpoints/rnaseq-gcb-checkpoint.config +++ /dev/null @@ -1,20 +0,0 @@ -profiles { - gcb { - //the three lines below are to run the test data included with rnaseq-nf - //params.transcriptome = 'gs://rnaseq-nf/data/ggal/transcript.fa' - //params.reads = 'gs://rnaseq-nf/data/ggal/gut_{1,2}.fq' - //params.multiqc = 'gs://rnaseq-nf/multiqc' - //everything else below is required for google life sciences API profile - process.executor = 'google-batch' - process.container = 'quay.io/nextflow/rnaseq-nf:v1.1' - google.location = 'us-central1' - google.region = 'us-central1' - // project needs to be updated to 
your own project id - google.project = '' - // update your out and work dirs with your bucket name, feel free to look at these buckets at the end - // to get a sense of how Nextflow organizes files - params.outdir = 'gs:///gse173380-rnaseq/results' - workDir = 'gs:///gse173380-rnaseq/work' - - } -} \ No newline at end of file diff --git a/01-RNA-Seq/RNA-seq.ipynb b/01-RNA-Seq/RNA-seq.ipynb index 098fdaf..c5b4e63 100644 --- a/01-RNA-Seq/RNA-seq.ipynb +++ b/01-RNA-Seq/RNA-seq.ipynb @@ -16,9 +16,7 @@ { "cell_type": "markdown", "id": "0d86bb1c-96d9-495a-9382-c380ae075451", - "metadata": { - "tags": [] - }, + "metadata": {}, "source": [ "## Pre-Module Flashcards to Revise the Basics " ] @@ -27,9 +25,7 @@ "cell_type": "code", "execution_count": null, "id": "f0e9ac26-e308-471a-83a4-113f767524b0", - "metadata": { - "tags": [] - }, + "metadata": {}, "outputs": [], "source": [ "IRdisplay::display_html('')" @@ -38,9 +34,7 @@ { "cell_type": "markdown", "id": "b491749f-1501-40c9-ac0e-188ce48d8935", - "metadata": { - "tags": [] - }, + "metadata": {}, "source": [ "## **1. Introduction** \n", "### Central Dogma of Molecular Biology. 
\n", @@ -200,10 +194,7 @@ "cell_type": "code", "execution_count": null, "id": "bb2e1f61-eea1-4464-9842-76fabae4c39a", - "metadata": { - "scrolled": true, - "tags": [] - }, + "metadata": {}, "outputs": [], "source": [ "system('./nextflow run nf-core/rnaseq -c rnaseq-gcb.config -profile test,gcb', intern=TRUE)" @@ -294,10 +285,7 @@ "cell_type": "code", "execution_count": null, "id": "a5ea70df-98ba-4a69-aeec-2baed34a72fc", - "metadata": { - "scrolled": true, - "tags": [] - }, + "metadata": {}, "outputs": [], "source": [ "library(DESeq2)\n", @@ -753,27 +741,7 @@ ] } ], - "metadata": { - "environment": { - "kernel": "ir", - "name": "r-cpu.4-2.m110", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/r-cpu.4-2:m110" - }, - "kernelspec": { - "display_name": "R", - "language": "R", - "name": "ir" - }, - "language_info": { - "codemirror_mode": "r", - "file_extension": ".r", - "mimetype": "text/x-r-source", - "name": "R", - "pygments_lexer": "r", - "version": "4.2.3" - } - }, + "metadata": {}, "nbformat": 4, "nbformat_minor": 5 } diff --git a/02-RRBS/.ipynb_checkpoints/RRBS-downstream-checkpoint.ipynb b/02-RRBS/.ipynb_checkpoints/RRBS-downstream-checkpoint.ipynb deleted file mode 100644 index d5d88a5..0000000 --- a/02-RRBS/.ipynb_checkpoints/RRBS-downstream-checkpoint.ipynb +++ /dev/null @@ -1,934 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "d2701e45-708c-4879-b3b1-1141509f3cb9", - "metadata": {}, - "source": [ - "# Module 2: DNA Methylation Analysis \n", - "## Module Overview \n", - "\n", - "+ [Introduction](#IN)\n", - "+ [Raw Reads to Methylation Coverage File](#RR)\n", - "+ [Methylation Coverage to Differential Methylation ](#MC)\n" - ] - }, - { - "cell_type": "markdown", - "id": "bfce5f63-c32b-408b-844d-a240ab930b18", - "metadata": {}, - "source": [ - "Watch [this video](https://youtu.be/_T46fuV7qYw) to learn more about this submodule." 
- ] - }, - { - "cell_type": "markdown", - "id": "3d3fbfff-c92b-461b-a049-9c274888d555", - "metadata": {}, - "source": [ - "## **1. Introduction** \n", - "\n", - "### What is Epigenetics? \n", - "+ Changes in gene expression caused by mechanisms other than changes in the underlying DNA sequence.\n", - "+ Enables a cell/organism to respond to its dynamic external environment during development and throughout life.\n", - "+ Epigenetic changes to the genome can be inherited if these changes occur in cells giving rise to gametes.\n", - " \n", - "### Epigenetics Mechanisms \n", - "+ DNA Methylation\n", - "+ Histone Modification\n", - " \n", - "### DNA Methylation \n", - "DNA Methylation is an epigenetic mechanism that can control gene regulation. Methylation involves the transfer of a methyl group and typically occurs at the CpG dinucleotides in vertebrates. The locations of methylated DNA, including hyper- and hypo- methylated DNA can give information on different diseases and allow researchers to predict and study these diseases. Bisulfite sequencing is a technique that can determine the patterns of DNA methylation. Bisulfite sequencing has a single-base resolution that allows researchers to study the methylation patterns at the base level. \n", - "\n", - "The addition of methyl groups to DNA, mostly CpG sites, is to convert cytosine to 5-methylcytosine. DNA methylation at promoter regions can impede target gene expression. CpG sites are regions of DNA where a cytosine nucleotide occurs next to a guanine nucleotide in the linear sequence of bases along its length. \"CpG\" stands for cytosine and guanine separated by a phosphate (—C—phosphate—G—), which links the two nucleosides together in DNA. Methyl groups attached to DNA affects accessibility of genes to transcription proteins. Highly methylated DNA stays tightly wound around histones, preventing RNA polymerase binding and gene transcription. 
Low methylation loosens the coils and makes the DNA accessible to RNA polymerase, allowing gene transcription.\n", - "\n", - "
\n", - "\n", - "
Fig 1: Effect of epigenetic mechanisms on health. [1]
\n", - " \n", - "
\n", - " \n", - "
\n", - " \n", - " [1] Reference: https://commonfund.nih.gov/epigenomics/figure
\n" - ] - }, - { - "cell_type": "markdown", - "id": "adb9e9d3-762b-488c-a575-6cd1191910fd", - "metadata": {}, - "source": [ - "### DNA Methylation Flashcards \n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "98324c9c-3bed-41c9-9294-4e77d826aaa4", - "metadata": {}, - "outputs": [], - "source": [ - "IRdisplay::display_html('')" - ] - }, - { - "cell_type": "markdown", - "id": "6e5e9c78-6ccc-4390-897f-5c699fda18c6", - "metadata": {}, - "source": [ - "### Analysis Architecture for Differential Methylation Analysis \n", - "\n", - "
\n", - "\n", - "
Fig 2: Analysis Architecture for Differential Methylation Analysis.
\n", - "
\n", - " \n", - "This figure represents the analysis architecture followed in this module. The module has been designed according to the resources and the availability of data. The blue box represents the pipeline that can be implemented using the Nextflow nf-core/methylseq module. The purple box represents the data that can be directly extracted from GEO. Both the blue and purple boxes generate a methylation coverage file, and the user can implement either of the methods to generate gene counts and feed them to perform the further downstream analysis. However, Nextflow would take a lot of storage and processing power, so it is recommended to extract the data from GEO if available. If the required data is not available from GEO, then the Nextflow pipeline can be used to extract the gene counts. The downstream analysis is carried through using the R kernel of a Jupyter notebook, and all the steps are discussed in detail in this module." - ] - }, - { - "cell_type": "markdown", - "id": "4563e871-4276-4b46-acf5-a08629408e0b", - "metadata": {}, - "source": [ - "## **2. Raw Reads to Methylation Coverage File (Optional)** \n", - "
\n", - "\n", - "
Fig 3: Flowchart for converting raw reads to methylation coverage.
\n", - "
\n", - " \n", - "This figure shows the steps of the methylation-calling pipeline, which can be implemented using the Nextflow nf-core/methylseq module. Two workflows are available. The first is the Bismark workflow; the figure lists the tools that can be used at each step of the analysis. A similar list of tools is shown for each step of the bwa-meth workflow. Both are popular ways to implement a methylseq pipeline.\n", - " \n", - "A sample command to run the nf-core methylseq pipeline, which generates quality control reports and extracts methylation calls and a coverage file, is provided below. This step is optional: it is a preprocessing step that lets you experience generating your own methylation coverage file. To save on computational and storage resources, we have already provided the methylation coverage files you will use in the downstream analysis in step 3. \n", - " \n", - "If you choose to generate your own methylation coverage file, follow the instructions outlined in the RNAseq submodule and refer to the nf-core [methylseq](https://nf-co.re/methylseq) documentation. Again, you will need to modify the config file to include your bucket and project ID. 
" - ] - }, - { - "cell_type": "markdown", - "id": "3e663878-cd15-446d-873b-80b6e57e82b4", - "metadata": {}, - "source": [ - "**The size of the output data generated by Nextflow is large we can mitigate that by storing the temporary and output files to a bucket by setting the 'workDir' and 'params.outdir' to a existing bucket:**\n", - " \n", - "``` \n", - " workDir = 'gs://your_bucket_name/meth-tmp'\n", - " params.outdir = 'gs://your_bucket_name/meth-outputs'\n", - " ```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "6f4ade6c-85ce-46c6-ab25-ffe6cb6fabb7", - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "#This step can take up to 20 min depending on the machine-type and input files.\n", - "system('../01-RNA-Seq/nextflow run nf-core/methylseq -c rrbs-gcb.config -profile test,gcb', intern=TRUE)" - ] - }, - { - "cell_type": "markdown", - "id": "5b087eb5-a75f-4b5c-9497-ee67d7421abd", - "metadata": {}, - "source": [ - "
\n", - " \n", - " Tip: If you don't immediately see any output on your screen, check the output directory you pointed to in your config file to ensure that Nextflow is running. You should see some output directories/files.\n", - "
" - ] - }, - { - "cell_type": "markdown", - "id": "07b395cf-8f00-4aa7-8f91-2fdc8f60ae17", - "metadata": {}, - "source": [ - "## **3. Methylation Coverage to Differential Methylation** " - ] - }, - { - "cell_type": "markdown", - "id": "6df8c78d-d65b-4db6-8f81-7f74840c93f8", - "metadata": {}, - "source": [ - "### Install the required packages. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c0a10c0c-f13a-40e3-bd08-54529934e720", - "metadata": {}, - "outputs": [], - "source": [ - "# Intall r-base packages\n", - "packages <- c(\"ggplot2\", \"ggforce\", \"tidyverse\")\n", - "\n", - "# Install packages not yet installed\n", - "installed_packages <- packages %in% rownames(installed.packages())\n", - "if (any(installed_packages == FALSE)) {\n", - " install.packages(packages[!installed_packages])\n", - "}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f559b564-bf2a-416f-b7cf-71ef61c40d41", - "metadata": {}, - "outputs": [], - "source": [ - "# Install BiocManager if not already installed\n", - "if (!require(\"BiocManager\", quietly = TRUE))\n", - " install.packages(\"BiocManager\")\n", - "\n", - "## Install Bioconductor packages.\n", - "packages <- c(\"methylKit\", \"GenomicRanges\", \"genomation\")\n", - "\n", - "# Install packages not yet installed\n", - "installed_packages <- packages %in% rownames(installed.packages())\n", - "if (any(installed_packages == FALSE)) {\n", - " BiocManager::install(packages[!installed_packages])\n", - "}" - ] - }, - { - "cell_type": "markdown", - "id": "c4ef3b09-b581-4478-a418-431c8fcf04c2", - "metadata": {}, - "source": [ - "### Load packages " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f0a92e51-b404-4020-94e5-66fcbdbf1459", - "metadata": {}, - "outputs": [], - "source": [ - "library(\"methylKit\")\n", - "library(\"GenomicRanges\")\n", - "library(\"genomation\")" - ] - }, - { - "cell_type": "markdown", - "id": "97aabbdb-3070-435d-a424-d0bb42196e6a", - 
"metadata": {}, - "source": [ - "### Reading Methylation Call Files and Design Experiment\n", - "The sample files are collected in an R list object and then loaded into methylKit using the methRead function. methRead loads all of the methylation files into a methylRawList object and sample location, IDs, assembly, treatment, and context should be supplied in this function" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "86802238-2889-4e39-998d-0998eb7263e1", - "metadata": {}, - "outputs": [], - "source": [ - "# download data files from storage bucket\n", - "system(\"gsutil -m cp gs://nigms-sandbox/nosi-und/RRBS/*.gz .\", intern=TRUE)\n", - "\n", - "file.list=list(\"GSM5266860_CD_NP1.txt.gz\",\n", - " \"GSM5266861_CD_NP2.txt.gz\",\n", - " \"GSM5266862_CD_NP3.txt.gz\", \n", - " \"GSM5266863_CD_P1.txt.gz\", \n", - " \"GSM5266864_CD_P2.txt.gz\", \n", - " \"GSM5266865_CD_P3.txt.gz\", \n", - " \"GSM5266866_BN_NP1.txt.gz\", \n", - " \"GSM5266867_BN_NP2.txt.gz\",\n", - " \"GSM5266868_BN_NP3.txt.gz\", \n", - " \"GSM5266869_BN_P1.txt.gz\",\n", - " \"GSM5266870_BN_P2.txt.gz\",\n", - " \"GSM5266871_BN_P3.txt.gz\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b72c5799-5a34-47cc-b01f-24ee7fab1fcf", - "metadata": {}, - "outputs": [], - "source": [ - "myobj=methRead(file.list,\n", - " sample.id=list(\"CD_NP1\",\"CD_NP2\",\"CD_NP3\",\"CD_P1\",\"CD_P2\",\"CD_P3\",\"BN_NP1\",\"BN_NP2\",\"BN_NP3\",\"BN_P1\",\"BN_P2\",\"BN_P3\"),\n", - " assembly=\"Rnor_6.0\",\n", - " treatment=c(0,0,0,0,0,0,1,1,1,1,1,1),\n", - " context=\"CpG\"\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "db5f3fcd-f60d-4d84-aae3-f70844bf3c1c", - "metadata": {}, - "source": [ - "
\n", - " \n", - " Note: If you've used Nextflow to produce your own methylation coverage files and would like to use them for the downstream analysis instead of the test data provided, copy them from the bismark subdirectory within your Nextflow outputs directory and enter their file names in the two code cells above.\n", - "
" - ] - }, - { - "cell_type": "markdown", - "id": "c958528b-e51e-49a9-9d62-5ff378165d89", - "metadata": {}, - "source": [ - "### Data Filtration and Exploratory Analysis \n", - "#### Descriptive Statistics\n", - "Once the data has been collected into a single object, we now look at the basic statistics for each sample. Basic statistics can include the percentage methylation and the coverage. Percentage methylation histograms normally have peaks on both of the distribution's ends. Within a cell, cytosines are either methylated or unmethylated. Using this knowledge, we can determine if there is a similar pattern between many cells for locations with high methylation, low methylation, and intermediate methylation. Typically, there should be a higher number of locations with high methylation and low methylation, and a lower number of locations with intermediate methylation. Bisulfite sequencing does have a relatively high error rate and because of this, samples between 0% and 10% are typically classified as \"unmethylated\" while samples between 90% and 100% are classified as \"fully methylated\", though these thresholds are not fixed." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f0ec17c4-819f-44ec-9776-c22d4b78166a", - "metadata": {}, - "outputs": [], - "source": [ - "# Get a histogram of the methylation percentage per sample\n", - "# Here for sample 2\n", - "getMethylationStats(myobj[[2]],plot=TRUE,both.strands=FALSE)" - ] - }, - { - "cell_type": "markdown", - "id": "861df9be-2b38-4ae4-a21f-5d1cab64bea2", - "metadata": {}, - "source": [ - "Experiments that are suffering from PCR duplication bias will have a secondary peak towards the right hand side of the coverage histogram." 
 - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "2c95887c-37ad-4d02-a9a7-d679573aad39", - "metadata": {}, - "outputs": [], - "source": [ - "#Histogram of methylation coverage\n", - "getCoverageStats(myobj[[2]],plot=TRUE,both.strands=FALSE)" - ] - }, - { - "cell_type": "markdown", - "id": "b12a7a43-cb20-4ffc-8aeb-475098315b1c", - "metadata": {}, - "source": [ - "### Filter Step\n", - "Filtering bases by coverage is often useful. Specifically, if samples show overamplification or PCR bias, it helps to discard bases that have very high read coverage. Bases with very low read coverage should also be discarded because they tend to produce unreliable and unstable statistics in the downstream analyses. The code shown below filters a methylRawList and discards bases that have coverage below 10 reads, which was already done when the files were read in. Additionally, the code below discards bases with coverage above the 99.9th percentile in each sample." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d5770278-0aaf-46f6-80a5-3237b8e46a50", - "metadata": {}, - "outputs": [], - "source": [ - "filtered.myobj=filterByCoverage(myobj,lo.count=10,lo.perc=NULL,\n", - " hi.count=NULL,hi.perc=99.9)" - ] - }, - { - "cell_type": "markdown", - "id": "0aa5d39a-cf2b-4990-a520-53e21808b8ad", - "metadata": {}, - "source": [ - "### Normalization \n", - "Basic normalization of the coverage values between samples can be performed using a scaling factor. This scaling factor is derived from differences in the median coverage distributions." 
 - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "20014e56-6224-4666-85bb-bd4b1c01217a", - "metadata": {}, - "outputs": [], - "source": [ - "myobj.filt.norm <- normalizeCoverage(filtered.myobj, method = \"median\")" - ] - }, - { - "cell_type": "markdown", - "id": "d6653eb4-ec29-4d68-bd69-aae43950d73e", - "metadata": {}, - "source": [ - "### Merging samples into a single table\n", - "Before further analysis can be performed, the bases that are covered by reads need to be extracted for all samples. The unite() function merges all of the samples into one object covering the base-pair locations present in all samples. Setting destrand=TRUE (the default is FALSE) will merge reads on both strands of a CpG dinucleotide. This provides better coverage, but is only advised when looking at CpG methylation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "612dca8c-ea6c-4736-9183-c1434636b062", - "metadata": {}, - "outputs": [], - "source": [ - "## we use :: notation to make sure the unite() function from the methylKit package is the one called\n", - "meth=methylKit::unite(myobj.filt.norm, destrand=FALSE)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ce50e2b8-956c-4116-a9ba-d5126382b5ac", - "metadata": {}, - "outputs": [], - "source": [ - "# creates a methylBase object, where only CpGs covered in at least 1 sample per group will be returned\n", - "meth.min=methylKit::unite(myobj,min.per.group=1L)" - ] - }, - { - "cell_type": "markdown", - "id": "caf5f15d-4df2-44b4-b357-0e8482eed87e", - "metadata": {}, - "source": [ - "### Filtering CpGs \n", - "High-throughput methylation data often contain many CpG sites with little to no variation among study subjects, and these sites are not very informative for downstream analyses. Filtering on the standard deviation of methylation ratio values (equivalent to Beta values) is the simplest and most commonly used method. 
This method has been shown to be consistent and robust for use in different real datasets and on most occasions will suffice." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "1004e74f-1edd-4de0-94f5-64fd4a5c4dbc", - "metadata": {}, - "outputs": [], - "source": [ - "# get percent methylation matrix\n", - "pm=percMethylation(meth) \n", - "\n", - "# calculate standard deviation of CpGs\n", - "sds=matrixStats::rowSds(pm)\n", - "\n", - "# Visualize the distribution of the per-CpG standard deviation\n", - "# to determine a suitable cutoff\n", - "hist(sds, breaks = 100, col=\"cornflowerblue\", xlab=\"Std. dev. per CpG\")\n", - "\n", - "# keep only CpG with standard deviations larger than 2%\n", - "meth <- meth[sds > 2]\n", - "\n", - "# Check the remaining number of CpGs\n", - "nrow(meth)" - ] - }, - { - "cell_type": "markdown", - "id": "2f159ea5-b7d7-4913-ab4d-9ab49ed4ca24", - "metadata": {}, - "source": [ - "C -> T mutations can be further removed because they do not represent true bisulfite-treatment-associated conversions. We can store mutation locations in a GRanges object and we can use the object to remove the overlapping CpGs with the mutations. To perform the overlap operation, we convert the methylKit object to a GRanges object and perform filtering using the %over% function. This results in a returned methylKit object." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c885aeda-cc40-4c7a-9a4d-9b6bcc0bed77", - "metadata": {}, - "outputs": [], - "source": [ - "library(GenomicRanges)\n", - "# example SNP\n", - "mut=GRanges(seqnames=c(\"chr21\",\"chr21\"),\n", - " ranges=IRanges(start=c(9853296, 9853326),\n", - " end=c( 9853296,9853326)))\n", - "\n", - "# select CpGs that do not overlap with mutations\n", - "sub.meth=meth[! 
as(meth,\"GRanges\") %over% mut,]\n", - "nrow(meth)\n", - "nrow(sub.meth)" - ] - }, - { - "cell_type": "markdown", - "id": "da34db55-7742-4bb5-8e96-1866e9245a7a", - "metadata": {}, - "source": [ - "### Data Structures and Outlier Detection \n", - "We can check the correlation between samples using getCorrelation. This function will plot scatter plots with Pearson correlation coefficients." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "dfaffe07-8847-4d33-96c7-bdde55b4db8c", - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "getCorrelation(meth,plot=TRUE)" - ] - }, - { - "cell_type": "markdown", - "id": "30f7a30a-8a08-4da4-9da6-e6a417d2be6d", - "metadata": {}, - "source": [ - "### Clustering Analysis \n", - "The data structure can additionally be visualized in a dendrogram using hierarchical clustering of distance measures derived from each samples’ percentage methylation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "3238ad84-c358-47de-882c-82e94983630c", - "metadata": {}, - "outputs": [], - "source": [ - "clusterSamples(meth, dist=\"correlation\", method=\"ward.D2\", plot=TRUE)" - ] - }, - { - "cell_type": "markdown", - "id": "9c855550-c139-423e-8850-3aaf1e450688", - "metadata": {}, - "source": [ - "### Principal Component Analysis \n", - "We can also visualize the data through plotting the samples in a principal component space. Multidimensional data (i.e. we have as many dimensions in this data as there are CpG loci in meth) can be projected in into the PCA plot's 2- or 3- dimensional space, while maintaining as much variation as possible. In the PCA space, samples that are more alike will be clustered together, and with this plot we can identify the largest sources of variation in the data as well as if there are sample swaps or outlier samples." 
 - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "35fb9215-b031-420a-b22b-4cff3e255d61", - "metadata": {}, - "outputs": [], - "source": [ - "pc=PCASamples(meth,obj.return = TRUE, scale=FALSE, screeplot = FALSE, comp=c(1,2), transpose=TRUE)\n", - "summary(pc)" - ] - }, - { - "cell_type": "markdown", - "id": "6b275030-1b44-4c71-8bca-0617273c4c02", - "metadata": {}, - "source": [ - "### Differential Methylation \n", - "### Single CpG Sites\n", - "Once we have confirmed that the basic statistics and data structures of the samples are reasonable, we can proceed to differential methylation. Differential DNA methylation is usually calculated by comparing the proportion of methylated Cs in a test sample relative to a control. Fisher's Exact Test and similar methods can be applied when there are no replicates for the test and control cases. This allows simple comparisons between pairs of samples, such as test and control. When replicates are present, regression-based methods are typically used to model the methylation levels relative to the sample groups and the variation between replicates. Regression methods have an additional advantage over Fisher's Exact Test in that they allow for the inclusion of sample-specific covariates (categorical or continuous) and can adjust for confounding variables. \n", - "\n", - "Three options are provided to get the differential methylation results: Fisher’s Exact Test, the Betabinomial Distribution Based Test, and the Logistic Regression Based Test, as you will see below. Only Fisher’s Exact Test and the Logistic Regression Based Test will be explored. If you plan to use the Betabinomial Distribution Based Test or compare the results of all three types of tests, the code can be uncommented. 
" - ] - }, - { - "cell_type": "markdown", - "id": "8746ae4b-50ef-440e-8793-c8d95d2ba357", - "metadata": {}, - "source": [ - "### Fisher’s Exact Test" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "0a2fb51c-c36e-41b0-abfe-4607b113a352", - "metadata": {}, - "outputs": [], - "source": [ - "pooled.meth=pool(meth,sample.ids=c(\"test\",\"control\"))\n", - "dm.pooledf=calculateDiffMeth(pooled.meth)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "2c9c8b94-c9e2-424f-9c8e-47c6cdac2b6a", - "metadata": {}, - "outputs": [], - "source": [ - "# get differentially methylated bases/regions with specific cutoffs\n", - "all.diff=getMethylDiff(dm.pooledf,difference=25,qvalue=0.01,type=\"all\")\n", - "\n", - "# get hyper-methylated\n", - "hyper=getMethylDiff(dm.pooledf,difference=25,qvalue=0.01,type=\"hyper\")\n", - "\n", - "# get hypo-methylated\n", - "hypo=getMethylDiff(dm.pooledf,difference=25,qvalue=0.01,type=\"hypo\")\n", - "\n", - "#using [ ] notation\n", - "hyper2=dm.pooledf[dm.pooledf$qvalue < 0.01 & dm.pooledf$meth.diff > 25,]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "42faa1ac-8dae-49e6-a251-ecb33f022a0c", - "metadata": {}, - "outputs": [], - "source": [ - "head(dm.pooledf)\n", - "nrow(dm.pooledf)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "45743138-d8e5-4c4a-9303-1213b31929f8", - "metadata": {}, - "outputs": [], - "source": [ - "#Check the results\n", - "head(hyper)\n", - "nrow(hyper)\n", - "head(hypo)\n", - "nrow(hypo)" - ] - }, - { - "cell_type": "markdown", - "id": "9fb7afee-deb5-4c0c-a9c0-43a2a4bdfdaf", - "metadata": {}, - "source": [ - "### Optional: Betabinomial-Distribution-Based Tests\n", - "The beta-binominal model for calculating the differential methylation can be accessed through the code below. This accounts for both sampling and epigenetic variablity, and is useful for better modeling of the variance. 
As in logistic regression, this model treats the number of methylated reads as binomially distributed. In addition, the methylation proportion itself is allowed to vary across samples according to a beta distribution.\n", - "\n", - "If you plan to use the Betabinomial Distribution Based Test or compare the results of all three types of tests, the code can be uncommented. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "154d6d74-1ef6-4769-9dfd-de4513faf365", - "metadata": {}, - "outputs": [], - "source": [ - "#dm.dss=calculateDiffMethDSS(meth)" - ] - }, - { - "cell_type": "markdown", - "id": "427de19a-4068-4adc-b6e1-28edf4162662", - "metadata": {}, - "source": [ - "### Logistic Regression Based Tests\n", - "The following code tests for differential methylation in our dataset, i.e., it compares methylation levels between two groups. If the data has replicates, logistic regression should be used." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a1526c96-f139-4b17-9bcf-2133b37a028a", - "metadata": {}, - "outputs": [], - "source": [ - "# Test for differential methylation... This might take a few minutes.\n", - "dm.lr=calculateDiffMeth(meth,overdispersion = \"MN\",test =\"Chisq\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "bf36b322-8b1d-4891-883e-c19d904a9ae7", - "metadata": {}, - "outputs": [], - "source": [ - "# Simple volcano plot to get an overview of differential methylation\n", - "plot(dm.lr$meth.diff, -log10(dm.lr$qvalue))\n", - "abline(v=0)" - ] - }, - { - "cell_type": "markdown", - "id": "9bfdb5d8-2ced-480a-86dc-c6c7fe9f7282", - "metadata": {}, - "source": [ - "Next, we can visualize the number of hyper- and hypomethylation events per chromosome, as a percentage of the sites that meet the minimum coverage and minimum differential methylation thresholds. By default, these are a 25% change in methylation and 10X coverage in all samples." 
- ] - }, - { - "cell_type": "markdown", - "id": "24a854bb-c1bd-4786-b24e-37ae45ce540f", - "metadata": {}, - "source": [ - "### Explore Results " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f9b8eeec-53ce-42ac-b30c-5895f0d57d32", - "metadata": {}, - "outputs": [], - "source": [ - "# Overview of percentage hyper and hypo CpGs per chromosome.\n", - "diffMethPerChr(dm.lr)" - ] - }, - { - "cell_type": "markdown", - "id": "fdb12379-2a6c-40a7-812b-5c2c67d652c2", - "metadata": {}, - "source": [ - "After q-value calculation, we can select the differentially methylated regions/bases based on q-value and percent methylation difference cutoffs of Treatment versus Control. The following bits of code selects the bases that have q-value < 0.01 and percent methylation difference larger than 25%. If you specify type=\"hyper\" or type=\"hypo\" options, you will extract the hyper-methylated or hypo-methylated regions/bases.\n", - "\n", - "If necessary, covariates (such as age, sex, smoking status, …) can be included in the regression analysis. The function will then try to separate the influence of the covariates from the treatment effect via the logistic regression model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "afacf59d-3be2-4e97-99fe-234ad16e0993", - "metadata": {}, - "outputs": [], - "source": [ - "# get hyper methylated bases and order by qvalue\n", - "myDiff25p.hyper <- getMethylDiff(dm.lr,\n", - " difference=25,\n", - " qvalue=0.01,\n", - " type=\"hyper\")\n", - "myDiff25p.hyper <- myDiff25p.hyper[order(myDiff25p.hyper$qvalue),]\n", - "\n", - "# get hypo methylated bases and order by qvalue\n", - "myDiff25p.hypo <- getMethylDiff(dm.lr,\n", - " difference=25,\n", - " qvalue=0.01,\n", - " type=\"hypo\")\n", - "myDiff25p.hypo <- myDiff25p.hypo[order(myDiff25p.hypo$qvalue),]\n", - "\n", - "# get all differentially methylated bases and order by qvalue\n", - "myDiff25p <- getMethylDiff(dm.lr,\n", - " difference=25,\n", - " qvalue=0.01)\n", - "\n", - "#get all differentially methylated bases with pvalue < 0.05\n", - "myDiff25p <- getMethylDiff(dm.lr,\n", - " difference=25,\n", - " qvalue=0.01)\n", - "\n", - "#Order by qvalue\n", - "myDiff25p <- myDiff25p[order(myDiff25p$qvalue),]\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "629b56ac-c36e-4555-8eda-6a1f2a847b84", - "metadata": {}, - "outputs": [], - "source": [ - "#Explore the results\n", - "head(dm.lr)\n", - "nrow(dm.lr)\n", - "head(myDiff25p.hyper)\n", - "nrow(myDiff25p.hyper)\n", - "head(myDiff25p.hypo)\n", - "nrow(myDiff25p.hypo)\n", - "head(myDiff25p)\n", - "nrow(myDiff25p)" - ] - }, - { - "cell_type": "markdown", - "id": "6fb42767-5bb4-44a6-854a-d3d6090e752e", - "metadata": {}, - "source": [ - "### CpG Annotation \n", - "Annotation of the differentially methylated regions and bases using the genomation package can help with biological interpretation of the data. A common annotation task looks at where the CpGs of interest are relative to genes, gene parts, and regulatory regions. 
The code below shows an example of reading the gene annotation information from a BED file (Browser Extensible Data - file format containing genome coordinates and associated annotations), and the following annotation of the differentially methylated regions using genomation functions. This annotation file can be downloaded from the UCSC TableBrowser." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "7d60de64-2fd1-46d8-939c-133d96999512", - "metadata": {}, - "outputs": [], - "source": [ - "# download data files from storage bucket\n", - "system(\"gsutil cp gs://nigms-sandbox/nosi-und/RRBS/rn6_ensGene.bed .\", intern=TRUE)\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "caa30161-015b-404d-8a87-c882f0677fe7", - "metadata": {}, - "outputs": [], - "source": [ - "# First load the annotation data; i.e the coordinates of promoters, TSS, intron and exons\n", - "gene.obj <- readTranscriptFeatures(\"rn6_ensGene.bed\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "3f9fb368-e05d-40ea-b76c-0592914828e2", - "metadata": {}, - "outputs": [], - "source": [ - "head(gene.obj)" - ] - }, - { - "cell_type": "markdown", - "id": "062b9b3e-e6e7-4ff6-b401-136f1df96f75", - "metadata": {}, - "source": [ - "Annotate the results from the differentially methylated calls calculated. Some data wrangling is required to make the data compatible with the annotateWithGeneParts function. Here the chr is added to annotate the chromosome number and then the data is converted into a GRanger object." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c2d55ccd-8bef-4a40-b8f9-fa75c54eaa65", - "metadata": {}, - "outputs": [], - "source": [ - "anot.diff <- myDiff25p\n", - "anot.diff$chr <- sapply(anot.diff$chr, function(x) paste('chr', x, sep = \"\"))\n", - "head(anot.diff)\n", - "class(anot.diff)\n", - "anot.diff <- as(anot.diff,\"GRanges\")" - ] - }, - { - "cell_type": "markdown", - "id": "0158f96e-150c-4bb2-90d5-537ea8e27fd2", - "metadata": {}, - "source": [ - "The final data (anot.diff) is then used in the next step for annotation using annotateWithGeneParts function." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d4763449-18ae-4ab0-b816-390a1e0f04ee", - "metadata": {}, - "outputs": [], - "source": [ - "myDiff25p.all.anot <- annotateWithGeneParts(anot.diff, gene.obj)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ed94ae05-3ff3-4680-9f42-a2fc5a99c25e", - "metadata": {}, - "outputs": [], - "source": [ - "# Summary of target set annotation\n", - "myDiff25p.all.anot" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "7fce9e3f-5660-4985-95d8-d0da5d7b047e", - "metadata": {}, - "outputs": [], - "source": [ - "# View the distance to the nearest Transcription Start Site; the target.row column in the output indicates the row number in the initial target set\n", - "dist_tss <- getAssociationWithTSS(myDiff25p.all.anot)\n", - "head(dist_tss)\n", - "\n", - "# See whether the differentially methylated CpGs are within promoters,introns or exons; the order is the same as the target set\n", - "head(getMembers(myDiff25p.all.anot))\n", - "\n", - "# This can also be summarized for all differentially methylated CpGs\n", - "plotTargetAnnotation(myDiff25p.all.anot, main = \"Differential Methylation Annotation\")" - ] - }, - { - "cell_type": "markdown", - "id": "20fb0258-5560-44b5-9059-c9294186ca41", - "metadata": {}, - "source": [ - "### (Optional) Make a dataframe with TSS, values and 
qvalues for future analysis. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "5a03f970-1162-430b-9766-69f9acea3591", - "metadata": {}, - "outputs": [], - "source": [ - "bs_results <- cbind(dist_tss, qvalue = anot.diff$qvalue, pvalue = anot.diff$pvalue)" - ] - }, - { - "cell_type": "markdown", - "id": "41b9e5f7-121a-41bc-9c1e-402729bdbc81", - "metadata": {}, - "source": [ - "### (Optional) Write the Results to a Text File. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "7a16cebf-0d6f-4679-8adc-28f64260dd08", - "metadata": {}, - "outputs": [], - "source": [ - "#Write results to a text file. \n", - "write.table(bs_results, \"bs_results.txt\", sep = \"\\t\")" - ] - }, - { - "cell_type": "markdown", - "id": "b75a33f0-dcec-4c86-8bc8-436f15f22498", - "metadata": {}, - "source": [ - "
" - ] - }, - { - "cell_type": "markdown", - "id": "8ae72409-7b74-484f-ab2a-84f169d1cdd0", - "metadata": {}, - "source": [ - "### References and useful links " - ] - }, - { - "cell_type": "markdown", - "id": "5c960b60-8eee-47df-9eb7-144e31aed77e", - "metadata": {}, - "source": [ - "- #### https://www.bioconductor.org/packages/release/bioc/vignettes/methylKit/inst/doc/methylKit.html#4_Annotating_differentially_methylated_bases_or_regions\n", - "- #### https://nbis-workshop-epigenomics.readthedocs.io/en/stable/content/tutorials/methylationSeq/Seq_Tutorial.html\n", - "- #### https://compgenomr.github.io/book/bsseq.html" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "5ca1d3e0-ccb8-4a5d-a8f0-57294ad2e212", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "environment": { - "kernel": "ir", - "name": "common-cpu.m109", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m109" - }, - "kernelspec": { - "display_name": "R", - "language": "R", - "name": "ir" - }, - "language_info": { - "codemirror_mode": "r", - "file_extension": ".r", - "mimetype": "text/x-r-source", - "name": "R", - "pygments_lexer": "r", - "version": "4.2.3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/02-RRBS/.ipynb_checkpoints/rrbs-gcb-checkpoint.config b/02-RRBS/.ipynb_checkpoints/rrbs-gcb-checkpoint.config deleted file mode 100644 index 926d332..0000000 --- a/02-RRBS/.ipynb_checkpoints/rrbs-gcb-checkpoint.config +++ /dev/null @@ -1,15 +0,0 @@ -profiles { - gcb { - process.executor = 'google-batch' - process.container = 'quay.io/nextflow/rnaseq-nf:v1.1' - google.location = 'us-central1' - google.region = 'us-central1' - process.machineType = 'c2-standard-30' - dag.overwrite = true - google.project = '' - params.outdir = 'gs:///gse173380-rnaseq/results' - workDir = 'gs:///gse173380-rnaseq/work' - - - } -} \ No newline at end of file diff --git a/02-RRBS/RRBS-downstream.ipynb b/02-RRBS/RRBS-downstream.ipynb index 
1616ff2..e03ec87 100644 --- a/02-RRBS/RRBS-downstream.ipynb +++ b/02-RRBS/RRBS-downstream.ipynb @@ -119,10 +119,7 @@ "cell_type": "code", "execution_count": null, "id": "6f4ade6c-85ce-46c6-ab25-ffe6cb6fabb7", - "metadata": { - "scrolled": true, - "tags": [] - }, + "metadata": {}, "outputs": [], "source": [ "#This step can take up to 20 min depending on the machine-type and input files.\n", @@ -457,10 +454,7 @@ "cell_type": "code", "execution_count": null, "id": "dfaffe07-8847-4d33-96c7-bdde55b4db8c", - "metadata": { - "scrolled": true, - "tags": [] - }, + "metadata": {}, "outputs": [], "source": [ "getCorrelation(meth,plot=TRUE)" @@ -906,27 +900,7 @@ "source": [] } ], - "metadata": { - "environment": { - "kernel": "ir", - "name": "r-cpu.4-2.m110", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/r-cpu.4-2:m110" - }, - "kernelspec": { - "display_name": "R", - "language": "R", - "name": "ir" - }, - "language_info": { - "codemirror_mode": "r", - "file_extension": ".r", - "mimetype": "text/x-r-source", - "name": "R", - "pygments_lexer": "r", - "version": "4.2.3" - } - }, + "metadata": {}, "nbformat": 4, "nbformat_minor": 5 } diff --git a/03-Integration/Integration.ipynb b/03-Integration/Integration.ipynb index 78416c8..cb727fd 100644 --- a/03-Integration/Integration.ipynb +++ b/03-Integration/Integration.ipynb @@ -3,9 +3,7 @@ { "cell_type": "markdown", "id": "636ad27c-6ec1-4e64-a473-1ceade534303", - "metadata": { - "tags": [] - }, + "metadata": {}, "source": [ "# Module 3: Integration of Epigenetic and Transcriptomic \n", "## Module Overview \n", @@ -254,10 +252,7 @@ "cell_type": "code", "execution_count": null, "id": "16dd1f73-4d9e-4d2a-bbb3-4bf604a6e0e4", - "metadata": { - "scrolled": true, - "tags": [] - }, + "metadata": {}, "outputs": [], "source": [ "library(MethReg)\n", @@ -1802,27 +1797,7 @@ "source": [] } ], - "metadata": { - "environment": { - "kernel": "ir", - "name": "r-cpu.4-2.m108", - "type": "gcloud", - "uri": 
"gcr.io/deeplearning-platform-release/r-cpu.4-2:m108" - }, - "kernelspec": { - "display_name": "R", - "language": "R", - "name": "ir" - }, - "language_info": { - "codemirror_mode": "r", - "file_extension": ".r", - "mimetype": "text/x-r-source", - "name": "R", - "pygments_lexer": "r", - "version": "4.2.3" - } - }, + "metadata": {}, "nbformat": 4, "nbformat_minor": 5 } diff --git a/04-New-Data/.ipynb_checkpoints/New-Data-checkpoint.ipynb b/04-New-Data/.ipynb_checkpoints/New-Data-checkpoint.ipynb deleted file mode 100644 index fed2d93..0000000 --- a/04-New-Data/.ipynb_checkpoints/New-Data-checkpoint.ipynb +++ /dev/null @@ -1,248 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "1d0516a4-c45d-4ecf-b04c-e21e19f933f3", - "metadata": {}, - "source": [ - "# Module 4: Running the module with new data " - ] - }, - { - "cell_type": "markdown", - "id": "8af9faa7-692e-4d8e-b9b9-a2070e21d905", - "metadata": {}, - "source": [ - "In this notebook, we are going to explore how to run this module with a new dataset. These submodules provide a great framework for running a rigorous and scalable analysis, but there are some considerations that must be made in order to run this with your own data. We will walk through that process here so that hopefully, you are able to take these notebooks to your research group and use them for your own analysis. Notice that we do not give you all the answers in the code blocks, but if you get stuck, use the dropdowns for help. This module largely uses Nextflow for the RNA-seq and Methyl-seq analysis, which makes it very easy to run the same analysis on new datasets by updating the config files." - ] - }, - { - "cell_type": "markdown", - "id": "8209f0e3-a631-49f2-91cf-aa7ce85fea13", - "metadata": {}, - "source": [ - "## **Importing the example dataset**" - ] - }, - { - "cell_type": "markdown", - "id": "f923f5d4-597a-4e39-803e-5252f2fe4cd6", - "metadata": {}, - "source": [ - "Our new dataset comes from a paper by [Hadad et al. 
Epigenetics Chromatin. 2019](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6781367/) that compares methylation changes in mice as they age and correlates those changes to gene expression changes. The data is available in SRA under the bioProject number [PRJNA523985](https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA523985). The impact of methylation on aging, particularly in brain tissue, is of great research interest. There are many samples in this dataset, but we will limit our analysis to young vs old female mice." - ] - }, - { - "cell_type": "markdown", - "id": "364ce93b-f726-45f5-81da-19ddd214b244", - "metadata": {}, - "source": [ - "To download the dataset, follow the instructions in the [STRIDES tutorial](https://github.com/STRIDES/NIHCloudLabGCP/blob/main/tutorials/notebooks/SRADownload/SRA-Download.ipynb) on downloading datasets from SRA using the prefetch+fasterq dump method. The accession numbers for this analysis are: \n", - "\n", - "- SRR8616802\n", - "- SRR8616795\n", - "- SRR8616796\n", - "- SRR8616777\n", - "- SRR8616778\n", - "- SRR8616772\n", - "- SRR8616799\n", - "- SRR8616800\n", - "- SRR8616801\n", - "- SRR8616787\n", - "- SRR8616788\n", - "- SRR8616789\n", - "\n", - "You can save these accession numbers in a file and use that in the `sra-tools` commands. Once you pull these files from SRA, store them in a storage bucket so that Nextflow can see them in the next steps." - ] - }, - { - "cell_type": "markdown", - "id": "4218d79f-8431-490a-9bc0-76ba89561c83", - "metadata": {}, - "source": [ - "## **RNA-Seq analysis**" - ] - }, - { - "cell_type": "markdown", - "id": "0e33ebbe-60c4-4f73-9cf9-70a2245d6fb9", - "metadata": {}, - "source": [ - "To run the RNA-Seq portion of this tutorial, you need to update the config file to point to your RNA-Seq reads. Let's look at the `rnaseq-gcb.config` file and make the necessary changes. We will need to specify `params.outdir`, `workDir`, `params.input`, and `params.genome`." 
- ] - }, - { - "cell_type": "markdown", - "id": "1a96a33c-c580-4aea-889a-02c53378586a", - "metadata": {}, - "source": [ - "```\n", - "profiles {\n", - " gcb {\n", - " // Google batch parameters\n", - " process.executor = 'google-batch'\n", - " process.container = 'quay.io/nextflow/rnaseq-nf:v1.1'\n", - " google.location = 'us-central1'\n", - " google.region = 'us-central1'\n", - " google.project = ''\n", - " // Workflow parameters\n", - " params.outdir = ''\n", - " workDir = ''\n", - " params.input = ''\n", - " params.genome = ''\n", - " }\n", - "}\n", - "```" - ] - }, - { - "cell_type": "markdown", - "id": "be4d3374-abf6-4d0a-9fc2-b9bbf356dc0f", - "metadata": {}, - "source": [ - "Check the [nf-core RNA-seq](https://nf-co.re/rnaseq/3.12.0) documentation to find out how most of this is structured. As with the primary dataset, the workDir and outdir are going to be locations in your Google Cloud storage bucket. The sequences are from mice so we need to specify a mouse genome build. Finally, the most effort goes into the input parameter. The nf-core documentation specifies that the reads need to be structured in a sample sheet. Let's look at an example. Using your bucket paths to your reads, create a .csv like this and put it in your bucket, then provide the bucket path in the config file. If you need help, click the help button below to see how we suggest." - ] - }, - { - "cell_type": "markdown", - "id": "f51d42b4-7b6e-4ab0-8030-b05f8704bfd7", - "metadata": {}, - "source": [ - "
\n", - " Click for help\n", - "\n", - "```\n", - "params.outdir = 'gs://PROJECT-ID/rnaseq/results'\n", - "workDir = 'gs://PROJECT-ID/rnaseq/work'\n", - "params.input = 'gs://PROJECT-ID/samplesheet.csv'\n", - "params.genome = 'GRCm38'\n", - "```\n", - "\n", - "
\n" - ] - }, - { - "cell_type": "markdown", - "id": "98f11060-3d5b-43e0-af7a-beda8f4a46c0", - "metadata": {}, - "source": [ - "Here's how the first row of your sample sheet might look. Make sure to only include the RNA-seq samples in the sample sheet. The methyl-seq samples will be included in that sample sheet when we run that pipeline in the next step." - ] - }, - { - "cell_type": "markdown", - "id": "351a761b-a1db-4ef8-a671-b5d64b83056d", - "metadata": {}, - "source": [ - "``` \n", - "sample,fastq_1,fastq_2,strandedness \n", - "Young_1,gs://BUCKETPATH/SRR8616802_1.fastq.gz,gs://BUCKETPATH/SRR8616802_2.fastq.gz,auto\n", - "``` " - ] - }, - { - "cell_type": "markdown", - "id": "736c254a-a6b9-4901-a666-4bff478dcea0", - "metadata": {}, - "source": [ - "Once you have that together, run the Nextflow command again to run the nf-core pipeline on this dataset. " - ] - }, - { - "cell_type": "markdown", - "id": "89f2cf3a-51ed-4856-a8b1-3b6554e89bbd", - "metadata": {}, - "source": [ - "## **DNA Methylation Analysis**" - ] - }, - { - "cell_type": "markdown", - "id": "bfbb1122-2196-45cb-8d93-fbc12bb29913", - "metadata": {}, - "source": [ - "We will use the same framework for the methylation analysis as for the RNA-seq, which is to adjust the config file and let Nextflow run the pipeline. Let's look at the methylation config file and determine what we need to change. Like before, we need to specify a sample sheet input, the genome, workdir, and outdir. Fill in those below and try running the Nextflow command to run the core methyl-seq analysis." 
- ] - }, - { - "cell_type": "markdown", - "id": "c2197a3a-23f3-4b3c-85e0-9acd582b60f8", - "metadata": {}, - "source": [ - "```\n", - "profiles {\n", - " gcb {\n", - " // Google batch parameters\n", - " process.executor = 'google-batch'\n", - " process.container = 'quay.io/nextflow/rnaseq-nf:v1.1'\n", - " google.project = 'PROJECT-ID'\n", - " google.location = 'us-central1'\n", - " google.region = 'us-central1'\n", - " process.machineType = 'c2-standard-30'\n", - " // Workflow parameters\n", - " dag.overwrite = true\n", - " params.outdir = 'FILL-IN-HERE'\n", - " workDir = 'FILL-IN-HERE'\n", - " params.genome = `FILL-IN-HERE`\n", - " params.input = `FILL-IN-HERE`\n", - " }\n", - "}\n", - "```" - ] - }, - { - "cell_type": "markdown", - "id": "88e6f8a7-b349-4144-85fd-859704b04209", - "metadata": {}, - "source": [ - "Be sure to consult the [nf-core methylseq documentation](https://nf-co.re/methylseq/2.5.0) to see how the input sample sheet is structured. There are some differences from the RNA-seq input. Here's an example of our suggested first line." 
- ] - }, - { - "cell_type": "markdown", - "id": "8ae0255f-f710-43dd-857b-9713f8bb97f9", - "metadata": {}, - "source": [ - "```\n", - "sample,fastq_1,fastq_2 \n", - "Young_1,gs://BUCKETPATH/SRR8616795_1.fastq.gz,gs://BUCKETPATH/SRR8616795_2.fastq.gz\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "60a55209-7853-4eee-aea3-410bc0ae2d10", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "environment": { - "kernel": "python3", - "name": ".m112", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/:m112" - }, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/04-New-Data/New-Data.ipynb b/04-New-Data/New-Data.ipynb index 0a58a33..d549315 100644 --- a/04-New-Data/New-Data.ipynb +++ b/04-New-Data/New-Data.ipynb @@ -226,31 +226,7 @@ ] } ], - "metadata": { - "environment": { - "kernel": "python3", - "name": ".m112", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/:m112" - }, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - } - }, + "metadata": {}, "nbformat": 4, "nbformat_minor": 5 } diff --git a/docs/quiz_files/methylation.ipynb b/docs/quiz_files/methylation.ipynb index 51213ce..6c277c0 100644 --- a/docs/quiz_files/methylation.ipynb +++ b/docs/quiz_files/methylation.ipynb @@ -2,568 +2,10 @@ "cells": [ { "cell_type": "code", - 
"execution_count": 2, + "execution_count": null, "id": "2cdf6acd-190b-4931-883c-71b22d5ac622", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "application/javascript": [ - "/*!\n", - " * swiped-events.js - v1.1.4\n", - " * Pure JavaScript swipe events\n", - " * https://github.com/john-doherty/swiped-events\n", - " * @inspiration https://stackoverflow.com/questions/16348031/disable-scrolling-when-touch-moving-certain-element\n", - " * @author John Doherty \n", - " * @license MIT\n", - " */\n", - "!function(t,e){\"use strict\";\"function\"!=typeof t.CustomEvent&&(t.CustomEvent=function(t,n){n=n||{bubbles:!1,cancelable:!1,detail:void 0};var a=e.createEvent(\"CustomEvent\");return a.initCustomEvent(t,n.bubbles,n.cancelable,n.detail),a},t.CustomEvent.prototype=t.Event.prototype),e.addEventListener(\"touchstart\",function(t){if(\"true\"===t.target.getAttribute(\"data-swipe-ignore\"))return;s=t.target,r=Date.now(),n=t.touches[0].clientX,a=t.touches[0].clientY,u=0,i=0},!1),e.addEventListener(\"touchmove\",function(t){if(!n||!a)return;var e=t.touches[0].clientX,r=t.touches[0].clientY;u=n-e,i=a-r},!1),e.addEventListener(\"touchend\",function(t){if(s!==t.target)return;var e=parseInt(l(s,\"data-swipe-threshold\",\"20\"),10),o=parseInt(l(s,\"data-swipe-timeout\",\"500\"),10),c=Date.now()-r,d=\"\",p=t.changedTouches||t.touches||[];Math.abs(u)>Math.abs(i)?Math.abs(u)>e&&c0?\"swiped-left\":\"swiped-right\"):Math.abs(i)>e&&c0?\"swiped-up\":\"swiped-down\");if(\"\"!==d){var b={dir:d.replace(/swiped-/,\"\"),xStart:parseInt(n,10),xEnd:parseInt((p[0]||{}).clientX||-1,10),yStart:parseInt(a,10),yEnd:parseInt((p[0]||{}).clientY||-1,10)};s.dispatchEvent(new CustomEvent(\"swiped\",{bubbles:!0,cancelable:!0,detail:b})),s.dispatchEvent(new CustomEvent(d,{bubbles:!0,cancelable:!0,detail:b}))}n=null,a=null,r=null},!1);var n=null,a=null,u=null,i=null,r=null,s=null;function l(t,n,a){for(;t&&t!==e.documentElement;){var u=t.getAttribute(n);if(u)return u;t=t.parentNode}return a}}(window,document);\n", - "\n", - "function 
jaxify(string) {\n", - " var mystring = string;\n", - " console.log(mystring);\n", - "\n", - " var count = 0;\n", - " var loc = mystring.search(/([^\\\\]|^)(\\$)/);\n", - "\n", - " var count2 = 0;\n", - " var loc2 = mystring.search(/([^\\\\]|^)(\\$\\$)/);\n", - "\n", - " //console.log(loc);\n", - "\n", - " while ((loc >= 0) || (loc2 >= 0)) {\n", - "\n", - " /* Have to replace all the double $$ first with current implementation */\n", - " if (loc2 >= 0) {\n", - " if (count2 % 2 == 0) {\n", - " mystring = mystring.replace(/([^\\\\]|^)(\\$\\$)/, \"$1\\\\[\");\n", - " } else {\n", - " mystring = mystring.replace(/([^\\\\]|^)(\\$\\$)/, \"$1\\\\]\");\n", - " }\n", - " count2++;\n", - " } else {\n", - " if (count % 2 == 0) {\n", - " mystring = mystring.replace(/([^\\\\]|^)(\\$)/, \"$1\\\\(\");\n", - " } else {\n", - " mystring = mystring.replace(/([^\\\\]|^)(\\$)/, \"$1\\\\)\");\n", - " }\n", - " count++;\n", - " }\n", - " loc = mystring.search(/([^\\\\]|^)(\\$)/);\n", - " loc2 = mystring.search(/([^\\\\]|^)(\\$\\$)/);\n", - " //console.log(mystring,\", loc:\",loc,\", loc2:\",loc2);\n", - " }\n", - "\n", - " //console.log(mystring);\n", - " return mystring;\n", - "}\n", - "\n", - "window.flipCard = function flipCard(ths) {\n", - " console.log(ths);\n", - " console.log(ths.id);\n", - " ths.classList.toggle(\"flip\"); \n", - " var next=document.getElementById(ths.id+'-next');\n", - " next.style.pointerEvents='none';\n", - " next.classList.add('flipped');\n", - " if (typeof MathJax != 'undefined') {\n", - " var version = MathJax.version;\n", - " console.log('MathJax version', version);\n", - " if (version[0] == \"2\") {\n", - " MathJax.Hub.Queue([\"Typeset\", MathJax.Hub]);\n", - " } else if (version[0] == \"3\") {\n", - " MathJax.typeset([ths]);\n", - " }\n", - " } else {\n", - " console.log('MathJax not detected');\n", - " }\n", - "\n", - "\n", - " setTimeout(reenableNext, 700, next);\n", - "}\n", - "\n", - "function reenableNext(next) {\n", - " 
next.style.pointerEvents='auto';\n", - "}\n", - "\n", - "\n", - "\n", - "function slide2(containerId) {\n", - " var container = document.getElementById(containerId);\n", - " var next=document.getElementById(containerId+'-next');\n", - " var frontcard = container.children[0];\n", - " var backcard = container.children[1];\n", - " container.style.pointerEvents='none';\n", - " //backcard.style.pointerEvents='none';\n", - " next.style.pointerEvents='none';\n", - " next.classList.remove('flipped');\n", - " next.classList.add('hide');\n", - "\n", - " //container.classList.add(\"prepare\");\n", - " \n", - " container.className=\"flip-container slide\";\n", - " backcard.parentElement.removeChild(frontcard);\n", - " backcard.parentElement.appendChild(frontcard);\n", - " setTimeout(slideback, 600, container, frontcard, backcard, next);\n", - " \n", - "}\n", - "\n", - "\n", - "window.checkFlip = function checkFlip(containerId) {\n", - " var container = document.getElementById(containerId);\n", - "\n", - "\n", - " if (container.classList.contains('flip')) {\n", - " container.classList.remove('flip');\n", - " setTimeout(slide2, 600, containerId);\n", - " } \n", - " else {\n", - " slide2(containerId);\n", - " }\n", - "}\n", - "\n", - "\n", - "function slideback(container, frontcard, backcard, next) {\n", - " container.className=\"flip-container slideback\";\n", - " setTimeout(cleanup, 600, container, frontcard, backcard, next);\n", - "}\n", - "\n", - "function cleanup(container, frontcard, backcard, next) {\n", - " container.removeChild(frontcard);\n", - " backcard.className=\"flipper frontcard\";\n", - " container.className=\"flip-container\";\n", - "\n", - " var cardnum=parseInt(container.dataset.cardnum);\n", - " var cards=eval('cards'+container.id);\n", - " var flipper=createOneCard(container, false, cards, cardnum);\n", - " container.append(flipper);\n", - " cardnum= (cardnum+1) % parseInt(container.dataset.numCards);\n", - " container.dataset.cardnum=cardnum;\n", - " if 
(cardnum != 1){\n", - " next.innerHTML=\"Next >\";\n", - " } else {\n", - " //next.innerHTML=\"Reload \\\\(\\\\circlearrowleft\\\\) \";\n", - " next.innerHTML='Reload '\n", - " if (typeof MathJax != 'undefined') {\n", - " var version = MathJax.version;\n", - " console.log('MathJax version', version);\n", - " if (version[0] == \"2\") {\n", - " MathJax.Hub.Queue([\"Typeset\", MathJax.Hub]);\n", - " } else if (version[0] == \"3\") {\n", - " MathJax.typeset([next]);\n", - " }\n", - " } else {\n", - " console.log('MathJax not detected');\n", - " }\n", - "\n", - "\n", - " }\n", - "\n", - " if (typeof MathJax != 'undefined') {\n", - " var version = MathJax.version;\n", - " console.log('MathJax version', version);\n", - " if (version[0] == \"2\") {\n", - " MathJax.Hub.Queue([\"Typeset\", MathJax.Hub]);\n", - " } else if (version[0] == \"3\") {\n", - " MathJax.typeset();\n", - " }\n", - " } else {\n", - " console.log('MathJax not detected');\n", - " }\n", - "\n", - "\n", - " next.style.pointerEvents='auto';\n", - " container.style.pointerEvents='auto';\n", - " next.classList.remove('hide');\n", - " container.addEventListener('swiped-left', function(e) {\n", - " /*\n", - " console.log(e.detail);\n", - " console.log(id);\n", - " */\n", - " checkFlip(container.id);\n", - " }, {once: true });\n", - "\n", - "\n", - "}\n", - "\n", - "\n", - "function createOneCard (mydiv, frontCard, cards, cardnum) {\n", - " var colors=[\n", - " '--asparagus',\n", - " '--terra-cotta',\n", - " '--cyan-process'\n", - " ]\n", - "\n", - " var flipper = document.createElement('div');\n", - " if (frontCard){\n", - " flipper.className=\"flipper frontcard\"; \n", - " }\n", - " else {\n", - " flipper.className=\"flipper backcard\"; \n", - " }\n", - "\n", - " var front = document.createElement('div');\n", - " front.className='front flashcard';\n", - "\n", - " var frontSpan= document.createElement('span');\n", - " frontSpan.className='flashcardtext';\n", - " 
frontSpan.innerHTML=jaxify(cards[cardnum]['front']);\n", - " //frontSpan.textContent=jaxify(cards[cardnum]['front']);\n", - " front.style.background='var(' + colors[cardnum % colors.length] + ')';\n", - "\n", - "\n", - " front.append(frontSpan);\n", - " flipper.append(front);\n", - "\n", - " var back = document.createElement('div');\n", - " back.className='back flashcard';\n", - "\n", - " var backSpan= document.createElement('span');\n", - " backSpan.className='flashcardtext';\n", - " backSpan.innerHTML=jaxify(cards[cardnum]['back']);\n", - " back.append(backSpan);\n", - "\n", - " flipper.append(back);\n", - "\n", - " return flipper;\n", - "\n", - "}\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "function createCards(id) {\n", - " console.log(id);\n", - " \n", - " var mydiv=document.getElementById(id);\n", - " \n", - " var cards=eval('cards'+id);\n", - " mydiv.dataset.cardnum=0;\n", - " mydiv.dataset.numCards=cards.length;\n", - " mydiv.addEventListener('swiped-left', function(e) {\n", - " /*\n", - " console.log(e.detail);\n", - " console.log(id);\n", - " */\n", - " checkFlip(id);\n", - " }, {once: true});\n", - "\n", - " var cardnum=0;\n", - " \n", - " for (var i=0; i<2; i++) {\n", - " \n", - " var flipper;\n", - " if (i==0){\n", - " flipper=createOneCard(mydiv, true, cards, cardnum);\n", - " }\n", - " else {\n", - " flipper=createOneCard(mydiv, false, cards, cardnum);\n", - " }\n", - "\n", - " mydiv.append(flipper);\n", - " if (typeof MathJax != 'undefined') {\n", - " var version = MathJax.version;\n", - " if (typeof version == 'undefined') {\n", - " setTimeout(function(){\n", - " var version = MathJax.version;\n", - " console.log('After sleep, MathJax version', version);\n", - " if (version[0] == \"2\") {\n", - " MathJax.Hub.Queue([\"Typeset\", MathJax.Hub]);\n", - " } else if (version[0] == \"3\") {\n", - " MathJax.typeset([flipper]);\n", - " }\n", - " }, 500);\n", - " } else{\n", - " console.log('MathJax version', version);\n", - " if (version[0] == \"2\") 
{\n", - " MathJax.Hub.Queue([\"Typeset\", MathJax.Hub]);\n", - " } else if (version[0] == \"3\") {\n", - " MathJax.typeset([flipper]);\n", - " }\n", - " }\n", - " } else {\n", - " console.log('MathJax not detected');\n", - " }\n", - "\n", - "\n", - " cardnum = (cardnum + 1) % mydiv.dataset.numCards;\n", - " }\n", - " mydiv.dataset.cardnum = cardnum;\n", - "\n", - " var next=document.getElementById(id+'-next');\n", - " if (cards.length==1) {\n", - " // Don't show next if no other cards!\n", - " next.style.pointerEvents='none';\n", - " next.classList.add('hide');\n", - " } else {\n", - " next.innerHTML=\"Next >\";\n", - " }\n", - "\n", - " return flipper;\n", - "}\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "var cardsezpyyfoIBHpf=[\n", - " {\n", - " \"front\": \"How does DNA methylation modify the DNA?\",\n", - " \"back\": \"Through covalent addition of a methyl group to cytosine.\"\n", - " },\n", - " {\n", - " \"front\": \"What adds methyl groups to cytosines?\",\n", - " \"back\": \"DNA methyltransferase (DNMT).\"\n", - " },\n", - " {\n", - " \"front\": \"Where on the DNA are the methyl groups added?\",\n", - " \"back\": \"Cytosine nucleotides and at the major groove.\"\n", - " },\n", - " {\n", - " \"front\": \"How is DNA methylation differ from histone methylation/modification\",\n", - " \"back\": \"DNA is marked and permanently modified, histones leaves the DNA itself unmethylated.\"\n", - " },\n", - " {\n", - " \"front\": \"How are DNA methylation patterns maintained?\",\n", - " \"back\": \"Through cell division and DNA synthesis\"\n", - " },\n", - " {\n", - " \"front\": \"Methylation status: zygote\",\n", - " \"back\": \"largely unmethylated\"\n", - " },\n", - " {\n", - " \"front\": \"Methylation status: cell differentiation\",\n", - " \"back\": \"patterns of methylation established\"\n", - " },\n", - " {\n", - " \"front\": \"Methylation status: meiosis\",\n", - " \"back\": \"methylation cleared\"\n", - " },\n", - " {\n", - " \"front\": \"Why does DNA 
methylation reset during meiosis?\",\n", - " \"back\": \"if an embryo begins with methylated DNA, many genes will never be expressed\"\n", - " },\n", - " {\n", - " \"front\": \"Hyper-methylation\",\n", - " \"back\": \"An increase in the epigenetic methylation of cytosine and adenosine residues in DNA\"\n", - " },\n", - " {\n", - " \"front\": \"Hypo-Methylation\",\n", - " \"back\": \" A decrease in the epigenetic methylation of cytosine and adenosine residues in DNA.\"\n", - " }\n", - "];\n", - " \n", - " createCards(\"ezpyyfoIBHpf\");\n", - " " - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "from jupytercards import display_flashcards\n", "display_flashcards(\"./methylation.json\")" @@ -578,31 +20,7 @@ "source": [] } ], - "metadata": { - "environment": { - "kernel": "python3", - "name": "r-cpu.4-1.m95", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/r-cpu.4-1:m95" - }, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - } - }, + "metadata": {}, "nbformat": 4, "nbformat_minor": 5 } diff --git a/docs/quiz_files/rna-pre_module.ipynb b/docs/quiz_files/rna-pre_module.ipynb index 026ad05..d9f45db 100644 --- a/docs/quiz_files/rna-pre_module.ipynb +++ b/docs/quiz_files/rna-pre_module.ipynb @@ -2,635 +2,17 @@ "cells": [ { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "id": "6f16b220-5cdc-405e-8930-acb7c4da535a", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - 
"data": { - "text/html": [ - "
" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "application/javascript": [ - "/*!\n", - " * swiped-events.js - v1.1.4\n", - " * Pure JavaScript swipe events\n", - " * https://github.com/john-doherty/swiped-events\n", - " * @inspiration https://stackoverflow.com/questions/16348031/disable-scrolling-when-touch-moving-certain-element\n", - " * @author John Doherty \n", - " * @license MIT\n", - " */\n", - "!function(t,e){\"use strict\";\"function\"!=typeof t.CustomEvent&&(t.CustomEvent=function(t,n){n=n||{bubbles:!1,cancelable:!1,detail:void 0};var a=e.createEvent(\"CustomEvent\");return a.initCustomEvent(t,n.bubbles,n.cancelable,n.detail),a},t.CustomEvent.prototype=t.Event.prototype),e.addEventListener(\"touchstart\",function(t){if(\"true\"===t.target.getAttribute(\"data-swipe-ignore\"))return;s=t.target,r=Date.now(),n=t.touches[0].clientX,a=t.touches[0].clientY,u=0,i=0},!1),e.addEventListener(\"touchmove\",function(t){if(!n||!a)return;var e=t.touches[0].clientX,r=t.touches[0].clientY;u=n-e,i=a-r},!1),e.addEventListener(\"touchend\",function(t){if(s!==t.target)return;var e=parseInt(l(s,\"data-swipe-threshold\",\"20\"),10),o=parseInt(l(s,\"data-swipe-timeout\",\"500\"),10),c=Date.now()-r,d=\"\",p=t.changedTouches||t.touches||[];Math.abs(u)>Math.abs(i)?Math.abs(u)>e&&c0?\"swiped-left\":\"swiped-right\"):Math.abs(i)>e&&c0?\"swiped-up\":\"swiped-down\");if(\"\"!==d){var b={dir:d.replace(/swiped-/,\"\"),xStart:parseInt(n,10),xEnd:parseInt((p[0]||{}).clientX||-1,10),yStart:parseInt(a,10),yEnd:parseInt((p[0]||{}).clientY||-1,10)};s.dispatchEvent(new CustomEvent(\"swiped\",{bubbles:!0,cancelable:!0,detail:b})),s.dispatchEvent(new CustomEvent(d,{bubbles:!0,cancelable:!0,detail:b}))}n=null,a=null,r=null},!1);var n=null,a=null,u=null,i=null,r=null,s=null;function l(t,n,a){for(;t&&t!==e.documentElement;){var u=t.getAttribute(n);if(u)return u;t=t.parentNode}return a}}(window,document);\n", - "\n", - "function 
jaxify(string) {\n", - " var mystring = string;\n", - " console.log(mystring);\n", - "\n", - " var count = 0;\n", - " var loc = mystring.search(/([^\\\\]|^)(\\$)/);\n", - "\n", - " var count2 = 0;\n", - " var loc2 = mystring.search(/([^\\\\]|^)(\\$\\$)/);\n", - "\n", - " //console.log(loc);\n", - "\n", - " while ((loc >= 0) || (loc2 >= 0)) {\n", - "\n", - " /* Have to replace all the double $$ first with current implementation */\n", - " if (loc2 >= 0) {\n", - " if (count2 % 2 == 0) {\n", - " mystring = mystring.replace(/([^\\\\]|^)(\\$\\$)/, \"$1\\\\[\");\n", - " } else {\n", - " mystring = mystring.replace(/([^\\\\]|^)(\\$\\$)/, \"$1\\\\]\");\n", - " }\n", - " count2++;\n", - " } else {\n", - " if (count % 2 == 0) {\n", - " mystring = mystring.replace(/([^\\\\]|^)(\\$)/, \"$1\\\\(\");\n", - " } else {\n", - " mystring = mystring.replace(/([^\\\\]|^)(\\$)/, \"$1\\\\)\");\n", - " }\n", - " count++;\n", - " }\n", - " loc = mystring.search(/([^\\\\]|^)(\\$)/);\n", - " loc2 = mystring.search(/([^\\\\]|^)(\\$\\$)/);\n", - " //console.log(mystring,\", loc:\",loc,\", loc2:\",loc2);\n", - " }\n", - "\n", - " //console.log(mystring);\n", - " return mystring;\n", - "}\n", - "\n", - "window.flipCard = function flipCard(ths) {\n", - " console.log(ths);\n", - " console.log(ths.id);\n", - " ths.classList.toggle(\"flip\"); \n", - " var next=document.getElementById(ths.id+'-next');\n", - " next.style.pointerEvents='none';\n", - " next.classList.add('flipped');\n", - " if (typeof MathJax != 'undefined') {\n", - " var version = MathJax.version;\n", - " console.log('MathJax version', version);\n", - " if (version[0] == \"2\") {\n", - " MathJax.Hub.Queue([\"Typeset\", MathJax.Hub]);\n", - " } else if (version[0] == \"3\") {\n", - " MathJax.typeset([ths]);\n", - " }\n", - " } else {\n", - " console.log('MathJax not detected');\n", - " }\n", - "\n", - "\n", - " setTimeout(reenableNext, 700, next);\n", - "}\n", - "\n", - "function reenableNext(next) {\n", - " 
next.style.pointerEvents='auto';\n", - "}\n", - "\n", - "\n", - "\n", - "function slide2(containerId) {\n", - " var container = document.getElementById(containerId);\n", - " var next=document.getElementById(containerId+'-next');\n", - " var frontcard = container.children[0];\n", - " var backcard = container.children[1];\n", - " container.style.pointerEvents='none';\n", - " //backcard.style.pointerEvents='none';\n", - " next.style.pointerEvents='none';\n", - " next.classList.remove('flipped');\n", - " next.classList.add('hide');\n", - "\n", - " //container.classList.add(\"prepare\");\n", - " \n", - " container.className=\"flip-container slide\";\n", - " backcard.parentElement.removeChild(frontcard);\n", - " backcard.parentElement.appendChild(frontcard);\n", - " setTimeout(slideback, 600, container, frontcard, backcard, next);\n", - " \n", - "}\n", - "\n", - "\n", - "window.checkFlip = function checkFlip(containerId) {\n", - " var container = document.getElementById(containerId);\n", - "\n", - "\n", - " if (container.classList.contains('flip')) {\n", - " container.classList.remove('flip');\n", - " setTimeout(slide2, 600, containerId);\n", - " } \n", - " else {\n", - " slide2(containerId);\n", - " }\n", - "}\n", - "\n", - "\n", - "function slideback(container, frontcard, backcard, next) {\n", - " container.className=\"flip-container slideback\";\n", - " setTimeout(cleanup, 600, container, frontcard, backcard, next);\n", - "}\n", - "\n", - "function cleanup(container, frontcard, backcard, next) {\n", - " container.removeChild(frontcard);\n", - " backcard.className=\"flipper frontcard\";\n", - " container.className=\"flip-container\";\n", - "\n", - " var cardnum=parseInt(container.dataset.cardnum);\n", - " var cards=eval('cards'+container.id);\n", - " var flipper=createOneCard(container, false, cards, cardnum);\n", - " container.append(flipper);\n", - " cardnum= (cardnum+1) % parseInt(container.dataset.numCards);\n", - " container.dataset.cardnum=cardnum;\n", - " if 
(cardnum != 1){\n", - " next.innerHTML=\"Next >\";\n", - " } else {\n", - " //next.innerHTML=\"Reload \\\\(\\\\circlearrowleft\\\\) \";\n", - " next.innerHTML='Reload '\n", - " if (typeof MathJax != 'undefined') {\n", - " var version = MathJax.version;\n", - " console.log('MathJax version', version);\n", - " if (version[0] == \"2\") {\n", - " MathJax.Hub.Queue([\"Typeset\", MathJax.Hub]);\n", - " } else if (version[0] == \"3\") {\n", - " MathJax.typeset([next]);\n", - " }\n", - " } else {\n", - " console.log('MathJax not detected');\n", - " }\n", - "\n", - "\n", - " }\n", - "\n", - " if (typeof MathJax != 'undefined') {\n", - " var version = MathJax.version;\n", - " console.log('MathJax version', version);\n", - " if (version[0] == \"2\") {\n", - " MathJax.Hub.Queue([\"Typeset\", MathJax.Hub]);\n", - " } else if (version[0] == \"3\") {\n", - " MathJax.typeset();\n", - " }\n", - " } else {\n", - " console.log('MathJax not detected');\n", - " }\n", - "\n", - "\n", - " next.style.pointerEvents='auto';\n", - " container.style.pointerEvents='auto';\n", - " next.classList.remove('hide');\n", - " container.addEventListener('swiped-left', function(e) {\n", - " /*\n", - " console.log(e.detail);\n", - " console.log(id);\n", - " */\n", - " checkFlip(container.id);\n", - " }, {once: true });\n", - "\n", - "\n", - "}\n", - "\n", - "\n", - "function createOneCard (mydiv, frontCard, cards, cardnum) {\n", - " var colors=[\n", - " '--asparagus',\n", - " '--terra-cotta',\n", - " '--cyan-process'\n", - " ]\n", - "\n", - " var flipper = document.createElement('div');\n", - " if (frontCard){\n", - " flipper.className=\"flipper frontcard\"; \n", - " }\n", - " else {\n", - " flipper.className=\"flipper backcard\"; \n", - " }\n", - "\n", - " var front = document.createElement('div');\n", - " front.className='front flashcard';\n", - "\n", - " var frontSpan= document.createElement('span');\n", - " frontSpan.className='flashcardtext';\n", - " 
frontSpan.innerHTML=jaxify(cards[cardnum]['front']);\n", - " //frontSpan.textContent=jaxify(cards[cardnum]['front']);\n", - " front.style.background='var(' + colors[cardnum % colors.length] + ')';\n", - "\n", - "\n", - " front.append(frontSpan);\n", - " flipper.append(front);\n", - "\n", - " var back = document.createElement('div');\n", - " back.className='back flashcard';\n", - "\n", - " var backSpan= document.createElement('span');\n", - " backSpan.className='flashcardtext';\n", - " backSpan.innerHTML=jaxify(cards[cardnum]['back']);\n", - " back.append(backSpan);\n", - "\n", - " flipper.append(back);\n", - "\n", - " return flipper;\n", - "\n", - "}\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "function createCards(id) {\n", - " console.log(id);\n", - " \n", - " var mydiv=document.getElementById(id);\n", - " \n", - " var cards=eval('cards'+id);\n", - " mydiv.dataset.cardnum=0;\n", - " mydiv.dataset.numCards=cards.length;\n", - " mydiv.addEventListener('swiped-left', function(e) {\n", - " /*\n", - " console.log(e.detail);\n", - " console.log(id);\n", - " */\n", - " checkFlip(id);\n", - " }, {once: true});\n", - "\n", - " var cardnum=0;\n", - " \n", - " for (var i=0; i<2; i++) {\n", - " \n", - " var flipper;\n", - " if (i==0){\n", - " flipper=createOneCard(mydiv, true, cards, cardnum);\n", - " }\n", - " else {\n", - " flipper=createOneCard(mydiv, false, cards, cardnum);\n", - " }\n", - "\n", - " mydiv.append(flipper);\n", - " if (typeof MathJax != 'undefined') {\n", - " var version = MathJax.version;\n", - " if (typeof version == 'undefined') {\n", - " setTimeout(function(){\n", - " var version = MathJax.version;\n", - " console.log('After sleep, MathJax version', version);\n", - " if (version[0] == \"2\") {\n", - " MathJax.Hub.Queue([\"Typeset\", MathJax.Hub]);\n", - " } else if (version[0] == \"3\") {\n", - " MathJax.typeset([flipper]);\n", - " }\n", - " }, 500);\n", - " } else{\n", - " console.log('MathJax version', version);\n", - " if (version[0] == \"2\") 
{\n", - " MathJax.Hub.Queue([\"Typeset\", MathJax.Hub]);\n", - " } else if (version[0] == \"3\") {\n", - " MathJax.typeset([flipper]);\n", - " }\n", - " }\n", - " } else {\n", - " console.log('MathJax not detected');\n", - " }\n", - "\n", - "\n", - " cardnum = (cardnum + 1) % mydiv.dataset.numCards;\n", - " }\n", - " mydiv.dataset.cardnum = cardnum;\n", - "\n", - " var next=document.getElementById(id+'-next');\n", - " if (cards.length==1) {\n", - " // Don't show next if no other cards!\n", - " next.style.pointerEvents='none';\n", - " next.classList.add('hide');\n", - " } else {\n", - " next.innerHTML=\"Next >\";\n", - " }\n", - "\n", - " return flipper;\n", - "}\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "var cardsYmLAwqBoJCLI=[\n", - " {\n", - " \"front\": \"DNA stands for\",\n", - " \"back\": \"deoxyribonucleic acid\"\n", - " },\n", - " {\n", - " \"front\": \"What are the structural units of DNA called?\",\n", - " \"back\": \"nucleotides\"\n", - " },\n", - " {\n", - " \"front\": \"Three properties of DNA are\",\n", - " \"back\": \"DNA is a double-stranded molecule that has a long chain of complementary nucleotides and it can self replicate. 
\"\n", - " },\n", - " {\n", - " \"front\": \"What are the four nitrogenous bases associated with DNA?\",\n", - " \"back\": \"Adenine, thymine, guanine, and cytosine\"\n", - " },\n", - " {\n", - " \"front\": \"What are the four nitrogenous bases associated with RNA\",\n", - " \"back\": \"Adenine, uracil, guanine, and cytosine\"\n", - " },\n", - " {\n", - " \"front\": \"How is a nucleic acid sequence read or copied\",\n", - " \"back\": \"It should be read and copied in a 5' to 3' direction\"\n", - " },\n", - " {\n", - " \"front\": \"What nucleotide does Adenine pair with in DNA?\",\n", - " \"back\": \"Thymine\"\n", - " },\n", - " {\n", - " \"front\": \"What nucleotide does Adenine pair with in RNA?\",\n", - " \"back\": \"Uracil\"\n", - " },\n", - " {\n", - " \"front\": \"What nucleotide does cytosine pair with in DNA?\",\n", - " \"back\": \"Guanine\"\n", - " },\n", - " {\n", - " \"front\": \"A three base-pair segment on the mRNA, that is read as one unit and codes for an amino acid is called a\",\n", - " \"back\": \"Codon\"\n", - " },\n", - " {\n", - " \"front\": \"Process by which mRNA is made from the genomic DNA\",\n", - " \"back\": \"Transcription\"\n", - " },\n", - " {\n", - " \"front\": \"Process by which the mRNA is read in units of 3 bases, matching to amino acids which build a polypeptide chain\",\n", - " \"back\": \"Translation\"\n", - " },\n", - " {\n", - " \"front\": \"If a mutation in the coding sequence does not lead to any amino acid change in the protein sequence, this is known as a\",\n", - " \"back\": \"Synonymous or Silent mutation.\"\n", - " },\n", - " {\n", - " \"front\": \"The mutation that causes a change in the amino acid composition of a protein is called a\",\n", - " \"back\": \"Missense or a non-synonymous mutation\"\n", - " },\n", - " {\n", - " \"front\": \"What is the function or purpose of a tRNA molecule?\",\n", - " \"back\": \"tRNA is responsible for carrying the amino acids to the ribosome where translation is happening\"\n", - " 
},\n", - " {\n", - " \"front\": \"Why would a cell replicate its DNA?\",\n", - " \"back\": \"A cell would initiate DNA replication prior to dividing into two cells so that each new cell has a copy of the DNA\"\n", - " },\n", - " {\n", - " \"front\": \"Long DNA molecule with part or all of the genetic material of an organism is called a\",\n", - " \"back\": \"Chromosome\"\n", - " },\n", - " {\n", - " \"front\": \"The observable (visible) traits of an individual or organism is known as the\",\n", - " \"back\": \"Phenotype\"\n", - " },\n", - " {\n", - " \"front\": \"The particular alleles for each gene carried by an individual is known as\",\n", - " \"back\": \"Genotype\"\n", - " },\n", - " {\n", - " \"front\": \"Determining the nucleotide sequence of a fragment of DNA is known as\",\n", - " \"back\": \"Sequencing\"\n", - " }\n", - "];\n", - " \n", - " createCards(\"YmLAwqBoJCLI\");\n", - " " - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "from jupytercards import display_flashcards\n", "display_flashcards(\"./rna-pre_module.json\")" ] } ], - "metadata": { - "environment": { - "kernel": "python3", - "name": "r-cpu.4-2.m103", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/r-cpu.4-2:m103" - }, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - } - }, + "metadata": {}, "nbformat": 4, "nbformat_minor": 5 }