Wuming Gong, Nikita Dsouza and Daniel J. Garry
Assay for Transposase-Accessible Chromatin with sequencing (ATAC-seq) reveals chromatin accessibility across the genome. Currently, no method specifically detects differential chromatin accessibility. Here, SeATAC uses a conditional variational autoencoder model to learn the latent representation of ATAC-seq V-plots and outperforms MACS2 and NucleoATAC on six separate tasks. Applying SeATAC to several pioneer factor induced differentiation or reprogramming ATAC-seq datasets suggests that induction of these factors not only relaxes closed chromatin but also decreases the chromatin accessibility of 20% to 30% of their target sites. SeATAC is a novel tool to accurately reveal genomic regions with differential chromatin accessibility from ATAC-seq data. SeATAC is available at https://github.com/gongx030/seatac as an R package. The preprint can be found at bioRxiv. Additionally, SeATAC has been used to investigate how Etv2 shapes the chromatin landscape in MEF reprogramming and limb development.
Figures | Link | |
---|---|---|
A full V-plot spans a 640 bp genomic region (width) and 640 bp of fragment sizes (height). Each 5 x 10 block of pixels is aggregated into a single larger pixel, resulting in a 128 x 64 pixel image. | Figure 1a | R |
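
The binning described for Figure 1a can be sketched in a few lines. This is only an illustration, not SeATAC's R implementation: the function name `build_vplot`, the use of fragment midpoints for the horizontal axis, and the half-open coordinate convention are assumptions made for the example.

```python
# Illustrative sketch only; names and conventions are assumptions, not SeATAC's API.
WINDOW_BP = 640        # width of the genomic window
MAX_FRAGMENT_BP = 640  # largest fragment size shown
POS_BIN = 5            # bp per pixel along the genomic axis
SIZE_BIN = 10          # bp per pixel along the fragment-size axis


def build_vplot(fragments, window_start):
    """Count fragments into a 128 x 64 grid indexed [position_bin][size_bin].

    fragments: iterable of (start, end) half-open coordinates (assumed convention).
    """
    n_pos = WINDOW_BP // POS_BIN          # 128 bins
    n_size = MAX_FRAGMENT_BP // SIZE_BIN  # 64 bins
    vplot = [[0] * n_size for _ in range(n_pos)]
    for start, end in fragments:
        size = end - start
        # Assumption: a fragment is placed at its midpoint within the window.
        midpoint = (start + end) // 2 - window_start
        if 0 <= midpoint < WINDOW_BP and 0 < size <= MAX_FRAGMENT_BP:
            vplot[midpoint // POS_BIN][(size - 1) // SIZE_BIN] += 1
    return vplot


example = build_vplot([(100, 250), (100, 250), (300, 340)], window_start=0)
```

Fragments whose midpoint falls outside the window, or that are longer than 640 bp, are simply dropped in this sketch.
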
Figures | Link | |
---|---|---|
The ROC curves for SeATAC, NucleoATAC and MACS2 with a shift size of 50 bp. | Figure 2c | R |
The violin plot shows the AUC (area under ROC) of SeATAC, NucleoATAC and MACS2 on 523 ATAC-seq samples from 20 studies. *** Wilcoxon rank sum test p-value < 0.001. | Figure 2d | R |
The AUC of SeATAC, NucleoATAC and MACS2 at different read count cutoffs from 1 to 20 (the minimum number of reads in a V-plot). | Figure 2e | R |
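
For readers unfamiliar with the AUC values reported in these panels, the area under the ROC curve equals the probability that a randomly chosen positive region scores higher than a randomly chosen negative one. The sketch below computes it that way; the function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the SeATAC benchmarks.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for a true positive region, 0 for a true negative region.
    scores: the tool's score for each region (higher means more confident).
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The quadratic loop is fine for small examples; a rank-based formulation would be preferable for large region sets.
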
Figures | Link | |
---|---|---|
The ROC curve for recovering nucleosome positions from ATAC-seq with 0.1%, 1% and 10% of the sequencing reads randomly sampled from the full dataset (GM12878). | Figure 3a | R |
The heatmaps show the nucleosome density estimated by SeATAC (blue) and NucleoATAC (purple) on a 1% down-sampled dataset. | Figure 3b | R |
The violin plot shows the AUC (area under ROC) of SeATAC and NucleoATAC on 523 ATAC-seq samples from 20 studies. *** Wilcoxon rank sum test p-value < 0.001. | Figure 3c | R |
The AUC of SeATAC and NucleoATAC at different read count cutoffs from 1 to 20 (the minimum number of reads in a V-plot). | Figure 3d | R |
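
The down-sampled datasets used in these panels (0.1%, 1% and 10% of reads) can be imitated with a simple random subsample. The exact sampling procedure is not stated here, so the sketch below assumes each fragment is kept independently with the given probability; `downsample` and its `seed` argument are hypothetical names.

```python
import random


def downsample(fragments, fraction, seed=0):
    """Keep each fragment independently with probability `fraction`.

    Assumption: independent per-fragment sampling; the procedure used for the
    figures may differ (for example, sampling an exact number of reads).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    return [f for f in fragments if rng.random() < fraction]


all_fragments = list(range(1000))  # stand-in for real fragment records
subset = downsample(all_fragments, 0.1, seed=1)
```
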
Figures | Link | |
---|---|---|
The ROC curve for detecting nucleosome changes from ATAC-seq with 10% of the sequencing reads from the full dataset (GM12878). | Figure 4a | R |
The raw and estimated V-plots of an NFR (chr1:113162059-113162698) and a NOR (chr2:226653061-226653700) region are shown. | Figure 4b | R |
The heatmaps show the nucleosome density of ~5,000 sampled NOR and NFR regions estimated by SeATAC & NucleoATAC on a 10% down-sampled dataset and NucleoATAC signal on the full dataset (black) & a MNase-seq dataset on GM12878. | Figure 4c | R |
The violin plot shows the AUC (area under ROC) of SeATAC and NucleoATAC on 523 ATAC-seq samples from 20 studies. *** Wilcoxon rank sum test p-value < 0.001. | Figure 4d | R |
The AUC of SeATAC and NucleoATAC at different read count cutoffs from 1 to 20 (the minimum number of reads in a V-plot). | Figure 4e | R |
Figures | Link | |
---|---|---|
The Venn diagrams show the number of Etv2 motifs with increased chromatin accessibility identified by SeATAC, MACS2 and NucleoATAC in Etv2 induced MEF reprogramming. | Figure 5a | R |
The Venn diagrams show the number of Etv2 motifs with increased chromatin accessibility identified by SeATAC, MACS2 and NucleoATAC in Etv2 induced EB differentiation. | Figure 5b | R |
The aggregated V-plot includes 1,626, 222 and 2,305 Etv2 motifs with increased chromatin accessibility identified by SeATAC only, MACS2 only and NucleoATAC only in ATAC-seq data of Etv2 induced EB differentiation | Figure 5c | R |
The barplots show the Gene Ontology (GO) terms that are significantly associated with the genes whose promoters (-5,000 to +1,000 bp region flanking the TSS) have Etv2 motifs with increased chromatin accessibility, identified by SeATAC, MACS2 and NucleoATAC. | Figure 5d | R |
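
The promoter definition used in these captions (-5,000 to +1,000 bp around the TSS) depends on gene orientation. The sketch below shows one way to derive the window and test whether a motif falls inside it; it assumes that "upstream" follows the direction of transcription and that the window is inclusive at both ends. The helper names are hypothetical.

```python
UPSTREAM_BP = 5000    # bases upstream of the TSS
DOWNSTREAM_BP = 1000  # bases downstream of the TSS


def promoter_window(tss, strand):
    """Return the (start, end) promoter window, inclusive, for a gene.

    Assumption: upstream/downstream are relative to the direction of transcription.
    """
    if strand == "+":
        return tss - UPSTREAM_BP, tss + DOWNSTREAM_BP
    if strand == "-":
        return tss - DOWNSTREAM_BP, tss + UPSTREAM_BP
    raise ValueError("strand must be '+' or '-'")


def motif_in_promoter(motif_center, tss, strand):
    """True if the motif center lies inside the gene's promoter window."""
    start, end = promoter_window(tss, strand)
    return start <= motif_center <= end


plus_window = promoter_window(10000, "+")
minus_window = promoter_window(10000, "-")
```
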
Figures | Link | |
---|---|---|
Dot plots comparing the changes of motif associated chromatin accessibility estimated by chromVAR (x-axis) and the difference of the percent of TFBS with decreased or increased chromatin accessibility estimated by SeATAC. | Figure 6a | R |
The barplots show the genomic distribution of Etv2 binding sites with decreased (NFR->NOR) or increased (NOR->NFR) chromatin accessibility in EB differentiation or MEF reprogramming. | Figure 6b | R |
The aggregated V-plots include 3,000 and 1,623 Etv2 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during MEF reprogramming. | Figure 6c | R |
The heatmaps show Etv2, Brg1 and H3K27ac ChIP-seq signals of 3,000 and 1,623 Etv2 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility at day 2.5 EB (Brg1 and H3K27ac), 3 hours post Etv2 induction (Etv2), and 12 hours post Etv2 induction (Etv2, Brg1 and H3K27ac). | Figure 6d | R |
The barplots show the percent of genes that were down-regulated, up-regulated or not changed between day 2.5 EB and 12 hours post Etv2 induction. | Figure 6e | R |
Brachyury (T) and Mycn are significantly down-regulated during the Etv2 induced differentiation. | Figure 6f | R |
Brachyury (T) and Mycn (f) have Etv2 motifs that become significantly less accessible during the differentiation at their promoter region (-5,000 to +1,000 bp region flanking the TSS). | Figure 6g | R |
Figures | Link | |
---|---|---|
The density plots show the observed (red) and corrected (green) fragment size distribution of 13 samples from a human hematopoietic differentiation ATAC-seq data (GSE96771). | Figure S1a | R |
Figures | Link | |
---|---|---|
The plot shows the AUC of SeATAC, NucleoATAC and MACS2 at different shift sizes (from 10 to 100) used to generate the synthetic data for evaluating task #1. | Figure S2a | R |
Figures | Link | |
---|---|---|
The plots show the AUC (area under ROC) of SeATAC on 523 ATAC-seq samples from 20 studies plotted against (a) total read counts (Total QNAMEs), (b) mitochondria rate, (c) proper pair rate, (d) unmapped rate, (e) has unmapped mate rate, (f) non-redundant fraction, (g) PCR bottleneck coefficient 1, and (h) PCR bottleneck coefficient 2. | Figure S3a-h | R |
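
Three of the quality metrics in Figure S3 (non-redundant fraction and the two PCR bottleneck coefficients) are library-complexity measures. The sketch below uses the commonly cited ENCODE-style definitions; whether the figure was produced with exactly these definitions is an assumption, and `library_complexity` is a hypothetical helper.

```python
from collections import Counter


def library_complexity(read_positions):
    """Compute NRF, PBC1 and PBC2 from a list of mapped read positions.

    Assumed definitions (ENCODE-style):
      NRF  = distinct positions / total reads
      PBC1 = positions with exactly one read / distinct positions
      PBC2 = positions with exactly one read / positions with exactly two reads
    """
    counts = Counter(read_positions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads given")
    distinct = len(counts)
    m1 = sum(1 for c in counts.values() if c == 1)
    m2 = sum(1 for c in counts.values() if c == 2)
    return {
        "NRF": distinct / total,
        "PBC1": m1 / distinct,
        "PBC2": m1 / m2 if m2 else None,  # undefined when no position has two reads
    }


metrics = library_complexity(["a", "a", "b", "c", "d", "d", "d"])
```
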
Figures | Link | |
---|---|---|
The area under ROC (AUC) of three tools, SeATAC, NucleoATAC and MACS2 on the regions over promoter region (column wise) and latent dimensions (row wise). | Figure S4b | R |
The area under ROC (AUC) of three tools on 17 paired RNA-seq / ATAC-seq datasets. | Figure S4c | R |
Figures | Link | |
---|---|---|
The aggregated V-plot includes: 728 and 1,633 NFKB1 binding sites with increased chromatin accessibility in GM12878 compared with K562 at distal and promoter regions, respectively. The heatmap color indicates the estimated read density. | Figure S5a | R |
The line plots show the mean H3K27ac, H3K4me1 and H3K4me3 signals of 728 and 1,633 NFKB1 binding sites with increased chromatin accessibility in GM12878 compared with K562 at distal and promoter regions, respectively. | Figure S5b | R |
The mean squared error of observed and predicted histone modification signals. | Figure S5d | R |
Figures | Link | |
---|---|---|
The aggregated V-plot includes 2,776, 116 and 1,449 Etv2 motifs with increased chromatin accessibility identified by SeATAC only, MACS2 only and NucleoATAC only in ATAC-seq data of Etv2 induced MEF reprogramming | Figure S6a | R |
The heatmaps show the Etv2, Brg1, H3K27ac ChIP-seq of 3,996 and 1,307 Etv2 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility at undifferentiated MEFs (Brg1 and H3K27ac), 1 day post-Etv2 induction (Etv2), and 7 days post-Etv2 induction (Etv2, Brg1 and H3K27ac). | Figure S6b | R |
The UCSC genome browser tracks show the ATAC-seq density near the Etv2 motifs at the promoters of Brachyury (T) and Mycn. | Figure S6c | T Mycn |
Figures | Link | |
---|---|---|
The Venn diagrams show the number of Ascl1 motifs with increased chromatin accessibility identified by SeATAC, MACS2 and NucleoATAC. | Figure S7a | R |
The barplots show the Gene Ontology (GO) terms that are significantly associated with the genes whose promoters (-5,000 to +1,000 bp region flanking the TSS) have Ascl1 motifs with increased chromatin accessibility, identified by SeATAC, MACS2 and NucleoATAC. | Figure S7b | R |
The aggregated V-plot includes 8,658, 7,687 and 7,708 Ascl1 motifs with increased chromatin accessibility identified by SeATAC only, MACS2 only and NucleoATAC only in ATAC-seq data of Ascl1 induced MEF reprogramming (undifferentiated MEFs vs. 22 days post Ascl1 induction). | Figure S7c | R |
Figures | Link | |
---|---|---|
The Venn diagrams show the number of OSK motifs with increased chromatin accessibility identified by SeATAC, MACS2 and NucleoATAC. | Figure S8a | R |
The barplots show the Gene Ontology (GO) terms that are significantly associated with the genes whose promoters (-5,000 to +1,000 bp region flanking the TSS) have OSK motifs with increased chromatin accessibility, identified by SeATAC, MACS2 and NucleoATAC. | Figure S8b | R |
The aggregated V-plot includes 5,826, 1,355 and 6,371 OSK motifs with increased chromatin accessibility identified by SeATAC only, MACS2 only and NucleoATAC only in ATAC-seq data of OSK induced MEF reprogramming | Figure S8c | R |
Figures | Link | |
---|---|---|
The dot plots compare the changes of motif associated chromatin accessibility estimated by chromVAR (x-axis) and the difference of the percent of TFBS with decreased or increased chromatin accessibility estimated by SeATAC | Figure S9a | R |
The barplots show the genomic distribution of Ascl1 binding sites with decreased (NFR->NOR) or increased (NOR->NFR) chromatin accessibility in MEF reprogramming. | Figure S9b | R |
The aggregated V-plots include 24,098 and 7,071 Ascl1 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during MEF reprogramming. | Figure S9c | R |
The heatmaps show the MNase-seq, H3K27me3, H3K36me3, H3K9ac, H3K79me2, H3K4me2, H3K4me1 and P300 ChIP-seq signals in undifferentiated MEFs of 24,098 and 7,071 Ascl1 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during the MEF reprogramming. | Figure S9d | R |
The V-plots show Ascl1 motifs with decreased chromatin accessibility at the promoters (-5,000 to +1,000 bp region flanking the TSS) of four genes (Hmga2, Elf4, Egfr and Hes1) that are down-regulated during the Ascl1 induced MEF reprogramming. | Figure S9e | R |
Figures | Link | |
---|---|---|
The dot plots compare the changes of motif associated chromatin accessibility estimated by chromVAR (x-axis) and the difference of the percent of TFBS with decreased or increased chromatin accessibility estimated by SeATAC | Figure S10a | R |
The barplots show the genomic distribution of OSK binding sites with decreased (NFR->NOR) or increased (NOR->NFR) chromatin accessibility in MEF reprogramming. | Figure S10b | R |
The aggregated V-plots include 15,825 and 4,935 OSK binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during MEF reprogramming. | Figure S10c | R |
The heatmaps show the MNase-seq, H3K27me3, H3K36me3, H3K9ac, H3K79me2, H3K4me2, H3K4me1 and P300 ChIP-seq signals in undifferentiated MEFs of 15,825 and 4,935 OSK binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during the MEF reprogramming. | Figure S10d | R |
The barplots show the percent of genes that were down-regulated, up-regulated or not changed between undifferentiated MEFs and 7 hours post OSK induction. | Figure S10e | R |
Maf and Smad3 are significantly down-regulated during the OSK induced MEF reprogramming. | Figure S10f | R |
Maf and Smad3 have OSK motifs that become significantly less accessible during the differentiation at their promoter region (-5,000 - +1,000bp region flanking the TSS). | Figure S10g | R |
Figures | Link | |
---|---|---|
(a-e) Example regions with significantly increased chromatin accessibility from undifferentiated MEFs to D7 Flk1+ samples. (f-j) Example regions with significantly decreased chromatin accessibility from undifferentiated MEFs to D7 Flk1+ samples. | Figure S11a-j | R |