Gap and filter regions to be removed in fragment profile analysis
- gaps_filters_hg19.rdata and gaps_filters_hg38.rdata were produced by the script gaps_filters_hg.R and cover telomeres, centromeres, and ENCODE blacklist regions.
- AB_hg19.rdata and ab_hg38.rdata are Hi-C A/B compartments (HiC_AB_Compartments), downloaded and lifted over from here
- Download reference genome data
You can download the reference genome, the pre-built BWA index, and annotated regions (e.g., the blacklist) from ENCODE for hg38 and hg19 on the command line. The manifest file (hg38.tsv or hg19.tsv) will be generated accordingly. Currently, the ENCODE blacklist and the BWA index are mandatory entries in the manifest file. You can also create the manifest yourself from existing data, based on the template .Reference/hg38_template.tsv
## eg: ./download_build_reference.sh hg38 /your/genome/data/path/hg38
$ ./assets/Reference/download_reference.sh [GENOME] [DEST_DIR]
- Build reference genome index
If your sequencing libraries include spike-ins, you can build a new aligner index after combining the spike-in genome with the human genome. The new index information will be appended to the corresponding manifest file.
## eg: ./assets/Reference/build_reference_index.sh hg38 ./data/BAC_F19K16_F24B22.fa hg38_BAC_F19K16_F24B22 /your/genome/data/path/hg38
$ ./assets/Reference/build_reference_index.sh [GENOME] [SPIKEIN_FA] [INDEX_PREFIX] [DEST_DIR]
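The combining step described above can be pictured with a minimal sketch. It is not taken from build_reference_index.sh, which may work differently; the function name `combine_fasta` and the file names in the comments are hypothetical. It assumes both inputs are uncompressed FASTA files.

```shell
#!/bin/sh
# Sketch only: concatenate the human reference and the spike-in FASTA
# into one file that an aligner can index. Not the repository's script.
combine_fasta() {
    # $1 = human genome FASTA, $2 = spike-in FASTA, $3 = combined output
    # "awk 1" prints every line and adds a trailing newline if an input
    # lacks one, so the last sequence line of $1 never fuses with the
    # first header of $2.
    awk 1 "$1" "$2" > "$3"
}

# Hypothetical usage (the bwa step needs bwa on PATH and is not run here):
#   combine_fasta hg38.fa BAC_F19K16_F24B22.fa hg38_BAC_F19K16_F24B22.fa
#   bwa index -p hg38_BAC_F19K16_F24B22 hg38_BAC_F19K16_F24B22.fa
```

Before indexing, it is worth checking that the spike-in sequence names do not collide with human chromosome names, since duplicate names in one FASTA make alignments ambiguous.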
Spike-in FASTA sequences are included for two BACs (F19K16 from Arabidopsis Chr1 and F24B22 from Arabidopsis Chr3) and for synthetic DNAs.
- SyntheticDNA_Arabidopsis_BACs.fa consists of the Arabidopsis BAC (F19K16_F24B22) and synthetic DNA sequences.
- SyntheticDNA_Arabidopsis_BACs_seqNames.txt lists the sequence names.
- How to forge a BSgenome package for the spike-ins
- Spike-in genome
## Get the FASTA sequence and convert it to 2bit format with the UCSC tools
$ faToTwoBit BAC_F19K16_F24B22.fa BAC_F19K16_F24B22.2bit
- Forge BSgenome package
# Prepare the seed file following the BSgenome instructions
# e.g., BSgenome.Athaliana.BAC.F19K16.F24B22-seed
library(BSgenome)
forgeBSgenomeDataPkg("path/to/seed/file")
- Build package
$ R CMD build /path/to/pkgdir
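The forge step above needs a seed file, so a rough sketch of one may help. Every value below is a placeholder rather than the project's actual seed. The field names follow the BSgenome forging vignette and can differ between BSgenome versions (older releases use `provider_version` where newer ones use `genome`), so check them against the documentation for your installed version. The `seqfile_name` entry should be the .2bit file produced in the "Spike-in genome" step.

```shell
#!/bin/sh
# Sketch only: write a hypothetical seed file for the spike-in package.
# Values are placeholders; field names may vary with the BSgenome version.
seed_file="${TMPDIR:-/tmp}/BSgenome.Athaliana.BAC.F19K16.F24B22-seed"

cat > "$seed_file" <<'EOF'
Package: BSgenome.Athaliana.BAC.F19K16.F24B22
Title: Arabidopsis BAC spike-in sequences (F19K16 and F24B22)
Description: Spike-in sequences from two Arabidopsis BACs, stored as a BSgenome object.
Version: 0.0.1
organism: Arabidopsis thaliana
common_name: Thale cress
provider: BAC
genome: F19K16.F24B22
release_date: NA
source_url: NA
organism_biology_url: NA
BSgenomeObjname: Athaliana
seqs_srcdir: /path/to/2bit/dir
seqfile_name: your_spikein.2bit
circ_seqs: character(0)
EOF

echo "Seed file written to $seed_file"
```

The path passed to forgeBSgenomeDataPkg() would then be the seed file written here.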
Full lists of commonly used UMI barcodes for cfMeDIP-seq
- NNT_barcodes.txt ## Barcodes matching the pattern NNT
- UMI_barcodes_OICR.txt ## Barcode list used by the OICR protocols
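To illustrate what the NNT pattern means, the sketch below enumerates every barcode made of two variable bases followed by a fixed T. It assumes N stands for any of A, C, G, or T. The bundled NNT_barcodes.txt may be ordered or filtered differently, and the function name `gen_nnt_barcodes` is hypothetical.

```shell
#!/bin/sh
# Sketch only: list all barcodes of the form NNT, assuming N is A, C, G, or T.
gen_nnt_barcodes() {
    for first in A C G T; do
        for second in A C G T; do
            # Two variable bases followed by the fixed T
            printf '%s%sT\n' "$first" "$second"
        done
    done
}

# Print the 4 x 4 = 16 possible barcodes, one per line
gen_nnt_barcodes
```

The output can be compared against the bundled list with a sorted diff to see whether the two agree.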