Home
sayadennis edited this page Jun 8, 2023
·
4 revisions
Welcome to the BBCAR project wiki! Here, I document the most up-to-date workflow of the project.
- Characterization of genomic aberrations that distinguish individuals by breast cancer risk.
- Development of a multi-view ML model for risk stratification.
- Sequencing data location:
- BBD tissue:
/projects/b1122/saya/raw/bbb_tissue/
- Germline:
/projects/b1122/saya/raw/germline/
- Originally taken from:
- Tumor tissue:
/projects/b1122/Zexian/Alignment/BBCAR/RAW_data/
- Germline:
/projects/b1122/Zexian/Alignment/Germline_37/RAW_data/
- Clinical data location:
/projects/b1131/saya/bbcar/data/clinical/
- Originally taken from:
- Gannon shared local file with me:
/Users/sayadennis/Projects/bbcar_project/GATK_Analysis_Sample_Status.xlsx
- Files with names starting with
BBCaRDatabaseNU09B2-*
were downloaded from the BBCAR REDCap database. NOTE: the outcome labels in REDCap are not always correct. Gannon and Natalie double-checked the outcome for each patient, and the corrected labels are in
bbcar_label_studyid_from_gatk_filenames.csv
- A subset of samples were sequenced at University of Chicago, and the rest were sequenced at Indiana.
- Which samples were sequenced at Indiana?
- Going by its name, the file below lists the U Chicago sample IDs; the remaining samples were sequenced at Indiana:
/projects/b1131/saya/bbcar/data/sample_ids_uchicago.txt
- What's the difference?
- U Chicago samples: use the exome intervals in
/projects/b1122/gannon/bbcar/RAW_data/int_lst/SureSelect_v5/
- Indiana samples: use the exome intervals in
/projects/b1122/gannon/bbcar/RAW_data/int_lst/SureSelect_v6/
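The site-dependent interval choice above could be sketched in Python roughly as follows. This is only an illustration, not the project's actual code: the function names are hypothetical, and it assumes that `sample_ids_uchicago.txt` holds one sample ID per line and that any ID absent from that file belongs to an Indiana sample.

```python
from pathlib import Path

# Paths taken from this wiki page.
UCHICAGO_IDS_FILE = Path("/projects/b1131/saya/bbcar/data/sample_ids_uchicago.txt")
INTERVAL_ROOT = Path("/projects/b1122/gannon/bbcar/RAW_data/int_lst")


def load_uchicago_ids(path=UCHICAGO_IDS_FILE):
    """Read sample IDs into a set (assumption: one ID per line)."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def interval_dir(sample_id, uchicago_ids):
    """Return the exome interval directory for a sample.

    Assumption: IDs not listed in the U Chicago file are Indiana samples.
    """
    if str(sample_id) in uchicago_ids:
        return INTERVAL_ROOT / "SureSelect_v5"  # U Chicago
    return INTERVAL_ROOT / "SureSelect_v6"      # Indiana
```

For example, with a made-up ID set `{"101", "102"}`, `interval_dir("101", {"101", "102"})` resolves to the `SureSelect_v5` directory, while any other ID resolves to `SureSelect_v6`.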
- Process data
- Create data summary
- Statistical characterization of features
- Predict breast cancer risk