Home
Welcome to the BBCAR project wiki! Here, I document the most up-to-date workflow of the project.
Breast cancer remains a formidable global health challenge affecting women. Timely identification and prevention are pivotal in reducing mortality rates associated with the disease. Benign breast disease (BBD) diagnoses are common among women, with around one-third of BBD cases eventually progressing to breast cancer. However, BBD alone is typically not a strong enough risk factor for patients to take up preventive therapy.
This study aims to refine breast cancer risk stratification in women diagnosed with BBD. We construct predictive models from whole-exome sequencing of BBD biopsy tissues, complemented by a subset of germline sequencing data. These machine learning models are trained and evaluated for their capacity to predict breast cancer risk.
- Characterization of genomic aberrations in BBD that distinguish individuals by risk for breast cancer.
- Development of a risk stratification ML model using genomic aberrations as input features.
- Originally taken from:
  - Tumor tissue: `/projects/b1122/Zexian/Alignment/BBCAR/RAW_data/`
  - Germline: `/projects/b1122/Zexian/Alignment/Germline_37/RAW_data/`
- Copied to:
  - BBD tissue: `/projects/b1122/saya/raw/bbb_tissue/`
  - Germline: `/projects/b1122/saya/raw/germline/`
- Currently using:
  - BBD tissue: `/projects/b1131/saya/bbcar/data/00_raw/tissue/`
  - Germline: `/projects/b1131/saya/bbcar/data/00_raw/germline/`
- Clinical data location: `/projects/b1131/saya/bbcar/data/clinical/`
- Originally taken from:
  - Gannon shared a local file with me: `/Users/sayadennis/Projects/bbcar_project/GATK_Analysis_Sample_Status.xlsx`
  - Files with names starting with `BBCaRDatabaseNU09B2-*` were downloaded from the BBCAR REDCap database. NOTE: the outcome labels in REDCap are apparently not always correct! Gannon and Natalie double-checked the outcome for each patient, and the corrected labels are in `bbcar_label_studyid_from_gatk_filenames.csv`.
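
To make the distinction between the raw REDCap exports and the manually corrected labels explicit, the clinical files can be collected as in the minimal sketch below. The function name `list_clinical_sources` is hypothetical, and the sketch assumes the corrected label CSV sits in the clinical data directory, which this page does not state.

```python
from pathlib import Path

# Directory and file names copied from the list above.
CLINICAL_DIR = Path("/projects/b1131/saya/bbcar/data/clinical/")
REDCAP_PATTERN = "BBCaRDatabaseNU09B2-*"
CURATED_LABELS = "bbcar_label_studyid_from_gatk_filenames.csv"


def list_clinical_sources(clinical_dir=CLINICAL_DIR):
    """Collect REDCap exports and the curated label file (hypothetical helper).

    Assumption: the curated label CSV lives directly in `clinical_dir`.
    """
    clinical_dir = Path(clinical_dir)
    # REDCap exports share a common filename prefix.
    redcap_exports = sorted(clinical_dir.glob(REDCAP_PATTERN))
    curated = clinical_dir / CURATED_LABELS
    return {
        "redcap_exports": redcap_exports,
        # None signals that the curated file was not found at the assumed location.
        "curated_labels": curated if curated.exists() else None,
    }
```

Because the REDCap outcome labels are not always correct, downstream code should take outcomes from the `curated_labels` file when it is present and use the REDCap exports only for other clinical fields.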
- A subset of samples was sequenced at the University of Chicago, and the rest were sequenced at Indiana.
  - Which samples were sequenced at Indiana?
    - Sample IDs can be found at `/projects/b1131/saya/bbcar/data/sample_ids_uchicago.txt`
  - What is the difference?
    - U Chicago samples: use exome intervals `/projects/b1122/gannon/bbcar/RAW_data/int_lst/SureSelect_v5/`
    - Indiana samples: use exome intervals `/projects/b1122/gannon/bbcar/RAW_data/int_lst/SureSelect_v6/`
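
The site-specific interval choice above can be sketched as a small lookup. This is a minimal sketch, not the project's actual code: the helper names `load_sample_ids` and `interval_dir_for` are hypothetical, and it assumes that the sample ID file lists the U Chicago samples (as its filename suggests) with one ID per line, so that any sample not listed was sequenced at Indiana.

```python
from pathlib import Path

# Paths copied from the list above.
UCHICAGO_IDS_FILE = Path("/projects/b1131/saya/bbcar/data/sample_ids_uchicago.txt")
INTERVAL_DIRS = {
    "uchicago": Path("/projects/b1122/gannon/bbcar/RAW_data/int_lst/SureSelect_v5/"),
    "indiana": Path("/projects/b1122/gannon/bbcar/RAW_data/int_lst/SureSelect_v6/"),
}


def load_sample_ids(ids_file):
    """Read sample IDs, assuming one ID per line; blank lines are skipped."""
    with open(ids_file) as handle:
        return {line.strip() for line in handle if line.strip()}


def interval_dir_for(sample_id, uchicago_ids):
    """Return the exome interval directory for a sample.

    Assumption: samples absent from the U Chicago list were sequenced at Indiana.
    """
    site = "uchicago" if str(sample_id) in uchicago_ids else "indiana"
    return INTERVAL_DIRS[site]
```

A typical call would be `interval_dir_for(sample_id, load_sample_ids(UCHICAGO_IDS_FILE))`. IDs are compared as strings, so integer and string sample IDs are treated the same way.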
- Process data
- Create data summary
- Statistical characterization of features
- Predict breast cancer risk