sayadennis edited this page Jun 8, 2023 · 4 revisions

Welcome to the BBCAR project wiki! Here, I document the most up-to-date workflow of the project.

Project goals

  • Characterization of genomic aberrations that distinguish individuals by risk for breast cancer.
  • Development of a multi-view ML model for risk stratification.

Data description

Sequencing read files

  • Sequencing data location:
    • BBD tissue: /projects/b1122/saya/raw/bbb_tissue/
    • Germline: /projects/b1122/saya/raw/germline/
  • Originally taken from:
    • Tumor tissue: /projects/b1122/Zexian/Alignment/BBCAR/RAW_data/
    • Germline: /projects/b1122/Zexian/Alignment/Germline_37/RAW_data/

Clinical data files

  • Clinical data location: /projects/b1131/saya/bbcar/data/clinical/
  • Originally taken from:
    • A local file shared by Gannon: /Users/sayadennis/Projects/bbcar_project/GATK_Analysis_Sample_Status.xlsx
    • Files with names starting with BBCaRDatabaseNU09B2-* were downloaded from the BBCAR REDCap database. NOTE: the outcome labels in REDCap are apparently not always correct! Gannon and Natalie double-checked the outcome for each patient, and the corrected labels are in bbcar_label_studyid_from_gatk_filenames.csv.

Sequencing metadata files

  • A subset of samples were sequenced at University of Chicago, and the rest were sequenced at Indiana.
    • Which samples were sequenced at which site?
      • Judging by the filename, the U Chicago sample IDs are listed at /projects/b1131/saya/bbcar/data/sample_ids_uchicago.txt; all other samples were sequenced at Indiana.
    • What's the difference?
      • U Chicago samples: use the exome intervals in /projects/b1122/gannon/bbcar/RAW_data/int_lst/SureSelect_v5/
      • Indiana samples: use the exome intervals in /projects/b1122/gannon/bbcar/RAW_data/int_lst/SureSelect_v6/
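
As a minimal sketch of how the site distinction above could be applied in a script, the snippet below maps a sample ID to its exome interval directory. It assumes, as the filename suggests, that sample_ids_uchicago.txt lists the U Chicago samples, and that it holds one ID per line; the file format is not documented here, and the function names are hypothetical rather than part of the project code.

```python
from pathlib import Path

# Paths copied from this page.
UCHICAGO_IDS_FILE = Path("/projects/b1131/saya/bbcar/data/sample_ids_uchicago.txt")
INTERVAL_DIRS = {
    "uchicago": Path("/projects/b1122/gannon/bbcar/RAW_data/int_lst/SureSelect_v5/"),
    "indiana": Path("/projects/b1122/gannon/bbcar/RAW_data/int_lst/SureSelect_v6/"),
}


def parse_sample_ids(text):
    """Return the set of sample IDs in `text`.

    Assumption: one ID per line; blank lines and surrounding
    whitespace are ignored.
    """
    return {line.strip() for line in text.splitlines() if line.strip()}


def load_uchicago_ids(path=UCHICAGO_IDS_FILE):
    """Read the U Chicago sample ID list from disk (hypothetical helper)."""
    return parse_sample_ids(Path(path).read_text())


def sequencing_site(sample_id, uchicago_ids):
    """Samples not in the U Chicago list are treated as Indiana samples."""
    return "uchicago" if str(sample_id).strip() in uchicago_ids else "indiana"


def interval_dir(sample_id, uchicago_ids):
    """Return the exome interval directory matching the sample's site."""
    return INTERVAL_DIRS[sequencing_site(sample_id, uchicago_ids)]
```

On the cluster, this could be used as `interval_dir(sample_id, load_uchicago_ids())`. Treating every unlisted ID as an Indiana sample follows the statement that the rest were sequenced at Indiana, but it also means a mistyped ID silently falls into the Indiana group, so validating IDs against the full sample list first may be worthwhile.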

Workflow

  1. Process data
  2. Create data summary
  3. Statistical characterization of features
  4. Predict breast cancer risk