UKBiobankGWAS

Notes and code for running UK Biobank GWAS at the MRC IEU

Please note that the pipeline is built on University of Bristol infrastructure, and this documentation is for internal use only.

External researchers who are interested in building the pipeline locally should refer to UK Biobank Genetic Data: MRC-IEU Quality Control, version 2.

Steps

  1. If not already done, ask the IEU data manager to set up your directories and permissions for the GWAS pipeline
  2. Create input files in your RDSF input directory
  3. Wait for files to be copied over to BC4
  4. On BC4, clone this repo and get .env
  5. Run the GWAS submission script
  6. A job is submitted to the queue that QCs the files, and then creates a new submission job for the GWAS
  7. Wait for the GWAS to complete; output files are saved on BC4 (since May 2023 they are no longer synced back to RDSF automatically, see Output files)

Locations

RDSF (backed-up)

  • Input:
    • /projects/MRC-IEU/research/data/ukbiobank/software/gwas_pipeline/dev/release_candidate/data/phenotypes/<your_username>/input
  • Output (deprecated):
    • /projects/MRC-IEU/research/data/ukbiobank/software/gwas_pipeline/dev/release_candidate/data/phenotypes/<your_username>/output

BC4 (not backed-up)

  • Input (read-only):
    • /mnt/storage/private/mrcieu/research/UKBIOBANK_GWAS_Pipeline/data/phenotypes/<your_username>/input
  • Output:
    • /mnt/storage/private/mrcieu/research/UKBIOBANK_GWAS_Pipeline/data/phenotypes/<your_username>/output

Details

Create input files on RDSF

Create jobs.csv in the RDSF input directory, with one row per GWAS job:

  • all column names must be present
  • if a field has no value, leave it empty, e.g. ,,
  • separate multiple covariates with ;
name,application_id,pheno_file,pheno_col,covar_file,covar_col,qcovar_col,method
test,123,test.txt,test_name,bolt_covariates.txt,sex;chip,age,bolt
test2,123,test.txt,test_name,bolt_covariates.txt,sex;chip,age,bolt
  • Each GWAS job is first checked to confirm that the phenotype and covariate files exist, are in the correct format, and contain the specified columns.
  • If the checks pass, a submission script is created and run as a new Slurm job.
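
Before copying jobs.csv to RDSF, it can help to check the file locally. The following is a minimal sketch, not the pipeline's own QC code: it only checks that the column names from the example above are present and that every row has as many fields as the header. The helper name check_jobs_csv is hypothetical, and the pipeline's real checks (file existence, file format, column contents) are not reproduced here.

```python
import csv
import io

# Column names taken from the jobs.csv example above.
EXPECTED_COLUMNS = [
    "name", "application_id", "pheno_file", "pheno_col",
    "covar_file", "covar_col", "qcovar_col", "method",
]


def check_jobs_csv(text):
    """Return a list of problems found in the contents of a jobs.csv file.

    Hypothetical helper for illustration; the pipeline's real checks may differ.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]

    # Every expected column name must appear in the header.
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        problems.append(f"missing columns: {missing}")

    # Empty values are fine (",,"), but the number of fields must match.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
    return problems


example = """name,application_id,pheno_file,pheno_col,covar_file,covar_col,qcovar_col,method
test,123,test.txt,test_name,bolt_covariates.txt,sex;chip,age,bolt
test2,123,test.txt,test_name,,,,bolt
"""
print(check_jobs_csv(example))
```

An empty list means no problems were found by these two checks; it does not guarantee that the job will pass the QC step on BC4.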

Create the phenotype and covariate files, and place them in the same RDSF input directory.

The input files will then be synced to the BC4 input directory.

Setup and run job submission code on BC4

Single job

Run from within the repository

sbatch scripts/ukb_gwas.sh

  • by default, this runs the first row in jobs.csv
  • a specific row can be selected using 0-based indexing, so row 3 is index 2, e.g. sbatch scripts/ukb_gwas.sh 2
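
To confirm which job a given index refers to before submitting, the row can be looked up locally. This is a small sketch under one assumption: the header line is not counted, so index 0 is the first job listed. The helper name job_for_index is hypothetical and is not part of this repository.

```python
import csv
import io


def job_for_index(jobs_csv_text, index=0):
    """Return the jobs.csv data row for a 0-based index as a dict.

    Assumption: the header line is not counted, so index 0 is the first job.
    """
    rows = list(csv.DictReader(io.StringIO(jobs_csv_text)))
    if not 0 <= index < len(rows):
        raise IndexError(f"index {index} out of range for {len(rows)} job(s)")
    return rows[index]


jobs_csv_text = """name,application_id,pheno_file,pheno_col,covar_file,covar_col,qcovar_col,method
test,123,test.txt,test_name,bolt_covariates.txt,sex;chip,age,bolt
test2,123,test.txt,test_name,bolt_covariates.txt,sex;chip,age,bolt
"""

job = job_for_index(jobs_csv_text, 1)
print(job["name"])                  # name of the second job
print(job["covar_col"].split(";"))  # multiple covariates are separated by ;
```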

Multiple jobs

Run from within this repository

for i in {0..1}; do echo $i; sbatch scripts/ukb_gwas.sh $i; done
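
The range in the loop above has to match the number of jobs in jobs.csv. As a hedged sketch, the commands for every row can be generated from the file itself; this only builds the command strings and does not submit anything. The helper name sbatch_commands is hypothetical, and it assumes the same 0-based indexing with the header line excluded.

```python
import csv
import io
import shlex


def sbatch_commands(jobs_csv_text, script="scripts/ukb_gwas.sh"):
    """Build one sbatch command per data row in jobs.csv (nothing is run)."""
    n_jobs = sum(1 for _ in csv.DictReader(io.StringIO(jobs_csv_text)))
    return [shlex.join(["sbatch", script, str(i)]) for i in range(n_jobs)]


jobs_csv_text = """name,application_id,pheno_file,pheno_col,covar_file,covar_col,qcovar_col,method
test,123,test.txt,test_name,bolt_covariates.txt,sex;chip,age,bolt
test2,123,test.txt,test_name,bolt_covariates.txt,sex;chip,age,bolt
"""

for command in sbatch_commands(jobs_csv_text):
    print(command)
```

The printed lines can be reviewed and then run from within the repository on BC4.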

Summary

Summary files can be generated and then parsed to create counts:

sbatch UKBiobankGWAS/scripts/summary.sh
python UKBiobankGWAS/scripts/summary_parser.py

Output files

The outputs will be saved in the BC4 output directory.

Please note that since May 2023 the outputs are no longer synced back automatically to the RDSF output directory, due to a storage shortage. You may still copy them manually and use this RDSF location if necessary. Existing outputs on RDSF will be kept until further notice.

To do

  • add args to allow only qc step
  • add plink