ukbb Sherlock Pipeline

Prerequisites

Stanford only: Ask Jamie to email [email protected] to get a Sherlock account. Provide your SUNet ID.

Sherlock OnDemand

Stanford only: Use this online portal to access Sherlock. The genetics cluster also has an online portal. If OnDemand is not available for the computing cluster you’re connecting to, use MobaXterm or connect to the cluster from the shell.

Connecting Sherlock to Github

See this page. Run the commands in the terminal after logging into Sherlock. Note: you may need to generate a personal access token to access Git from Sherlock. The token functions as your password when you pull from or push to GitHub.

Download UKBB Data

Generate the input file from the UKBB online database. This is a text file containing the participant ID and field ID. This can be extracted with filters such as “no missing data” from the UKBB website. Also extract the key, bulk, and ukbfetch files.
Place the input file into the commands folder and the bulk, key, and ukbfetch files into the raw data folder. Note: To make ukbfetch executable you may need to run:

chmod +x ukbfetch

To submit batch jobs that download the data, go back to the code folder and run:

bash getData.txt

Note: sometimes the script only splits the input file into segments and stops before running the rest of the code. If that happens, copying the loop from the shell script and pasting it directly into the command line usually runs it to completion. The cause is unknown.
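As a rough illustration of what submitting the download in segments can look like, the sketch below builds one ukbfetch command per chunk of the bulk file and groups the commands into waves of at most 10. It is not the content of getData.txt. The flags (-b bulk file, -s starting line, -m number of lines, -a key file) are how ukbfetch is commonly invoked, but treat them as an assumption and check them against your copy of ukbfetch. The function names and file names are hypothetical.

```python
def ukbfetch_commands(n_entries, bulk_file, key_file, chunk_size=1000):
    """Build one ukbfetch command per chunk of the bulk file.

    Assumed flags: -b bulk file, -s first line to fetch (1-based),
    -m number of lines to fetch, -a key file. Verify with your ukbfetch.
    """
    commands = []
    for start in range(1, n_entries + 1, chunk_size):
        # The last chunk may be shorter than chunk_size.
        count = min(chunk_size, n_entries - start + 1)
        commands.append(
            f"./ukbfetch -b{bulk_file} -s{start} -m{count} -a{key_file}"
        )
    return commands


def in_waves(commands, max_parallel=10):
    """Group commands so that no more than max_parallel run at once."""
    return [commands[i:i + max_parallel]
            for i in range(0, len(commands), max_parallel)]
```

Each wave could then be wrapped in sbatch calls, with the next wave submitted only after the previous one has finished.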

Check to make sure the correct number of files has been downloaded. Use the inputMod.py code to check whether the expected files have been downloaded. This code will update the input files.

Repeat the download and check steps as necessary. Some reasons that repeating may be necessary: (1) the jobs failed, (2) more than 10 parallel downloads were attempted at a time, (3) participants have withdrawn and their data is no longer available.
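The check described above can be sketched as follows. This is not inputMod.py itself. It assumes each line of the bulk file holds a participant ID and a field ID separated by whitespace, and that a downloaded file is named by joining the two with an underscore plus a .cwa extension. Both are assumptions to verify against your own files.

```python
def missing_downloads(bulk_lines, downloaded_names, extension=".cwa"):
    """Return the bulk-file entries that have no downloaded file yet.

    Assumes a line such as "1000001 90001_0_0" corresponds to a file
    named "1000001_90001_0_0.cwa" (hypothetical example values).
    """
    have = set(downloaded_names)
    missing = []
    for line in bulk_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        expected = "_".join(parts) + extension
        if expected not in have:
            missing.append(" ".join(parts))
    return missing
```

Writing the returned entries to a new bulk file gives the input for the next download attempt.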

Convert .cwa files to .csv

Download the biobankAccelerometerAnalysis toolbox from GitHub:

git clone https://github.com/ZeitzerLab/biobankAccelerometerAnalysis.git
cd biobankAccelerometerAnalysis

Run the following command from the ukbb folder to download modules, generate process commands, and convert files into .csvs. NOTE: Paths will need to be set within this file.

bash CWA2CSV.txt

Check to see if the files ran completely. We occasionally ran into instances where a job was canceled.
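One way to spot canceled jobs is to look at Slurm's accounting output. The sketch below parses text produced by `sacct --format=JobID,JobName,State --parsable2` and lists the jobs that did not finish. The function name and the list of states treated as problems are choices made for this example, not part of the pipeline.

```python
BAD_STATES = ("CANCELLED", "FAILED", "TIMEOUT", "OUT_OF_MEMORY", "NODE_FAIL")


def unfinished_jobs(sacct_text):
    """Return (job_id, job_name, state) for jobs that did not complete.

    Expects pipe-delimited text with a header line, as printed by:
        sacct --format=JobID,JobName,State --parsable2
    """
    problems = []
    lines = sacct_text.strip().splitlines()
    for line in lines[1:]:  # skip the header line
        fields = line.split("|")
        if len(fields) < 3:
            continue
        job_id, job_name, state = fields[:3]
        # A canceled job is reported as "CANCELLED by <uid>", so match the prefix.
        if state.startswith(BAD_STATES):
            problems.append((job_id, job_name, state))
    return problems
```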

Unzip the .csv files

Run the following command. NOTE: Paths will need to be set within this file.

sbatch -n 1 -t 1-00:00:00 --job-name=GZ2CSV --wrap="bash /scratch/groups/jzeitzer/UKBB/Code/ukbb/GZ2CSV.txt"

Check to see if the files ran completely. We occasionally ran into instances where a job was canceled.
Run the following command to remove .gz files while keeping .csvs. NOTE: Paths will need to be set within this file.

python3 exIMP.py
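For reference, the unzip-then-clean-up idea can be sketched in a few lines of Python. This is an illustration only, not the content of GZ2CSV.txt or exIMP.py. The function name is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(gz_path, remove_gz=False):
    """Decompress name.csv.gz to name.csv next to it; optionally delete the .gz."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"not a .gz file: {gz_path}")
    csv_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(csv_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if remove_gz:
        # Delete the archive only after the .csv has been written in full.
        gz_path.unlink()
    return csv_path
```

Keeping the removal optional makes it possible to confirm that every .csv was written before any .gz file is deleted.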

Run Median Imputation Code & Calculate Median Day

Run the following command. NOTE: Paths will need to be set within this file.

bash CSV2MEDDAY.txt
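The repository's medianImputation.m is not reproduced here, so the following is only an assumption about what a median day and a median imputation might look like: the median day is the median across days at each time of day, and a missing sample is replaced by the median-day value for its time slot. Missing samples are represented as None, and the recording is assumed to start at the first slot of a day.

```python
from statistics import median


def median_day(values, samples_per_day):
    """Median across days for each time-of-day slot, ignoring missing samples."""
    slots = [[] for _ in range(samples_per_day)]
    for i, v in enumerate(values):
        if v is not None:
            slots[i % samples_per_day].append(v)
    # A slot with no observed samples has no median.
    return [median(s) if s else None for s in slots]


def impute_missing(values, samples_per_day):
    """Replace each missing sample with the median-day value for its slot."""
    profile = median_day(values, samples_per_day)
    return [profile[i % samples_per_day] if v is None else v
            for i, v in enumerate(values)]
```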

Run IVIS (nparACT) code

Run the following command. NOTE: Paths will need to be set within this file.

bash CSV2IVIS.txt
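For orientation, the two statistics behind the IVIS name, interdaily stability (IS) and intradaily variability (IV), can be written out directly from their standard definitions. The sketch below is not calcIVIS.R and does not reproduce any preprocessing that nparACT applies. It assumes a complete, evenly sampled series with p samples per day (for example, hourly means with p = 24) that starts at the first slot of a day.

```python
def interdaily_stability(x, p=24):
    """IS: variance of the average daily profile relative to total variance."""
    n = len(x)
    mean = sum(x) / n
    total = sum((v - mean) ** 2 for v in x)
    bins = [[] for _ in range(p)]
    for i, v in enumerate(x):
        bins[i % p].append(v)  # collect all samples at the same time of day
    between = sum((sum(b) / len(b) - mean) ** 2 for b in bins)
    return (n * between) / (p * total)


def intradaily_variability(x):
    """IV: mean squared successive difference relative to total variance."""
    n = len(x)
    mean = sum(x) / n
    total = sum((v - mean) ** 2 for v in x)
    diffs = sum((x[i] - x[i - 1]) ** 2 for i in range(1, n))
    return (n * diffs) / ((n - 1) * total)
```

A series that repeats identically every day gives IS = 1. Both functions require a series that is not constant, and interdaily_stability needs at least p samples.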

Appendix

Basics and Navigating in the Terminal

ls - Used to see files within your current directory
ls <folder> - Used to see files within a different folder
pwd - Used to determine your current working directory
cd - Used to navigate to your base directory
cd <folder> - Used to navigate to a specific folder
cd .. - Used to navigate up one folder level
mkdir <folder> - Used to make a new folder in a specific location
touch <file> - Used to create a new empty file
mv <file> <folder> - Used to move a file to a specific location
mv <old name> <new name> - Used to rename a file
cp <file> <folder> - Used to copy a file
cp <file> <new name> - Used to make a copy of a file with a new name
edit <file> - Used to open a file in a text editor
vim <file> - Used to open a file in the vim text editor
nano <file> - Used to open a file in the nano text editor

Angle brackets mark placeholders for your own file and folder names.

Run missing data characterization code

Install the R package dplyr
Run the following command. Make sure the paths are correct and that the proper missing data algorithm is called. There are two checks: a missing data check that determines the total amount of missing data, and a missing data length check that determines the number of chunks of missing data and their lengths.

bash CSV2MISSING.txt
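The two checks can be illustrated with a short sketch. It is not missingDataCheck.R or missingDataLength.R. It only shows the idea of counting the total missing samples and the lengths of consecutive missing chunks, with missing samples represented as None.

```python
def missing_summary(values):
    """Count missing samples and the lengths of consecutive missing chunks."""
    runs = []
    current = 0
    for v in values:
        if v is None:
            current += 1          # extend the current chunk
        elif current:
            runs.append(current)  # an observed sample closes the chunk
            current = 0
    if current:
        runs.append(current)      # the series ended inside a chunk
    return {
        "total_missing": sum(runs),
        "n_chunks": len(runs),
        "chunk_lengths": runs,
    }
```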

Run the compilation code to put all outputs in a single file. Make sure the paths are correct.

bash stitchCSV.txt
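Combining per-batch outputs into one file amounts to concatenating CSV files that share a header. The sketch below shows that idea under the assumption that every output file has the same header row. It is not stitchCSV.py, and the function name is hypothetical.

```python
import csv


def stitch_csvs(input_paths, output_path):
    """Concatenate CSV files with identical headers; return the data rows written."""
    header = None
    rows_written = 0
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in input_paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to add
                if header is None:
                    header = file_header
                    writer.writerow(header)  # write the header once
                elif file_header != header:
                    raise ValueError(f"{path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Comparing the returned row count with the number of participants processed is a quick check that no batch was left out.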

Download Data from sherlock onto External drive

Open a MobaXterm shell and navigate to the external drive.
Run this command to download files from Sherlock to the external drive:

rsync -avP [email protected]:/scratch/groups/jzeitzer/UKBB/Data/Outputs/Batch_fullfiles/timeSeries/ .

The following command may be necessary to adjust the permissions on the files after they are downloaded; otherwise you may not be able to open or modify them.

chmod 777 <file>

Files greater than 2000

Check whether any files contain acceleration values greater than 2000:

sbatch -n 1 -t 1-00:00:00 --job-name=MoreThan2000 --wrap="R --save < index2000.R"
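The idea behind this check can be sketched as a scan for the largest value in each file. This is not index2000.R. The column position of the acceleration values (column index 1 here) is an assumption that depends on the layout of your .csv files, and the function names are hypothetical.

```python
import csv


def max_in_column(csv_path, column=1):
    """Largest numeric value in one column; header and non-numeric cells are skipped."""
    largest = None
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) <= column:
                continue
            try:
                value = float(row[column])
            except ValueError:
                continue  # header text or empty cell
            if value != value:
                continue  # NaN never compares equal to itself
            if largest is None or value > largest:
                largest = value
    return largest


def files_over_threshold(csv_paths, threshold=2000, column=1):
    """Return the paths whose maximum value in the column exceeds the threshold."""
    flagged = []
    for path in csv_paths:
        largest = max_in_column(path, column)
        if largest is not None and largest > threshold:
            flagged.append(path)
    return flagged
```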
