Stanford only: Ask Jamie to email [email protected] to get a sherlock account. Provide your SUNet ID.
Stanford only: Use this online portal to access Sherlock. The genetics cluster also has an online portal. If onDemand is not available for the computing cluster you’re connecting to, use Mobaxterm or connect to the cluster from the shell.
See this page. Run commands in the terminal after logging into Sherlock. Note: you may need to generate a personal access token to access git from sherlock. This will function as your password when requesting to pull/push from/to github
-
Generate the input file from the UKBB online database. This is a text file containing the participant ID and field ID. This can be extracted with filters such as “no missing data” from the UKBB website. Also extract the key, bulk, and ukbfetch files.
-
Place the input file into the commands folder and the bulk, key, and ukbfetch into the raw data folder. Note: To make ukbfetch an executable you may need to run
chmod 777 ukbfetch
- To submit batch jobs to download data go back to the code folder and run
bash getData.txt
Note: sometimes only the input files are segmented rather than running the entire code. It usually runs fully if you copy and paste the loop within the shell script into the command line. I have no idea why.
- Check to make sure the correct number of files has been downloaded. Use the inputMod.py code to check to see if the expected flies have been run. This code will update the input files
Repeat steps 3 and 4 as necessary. Some reasons that repeating may be necessary (1) the jobs failed, (2) trying to download from more than 10 parallel computers at a time, (3) participants have withdrawn and data is no longer available.
- Download the biobankAccelerometerAnalysis toolbox from github
git clone https://github.com/ZeitzerLab/biobankAccelerometerAnalysis.git cd biobankAccelerometerAnalysis
- Run the following command from the ukbb folder to download modules, generate process commands, and convert files into .csvs. NOTE: Paths will need to be set within this file.
bash CWA2CSV.txt
- Check to see if the files ran completely. We occasionally ran into instances where a job was canceled.
- Run the following command. NOTE: Paths will need to be set within this file.
sbatch -n 1 -t 1-00:00:00 --job-name=GZ2CSV --wrap="bash /scratch/groups/jzeitzer/UKBB/Code/ukbb/GZ2CSV.txt"
-
Check to see if the files ran completely. We occasionally ran into instances where a job was canceled.
-
Run the following command to remove .gz files while keeping .csvs. NOTE: Paths will need to be set within this file.
python3 exIMP.py
- Run the following command. NOTE: Paths will need to be set within this file.
bash CSV2MEDDAY.txt
- Run the following command. NOTE: Paths will need to be set within this file.
bash CSV2IVIS.txt
ls - Used to see files within your current directory Ls - Used to see files within a different folder pwd - Used to determine your current working directory cd - Used to navigate to your base directory cd - Used to navigate to a specific folder cd .. - Used to navigate up one folder level mkdir - Used to make a new folder in a specific location touch - Used to create a new empty file mv - Used to move a file to a specific location mv - Used to rename a file cp - Used to copy a file cp - Used to make a copy of a file with a new name edit - Used to open a file in a text editor vim - Used to open file in vim text editor nano - Used to open file in nano text editor
-
Install the R package dplyr
-
Run the following command - make sure paths are correct and that proper missing data algorithm is called. There’s a missing data check that determines the total missing data and a missing data length check that determines the number of chunks of missing data and their length
bash CSV2MISSING.txt
- Run compilation code to put all outputs in a single file - make sure paths are correct
bash stitchCSV.txt
-
Open mobaxterm shell and navigate to external drive in shell
-
Run this command to download files from sherlock to external drive
rsync -avP [email protected]:/scratch/groups/jzeitzer/UKBB?Data/Outputs/Batch_fullfiles/timeSeries/ .
- The following command may be necessary to adjust the permissions on the files after they are downloaded, otherwise they may not be able to be executed upon
chmod 777
- Check to see if there are files with acceleration greater than 2000
sbatch -n 1 -t 1-00:00:00 --job-name=MoreThan2000 --wrap="R --save < index2000.R”