5. Running RNAseq pipeline as a MERCED job
Users can run the whole RNAseq pipeline as a job on the MERCED cluster. To do so, first set up the input data folder as described in chapter four, then edit the job template file and submit it as a job on the MERCED cluster.
In the job template file, some text is written in capital letters inside square brackets. This styling marks the values that users must replace. The required changes are:
- On line 2, [ENTER EMAIL ADDRESS]: Enter your email address. SLURM uses it to notify you about the status of your job. You will get an email when the job run starts, ends, fails or is aborted, so that you can proceed accordingly.
  - NOTE: You will get email notifications when a job is submitted, starts and completes/fails. Therefore, each job run will send out at least three emails from SLURM. Users are recommended to create an inbox rule to help manage emails from the MERCED SLURM system, or to delete lines two and three of the template to not receive any emails from SLURM.
- On line 9, [ENTER_JOB_NAME]: Enter a suitable name for this job run. An informative name makes it easier to track your job run. A good example is username_rnaseq, such as bobcat_rnaseq.
- On line 19, [INSERT FULL PATH TO INPUT DATA FOLDER]: This is the most crucial information, and it must be supplied correctly to avoid major issues with job submission. Users must supply the full path to their input folder. To get the full path to the input directory, run realpath /path/to/input_data_folder/ and copy and paste the result into the job template file.
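To illustrate what realpath returns, here is a minimal sketch. The scratch directory and the folder name rnaseq_data are stand-ins for your own input data folder, not paths that exist on the cluster.

```shell
# Hypothetical example: a scratch directory stands in for your input data folder.
workdir="$(mktemp -d)"
mkdir -p "$workdir/rnaseq_data"
cd "$workdir"

# A relative path goes in; an absolute path (starting with /) comes out.
full_path="$(realpath rnaseq_data/)"
echo "$full_path"
```

The printed value is what belongs in the INPUTFOLDER line of the job template.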
#!/bin/bash -l
#SBATCH --mail-user=[ENTER EMAIL ADDRESS] # Notifications about job run
#SBATCH --mail-type=ALL # Email notifications on start, end, abort and cancel job status (DO NOT CHANGE)
#SBATCH --nodes=1 # 1 node requested (DO NOT CHANGE)
#SBATCH --ntasks=20 # 20 CPUs requested (DO NOT CHANGE)
#SBATCH -p fast.q # Using fast.q queue (DO NOT CHANGE)
#SBATCH --time=0-01:00:00 # 1 hour wall time limit for job, after an hour the job will be automatically killed
#SBATCH --output=/home/bobcat/rnaseq_preprocess.stdout # Error messages will be saved to this file (DO NOT CHANGE)
#SBATCH --job-name=bobcat_rnaseq # Name of this job run, can be changed as per your convenience (NO_SPACES_PERMITTED)
#SBATCH --export=ALL # Environment variables propagated to job environment

# Load anaconda3 module
module load anaconda3

# Load RNAseq environment
source activate RNA-seq

# Command to run preprocessing and differential gene expression
INPUTFOLDER="/home/bobcat/rnaseq_data/"
pipeline.sh "$INPUTFOLDER" > "$INPUTFOLDER"/preprocess.log
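The three placeholder edits can also be applied from the command line instead of a text editor. The sketch below is one possible way to do that with sed. It assumes the placeholders appear in the template exactly as spelled in this chapter; the function name fill_template, the file names and the example values are hypothetical and not part of the pipeline.

```shell
# Sketch: fill in the three placeholders of a job template with sed.
# Assumes the placeholders appear literally as written in this chapter.
fill_template() {
    local template="$1"
    local email="$2"
    local job_name="$3"
    local input_dir="$4"
    # "|" is the sed delimiter, so slashes in the path need no escaping.
    # "\[" matches a literal opening bracket; "]" is already literal here.
    # Values containing "|", "&" or "\" would need extra escaping.
    sed -e "s|\[ENTER EMAIL ADDRESS]|${email}|" \
        -e "s|\[ENTER_JOB_NAME]|${job_name}|" \
        -e "s|\[INSERT FULL PATH TO INPUT DATA FOLDER]|${input_dir}|" \
        "$template"
}

# Possible usage (hypothetical file names and values):
#   fill_template job_template.sh bobcat@example.edu bobcat_rnaseq \
#       "$(realpath rnaseq_data/)" > rnaseq_data/my_job.sh
```

The function writes the filled-in template to standard output and leaves the original template untouched, so the same template can be reused for other runs.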
The last line of the job template file runs the whole analysis. The > "$INPUTFOLDER"/preprocess.log part of the command saves the information printed by the software used in the analysis into a file called preprocess.log in the input data folder. Once the run is complete, users can go through preprocess.log to check for any issues with the analysis run. The file also records the parameters that were used for specific analyses, which makes it useful when writing the methods section of a manuscript.
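Note that a plain > redirect captures standard output only. Anything a tool prints to standard error would not land in preprocess.log; in a SLURM job it presumably ends up in the file given by --output instead. The sketch below shows the difference using a hypothetical stand-in function (fake_pipeline) rather than the real pipeline.sh.

```shell
# Hypothetical stand-in for pipeline.sh: one line to stdout, one to stderr.
fake_pipeline() {
    echo "alignment finished"
    echo "warning: low read count" >&2
}

logdir="$(mktemp -d)"

# ">" alone captures standard output; standard error goes to a separate file here.
fake_pipeline > "$logdir/stdout_only.log" 2> "$logdir/stderr.log"

# Adding "2>&1" after the redirect sends standard error to the same log.
fake_pipeline > "$logdir/combined.log" 2>&1
```

If a single log with both streams is preferred, appending 2>&1 to the last line of the job template is one option; whether that is desirable depends on how noisy the tools are.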
Once the job template file has been edited, save it in your input data folder; its presence there will not affect the analysis. Users can submit their job by running sbatch /path/to/job_template_file. Once the job is submitted, users will receive a message confirming successful submission, along with the job ID associated with the run. Users can check the status of their job by running either squeue or squeue -lu <username>.
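The job ID in the submission message can be reused to query just that job. The sketch below assumes sbatch prints its usual "Submitted batch job <ID>" confirmation; the helper name job_id_from_message is hypothetical.

```shell
# Extract the numeric job ID from sbatch's confirmation message.
job_id_from_message() {
    # sbatch normally prints "Submitted batch job <ID>"; keep the last field.
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $NF }'
}

# On the cluster, this could be used as follows:
#   message="$(sbatch /path/to/job_template_file)"
#   squeue -j "$(job_id_from_message "$message")"
```

Querying by job ID with squeue -j shows only the submitted run, which can be easier to read than the full queue.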