4. Creating input data folder
The main script of RNAseq is the pipeline.sh file. This single bash script contains all the preprocessing steps: QC filtering with bbduk, QC check with FASTQC and, finally, alignment and gene counting with STAR. pipeline.sh takes a single input, a folder/directory containing:
- all raw FASTQ sequence files AND
- the sample metadata Excel file.
The raw FASTQ sequence files may be either compressed (gzipped) or uncompressed. Each file name must start with the sample ID, followed by an underscore and the rest of the file name. For example, projectname_date_L001.fastq.gz should be renamed to sampleid_projectname_date_L001.fastq.gz. The script identifies the sample it is processing from the part of the file name before the first underscore.
The sample metadata file contains all metadata associated with the input samples, including sample ID, genotype, condition, treatment, time, etc. For this repository, the sample metadata file must contain at least two columns: SampleID and Condition. The table below is an example of a sample metadata file, where the first two columns (SampleID and Condition) are required; the third column (FASTQ_file) and any columns after it are optional but highly recommended. A comprehensive metadata file also makes it easier to submit samples to SRA once your manuscript is published.
SampleID | Condition | FASTQ_file | Other_Sample_Info |
---|---|---|---|
Sample1A | WT | Sample1A_S8_L001_R1_001.fastq.gz | ... |
Sample1B | Mutant | Sample1B_S8_L001_R1_001.fastq.gz | ... |
Sample2A | WT | Sample2A_S8_L001_R1_001.fastq.gz | ... |
Sample2B | Mutant | Sample2B_S8_L001_R1_001.fastq.gz | ... |
Sample3A | WT | Sample3A_S8_L001_R1_001.fastq.gz | ... |
Sample3B | Mutant | Sample3B_S8_L001_R1_001.fastq.gz | ... |
... | ... | ... | ... |
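The file naming rule described above can be checked from the shell. The sketch below is an illustration only, not code taken from pipeline.sh: the helper name sample_id_from_filename is hypothetical, and it assumes the sample ID is everything before the first underscore in the file name.

```shell
# Hypothetical helper (not part of pipeline.sh): print the sample ID,
# i.e. the part of a FASTQ file name before the first underscore.
sample_id_from_filename() {
    local name
    name=$(basename "$1")        # drop any leading directory path
    printf '%s\n' "${name%%_*}"  # remove the first underscore and everything after it
}

sample_id_from_filename Sample1A_S8_L001_R1_001.fastq.gz
sample_id_from_filename /full/path/to/Sample2B_S8_L001_R1_001.fastq.gz
```

The printed IDs should match entries in the SampleID column of the metadata file. A file name without any underscore is returned unchanged, which is a quick way to spot files that still need renaming.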
NOTE: The input directory must contain the raw FASTQ files and a sample metadata Excel file. More broadly, users may organize their RNA-seq data in any project structure they choose. See the Cookiecutter Data Science project and a published guide for ideas on how to organize computational data.
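Before running the pipeline, it can help to confirm that the input folder holds both kinds of files. The following is a minimal sketch under assumptions not stated in this guide: the function name check_input_dir is hypothetical, FASTQ files are assumed to end in .fastq, .fq, .fastq.gz or .fq.gz, and the metadata file is assumed to end in .xlsx or .xls.

```shell
# Hypothetical pre-flight check (not part of pipeline.sh).
# Succeeds only if the folder holds at least one FASTQ file and
# at least one Excel file directly inside it.
check_input_dir() {
    local dir n_fastq n_meta
    dir=$1
    # count FASTQ files, compressed or uncompressed (extensions are an assumption)
    n_fastq=$(( $(find "$dir" -maxdepth 1 -type f \
        \( -name '*.fastq' -o -name '*.fq' -o -name '*.fastq.gz' -o -name '*.fq.gz' \) | wc -l) ))
    # count Excel files (extensions are an assumption)
    n_meta=$(( $(find "$dir" -maxdepth 1 -type f \
        \( -name '*.xlsx' -o -name '*.xls' \) | wc -l) ))
    echo "FASTQ files: $n_fastq, metadata Excel files: $n_meta"
    [ "$n_fastq" -ge 1 ] && [ "$n_meta" -ge 1 ]
}

# usage (the path is a placeholder):
# check_input_dir /full/path/to/input_folder && echo "input folder looks complete"
```

The check only looks at file names; it does not open the Excel file or confirm that the SampleID and Condition columns are present.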
Below is common usage of the secure copy command scp, which is one of the commands used for transferring files to/from MERCED. The other is the secure file transfer protocol command sftp. Please refer to the MERCED wiki for detailed instructions on sftp.
scp syntax -
scp FROM TO
Users can copy an individual file to MERCED using the following command on their machine -
# copy a file named descriptive_file_name.txt from your local machine
scp /full/path/to/descriptive_file_name.txt <username>@merced.ucmerced.edu:/full/path/to/destination/
# enter your MERCED password when prompted
To copy a folder of files from your local machine to MERCED -
# Ordered steps to transfer a directory with multiple FASTQ files to MERCED
# ---------------------------------------
# on your local machine run steps 1 and 2
# ---------------------------------------
# 1. bundle the FASTQ files into a single gzipped tarball
tar -cvzf descriptive_folder_name.tar.gz /full/path/to/FASTQ_file_1.fastq.gz /full/path/to/FASTQ_file_2.fastq.gz /full/path/to/FASTQ_file_3.fastq.gz ... /full/path/to/FASTQ_file_N.fastq.gz
# 2. run scp command on descriptive_folder_name.tar.gz
scp /full/path/to/descriptive_folder_name.tar.gz <username>@merced.ucmerced.edu:/full/path/to/destination/
# enter MERCED password when prompted
# ----------------------------
# on MERCED, run steps 3 and 4
# ----------------------------
# 3. log in to MERCED and create a folder called descriptive_directory_name
ssh <username>@merced.ucmerced.edu
# enter MERCED password when prompted
mkdir -p /full/path/to/descriptive_directory_name/
# 4. extract the tarball uploaded in step 2 into descriptive_directory_name
tar -xzvf /full/path/to/destination/descriptive_folder_name.tar.gz -C /full/path/to/descriptive_directory_name/
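Because tar records each file under the path given on the command line (minus the leading slash), extracting an archive built from full paths recreates those directories inside the destination folder. The sketch below is one way to avoid that, shown on throwaway files in a temporary directory; every path and file name in it is made up for illustration. It uses tar's -C option so the FASTQ files sit at the top level of the archive, lists the archive with -t, and then extracts it.

```shell
# Self-contained demo on empty placeholder files; all names are hypothetical.
workdir=$(mktemp -d)
mkdir -p "$workdir/fastq" "$workdir/dest"
touch "$workdir/fastq/Sample1A_S8_L001_R1_001.fastq.gz" \
      "$workdir/fastq/Sample1B_S8_L001_R1_001.fastq.gz"

# -C changes into the folder first, so only bare file names are stored
tar -czf "$workdir/descriptive_folder_name.tar.gz" -C "$workdir/fastq" \
    Sample1A_S8_L001_R1_001.fastq.gz Sample1B_S8_L001_R1_001.fastq.gz

# -t lists the archive contents without extracting anything
tar -tzf "$workdir/descriptive_folder_name.tar.gz"

# extraction now places the files directly inside the destination folder
tar -xzf "$workdir/descriptive_folder_name.tar.gz" -C "$workdir/dest"
ls "$workdir/dest"
```

Listing the archive before uploading it is a cheap way to confirm that every FASTQ file made it into the tarball and to see where the files will land after extraction.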
Users can also use third-party clients to transfer files to/from MERCED. FileZilla (Linux and Windows) and Cyberduck (macOS and Windows) are drag-and-drop alternatives to scp and sftp.