Testing for correlations between expression in the blood and brain
- Rename tissue ID in TPM file to sample ID in order to select the individuals that have TPM values when comparing 2 tissues
- Use the script
rename_tpm.py
:python rename_tpm.py -h usage: rename_tpm.py [-h] --infile INFILE --outfile OUTFILE Rename tissue ID in TPM file to sample ID optional arguments: -h, --help show this help message and exit --infile INFILE Input the path to the infile. This file is the file after running subset_tpm.py. Example is Blood_tpm.tsv --outfile OUTFILE Input the path to the output file where the tissue ID is renamed to sample ID
- Example:
python rename_tpm.py --infile Blood_tpm.tsv --outfile Blood_tpm_sampleID.tsv python rename_tpm.py --infile Putamen_tpm.tsv --outfile Putamen_tpm_sampleID.tsv
-
Use the script
subset_tissue_tpm_for_shared_samples.py
:python subset_tissue_tpm_for_shared_samples.py -h usage: subset_tissue_tpm_for_shared_samples.py [-h] --tissue_1_tpm TISSUE_1_TPM --tissue_2_tpm TISSUE_2_TPM --tissue_1_tpm_shared TISSUE_1_TPM_SHARED --tissue_2_tpm_shared TISSUE_2_TPM_SHARED Subset tissue tpm files to contain samples that are shared between 2 tissues optional arguments: -h, --help show this help message and exit --tissue_1_tpm TISSUE_1_TPM Input the path to tissue 1 tpm file after renaming. For example: Blood_tpm_sampleID.tsv --tissue_2_tpm TISSUE_2_TPM Input the path to tissue 2 tpm file after renaming. For example: Putamen_tpm_sampleID.tsv --tissue_1_tpm_shared TISSUE_1_TPM_SHARED Input the output file for tissue 1 that contains shared samples --tissue_2_tpm_shared TISSUE_2_TPM_SHARED Input the output file for tissue 2 that contains shared samples
- Example:
python subset_tissue_tpm_for_shared_samples.py --tissue_1_tpm Blood_tpm_sampleID.tsv --tissue_2_tpm Putamen_tpm_sampleID.tsv --tissue_1_tpm_shared Blood_tpm_sampleID_shared_Blood.tsv --tissue_2_tpm_shared Putamen_tpm_sampleID_shared_Putamen.csv
- In addition to 2 new output files
Blood_tpm_sampleID_shared_Blood.tsv
andPutamen_tpm_sampleID_shared_Putamen.csv
, this script prints out:
Number of shared samples is 160 Number of columns of tissue 1 after subsetting for shared samples is 162 Number of columns of tissue 2 after subsetting for shared samples is 162
- So you know that the number of shared samples match.
- Use the script
subset_genes_with_tpm_threshold.py
:
python subset_genes_with_tpm_threshold.py -h
usage: subset_genes_with_tpm_threshold.py [-h] --tpm_threshold TPM_THRESHOLD
--tissue_1_tpm TISSUE_1_TPM
--tissue_2_tpm TISSUE_2_TPM
--tissue_1_out TISSUE_1_OUT
--tissue_2_out TISSUE_2_OUT
Subset genes with certain tpm threshold
optional arguments:
-h, --help show this help message and exit
--tpm_threshold TPM_THRESHOLD
Input the value for the tpm threshold. For example: 1
--tissue_1_tpm TISSUE_1_TPM
Input the path to tissue 1 tpm. For example:
Blood_tpm_sampleID_shared_Putamen.tsv
--tissue_2_tpm TISSUE_2_TPM
Input the path to tissue 2 tpm. For example:
Putamen_tpm_sampleID_shared_Blood.tsv
--tissue_1_out TISSUE_1_OUT
Input the path to tissue 1 output that contains only
genes where tpm across all samples is greater than
threshold.
--tissue_2_out TISSUE_2_OUT
Input the path to tissue 2 output that contains only
genes where tpm across all samples is greater than
threshold.
- Example:
python subset_genes_with_tpm_threshold.py --tpm_threshold 1 --tissue_1_tpm Blood_tpm_sampleID_shared_Blood.tsv --tissue_2_tpm Putamen_tpm_sampleID_shared_Putamen.tsv --tissue_1_out Blood_tpm_sampleID_shared_Blood_genes_tpm_greater_than_1.tsv --tissue_2_out Putamen_tpm_sampleID_shared_Putamen_genes_tpm_greater_than_1.tsv
- This script also prints out:
Number of genes with tpm greater than threshold of 1.0 is: 3243