Percent Spliced-In (PSI) values are commonly used to report alternative pre-mRNA splicing (AS) changes.
However, previous PSI-detection methods are limited to specific types of AS events. PSI-Sigma is using a new splicing index (PSIΣ) that is more flexible, can incorporate novel junctions, and can compute PSI values of individual exons in complex splicing events.
- PSI-Sigma is now published: https://www.ncbi.nlm.nih.gov/pubmed/31135034
- PSI-Sigma workshop:
- Docker/Singularity version:
docker pull docker.io/woodydon/psi_sigma_pipeline:4.0
singularity pull docker://woodydon/psi_sigma_pipeline:4.0
singularity pull docker.io/woodydon/psi_sigma_pipeline:4.0
- Run PSI-Sigma with singularity (example):
singularity exec --bind /mnt:/mnt docker://woodydon/psi_sigma_pipeline:4.0 perl /usr/local/bin/PSI-Sigma-2.3/dummyai.pl --gtf Homo_sapiens.GRCh38.100.sorted.gtf --nread 10 --name PSIsigma2d1 --type 1 --fmode 3 --threads 6
- The latest release: https://github.com/wososa/PSI-Sigma/releases/tag/v2.3
- Try the "--help" and "--threads" function.
- Papers using PSI-Sigma: Nature 2019 Nature 2023
- Alignment file for nanopore long-read PCR-cDNA-seq of human U87 cells: https://dropfiles.cshl.edu/link/mpkT92runIvNJ3QeBvZPzC
Kuan-Ting (Woody) Lin, [email protected]
For short-read RNA-seq data, please generate .bam, .bai and .SJ.out files by using STAR (https://github.com/alexdobin/STAR).
###This is an example for short-read RNA-seq###
STAR --runThreadN 6 \
--outSAMtype BAM SortedByCoordinate \
--outFilterIntronMotifs RemoveNoncanonical \
--genomeDir ~/index/starR100H38 \
--twopassMode Basic \
--readFilesIn R1.fastq R2.fastq \
--outFileNamePrefix <NAME>.
samtools index <NAME>.Aligned.sortedByCoord.out.bam
For long-read RNA-seq data, please use GMAP (http://research-pub.gene.com/gmap/src/gmap-gsnap-2017-11-15.tar.gz) or minimap2 (https://github.com/lh3/minimap2).
###This is an example for long-read RNA-seq###
#Example of using GMAP#
~/gmap-2017-11-15/bin/gmap -d GRCh38 -f samse --min-trimmed-coverage=0.5 --no-chimeras -B 5 -t 6 MinION_long_read.fastq > <NAME>.sam
#Example of using minimap2#
~/minimap2-2.17/minimap2 -ax splice:hq -uf H38.fa MinION_long_read.fastq > <NAME>.sam
samtools view -bS <NAME>.sam > <NAME>.bam
samtools sort <NAME>.bam -o <NAME>.Aligned.sortedByCoord.out.bam
samtools index <NAME>.Aligned.sortedByCoord.out.bam
Create links to the .bam, .bai, and .SJ.out files in the a folder (afolder). If you are using long-read RNA-seq data, .SJ.out files will be generated automatically since GMAP doesn't produce the file.
mkdir afolder
cd afolder
ln -s bamfolder/*.bam* .
ln -s bamfolder/*.SJ.* .
Download a .gtf file and sort the coordinates. (NOTE: sorting .gtf file is necessary!)
wget ftp://ftp.ensembl.org/pub/release-87/gtf/homo_sapiens//Homo_sapiens.GRCh38.87.gtf.gz
gzip -d Homo_sapiens.GRCh38.87.gtf.gz
(grep "^#" Homo_sapiens.GRCh38.87.gtf; grep -v "^#" Homo_sapiens.GRCh38.87.gtf | sort -k1,1 -k4,4n) > Homo_sapiens.GRCh38.87.sorted.gtf
rm Homo_sapiens.GRCh38.87.gtf
Create two files: (1) groupa.txt and (2) groupb.txt. Please put the full name or the suffix of your .bam files in the groupa.txt or groupb.txt. For example, the suffix of a "Sequins_MixA.Aligned.sortedByCoord.out.bam" file is "Sequins_MixA". Groupa.txt will be compared with groupb.txt. Below is an example:
#Note: one file name per line in groupa.txt and groupb.txt
echo Sequins_MixA.Aligned.sortedByCoord.out.bam >> groupa.txt
echo Sequins_MixB.Aligned.sortedByCoord.out.bam >> groupb.txt
#Alternatively, you can put only the suffix (WARNNING: only works when the .bam files are linked to the working directory)
echo Sequins_MixA > groupa.txt
echo Sequins_MixB > groupb.txt
Run dummyai.pl. After the .gtf file, please specify 1 for short-read RNA-seq and 2 for long-read RNA-seq. The last column is used to specify the minimum number of supporting reads for an AS event (10 is specified in the example below).
#For short-read RNA-seq (minimum 10 supporting reads for an AS event)
perl ~/PSIsigma/dummyai.pl --gtf Homo_sapiens.GRCh38.87.sorted.gtf --name PSIsigma --type 1 -nread 10
#For long-read RNA-seq (minimum 10 supporting reads for an AS event)
perl ~/PSIsigma/dummyai.pl --gtf Homo_sapiens.GRCh38.87.sorted.gtf --name PSIsigma --type 2 -nread 10
That's it. Filtered results (p<0.01) will be listed in the PSIsigma_r10_ir3.sorted.txt.
- Filtered Results (p<0.01): PSIsigma_r10_ir3.sorted.txt
- Unfiltered Results: PSIsigma_r10_ir3.txt
- Junction Read File: *.SJ.out.tab
- Intronic Read File: *.IR.out.tab
- Database File: *.db
- BAM File: *.bam
- GTF File: *.gtf (http://useast.ensembl.org/info/data/ftp/index.html/)
- Event Region: Genomic coordinates of the splicing event.
- Gene Symbol: Gene symbol of the splicing event.
- Target Exon: Genomic coordinates of the alternative exon.
- Event Type: Category of the splicing event.
- N: the number of valid samples in groupa.txt (influenced by the number of supporting reads).
- T: the number of valid samples in groupb.txt (influenced by the number of supporting reads).
- Exon Type: Whether the exon is a novel exon or an exon related to nonsense mediated decay (NMD).
- Reference Transcript: The transcript ID in the gene annotation file.
- ΔPSI (%): the average difference of PSI values in groupa.txt and groupb.txt.
- T-test p-value: p-value derived from two-sample t-test.
- FDR (BH): false discovery rate based on the p-values.
- N Values: It shows all valid PSI values derived from the .SJ.out.tab files based on groupa.txt. (influenced by the number of supporting reads).
- T Values: It shows all valid PSI values derived from the .SJ.out.tab files based on groupb.txt. (influenced by the number of supporting reads).
- Database ID: It shows the accession number of the splicing event in the database of PSI-Sigma (e.g., PSIsigma.db).
- Perl (https://www.perl.org/get.html)
- Samtools (http://www.htslib.org)
- R (https://www.r-project.org) (For version 1.9k and when --adjp 2 is used)
- PDL::LiteF
- PDL::Stats
- PDL::GSL::CDF
- Statistics::Multtest
- Statistics::R (For version 1.9k and when --adjp 2 is used)
conda create -n PSIsigma r-essentials r-base perl-app-cpanminus
conda activate PSIsigma
conda install python=3.9
conda install -c conda-forge gcc gsl
cpanm PDL::LiteF
cpanm PDL::GSL::CDF
cpanm PDL::Stats
cpanm Statistics::Multtest
cpanm Statistics::R
# 1-a. If you are a sudo user. Set up working directory for Perl library (Using Perl version 5.18 as an example)
export PERL5LIB=/usr/local/lib/perl/5.18
# 1-b. If you are a local user, you can do like this (https://stackoverflow.com/questions/2980297/how-can-i-use-cpan-as-a-non-root-user)
wget -O- http://cpanmin.us | perl - -l ~/perl5 App::cpanminus local::lib
eval `perl -I ~/perl5/lib/perl5 -Mlocal::lib`
echo 'eval `perl -I ~/perl5/lib/perl5 -Mlocal::lib`' >> ~/.bashrc
echo 'export MANPATH=$HOME/perl5/man:$MANPATH' >> ~/.bashrc
# 2. Install GSL
apt-get install -y git make g++ gcc python wget libgsl0-dev
# 3. Install PDL::GSL
cpan App::cpanminus
cpanm PDL::LiteF
cpanm PDL::GSL::CDF
cpanm PDL::Stats
cpanm Statistics::Multtest
cpanm Statistics::R
PSI-Sigma has been tested in Linux and Mac OS environment. You can install Linux bash shell on Windows to run PSI-Sigma.
- Linux Bash Shell on Windows: https://www.howtogeek.com/249966/how-to-install-and-use-the-linux-bash-shell-on-windows-10/
To use the PSIsigma-longread-gene-expression.pl:
perl ~/PSIsigma/PSIsigma-longread-gene-expression.pl Homo_sapiens.GRCh38.87.sorted.gtf Experiment.Aligned.sortedByCoord.out.bam
The default setting is using 4 CPUs to calculate gene expression levels by matching constitutive exons in the gene annotation. An extra perl extension (threads) is needed.
https://www.ncbi.nlm.nih.gov/pubmed/31135034
- Lin, K. T. & Krainer, A. R. PSI-Sigma: a comprehensive splicing-detection method for short-read and long-read RNA-seq analysis. Bioinformatics, doi:10.1093/bioinformatics/btz438 (2019).
- Oxford Nanopore London Calling 2019: https://vimeo.com/339511487
https://pubmed.ncbi.nlm.nih.gov/29449409
- K. T. Lin, W. K. Ma, J. Scharner et al., A human-specific switch of alternatively spliced AFMID isoforms contributes to TP53 mutations and tumor recurrence in hepatocellular carcinoma. Genome Res, (2018).
https://pubmed.ncbi.nlm.nih.gov/31578525
- A. Yoshimi, K. T. Lin, D. H. Wiseman et al., Coordinated alterations in RNA splicing and epigenetic regulation drive leukaemogenesis. Nature 574, 273-277 (2019).
https://pubmed.ncbi.nlm.nih.gov/32001512/
- M. A. Rahman, K. T. Lin, R. K. Bradley et al., Recurrent SRSF2 mutations in MDS affect both splicing and NMD. Genes Dev 34, 413-427 (2020).
https://pubmed.ncbi.nlm.nih.gov/34921016/
- W. K. Ma, D. M. Voss, J. Scharner et al., ASO-based PKM splice-switching therapy inhibits hepatocellular carcinoma growth. Cancer Res, (2021).
https://www.nature.com/articles/s41586-023-05820-3
- A. Jbara, K. T. Lin, C. Stossel et al., RBFOX2 modulates a metastatic signature of alternative splicing in pancreatic cancer. Nature, (2023).
- For licensing, please contact CSHL tech transfer office: [email protected]