Bhandari BK, Lim CS, Remus DM, Chen A, van Dolleweerd C, Gardner PP. (2021). Analysis of 11,430 recombinant protein production experiments reveals that protein yield is tunable by synonymous codon changes of translation initiation sites. PLOS Computational Biology. 17(10), e1009461. DOI:10.1371/journal.pcbi.1009461
Note: TIsigner is not intended for optimising sequences that encode signal peptides. Optimising RNA accessibility could interfere with signal peptide translation arrest, potentially leading to unintended consequences.
- This repository contains the scripts and Jupyter notebooks to reproduce the results and figures of this preprint. The source code of the TIsigner web server is available here.
- Dependencies can be installed using Anaconda3. For example,
conda install -c bioconda viennarna
ViennaRNA can also be installed according to the instructions here.
- IXnos requires Python 2 to run.
- openen.py is a wrapper that runs RNAplfold in multiple processes. It is useful for calculating the opening energies of sequences in a multi-FASTA file. The output can be analysed as in Fig1_2_S1_S2.ipynb.
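The idea behind such a wrapper can be sketched as follows. This is a minimal illustration, not the code of openen.py: the function names (`read_fasta`, `plfold_command`, `fold_one`, `fold_all`) are hypothetical, and it assumes that `RNAplfold` is on the `PATH` and that its `-O` switch writes opening energies to `<name>_openen`, with `<name>` taken from the FASTA header. The window and segment lengths mirror the `-W` and `-u` defaults listed in the help text.

```python
import multiprocessing as mp
import subprocess

# Default pET 5'UTR quoted in the help text for -U (71 nt).
PET_UTR = ("GGGGAATTGTGAGCGGATAACAATTCCCCTCT"
           "AGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACAT")


def read_fasta(text):
    """Parse multi-FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or ["unnamed"])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def plfold_command(winsize=210, ulength=210):
    """Build the RNAplfold call; -O is assumed to request opening energies."""
    return ["RNAplfold", "-W", str(winsize), "-u", str(ulength), "-O"]


def fold_one(record, utr=PET_UTR):
    """Run RNAplfold on one record with the 5'UTR prepended (hypothetical helper)."""
    name, seq = record
    fasta = ">{}\n{}{}\n".format(name, utr, seq)
    subprocess.run(plfold_command(), input=fasta, text=True, check=True)
    return name + "_openen"  # assumed output file name


def fold_all(records, processes=None):
    """Fold all records in parallel, one RNAplfold run per sequence."""
    processes = processes or max(1, mp.cpu_count() // 2)
    with mp.Pool(processes) as pool:
        return pool.map(fold_one, records)
```

Calling `fold_all(read_fasta(open("seqs.fa").read()))` under an `if __name__ == "__main__":` guard would then produce one output file per sequence; `seqs.fa` is a placeholder file name.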
$ python openen.py -h
usage: openen.py [-h] -s STR [-U STR/INT] [-x] [-W INT] [-u INT] [-S] [-n INT]
                 [-t INT] [-e] [-i INT] [-l INT] [-r] [-o STR] [-p INT]

RNAplfold wrapper using multiprocesses

optional arguments:
  -h, --help            show this help message and exit
  -s STR, --sequence STR
                        Sequences in fasta or csv format
  -U STR/INT, --utr STR/INT
                        Use an integer if 5UTR presence, e.g., -U 1. Use your
                        own 5UTR sequence if your plasmid backbone is not of
                        pET vector. Default = GGGGAATTGTGAGCGGATAACAATTCCCCTCT
                        AGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACAT
  -x, --execute         Run RNAplfold multiprocessing
  -W INT, --winsize INT
                        Average the pair probabilities over windows of given
                        size. An RNAplfold option. Default = 210
  -u INT, --ulength INT
                        Compute the mean probability that subsegments of
                        length 1 to a given length are unpaired. An RNAplfold
                        option. Default = 210
  -S, --stack           Stack _openen dataframes to single-column dataframes,
                        concatenate them as a single pandas dataframe and
                        output it as a .pkl pickle file. Requires i and j
                        options
  -n INT, --utrlength INT
                        The length of 5UTR. Related to option -S and -e.
                        Default = 71
  -t INT, --distance INT
                        Downstream distance to start codon to include when
                        stacking. Related to option -S. Default = 100
  -e, --parse           Parsing _openen dataframes to get opening energy of
                        unpaired subsegments. Requires i and l options
  -i INT, --ipos INT    Position i centered at start codon of an input
                        sequence. Related to option -e. Default = 18
  -l INT, --length INT  Subsegment l as in _openen file. Related to option -e.
                        Default = 48
  -r, --remove          Remove _openen and .ps files
  -o STR, --output STR  Output file name for .pkl. Related to -S. Default =
                        openen
  -p INT, --processes INT
                        Number of processes to spawn. Default = half of the
                        number of CPU
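To make the `-e`, `-i` and `-l` options more concrete, here is a rough sketch of how one value could be looked up in an `_openen` file. It is not the parser used by openen.py. It assumes the usual RNAplfold layout: two header lines starting with `#`, then one tab-separated row per position `i`, where column `l` holds the value for the segment of length `l` that ends at `i`, and `NA` marks segments that would start before position 1. The function names are hypothetical, and the way openen.py maps `-i` and the 5'UTR length onto a row is not shown, so the sketch takes the absolute row index directly.

```python
def parse_openen(text):
    """Return {i: [value for l=1, l=2, ...]}, with None where the file has NA."""
    table = {}
    for line in text.splitlines():
        # Skip blank lines and the header lines, which start with '#'.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split()
        table[int(fields[0])] = [
            None if value == "NA" else float(value) for value in fields[1:]
        ]
    return table


def opening_energy(table, i, length):
    """Opening energy of the `length`-nt segment that ends at position i."""
    value = table[i][length - 1]
    if value is None:
        raise ValueError("segment would start before position 1")
    return value


# Tiny made-up table in the assumed layout (values are not real energies).
sample = (
    "#opening energies\n"
    " #i$\tl=1\t2\t3\t\n"
    "1\t0.5\tNA\tNA\t\n"
    "2\t0.25\t1.5\tNA\t\n"
    "3\t0.75\t1.0\t2.5\t\n"
)
table = parse_openen(sample)
print(opening_energy(table, 3, 2))  # segment covering positions 2..3
```

With real output, `text` would be the contents of a `<name>_openen` file, and `length` would correspond to the `-l` value.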
© Bikash Kumar Bhandari, Chun Shem Lim, Paul P Gardner (2019-)