#LyX 2.2 created this file. For more info see http://www.lyx.org/
\lyxformat 508
\begin_document
\begin_header
\save_transient_properties true
\origin unavailable
\textclass article
\begin_preamble
\usepackage[font=small,labelfont=bf]{caption}

\newcommand*\name[1]{#1}                                                                                                                                                                                                                       
\newcommand*\abbrv[1]{#1}                                                                                                                                                                                                                      
\newcommand*\rpkg[1]{\textit{#1}}                                                                                                                                                                                                              
\newcommand*\file[1]{\textit{#1}}                                                                                                                                                                                                              
\newcommand{\code}[1]{\texttt{#1}}                                                                                                                                                                                                             
                                                                                                                                                                                                                                               
\newcommand*\protein[1]{#1}                                                                                                                                                                                                                    
\newcommand*\gene[1]{\textit{#1}}                                                                                                                                                                                                              
\end_preamble
\use_default_options true
\maintain_unincluded_children false
\language british
\language_package default
\inputencoding auto
\fontencoding global
\font_roman "default" "default"
\font_sans "default" "default"
\font_typewriter "default" "default"
\font_math "auto" "auto"
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100 100
\font_tt_scale 100 100
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\paperfontsize default
\spacing single
\use_hyperref false
\papersize default
\use_geometry false
\use_package amsmath 1
\use_package amssymb 1
\use_package cancel 1
\use_package esint 1
\use_package mathdots 1
\use_package mathtools 1
\use_package mhchem 1
\use_package stackrel 1
\use_package stmaryrd 1
\use_package undertilde 1
\cite_engine basic
\cite_engine_type default
\biblio_style plain
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\justification true
\use_refstyle 1
\index Index
\shortcut idx
\color #008000
\end_index
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header

\begin_body

\begin_layout Part
Gene set methods for drug response
\end_layout

\begin_layout Standard
Pathway methods are often used in a cancer context, both for cell lines
 and primary tumours.
 Most of the time, the method of choice is to take a gene set from either
 Gene Ontology (GO) 
\begin_inset CommandInset citation
LatexCommand cite
key "Ashburner2000-xl"

\end_inset

, KEGG 
\begin_inset CommandInset citation
LatexCommand cite
key "Kanehisa2000-mp"

\end_inset

 or Reactome 
\begin_inset CommandInset citation
LatexCommand cite
key "Croft2011-po"

\end_inset

, and calculate a combined expression score using either a Fisher's exact
 test (e.g.
 by a tool called DAVID
\begin_inset Foot
status open

\begin_layout Plain Layout
note that although this tool is still widely used, it was last updated in
 2010 and lacks many annotations 
\begin_inset CommandInset citation
LatexCommand cite
key "Wadi2016-wc"

\end_inset


\end_layout

\end_inset

) if one is to test gene sets against differentially expressed genes, or
 some variant of Gene Set Enrichment Analysis (GSEA) 
\begin_inset CommandInset citation
LatexCommand cite
key "Subramanian2005-pd"

\end_inset

 if the sets are pre-defined, but one wants to avoid cutting off continuous
 expression values at an arbitrary threshold.
 There are, however, more advanced pathway methods available.
 Signalling Pathway Impact Analysis 
\begin_inset CommandInset citation
LatexCommand cite
key "Tarca2008-ey"

\end_inset

 and Differential Expression Analysis for Pathways 
\begin_inset CommandInset citation
LatexCommand cite
key "Haynes2013-eo"

\end_inset

 take into account the directionality and sign of edges in a pathway.
 Pathifier 
\begin_inset CommandInset citation
LatexCommand cite
key "Drier2013-vn"

\end_inset

 calculates probable information flow between the set items.
 PARADIGM 
\begin_inset CommandInset citation
LatexCommand cite
key "Vaske2010-zn"

\end_inset

 employs a Bayesian framework that models translation, activity, and interaction
s.
 I will leave the more complex methods for a later chapter and focus on
 GSEA using different gene sets here.
\end_layout

\begin_layout Standard
GSEA using GO gene sets is ubiquitous, often following a differential expression
 analysis to see which higher-level function the differentially expressed
 genes mediate.
 After computing the enrichment scores, our list of genes is condensed down
 to a list of significantly enriched GO categories that may be related to
 the phenotype we are observing.
 This may work very well in some cases.
 There are, however, a couple of caveats to observe: (1) a gene does not
 exclusively belong to one process; we might very well get a significant
 p-value only caused by the overlap between different sets, (2) if we test
 all categories and correct by false discovery rate we might dilute our
 signal so much that small categories can no longer be significant, or (3)
 the process that did indeed cause our phenotype does not correspond to
 a gene set at all (this can be due to missing biological knowledge, annotation
 errors, or simply the fact that curators have not yet added a certain gene
 to a certain category).
 Maybe the most dangerous caveat of them all is that once we see our list
 of resulting categories, we are inclined to pick out the category that 
\begin_inset Quotes eld
\end_inset

makes sense
\begin_inset Quotes erd
\end_inset

.
 Turning this selection of desired categories on its head, we may also be
 inclined to overlook a category that we do not want to see, e.g.
 because the involved process is already known in the literature and we could
 not publish our new findings in a high-impact journal.
 The aim of this chapter is to illustrate these issues.
\end_layout

\begin_layout Standard
I will use this chapter to examine which processes are involved in making
 cancer cell lines sensitive or resistant to different drugs in the GDSC
 panel 
\begin_inset CommandInset citation
LatexCommand cite
key "Garnett2012-dk,Iorio2016-gh"

\end_inset

.
 I will not filter the gene sets I use, to see how well represented signalling
 pathways are among the top hits for drug sensitivity, where they are known
 to play a pivotal role for targeted therapies 
\begin_inset CommandInset citation
LatexCommand cite
key "Garnett2012-dk,Iorio2016-gh,Yap2012-mi"

\end_inset

.
\end_layout

\begin_layout Standard
\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
Results obtained in section 
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Pathway-responsive-genes-SPEED"

\end_inset

 contributed the pathway scores for the latest publication of the GDSC screening.
 I produced all analyses, plots, and written text in this thesis myself:
\end_layout

\begin_layout Standard
\begin_inset VSpace defskip
\end_inset


\end_layout

\begin_layout Standard
Iorio F, Knijnenburg TA, Vis DJ, Bignell GR, Menden MP, 
\series bold
Schubert M
\series default
, Aben N, Gonçalves E, Barthorpe S, Lightfoot H, Cokelaer T, Greninger P,
 van Dyk E, Chang H, de Silva H, Heyn H, Deng X, Egan RK, Liu Q, Mironenko
 T, Mitropoulos X, Richardson L, Wang J, Zhang T, Moran S, Sayols S, Soleimani
 M, Tamborero D, Lopez-Bigas N, Ross-Macdonald P, Esteller M, Gray NS, Haber
 DA, Stratton MR, Benes CH, Wessels LF, Saez-Rodriguez J, McDermott U, Garnett
 MJ.
 
\begin_inset Quotes eld
\end_inset


\shape italic
A Landscape of Pharmacogenomic Interactions in Cancer
\shape default

\begin_inset Quotes erd
\end_inset

.
 
\series bold
Cell
\series default
 (2016).
\end_layout

\begin_layout Section
Methods used throughout this thesis
\end_layout

\begin_layout Subsection
Gene sets
\end_layout

\begin_layout Standard
To obtain Gene Ontology sets, I used the 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
rpkg{biomaRt}
\end_layout

\end_inset

 R package 
\begin_inset CommandInset citation
LatexCommand cite
key "Smedley2009-xo"

\end_inset

 to query the Ensembl 
\begin_inset CommandInset citation
LatexCommand cite
key "Hubbard2002-lv,Yates2016-bw"

\end_inset

 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
code{hsapiens
\backslash
_gene
\backslash
_ensembl}
\end_layout

\end_inset

 database for all 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
abbrv{HGNC}
\end_layout

\end_inset

 symbols that had a Gene Ontology 
\begin_inset CommandInset citation
LatexCommand cite
key "Ashburner2000-xl,Gene_Ontology_Consortium2004-sn"

\end_inset

 ID (
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
code{go
\backslash
_id}
\end_layout

\end_inset

 field) associated with them, yielding three main categories (biological
 process, molecular function, cellular component) with 16413 gene sets
 covering 18806 genes total
\begin_inset Foot
status open

\begin_layout Plain Layout
query of Ensembl Biomart on March 1st 2016
\end_layout

\end_inset

.
 For Reactome 
\begin_inset CommandInset citation
LatexCommand cite
key "Croft2011-po"

\end_inset

, I downloaded the file 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
file{ReactomePathways.gmt}
\end_layout

\end_inset


\begin_inset Foot
status open

\begin_layout Plain Layout
\begin_inset CommandInset href
LatexCommand href
target "http://www.reactome.org/pages/download-data/"

\end_inset


\end_layout

\end_inset

.
 It contained a total of 1675 pathways covering 7852 genes.
\end_layout

\begin_layout Standard
For other gene sets, I used the 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
name{Enrichr}
\end_layout

\end_inset

 platform 
\begin_inset CommandInset citation
LatexCommand cite
key "Chen2013-da"

\end_inset

 and the gene sets the authors assembled in their GitHub repository
\begin_inset Foot
status open

\begin_layout Plain Layout
\begin_inset CommandInset href
LatexCommand href
target "https://github.com/yokuyuki/Enrichr"

\end_inset


\end_layout

\end_inset

.
 They encompassed gene sets for 35 pathway and pathway-related resources,
 including Gene Ontology and Reactome (for which I queried the original databases
 to obtain more up-to-date gene lists), as well as KEGG 
\begin_inset CommandInset citation
LatexCommand cite
key "Kanehisa2000-mp"

\end_inset

 (for which the repository already used the last non-commercial release).
\end_layout

\begin_layout Subsection
Gene Set Variation Analysis (GSVA)
\begin_inset CommandInset label
LatexCommand label
name "subsec:GSVA"

\end_inset


\end_layout

\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open

\begin_layout Plain Layout
\align center
\begin_inset Graphics
	filename figures/GSEA_GSVA_schema.pdf
	width 100text%

\end_inset


\end_layout

\begin_layout Plain Layout
\begin_inset Caption Standard

\begin_layout Plain Layout
\begin_inset Argument 1
status collapsed

\begin_layout Plain Layout
Calculation of the running sum statistic for GSEA and GSVA
\end_layout

\end_inset

Calculation of the running sum statistic for GSEA and GSVA.
 Calculation of the running sum statistic (top left) and absolute deviation
 for enrichment score in case of GSEA vs.
 the difference in GSVA.
 Genes are ordered by differential expression, genes that are in the query
 set are indicated by black bars.
 The red line indicates the running sum score where a score is added each
 time there is a hit and subtracted otherwise.
 GSEA hence produces a bimodal distribution of scores (left), while GSVA
 produces a unimodal distribution (right).
 This is why the former needs label shuffling of two conditions (bottom
 left) to compute empirical p-values, while the latter produces scores for
 each sample (but no statistical significance; bottom right).
\begin_inset CommandInset label
LatexCommand label
name "fig:GSEA_schema"

\end_inset


\end_layout

\end_inset


\end_layout

\end_inset


\end_layout

\begin_layout Standard
Gene Set Enrichment Analysis 
\begin_inset CommandInset citation
LatexCommand cite
key "Subramanian2005-pd"

\end_inset

 is the 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
latin{de facto}
\end_layout

\end_inset

 standard for computing the expression level of a set of genes.
 It takes as input a ranked list of genes (
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
latin{e.g.}
\end_layout

\end_inset

 fold changes).
 It then computes the running sum of a set of interest by starting at the
 beginning of this list and adding a score if the current gene is in the
 set, or subtracting a score otherwise.
 This can be summarised as follows:
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
s_{g+1}=s_{g}+\left\{ \begin{array}{c}
1/n_{set}\\
-1/(n-n_{set})
\end{array}\right.\begin{array}{c}
g\in set\\
g\notin set
\end{array}
\]

\end_inset


\end_layout

\begin_layout Standard
A schema of this calculation is shown in figure 
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:GSEA_schema"

\end_inset

.
 In the case of GSEA, the overall score is the maximal deviation from zero.
 As is shown in the example, this leads to a bimodal distribution of scores
 when testing different sets or the same set on different samples, because
 even if the genes in the set of interest are evenly spread, there will
 always be a deviation.
 As GSEA is commonly used to compute the significance of enrichment between
 two conditions (left panels in figure 
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:GSEA_schema"

\end_inset

), this is not a problem: we can obtain the distribution of scores under
 the null hypothesis by shuffling the labels of the reference and the samples
 we are looking at, and then compute the empirical p-value as a quantile of
 this distribution.
 This, however, also means that we need to compare two conditions in order
 to do this reliably.
 We cannot compute enrichment scores for each individual sample.
\end_layout
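
\begin_layout Standard
As a minimal worked example (the gene ranks and set membership are invented
 purely for illustration), consider 
\begin_inset Formula $n=6$
\end_inset

 ranked genes of which those at ranks 1, 2 and 6 belong to the set, so that
 
\begin_inset Formula $n_{set}=3$
\end_inset

 and each hit adds 
\begin_inset Formula $1/3$
\end_inset

 while each miss subtracts 
\begin_inset Formula $1/3$
\end_inset

.
 Starting from 
\begin_inset Formula $s_{1}=0$
\end_inset

 and applying the update rule above to each gene in turn, the running sum
 and the resulting GSEA score would be:
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
(s_{2},\ldots,s_{7})=\left(\frac{1}{3},\frac{2}{3},\frac{1}{3},0,-\frac{1}{3},0\right),\qquad ES_{GSEA}=\max_{g}\left|s_{g}\right|=\frac{2}{3}
\]

\end_inset


\end_layout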

\begin_layout Standard
Gene Set Variation Analysis 
\begin_inset CommandInset citation
LatexCommand cite
key "Hanzelmann2013-xl"

\end_inset

 solves this: instead of taking the maximal deviation, it takes the difference
 between the maximum positive and the maximum negative deviation of the running
 sum.
 This directly yields a unimodal distribution of enrichment scores in different
 samples that can hence be used in statistical tests that assume normality
\begin_inset Foot
status open

\begin_layout Plain Layout
raw gene enrichment scores could still be used in nonparametric tests, but
 these tests are usually less powerful
\end_layout

\end_inset

.
 As I am interested in correlating one continuous value (drug sensitivity)
 with the set enrichment score, using GSVA (and the 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
rpkg{GSVA}
\end_layout

\end_inset

 R package) instead of GSEA is the natural choice.
\end_layout
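
\begin_layout Standard
To illustrate the difference with a small invented example, assume a running
 sum that reaches a maximum of 
\begin_inset Formula $2/3$
\end_inset

 and a minimum of 
\begin_inset Formula $-1/3$
\end_inset

 along the ranked list (for instance six genes with set members at ranks
 1, 2 and 6).
 Assuming that the GSVA score is the difference between the magnitudes of
 the largest positive and the largest negative deviation, which is a simplified
 sketch and not a description of the package internals, this gives:
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
ES_{GSVA}=\max_{g}(0,s_{g})-\left|\min_{g}(0,s_{g})\right|=\frac{2}{3}-\frac{1}{3}=\frac{1}{3},\qquad ES_{GSEA}=\max_{g}\left|s_{g}\right|=\frac{2}{3}
\]

\end_inset


\end_layout

\begin_layout Standard
A set whose members are spread evenly over the list would have positive
 and negative deviations of similar size and hence a difference close to
 zero, whereas the maximal deviation would remain clearly non-zero.
\end_layout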

\begin_layout Standard
\begin_inset Note Note
status open

\begin_layout Plain Layout
TODO_SUBSECTION: Linear associations –> describe OLS (maybe I can get away
 w/o)
\end_layout

\end_inset


\end_layout

\begin_layout Subsection
Drug associations using the half-maximal inhibitory concentration (IC50)
\begin_inset CommandInset label
LatexCommand label
name "subsec:Drug-associations"

\end_inset


\end_layout

\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open

\begin_layout Plain Layout
\align center
\begin_inset Graphics
	filename figures/IC50_schema.pdf
	width 60col%

\end_inset


\end_layout

\begin_layout Plain Layout
\begin_inset Caption Standard

\begin_layout Plain Layout
\begin_inset Argument 1
status collapsed

\begin_layout Plain Layout
Schema for calculation of 
\begin_inset Formula $IC_{50}$
\end_inset

 values
\end_layout

\end_inset

Calculation of 
\begin_inset Formula $IC_{50}$
\end_inset

 values.
 Cell viability is measured at different drug concentrations and then a
 drug response curve is fitted to the data.
 The halfway point of viability between the minimum (
\begin_inset Formula $E_{min}$
\end_inset

) and maximum effect (
\begin_inset Formula $E_{max}$
\end_inset

) is the 
\begin_inset Formula $IC_{50}$
\end_inset

.
 The curve is defined by these three values and the steepness of the slope.
\begin_inset CommandInset label
LatexCommand label
name "fig:IC50schema"

\end_inset


\end_layout

\end_inset


\end_layout

\end_inset


\end_layout

\begin_layout Standard
The original GDSC data set contained measurements of the cell lines in the
 panel subjected to different dilutions of each drug, quantifying how much
 the drug interfered with their growth.
 Since I have a pathway score for each cell line, I also need a single
 value corresponding to the sensitivity to a given drug.
 One way to do this is to measure the growth inhibition at different concentrations,
 and then fit a dose-response curve to the data points, interpolating
 (or extrapolating, if necessary) the concentration at which the half-maximal
 inhibition occurred.
 This concentration is referred to as the 
\begin_inset Formula $IC_{50}$
\end_inset

 value, and has already been calculated in 
\begin_inset CommandInset citation
LatexCommand cite
key "Iorio2016-gh"

\end_inset

.
 The curve to fit is of sigmoid shape (figure 
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:IC50schema"

\end_inset

) and has the formula:
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
x=IC_{50}\left(\frac{y-E_{min}}{E_{max}-y}\right)^{-slope}
\]

\end_inset


\end_layout
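
\begin_layout Standard
As a quick consistency check of this formula, using only the symbols defined
 above, inserting the halfway point between minimum and maximum effect for
 
\begin_inset Formula $y$
\end_inset

 gives a ratio of one, so that the concentration equals the 
\begin_inset Formula $IC_{50}$
\end_inset

 irrespective of the slope:
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
y=\frac{E_{min}+E_{max}}{2}\;\Rightarrow\;\frac{y-E_{min}}{E_{max}-y}=\frac{(E_{max}-E_{min})/2}{(E_{max}-E_{min})/2}=1\;\Rightarrow\;x=IC_{50}\cdot1^{-slope}=IC_{50}
\]

\end_inset


\end_layout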

\begin_layout Standard
I obtained the already processed gene expression matrix for the GDSC cell
 lines and their fitted 
\begin_inset Formula $IC_{50}$
\end_inset

 values for 265 public drugs from the GDSC publication 
\begin_inset CommandInset citation
LatexCommand cite
key "Iorio2016-gh"

\end_inset

.
 I performed a linear regression using the 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
code{lm}
\end_layout

\end_inset

 function in R between the gene set score as an independent variable (
\begin_inset Formula $S_{j}$
\end_inset

, where 
\begin_inset Formula $j$
\end_inset

 corresponds to each different phenotype from 
\begin_inset Formula $1$
\end_inset

 to 
\begin_inset Formula $k$
\end_inset

; this could 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
latin{e.g.}
\end_layout

\end_inset

 be pathways or the presence of a mutation) and the 
\begin_inset Formula $\log_{10}$
\end_inset

 of the 
\begin_inset Formula $IC_{50}$
\end_inset

 in micromolar as the response variable (
\begin_inset Formula $D_{i}$
\end_inset

, where 
\begin_inset Formula $i$
\end_inset

 is the drug index).
 I regressed out the contribution of individual tissues by including tissue
 as a covariate (
\begin_inset Formula $T$
\end_inset

) in the fit.
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
D_{i}\sim T+S_{j}\qquad\forall i\in drugs\;\forall j\in phenotypes
\]

\end_inset


\end_layout

\begin_layout Standard
In other words, for each drug 
\begin_inset Formula $D_{i}$
\end_inset

, I fit the following model for all cell lines 
\begin_inset Formula $c$
\end_inset

 in the GDSC panel.
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
\begin{array}{c}
D_{i}^{c_{1}}\\
D_{i}^{c_{2}}\\
D_{i}^{c_{3}}\\
\vdots
\end{array}\sim\begin{array}{c}
T^{c_{1}}\\
T^{c_{2}}\\
T^{c_{3}}\\
\vdots
\end{array}+\begin{array}{c}
S_{j}^{c_{1}}\\
S_{j}^{c_{2}}\\
S_{j}^{c_{3}}\\
\vdots
\end{array}
\]

\end_inset


\end_layout
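
\begin_layout Standard
Written out per cell line, the model can be sketched as follows.
 This assumes that tissue enters as a categorical covariate with one coefficient
 per tissue relative to a reference tissue 
\begin_inset Formula $t_{0}$
\end_inset

; the coefficient names 
\begin_inset Formula $\beta$
\end_inset

 and the error term 
\begin_inset Formula $\varepsilon$
\end_inset

 are introduced here for illustration only:
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
D_{i}^{c}=\beta_{0}+\sum_{t\neq t_{0}}\beta_{t}\,\mathbf{1}\left[T^{c}=t\right]+\beta_{S}\,S_{j}^{c}+\varepsilon^{c}
\]

\end_inset


\end_layout

\begin_layout Standard
Under this reading, the effect size of a drug and gene set pair corresponds
 to the estimate of 
\begin_inset Formula $\beta_{S}$
\end_inset

, and its p-value to the test of 
\begin_inset Formula $\beta_{S}=0$
\end_inset

.
\end_layout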

\begin_layout Standard
I performed this association between every drug and all gene set scores,
 yielding an effect size (how many units of drug response changed per unit
 of enrichment score) and p-value for each pair.
 I adjusted the p-values of all pairs for multiple testing using the False
 Discovery Rate (FDR)
 
\begin_inset CommandInset citation
LatexCommand cite
key "Benjamini1995-rj"

\end_inset

.
 In addition, I performed these associations using each tissue separately:
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
D_{i}\sim S_{j}\qquad\forall i\in drugs\;\forall j\in phenotypes\mid T^{c}=t;\quad\forall t\in tissues
\]

\end_inset


\end_layout

\begin_layout Standard
In this case, I only include cell lines 
\begin_inset Formula $c$
\end_inset

 whose tissue 
\begin_inset Formula $T$
\end_inset

 equals 
\begin_inset Formula $t$
\end_inset

 and build models for each tissue separately.
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
\begin{array}{c}
D_{i}^{c_{1}}\\
D_{i}^{c_{2}}\\
D_{i}^{c_{3}}\\
\vdots
\end{array}\sim\begin{array}{c}
S_{j}^{c_{1}}\\
S_{j}^{c_{2}}\\
S_{j}^{c_{3}}\\
\vdots
\end{array}\qquad
\]

\end_inset


\end_layout
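
\begin_layout Standard
For reference, the FDR adjustment mentioned above can be sketched as follows,
 assuming the step-up procedure of the cited Benjamini and Hochberg reference.
 The notation is introduced here for illustration: with 
\begin_inset Formula $m$
\end_inset

 tests and p-values sorted in ascending order, 
\begin_inset Formula $p_{(1)}\leq\ldots\leq p_{(m)}$
\end_inset

, the adjusted value at rank 
\begin_inset Formula $k$
\end_inset

 is
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
\tilde{p}_{(k)}=\min_{l\geq k}\,\min\left(1,\frac{m\,p_{(l)}}{l}\right)
\]

\end_inset


\end_layout

\begin_layout Standard
For four invented p-values this would give:
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
(0.001,\,0.02,\,0.03,\,0.5)\;\mapsto\;(0.004,\,0.04,\,0.04,\,0.5)
\]

\end_inset


\end_layout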

\begin_layout Section
Cell Line Drug Response
\end_layout

\begin_layout Subsection
Associations with Mutations
\end_layout

\begin_layout Standard
Associations between drug response and mutated genes have already been published
 with the 2012 and 2016 versions of the GDSC screening 
\begin_inset CommandInset citation
LatexCommand cite
key "Garnett2012-dk,Iorio2016-gh"

\end_inset

.
 I reproduce them here in order to ensure that the associations I obtain
 are the same as the ones previously published.
 P-values vary slightly between the two because the cell lines included
 in this study are not exactly the same as in the original article.
 But the overall results (volcano plot in figure 
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:Mutation-pancan"

\end_inset

 and associations in appendix 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
ref{sub:ch2-mut}
\end_layout

\end_inset

) very much agree: the strongest hit in both cases is that 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
gene{TP53}
\end_layout

\end_inset

 mutations correlate with resistance to Nutlin-3a, drugs that specifically
 target mutant 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
gene{BRAF}
\end_layout

\end_inset

 require such a mutation to be effective (Dabrafenib, PLX4720), and MEK
 inhibitors work better with mutations in 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
gene{KRAS}
\end_layout

\end_inset

 or 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
gene{NRAS}
\end_layout

\end_inset

.
\begin_inset Foot
status open

\begin_layout Plain Layout
The pan-cancer volcano plot has been removed in the published version, but
 is available here: http://www.cancerrxgene.org/gdsc1000
\end_layout

\end_inset


\end_layout

\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open

\begin_layout Plain Layout
\align center
\begin_inset Graphics
	filename figures/gdsc_mut.pdf
	lyxscale 50
	width 80col%

\end_inset


\end_layout

\begin_layout Plain Layout
\begin_inset Caption Standard

\begin_layout Plain Layout
\begin_inset Argument 1
status collapsed

\begin_layout Plain Layout
Volcano plot of associations between driver mutations and drug response
\end_layout

\end_inset

Volcano plot of associations between driver mutations and drug response.
 Effect size (fold change between cell lines harbouring a mutated vs.
 a wild-type copy) on the horizontal axis, FDR-adjusted p-values on the vertical
 axis.
\begin_inset CommandInset label
LatexCommand label
name "fig:Mutation-pancan"

\end_inset


\end_layout

\end_inset


\end_layout

\end_inset


\end_layout

\begin_layout Subsection
Associations with Gene Ontology categories
\end_layout

\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open

\begin_layout Plain Layout
\align center
\begin_inset Graphics
	filename figures/2_go_all.pdf
	lyxscale 50
	width 100text%

\end_inset


\end_layout

\begin_layout Plain Layout
\begin_inset Caption Standard

\begin_layout Plain Layout
\begin_inset Argument 1
status collapsed

\begin_layout Plain Layout
Volcano plot of associations between expression of Gene Ontology categories
 and drug response
\end_layout

\end_inset

Volcano plot of associations between expression of Gene Ontology categories
 and drug response.
 Effect size (in standard deviations of the score) on the horizontal axis,
 FDR-adjusted p-values on the vertical axis.
\begin_inset CommandInset label
LatexCommand label
name "fig:GO-PanCan-unfiltered"

\end_inset


\end_layout

\end_inset


\end_layout

\end_inset


\end_layout

\begin_layout Standard
The most significant associations for Gene Ontology 
\begin_inset CommandInset citation
LatexCommand cite
key "Ashburner2000-xl"

\end_inset

 are much less clear than the ones for mutations.
 I used all categories for 
\begin_inset Quotes eld
\end_inset

biological processes
\begin_inset Quotes erd
\end_inset

 and 
\begin_inset Quotes eld
\end_inset

molecular function
\begin_inset Quotes erd
\end_inset

 containing between 5 and 500 genes to calculate gene set scores using GSVA.
 The results for their correlations with drug response are shown in figure
 
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:GO-PanCan-unfiltered"

\end_inset

 (associations in appendix 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
ref{sub:ch2-go}
\end_layout

\end_inset

).
\end_layout

\begin_layout Standard
Among the top drugs, there are MEK inhibitors (Trametinib, RDEA119),
 the p53 stabiliser Nutlin-3a, and the multi-kinase inhibitor WZ3105.
 The biological processes that they are associated with have no obvious
 connection with their mechanism of action: while the 
\begin_inset Quotes eld
\end_inset

extrinsic apoptotic signaling pathway via death domain receptors
\begin_inset Quotes erd
\end_inset

 and 
\begin_inset Quotes eld
\end_inset

cellular response to UV-C
\begin_inset Quotes erd
\end_inset

 could somehow be linked to Nutlin-3a via p53-mediated apoptosis and DNA
 damage respectively, there is no obvious connection between glycoprotein
 binding, protease binding, the ruffle membrane, or hemidesmosomes and MEK
 inhibitors.
 Similarly, 
\begin_inset Quotes eld
\end_inset

xenobiotic metabolic process
\begin_inset Quotes erd
\end_inset

 gives a hint that YM155 may be inactivated by modification of the drug,
 but it does not tell us anything about the mechanism of the drug (it binds
 the promoter of Survivin, suppressing its expression
\begin_inset Foot
status open

\begin_layout Plain Layout
https://www.caymanchem.com/product/11490
\end_layout

\end_inset

) or its possible indications.
 Overall, there are so many significant associations that it is necessary
 to select interesting categories either before or after computing the associations
 in order to be able to interpret them.
\end_layout

\begin_layout Standard
\begin_inset Note Note
status open

\begin_layout Plain Layout
would be interesting to know where the 
\begin_inset Quotes eld
\end_inset

signaling
\begin_inset Quotes erd
\end_inset

/onc.
 addiction categories are! (same as speed2)
\end_layout

\end_inset


\end_layout

\begin_layout Subsection
Associations with Reactome pathways
\end_layout

\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open

\begin_layout Plain Layout
\align center
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
centerline{
\end_layout

\end_inset


\begin_inset Graphics
	filename figures/2_reactome_all.pdf
	lyxscale 50
	width 80page%

\end_inset


\begin_inset ERT
status open

\begin_layout Plain Layout

}
\end_layout

\end_inset


\end_layout

\begin_layout Plain Layout
\begin_inset Caption Standard

\begin_layout Plain Layout
\begin_inset Argument 1
status collapsed

\begin_layout Plain Layout
Volcano plot of associations between expression of Reactome pathways and
 drug response
\end_layout

\end_inset

Volcano plot of associations between expression of Reactome pathways and
 drug response.
 Effect size (in standard deviations of the score) on the horizontal axis,
 FDR-adjusted p-values on the vertical axis.
\begin_inset CommandInset label
LatexCommand label
name "fig:Reactome-PanCancer-unfiltered"

\end_inset


\end_layout

\end_inset


\end_layout

\end_inset


\end_layout

\begin_layout Standard
Compared to Gene Ontology, the drug associations with Reactome pathway enrichment
 using GSVA are arguably even harder to interpret (figure 
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:Reactome-PanCancer-unfiltered"

\end_inset

 and associations in appendix 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
ref{sub:ch2-reactome}
\end_layout

\end_inset

).
 The MEK inhibitors Trametinib and RDEA119 associate most strongly with
 fibrin clot dissolution and laminin interactions (laminin being a fibrous
 protein present in the basal lamina of the epithelia).
 Gemcitabine and Etoposide are more effective if Influenza-related pathways
 are expressed, possibly hinting at involvement of the DNA replication machinery
 in promoting sensitivity to a nucleoside analogue and a Topoisomerase inhibitor,
 respectively.
 Resistance to WZ3105 is again associated with a process acting on the drug,
 but this time it is export rather than modification.
 There is no obvious link between Bleomycin (which induces DNA double strand
 breaks) and the 
\begin_inset Quotes eld
\end_inset

Neurotransmitter Release Cycle
\begin_inset Quotes erd
\end_inset

.
 Again, there are so many significant associations that we need to limit
 the pathways in order to make sense of them.
\end_layout

\begin_layout Section
Pathway-responsive genes: The SPEED Platform
\end_layout

\begin_layout Standard
\begin_inset CommandInset label
LatexCommand label
name "sec:Pathway-responsive-genes-SPEED"

\end_inset

Another possibility is to start off with fewer gene sets that we know we
 are interested in.
 In the case of cancer signalling and drug response, these could be the
 signalling pathways that we know are involved.
\end_layout

\begin_layout Standard
The original SPEED platform 
\begin_inset CommandInset citation
LatexCommand citep
key "Parikh2010-uj"

\end_inset

 consists of signatures of 11 pathways, derived from and comprising the genes
 responsive in a total of 215 experiments (each comparing a control with
 a perturbed condition).
 The authors use a consensus gene signature across multiple experiments,
 perturbing agents, and other conditions to arrive at a gene list that corresponds
 to pathway activation in a wide range of conditions.
 These consensus signatures of pathway perturbations are distinct from the
 expression level of pathway members that I described above, as they are
 a downstream readout and not the expression status of the signalling molecules.
\end_layout

\begin_layout Subsection
Separability-optimised Gene Sets
\end_layout

\begin_layout Standard
The authors of the original SPEED publication used four parameters to generate
 gene lists from their input experiments:
\end_layout

\begin_layout Itemize
Z-score cutoff: the top n% of upregulated genes
\end_layout

\begin_layout Itemize
Total expression cutoff: the top m% of genes considering their basal expression
 in each experiment
\end_layout

\begin_layout Itemize
Experiment overlap: the percentage of experiments for which the other two
 conditions must be met
\end_layout

\begin_layout Itemize
Uniqueness: whether to return only genes that were unique to the stimulation
 of a specific pathway
\end_layout
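
\begin_layout Standard
As an illustrative formalisation of these four criteria (the notation is
 introduced here for clarity and is not taken from the SPEED publication),
 let 
\begin_inset Formula $E_{p}$
\end_inset

 be the set of experiments perturbing pathway 
\begin_inset Formula $p$
\end_inset

, 
\begin_inset Formula $Z_{e}^{n}$
\end_inset

 the top 
\begin_inset Formula $n\%$
\end_inset

 of genes by z-score in experiment 
\begin_inset Formula $e$
\end_inset

, 
\begin_inset Formula $X_{e}^{m}$
\end_inset

 the top 
\begin_inset Formula $m\%$
\end_inset

 of genes by expression level in that experiment, and 
\begin_inset Formula $o$
\end_inset

 the required overlap fraction.
 Under this reading, the gene list for a pathway and its unique variant
 would be
\begin_inset Formula 
\[
G_{p}=\left\{ g:\frac{\left|\left\{ e\in E_{p}:g\in Z_{e}^{n}\cap X_{e}^{m}\right\} \right|}{\left|E_{p}\right|}\geq o\right\} ,\qquad G_{p}^{\mathrm{unique}}=G_{p}\setminus\bigcup_{q\neq p}G_{q}.
\]

\end_inset

 How the original query tool resolves ties or defines uniqueness in detail
 may differ from this sketch.
\end_layout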

\begin_layout Standard
With the SQLite database
\begin_inset Foot
status open

\begin_layout Plain Layout
\begin_inset CommandInset href
LatexCommand href
name "http://speed.sys-bio.net/SPEED_db.zip"
target "http://speed.sys-bio.net/SPEED_db.zip"

\end_inset


\end_layout

\end_inset

 and Python query tools the authors provided, I extracted gene lists using
 the above parameters.
 First, I used the default parameters of their implementation, which include
 all genes that were among the top 5% of up-regulated genes by z-score and
 the overall top 50% of expressed genes in at least 20% of the experiments
 per pathway, regardless of whether the gene was also part of any other
 pathway.
 Using these default parameters, I obtained scores that were highly correlated
 between the different pathways, as shown in figure 
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:speed1_cor"

\end_inset

 (left).
\end_layout

\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open

\begin_layout Plain Layout
\align center
\begin_inset Graphics
	filename figures/speed1_correlation.pdf
	lyxscale 50
	width 100text%

\end_inset


\end_layout

\begin_layout Plain Layout
\begin_inset Caption Standard

\begin_layout Plain Layout
\begin_inset Argument 1
status collapsed

\begin_layout Plain Layout
Correlation plots for GSEA scores per pathway across cell lines for the
 original and optimised SPEED
\end_layout

\end_inset

Correlation plots for GSEA scores per pathway across cell lines for the
 original SPEED lists (left) and my separation-optimized version (right).
\begin_inset CommandInset label
LatexCommand label
name "fig:speed1_cor"

\end_inset


\end_layout

\end_inset


\end_layout

\end_inset


\end_layout

\begin_layout Standard
To counteract this problem, I extracted gene lists for different combinations
 of the four parameters:
\end_layout

\begin_layout Itemize
Z-score: 0.25, 0.5, 0.75, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5,
 8, 9, 10, 11, 12, 15, 20, 25
\end_layout

\begin_layout Itemize
Total expression: 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
\end_layout

\begin_layout Itemize
Overlap: 5, 10, 20, 30, 40, 50, 60, 70, 80
\end_layout

\begin_layout Itemize
Uniqueness: True or False
\end_layout
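
\begin_layout Standard
As a quick check of the grid size, the value lists above contain 25 z-score
 cutoffs, 11 expression cutoffs, 9 overlap cutoffs, and 2 uniqueness settings,
 so the number of parameter combinations is
\begin_inset Formula 
\[
25\times11\times9\times2=4950.
\]

\end_inset


\end_layout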

\begin_layout Standard
For each pathway, I searched for the combination whose GSEA scores best
 separated control from stimulated experiments.
 For each combination of parameters, 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
latin{i.e.}
\end_layout

\end_inset

 for each of the 4950 gene lists, I calculated the raw enrichment score
 for each pathway and cell line, yielding a total of 55 million enrichment
 scores (as the ordering is non-parametric, the raw GSEA score is sufficient
 here and quicker to compute).
 For each pathway, I then constructed a precision-recall curve that quantified
 how well the GSEA scores were able to assign lower pathway activation scores
 to the control arrays than to the perturbed arrays (where, in the case
 of an inhibition, I performed GSEA using negative z-scores).
\end_layout

\begin_layout Standard
The set of control arrays comprised all the unstimulated arrays in the
 database, and the stimulated set comprised all arrays in the database
 in which a certain pathway was perturbed.
 I used the area under the precision-recall curve as the measure of performance.
 A perfect ordering (that is, all control arrays and then all stimulated
 arrays) corresponded to a precision-recall AUC (prAUC) of 1, while a random
 ordering would correspond to a prAUC of about 0.5.
 
\end_layout
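
\begin_layout Standard
For reference, the quantities behind this curve can be written out as follows.
 These are the standard textbook definitions, with perturbed arrays as positives
 and 
\begin_inset Formula $t$
\end_inset

 a threshold on the GSEA score; how the curve was integrated numerically
 is not specified here, so the integral is only a sketch:
\begin_inset Formula 
\[
\mathrm{precision}(t)=\frac{TP(t)}{TP(t)+FP(t)},\qquad\mathrm{recall}(t)=\frac{TP(t)}{TP(t)+FN(t)},\qquad\mathrm{prAUC}=\int_{0}^{1}\mathrm{precision}(r)\,dr,
\]

\end_inset

 where 
\begin_inset Formula $r$
\end_inset

 denotes recall.
 Under these definitions, the expected precision of a random ordering equals
 the fraction of perturbed arrays among all arrays considered, so a value
 of about 0.5 corresponds to roughly balanced sets of control and perturbed
 arrays.
\end_layout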

\begin_layout Standard
By using not only the control arrays matched to the perturbed arrays but
 all arrays present in the data set, I allow cross-activation of pathways
 in the resulting signature while minimizing the fit to random differences
 in gene expression caused by different initial conditions.
\end_layout

\begin_layout Standard
I split the data set (both control and perturbed arrays) into five subsets,
 of which four formed the training set and the fifth the test set.
 I calculated the prAUC on the training set for all the parameter combinations
 described in the previous section, and chose the combination with the
 highest score.
 I then quantified the prAUC of this combination on the held-out test set
 as well.
 I performed the whole process five times, with a different subset functioning
 as the test set each time.
 For each run, I took the lower of the training and test prAUC, and then
 chose the parameter set for which this value was highest across the five
 runs.
 I did not simply select the highest AUC in the test set because I would
 not want to select a model that performed badly on the training set to
 begin with.
 This selection procedure can be represented using the following formula:
 This selection procedure can be represented using the following formula:
\end_layout

\begin_layout Standard
\begin_inset Formula 
\[
\mathrm{selected}=\max_{\mathrm{all\,runs}}\left(\min_{\mathrm{per\,run}}\left(\mathrm{prAUC}^{\mathrm{train}},\mathrm{prAUC}^{\mathrm{test}}\right)\right)
\]

\end_inset


\end_layout
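
\begin_layout Standard
To illustrate the rule with made-up numbers (these are hypothetical and
 not results from this analysis), suppose three of the runs give (train,
 test) prAUC pairs of (0.95, 0.80), (0.85, 0.90) and (0.88, 0.86).
 The rule then evaluates to
\begin_inset Formula 
\[
\max\left(\min(0.95,0.80),\,\min(0.85,0.90),\,\min(0.88,0.86)\right)=\max(0.80,\,0.85,\,0.86)=0.86,
\]

\end_inset

 so the parameter set of the third run would be selected.
 The first run has the highest training score and the second the highest
 test score, yet neither is chosen, because each performs worse on its
 other subset.
\end_layout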

\begin_layout Standard
For the optimised lists, I observed a much lower overall correlation of
 the pathway scores (figure 
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:speed1_cor"

\end_inset

, right).
 The values for the different gene list cutoff parameters that I selected
 after optimisation are listed in table 
\begin_inset CommandInset ref
LatexCommand ref
reference "tab:Parameter-selection-overview"

\end_inset

, including the number of genes in the signature and prAUC (training and
 test set) compared between the original cutoffs used in the query tool
 and my selection.
 For all the pathways, the optimisation of parameters yielded a better separatio
n of control- vs.
 perturbed arrays.
\end_layout

\begin_layout Standard
\begin_inset Float table
wide false
sideways false
status open

\begin_layout Plain Layout
\begin_inset Caption Standard

\begin_layout Plain Layout
\begin_inset Argument 1
status collapsed

\begin_layout Plain Layout
Parameter selection overview for each pathway
\end_layout

\end_inset

Parameter selection overview for each pathway.
 Z: Z-value cutoff (0.25-20%); O: array overlap (20-80%); E: overall expression
 cutoff (5-100%); U: unique (+) or all (-) genes considered.
 Gene lists had to contain between 50 and 250 genes to be considered.
 Values for optimized and original lists are shown.
 Precision-recall AUC is shown for training and cross-validation (cv) of
 the optimized lists, and for the whole data set for the optimized lists
 vs.
 the original lists.
\begin_inset CommandInset label
LatexCommand label
name "tab:Parameter-selection-overview"

\end_inset


\end_layout

\end_inset


\end_layout

\begin_layout Plain Layout
\align center
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
centerline{
\end_layout

\end_inset


\begin_inset Tabular
<lyxtabular version="3" rows="13" columns="11">
<features tabularvalignment="middle">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<column alignment="center" valignment="top">
<row>
<cell multirow="3" alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
pathway
\end_layout

\end_inset
</cell>
<cell multicolumn="1" alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
cutoffs used
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="center" valignment="top" topline="true" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="center" valignment="top" topline="true" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="center" valignment="top" topline="true" bottomline="true" rightline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="1" alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
# genes in list
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="1" alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
precision-recall AUC
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell multicolumn="1" alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
all arrays
\end_layout

\end_inset
</cell>
<cell multicolumn="2" alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
</row>
<row>
<cell multirow="4" alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
Z
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
O
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
E
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
U
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
opt
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
original
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
train
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
cv
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
opt
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" bottomline="true" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
original
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
H2O2
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
20
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
50
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
20
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
+
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
191
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
60
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.99
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
1.00
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\series bold
0.99
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.66
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
IL-1
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
20
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
60
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
20
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
+
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
75
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
141
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.91
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.99
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\series bold
0.93
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.85
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
JAK-STAT
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
20
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
20
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
70
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
+
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
162
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
114
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.78
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.87
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\series bold
0.81
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.68
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
MAPK_only
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
20
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
70
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
10
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
-
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
65
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
559
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.94
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.97
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\series bold
0.95
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.46
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
MAPK_PI3K
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
15
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
20
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
5
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
-
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
171
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
118
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.82
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.89
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\series bold
0.84
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.63
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
TLR
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
6.5
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
40
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
50
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
+
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
78
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
181
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.88
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.91
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\series bold
0.89
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.81
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
PI3K_only
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
15
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
20
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
30
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
+
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
227
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
67
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.80
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.97
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\series bold
0.83
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.49
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
TGFB
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
20
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
20
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
60
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
+
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
119
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
142
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.78
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.88
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\series bold
0.80
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.68
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
TNFa
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
1.5
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
50
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
70
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
-
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
56
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
259
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.77
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.99
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\series bold
0.81
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.66
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
VEGF
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
12
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
20
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
30
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
+
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
121
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
56
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.92
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.94
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\series bold
0.92
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.84
\end_layout

\end_inset
</cell>
</row>
<row>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
Wnt
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
7.5
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
30
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
5
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
+
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
195
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
83
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.93
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.94
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout

\series bold
0.93
\end_layout

\end_inset
</cell>
<cell alignment="center" valignment="top" usebox="none">
\begin_inset Text

\begin_layout Plain Layout
0.65
\end_layout

\end_inset
</cell>
</row>
</lyxtabular>

\end_inset


\begin_inset ERT
status open

\begin_layout Plain Layout

}
\end_layout

\end_inset


\end_layout

\end_inset


\end_layout

\begin_layout Subsection
Associations between Pathway Scores and Tissues
\end_layout

\begin_layout Standard
As a control, I compared the inferred pathway activation scores between
 different tissues.
 If the assigned scores per tissue are biologically meaningful, I would
 expect to find well-established literature evidence supporting them.
 An overview heatmap of pathway activation scores is shown in figure 
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:tissue-heatmap"

\end_inset

.
 Relating this to known biology, the scores appear well supported by published
 evidence.
 A few examples are listed below.
\end_layout

\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open

\begin_layout Plain Layout
\align center
\begin_inset Graphics
	filename figures/4.1_speed_tissue.pdf
	width 80col%

\end_inset


\end_layout

\begin_layout Plain Layout
\begin_inset Caption Standard

\begin_layout Plain Layout
\begin_inset Argument 1
status open

\begin_layout Plain Layout
Heatmap of inferred pathway activation scores for different tissues
\end_layout

\end_inset

Heatmap of inferred pathway activation scores for different tissues.
 Pathways in rows, TCGA tissue labels in columns, relative pathway activation
 indicated by colour.
 Rows and columns are clustered so that similar tissues and pathways are
 shown close to each other, with branch lengths indicating distance.
\begin_inset CommandInset label
LatexCommand label
name "fig:tissue-heatmap"

\end_inset


\end_layout

\end_inset


\end_layout

\end_inset


\end_layout

\begin_layout Itemize
JAK-STAT signalling is well-known to be upregulated in blood cancer cells
 
\begin_inset CommandInset citation
LatexCommand cite
key "Dutta2013-fg,Vainchenker2012-sv"

\end_inset

, including in particular AML 
\begin_inset CommandInset citation
LatexCommand cite
key "Lee2012-ts,Danial2000-xd"

\end_inset

, CML 
\begin_inset CommandInset citation
LatexCommand cite
key "Danial2000-xd"

\end_inset

, ALL 
\begin_inset CommandInset citation
LatexCommand cite
key "Vainchenker2012-sv"

\end_inset

, and DLBC 
\begin_inset CommandInset citation
LatexCommand cite
key "Gupta2012-zp"

\end_inset

.
 This correlates well with the inferred activity scores for JAK-STAT signalling.
 Similarly, TLR signalling has been shown to be induced by CpG oligodeoxynucleotides
 in B cells (DLBC, MM) and dendritic myeloid cells, but not in T cells 
\begin_inset CommandInset citation
LatexCommand cite
key "Rothenfusser2002-gi"

\end_inset

.
\end_layout

\begin_layout Itemize
High production of reactive oxygen species has been shown to occur in certain
 mesotheliomas 
\begin_inset CommandInset citation
LatexCommand cite
key "Kahlos1999-py"

\end_inset

, and gliomas 
\begin_inset CommandInset citation
LatexCommand cite
key "Drukala2010-kj"

\end_inset

.
\end_layout

\begin_layout Itemize
TGFB 
\begin_inset CommandInset citation
LatexCommand cite
key "Yamada1995-yi,Kjellman2000-om"

\end_inset

 and VEGF 
\begin_inset CommandInset citation
LatexCommand cite
key "Reardon2008-ot"

\end_inset

 expression have been shown to be increased in gliomas, correlating with
 malignancy 
\begin_inset CommandInset citation
LatexCommand cite
key "Kjellman2000-om,Leon1996-zu"

\end_inset

.
 Both were also increased in mesothelioma 
\begin_inset CommandInset citation
LatexCommand cite
key "Kuwahara2001-ru,Aoe2006-vc"

\end_inset

, and VEGF in melanoma 
\begin_inset CommandInset citation
LatexCommand cite
key "Rajabi2012-fj,Gajanin2010-ff"

\end_inset

 as well.
\end_layout

\begin_layout Itemize
MAPK_only/MAPK_PI3K signalling seems to be evenly distributed among non-blood
 cancer tissues.
 The highest activities of MAPK and MAPK_PI3K were found in KIRC and BLCA/PAAD,
 respectively.
 However, the differences were much less pronounced than for other pathways,
 suggesting that the MAPK and EGFR pathways are important for all solid cancers.
\end_layout

\begin_layout Subsection
Pan-Cancer Drug Associations
\end_layout

\begin_layout Standard
I performed a linear regression between the obtained pathway scores and
 the 
\begin_inset Formula $IC_{50}$
\end_inset

s for all cell lines and pathways while correcting for tissue labels (used
 as a covariate; for details see section 
\begin_inset CommandInset ref
LatexCommand ref
reference "subsec:Drug-associations"

\end_inset

).
 I adjusted p-values by controlling the false discovery rate and visualised
 the results as volcano plots and individual fits.
\end_layout
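
\begin_layout Standard
As a sketch of this model (the notation is introduced here for illustration
 and is not taken from the original analysis), the fit for a single drug
 and pathway across cell lines 
\begin_inset Formula $i$
\end_inset

 can be written as
\begin_inset Formula 
\[
IC_{50,i}=\beta_{0}+\beta_{1}s_{i}+\sum_{t}\gamma_{t}x_{i,t}+\varepsilon_{i}
\]

\end_inset

where 
\begin_inset Formula $s_{i}$
\end_inset

 is the pathway score of cell line 
\begin_inset Formula $i$
\end_inset

, 
\begin_inset Formula $x_{i,t}$
\end_inset

 is 1 if the cell line belongs to tissue 
\begin_inset Formula $t$
\end_inset

 and 0 otherwise (the sum runs over all tissues except one reference tissue),
 and 
\begin_inset Formula $\varepsilon_{i}$
\end_inset

 is the residual.
 The slope 
\begin_inset Formula $\beta_{1}$
\end_inset

 is the quantity of interest: a negative value means that a higher pathway
 score goes with a lower 
\begin_inset Formula $IC_{50}$
\end_inset

.
 Whether the 
\begin_inset Formula $IC_{50}$
\end_inset

 enters on a log scale is not specified here.
 If the false discovery rate is controlled with the Benjamini-Hochberg procedure,
 which is an assumption as the method is not named here, the 
\begin_inset Formula $m$
\end_inset

 ordered p-values 
\begin_inset Formula $p_{(1)}\leq\ldots\leq p_{(m)}$
\end_inset

 are adjusted as
\begin_inset Formula 
\[
p_{(k)}^{adj}=\min_{j\geq k}\left(\frac{m}{j}\,p_{(j)}\right)
\]

\end_inset


\end_layout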

\begin_layout Standard
Linear associations of inferred pathway activity within all tissues and
 drug response are shown in figure 
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:pancan-volcano"

\end_inset

 (associations in appendix 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
ref{sub:ch2-speed}
\end_layout

\end_inset

).
 Negative regression slopes (left side of the graph, green) indicate sensitivity
 markers, 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
latin{i.e.}
\end_layout

\end_inset

 a higher pathway activation score correlates with a lower 
\begin_inset Formula $IC_{50}$
\end_inset

.
 Positive regression slopes (right side, red) indicate resistance markers,
 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
latin{i.e.}
\end_layout

\end_inset

 higher activation scores correlate with higher 
\begin_inset Formula $IC_{50}$
\end_inset

s.
\end_layout

\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open

\begin_layout Plain Layout
\align center
\begin_inset Graphics
	filename figures/speed1_volcano.pdf
	lyxscale 50
	width 100col%

\end_inset


\end_layout

\begin_layout Plain Layout
\begin_inset Caption Standard

\begin_layout Plain Layout
\begin_inset Argument 1
status collapsed

\begin_layout Plain Layout
Volcano plot of linear associations of inferred pathway activity within
 all tissues and drug response
\end_layout

\end_inset

Volcano plot of linear associations of inferred pathway activity within
 all tissues and drug response.
 Tissue of origin used as a covariate in the regression.
 P-values FDR-adjusted.
 Negative regression slopes (left side of the graph, green) indicate sensitivity
 markers, 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
latin{i.e.}
\end_layout

\end_inset

 a higher pathway activation score correlates with a lower 
\begin_inset Formula $IC_{50}$
\end_inset

.
 Positive regression slopes (right side, red) indicate resistance markers,
 
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
latin{i.e.}
\end_layout

\end_inset

 higher activation scores correlate with higher 
\begin_inset Formula $IC_{50}$
\end_inset

s.
\begin_inset CommandInset label
LatexCommand label
name "fig:pancan-volcano"

\end_inset


\end_layout

\end_inset


\end_layout

\end_inset


\end_layout

\begin_layout Standard
Sensitivity markers:
\end_layout

\begin_layout Itemize
RDEA119 
\begin_inset CommandInset citation
LatexCommand cite
key "Iverson2009-uu"

\end_inset

, PD-0325901 
\begin_inset CommandInset citation
LatexCommand cite
key "Ciuffreda2009-iz"

\end_inset

, and CI-1040 
\begin_inset CommandInset citation
LatexCommand cite
key "Allen2003-pz"

\end_inset

 are all MEK inhibitors and are thus expected to be more effective
 in cell lines where MAPK signalling is more active.
 Indeed, the strongest associations are between those drugs and MAPK_PI3K
 signalling.
 However, MAPK and PI3K activity are difficult to distinguish by their expression
 response because of pathway crosstalk 
\begin_inset CommandInset citation
LatexCommand cite
key "Parikh2010-uj"

\end_inset

.
\end_layout

\begin_layout Itemize
BIBW2992 and Gefitinib showed higher efficacy with PI3K_only activity.
 As both are EGFR inhibitors and PI3K is known to cause resistance to
 them 
\begin_inset CommandInset citation
LatexCommand cite
key "Jeannot2014-av"

\end_inset

, this result is surprising because the association is stronger than with
 MAPK_only or MAPK_PI3K.
 This may be because the authors of the SPEED platform chose to include MEK
 inhibition as a condition for PI3K activation.
\end_layout

\begin_layout Itemize
I found sensitivity correlating with Wnt activity for the drugs Etoposide,
 QS11, and GSK-650394.
 QS11 modulates ARF activity and 
\begin_inset Formula $β$
\end_inset

-catenin localisation 
\begin_inset CommandInset citation
LatexCommand cite
key "Zhang2007-fu"

\end_inset

, which may offer a treatment strategy for Wnt-driven tumours.
 GSK-650394 targets SGK1, which is activated by Wnt/
\begin_inset Formula $β$
\end_inset

-catenin signalling and has been shown to inhibit ROS-induced apoptosis
 in liver cells 
\begin_inset CommandInset citation
LatexCommand cite
key "Tao2013-sj"

\end_inset

.
 Etoposide induces DNA damage and senescence, a process that may be
 inhibited through negative feedback by SFRP1 
\begin_inset CommandInset citation
LatexCommand cite
key "Elzi2012-mv"

\end_inset

 due to Wnt signalling.
\end_layout

\begin_layout Itemize
FTI-277 is more effective when reactive oxygen response (H2O2) is active.
 Farnesyl transferase inhibitors are known to induce DNA damage via ROS
 
\begin_inset CommandInset citation
LatexCommand cite
key "Pan2005-on"

\end_inset

, which may cause growth arrest or apoptosis in cells that already suffer
 ROS damage.
\end_layout

\begin_layout Itemize
AZD6482 is a PI3K inhibitor 
\begin_inset CommandInset citation
LatexCommand cite
key "Nylander2012-cl"

\end_inset

, for which cells show increased sensitivity in our study when TNFa signalling
 is active.
 While the latter can be both oncogenic and tumour-suppressive 
\begin_inset CommandInset citation
LatexCommand cite
key "Pikarsky2006-ba"

\end_inset

, it has been shown that PI3K activation is necessary for NFKb-mediated
 cell survival in DLBC 
\begin_inset CommandInset citation
LatexCommand cite
key "Kloo2011-ut"

\end_inset

 and the combination of PI3K inhibition and active TNFa is known to cause
 apoptosis in vitiliginous keratinocytes 
\begin_inset CommandInset citation
LatexCommand cite
key "Kim2007-db"

\end_inset

.
\end_layout

\begin_layout Itemize
TNF
\begin_inset Formula $\alpha$
\end_inset

 
\begin_inset CommandInset citation
LatexCommand cite
key "Sleijfer1998-vs"

\end_inset

 signalling has been shown to be increased after Bleomycin treatment, thereby
 mediating cytotoxicity.
 It can be hypothesised that cell lines in which this pathway is already
 active are more susceptible to Bleomycin-induced cytotoxicity.
\end_layout

\begin_layout Standard
Resistance markers:
\end_layout

\begin_layout Itemize
EHT 1864 is a Rac-family GTPase inhibitor 
\begin_inset CommandInset citation
LatexCommand cite
key "Shutes2007-mi"

\end_inset

, and MAPK_PI3K signalling is associated with resistance to this drug.
 Rac1 is known to be involved in MAPK signalling specifically for cancer
 development 
\begin_inset CommandInset citation
LatexCommand cite
key "Khosravi-Far1995-mk"

\end_inset

.
 Hence, cells with a higher MAPK activity may be less susceptible to Rac1
 inhibition.
\end_layout

\begin_layout Section
Discussion
\end_layout

\begin_layout Standard
\begin_inset Note Note
status open

\begin_layout Plain Layout
using the same algorithm (GSEA) SPEED is *a lot* better
\end_layout

\begin_layout Plain Layout
–> want to explore this further
\end_layout

\begin_layout Plain Layout
(also make point about biasing gene sets)
\end_layout

\begin_layout Plain Layout
bottom up vs top-down approach: either select sets before or after
\end_layout

\end_inset


\end_layout

\begin_layout Subsection
Cell line drug response
\end_layout

\begin_layout Standard
The importance of mutations, especially when they are drivers, and their
 role in a cell line's response to the different drugs is well established.
 With the new GDSC release 
\begin_inset CommandInset citation
LatexCommand cite
key "Iorio2016-gh"

\end_inset

 the authors uncovered previously unknown links that may ultimately lead
 to new clinical indications, or prioritise the development of certain drugs
 over others.
\end_layout

\begin_layout Standard
In contrast to this, the biological meaning of the top associations between
 gene set or pathway scores and drug response I obtained is doubtful at
 best.
 What does it mean to have an association with the 
\begin_inset Quotes eld
\end_inset

T cell receptor
\begin_inset Quotes erd
\end_inset

 in Head and Neck Squamous Cell Carcinoma (HNSC)? There are no immune cells involved
 in the culturing of HNSC cell lines, for example.
\end_layout

\begin_layout Standard
Hence, we have two options: we either need to link seemingly unrelated gene
 sets back to the process that actually caused a difference in drug response
 by looking for evidence that may support it, or we need to pre-select gene
 sets that may be relevant for drug response.
\end_layout

\begin_layout Standard
For the first case, we cannot easily follow the chain of causality between
 a biological process that mediates differential drug response and its downstream
 readout as a change in gene expression.
 The second alternative does not solve this, but makes it much more likely
 that we capture causative gene sets, because the candidates are already
 selected using prior knowledge.
\end_layout

\begin_layout Subsection
Pathway-responsive genes
\end_layout

\begin_layout Standard
Using pathway-responsive genes instead of pathway expression to infer signalling
 activity is well motivated: we look at the footprint of the actual signalling
 activity (the expression changes downstream of a signalling pathway) and
 not at the potential mediators (protein kinases, among others), whose mRNA
 expression level is much further removed from the signalling actually
 taking place in a cell.
 However, the SPEED platform has issues with how strongly its pathway scores
 correlate with one another: if we calculated drug associations with the
 original scores, hits would occur in groups where a given drug is correlated
 with all pathways to about the same extent.
 This is likely because the authors evaluated gene lists only by their overlap
 with Gene Ontology categories, and not by how well their enrichment scores
 differentiate between microarrays where a given pathway perturbation
 is present and those where it is absent.
 The original version was hence not very useful for my purposes.
\end_layout

\begin_layout Standard
There is a need for scores that are better at distinguishing the gene
 expression footprint of one pathway from that of another.
 One way to achieve this within the existing platform is what I did: no longer
 require that the signature genes of all pathways are obtained using the same
 cutoffs for the authors' four parameters, but instead optimise them in such
 a way that they are best able to tell the pathways apart.
 Comparing the correlations obtained with the original cutoffs and with the
 ones I suggest after the cross-validated optimisation, I know that this worked
 at least on those terms.
\end_layout

\begin_layout Standard
The next question is whether we get more meaningful drug associations.
 Looking at the volcano plot, associations are, first, no longer between a drug
 and all pathways and, second, much better supported by the literature
 on the top hits.
\end_layout

\begin_layout Standard
However, there are still a number of potential issues with the current model,
 many of which can be traced back to using the original platform:
\end_layout

\begin_layout Itemize
The authors use raw microarray data as well as processed data.
 For the processed data, we do not know what the original authors did
 with their data to arrive at the expression levels they report.
 Sometimes this is reported in the respective experimental or data analysis
 procedures, but often it is not.
 This leaves room for a variety of biases that we cannot control for.
\end_layout

\begin_layout Itemize
The platform as it currently stands needs four parameters to be specified,
 each of which corresponds to a somewhat arbitrary cutoff.
 Even if we ignore this, they limit the number of signature genes in a way
 that does not support down-regulated genes at all (as the z-scores are
 filtered by top percentile only).
\end_layout

\begin_layout Itemize
We bias the selection of genes towards the ones most commonly found on microarrays.
 If a gene is highly upregulated but absent from enough arrays to fail
 the overlap cutoff, we lose it.
\end_layout

\begin_layout Itemize
MAPK inhibition was in the curated set of PI3K activators.
 This is an error in curation and could explain many PI3K associations.
\end_layout

\begin_layout Standard
Hence I argue that, while the optimisation of parameters yielded vast improvements
 in terms of the correlation between scores and the resulting drug associations,
 it still has enough drawbacks to suggest that a different approach, one that
 keeps the overall idea of using pathway-responsive genes as a signature for
 pathway activity, may be worth exploring.
 I describe the approach I developed in the next chapter.
\end_layout

\begin_layout Standard
\begin_inset CommandInset bibtex
LatexCommand bibtex
bibfiles "references"
options "plain"

\end_inset


\end_layout

\end_body
\end_document