This repository has been archived by the owner on Jul 21, 2021. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 0
/
03-quantify-sample-exposure.Rmd
142 lines (95 loc) · 5.37 KB
/
03-quantify-sample-exposure.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
# Signature Fit: Sample Signature Exposure Quantification and Analysis {#sigfit}
Besides *de novo* signature discovery shown in previous chapters, another common task is that
you have gotten some reference signatures (either from known database like COSMIC or *de novo* discovery step), you want to know how these signatures contribute (fit) in a sample. That's the target of `sig_fit()`.
`sig_fit()` uses multiple methods to compute exposure of pre-defined signatures from the spectrum of a (can be more) sample. Use `?sig_fit` see more detail.
To show how this function works, we use a sample with maximum mutation counts as example data.
```{r}
i <- which.max(apply(mt_tally$nmf_matrix, 1, sum))
example_mat <- mt_tally$nmf_matrix[i, , drop = FALSE] %>% t()
```
```{r}
head(example_mat)
```
## Fit Signatures from reference databases
For SBS signatures, users may want to directly use reference signatures from COSMIC database.
```{r}
sig_fit(example_mat, sig_index = 1:30)
```
> At default, COSMIC v2 signature database with 30 reference signatures is used (i.e. `sig_db = "legacy"`). Set `sig_db = "SBS"` for COSMIC v3 signature database.
That's it!
You can set `type = "relative"` for getting relative exposure.
```{r}
sig_fit(example_mat, sig_index = 1:30, type = "relative")
```
For multiple samples, you can return a `data.table`, it can be easier to integrate with other information in R.
```{r}
sig_fit(t(mt_tally$nmf_matrix[1:5, ]), sig_index = 1:30, return_class = "data.table", rel_threshold = 0.05)
```
When you set multiple signatures, we recommend setting `rel_threshold` option, which will set exposure of a signature to `0` if its relative exposure in a sample less than the `rel_threshold`.
## Fit Custom Signatures
We have already determined the SBS signatures before. Here we can set them to `sig` option.
```{r}
sig_fit(example_mat, sig = mt_sig2)
```
## Performance Comparison
Now that we can use `sig_fit` for getting optimal exposures, we can compare the RSS between **raw matrix** and the **reconstructed matrix** either by NMF and `sig_fit()`.
i.e.
$$
RSS = \sum(\hat H - H)^2
$$
```{r}
## Exposure got from NMF
sum((apply(mt_sig2$Signature, 2, function(x) x/sum(x)) %*% mt_sig2$Exposure - t(mt_tally$nmf_matrix))^2)
```
```{r}
## Exposure optimized by sig_fit
H_estimate = apply(mt_sig2$Signature, 2, function(x) x/sum(x)) %*% sig_fit(t(mt_tally$nmf_matrix), sig = mt_sig2)
H_estimate = apply(H_estimate, 2, function(x) ifelse(is.nan(x), 0, x))
H_real = t(mt_tally$nmf_matrix)
sum((H_estimate - H_real)^2)
```
## Estimate Exposure Stability by Bootstrap
This feature is based on `sig_fit()`, it uses the resampling data of original input and runs `sig_fit()` multiple times to estimate the exposure. Bootstrap replicates >= 100 is recommended, here I just use 10 times for illustration.
```{r}
bt_result <- sig_fit_bootstrap_batch(example_mat, sig = mt_sig2, n = 10)
bt_result
```
You can plot the result very easily with functions provided by **sigminer**.
```{r}
show_sig_bootstrap_exposure(bt_result, sample = "TCGA-A8-A09G-01A-21W-A019-09")
show_sig_bootstrap_error(bt_result, sample = "TCGA-A8-A09G-01A-21W-A019-09")
show_sig_bootstrap_stability(bt_result)
```
P values have been calculated under specified relative exposure cutoff (0.05 at default).
The result indicates `Sig3` is very stable.
# Subtype Prediction {#subtyping}
To expand the power of signatures to clinical application, based on **signature discovery** and **signature fitting** workflows, we can go further build neutral network model prediction model with **keras**. This feature is currently experimental and implemented in **sigminer**'s child package [**sigminer.prediction**](https://github.com/ShixiangWang/sigminer.prediction).
# Sigflow Pipeline {#sigflow}
[Sigflow](https://github.com/ShixiangWang/sigflow) provides useful mutational signature analysis workflows based on R package **sigminer**. It can auto-extract mutational signatures, fit mutation data to COSMIC reference signatures (SBS/DBS/INDEL) and run bootstrapping analysis for signature fitting.
For full documentation, please read Sigflow [README](https://github.com/ShixiangWang/sigflow).
## Cancer type specific signature index database
Signature fitting analysis may befit from directly specifying known signatures identified in a cancer type. We collect such information and provide the following data tables.
```{r}
db1 <- system.file("extdata", "cosmic2_record_by_cancer.rds", package = "sigminer")
db1 <- readRDS(db1)
colnames(db1) <- c("Cancer type", "Signature Index")
db2 <- system.file("extdata", "signature_record_by_cancer.rds", package = "sigminer")
db2 <- readRDS(db2)
colnames(db2) <- c("Cancer type", "Cohort", "Sequencing strategy",
"SBS signature index",
"DBS signature index",
"ID signature index")
```
```{r}
DT::datatable(db1, caption = "Data source: https://cancer.sanger.ac.uk/signatures_v2/matrix.png")
```
> Note, set `sig_db` to 'legacy' (the default) in `sig_fit()` family functions.
```{r}
DT::datatable(db2[, c(1:3, 4)], caption = "Data source: Alexandrov et al. https://www.nature.com/articles/s41586-020-1943-3")
```
```{r}
DT::datatable(db2[, c(1:3, 5)], caption = "Data source: Alexandrov et al. https://www.nature.com/articles/s41586-020-1943-3")
```
```{r}
DT::datatable(db2[, c(1:3, 6)], caption = "Data source: Alexandrov et al. https://www.nature.com/articles/s41586-020-1943-3")
```