---
title: "Model Fitting: Models with Noisy Processes"
output: html_notebook
---
```{r setup, cache=FALSE, include=FALSE}
library(knitr)
#opts_chunk$set(comment='', eval=FALSE)
```
Some models are specified with noise terms. Sometimes these are called error terms, but since we use (reach) errors in our model, I'll call them noise here. Up to now we've been ignoring noise, and for fitting group data that doesn't matter much, as the noise is expected to cancel out in the group average. However, when fitting individual data, it might make more sense to also estimate how noisy some processes in the model are. This is hard in part because we don't have an estimate of variance on every individual trial: we only have one data point per trial along the learning curve. Let's see how this changes the two-rate model. Up to now, we've used this version:
$x^t = s^t + f^t$
Here $x^t$ is the reach direction $x$ the model predicts on trial $t$, which is the sum of the slow state $s$ and the fast state $f$ (on the same trial). The slow and fast state are given by:
slow process state: $s^t = r_s \cdot s^{t-1} - l_s \cdot e^{t-1}$
fast process state: $f^t = r_f \cdot f^{t-1} - l_f \cdot e^{t-1}$
Here $t-1$ indicates values of variables on the previous trial. Both the slow and fast state retain some fraction ($r_s$ and $r_f$ respectively) of their previous state. They also each compensate for part of the error $e$ on the previous trial ($l_s$ and $l_f$ respectively).
As you know, we run this across a sequence of $k$ trials $t_{1:k}$, and we have a perturbation schedule $\mathbf{p}_{1:k}$: a vector of values determined by the experiment (in our case the rotation of the feedback in degrees), which dictates the error:
$e^t = x^t - p^t$
The exception is error-clamp trials, which for now simply set the error to 0. We'll also usually start the model with the two states and the previous error at 0.
The two retention rates and two learning rates are the parameters of the model we've been fitting so far.
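As a quick illustration (with made-up parameter values, purely hypothetical), a single trial update of this deterministic model looks like this in R:
```{r}
# hypothetical parameter values, for illustration only:
Rs <- 0.99; Rf <- 0.68 # retention rates: slow, fast
Ls <- 0.07; Lf <- 0.42 # learning rates: slow, fast

St <- 0; Ft <- 0; Et <- 0 # states and previous error all start at zero
Pt <- 30                  # a 30-degree perturbation on this trial

St <- (Rs * St) - (Ls * Et) # update the slow state
Ft <- (Rf * Ft) - (Lf * Et) # update the fast state
Xt <- St + Ft               # the model's reach direction
Et <- Xt - Pt               # the error, following e = x - p
print(c('X'=Xt, 'E'=Et))
```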
The above model is completely deterministic: every time you feed it the same parameter values and perturbation schedule, it will generate the exact same set of outputs. That's perhaps not very realistic; at least, it's not what people do. There are several ways in which noisy deviations from this could arise. For example, the estimate of the error might be slightly off one way or another for several reasons, or the execution of the intended movement might be somewhat off. For now, we'll consider movement execution noise only. It is the simpler case, as it is the same for all kinds of trials, and all equations remain the same except the one that determines the system output. This will now be:
$x^t = s^t + f^t + \epsilon^t, \quad \epsilon^t \sim \mathcal{N}(0, \sigma_m^2)$
This epsilon does not have a fixed value: it changes on every trial. It is drawn from a known distribution, so we can somewhat control it. We will use a normal distribution with mean 0 and some standard deviation $\sigma_m$. (Make sure not to confuse the reach error $e$ with the execution noise $\epsilon$.)
Since the error on the next trial $e^{t+1}$ depends on $x^t$ (as well as on the perturbation), the noise we happen to add to the executed movement $x^t$ on every trial will affect the next trial to some degree, and this will percolate through the rest of the sequence as well. This may fundamentally change the way the model works. Unfortunately, it also means we can't just run the model once, as the output will be different every time. In fact, the spread of the output will depend on that standard deviation, and since we want to keep everything realistic, including our estimate of motor execution noise, it will have to be fit as well. But how?
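(A side note, not part of the original text: if you ever want to reproduce one particular noisy run, say for debugging, you can fix R's random number generator with `set.seed()`. For fitting we want fresh noise on every evaluation, so we won't do that here.)
```{r}
# fixing the seed makes the random draws repeatable:
set.seed(1937)
rnorm(3, mean=0, sd=5)
set.seed(1937)
rnorm(3, mean=0, sd=5) # the exact same three draws again
```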
# First attempt
Here's a first attempt, starting with the model function. Instead of running the model once, we run it a number of times, and this number will be an argument to the function. In the end, we get a mean and a standard deviation for the responses on every trial. This defines a likelihood function, which we will use to compute the likelihood of the data given the model.
Let's begin.
```{r}
twoRateModel <- function(par, schedule, iterations=30) {
  
  # for this implementation we will store what happens in a set of matrices:
  slow_mat  <- matrix(data=NA, nrow=length(schedule), ncol=iterations)
  fast_mat  <- matrix(data=NA, nrow=length(schedule), ncol=iterations)
  total_mat <- matrix(data=NA, nrow=length(schedule), ncol=iterations)
  
  for (it in c(1:iterations)) {
    
    # these values should be zero at the start of each iteration:
    Et <- 0 # previous error: none
    St <- 0 # state of the slow process: aligned
    Ft <- 0 # state of the fast process: aligned
    
    # now we loop through the perturbations in the schedule:
    for (t in c(1:length(schedule))) {
      
      # first we calculate what the model does;
      # this happens before we get visual feedback about potential errors:
      St <- (par['Rs'] * St) - (par['Ls'] * Et)
      Ft <- (par['Rf'] * Ft) - (par['Lf'] * Et)
      Xt <- St + Ft + rnorm(1, mean=0, sd=par['sd'])
      
      # now we calculate what the previous error will be for the next trial:
      if (is.na(schedule[t])) {
        Et <- 0 # error-clamp trial: no error
      } else {
        Et <- Xt + schedule[t]
      }
      
      # at this point we save the states in our matrices:
      slow_mat[t,it]  <- St
      fast_mat[t,it]  <- Ft
      total_mat[t,it] <- Xt
      
    }
    
  }
  
  # after all iterations, we get the mean and sd for the processes and total output:
  slow_m  <- apply(slow_mat,  MARGIN=c(1), FUN=mean)
  slow_s  <- apply(slow_mat,  MARGIN=c(1), FUN=sd)
  fast_m  <- apply(fast_mat,  MARGIN=c(1), FUN=mean)
  fast_s  <- apply(fast_mat,  MARGIN=c(1), FUN=sd)
  total_m <- apply(total_mat, MARGIN=c(1), FUN=mean)
  total_s <- apply(total_mat, MARGIN=c(1), FUN=sd)
  
  # we return the model output as a data frame:
  return(data.frame(slow_m, slow_s, fast_m, fast_s, total_m, total_s))
  
}
```
We'll run it with the winning parameters from a previous fit, but first we need to get the schedule:
```{r}
# we also add a noise parameter: noise is drawn from a normal distribution
# with mean 0 and this standard deviation
par <- c('Ls'=0.07296769, 'Lf'=0.42160549, 'Rs'=0.99857272, 'Rf'=0.68626292, 'sd'=5)
# we'll get the schedule from our example data set:
load('data/tworatedata.rda')
baseline <- function(reachvector,blidx) reachvector - mean(reachvector[blidx], na.rm=TRUE)
tworatedata[,4:ncol(tworatedata)] <- apply(tworatedata[,4:ncol(tworatedata)], FUN=baseline, MARGIN=c(2), blidx=c(17:32))
schedule <- tworatedata$schedule
```
Now we're ready to run the model function:
```{r}
model <- twoRateModel(par=par, schedule=schedule, iterations = 100)
head(model)
```
The first thing to notice is that, because of the noise, the model output on the first trial is not zero, and consequently the slow and fast processes on the next trial are not zero either.
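We can also verify that the model is no longer deterministic by running it twice with identical input (a quick check, my own addition):
```{r}
# two runs with the same parameters and schedule produce different output:
run1 <- twoRateModel(par=par, schedule=schedule, iterations=10)
run2 <- twoRateModel(par=par, schedule=schedule, iterations=10)
print(c('run 1'=run1$total_m[1], 'run 2'=run2$total_m[1]))
```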
And we can plot the model:
```{r}
reaches <- apply(tworatedata[4:ncol(tworatedata)], FUN=mean, MARGIN=c(1), na.rm=TRUE)
plot(reaches,type='l',col='#333333',xlab='trial',ylab='reach deviation [deg]',xlim=c(0,165),ylim=c(-35,35),bty='n',axes=FALSE)
lines(c(1,33,33,133,133,145,145),c(0,0,30,30,-30,-30,0),col='#AAAAAA')
lines(c(145,164),c(0,0),col='#AAAAAA',lty=2)
lines(model$slow_m,col='blue')
lines(model$fast_m,col='red')
lines(model$total_m,col='purple')
axis(1,c(1,32,132,144,164),las=2)
axis(2,c(-30,-15,0,15,30))
```
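The plotted means hide the trial-by-trial spread. As an optional extra (not part of the original figure), we can draw a band of one standard deviation around the model's total output:
```{r}
# sketch: mean total output with a +/- 1 sd band
plot(NA, xlim=c(0,165), ylim=c(-35,35), xlab='trial', ylab='reach deviation [deg]', bty='n')
trials <- c(1:length(model$total_m))
polygon(x=c(trials, rev(trials)),
        y=c(model$total_m - model$total_s, rev(model$total_m + model$total_s)),
        col='#CC99FF66', border=NA)
lines(model$total_m, col='purple')
```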
Are the standard deviations of the model's total output simply the same as the execution noise parameter we put in (sd = 5)?
```{r}
t.test(model$total_s, mu=5)
```
No, not really: the total output is more variable than the execution noise alone, because the noise also feeds back into the slow and fast states through the error term. Is that an indication that it makes sense to implement it this way?
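We can also just look at the average per-trial spread directly:
```{r}
# the mean standard deviation across trials, versus the noise parameter of 5:
mean(model$total_s)
```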
Either way, while this is pretty neat, two things are not right yet.
First, this approach doesn't really work on group average data (that's where the non-noisy model is actually fine). It can work to model individual noise though, so we need individual participants. That is easy: we grab one column of the example data instead of the average.
Second, this was not fitted. How do we fit this? I'm thinking of optimizing the probability density of the data given the mean and standard deviation of the model.
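To make that concrete: if $\bar{x}^t$ and $\hat{\sigma}^t$ are the mean and standard deviation of the model's output on trial $t$ (across iterations), and $x^t$ is the participant's reach, we can score the data with a normalized product of per-trial normal densities:

$$L = \prod_{t=1}^{k} \left[ \frac{1}{\hat{\sigma}^t \sqrt{2\pi}} \exp\left( -\frac{(x^t - \bar{x}^t)^2}{2 (\hat{\sigma}^t)^2} \right) \right]^{1/k}$$

The exponent $1/k$ (one over the number of trials) takes the geometric mean over trials, which keeps the product from underflowing to zero. This is exactly what the likelihood function below computes with `dnorm()` and `prod()`.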
First step, let's get one participant's data:
```{r}
p5 <- tworatedata$p005
```
There are NA's in our data, but that should be OK.
Now we need a new likelihood function:
```{r}
twoRateLikelihood <- function(par, schedule, reaches, checkStability=FALSE, iterations=30) {
  
  lowLikelihood <- 0
  
  Rf <- par["Rf"]
  Rs <- par["Rs"]
  Lf <- par["Lf"]
  Ls <- par["Ls"]
  
  # learning and retention rates of the fast and slow process are constrained:
  if (Ls > Lf) {
    return(lowLikelihood)
  }
  if (Rs < Rf) {
    return(lowLikelihood)
  }
  
  # stability constraints:
  if (checkStability) {
    if ( ( ( (Rf - Lf) * (Rs - Ls) ) - (Lf * Ls)) <= 0 ) {
      return(lowLikelihood)
    }
    p <- Rf - Lf - Rs + Ls
    q <- ( p^2 + (4 * Lf * Ls) )
    if ( ((Rf - Lf + Rs - Ls) + (q^0.5)) >= 2 ) {
      return(lowLikelihood)
    }
  }
  
  model <- twoRateModel(par, schedule, iterations)
  
  # the likelihood of the data given the model:
  return( prod( dnorm((model$total_m - reaches), mean=0, sd=model$total_s)^(1/length(schedule)), na.rm=TRUE ) )
  
}
```
Let's try this likelihood function on our participant 5, with the parameters from the group:
```{r}
for (x in c(1:5)) {
  L <- twoRateLikelihood( par      = par,
                          schedule = schedule,
                          reaches  = p5 )
  print(L)
}
```
All those likelihoods are in the ballpark of what we'd expect, given parameters that were not fitted to this particular participant but that do work, as they were fitted to the group. The likelihoods are all different, as we no longer get the exact same result on every run. So this makes sense.
Now, let's actually fit the model. We will do a grid search in two stages, followed by the actual fit:
- First, we'll use a minimal number of iterations (30), and we'll set the noise to a value close to the motor output noise of the participant (this may underestimate the motor noise, but should be in the right ballpark).
- Second, a bunch of the best starting positions get another grid search, but now with the noise parameter varied as well.
- Third, the best of those will actually be fitted.
Let's get an estimate of motor output noise:
```{r}
sigma <- sd(c(
  p5[155:164] - mean(p5[155:164], na.rm=TRUE), # late error-clamp trials
  p5[1:10]    - mean(p5[1:10],    na.rm=TRUE)  # first aligned trials
), na.rm=TRUE)
```
Let's set up the global search grid, and do the first step of grid search:
```{r}
nvals <- 9
parvals <- seq(1/nvals/2,1-(1/nvals/2),1/nvals)
searchgrid <- expand.grid('Ls'=parvals,
                          'Lf'=parvals,
                          'Rs'=parvals,
                          'Rf'=parvals,
                          'sd'=sigma*1.5)
# evaluate all starting positions:
Likelihoods <- apply(searchgrid, FUN=twoRateLikelihood, MARGIN=c(1), schedule=schedule, reaches=p5, checkStability=TRUE)
```
We'll take the best half (but no more than 25) as starting points for the next step in the grid search:
```{r}
num.best <- min(25, floor(length(which(Likelihoods > 0))/2))
best.idx <- order(Likelihoods, decreasing = TRUE)[1:num.best]
```
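Optionally (my own addition), we can peek at the best starting points and their likelihoods:
```{r}
# inspect the three best grid points:
head(cbind(searchgrid[best.idx,], 'Likelihood'=Likelihoods[best.idx]), n=3)
```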
Now, we'll pick those and expand the search to include a bunch of sd's, and we'll increase the number of iterations:
```{r}
nvals <- 15
sdvals <- seq(1/nvals/2,1-(1/nvals/2),1/nvals) * 3 * sigma
overall.grid <- NA
for (idx in best.idx) {
  
  idx.searchgrid <- expand.grid('Ls'=searchgrid$Ls[idx],
                                'Lf'=searchgrid$Lf[idx],
                                'Rs'=searchgrid$Rs[idx],
                                'Rf'=searchgrid$Rf[idx],
                                'sd'=sdvals )
  
  idx.L <- apply(idx.searchgrid, FUN=twoRateLikelihood, MARGIN=c(1), schedule=schedule, reaches=p5, checkStability=TRUE, iterations=50)
  
  # preserve the best one:
  idx.searchgrid$Likelihood <- idx.L
  if (is.data.frame(overall.grid)) {
    overall.grid <- rbind(overall.grid, idx.searchgrid[order(idx.L, decreasing = TRUE)[1],])
  } else {
    overall.grid <- idx.searchgrid[order(idx.L, decreasing = TRUE)[1],]
  }
  
}
```
So now we have 25 reasonable starting positions, but for now we will only really fit a smaller number of them: the top 5. We'll remove the likelihood values, as they are not parameters for optimx to fiddle with.
```{r}
topgrid <- overall.grid[order(overall.grid$Likelihood, decreasing = TRUE)[1:5],c(1:5)]
```
Now we can actually let optimx fit those models, and here we'll use even more iterations.
```{r}
# we need to tell optimx to maximize the likelihood:
library(optimx)
control <- list()
control$maximize <- TRUE
#control$fnscale <- -1 # do not use together with maximize
# run optimx on the best starting positions:
allfits <- do.call("rbind",
                   apply( topgrid,
                          MARGIN=c(1),
                          FUN=optimx,
                          fn=twoRateLikelihood,
                          method='L-BFGS-B',
                          lower=c(0,0,0,0,0),
                          upper=c(1,1,1,1,Inf),
                          schedule=schedule,
                          reaches=p5,
                          checkStability=TRUE,
                          iterations=250,
                          control=control ) )
print(allfits)
```
Now, we pick the winning parameters:
```{r}
# pick the best fit:
win <- allfits[order(allfits$value, decreasing = TRUE)[1],]
print(par <- unlist(win[,1:5]))
```
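As a final sanity check (my own addition, not part of the original pipeline), we can run the model once more with the winning parameters and plot it over participant 5's reaches:
```{r}
# run the fitted model and compare it to the participant's data:
model <- twoRateModel(par=par, schedule=schedule, iterations=250)
plot(p5, type='l', col='#333333', xlab='trial', ylab='reach deviation [deg]',
     xlim=c(0,165), ylim=c(-35,35), bty='n')
lines(model$slow_m, col='blue')
lines(model$fast_m, col='red')
lines(model$total_m, col='purple')
```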