---
title: State
---
The analyses demonstrated in the tutorial series so far have been completely *state*-less: each time an analysis runs (for example, in response to the user checking a checkbox), it runs from beginning to end. In many cases this isn't very efficient. A user may run a t-test, and then select a checkbox requesting descriptives. Without *state*, the analysis recalculates the t-test results every time an option changes, even when the changed option has no impact on those results.
For many analyses this isn't a problem -- a t-test runs very quickly, so recalculating with every option change causes no noticeable delay; the user still receives the results near-instantaneously. However, some analyses take a considerable amount of time to run, and re-running them in their entirety with every change leads to long delays and a poor user experience. The solution to this problem is *state*.
Using state, an analysis retains information from its previous run. If the user makes a change to an existing analysis, the analysis can make use of results that were calculated previously. Returning to the t-test example: if the user checks a checkbox requesting an additional table of descriptives, the analysis can re-use the t-test results from the last time it ran. However, if the user changes an option which affects the t-test results -- say, the type of t-test -- then the analysis should not re-use the earlier results. Whether earlier results are re-used is determined by the `clearWith` property.
## `clearWith`
Each results element in the `.r.yaml` file can have a `clearWith` property. If no `clearWith` property is specified, the default value of `*` is used, meaning the `Table` or `Image` is cleared if *any* option changes; *no* earlier results are ever re-used. So far in this tutorial series, all analyses have behaved in this way.
Specifying a `clearWith` property lets us control the circumstances under which results are re-used. For example, returning to our *t-test*, our `.a.yaml` file might contain the following options:
```{yaml }
- name: data
  type: Data

- name: deps
  title: Dependent Variables
  type: Variables

- name: group
  title: Grouping Variable
  type: Variable

- name: alt
  title: Alternative hypothesis
  type: List
  options:
    - name: notEqual
      title: Not equal
    - name: oneGreater
      title: One greater
    - name: twoGreater
      title: Two greater
  default: notEqual

- name: varEq
  title: Assume equal variances
  type: Bool
  default: true
```
We could add the `clearWith` property to the t-test results table in the `.r.yaml` file as follows:
```{yaml }
items:
    - name: ttest
      title: Independent Samples T-Test
      type: Table
      rows: (deps)
      clearWith:    # <-- here
        - group
        - alt
        - varEq
      columns:
        - name: var
          title: ''
          type: text
          content: ($key)
```
This `clearWith` specifies that the table is to be cleared if any of the options `group`, `alt` or `varEq` change. Note that we *haven't* added the `deps` option to this list: when the user adds additional dependent variables, we don't want the existing rows to be cleared. You can see what happens by running this example and adding multiple dependent variables one at a time.
Before we added this `clearWith` property, adding another dependent variable caused the whole table to be cleared before being filled back in again. Now, with `clearWith` specified (and `deps` not listed), adding an additional dependent variable simply adds another row, which is then filled in; the old rows are not cleared. This new behaviour minimises flickering in the results, and lets the user see clearly what has changed in response to their actions.
However, it should be noted that we haven't actually reduced the number of calculations being performed. Although the table is no longer cleared when certain options change, our analysis implementation in the `.b.R` file still loops over all the dependent variables and performs a t-test for each, overwriting the value already in the table with a newly calculated (and identical) value. This isn't a problem here, because the t-test runs very quickly, but we can modify our `.b.R` file so it doesn't calculate values which are already present in the table. We find out which parts of the table are already filled in with the `isFilled()` method.
## `isFilled()`
The `isFilled()` method can be called with any of the following:
- `table$isFilled()`
- `table$isFilled(rowNo=i, col)`
- `table$isFilled(rowKey=key, col)`
By specifying or omitting different arguments, it is possible to query whether the whole table is filled, whether a particular row or column is filled, or whether a particular cell is filled. `isFilled()` returns either `TRUE` or `FALSE`.
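As a rough sketch of these different queries (the variable `table`, the column name `t`, and the row key `dep` are just for illustration, matching the t-test example below):
```
# query the whole table, or individual cells, for existing results
table$isFilled()                       # TRUE if every cell in the table is filled
table$isFilled(rowNo=1, col='t')       # TRUE if the 't' cell of the first row is filled
table$isFilled(rowKey=dep, col='t')    # TRUE if the 't' cell of the row keyed by `dep` is filled
```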
Let's return to the `.b.R` file of our t-test example. We might modify our `.run()` function as follows:
```
.run=function() {

    table <- self$results$ttest

    for (dep in self$options$deps) {

        if ( ! table$isFilled(rowKey=dep)) {    # <- this if statement!

            formula <- jmvcore::constructFormula(dep, self$options$group)
            formula <- as.formula(formula)
            results <- t.test(formula, self$data)

            table$setRow(rowKey=dep, values=list(    # set by rowKey!
                t=results$statistic,
                df=results$parameter,
                p=results$p.value))
        }
    }
}
```
We've added an if-statement which checks whether the row is already filled. If it is, we don't call the `t.test()` function or spend time populating the row. In this way we can skip calculations whose results are already in the table.
## `setState()`
However, sometimes we don't want to just store the final results; sometimes we want to store the intermediate objects as well. For example, we may want to create a fit object, and then reuse this same fit object the next time the analysis is run.
State can be saved to, and recovered from, any results element (e.g. an `Image` or a `Table`) using the `setState()` method and the `state` property:
```
table$setState(object)
object <- table$state
```
`$state` will return `NULL` if no state has been set.
Note that the `clearWith` property also applies to the state attached to a results element. The same mechanism can be used to selectively clear the state or not, depending on what options have changed.
When using `setState()` and `state`, an analysis will typically try to retrieve the state as one of the first things it does. If the state doesn't exist (`state` has a value of `NULL`), the analysis performs the calculations to create the object it requires, and calls `setState()` to attach that object to a results element. Following this, the analysis can populate the tables and images from that object. Alternatively, if the state can be retrieved, the analysis can bypass the time-consuming construction of the object, and simply use the one from last time to populate the tables and images.
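As a rough sketch of this pattern, the t-test analysis above could store its `t.test()` result objects in the state, keyed by dependent variable (the `cache` list and its structure are just one way this could be organised, not something required by jamovi):
```
.run=function() {

    table <- self$results$ttest

    # try to recover results stored by a previous run;
    # `state` is NULL if nothing has been stored (or if it was cleared)
    cache <- table$state
    if (is.null(cache))
        cache <- list()

    for (dep in self$options$deps) {

        if (is.null(cache[[dep]])) {
            # not in the state -- perform the calculation and store it
            formula <- jmvcore::constructFormula(dep, self$options$group)
            formula <- as.formula(formula)
            cache[[dep]] <- t.test(formula, self$data)
        }

        # populate the table from the new or recovered result object
        results <- cache[[dep]]
        table$setRow(rowKey=dep, values=list(
            t=results$statistic,
            df=results$parameter,
            p=results$p.value))
    }

    table$setState(cache)
}
```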
**WARNING:** some R objects take up a lot of space when serialised. If the stored objects are large, the save and restore process between runs of the analysis will be very sluggish. As such, it's worth investigating how large the objects you want to store will be. The following gives the serialised size of an object in bytes:
```
length(serialize(object, connection=NULL))
```
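For example, a `setState()` call could be guarded with a size check along these lines (the one-megabyte threshold here is arbitrary, not a jamovi requirement):
```
# only store the object if its serialised form is reasonably small
size <- length(serialize(object, connection=NULL))
if (size < 1e6)
    table$setState(object)
```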