forked from mjskay/uncertainty-examples
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathproportions.Rmd
executable file
·116 lines (93 loc) · 3.71 KB
/
proportions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
---
title: "Hypothetical outcome plots (HOPs) and quantile dotplots for proportions"
output: github_document
bibliography: references.bib
link-citations: true
---
Here is a quick example of two frequency-framing approaches to visualizing the uncertainty in a proportion: hypothetical outcome plots (HOPs), which are useful when you are using a medium that supports animation, and quantile dotplots, which are useful when you are using a medium that does not support animation.
## Setup
The following libraries are needed:
```{r setup, message = FALSE, warning = FALSE}
library(tidyverse)
library(modelr)
library(rstanarm)
library(tidybayes)
library(gganimate) # devtools::install_github("thomasp85/gganimate")
library(ggstance)
library(forcats)
theme_set(
theme_light() +
theme(panel.grid = element_blank())
)
```
## Data
We'll use some count data to illustrate:
```{r data}
df = data_frame(
group = c("A","B","C"),
count = c(100, 50, 30),
proportion = count / sum(count)
)
df
```
## Model
One *possible* model for this data is as follows (N.B. this makes some strong assumptions about how the data were generated for the purposes of an example, your mileage may vary!):
```{r model, results = "hide"}
m = stan_glm(count ~ group, family = poisson, data = df)
```
## Hypothetical outcome plots (HOPs)
Given that model, we could construct a HOPs [@Hullman2015; @Kale2018] bar chart illustrating the posterior distribution for the proportion in each group:
```{r hops, fig.width = 7, fig.height = 3, cache = TRUE}
n_hops = 100
p = df %>%
data_grid(group) %>%
add_fitted_draws(m, n = n_hops) %>%
group_by(.draw) %>%
mutate(proportion = .value / sum(.value)) %>%
ggplot(aes(y = group, x = proportion)) +
geom_colh(fill = "gray75") +
geom_point(data = df, color = "red") +
annotate("text", y = "C", x = .5, label = "Observed proportion", hjust = 0, color = "red") +
annotate("segment", y = "C", yend = "C", x = .18, xend = .49, linetype = "dashed", color = "red") +
annotate("text", y = 2.7, x = .5, label = "Posterior distribution for proportion", hjust = 0,
color = "gray35") +
annotate("segment", y = 2.7, yend = 2.7, x = .1, xend = .49, linetype = "dashed", color = "gray75") +
xlim(0,1) +
transition_states(.draw, transition_length = 1, state_length = 1)
animate(p, nframes = n_hops * 2, width = 600, height = 300)
```
## Quantile dotplots
If animation were not available (e.g. in a print medium), an alternative might be a quantile dotplot [@Kay2016; @Fernandes2018]:
```{r quantile-dotplots, fig.width = 7, fig.height = 4}
observed_label_data = data_frame(
group = "C",
label = "Observed proportion",
x = .18, xend = .44, y = .55
)
dotplot_label_data = data_frame(
group = "C",
label = "100 approximately equally likely proportions",
x = .18, xend = .44, y = .3
)
df %>%
data_grid(group) %>%
add_fitted_draws(m) %>%
group_by(.draw) %>%
mutate(proportion = .value / sum(.value)) %>%
group_by(group) %>%
do(data_frame(proportion = quantile(.$proportion, ppoints(100)))) %>%
ggplot(aes(x = proportion)) +
geom_dotplot(binwidth = .01, fill = "gray65", color = NA) +
facet_grid(fct_rev(group) ~ .) +
geom_text(aes(xend + .01, y, label = label), data = observed_label_data, hjust = 0, color = "red") +
geom_segment(aes(x, y, xend = xend, yend = y), data = observed_label_data,
linetype = "dashed", color = "red") +
geom_vline(aes(xintercept = proportion), data = df, color = "red") +
geom_text(aes(xend + .01, y, label = label), data = dotplot_label_data, hjust = 0, color = "gray35") +
geom_segment(aes(x, y, xend = xend, yend = y), data = dotplot_label_data,
linetype = "dashed", color = "gray65") +
xlim(0,1) +
ylab(NULL) +
scale_y_continuous(breaks = NULL)
```
## References