-
Notifications
You must be signed in to change notification settings - Fork 0
/
collective_geoms.Rmd
156 lines (106 loc) · 3.63 KB
/
collective_geoms.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
```{r, include = FALSE}
knitr::opts_chunk$set(
comment = "#>",
collapse = TRUE,
fig.show = "hold",
fig.asp = 0.618
)
```
# Collective geoms
```{r, message = FALSE}
library(ggplot2)
library(dplyr)
library(babynames)
```
## Exercises
**Q1:** Draw a boxplot of `hwy` for each value of `cyl`, without turning `cyl` into a factor. What extra aesthetic do you need to set?
**A:** We need to set group aesthetic:
```{r}
ggplot(mpg, aes(cyl, hwy, group = cyl)) +
geom_boxplot()
```
**Q2:** Modify the following plot so that you get one boxplot per integer value of `displ`.
```{r}
ggplot(mpg, aes(displ, cty)) +
geom_boxplot()
```
**A:** As the warning hints us, we have to set the group aesthetic:
```{r}
ggplot(mpg, aes(displ, cty, group = displ)) +
geom_boxplot()
```
**Q3:** When illustrating the difference between mapping continuous and discrete colours to a line, the discrete example needed `aes(group = 1)`. Why? What happens if that is omitted? What’s the difference between `aes(group = 1)` and `aes(group = 2)`? Why?
**A:** First, lets remove the group aesthetic:
```{r}
df <- data.frame(x = 1:3, y = 1:3, colour = c(1, 3, 5))
ggplot(df, aes(x, y, colour = factor(colour))) +
geom_line(size = 2) +
geom_point(size = 5)
ggplot(df, aes(x, y, colour = colour)) +
geom_line(size = 2) +
geom_point(size = 5)
```
Now let's use `aes(group = 2)`:
```{r}
ggplot(df, aes(x, y, colour = factor(colour))) +
geom_line(aes(group = 2), size = 2) +
geom_point(size = 5)
ggplot(df, aes(x, y, colour = colour)) +
geom_line(aes(group = 2), size = 2) +
geom_point(size = 5)
```
If we map a categorical variable to the color aesthetic, `geom_line()` connects (group) the observations in each level of the variable. In the first case, there is only one observation in each group so, specifying the groups manually makes these points connected (and in the second case, we can notice that the value of the group aesthetic doesn't matter).
**Q4:** How many bars are in each of the following plots?
```{r}
ggplot(mpg, aes(drv)) +
geom_bar()
ggplot(mpg, aes(drv, fill = hwy, group = hwy)) +
geom_bar()
mpg2 <- mpg %>% arrange(hwy) %>% mutate(id = seq_along(hwy))
ggplot(mpg2, aes(drv, fill = hwy, group = id)) +
geom_bar()
```
(Hint: try adding an outline around each bar with `colour = "white"`)
**A:** We can use `colour = "white"`, but it's still hard to count the number of bars. We can use dplyr to find the number of bars:
```{r}
ggplot(mpg, aes(drv)) +
geom_bar(color = "white")
mpg %>%
select(drv) %>%
unique() %>%
group_by(drv) %>%
count()
```
```{r}
ggplot(mpg, aes(drv, fill = hwy, group = hwy)) +
geom_bar(color = "white")
mpg %>%
select(drv, hwy) %>%
unique() %>%
count(drv)
```
```{r}
mpg2 <- mpg %>% arrange(hwy) %>% mutate(id = seq_along(hwy))
ggplot(mpg2, aes(drv, fill = hwy, group = id)) +
geom_bar(color = "white")
mpg2 %>%
select(drv, hwy, id) %>%
unique() %>%
count(drv)
```
**Q5:** Install the babynames package. It contains data about the popularity of babynames in the US. Run the following code and fix the resulting graph. Why does this graph make me unhappy?
```{r}
hadley <- filter(babynames, name == "Hadley")
ggplot(hadley, aes(year, n)) +
geom_line()
```
**A:** We want to see the growth of the newborn babies with the name called Hadley. If we take a look at the data, we can notice that there are 2 levels for the `sex` variable:
```{r}
hadley <- filter(babynames, name == "Hadley")
levels(as.factor(hadley$sex))
```
There is two way to fix this problem: using group aesthetic or using colour aesthetic:
```{r}
ggplot(hadley, aes(year, n, colour = sex)) +
geom_line()
```