-
Notifications
You must be signed in to change notification settings - Fork 16
/
Copy pathquickstart.Rmd
167 lines (129 loc) · 6.69 KB
/
quickstart.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
---
title: "Quick start"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Quick start}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
eval = TRUE
)
```
# Quick start to `covidregionaldata`
`covidregionaldata` is designed to extract national and subnational Covid-19 data from publicly available sources, both daily and aggregated over time.
For example, we could:
* Get the Covid-19 cases and deaths for all countries for each day since reporting began.
* Get the total Covid-19 cases and deaths for all countries up to today.
* Get the number of new cases for each state in the United States over time.
* Get the total number of deaths for all local authorities in the United Kingdom.
We will demonstrate some of these examples below, as well as some useful data manipulation and visualisation.
## Retrieving national cases and deaths over time
Let's say that we want to see the evolution of the pandemic over time in all countries. To do this, we need to get the number of cases (positive test results) and deaths for each country and date since the pandemic started. Firstly load `covidregionaldata`, then call `get_national_data()`:
```{r setup}
library(covidregionaldata)
all_countries <- get_national_data()
```
We can take a look at this data by printing it to the console:
```{r}
all_countries
```
We can see that this provides information about the number of cases and deaths for each country over time. As a tibble is returned, we can easily manipulate and plot this using `tidyverse` packages. For example, we can plot the total deaths in Italy over time.
```{r fig.width = 6, message = FALSE}
library(dplyr)
library(ggplot2)
all_countries %>%
filter(country == "Italy") %>%
ggplot() +
aes(x = date, y = deaths_total) +
geom_line() +
labs(x = NULL, y = "All reported Covid-19 deaths, Italy") +
theme_minimal()
```
We could have also filtered by country using `get_national_data(countries = "Italy")` to achieve the same result.
To plot the evolution of cases for several countries since the start of the pandemic, we could perform something like the following:
```{r fig.width = 6}
all_countries %>%
filter(country %in% c(
"Italy", "United Kingdom", "Spain",
"United States"
)) %>%
ggplot() +
aes(x = date, y = cases_total, colour = country) +
geom_line() +
labs(x = NULL, y = "All reported Covid-19 cases", colour = "Country") +
theme_minimal() +
theme(legend.position = "bottom")
```
## Retrieving total cases and deaths for all countries
Using `get_national_data(totals = TRUE)` we can obtain total cases and deaths for all nations up to the latest date reported. This is useful to get a snapshot of the current running total for each country. Note how the data is sorted by the total number of cases:
```{r}
all_countries_totals <- get_national_data(totals = TRUE, verbose = FALSE)
all_countries_totals
```
## Retrieving subnational region data
As well as national data, we can also retrieve subnational data, such as states and counties in the US, or regions and local authorities in the UK, and perform similar manipulations. See the [README](https://epiforecasts.io/covidregionaldata/) for a current list of countries with subnational data.
To do this, we use `get_regional_data()` (we could also use the underlying method as follows `us <- USA$new(get = TRUE); us$return()`). We'll get the state-level data for the United States over time:
```{r}
usa_states <- get_regional_data(country = "USA")
```
This retrieves the new and total cases and deaths for all states over time in the United States. Note that the `country` argument to the `get_regional_data()` function must be specified. Let's plot out how the number of cases has evolved for a selection of states:
```{r fig.width = 6, warning = FALSE}
usa_states %>%
filter(state %in% c("New York", "Texas", "Florida")) %>%
ggplot() +
aes(x = date, y = cases_total, colour = state) +
geom_line() +
labs(x = NULL, y = "All reported Covid-19 cases", colour = "U.S. state") +
theme_minimal() +
theme(legend.position = "bottom")
```
The `totals` argument can also be used in `get_regional_data` to return the cumulative number of cases or deaths over time.
```{r,}
usa_states_totals <- get_regional_data(
country = "USA", totals = TRUE,
verbose = FALSE
)
```
Let's plot the total deaths for each state, ordered by total deaths:
```{r fig.width = 6}
usa_states_totals %>%
ggplot() +
aes(x = reorder(state, -deaths_total), y = deaths_total) +
geom_bar(stat = "identity") +
labs(x = "U.S. states", y = "All reported Covid-19 deaths") +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 90),
axis.title.y = element_text(hjust = 1),
legend.position = "bottom"
)
```
Most countries have a hierarchy of political subdivisions. Where data are available, use the `level = "2"` argument to `get_regional_data`. We can dig down further into each US state by looking at data by county:
```{r, eval = FALSE}
usa_counties <- get_regional_data(country = "USA", level = "2", verbose = FALSE)
```
This returns data for each county on each day of reported data. As there are more than 3,000 counties in the US, with some reporting data since the 21st January, this makes for a very large download.
The same method will work for different countries where data are available, returning relevant sub-national units for each country. For example, in Germany data are available for Bundesland (federated state) and Kreise (district) areas.
Additional subnational data are supported via the `JHU()` and `Google()` classes. Use the `available_regions()` method once these data have been downloaded and cleaned (see their examples) for subnational data they internally support.
## Mapping data
We provide the relevant national and subnational ISO codes, or local alternatives, to enable mapping. We can map Covid-19 data at national or subnational level, using any standard shapefile. In R, the `rnaturalearth` package already provides shapefiles for all country borders, and the largest (administrative level 1) subnational units for most countries.
For simplicity here we will use the `rworldmap` package, which uses data from `rnaturalearth` for national country borders and inbuilt mapping. We can join Covid-19 data to the map using the `iso_code`.
```{r fig.width = 6}
# Get latest worldwide WHO data
map_data <- get_national_data(totals = TRUE, verbose = FALSE) %>%
rworldmap::joinCountryData2Map(
joinCode = "ISO2",
nameJoinColumn = "iso_code"
)
# Produce map
rworldmap::mapCountryData(map_data,
nameColumnToPlot = "deaths_total",
catMethod = "fixedWidth",
mapTitle = "Total Covid-19 deaths to date",
addLegend = TRUE
)
```