-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathreproducible_research_assignment2.Rmd
149 lines (123 loc) · 4.86 KB
/
reproducible_research_assignment2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
---
title: "Analysis of health and economic consequences of severe weather in America"
author: "Nguyen Son Linh"
date: "5/28/2020"
output:
html_document:
keep_md: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## 1: Synopsis
Here we explore the NOAA database of severe weather events to find those that are most harmful on:
1. Health (injuries and deaths)
2. Property and crop
Information on the data can be obtained from this [documentation](https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf)
## 2: Data Processing
### 2.1: Data Loading
Download the raw data file and extract into a data frame. Also initiate necessary libraries
```{r DataLoading}
library(ggplot2)
library(tidyr)
library(dplyr)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, destfile = "storm_data.csv")
storm_dt <- read.csv("storm_data.csv")
```
### 2.2 Data subsetting
Firstly, we shallowly explore the data by looking at the column names.
```{r ColNames}
colnames(storm_dt)
```
Observedly, we only need to keep columns relevant to our analysis, which are the event type, health and economic ones
```{r ColsToKeep}
cols_to_keep <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
```
Now we subset the data to include just those columns, as well as events that had damaging consequences. Reformat some columns for more alignment
```{r Data Subsetting, results='hide'}
storm_dt <- storm_dt[cols_to_keep]
storm_dt <- storm_dt %>%
filter(EVTYPE != "?" | FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0) %>%
mutate(CROPDMGEXP = toupper(CROPDMGEXP), PROPDMGEXP = toupper(PROPDMGEXP))
```
### 2.3: Convert alphanumeric component to values
Since the data given to us contains characters to represent the impact of events on property and crop, we need a method to turn it into real values
```{r Converting, results='hide'}
PROPDMGMultiplier <- c("\"\"" = 10^0,
"-" = 10^0,
"+" = 10^0,
"0" = 10^0,
"1" = 10^1,
"2" = 10^2,
"3" = 10^3,
"4" = 10^4,
"5" = 10^5,
"6" = 10^6,
"7" = 10^7,
"8" = 10^8,
"9" = 10^9,
"H" = 10^2,
"K" = 10^3,
"M" = 10^6,
"B" = 10^9)
CROPDMGMultiplier <- c("\"\"" = 10^0,
"?" = 10^0,
"0" = 10^0,
"K" = 10^3,
"M" = 10^6,
"B" = 10^9)
storm_dt <- storm_dt %>%
mutate(CROPDMGEXP = CROPDMGMultiplier[CROPDMGEXP]) %>%
mutate(PROPDMGEXP = PROPDMGMultiplier[PROPDMGEXP])
```
### 2.4: Calculate cost
Process missing data and calculate real cost.
```{r EconCost}
storm_dt$CROPDMGEXP <- ifelse(is.na(storm_dt$CROPDMGEXP), 1, storm_dt$CROPDMGEXP)
storm_dt$PROPDMGEXP <- ifelse(is.na(storm_dt$PROPDMGEXP), 1, storm_dt$PROPDMGEXP)
storm_dt <- storm_dt %>%
mutate(prop_cost = PROPDMG * PROPDMGEXP, crop_cost = CROPDMG * CROPDMGEXP)
```
### 2.5: Measure impact
We summarise the data to obtain the impact on health and economy
```{r MeasureImpact}
economy_damage <- storm_dt %>%
group_by(EVTYPE = as.factor(EVTYPE)) %>%
summarise(prop_cost = sum(prop_cost), crop_cost = sum(crop_cost),
total_cost = sum(prop_cost) + sum(crop_cost)) %>%
arrange(desc(total_cost))
human_damage <- storm_dt %>%
group_by(EVTYPE = as.factor(EVTYPE)) %>%
summarise(fatalities = sum(FATALITIES), injuries = sum(INJURIES),
totals = sum(FATALITIES) + sum(INJURIES)) %>%
arrange(desc(totals))
```
Retrieve the first 10 elements for plotting purposes, reshape for easier manipulation.
```{r Top10}
health_suffering <- human_damage[1:10,]
econ_suffering <- economy_damage[1:10,]
health_suffering <- health_suffering %>% gather(key = type, value = value, - EVTYPE)
econ_suffering <- econ_suffering %>% gather(key = type, value = value, - EVTYPE)
```
## 3: Results
### 3.1: Most harmful events to health
```{r HealthChart}
health_suffering %>%
ggplot(aes(x = reorder(EVTYPE, -value), y = value)) +
geom_bar(stat = "identity", aes(fill = type), position = "dodge") +
labs(title = "Top 10 fatal weather events", x = "Event", y = "Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
plot.title.position = "panel")
```
### 3.2: Most harmful events to economy
```{r EconChart}
econ_suffering %>%
ggplot(aes(x = reorder(EVTYPE, -value), y = value)) +
geom_bar(stat = "identity", aes(fill = type), position = "dodge") +
labs(title = "Top 10 costly weather events", x = "Event", y = "Cost") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
plot.title.position = "panel")
```