---
title: "Coding Practice"
output:
  html_document:
    toc: false
---

Assignments are designed to reinforce the code and lessons covered each week and to give you a chance to practice working with GitHub. Complete each assignment in your local project (on your computer) and push it to your GitHub repository for instructors to review by the following Wednesday. For example, homework for Week 1 (which is on September 22) should be submitted by September 28. That said, these due dates are mostly suggestions to help you prioritize and stay caught up as a group. If you need, or want, more time, take it. At the end of the quarter, we will simply look over the tasks you have completed alongside the reflection you submit.


<br>

## Assignments {.tabset .tabset-fade .tabset-pills}

### <small>Week 1</small>
<br>
Because this class caters to a range of experiences, we would like you to identify how it is you plan on earning credit for this course. Before next week, please complete [this form](https://forms.gle/HHTZJVVtScUrEMLm7) to select and describe your plans.

### <small>Week 2</small>

In Week 2's homework we are going to practice subsetting and manipulating vectors.  

First, open your r-davis-in-class-project-YourName and `pull`. Remember, we always want to start working on a GitHub project by pulling, even if we are sure nothing has changed (believe me, this small step will save you lots of headaches).

Second, open a new script in your r-davis-in-class-project-YourName and save it to your `scripts` folder. Call this new script `week_2_homework`. 

Copy and paste the chunk of code below into your new `week_2_homework` script and run it. This chunk of code will create the vector you will use in your homework today. Check in your environment to see what it looks like. What do you think each line of code is doing? 

```{r}
set.seed(15)
hw2 <- runif(50, 4, 50)
hw2 <- replace(hw2, c(4,12,22,27), NA)
hw2
```


1. Take your `hw2` vector and remove all the NAs, then select all the numbers between 14 and 38 inclusive. Call this vector `prob1`. 

2. Multiply each number in the `prob1` vector by 3 to create a new vector called `times3`. Then add 10 to each number in your `times3` vector to create a new vector called `plus10`. 

3. Select every other number in your `plus10` vector by selecting the first number, not the second, the third, not the fourth, etc. If you've worked through these three problems in order, you should now have a vector that is 12 numbers long that looks **exactly** like this one:

```{r, echo = F}
prob1 <- hw2[!is.na(hw2)] #removing the NAs

prob1 <- prob1[prob1 >= 14 & prob1 <= 38] #only selecting numbers between 14 and 38 (inclusive)

times3 <- prob1 * 3 #multiplying by 3

plus10 <- times3 + 10 #adding 10 to the whole vector 

final <- plus10[c(TRUE, FALSE)] #selecting every other number
```

```{r}
final
```
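
If the subsetting steps feel abstract, the sketch below walks through the same kinds of operations on a small made-up vector (`x` and the other names here are hypothetical, not part of the homework). It shows removing NAs, keeping values in an inclusive range, and how a short logical vector such as `c(TRUE, FALSE)` is recycled along a longer vector.

```r
# Toy vector for illustration only (not the homework data)
x <- c(5, NA, 12, 20, 7, NA, 30)

# Keep only the non-missing values
x_clean <- x[!is.na(x)]                          # 5 12 20 7 30

# Keep values between 7 and 20, inclusive on both ends
x_mid <- x_clean[x_clean >= 7 & x_clean <= 20]   # 12 20 7

# c(TRUE, FALSE) is recycled to TRUE, FALSE, TRUE, FALSE, TRUE,
# so positions 1, 3, and 5 are kept
x_odd <- x_clean[c(TRUE, FALSE)]                 # 5 20 30
```

Recycling is why a two-element logical vector is enough to pick every other element, no matter how long the vector is.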

Finally, save your script and push all your changes to your GitHub repository. 

<details>
<summary>**DO NOT OPEN** until you are ready to see the answers</summary>
```{r, eval=T}

prob1 <- hw2[!is.na(hw2)] #removing the NAs

prob1 <- prob1[prob1 >= 14 & prob1 <= 38] #only selecting numbers between 14 and 38 (inclusive)

times3 <- prob1 * 3 #multiplying by 3

plus10 <- times3 + 10 #adding 10 to the whole vector 

final <- plus10[c(TRUE, FALSE)] #selecting every other number using logical subsetting

```
</details>


### <small>Week 3</small>

Homework this week will be playing with the `surveys` data we worked on in class. First things first, open your r-davis-in-class-project and pull. Then create a new script in your `scripts` folder called `week_3_homework.R`. 

Load your `surveys` data frame with the read.csv() function. Create a new data frame called `surveys_base` with only the species_id, the weight, and the plot_type columns. Have this data frame only be the first 5,000 rows. Convert both species_id and plot_type to factors. Remove all rows where there is an NA in the weight column. Explore these variables and try to explain why a factor is different from a character. Why might we want to use factors? Can you think of any examples? 

CHALLENGE:
Create a second data frame called `challenge_base` that only consists of individuals from your `surveys_base` data frame with weights greater than 150g.  
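
As a starting point for exploring factors, here is a minimal sketch on a made-up character vector (the names `colors_chr` and `colors_fct` are hypothetical). It only shows where to look; the interpretation is still up to you.

```r
# Toy character vector for illustration only
colors_chr <- c("red", "blue", "red", "green")

# Convert to a factor; levels are the sorted unique values by default
colors_fct <- factor(colors_chr)

levels(colors_fct)       # "blue" "green" "red"
as.integer(colors_fct)   # 3 1 3 2  (each value stored as a level index)

typeof(colors_chr)       # "character"
typeof(colors_fct)       # "integer"
class(colors_fct)        # "factor"
```

Comparing `typeof()` and `class()` on both vectors is a quick way to see how R stores each one.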

<details>
<summary>**DO NOT OPEN** until you are ready to see the answers for the homework</summary>
```{r, eval=T}
#PROBLEM 1

surveys <- read.csv("data/portal_data_joined.csv") #reading the data in

colnames(surveys) #a list of the column names 

surveys_base <- surveys[1:5000, c(6, 9, 13)] #selecting rows 1:5000 and just columns 6, 9 and 13

surveys_base <- surveys_base[complete.cases(surveys_base), ] #selecting only the ROWS that have complete cases (no NAs) **Notice the comma was needed for this to work**

surveys_base$species_id <- factor(surveys_base$species_id) #converting character data to factor

surveys_base$plot_type <- factor(surveys_base$plot_type) #converting character data to factor

#Experimentation of factors
levels(surveys_base$species_id)
typeof(surveys_base$species_id)
class(surveys_base$species_id)

#CHALLENGE
challenge_base <- surveys_base[surveys_base[, 2]>150,] #selecting just the weights (column 2) that are greater than 150

```
</details>

<br> 

### <small>Week 4</small>

By now you should be in the rhythm of pulling from your git repository and then creating a new homework script. This week the homework will review data manipulation in the tidyverse.

1. Create a tibble named `surveys` from the portal_data_joined.csv file.

2. Subset `surveys` using Tidyverse methods to keep rows with weight between 30 and 60, and print out the first 6 rows.

3. Create a new tibble showing the maximum weight for each species + sex combination and name it `biggest_critters`. Sort the tibble to take a look at the biggest and smallest species + sex combinations. HINT: it's easier to calculate max if there are no NAs in the dataframe...

4. Try to figure out where the NA weights are concentrated in the data. Is there a particular species, taxa, plot, or whatever, where there are lots of NA values? There isn’t necessarily a right or wrong answer here, but manipulate `surveys` a few different ways to explore this. Maybe use `tally` and `arrange` here.

5. Take `surveys`, remove the rows where weight is NA and add a column that contains the average weight of each species+sex combination to the **full** `surveys` dataframe. Then get rid of all the columns except for species, sex, weight, and your new average weight column. Save this tibble as `surveys_avg_weight`.

6. Take `surveys_avg_weight` and add a new column called `above_average` that contains logical values stating whether or not a row’s weight is above average for its species+sex combination (recall the new column we made for this tibble).
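
Questions 3 and 5 hinge on the difference between `summarise()` and `mutate()` after `group_by()`. The sketch below illustrates that difference on a tiny made-up tibble (`toy`, `group`, and `value` are hypothetical names, not columns in `surveys`).

```r
library(dplyr)

# Toy data for illustration only
toy <- tibble(group = c("a", "a", "b", "b"),
              value = c(1, 3, 10, NA))

# summarise() collapses each group to a single row
toy_max <- toy %>%
  filter(!is.na(value)) %>%
  group_by(group) %>%
  summarise(max_value = max(value))

# mutate() keeps every row and repeats the group-level value on each one
toy_avg <- toy %>%
  filter(!is.na(value)) %>%
  group_by(group) %>%
  mutate(avg_value = mean(value)) %>%
  ungroup()
```

Here `toy_max` has one row per group, while `toy_avg` still has one row per non-missing observation.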

<details>
<summary>**DO NOT OPEN** until you are ready to see the answers for the homework</summary>
```{r, eval = T}
library(tidyverse)
#1
surveys <- read_csv("data/portal_data_joined.csv")

#2
surveys %>% 
  filter(weight > 30 & weight < 60) %>% 
  head() #print out the first 6 rows

#3
biggest_critters <- surveys %>% 
  filter(!is.na(weight)) %>% 
  group_by(species_id, sex) %>% 
  summarise(max_weight = max(weight))

biggest_critters %>% arrange(max_weight)

biggest_critters %>% arrange(desc(max_weight))

#4
surveys %>% 
  filter(is.na(weight)) %>% 
  group_by(species) %>% 
  tally() %>% 
  arrange(desc(n))

surveys %>% 
  filter(is.na(weight)) %>% 
  group_by(plot_id) %>% 
  tally() %>% 
  arrange(desc(n))

surveys %>% 
  filter(is.na(weight)) %>% 
  group_by(year) %>% 
  tally() %>% 
  arrange(desc(n))

#5
surveys_avg_weight <- surveys %>% 
  filter(!is.na(weight)) %>% 
  group_by(species_id, sex) %>% 
  mutate(avg_weight = mean(weight)) %>% 
  select(species_id, sex, weight, avg_weight)

surveys_avg_weight

#6
surveys_avg_weight <- surveys_avg_weight %>% 
  mutate(above_average = weight > avg_weight)

surveys_avg_weight

```
</details>

### <small>Week 5</small>

This week's questions will have us practicing pivots and conditional statements.

1. Create a tibble named `surveys` from the portal_data_joined.csv file. Then manipulate `surveys` to create a new dataframe called `surveys_wide` with a column for genus and a column named after every plot type, with each of these columns containing the mean hindfoot length of animals in that plot type and genus. So every row has a genus and then a mean hindfoot length value for every plot type. The dataframe should be sorted by values in the Control plot type column. This question will involve quite a few of the functions you've used so far, and it may be useful to sketch out the steps to get to the final result.

2. Using the original `surveys` dataframe, use the two different functions we laid out for conditional statements, ifelse() and case_when(), to calculate a new weight category variable called `weight_cat`. For this variable, divide rodent weights into three categories, where "small" is less than or equal to the 1st quartile of the weight distribution, "medium" is between (but not inclusive of) the 1st and 3rd quartiles, and "large" is any weight greater than or equal to the 3rd quartile. (Hint: the summary() function on a column summarizes the distribution). For ifelse() and case_when(), compare what happens to the weight values of NA, depending on how you specify your arguments.

BONUS: How might you soft code the values (i.e. not type them in manually) of the 1st and 3rd quartile into your conditional statements in question 2?
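
If the reshaping in question 1 is hard to picture, here is a minimal sketch of `pivot_wider()` on a made-up long table (`toy_long`, `site`, `season`, and `count` are hypothetical names). It assumes the dplyr and tidyr packages, both of which load with the tidyverse.

```r
library(dplyr)
library(tidyr)

# Toy long-format data for illustration only: one row per site and season
toy_long <- tibble(site   = c("A", "A", "B", "B"),
                   season = c("wet", "dry", "wet", "dry"),
                   count  = c(4, 1, 7, 3))

# Spread the seasons into their own columns, then sort by one of them
toy_wide <- toy_long %>%
  pivot_wider(names_from = season, values_from = count) %>%
  arrange(wet)
```

`toy_wide` has one row per site and one column per season, which is the same shape question 1 asks for with genus and plot type.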

<details>
<summary>**DO NOT OPEN** until you are ready to see the answers for the homework</summary>
```{r, eval = T}
# 1
library(tidyverse)
surveys <- read_csv("data/portal_data_joined.csv")

surveys_wide <- surveys %>% 
  filter(!is.na(hindfoot_length)) %>% 
  group_by(genus, plot_type) %>% 
  summarise(mean_hindfoot = mean(hindfoot_length)) %>% 
  pivot_wider(names_from = plot_type, values_from = mean_hindfoot) %>% 
  arrange(Control)

# 2
summary(surveys$weight)
# The final "else" argument here, where I used the T ~ "large" applies even to NAs, which is not something we want
surveys %>% 
  mutate(weight_cat = case_when(
    weight <= 20.00 ~ "small",
    weight > 20.00 & weight < 48.00 ~ "medium",
    T ~ "large"
  ))

# To overcome this, case_when() allows us to not even use an "else" argument, and just specify the final argument to reduce confusion. This leaves NAs as is
surveys %>% 
  mutate(weight_cat = case_when(
    weight <= 20.00 ~ "small",
    weight > 20.00 & weight < 48.00 ~ "medium",
    weight >= 48.00 ~ "large"
  ))

# The "else" argument in ifelse() does not include NAs when specified, which is useful. The shortcoming, however, is that ifelse() does not allow you to leave out a final else argument, which means it is really important to always check the work on what that last argument assigns to.
surveys %>% 
  mutate(weight_cat = ifelse(weight <= 20.00, "small",
                             ifelse(weight > 20.00 & weight < 48.00, "medium","large")))

# BONUS:
summ <- summary(surveys$weight)
# Remember our indexing skills from the first weeks? Play around with single and double bracketing to see how it can extract values
summ[[2]]
summ[[5]]

# Then you can nest these into your code
surveys %>% 
  mutate(weight_cat = case_when(
    weight <= summ[[2]] ~ "small",
    weight > summ[[2]] & weight < summ[[5]] ~ "medium",
    weight >= summ[[5]] ~ "large"
  ))

```
</details>

### <small>Week 6</small>

For our week six homework, we are going to be practicing the skills we learned with ggplot during class. You will be happy to know that we are going to be using a brand new data set called `gapminder`. This data set contains statistics for a range of countries, including population, GDP per capita, and life expectancy. Read in the data using the code below, which loads the .csv directly from the course website. If you would rather keep a local copy, save the file to your `data` folder and point `read_csv()` at that path instead. 

```{r, warning = F}
library(tidyverse)

gapminder <- read_csv("https://gge-ucd.github.io/R-DAVIS/data/gapminder.csv") #reads the .csv straight from the course website

```


1. First, calculate the mean life expectancy on each continent. Then create a plot that shows how life expectancy has changed over time in each continent. Try to do this all in one step using pipes! (aka, try not to create intermediate dataframes)

2. Look at the code below and answer the following questions. What do you think the `scale_x_log10()` line of code is achieving? What about the `geom_smooth()` line of code? 

*Challenge!* Modify the code below to size the points in proportion to the population of the country. 
**Hint:** Are you translating data to a visual feature of the plot?

**Hint:** There's no cost to tinkering! Try some code out and see what happens with or without particular elements.

```{r}

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
    geom_point(aes(color = continent), size = .25) + 
    scale_x_log10() +
    geom_smooth(method = 'lm', color = 'black', linetype = 'dashed') +
    theme_bw()

```


3. Create a boxplot that shows the life expectancy for Brazil, China, El Salvador, Niger, and the United States, with the data points in the background using geom_jitter. Label the X and Y axis with "Country" and "Life Expectancy" and title the plot "Life Expectancy of Five Countries".
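
A general idea that helps with all three problems is the difference between *setting* a visual property to a fixed value and *mapping* it to a column inside `aes()`. The sketch below shows both on a made-up data frame (`toy`, `grp`, and the plot names are hypothetical, not gapminder).

```r
library(ggplot2)

# Toy data for illustration only
toy <- data.frame(x = 1:4,
                  y = c(2, 4, 3, 5),
                  grp = c("a", "a", "b", "b"))

# Setting: every point gets the same fixed color (outside aes())
p_set <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(color = "blue", size = 3)

# Mapping: color depends on the values in a column (inside aes()),
# so ggplot also builds a legend for it
p_map <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(aes(color = grp), size = 3)
```

Print `p_set` and `p_map` one after the other to see how the legend appears only when a column is mapped.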

<details>
<summary>**DO NOT OPEN** until you are ready to see the answers! </summary>
```{r, eval = T}

library(tidyverse)

#PROBLEM 1:

gapminder %>%
  group_by(continent, year) %>% 
  summarize(mean_lifeExp = mean(lifeExp)) %>% #calculating the mean life expectancy for each continent and year
  ggplot()+
  geom_point(aes(x = year, y = mean_lifeExp, color = continent))+ #scatter plot
  geom_line(aes(x = year, y = mean_lifeExp, color = continent)) #line plot

#there are other ways to represent this data and answer this question. Try a facet wrap! Play around with themes and ggplotly!  


#PROBLEM 2:

#challenge answer
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
    geom_point(aes(color = continent, size = pop)) + 
    scale_x_log10() +
    geom_smooth(method = 'lm', color = 'black', linetype = 'dashed') +
    theme_bw()


#PROBLEM 3: 

countries <- c("Brazil", "China", "El Salvador", "Niger", "United States") #create a vector with just the countries we are interested in

gapminder %>% 
  filter(country %in% countries) %>% 
  ggplot(aes(x = country, y = lifeExp))+
  geom_boxplot() +
  geom_jitter(alpha = 0.3, color = "blue")+
  theme_minimal() +
  ggtitle("Life Expectancy of Five Countries") + #title the figure
  theme(plot.title = element_text(hjust = 0.5)) + #centered the plot title
  xlab("Country") + ylab("Life Expectancy") #how to change axis names


```
</details>


### <small>Week 7</small>

For week 7, we're going to be working on 2 critical `ggplot` skills: recreating a graph from a dataset and **googling stuff**.

Our goal will be to make this final graph using the `gapminder` dataset:

```{r, eval=T, echo=F, warning=F, message=F}
library(tidyverse)

gapminder <- read_csv("data/gapminder.csv")

pg <- gapminder %>% 
  select(country, year, pop, continent) %>% 
  filter(year > 2000) %>% 
  pivot_wider(names_from = year, values_from = pop) %>% 
  mutate(pop_change_0207 = `2007` - `2002`)
```

```{r, eval=T, echo=F, warning=F, message=F, fig.width = 8, fig.height = 5}
pg %>% 
  filter(continent != "Oceania") %>% 
  ggplot(aes(x = reorder(country, pop_change_0207), y = pop_change_0207)) +
  geom_col(aes(fill = continent)) +
  facet_wrap(~continent, scales = "free") +
  theme_bw() +
  scale_fill_viridis_d() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), 
        legend.position = "none") +
  xlab("Country") +
  ylab("Change in Population Between 2002 and 2007")

```

The x axis labels are all scrunched up because we can't make the image bigger on the webpage, but if you make it and then zoom it bigger in RStudio it looks much better.

We'll touch on some intermediate steps here, since it might take quite a few steps to get from start to finish. Here are some things to note:

1. To get the population difference between 2002 and 2007 for each country, it would probably be easiest to have a country in each row and a column for 2002 population and a column for 2007 population.

2. Notice the order of countries within each facet. You'll have to look up how to order them in this way.

3. Also look at how the axes are different for each facet. Try looking through `?facet_wrap` to see if you can figure this one out.

4. The color scale is different from the default. Feel free to try out other color scales, just don't use the defaults!

5. The theme here is different from the default in a few ways, again, feel free to play around with other non-default themes.

6. The axis labels are rotated! Here's a hint: `angle = 45, hjust = 1`. It's up to you (and Google) to figure out where this code goes!

7. Is there a legend on this plot?

This lesson should illustrate a key reality of making plots in R, one that applies as much to experts as beginners: 10% of your effort gets the plot 90% right, and 90% of the effort is getting the plot perfect. `ggplot` is incredibly powerful for exploratory analysis, as you can get a good plot with only a few lines of code. It's also extremely flexible, allowing you to tweak nearly everything about a plot to get a highly polished final product, but these little tweaks can take a lot of time to figure out!

So if you spend most of your time on this lesson googling stuff, you're not alone!

<details>
<summary>**DO NOT OPEN** until you are ready to see the answers</summary>
```{r, eval=FALSE}
library(tidyverse)

gapminder <- read_csv("data/gapminder.csv")

pg <- gapminder %>% 
  select(country, year, pop, continent) %>% 
  filter(year > 2000) %>% 
  pivot_wider(names_from = year, values_from = pop) %>% 
  mutate(pop_change_0207 = `2007` - `2002`)

pg %>% 
  filter(continent != "Oceania") %>% 
  ggplot(aes(x = reorder(country, pop_change_0207), y = pop_change_0207)) +
  geom_col(aes(fill = continent)) +
  facet_wrap(~continent, scales = "free") +
  theme_bw() +
  scale_fill_viridis_d() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), 
        legend.position = "none") +
  xlab("Country") +
  ylab("Change in Population Between 2002 and 2007")
```
</details>

### <small>Week 8</small>

Let's look at some real data from Mauna Loa to try to format and plot. These meteorological data from Mauna Loa were collected every minute for the year 2001. *This dataset has 459,769 observations for 9 different metrics of wind, humidity, barometric pressure, air temperature, and precipitation.* Download this dataset [here](data/mauna_loa_met_2001_minute.rda). Save it to your `data/` folder. Alternatively, you can read the CSV directly from the R-DAVIS Github:
`mloa <- read_csv("https://raw.githubusercontent.com/gge-ucd/R-DAVIS/master/data/mauna_loa_met_2001_minute.csv")`

Use the [README](data/mauna_loa_README.txt) file associated with the Mauna Loa dataset to determine in what time zone the data are reported, and how missing values are reported in each column. With the `mloa` data.frame, remove observations with missing values in rel_humid, temp_C_2m, and windSpeed_m_s. Generate a column called "datetime" using the year, month, day, hour24, and min columns. Next, create a column called "datetimeLocal" that converts the datetime column to Pacific/Honolulu time (*HINT*: look at the lubridate function called `with_tz()`). Then, use dplyr to calculate the mean hourly temperature each month using the temp_C_2m column and the datetimeLocal columns. (*HINT*: Look at the lubridate functions called `month()` and `hour()`). Finally, make a ggplot scatterplot of the mean monthly temperature, with points colored by local hour.
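
Before working with the full dataset, it can help to try the time zone conversion on a single made-up timestamp. The sketch below assumes the lubridate package; the timestamp and the names `utc_time` and `local_time` are hypothetical and chosen only for illustration.

```r
library(lubridate)

# One made-up timestamp, parsed as UTC
utc_time <- ymd_hm("2001-01-01 10:30", tz = "UTC")

# Same instant in time, displayed on the Honolulu clock (UTC-10, no daylight saving)
local_time <- with_tz(utc_time, tzone = "Pacific/Honolulu")

hour(local_time)    # 0
month(local_time)   # 1
```

`with_tz()` changes only how the instant is displayed, so `utc_time` and `local_time` still refer to the same moment.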

Answers:
<details>
<summary>**DO NOT OPEN** until you are ready to see the answers</summary>
```{r, eval=FALSE}
library(tidyverse)
library(lubridate)
## Data import
mloa <- read_csv("https://raw.githubusercontent.com/gge-ucd/R-DAVIS/master/data/mauna_loa_met_2001_minute.csv")

mloa2 <- mloa %>%
  # Remove NA's
  filter(rel_humid != -99) %>%
  filter(temp_C_2m != -999.9) %>%
  filter(windSpeed_m_s != -999.9) %>%
  # Create datetime column (README indicates time is in UTC)
  mutate(datetime = ymd_hm(paste0(year,"-", 
                                  month, "-", 
                                  day," ", 
                                  hour24, ":", 
                                  min), 
                           tz = "UTC")) %>%
  # Convert to local time
  mutate(datetimeLocal = with_tz(datetime, tzone = "Pacific/Honolulu"))

## Aggregate and plot
mloa2 %>%
  # Extract month and hour from local time column
  mutate(localMon = month(datetimeLocal, label = TRUE),
         localHour = hour(datetimeLocal)) %>%
  # Group by local month and hour
  group_by(localMon, localHour) %>%
  # Calculate mean temperature
  summarize(meantemp = mean(temp_C_2m)) %>%
  # Plot
  ggplot(aes(x = localMon,
             y = meantemp)) +
  # Color points by local hour
  geom_point(aes(col = localHour)) +
  # Use a nice color ramp
  scale_color_viridis_c() +
  # Label axes, add a theme
  xlab("Month") +
  ylab("Mean temperature (degrees C)") +
  theme_classic()
```
</details>

### <small>Final assignment</small>  

Forthcoming  


<!-----
Alright folks, it's time for the final assignment of the quarter. The goal here is to generate a script that combines the skills we've learned throughout the quarter to produce several outputs. 
<br>


#### The Data
For this project you are going to be using some data sets about flights departing New York City in 2013. There are **several** CSV files you will need to use (as with any CSVs you're handed, they are likely imperfect and incomplete).
You should download the [flights](data/nyc_13_flights_small.csv), [planes](data/nyc_13_planes.csv), and [weather](data/nyc_13_weather.csv) CSV files. (Remember to put them into your data folder of your RProject to make reading them in easier!)

Hint: You may have to combine dataframes to answer some questions. Remember our `join` family of functions? You should be able to use the `join` type we covered in class. The `flights` dataset is the biggest one, so you should probably join the other data onto this one, meaning `flights` would be the first (or "left") argument in the left join. You can't join 3 tables together at once, but you can join tables `a` and `b` to make table `ab`, then join `ab` and `c` to get table `abc` which contains the columns from all 3 original tables.

#### Things to Include
1. Plot the departure delay of flights against the precipitation, and include a simple regression line as part of the plot. Hint: there is a `geom_` that will plot a simple `y ~ x` regression line for you, but you might have to use an argument to make sure it's a regular **l**inear **m**odel. Use `ggsave` to save your ggplot objects into a **new folder** you create called "plots".
2. Create a figure that has date on the x axis and each day's mean departure delay on the y axis. Plot only months September through December. Somehow distinguish between airline carriers (the method is up to you). Again, save your final product into the "plots" folder.
3. Create a dataframe with these columns: date (year, month and day), mean_temp, where each row represents the airport, based on airport code. Save this as a new csv in your `data` folder called `mean_temp_by_origin.csv`.
4. Make a function that can: (1) convert hours to minutes; and (2) convert minutes to hours (i.e., it's going to require some sort of conditional setting in the function that determines which direction the conversion is going). Use this function to convert departure delay (currently in minutes) to hours and then generate a boxplot of departure delay times by carrier. Save this function into a script called "customFunctions.R" in your scripts/code folder.
5. Below is the plot we generated from the new data in Q4. (Base code: 
`ggplot(df, aes(x = dep_delay_hrs, y = carrier, fill = carrier)) +
  geom_boxplot()`). The goal is to visualize delays by carrier. Do (at least) 5 things to improve this plot by changing, adding, or subtracting to this plot. The sky's the limit here, remember we often reduce data to more succinctly communicate things.


```{r answer key, echo = F, results = 'hide', message = F, warning = F, fig.show='hide'}
library(tidyverse)
#flight <- read.csv("data/nyc_13_flights.csv") 
#flight <- flight[sample(1:nrow(flight), size = 50000),]
#write.csv(flight, "data/nyc_13_flights_small.csv", row.names = F)

flight <- read.csv("data/nyc_13_flights_small.csv")
planes <- read.csv("data/nyc_13_planes.csv")
weather <- read.csv("data/nyc_13_weather.csv")

intersect(colnames(flight), colnames(planes))
intersect(colnames(flight), colnames(weather))

df <- flight %>% 
  left_join(planes) %>% 
  left_join(weather)

#Plot the departure delay of flights against the precipitation, and include a simple regression line as part of the plot
colnames(df)
max(df$dep_delay, na.rm = T)
df %>% 
  filter(dep_delay > 0) %>% 
  ggplot(aes(x = precip, y = dep_delay)) +
  geom_point() +
  geom_smooth(method = "lm")

# Create a figure that has date on the x axis and mean departure delay on the y axis. Plot only months September through December. Somehow distinguish between airline carriers (the method is up to you). 
library(lubridate)

df %>% 
  filter(dep_delay > 0) %>% 
  filter(month %in% c(9:12)) %>% 
  mutate(date = ymd(paste(year, month, day, sep = "-"))) %>% 
  group_by(date, carrier) %>% # one mean per day and carrier
  summarize(mean_dep_delay = mean(dep_delay, na.rm = T)) %>% 
  ggplot(aes(x = date, y = mean_dep_delay)) + 
  geom_point() +
  facet_wrap(~carrier)
  
# Create a dataframe with the average temperature by month at each origin airport, where the data is wide (i.e. every airport has a column)
df %>% 
  group_by(month, origin) %>% 
  summarize(mean_temp = mean(temp, na.rm = T)) %>% 
  pivot_wider(names_from = origin, values_from = mean_temp)

#4. Make a function that can: (1) convert hours to minutes; and (2) convert minutes to hours (i.e., it's going to require some sort of conditional setting in the function that determines which direction the conversion is going); use this function to convert departure delay (currently in minutes) to hours.

min2hr <- function(hr = NULL, min = NULL, from_unit = NULL){
if (from_unit == "minute"){
  hour = min/60
  print(hour)
} else if (from_unit == "hour"){
  minute = hr*60
  print(minute)
  }
}

min2hr(min = 760, from_unit = "minute")
min2hr(hr = 7, from_unit = "hour")
# or
min2hr <- function(hr = NULL, min = NULL, from_unit = NULL){
  ifelse(from_unit == "minute", min/60, hr*60)
}
# or
min2hr <- function(x = NULL, from_unit = NULL){
  ifelse(from_unit == "minute", x/60, x*60)
}

min2hr(760, from_unit = "minute")
min2hr(7, from_unit = "hour")

df$dep_delay_hrs <- NA
for(i in 1:nrow(df)){
  df$dep_delay_hrs[i] <- min2hr(df$dep_delay[i], from_unit = "minute")
}
# or
df$dep_delay_hrs <- map_dbl(df$dep_delay, ~ min2hr(.x, from_unit = "minute")) # map_dbl() returns a numeric vector rather than a list

## OR JUST TO SHOW A COOL FUNCTION
vec_min2hr <- Vectorize(min2hr,'x')
df <- df %>% mutate(dep_delay_hrs = vec_min2hr(dep_delay,from_unit = 'minute'))


```

```{r, echo = F}
df %>% 
  ggplot(aes(x = dep_delay_hrs, y = carrier, fill = carrier)) +
  geom_boxplot()
```

--->