README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# covid19R

<!-- badges: start -->
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
[![CRAN status](https://www.r-pkg.org/badges/version/covid19R)](https://CRAN.R-project.org/package=covid19R)
[![Travis build status](https://travis-ci.org/Covid19R/covid19R.svg?branch=master)](https://travis-ci.org/Covid19R/covid19R)
<!-- badges: end -->

The goal of covid19R is to provide a single package that allows users to access all of the tidy covid-19 datasets collected by data packages that implement the covid19R tidy data standard. It provides access to multiple data sets that meet a tidy data standard.

To learn more abou the Covid19R project, check [our extensive documentation](https://covid19r.github.io/documentation) about data standards, how to get your data added to this list, and more.

## Installation

<!--
You can install the released version of covid19R from [CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("covid19R")
```

-->

You can install the development version from [github](https://github.com/) with:

``` r
remotes::install_github("covid19r/covid19r")
```

## Getting the Data Information

To see what datasets are available, use `get_covid19_data_info()`

```{r info}
library(covid19R)

data_info <- get_covid19_data_info()

head(data_info) %>% knitr::kable()

```

## Accessing data

Once you have figured out what dataset you want, you can access it with `get_covid19_dataset()`

```{r}
library(dplyr)

nytimes_states <- get_covid19_dataset("covid19nytimes_states")

nytimes_states %>%
  filter(date == max(date)) %>%
  filter(data_type == "cases_total") %>%
  arrange(desc(value)) %>%
  head()
  
```

## The covid19R Data Standard

While many data sets have their own unique additional columns (e.g., Latitude, Longitude, population, etc.), all datasets have the following columns and are arranged in a long format:

* date - The date in YYYY-MM-DD form
* location - The name of the location as provided by the data source. The counties dataset provides county and state. They are combined and separated by a `,`, and can be split by `tidyr::separate()`, if you wish.
* location_type - The type of location using the covid19R controlled vocabulary. Nested locations are indicated by multiple location types being combined with a `_
* location_code - A standardized location code using a national or international standard. In this case, FIPS state or county codes. See https://en.wikipedia.org/wiki/Federal_Information_Processing_Standard_state_code and https://en.wikipedia.org/wiki/FIPS_county_code for more
* location_code_type The type of standardized location code being used according to the covid19R controlled vocabulary. Here we use `fips_code`
* data_type - the type of data in that given row. Includes `total_cases` and `total_deaths`, cumulative measures of both.
* value - number of cases of each data type


## Vocabularies

The `location_type`, `location_code_type`, and `data_type` from datasets and `spatial_extent` from the data info table all have their own controlled vocabularies. Others might be introduced as the collection of packages matures. To see the possible values of a standardized vocabulary, use `get_covid19_controlled_vocab()`

```{r vocab}
get_covid19_controlled_vocab("location_type") %>%
  knitr::kable()
```