The odsAPI R package is an API wrapper that gives you access to the Explore API (version 2.1) of any data portal hosted by Opendatasoft.
To install the package directly from GitHub, you can use the devtools package:
devtools::install_github("ogdtg/odsAPI")
library(odsAPI)
The package works with any data portal hosted by Opendatasoft. To choose which portal you query, you set the domain, i.e. the host name of the data portal. The package's functions use this domain to build the API endpoints.
For instance, if you set the domain to "data.tg.ch", the functions generate endpoints under "https://data.tg.ch/api/explore/v2.1/".
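As a rough illustration of this rule, the sketch below turns a domain into the Explore API v2.1 base URL. The helper build_base_url() is hypothetical and not part of odsAPI. The package may normalize domains differently (for example, it may not strip a scheme or trailing slash).

```r
# Hypothetical helper (not part of odsAPI): build the Explore API v2.1 base URL
build_base_url <- function(domain) {
  domain <- sub("^https?://", "", domain)  # drop a scheme if one was included
  domain <- sub("/+$", "", domain)         # drop trailing slashes
  paste0("https://", domain, "/api/explore/v2.1/")
}

build_base_url("data.tg.ch")
# returns "https://data.tg.ch/api/explore/v2.1/"
```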
You can set and use domains in three different ways:
If you mostly work with a single domain, it is best to store it as an environment variable in your .Renviron file. The domain is then available in every session, so you do not have to set it again.
To do this, run usethis::edit_r_environ() to open the .Renviron file and add the following line:
ODS_API_DOMAIN=YOUR_DOMAIN
Do not enclose the domain in quotes. After you save and close the .Renviron file, the change takes effect once you restart R.
Once set, the domain is available via Sys.getenv("ODS_API_DOMAIN"). The package reads this variable automatically, so you can call functions like this:
catalog <- get_catalog()
data <- get_dataset(dataset_id = "sk-stat-111")
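If you would rather not restart R after editing .Renviron, base R's readRenviron() can load the file into the running session. The helper reload_ods_domain() below is a hypothetical convenience wrapper, not part of odsAPI, and assumes the file lives at ~/.Renviron, the usual user-level location.

```r
# Hypothetical helper (not part of odsAPI): re-read .Renviron without restarting R
reload_ods_domain <- function(path = "~/.Renviron") {
  if (file.exists(path)) {
    readRenviron(path)  # base R: sets the variables defined in the file
  }
  domain <- Sys.getenv("ODS_API_DOMAIN")
  if (!nzchar(domain)) {
    warning("ODS_API_DOMAIN is still empty; check ", path)
  }
  domain
}

reload_ods_domain()
```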
If you prefer not to set a domain permanently for all future sessions, use set_domain(). It sets the domain for the current session only, and you can change it at any time with another set_domain() call. Within the current session, a domain set with set_domain() overrides the ODS_API_DOMAIN variable from the .Renviron file.
set_domain("data.tg.ch")
catalog <- get_catalog()
data <- get_dataset(dataset_id = "sk-stat-111")
Each function also has a domain parameter for setting the domain directly. When you pass domain to a function, any domain set via set_domain() or in the .Renviron file is ignored for that call.
catalog <- get_catalog(domain = "data.tg.ch")
data <- get_dataset(domain = "data.tg.ch", dataset_id = "sk-stat-111")
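The precedence described above (explicit argument, then set_domain(), then .Renviron) can be sketched as a small resolver. This is a hypothetical illustration, not odsAPI's actual code: session_domain stands in for wherever set_domain() stores its value, which this README does not specify, and "portal.example.org" is a placeholder domain.

```r
# Hypothetical sketch of the documented precedence; odsAPI's internals may differ.
# `session_domain` stands in for the value stored by set_domain().
resolve_domain <- function(domain = NULL, session_domain = NULL) {
  if (!is.null(domain)) return(domain)                  # 1. explicit argument
  if (!is.null(session_domain)) return(session_domain)  # 2. set_domain()
  env_domain <- Sys.getenv("ODS_API_DOMAIN")            # 3. .Renviron
  if (nzchar(env_domain)) return(env_domain)
  stop("No domain set: pass domain, call set_domain(), or set ODS_API_DOMAIN.")
}

Sys.setenv(ODS_API_DOMAIN = "data.tg.ch")          # demo only: mimics .Renviron
resolve_domain()                                   # "data.tg.ch"
resolve_domain(session_domain = "portal.example.org")  # "portal.example.org"
resolve_domain(domain = "data.tg.ch", session_domain = "portal.example.org")  # "data.tg.ch"
```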
The following examples assume that a domain has been set.
Retrieve the entire catalog, with all metadata, using get_catalog():
catalog <- get_catalog()
You can also query the data catalog, for example to select only the dataset_id and title columns, using the query_catalog() function:
queried_catalog <- query_catalog(select=c("dataset_id","title"))
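For orientation, a catalog query like this one corresponds roughly to a GET request against the catalog/datasets endpoint of the Explore API v2.1, with select passed as a comma-separated list. The helper build_catalog_url() is hypothetical and only builds the URL; the exact parameters odsAPI sends (for example its limit or pagination handling) may differ.

```r
# Hypothetical helper: URL for a catalog query that selects specific columns
build_catalog_url <- function(domain, select, limit = 100) {
  # Percent-encode the comma-separated column list (the comma becomes %2C)
  select_param <- utils::URLencode(paste(select, collapse = ","), reserved = TRUE)
  paste0("https://", domain, "/api/explore/v2.1/catalog/datasets",
         "?select=", select_param, "&limit=", limit)
}

build_catalog_url("data.tg.ch", c("dataset_id", "title"))
# returns "https://data.tg.ch/api/explore/v2.1/catalog/datasets?select=dataset_id%2Ctitle&limit=100"
```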
To see all query options, including searching in the description, ordering, date ranges, and more, check the create_query() function:
create_query(
search_in = "description",
search_for = "Frauenfeld",
order_by = "records",
asc = FALSE,
select = c("dataset_id", "title", "records", "publisher"),
date_start = "2020-01-01",
date_end = "2022-12-31",
date_var = "last_modified",
filter_query = "publisher='Dienststelle für Statistik'"
)
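To give a feel for what such options become on the API side, here is a hypothetical sketch that combines a date range and a free-form filter into a single ODSQL where expression joined with AND, using ODSQL date literals such as date'2020-01-01'. The helper build_where() is not part of odsAPI, the package may translate create_query() arguments differently, and search_in, search_for, and order_by are left out here.

```r
# Hypothetical helper: combine a date range and a free-form filter into one
# ODSQL where expression (odsAPI's own translation may differ)
build_where <- function(date_var = NULL, date_start = NULL, date_end = NULL,
                        filter_query = NULL) {
  conditions <- character(0)
  if (!is.null(date_var) && !is.null(date_start)) {
    conditions <- c(conditions, sprintf("%s >= date'%s'", date_var, date_start))
  }
  if (!is.null(date_var) && !is.null(date_end)) {
    conditions <- c(conditions, sprintf("%s <= date'%s'", date_var, date_end))
  }
  if (!is.null(filter_query)) {
    conditions <- c(conditions, filter_query)
  }
  paste(conditions, collapse = " AND ")  # "" when no condition is given
}

build_where(
  date_var = "last_modified",
  date_start = "2020-01-01",
  date_end = "2022-12-31",
  filter_query = "publisher='Dienststelle für Statistik'"
)
```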
A common task is to search the catalog for a term in dataset titles. For this, the search_catalog() function wraps query_catalog().
If you just want to find datasets whose titles contain a specific word, you can do the following:
search_catalog(search_for = "wahl")
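If you have already downloaded the catalog, a similar title search can be done offline. The sketch below assumes the catalog is a data frame with dataset_id and title columns; the toy rows are illustrative stand-ins, not real catalog entries, and search_titles() is a hypothetical helper. The search term is treated as a regular expression.

```r
# Hypothetical helper: case-insensitive title search on a local catalog data frame
search_titles <- function(catalog, term) {
  hits <- grepl(term, catalog$title, ignore.case = TRUE)
  catalog[hits, , drop = FALSE]
}

# Toy stand-in for get_catalog() output (illustrative rows only)
catalog <- data.frame(
  dataset_id = c("ds-1", "ds-2", "ds-3"),
  title = c("Grossratswahlen 2020", "Einwohner nach Gemeinde", "Nationalratswahlen 2023"),
  stringsAsFactors = FALSE
)

search_titles(catalog, "wahl")  # returns the rows for ds-1 and ds-3
```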
Retrieve the full metadata with the get_metadata() function, which returns a list containing all metadata, including the dataset fields. Two wrapper functions fetch parts of it:
- Dataset metadata: get_metas()
- Field metadata: get_fields()
# Metas and Fields
full_metadata <- get_metadata("sk-stat-111")
# Fields
fields <- get_fields("sk-stat-111")
# Metas
metas <- get_metas("sk-stat-111")
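In the Explore API v2.1, a single dataset's metadata is served by the catalog/datasets/{dataset_id} endpoint, whose JSON response includes both the metas and the fields. Whether get_metadata() calls exactly this endpoint is an assumption. The hypothetical helper metadata_url() only builds the URL; parsing it with jsonlite is shown as a comment because it needs network access.

```r
# Hypothetical helper: URL of a single dataset's metadata (metas and fields)
metadata_url <- function(domain, dataset_id) {
  paste0("https://", domain, "/api/explore/v2.1/catalog/datasets/", dataset_id)
}

metadata_url("data.tg.ch", "sk-stat-111")
# returns "https://data.tg.ch/api/explore/v2.1/catalog/datasets/sk-stat-111"

# With network access (requires the jsonlite package):
# meta <- jsonlite::fromJSON(metadata_url("data.tg.ch", "sk-stat-111"))
# names(meta)  # expected to include "metas" and "fields"
```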
Download complete datasets with the get_dataset() function, passing the dataset_id obtained from the catalog:
data <- get_dataset("sk-stat-111")
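A full download maps naturally onto the exports endpoint of the Explore API v2.1 (catalog/datasets/{dataset_id}/exports/{format}), which returns all records in one response rather than page by page. This sketch assumes get_dataset() works in a similar way, which this README does not confirm; export_url() is a hypothetical helper.

```r
# Hypothetical helper: URL for a full dataset export in a given format
export_url <- function(domain, dataset_id, format = "json") {
  paste0("https://", domain, "/api/explore/v2.1/catalog/datasets/",
         dataset_id, "/exports/", format)
}

export_url("data.tg.ch", "sk-stat-111")
# returns "https://data.tg.ch/api/explore/v2.1/catalog/datasets/sk-stat-111/exports/json"

# With network access (requires jsonlite), the JSON export should parse to a data frame:
# data <- jsonlite::fromJSON(export_url("data.tg.ch", "sk-stat-111"))
```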
A particularly useful function is query_dataset(). If you only need part of a dataset, it lets you use ODSQL syntax to filter records and select specific columns. As with the catalog, you can query the records of any dataset. For instance:
# All Swiss residents of Frauenfeld, taken from a rather large dataset
query_dataset(
dataset_id = "sk-stat-134",
politische_gemeinde = "Frauenfeld",
select_fields = c("bfs_nr_gemeinde", "politische_gemeinde", "ortschaft", "einwohner_schweizer")
)
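For comparison, a filtered query like this one can be expressed as a request to the records endpoint of the Explore API v2.1, with the filter written as an ODSQL where clause. How query_dataset() turns named arguments such as politische_gemeinde = "Frauenfeld" into ODSQL is not documented here, so build_records_url() is a hypothetical sketch rather than the package's actual translation.

```r
# Hypothetical helper: build a records query URL with optional select/where clauses
build_records_url <- function(domain, dataset_id, select = NULL, where = NULL,
                              limit = 100) {
  params <- c(
    select = if (!is.null(select)) paste(select, collapse = ","),
    where = where,
    limit = limit
  )
  # Percent-encode each value, including reserved characters such as , = and "
  encoded <- vapply(as.character(params), utils::URLencode, character(1),
                    reserved = TRUE)
  paste0("https://", domain, "/api/explore/v2.1/catalog/datasets/", dataset_id,
         "/records?", paste0(names(params), "=", encoded, collapse = "&"))
}

build_records_url(
  "data.tg.ch", "sk-stat-134",
  select = c("politische_gemeinde", "einwohner_schweizer"),
  where = 'politische_gemeinde="Frauenfeld"'
)
# returns "https://data.tg.ch/api/explore/v2.1/catalog/datasets/sk-stat-134/records?select=politische_gemeinde%2Ceinwohner_schweizer&where=politische_gemeinde%3D%22Frauenfeld%22&limit=100"
```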