# accessor methods

Juniper L. Simonis, 2020-01-19

This document outlines the processes by which an Access® database is converted into local files and data objects.

Here we show usage under accessor's default settings, which are configured for working with the California Delta fish salvage monitoring database as implemented in the dapperstats/salvage repo.

## accessor.bash script: remote or local Access® database to local .csvs

The main conversion is from an .accdb or .mdb database that may be remote or local to a local set of .csv files named by the tables in the database. This is accomplished by the accessor.bash script, which combines two other scripts:

  1. retrieve_remote_db.bash is used to retrieve a remote db (if needed)
    • wget is used to robustly download the database
  2. msdb_to_csvs.bash converts a local Microsoft db to a set of .csvs within a folder
    • The mdbtools and unixodbc libraries are leveraged
    • mdb-tables retrieves the table names
    • mdb-export converts and exports the database to .csvs
    • mdb-tables and mdb-export are connected via xargs and bash
    • This code is based on a reply by Eduard Florinescu on the Ask Ubuntu Stack Exchange
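The wiring described above can be sketched as follows. The stand-in functions and table names below are hypothetical placeholders so that the xargs-and-bash pattern runs without mdbtools installed; the comments mark where the real mdb-tables and mdb-export calls go.

```shell
# Sketch of the msdb_to_csvs.bash pattern, not the script itself.
# Stand-in functions are used so the wiring runs without mdbtools.
list_tables() {                 # stand-in for: mdb-tables -1 "$DB"
  printf '%s\n' "Salvage" "Lengths"
}
export_table() {                # stand-in for: mdb-export "$DB" "$1" > "$1.csv"
  printf 'exported %s\n' "$1"
}
export -f export_table

# mdb-tables lists table names, one per line; xargs hands each name to a
# bash invocation of the export step, mirroring the mdb-tables | xargs | bash
# chain used by the real script.
list_tables | xargs -n1 -I{} bash -c 'export_table "$1"' _ {}
```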

accessor.bash can be run as a bash command, but needs to be given a path to either a remote (option -r) or local (option -l) database, as in

```shell
sudo bash scripts/accessor.bash -r ftp://ftp.wildlife.ca.gov/salvage/Salvage_data_FTP.accdb
```

or

```shell
sudo bash scripts/accessor.bash -l path/to/local.mdb
```

In total, there are 5 options to the accessor.bash script:

  1. -r: path to a remote database
  2. -l: path to a local database
  3. -t: name for the temporary file when a remote database is downloaded
    • Default value is the name of the file on the remote server
  4. -d: path to the data directory where the database's folder of .csvs will be located
    • Default value is data
  5. -k: y or n as to whether or not to keep the temporary db file
    • Default value is n
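These options can be illustrated with a minimal getopts sketch. The flag letters and defaults come from the list above, while the variable names and the parse_args helper are illustrative, not accessor.bash's actual code.

```shell
# Illustrative parsing of the five accessor.bash options with getopts.
# Only the flag letters and defaults come from the documentation above.
remote="" local_db="" tmp_name="" data_dir="data" keep="n"
parse_args() {
  local OPTIND opt
  while getopts "r:l:t:d:k:" opt; do
    case "$opt" in
      r) remote="$OPTARG" ;;     # path to a remote database
      l) local_db="$OPTARG" ;;   # path to a local database
      t) tmp_name="$OPTARG" ;;   # temporary file name for a downloaded db
      d) data_dir="$OPTARG" ;;   # where the database's folder of .csvs goes
      k) keep="$OPTARG" ;;       # keep the temporary db file? (y/n)
    esac
  done
}

parse_args -l path/to/local.mdb -k y
echo "local_db=$local_db data_dir=$data_dir keep=$keep"
```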

## Packaging into a Docker image

We package accessor into a stable Docker software container, as written out in the Dockerfile for the salvage database. The associated accessor Docker image is freely available on Docker Hub. Of particular note for general use of the image is the CMD line in the Dockerfile, which calls container_cmd.bash when the container is run. container_cmd.bash is a wrapper around accessor.bash with an added layer that runs an interactive R session, loading the accessor R functions and reading in the database .csv files as a list of data.frames that matches the database. The interactive R session is enabled by including the -i y flag. Further, the options for container_cmd.bash can be passed at the command line to the container and will override the existing defaults, allowing customization.
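A hypothetical, minimal Dockerfile fragment illustrating the CMD mechanism just described; the base image, installed packages, and paths here are assumptions, so see the actual Dockerfile in the salvage repo for the real build.

```dockerfile
# Illustrative fragment only; not the salvage repo's actual Dockerfile.
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y mdbtools unixodbc r-base wget
COPY scripts/ scripts/
# CMD runs container_cmd.bash when the container starts; arguments given
# to `docker container run` after the image name replace this command,
# which is how users override the defaults.
CMD ["bash", "scripts/container_cmd.bash"]
```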

To use the current image to generate an up-to-date container with data for yourself:

  1. (If needed) install Docker
    • Specific instructions vary depending on OS
  2. Open up a docker-ready terminal
  3. Download the image
    • `sudo docker pull dapperstats/accessor`
  4. Build the container
    • `sudo docker container run -ti --name acc dapperstats/accessor`
    • To customize the arguments of container_cmd.bash in the container build, add the command with arguments, like
      • `sudo docker container run -ti --name acc dapperstats/accessor bash scripts/container_cmd.bash -r "ftp://ftp.wildlife.ca.gov/Delta%20Smelt/NBA.mdb" -i y`
  5. Copy the data out from the container
    • `sudo docker cp acc:/data .`

Note that the customization option in step 4 lets a user set any of container_cmd.bash's six options:

  1. -r: path to a remote database
  2. -l: path to a local database
  3. -t: name for the temporary file when a remote database is downloaded
    • Default value is the name of the file on the remote server
  4. -d: path to the data directory where the database's folder of .csvs will be located
    • Default value is data
  5. -k: y or n as to whether or not to keep the temporary db file
    • Default value is n
  6. -i: y or n as to whether or not to start an interactive R session
    • Default value is n

## R script: local .csvs to R list object

An additional conversion makes the data available in R by reading in the folder of .csvs as a list of data.frames that is directly analogous to the .accdb or .mdb database of tables.

Within an instance of R, navigate to the folder where you have this code repository located, source the functions script, and read in the database:

```r
source("scripts/r_functions.R")
database <- read_database()
```

The resulting database object is a named list of the database's tables, ready for analyses.

The default arguments to read_database assume that you have also either run accessor.bash or copied the data out from the Docker container in the directory where this code repository is located. However, the four arguments are flexible and general:

  1. database is the name of the database folder (no extension) containing the .csvs
    • Default is Salvage_data_FTP as with other functionality
  2. tables is a vector of the database tables (.csvs) to read in
    • Default is NULL, which translates to "all tables"
  3. data_dir is the directory where the database folder is located
    • Default is data
  4. quiet simply toggles on/off messaging
    • Default is FALSE
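As an example of the non-default arguments, the call below reads just two tables from a custom data directory with messaging turned off. The table names and the my_data path are hypothetical, chosen only to illustrate the argument shapes.

```r
source("scripts/r_functions.R")

# Read two tables (names illustrative) from a non-default data directory,
# suppressing messages:
database <- read_database(database = "Salvage_data_FTP",
                          tables   = c("Salvage", "StationsLookUp"),
                          data_dir = "my_data",
                          quiet    = TRUE)
names(database)  # the requested tables, each a data.frame
```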