Juniper L. Simonis, 2020-01-19
This document outlines the processes by which an Access® database is converted into local files and data objects.
Here we show usage under the default settings for `accessor`, which are configured for working with the California Delta fish salvage monitoring database, as implemented in the `dapperstats/salvage` repo.
The main conversion is from an `.accdb` or `.mdb` database (which may be remote or local) to a local set of `.csv` files named after the tables in the database.
This is accomplished by the `accessor.bash` script, which combines two other scripts:

- `retrieve_remote_db.bash` is used to retrieve a remote db (if needed)
  - `wget` is used to robustly download the database
- `msdb_to_csvs.bash` converts a local Microsoft db to a set of `.csv`s within a folder
  - The `mdbtools` and `unixodbc` libraries are leveraged
  - `mdb-tables` retrieves the table names
  - `mdb-export` converts and exports the database tables to `.csv`s
  - `mdb-tables` and `mdb-export` are connected via `xargs` and `bash`
  - This code is based on a reply by Eduard Florinescu on the Ask Ubuntu Stack Exchange
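The heart of that conversion can be sketched as a small function (a minimal sketch, assuming `mdbtools` is installed; the function name mirrors the script name for illustration only, and a `while` loop stands in for the `xargs` wiring used in the repository scripts):

```bash
# Sketch of the mdbtools pipeline: list the tables in a Microsoft db,
# then export each table to a like-named .csv in an output folder.
msdb_to_csvs() {
  local db="$1" out="$2"
  mkdir -p "$out"
  # mdb-tables -1 prints one table name per line; mdb-export writes a
  # table as CSV to stdout, which is redirected to "<table>.csv".
  mdb-tables -1 "$db" | while IFS= read -r table; do
    mdb-export "$db" "$table" > "$out/$table.csv"
  done
}
```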
The `accessor.bash` script can be run as a `bash` command, but needs to be given a path to either a remote (option `-r`) or local (option `-l`) database, as in
```bash
sudo bash scripts/accessor.bash -r ftp://ftp.wildlife.ca.gov/salvage/Salvage_data_FTP.accdb
```
or
```bash
sudo bash scripts/accessor.bash -l path/to/local.mdb
```
In total, there are 5 options to the `accessor.bash` script:
- `-r`: path to a remote database
- `-l`: path to a local database
- `-t`: name for the temporary file when a remote database is downloaded
  - Default value is the name of the file on the remote server
- `-d`: path to the data directory where the database's folder of `.csv`s will be located
  - Default value is `data`
- `-k`: `y` or `n` as to whether or not to keep the temporary db file
  - Default value is `n`
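For example, a remote download could be customized as follows (a sketch: the temporary file name and data directory here are illustrative, with option meanings as in the list above):

```bash
# Download the remote database to a temporary file named tmp.accdb,
# place the folder of .csvs under raw_data, and keep the temporary file.
sudo bash scripts/accessor.bash \
  -r ftp://ftp.wildlife.ca.gov/salvage/Salvage_data_FTP.accdb \
  -t tmp.accdb \
  -d raw_data \
  -k y
```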
We package `accessor` into a stable `Docker` software container, as written out in the `Dockerfile` for the salvage database. The associated accessor Docker image is freely available on Docker Hub.
Of particular note for general use of the image is the `CMD` line in the `Dockerfile`, which calls `container_cmd.bash` when the container is run.
`container_cmd.bash` is a wrapper around `accessor.bash` with an added layer that runs an interactive `R` session, which loads the `accessor` R functions and reads in the database `.csv` files as a `list` of `data.frame`s that matches the database. The interactive `R` session is run by including the `-i y` flag.
Further, the options for `container_cmd.bash` can be passed to the container at the command line and will override the existing defaults, allowing customization.
To use the current image to generate an up-to-date container with data for yourself:
1. (If needed) install Docker
   - Specific instructions vary depending on OS
2. Open up a docker-ready terminal
3. Download the image

   ```bash
   sudo docker pull dapperstats/accessor
   ```

4. Build the container

   ```bash
   sudo docker container run -ti --name acc dapperstats/accessor
   ```

   - To customize the arguments of `container_cmd.bash` in the container build, add the command with arguments, like

     ```bash
     sudo docker container run -ti --name acc dapperstats/accessor bash scripts/container_cmd.bash -r "ftp://ftp.wildlife.ca.gov/Delta%20Smelt/NBA.mdb" -i y
     ```

5. Copy the data out from the container

   ```bash
   sudo docker cp acc:/data .
   ```
Note that the customization option in step 4 allows a user to set any of the `container_cmd.bash` options:

- `-r`: path to a remote database
- `-l`: path to a local database
- `-t`: name for the temporary file when a remote database is downloaded
  - Default value is the name of the file on the remote server
- `-d`: path to the data directory where the database's folder of `.csv`s will be located
  - Default value is `data`
- `-k`: `y` or `n` as to whether or not to keep the temporary db file
  - Default value is `n`
- `-i`: `y` or `n` as to whether or not to start an interactive `R` session
  - Default value is `n`
An additional conversion makes the data available in `R` by reading in the folder of `.csv`s as a `list` of `data.frame`s that is directly analogous to the `.accdb` or `.mdb` database of tables.
Within an instance of `R`, navigate to the folder where you have this code repository located, source the functions script, and read in the database:
```r
source("scripts/r_functions.R")
database <- read_database()
```
The resulting `database` object is a named `list` of the database's tables, ready for analyses.
The default arguments to `read_database` assume that you have either run `accessor.bash` or copied the data out from the `Docker` container in the directory where this code repository is located.
However, the four arguments are flexible and general:
- `database` is the name of the database folder (no extension) containing the `.csv`s
  - Default is `Salvage_data_FTP`, as with other functionality
- `tables` is a vector of the database tables (`.csv`s) to read in
  - Default is `NULL`, which translates to "all tables"
- `data_dir` is the directory where the `database` folder is located
  - Default is `data`
- `quiet` simply toggles on/off messaging
  - Default is `FALSE`
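As a sketch of non-default usage from the command line (the `NBA` folder name follows the database referenced in the Docker example above, but the table names `Building` and `Sample` are purely illustrative assumptions, and the `.csv`s are assumed to already exist under `data/NBA`):

```bash
# Hypothetical call: read two named tables from a database folder
# "NBA" located under data/, with messaging suppressed.
Rscript -e '
  source("scripts/r_functions.R")
  database <- read_database(database = "NBA",
                            tables   = c("Building", "Sample"),
                            data_dir = "data",
                            quiet    = TRUE)
'
```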