65 changes: 53 additions & 12 deletions chapters/03_secondary_practices.md
@@ -266,24 +266,65 @@ stable.
#### Databases

A **database** is a data set together with specialized software to help you
organize, query, and update that data set. Databases are usually faster at
querying and transforming data than general-purpose languages like R and
Python. Most databases also support querying large data sets (hundreds of
gigabytes) without any extra setup. Some databases support multiple users,
which can be convenient for collaborative work (this is also one reason why
databases are widely used for data storage in web applications). Databases
can:


* create a definitive version of your data that multiple people can access
without creating conflicts.

* provide quality control measures to make sure typos don't get introduced into
your data.

* query millions of observations in seconds (unlike Excel or Google Sheets).

* store data more efficiently than regular files.

* control who has access to your data, and what they can do with it, at a very
granular level.
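
As a minimal sketch of the quality control point above, the example below uses
Python's built-in `sqlite3` module with an in-memory SQLite database. The
`observations` table, its columns, and the site codes are hypothetical names
chosen for illustration. A `CHECK` constraint makes the database reject a
misspelled site code instead of silently storing it.

```python
import sqlite3

# In-memory database for illustration only; a real project would typically
# use a database file or a database server.
conn = sqlite3.connect(":memory:")

# Hypothetical table: the CHECK constraints only allow known site codes
# and non-negative counts.
conn.execute(
    """
    CREATE TABLE observations (
        id INTEGER PRIMARY KEY,
        site TEXT NOT NULL CHECK (site IN ('north', 'south')),
        n_birds INTEGER NOT NULL CHECK (n_birds >= 0)
    )
    """
)


def try_insert(site, n_birds):
    """Insert one row; return False if the database rejects it."""
    try:
        conn.execute(
            "INSERT INTO observations (site, n_birds) VALUES (?, ?)",
            (site, n_birds),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised when a constraint (here, a CHECK) is violated.
        return False


valid_accepted = try_insert("north", 12)
typo_accepted = try_insert("nroth", 7)  # misspelled site code

n_rows = conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
```

Only the valid row is stored. Other database systems offer similar
constraints, although the exact syntax and error types vary.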

Databases are a specialized tool, optimized for querying and transforming
data. As such, they are generally faster at those tasks than R or Python.
However, they don't do everything. You can't create a data visualization
using a database, or run a statistical analysis. Instead, a database makes an
excellent data source for those tasks, because everyone works from the same
data, which helps keep your results reproducible across collaborators and over
time. Databases also make an excellent platform for sharing or publishing your
data. Their permissions functionality means you can easily give the public
read-only access to your finalized data products, while keeping the raw or
unprocessed data accessible only to your collaborators.
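
The division of labor described above can be sketched as follows, again using
Python's built-in `sqlite3` module and an in-memory database. The `trees`
table and its values are made up for illustration. The database filters and
sorts the data, and Python computes the statistic.

```python
import sqlite3
import statistics

# In-memory database with a small, made-up table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trees (species TEXT, height_m REAL)")
conn.executemany(
    "INSERT INTO trees VALUES (?, ?)",
    [
        ("oak", 12.0),
        ("oak", 15.0),
        ("pine", 20.0),
        ("pine", 22.0),
        ("elm", 9.0),
    ],
)

# The database does the querying: select and sort only the rows we need.
rows = conn.execute(
    "SELECT height_m FROM trees WHERE species = ? ORDER BY height_m",
    ("oak",),
).fetchall()
oak_heights = [row[0] for row in rows]

# Python does the analysis on the query result.
mean_height = statistics.mean(oak_heights)
```

Because every collaborator runs the same query against the same database, they
all start their analysis from the same rows.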

Historically, databases were primarily restricted to tabular data (rows and
columns), in what are called relational databases. In recent years, though,
new types of databases have emerged that can efficiently store many other
kinds of data, including nested or tree-based data, graph data, and
unstructured text. The type of data you have will inform what kind of
database would be best for you, which you can find out more about in the
[Overview of Databases and Data Storage reader][datalab-db].
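
To make the difference between nested and tabular data concrete, here is a
small sketch in plain Python. The `survey` record and its field names are
hypothetical. A document database could store the nested record roughly as
is, whereas a relational database would store it as flat rows.

```python
# A nested (document-style) record: one site with several visits, and each
# visit with several observed species. All values are made up.
survey = {
    "site": "north",
    "visits": [
        {"date": "2024-05-01", "species": ["oak", "elm"]},
        {"date": "2024-06-01", "species": ["oak"]},
    ],
}

# The same information flattened into tabular rows (site, date, species),
# which is the shape a relational table would hold.
rows = [
    (survey["site"], visit["date"], species)
    for visit in survey["visits"]
    for species in visit["species"]
]

for row in rows:
    print(row)
```

The flattened form repeats the site and date on every row. A relational
design would usually split these into separate, linked tables to avoid that
repetition.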

Many, if not most, database systems have graphical user interface software to
make it easier to interact with the database. But to get the most out of a
database, it is helpful to learn the query language associated with your
database of choice. For relational databases, this is SQL. To get started with
SQL, see DataLab's
[Introduction to SQL for Querying Databases workshop reader][datalab-sql].
Non-relational (NoSQL) databases use a variety of languages to write
queries. MongoDB, a document-based database, uses MQL, which you can learn in
W3Schools' [MongoDB Tutorial][mongo]. Like other computer languages, query
languages such as SQL and MQL make your data cleaning and transformation
processes reproducible. However, they are generally easier to learn than
general-purpose programming languages because they are designed for a much
smaller set of tasks.
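
As a rough sketch of what reproducible cleaning looks like in practice, the
example below stores a cleaning step and a summary query as SQL text and runs
them with Python's built-in `sqlite3` module. The `trees` table, the raw
values, and the `clean_and_summarize` helper are hypothetical.

```python
import sqlite3

# The cleaning and summary steps are written down as SQL, so anyone can
# rerun exactly the same steps on the same raw data.
CLEANING_SQL = "UPDATE trees SET species = lower(trim(species))"
SUMMARY_SQL = (
    "SELECT species, COUNT(*) FROM trees GROUP BY species ORDER BY species"
)


def clean_and_summarize(raw_species):
    """Load raw values into a fresh database, clean them, and count them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trees (species TEXT)")
    conn.executemany(
        "INSERT INTO trees VALUES (?)", [(s,) for s in raw_species]
    )
    conn.execute(CLEANING_SQL)
    result = conn.execute(SUMMARY_SQL).fetchall()
    conn.close()
    return result


# Made-up raw data with inconsistent spacing and capitalization.
raw = [" Oak", "oak ", "PINE", "pine"]
summary = clean_and_summarize(raw)
print(summary)
```

Running the function again on the same raw values gives the same summary,
which is the sense in which the process is reproducible. In MongoDB, a simple
filter is written as a document such as `{"species": "oak"}` rather than as a
SQL `WHERE` clause. That form is not run here because it requires a MongoDB
server.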


:::{tip}
DataLab's [An Overview of Databases and Data Storage workshop
reader][datalab-db] provides an introduction to databases. DataLab's
[Introduction to SQL for Querying Databases workshop reader][datalab-sql]
teaches you how to query data in relational databases. W3Schools' [MongoDB
tutorial][mongo] shows you how to query data in a document-based database
using MQL.
:::

[datalab-db]: https://ucdavisdatalab.github.io/workshop_intro_to_databases/
[datalab-sql]: https://ucdavisdatalab.github.io/workshop_intro_to_sql/

[mongo]: https://www.w3schools.com/mongodb/index.php


Workflow Automation