diff --git a/chapters/03_secondary_practices.md b/chapters/03_secondary_practices.md
index 6af33ea..9c4349c 100644
--- a/chapters/03_secondary_practices.md
+++ b/chapters/03_secondary_practices.md
@@ -266,24 +266,65 @@ stable.
 #### Databases
 
 A **database** is a data set together with specialized software to help you
-organize, query, and update that data set. Databases are usually more
-computationally efficient than languages like R and Python. Most databases also
-support querying large data sets (hundreds of gigabytes) without any extra
-setup. Some databases support multiple users, which can be convenient for
-collaborative work (this is also one reason why databases are widely used for
-data storage in web applications).
+work with the data set. Databases can:
+
+
+* create a definitive version of your data that multiple people can access
+without creating conflicts.
+
+* provide quality control measures to make sure typos don't get introduced into
+your data.
+
+* query millions of observations in seconds (unlike Excel or Google Sheets).
+
+* store data more efficiently than regular files.
+
+* control who has access to your data, and what they can do with it, on a very
+granular level.
+
+Databases are a specialized tool. They are optimized for querying and
+transforming data. As such, they are generally faster at those tasks than R or
+Python. However, they don't do everything. You can't create a data visualization
+using a database or run a statistical analysis. But a database is a perfect data
+source for those tasks, helping to ensure that your results are reproducible across
+collaborators and over time. Databases also make an excellent platform for
+sharing or publishing your data. Their permissions functionality means you can
+easily give the public read-only access to your finalized data products, while
+keeping the raw or unprocessed data only accessible to your collaborators.
+
+Historically, databases were restricted to primarily tabular data (rows and
+columns), in what are called relational databases. In recent years, though, new
+types of databases have come on the scene that can efficiently store all sorts
+of different data, including nested or tree-based data, graph data, and
+unstructured text. The type of data you have will inform what kind of
+database would be best for you, which you can find out more about in the
+[Overview of Databases and Data Storage reader][datalab-db].
+
+Many, if not most, database systems have graphical user interface software to
+make it easier to interact with the database. But, to get the most out of a
+database, it is helpful to learn the query language associated with your database of
+choice. For relational databases this is SQL. To get started with SQL, see the
+[Introduction to SQL for Querying Databases workshop reader][datalab-sql].
+Non-relational (NoSQL) databases use a variety of languages to write
+queries. MongoDB, a document-based database, uses MQL, which you can learn in
+W3Schools' [MongoDB Tutorial][mongo]. Like other computer languages, query
+languages like SQL and MQL make your data cleaning and transformation processes
+reproducible. However, they are often easier to learn than general-purpose
+languages because the set of tasks they are designed to do is much smaller.
 
 :::{seealso}
-See DataLab's [An Overview of Databases and Data Storage workshop
-reader][datalab-db] for an introduction to the topic. Then see DataLab's
-[Introduction to SQL for Querying Databases workshop reader][datalab-sql] to
-learn how to query data in databases.
+DataLab's [An Overview of Databases and Data Storage workshop
+reader][datalab-db] provides an introduction to databases. DataLab's
+[Introduction to SQL for Querying Databases workshop reader][datalab-sql]
+teaches you how to query data in relational databases. W3Schools' [MongoDB
+tutorial][mongo] shows you how to query data in a document- or text-based
+database using MQL.
 :::
 
 [datalab-db]: https://ucdavisdatalab.github.io/workshop_intro_to_databases/
 [datalab-sql]: https://ucdavisdatalab.github.io/workshop_intro_to_sql/
-
+[mongo]: https://www.w3schools.com/mongodb/index.php
 
 Workflow Automation