<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>R practice</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link href="css/bootstrap.min.css" rel="stylesheet">
<link href="css/custom.css" rel="stylesheet">
</head>
<body class="markdown github">
<header class="navbar-inverse navbar-fixed-top">
<div class="container">
<nav role="navigation">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="index.html" class="navbar-brand">J298 Data Journalism</a>
</div> <!-- /.navbar-header -->
<!-- Collect the nav links, forms, and other content for toggling -->
<div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
<ul class="nav navbar-nav">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Class notes<b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="week1.html">What is data?</a></li>
<li><a href="week2.html">Types of stories</a></li>
<li><a href="week3.html">Working with spreadsheets</a></li>
<li><a href="week4.html">Acquiring, cleaning, and formatting data</a></li>
<li><a href="week5.html">R, RStudio, and the tidyverse</a></li>
<li><a href="week6.html">Data journalism in the tidyverse</a></li>
<li><a href="week7.html">Don't let the data lie to you</a></li>
<li><a href="week8.html">Databases and SQL</a></li>
<li><a href="week9.html">Finding stories using maps</a></li>
<li><a href="week10.html">Maps meet databases</a></li>
<li><a href="week11.html">More PostGIS</a></li>
<li><a href="week12.html">R practice</a></li>
<li><a href="week13.html">PostGIS practice</a></li>
<li><a href="week14.html">More fun with R</a></li>
</ul>
</li>
<li><a href="software.html">Software</a></li>
<li><a href="datasets.html">Data</a></li>
<li><a href="questions.html">If you get stuck</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Email instructors<b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="mailto:[email protected]">Peter Aldhous</a></li>
<li><a href="mailto:[email protected]">Amanda Hickman</a></li>
</ul>
</li>
</ul>
</div><!-- /.navbar-collapse -->
</nav>
</div> <!-- /.navbar-header -->
</header>
<div class="container all">
<h1 id="practice-with-r"><a name="practice-with-r" href="#practice-with-r"></a>Practice with R</h1><h3 id="getting-reacquainted-with-the-tidyverse"><a name="getting-reacquainted-with-the-tidyverse" href="#getting-reacquainted-with-the-tidyverse"></a>Getting reacquainted with the tidyverse</h3><p>The goal of today’s class is to practice the skills we learned in weeks 5 and 6. This will not be an exercise in copying and pasting code. Instead, you will work in small groups to answer questions from the data, in two stages.</p><p><strong>1)</strong> In plain English, work out the sequence of steps to achieve the desired result, using our basic operations with data:</p><ul>
<li><p><strong>Sort:</strong> Largest to smallest, oldest to newest, alphabetical etc.</p>
</li><li><p><strong>Filter:</strong> Select a defined subset of the data.</p>
</li><li><p><strong>Summarize/Aggregate:</strong> Deriving one value from a series of other values to produce a summary statistic. Examples include: count, sum, mean, median, maximum, minimum etc. Often you’ll <strong>group</strong> data into categories first, and then aggregate by group.</p>
</li><li><p><strong>Join:</strong> Merging entries from two or more datasets based on common field(s), e.g. unique ID number, last name and first name.</p>
</li></ul><p><strong>2)</strong> Convert those steps into <strong>dplyr</strong> code.</p><h3 id="the-data-we-will-use-today"><a name="the-data-we-will-use-today" href="#the-data-we-will-use-today"></a>The data we will use today</h3><p>Download the data for this session from <a href="data/week12.zip">here</a>, unzip the folder and place it on your desktop. It contains the following files, used in reporting <a href="https://www.newscientist.com/article/dn18806-revealed-pfizers-payments-to-censured-doctors/">this story</a>, which revealed that some of the doctors paid as “experts” by the drug company Pfizer had troubling disciplinary records:</p><ul>
<li><p><code>pfizer.csv</code> Payments made by Pfizer to doctors across the United States in the second half of 2009. Contains the following variables:</p>
<ul>
<li><code>org_indiv</code> Full name of the doctor, or their organization.</li><li><code>first_plus</code> Doctor’s first and middle names.</li><li><code>first_name</code> <code>last_name</code> First and last names.</li><li><code>city</code> <code>state</code> City and state.</li><li><code>category of payment</code> Type of payment. Categories include <code>Expert-led Forums</code>, in which doctors lecture their peers on using Pfizer’s drugs, and <code>Professional Advising</code>.</li><li><code>cash</code> Value of payments made in cash.</li><li><code>other</code> Value of payments made in-kind, for example purchase of meals.</li><li><code>total</code> Total value of payment, whether cash or in-kind.</li></ul>
</li><li><p><code>fda.csv</code> Data on warning letters sent to doctors by the US Food and Drug Administration, because of problems in the way in which they ran clinical trials testing experimental treatments. Contains the following variables:</p>
<ul>
<li><code>name_last</code> <code>name_first</code> <code>name_middle</code> Doctor’s last, first, and middle names.</li><li><code>issued</code> Date letter was sent.</li><li><code>office</code> Office within the FDA that sent the letter.</li></ul>
</li></ul><h3 id="setting-up"><a name="setting-up" href="#setting-up"></a>Setting up</h3><p>As before, save your R script to the folder with your data, then set the working directory to this folder by selecting from the top menu <code>Session>Set Working Directory>To Source File Location</code>. Then load the packages you will need today by running this code:</p><pre class="r hljs"><code class="R"><span class="hljs-keyword">library</span>(dplyr)
<span class="hljs-keyword">library</span>(readr)
<span class="hljs-keyword">library</span>(lubridate)
<span class="hljs-keyword">library</span>(ggplot2)
</code></pre><h4 id="load-and-view-data"><a name="load-and-view-data" href="#load-and-view-data"></a>Load and view data</h4><p>Load and inspect the two files containing the data. We will discuss the data in class before we get started with this exercise.</p><h4 id="find-doctors-in-california-paid-$10,000-or-more-by-pfizer-to-run-“expert-led-forums.”"><a name="find-doctors-in-california-paid-$10,000-or-more-by-pfizer-to-run-“expert-led-forums.”" href="#find-doctors-in-california-paid-$10,000-or-more-by-pfizer-to-run-“expert-led-forums.”"></a>Find doctors in California paid $10,000 or more by Pfizer to run “Expert-Led Forums.”</h4><p>List them in descending order of the total received.</p><h4 id="find-doctors-in-california-*or*-new-york-who-were-paid-$10,000-or-more-by-pfizer-to-run-“expert-led-forums.”"><a name="find-doctors-in-california-*or*-new-york-who-were-paid-$10,000-or-more-by-pfizer-to-run-“expert-led-forums.”" href="#find-doctors-in-california-*or*-new-york-who-were-paid-$10,000-or-more-by-pfizer-to-run-“expert-led-forums.”"></a>Find doctors in California <em>or</em> New York who were paid $10,000 or more by Pfizer to run “Expert-Led Forums.”</h4><p>Again, list them in descending order of the total received.</p><h4 id="find-doctors-in-states-*other-than*-california-who-were-paid-$10,000-or-more-by-pfizer-to-run-“expert-led-forums.”"><a name="find-doctors-in-states-*other-than*-california-who-were-paid-$10,000-or-more-by-pfizer-to-run-“expert-led-forums.”" href="#find-doctors-in-states-*other-than*-california-who-were-paid-$10,000-or-more-by-pfizer-to-run-“expert-led-forums.”"></a>Find doctors in states <em>other than</em> California who were paid $10,000 or more by Pfizer to run “Expert-Led Forums.”</h4><p>Again, list them in descending order of the total received.</p><h4 id="find-the-20-doctors-across-the-four-largest-states-(ca,-tx,-fl,-ny)-who-were-paid-the-most-for-professional-advice."><a 
name="find-the-20-doctors-across-the-four-largest-states-(ca,-tx,-fl,-ny)-who-were-paid-the-most-for-professional-advice." href="#find-the-20-doctors-across-the-four-largest-states-(ca,-tx,-fl,-ny)-who-were-paid-the-most-for-professional-advice."></a>Find the 20 doctors across the four largest states (CA, TX, FL, NY) who were paid the most for professional advice.</h4><h4 id="filter-the-data-for-all-payments-for-running-expert-led-forums-or-for-professional-advising,-and-arrange-alphabetically-by-doctor-(last-name,-then-first-name)"><a name="filter-the-data-for-all-payments-for-running-expert-led-forums-or-for-professional-advising,-and-arrange-alphabetically-by-doctor-(last-name,-then-first-name)" href="#filter-the-data-for-all-payments-for-running-expert-led-forums-or-for-professional-advising,-and-arrange-alphabetically-by-doctor-(last-name,-then-first-name)"></a>Filter the data for all payments for running Expert-Led Forums or for Professional Advising, and arrange alphabetically by doctor (last name, then first name)</h4><p>For brevity, filter using the <code>grepl</code> function we encountered in week 6.</p><h4 id="write-the-data-from-the-previous-query-to-a-csv-file"><a name="write-the-data-from-the-previous-query-to-a-csv-file" href="#write-the-data-from-the-previous-query-to-a-csv-file"></a>Write the data from the previous query to a CSV file</h4><h4 id="calculate-the-total-payments-by-state"><a name="calculate-the-total-payments-by-state" href="#calculate-the-total-payments-by-state"></a>Calculate the total payments by state</h4><p>List the states in descending order of the total received.</p><h4 id="calculate-the-total-payments-by-state-and-category"><a name="calculate-the-total-payments-by-state-and-category" href="#calculate-the-total-payments-by-state-and-category"></a>Calculate the total payments by state and category</h4><p>Sort by state and category.</p><h4 id="filter-the-fda-data-for-letters-sent-from-the-start-of-2005-onwards"><a 
name="filter-the-fda-data-for-letters-sent-from-the-start-of-2005-onwards" href="#filter-the-fda-data-for-letters-sent-from-the-start-of-2005-onwards"></a>Filter the FDA data for letters sent from the start of 2005 onwards</h4><h4 id="count-the-number-of-letters-issued-by-year"><a name="count-the-number-of-letters-issued-by-year" href="#count-the-number-of-letters-issued-by-year"></a>Count the number of letters issued by year</h4><p>For this one you will need to extract the year from the dates.</p><h4 id="count-the-number-of-letters-issued-by-year-and-month"><a name="count-the-number-of-letters-issued-by-year-and-month" href="#count-the-number-of-letters-issued-by-year-and-month"></a>Count the number of letters issued by year and month</h4><p>The resulting data should contain entries for Jan 2005, Feb 2005, and so on. You will need to extract both years and months from the dates.</p><h4 id="find-doctors-paid-to-run-“expert-led-forums”-who-also-received-a-warning-letter-from-the-fda"><a name="find-doctors-paid-to-run-“expert-led-forums”-who-also-received-a-warning-letter-from-the-fda" href="#find-doctors-paid-to-run-“expert-led-forums”-who-also-received-a-warning-letter-from-the-fda"></a>Find doctors paid to run “Expert-led forums” who also received a warning letter from the FDA</h4><p>Your final result should contain just these variables: <code>first_plus</code>, <code>last_name</code>, <code>city</code>, <code>state</code>, <code>total</code>, <code>issued</code>.</p><h4 id="find-the-doctor-paid-the-most-by-pfizer-(across-all-categories-of-payment)-who-also-received-a-warning-letter-from-the-fda"><a name="find-the-doctor-paid-the-most-by-pfizer-(across-all-categories-of-payment)-who-also-received-a-warning-letter-from-the-fda" href="#find-the-doctor-paid-the-most-by-pfizer-(across-all-categories-of-payment)-who-also-received-a-warning-letter-from-the-fda"></a>Find the doctor paid the most by Pfizer (across all categories of payment) who also received a warning letter from the FDA</h4><p>Do this in two steps: First 
calculate the total payments per doctor across all categories; then join that data frame to the FDA data. Make sure that the grouping variables in your initial calculation include those you’ll need to make the join!</p><p>Again, your final result should contain just these variables: <code>first_plus</code>, <code>last_name</code>, <code>city</code>, <code>state</code>, <code>total</code>, <code>issued</code>.</p><p>For both of the last two queries, further research would be needed to confirm that the doctors matched are the same individuals!</p><h3 id="assignment"><a name="assignment" href="#assignment"></a>Assignment</h3><ul>
<li>Complete the queries listed above, filing your entire R script via bCourses.</li></ul><h3 id="further-reading"><a name="further-reading" href="#further-reading"></a>Further reading</h3><p><strong><a href="https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html">Introduction to dplyr</a></strong></p><p><strong><a href="https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf">RStudio Data Wrangling Cheat Sheet</a></strong></p><p><strong><a href="http://stackoverflow.com/">Stack Overflow</a></strong><br>For any work involving code, this question-and-answer site is a great resource for when you get stuck, to see how others have solved similar problems. Search the site, or <a href="http://stackoverflow.com/questions/tagged/r">browse R questions</a>.</p>
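<h3 id="a-quick-dplyr-refresher"><a name="a-quick-dplyr-refresher" href="#a-quick-dplyr-refresher"></a>A quick dplyr refresher</h3><p>If you need a reminder of how plain-English steps (sort, filter, summarize, join) might map onto <strong>dplyr</strong> verbs, here is a minimal sketch. It does not use the class data: the data frames <code>payments</code> and <code>notices</code>, their column names, and all of their values are made up for illustration, and this is only one of several ways to write each step.</p><pre class="r hljs"><code class="R">library(dplyr)

# Hypothetical toy data, invented for illustration only
payments &lt;- data.frame(
  doctor = c("A", "B", "C", "D"),
  state = c("CA", "CA", "NY", "TX"),
  total = c(500, 1500, 2500, 1000)
)
notices &lt;- data.frame(
  doctor = c("B", "D"),
  year = c(2006, 2008)
)

# Filter, then sort: keep payments of 1000 or more, largest first
payments %&gt;%
  filter(total &gt;= 1000) %&gt;%
  arrange(desc(total))

# Group, then summarize: total paid per state
payments %&gt;%
  group_by(state) %&gt;%
  summarize(state_total = sum(total))

# Join: keep only doctors that appear in both data frames
inner_join(payments, notices, by = "doctor")
</code></pre><p>Each comment is the plain-English step, and the code beneath it is the translation. Writing out the steps as comments first, then filling in the code, is one way to approach the queries above.</p>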
</div> <!-- /.container all -->
<script src="https://code.jquery.com/jquery.min.js"></script>
<script src="js/bootstrap.min.js"></script>
</body>
</html>