USGS data are made available through the National Water Information System (NWIS).
+
+
USGS Web Retrievals
+
In this section, examples of National Water Information System (NWIS) retrievals show how to get raw data into R. These data include site information, measured parameter information, historical daily values, unit values (which include real-time data but can also include other sensor data stored at regular time intervals), water quality data, groundwater level data, peak flow data, rating curve data, surface-water measurement data, water use data, and statistics data. The section Embedded Metadata shows how to get the metadata that is attached to each returned data frame.
+
The USGS organizes hydrologic data in a standard structure. Streamgages are located throughout the United States, and each streamgage has a unique ID (referred to in this document and throughout the dataRetrieval package as siteNumber). Often (but not always), these IDs are 8 digits for surface-water sites and 15 digits for groundwater sites. The first step to finding data is discovering this siteNumber. There are many ways to do this; one is the National Water Information System: Mapper.
+
Once the siteNumber is known, the next required input for USGS data retrievals is the “parameter code”. This is a 5-digit code that specifies the measured parameter being requested. For example, parameter code 00631 represents “Nitrate plus nitrite, water, filtered, milligrams per liter as nitrogen”, with units of “mg/l as N”.
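
Because both identifiers are digit strings that often begin with zeros, it can help to keep them as character values and check their shape before making a request. The following is a minimal sketch in base R. The helper names isParameterCd and isLikelySiteNumber are hypothetical and are not part of dataRetrieval, and the site-number check only encodes the common 8-digit and 15-digit lengths, so it is a rough screen rather than a rule.

```r
# Hypothetical helpers (not part of dataRetrieval) for sanity-checking inputs.

# A parameter code is a 5-digit string, e.g. "00631".
isParameterCd <- function(x) {
  is.character(x) & grepl("^[0-9]{5}$", x)
}

# Site numbers are often 8 digits (surface water) or 15 digits (groundwater),
# but other lengths exist, so FALSE here is a warning sign, not proof of error.
isLikelySiteNumber <- function(x) {
  is.character(x) & grepl("^([0-9]{8}|[0-9]{15})$", x)
}

isParameterCd(c("00631", "631"))
isLikelySiteNumber(c("01491000", "434400121275801"))

# Storing a code as a number silently drops the leading zeros:
as.character(as.numeric("00631"))
```

Keeping the codes as quoted strings, as the examples in this document do, avoids the leading-zero problem entirely.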
+
Not every station will measure all parameters. A short list of commonly measured parameters is shown in Table 2.
+
+
+
+Table 2: Common USGS Parameter Codes
+
+pCode | shortName
+00060 | Discharge [ft3/s]
+00065 | Gage height [ft]
+00010 | Temperature [C]
+00045 | Precipitation [in]
+00400 | pH
+
+
+
+
Two output columns that may not be obvious are “srsname” and “casrn”. “srsname” is the Substance Registry Services name. More information on the srs name can be found here.
+
Casrn stands for “Chemical Abstracts Service (CAS) Registry Number”. More information on CAS can be found here.
+
For unit values data (sensor data measured at regular time intervals such as 15 minutes or hourly), knowing the parameter code and siteNumber is enough to make a request for data. For most variables that are measured on a continuous basis, the USGS also stores the historical data as daily values. These daily values are statistical summaries of the continuous data, e.g. maximum, minimum, mean, or median. The different statistics are specified by a 5-digit statistics code.
+
Some common codes are shown in Table 3.
+
+
+
+Table 3: Commonly used USGS Stat Codes
+
+StatCode | shortName
+00001 | Maximum
+00002 | Minimum
+00003 | Mean
+00008 | Median
+
+
+
+
Examples for using these site numbers, parameter codes, and statistic codes will be presented in subsequent sections.
+
There are occasions where NWIS values are not reported as numbers; instead, there might be text describing a certain event such as “Ice”. Any value that cannot be converted to a number will be reported as NA in this package (not including remark code columns), unless the user sets the argument convertType to FALSE. In that case, the data is returned as a data frame that is entirely character columns.
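
The effect of this conversion can be illustrated with base R alone. The sketch below uses made-up values and shows only the general behaviour of numeric conversion in R; it is not the package's internal code.

```r
# Illustration only: base R behaviour, not the package's internal code.
raw <- c("12.3", "Ice", "15")

# Non-numeric text becomes NA (as.numeric also emits a warning, silenced here).
converted <- suppressWarnings(as.numeric(raw))
converted

# Positions of values that could not be converted can be located afterwards.
which(is.na(converted))
```

Comparing the character and numeric versions in this way is one approach to finding out which text values were present before they were replaced by NA.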
+
+
+
+
Daily Data
+
To obtain daily records of USGS data, use the readNWISdv function. The arguments for this function are siteNumber, parameterCd, startDate, endDate, and statCd (defaults to “00003”). If you want to use the default values, you do not need to list them in the function call. Daily data are pulled from https://waterservices.usgs.gov/rest/DV-Test-Tool.html.
+
The dates (start and end) must be in the format “YYYY-MM-DD” (note: the user must include the quotes). Setting the start date to “” (no space) will prompt the program to ask for the earliest date, and setting the end date to “” (no space) will prompt for the latest available date.
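
A quick check of date strings before a request can catch formatting mistakes early. The following is a minimal sketch in base R; the helper isValidDateArg is hypothetical (it is not a dataRetrieval function) and simply accepts either an empty string or a real calendar date written as YYYY-MM-DD.

```r
# Hypothetical helper (not part of dataRetrieval): TRUE for "" or for a real
# calendar date written as YYYY-MM-DD.
isValidDateArg <- function(x) {
  looksRight <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  parses <- !is.na(as.Date(x, format = "%Y-%m-%d"))
  x == "" | (looksRight & parses)
}

# A valid date, an empty string, a wrong format, and an impossible date.
isValidDateArg(c("2009-10-01", "", "10/01/2009", "2012-02-30"))
```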
+
# Choptank River near Greensboro, MD:
+siteNumber <- "01491000"
+parameterCd <- "00060" # Discharge
+startDate <- "2009-10-01"
+endDate <- "2012-09-30"
+
+discharge <- readNWISdv(siteNumber, parameterCd, startDate, endDate)
+
The column “Date” in the returned data frame is automatically imported as a variable of class “Date” in R. Each requested parameter has a value column and a remark code column. The names of these columns depend on the requested parameter and stat code combinations. USGS daily value qualification codes are often “A” (approved for publication) or “P” (provisional data subject to revision).
+
Another example would be a request for mean and maximum daily temperature and discharge in early 2012:
+
siteNumber <- "01491000"
+parameterCd <- c("00010", "00060") # Temperature and discharge
+statCd <- c("00001", "00003") # Maximum and mean
+startDate <- "2012-01-01"
+endDate <- "2012-05-01"
+
+temperatureAndFlow <- readNWISdv(siteNumber, parameterCd, startDate,
+ endDate, statCd = statCd)
+
The column names can be shortened and simplified using the renameNWISColumns function. This is not necessary, but may streamline subsequent data analysis and presentation. Site information, daily statistic information, and measured parameter information are attached to the data frame as attributes. This is discussed further in the metadata section.
+
names(temperatureAndFlow)
+
## [1] "agency_cd" "site_no" "Date"
+## [4] "X_00010_00001_cd" "X_00010_00001" "X_00010_00003_cd"
+## [7] "X_00010_00003" "X_00060_00003_cd" "X_00060_00003"
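
The raw names printed above appear to follow the pattern X_<parameter code>_<statistic code>, with a "_cd" suffix on the remark code columns. Assuming that pattern holds (it is inferred from this output only, and other retrievals may produce names with additional parts), the codes can be recovered from the names with base R:

```r
# Raw column names as printed above.
rawNames <- c("X_00010_00001_cd", "X_00010_00001", "X_00010_00003_cd",
              "X_00010_00003", "X_00060_00003_cd", "X_00060_00003")

# Keep the value columns (drop the "_cd" remark code columns).
valueNames <- rawNames[!grepl("_cd$", rawNames)]

# Split "X_<parameter code>_<statistic code>" into its parts.
parts <- do.call(rbind, strsplit(valueNames, "_"))
codes <- data.frame(column = valueNames,
                    parameterCd = parts[, 2],
                    statCd = parts[, 3],
                    stringsAsFactors = FALSE)
codes
```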
+
temperatureAndFlow <- renameNWISColumns(temperatureAndFlow)
+names(temperatureAndFlow)
+
## [1] "agency_cd" "site_no" "Date"
+## [4] "Wtemp_Max_cd" "Wtemp_Max" "Wtemp_cd"
+## [7] "Wtemp" "Flow_cd" "Flow"
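
With the renamed columns, the qualification codes can be used to separate approved from provisional values. The sketch below uses a small made-up data frame rather than a live retrieval. It assumes that a qualifier string may carry extra flags after the leading letter (for example "A e"), so it matches on the first character instead of testing for equality.

```r
# Made-up rows using the renamed column names shown above. Real qualifier
# strings may carry extra flags (assumed here to look like "A e").
toy <- data.frame(
  Date = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03")),
  Flow = c(120, 135, 128),
  Flow_cd = c("A", "A e", "P"),
  stringsAsFactors = FALSE
)

# Approved records start with "A"; provisional records start with "P".
approved <- toy[grepl("^A", toy$Flow_cd), ]
provisional <- toy[grepl("^P", toy$Flow_cd), ]

nrow(approved)
nrow(provisional)
```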
+
# Information about the data frame attributes:
+names(attributes(temperatureAndFlow))
+
## [1] "names" "row.names" "url"
+## [4] "siteInfo" "variableInfo" "disclaimer"
+## [7] "statisticInfo" "queryTime" "class"
+
statInfo <- attr(temperatureAndFlow, "statisticInfo")
+variableInfo <- attr(temperatureAndFlow, "variableInfo")
+siteInfo <- attr(temperatureAndFlow, "siteInfo")
+
An example of plotting the above data:
+
variableInfo <- attr(temperatureAndFlow, "variableInfo")
+siteInfo <- attr(temperatureAndFlow, "siteInfo")
+
+par(mar = c(5, 5, 5, 5)) # set the plot margins (bottom, left, top, right), in lines
+
+plot(temperatureAndFlow$Date, temperatureAndFlow$Wtemp_Max, ylab = variableInfo$parameter_desc[1],
+ xlab = "")
+par(new = TRUE)
+plot(temperatureAndFlow$Date, temperatureAndFlow$Flow, col = "red",
+ type = "l", xaxt = "n", yaxt = "n", xlab = "", ylab = "",
+ axes = FALSE)
+axis(4, col = "red", col.axis = "red")
+mtext(variableInfo$parameter_desc[2], side = 4, line = 3, col = "red")
+title(paste(siteInfo$station_nm, "2012"))
+legend("topleft", variableInfo$param_units, col = c("black",
+ "red"), lty = c(NA, 1), pch = c(1, NA))
+
+
+
+
Unit Data
+
Any data collected at regular time intervals (such as 15-minute or hourly) are known as “unit values”. Many of these are delivered on a real-time basis, and very recent data (even less than an hour old in many cases) are available through the function readNWISuv. Some of these unit values are available for many years, and some are only available for a recent time period such as 120 days. Here is an example of a retrieval of such data.
+
parameterCd <- "00060" # Discharge
+startDate <- "2012-05-12"
+endDate <- "2012-05-13"
+dischargeUnit <- readNWISuv(siteNumber, parameterCd, startDate,
+ endDate)
+dischargeUnit <- renameNWISColumns(dischargeUnit)
+
The retrieval produces a data frame that contains 96 rows (one for every 15 minute period in the day). They include all data collected from the startDate through the endDate (starting and ending with midnight locally-collected time). The dateTime column is converted to UTC (Coordinated Universal Time), so midnight EST appears 5 hours later in the dateTime column (05:00 UTC on the same day).
+
To override the UTC timezone, specify a valid timezone in the tz argument. Default is “”, which will keep the dateTime column in UTC. Other valid timezones are:
+
America/New_York
+America/Chicago
+America/Denver
+America/Los_Angeles
+America/Anchorage
+America/Honolulu
+America/Jamaica
+America/Managua
+America/Phoenix
+America/Metlakatla
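
The relationship between UTC and a local time zone can be checked in base R without a web request. The sketch below is an illustration only: it builds a single UTC timestamp by hand and displays it in America/New_York. It assumes the time zone database is available on the system, as it is in standard R installations.

```r
# Midnight on 2012-05-12 in Maryland was daylight time (UTC-4),
# so the UTC timestamp for that instant is 04:00.
utcTime <- as.POSIXct("2012-05-12 04:00:00", tz = "UTC")

# Same instant, displayed in local time.
format(utcTime, format = "%Y-%m-%d %H:%M", tz = "America/New_York",
       usetz = TRUE)

# Or change the time zone attribute on a copy of the vector.
localTime <- utcTime
attr(localTime, "tzone") <- "America/New_York"
format(localTime, format = "%Y-%m-%d %H:%M", usetz = TRUE)
```

Changing the time zone attribute does not alter the underlying instant, only how it is printed, so the two vectors still compare as equal.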
+
Data are retrieved from https://waterservices.usgs.gov/rest/IV-Test-Tool.html. NWIS values are occasionally reported as text rather than numbers; a common example is “Ice”. Any value that cannot be converted to a number will be reported as NA in this package. Site information and measured parameter information are attached to the data frame as attributes. This is discussed further in the metadata section.
+
+
+
Water Quality Data
+
To get USGS water quality data from water samples collected at the streamgage or other monitoring site (as distinct from unit values collected through some type of automatic monitor), use the function readNWISqw with the input arguments siteNumber, parameterCd, startDate, and endDate. Additionally, the argument expanded is a logical input that allows the user to choose between a simple return of datetimes/qualifier/values (expanded=FALSE), or a more complete and verbose output (expanded=TRUE). expanded=TRUE includes such columns as remark codes, value qualifying text, and detection level for each parameter code. There is also an argument, reshape, that converts the expanded dataset to a “wide” format (each requested parameter code gets individual columns). The defaults are expanded=TRUE and reshape=FALSE.
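
The difference between the “long” and “wide” layouts can be sketched with base R on a small made-up data frame. The column names and values below are illustrative assumptions, and reshape() from base R stands in for whatever the package does internally.

```r
# Made-up long-format results: one row per sample date and parameter code.
# Column names and values are illustrative assumptions.
long <- data.frame(
  sample_dt = as.Date(c("2012-01-10", "2012-01-10",
                        "2012-02-14", "2012-02-14")),
  parm_cd = c("00618", "71851", "00618", "71851"),
  result_va = c(1.2, 5.3, 0.9, 4.0),
  stringsAsFactors = FALSE
)

# Wide format: one row per sample date, one value column per parameter code.
wide <- reshape(long, idvar = "sample_dt", timevar = "parm_cd",
                direction = "wide")
names(wide)
```

The long layout is usually easier to filter and summarize by parameter, while the wide layout is convenient when two parameters from the same sample are compared side by side.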
+
# Dissolved Nitrate parameter codes:
+parameterCd <- c("00618", "71851")
+startDate <- "1985-10-01"
+endDate <- "2012-09-30"
+
+dfLong <- readNWISqw(siteNumber, parameterCd, startDate, endDate)
+
+# Or the wide return:
+dfWide <- readNWISqw(siteNumber, parameterCd, startDate, endDate,
+ reshape = TRUE)
+
Site information and measured parameter information are attached to the data frame as attributes. This is discussed further in the metadata section. Additional metadata, such as information about the column names, can be found by using the comment function, also described in the metadata section.
+
+
+
+
Groundwater Level Data
+
Groundwater level measurements can be obtained with the readNWISgwl function. Information on the returned data can be found with the comment function and the attached attributes, as described in the metadata section.
+
siteNumber <- "434400121275801"
+groundWater <- readNWISgwl(siteNumber)
+
+
+
Peak Flow Data
+
Peak flow data are instantaneous discharge or stage data that record the maximum values of these variables during a flood event. They include the annual peak flood event but can also include records of other peaks that are lower than the annual maximum. Peak discharge measurements can be obtained with the readNWISpeak function. Information on the returned data can be found with the comment function and the attached attributes, as described in the metadata section.
+
siteNumber <- "01594440"
+peakData <- readNWISpeak(siteNumber)
+
+
+
Rating Curve Data
+
Rating curves are the calibration curves that are used to convert measurements of stage to discharge. Because of changing hydrologic conditions, these rating curves change over time. Information on the returned data can be found with the comment function and the attached attributes, as described in the metadata section.
+
Rating curves can be obtained with the readNWISrating function.
+
ratingData <- readNWISrating(siteNumber, "base")
+attr(ratingData, "RATING")
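
To illustrate how a rating curve converts stage to discharge, the sketch below interpolates within a small made-up rating table using base R. The column names are hypothetical and are not those returned by readNWISrating, and straight linear interpolation between points is a simplification chosen for brevity.

```r
# Made-up rating table: stage (ft) and the corresponding discharge (ft3/s).
# Column names are hypothetical, not those returned by readNWISrating.
rating <- data.frame(stage = c(1, 2, 3, 4),
                     discharge = c(10, 40, 100, 200))

# Convert observed stages to discharge by linear interpolation between
# rating points; stages outside the table return NA.
stageObs <- c(1.5, 2.5, 5)
flowEst <- approx(x = rating$stage, y = rating$discharge, xout = stageObs)$y
flowEst
```

Returning NA for stages outside the table, which is the default behaviour of approx, avoids silently extrapolating beyond the range covered by the rating.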
+
+
+
Surface-Water Measurement Data
+
These data are the discrete measurements of discharge that are made for the purpose of developing or revising the rating curve. Information on the returned data can be found with the comment function and the attached attributes, as described in the metadata section.
+
Surface-water measurement data can be obtained with the readNWISmeas function.
+
surfaceData <- readNWISmeas(siteNumber)
+
+
+
Water Use Data
+
The readNWISuse function retrieves water use data from USGS Water Use Data for the Nation. See https://waterdata.usgs.gov/nwis/wu for more information. All available use categories for the supplied arguments are retrieved.
+
allegheny <- readNWISuse(stateCd = "Pennsylvania", countyCd = "Allegheny")
+
+
+national <- readNWISuse(stateCd = NULL, countyCd = NULL, transform = TRUE)
+
+
+
Statistics Data
+
The readNWISstat function retrieves site statistics from the USGS Statistics Web Service beta.
+
discharge_stats <- readNWISstat(siteNumbers = c("02319394"),
+ parameterCd = c("00060"), statReportType = "annual")
+
+
+
+
Creating Tables in Microsoft® Software from R
+
There are a few steps that are required in order to create a table in Microsoft® software (Excel, Word, PowerPoint, etc.) from an R data frame. There are certainly a variety of good methods, one of which is detailed here. The example we will step through here will be to create a table in Microsoft Excel based on the data frame tableData:
+
availableData <- whatNWISdata(siteNumber, "dv")
+# Keep the daily mean records; %in% also drops rows where stat_cd is NA
+dailyData <- availableData[availableData$stat_cd %in% "00003", ]
+
+tableData <- with(dailyData, data.frame(shortName = srsname,
+ Start = begin_date, End = end_date, Count = count_nu, Units = parameter_units))
+
First, save the data frame as a tab delimited file (you don’t want to use comma delimited because there are commas in some of the data elements):
+
write.table(tableData, file = "tableData.tsv", sep = "\t", row.names = FALSE,
+ quote = FALSE)
+
This will save a file in your working directory called tableData.tsv. You can see your working directory by typing getwd() in the R console. Opening the file in a general-purpose text editor, you should see the following:
+
shortName Start End Count Units
+Temperature, water 2010-10-01 2012-06-24 575 deg C
+Stream flow, mean. daily 1948-01-01 2013-03-13 23814 ft3/s
+Specific conductance 2010-10-01 2012-06-24 551 uS/cm 25C
+Suspended sediment concentration (SSC) 1980-10-01 1991-09-30 3651 mg/l
+Suspended sediment discharge 1980-10-01 1991-09-30 3652 tons/day
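
Before moving to Excel, the file can be read back into R to confirm that fields containing commas survived intact. The sketch below is a self-contained illustration with a small made-up table and a temporary file, so it does not touch tableData.tsv or the working directory.

```r
# Made-up table with a comma inside a field, as in the real short names.
toy <- data.frame(
  shortName = c("Temperature, water", "Specific conductance"),
  Count = c(575L, 551L),
  Units = c("deg C", "uS/cm 25C"),
  stringsAsFactors = FALSE
)

# Write to a temporary file so the working directory is left untouched.
tsvFile <- tempfile(fileext = ".tsv")
write.table(toy, file = tsvFile, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back; quote = "" mirrors quote = FALSE used when writing.
roundTrip <- read.delim(tsvFile, quote = "", stringsAsFactors = FALSE)
roundTrip$shortName
```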
+
Next, follow the steps below to open this file in Excel:
+
+- Open Excel
+- Click on the File tab
+- Click on the Open option
+- Navigate to the working directory (as shown in the results of getwd())
+- Next to the File name text box, change the drop down type to All Files (*.*)
+- Double click tableData.tsv
+- A text import wizard will open. In the first window, choose the Delimited radio button if it is not automatically picked, then click Next.
+- In the second window, check the Tab delimiter if it is not automatically checked, then click Finish.
+- Use the many formatting tools within Excel to customize the table
+
+
From Excel, it is simple to copy and paste the tables in other Microsoft® software. An example using one of the default Excel table formats is here. Additional formatting could be required in Excel, for example converting u to \(\mu\).
+
+
+