Decision: Reg content sources

Britta edited this page Apr 2, 2021 · 1 revision
Thing | Info
Relevant features | Timeline
Date started | 2021-03-05
Date finished | 2021-04-02
Decision status | Done
Summary of outcome | No interpreting rules from Federal Register. Source info for current: eCFR dailies. Source for historical: use as much detailed historical data as eCFR gives us, then fill in the past as much as we can with the annual data from GPO.

Background/context

Core question

What levels or kinds of information should we use for historical and current regs display?

What sources of information should we use?

What we know

Currently we use GPO GovInfo.gov (Federal Register) for annual data and pull in Federal Register notices for updates (using the Federal Register API). The alternative is the eCFR API.

Reg content that is available to us in machine-readable format:

  • The GPO produces annual published versions of the regs for several years back. According to the GPO, these are the official versions.
  • eCFR publishes somewhat recent granular (monthly-ish) versions. Not formally designated official. But these versions match what people expect from eCFR (because it is eCFR), so they are sufficient for everyday CMCS use.
  • The eRegs open source repositories have some code for rule-by-rule parsing of Federal Register notices. This can produce versions at a day-by-day level of granularity. It’s inherently unreliable because the rules in the Federal Register are written by people and made to be read by people. People are naturally inconsistent in how they write these, so the automated parser creates approximate but inaccurate results. To produce reasonably accurate versions, a person would need to check and fix every parsed rule. Not usable in production.
  • The GPO produces detailed metadata for rules (including publication dates, names, etc.).

Quotes from user research:

  • "If this [eRegs content] is pulling from Medicaid.gov, the Medicaid.gov people need to be keeping it up to date. I’d feel more comfortable knowing that this was all coming from eCFR, instead of Medicaid.gov’s take on eCFR."
  • "The Cornell law site pops up for me [in Google] because I'm on my personal computer, but then you have to go verify anyway. I'm like, I better go to the Federal Register to make sure this is really the most updated version."

Tradeoffs

People's needs:

  • All users of current content need daily updates.
  • Policy owners and users need eRegs to prioritize accuracy and reliability, to reduce misinterpretations, even if it's less detailed.
  • In general, many CMCS readers are likely to find any form of history view helpful, because there's no reasonable way to view reg history with current tools. But they'd prefer it to be as granular as possible.

Current understanding of tradeoffs between maintenance needs and user goals:

  • For the basic version of things, we want to prioritize automated relatively-low-maintenance regs content parsing, to enable eRegs to work reliably for core user needs even if it has a skeleton crew.
  • We can also plan for enhancements that would deliver even more value to readers and be higher-maintenance, but that's second.

Things we need to decide + options for them

What do we display for historical versions?

Status quo: GPO for annual data + parse Federal Register rules for details.

Alternative: Annual historical versions + more detailed recent info from eCFR.

This would mean displaying the official content published in 2016, 2017, 2018, and the diffs between those years.
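A year-over-year diff of this kind could be sketched with Python's standard `difflib`. This is only an illustration: the section text below is invented for the example (it is not real reg content), and `diff_years` is a hypothetical helper, not part of eRegs.

```python
import difflib

# Hypothetical text of one section as published in two annual editions.
# Invented for illustration; not actual CFR content.
annual_versions = {
    2016: "States must submit claims quarterly.\nClaims must be filed within two years.",
    2017: "States must submit claims quarterly.\nClaims must be filed within one year.",
}


def diff_years(versions, old_year, new_year):
    """Return a unified diff between two annual editions as a list of lines."""
    old_lines = versions[old_year].splitlines()
    new_lines = versions[new_year].splitlines()
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"{old_year} annual edition",
            tofile=f"{new_year} annual edition",
            lineterm="",
        )
    )


changes = diff_years(annual_versions, 2016, 2017)
print("\n".join(changes))
```

A diff like this shows what changed between two editions, but not the date within the year on which the change took effect.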

Can we figure out what date something was changed, so we can have the drop-down where you pick the specific date to display effective as of? We can say “here’s the final rule that was official in 2001, and here are the final rules that were published between 2000 and 2001”. We can’t say “this was the exact thing that was effective on x”.
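As a rough sketch of the "final rules published between two annual editions" idea, the rule metadata could be filtered by publication date. Everything here is an assumption for illustration: the rule records are made up, the edition dates are placeholders, and `rules_between` is a hypothetical helper.

```python
from datetime import date

# Hypothetical rule metadata; real records would come from the GPO's
# metadata for rules (publication dates, names, etc.).
rules = [
    {"name": "Example final rule A", "published": date(2000, 3, 15)},
    {"name": "Example final rule B", "published": date(2000, 11, 20)},
    {"name": "Example final rule C", "published": date(2001, 6, 5)},
    {"name": "Example final rule D", "published": date(2001, 12, 1)},
]


def rules_between(rule_list, start, end):
    """Return rules published after `start` and on or before `end`, oldest first."""
    selected = [r for r in rule_list if start < r["published"] <= end]
    return sorted(selected, key=lambda r: r["published"])


# Assumed dates of two consecutive annual editions (placeholders).
edition_2000 = date(2000, 10, 1)
edition_2001 = date(2001, 10, 1)

between = rules_between(rules, edition_2000, edition_2001)
for rule in between:
    print(rule["published"].isoformat(), rule["name"])
```

This lists the rules that fall between two editions without claiming which text was effective on any specific day in that window.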

It’s probably the right tradeoff to show the accurate annual changes even though they’re less specific — they’re more accurate and easier as a first iteration (not trying to do the granular diffs).

We can add in the available monthly-ish versions from eCFR for more recent versions.

What source(s) of data do we use for current/latest content?

Status quo: GPO for annual data + parse Federal Register rules for changes.

We could tell our users: "eRegs pulls annual reg data from GPO, and then it parses changes from Federal Register rules for daily updates."

We believe our users would not feel confident that the regs they're using for their research are the correct regs. (For good reason, because it can have errors.)

Alternative: Use official annual versions from GPO.

The current official published version is 2019. 2020 will come out in a month. That’s the only official version. But this would not be acceptable to our users: they need the latest effective version, even if the GPO doesn't consider it to be official CFR content.

Alternative: Match eCFR’s recency and accuracy, by using eCFR data.

Our users already use and trust eCFR as a source of truth for their research.

We can pull daily current content from eCFR and we expect it would work reliably. For example, today's content from eCFR for part 433: https://ecfr.federalregister.gov/api/versioner/v1/full/2020-05-02/title-42.xml?part=433

We expect this would continue to be the same format in the future, even if/when the GPO replaces classic eCFR with beta eCFR.
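The example URL above follows a date / title / part pattern, which could be built like this. This is a minimal sketch that only constructs the URL (it does not fetch anything); the pattern is inferred from that single example, and `ecfr_part_url` is a hypothetical helper name.

```python
from datetime import date
from urllib.parse import urlencode

# Base path taken from the example URL in this page.
ECFR_BASE = "https://ecfr.federalregister.gov/api/versioner/v1/full"


def ecfr_part_url(on_date, title, part):
    """Build the eCFR XML URL for one part of a title as of a given date.

    Assumes the URL pattern seen in the example: /<date>/title-<n>.xml?part=<n>
    """
    query = urlencode({"part": part})
    return f"{ECFR_BASE}/{on_date.isoformat()}/title-{title}.xml?{query}"


url = ecfr_part_url(date(2020, 5, 2), 42, 433)
print(url)
```

A daily job could call a helper like this with the current date for each part it tracks, then fetch and parse the returned XML.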

Outcome (decision)

No interpreting rules from Federal Register. Source info for current: eCFR dailies. Source for historical: use as much detailed historical data as eCFR gives us, then fill in the past as much as we can with the annual data from GPO.
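One way to picture this outcome is as a merge of two lists of version dates: use every version eCFR provides, and use GPO annual editions only for the period before eCFR coverage begins. This is a sketch under that reading of the decision; the dates are placeholders and `build_timeline` is a hypothetical helper.

```python
from datetime import date


def build_timeline(ecfr_dates, gpo_annual_dates):
    """Merge version dates into one timeline of (date, source) pairs.

    eCFR versions are used wherever they exist; GPO annual editions
    only fill in dates earlier than the earliest eCFR version.
    """
    earliest_ecfr = min(ecfr_dates) if ecfr_dates else None
    timeline = [(d, "eCFR") for d in ecfr_dates]
    for d in gpo_annual_dates:
        if earliest_ecfr is None or d < earliest_ecfr:
            timeline.append((d, "GPO annual"))
    return sorted(timeline)


# Placeholder dates for illustration only.
ecfr_dates = [date(2017, 1, 1), date(2017, 2, 1)]
gpo_annual_dates = [date(2015, 10, 1), date(2016, 10, 1), date(2017, 10, 1)]

timeline = build_timeline(ecfr_dates, gpo_annual_dates)
for version_date, source in timeline:
    print(version_date.isoformat(), source)
```

In this sketch the 2017 GPO annual edition is left out because eCFR already covers that period in more detail.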

Consequences

We need to rewrite the parser.

We need to implement the historical timeline in a somewhat different way than the prototypes we tested.

We need to figure out how to add explanations that help readers interpret the historical timeline.
