Created: | 2014-06-06 |
---|---|
Author: | James Turk |
Status: | Accepted |
Definition and procedures around defining Open Civic Data Divisions and their identifiers.
- Division
A political geography such as a state, county, or congressional district, which may have multiple boundaries over its lifetime. Types of divisions include, among others:
- Governmental jurisdiction - A division that a government has jurisdiction over. (e.g. North Carolina)
- Political district - A division that elects a representative to a legislature. (e.g. North Carolina Congressional District 4)
- Service zone - An area to which a government provides a service. (e.g. Washington DC Police District 105)
- Boundary
- A geographical boundary, defined by a shapefile or a sequence of address ranges. (e.g. NC Congressional District 10 for the 113th Congress)
This document describes an identifier scheme for assigning globally unique identifiers to divisions. It does not intend to describe any scheme for boundaries.
Divisions can be seen as the smallest building block in the Open Civic Data ecosystem: Jurisdictions and Organizations exist within a Division, and People are elected to represent a Division. As such, providing unique identifiers enables collaboration across groups dealing with any of these types.
This proposal predates the formal proposal process by a full year. The identifiers were originally part of the ocd-division-ids repository and are already in use by Sunlight, Google, Granicus, Open North, Open Elections, and several other projects. (This document simply formalizes what was already decided.)
Identifiers are in the format ocd-division/country:<country_code>(/<type>:<type_id>)*
- country_code
- An ISO-3166-1 alpha-2 code.
- type
The type of boundary. (e.g. country, state, town, city, cd, sldl, sldu)
- Valid characters are lowercase UTF-8 letters, hyphen (-), and underscore (_).
- Use existing types where possible.
- type_id
An identifier that is locally unique to its scope.
- Valid characters are lowercase UTF-8 letters, numerals (0-9), period (.), hyphen (-), underscore (_), and tilde (~). These characters match the unreserved characters in a URI (RFC 3986, section 2.3).
- Characters must be converted to UTF-8.
- Uppercase characters must be converted to lowercase.
- Spaces must be converted to underscores.
- All invalid characters must be converted to tildes (~).
- Leading zeros should be removed unless doing so changes the meaning of the identifier.
- If possible, all divisions of the same type should be defined at the same time; for example, all state divisions should be defined at once. Similarly, all cities in North Carolina should be defined at once, to avoid adopting a scheme that produces collisions.
- When selecting a type_id, preference should be given to existing, common identifiers, like postal abbreviations for US states. Numeric identifiers (such as US county FIPS codes) should be avoided if textual names are clear and unambiguous; however, numeric identifiers may be appended to disambiguate a type_id.
- The set of types within each country should not grow unnecessarily. Each country maintainer should publish a list of types for easy reference. The addition of a new type must be justified.
- For example: In the US, there are no clear-cut differences between cities, towns, villages, etc. Therefore, the Census-recommended term place is used as the type of cities, etc.
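The type_id rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `normalize_type_id` and `build_division_id` are hypothetical, "lowercase UTF-8 letters" is interpreted here as any Unicode letter after lowercasing, and leading zeros are stripped only from purely numeric values because the rule leaves the "changes the meaning" judgment to the person assigning the identifier.

```python
import re


def normalize_type_id(raw: str, strip_leading_zeros: bool = True) -> str:
    """Apply the type_id conversion rules to a raw name or number."""
    # Uppercase characters become lowercase; spaces become underscores.
    text = raw.strip().lower().replace(" ", "_")
    # Every character outside the valid set becomes a tilde.
    # Assumption: any Unicode letter counts as a valid letter.
    chars = [ch if (ch in "0123456789.-_~" or ch.isalpha()) else "~" for ch in text]
    result = "".join(chars)
    # Assumption: only purely numeric values lose their leading zeros;
    # pass strip_leading_zeros=False when the zeros carry meaning.
    if strip_leading_zeros and re.fullmatch(r"[0-9]+", result):
        result = result.lstrip("0") or "0"
    return result


def build_division_id(country_code: str, *parts: tuple) -> str:
    """Join (type, raw_id) pairs into an identifier string.

    The type values are passed through unchanged and are assumed to be
    valid, already-published types for the country.
    """
    segments = [f"ocd-division/country:{country_code.lower()}"]
    for type_, raw_id in parts:
        segments.append(f"{type_}:{normalize_type_id(raw_id)}")
    return "/".join(segments)
```

For instance, `build_division_id("US", ("state", "NC"), ("place", "Cary"))` produces `ocd-division/country:us/state:nc/place:cary`.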
The identifiers directory contains CSV files assigning all OCD identifiers:
- A single CSV file per country, in the format country-<country_code>.csv.
- The URLs of these files are stable.
- An optional directory per country, in the format country-<country_code>:
- A file hierarchy, in which CSV files describe parts of the top-level country CSV file.
- The URLs of these files are not stable.
The corrections directory contains CSV files that map incorrect OCD identifiers to correct OCD identifiers. Common errors include missing diacritics, differences in hyphenation and word order, use of Roman numerals, etc.
If a CSV file has no header row, the CSV is assumed to have two columns with the headers id and name.
If a CSV file has a header row, the first column name must be id.
Column names with special meaning are:
- name
- The name of the division.
- sameAs
- An OCD identifier which identifies the same division as this identifier. The row corresponding to the identifier in this column must have a blank value in its sameAs column, i.e. there must be no daisy-chaining or circular references.
- sameAsNote
- A note describing how or why the division has multiple identifiers.
- validThrough
- The date on which the division is no longer valid, in the format YYYY, YYYY-MM or YYYY-MM-DD. A division may become invalid if, for example, a political district is abolished.
- validFrom
- The date on which a division becomes valid, in the format YYYY, YYYY-MM or YYYY-MM-DD. A division may become valid if, for example, a political district is created.
- There are no restrictions on other columns.
- An effort should be made to use descriptive CSV filenames.
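The header rules and date formats above could be handled roughly as shown below. This is a sketch under stated assumptions: the function names are hypothetical, and a file is treated as headerless when the first cell of its first row already looks like an identifier (starts with `ocd-division/`), which is one possible heuristic rather than something this document prescribes.

```python
import csv
import io
import re

# YYYY, YYYY-MM or YYYY-MM-DD, with basic month and day range checks.
_PARTIAL_DATE = re.compile(
    r"[0-9]{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12][0-9]|3[01]))?)?"
)


def is_partial_date(value: str) -> bool:
    """Check a validFrom/validThrough value against the allowed formats."""
    return _PARTIAL_DATE.fullmatch(value) is not None


def read_division_rows(handle) -> list:
    """Read an identifier CSV into a list of dicts keyed by column name."""
    reader = csv.reader(handle)
    first = next(reader, None)
    if not first:
        return []
    if first[0].startswith("ocd-division/"):
        # No header row: assume the two default columns.
        header, rows = ["id", "name"], [first]
    else:
        if first[0] != "id":
            raise ValueError("the first column of a header row must be 'id'")
        header, rows = first, []
    rows.extend(reader)
    return [dict(zip(header, row)) for row in rows]
```

A headerless file such as `io.StringIO("ocd-division/country:us,United States\n")` yields one row with the keys `id` and `name`.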
A correction CSV file must contain:
- incorrectId
- An incorrect OCD identifier, i.e. an OCD identifier that was never valid.
- id
- The corrected OCD identifier.
- note
- Free-text describing the error, e.g. "missing diacritics".
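A consumer might apply a corrections file along these lines. The sketch assumes the file has a header row naming the three columns above; the function names and the sample identifiers are made up for illustration and are not taken from the actual corrections directory.

```python
import csv
import io


def load_corrections(handle) -> dict:
    """Map each incorrectId to its corrected id.

    Assumption: the file has a header row with incorrectId, id and note.
    """
    return {row["incorrectId"]: row["id"] for row in csv.DictReader(handle)}


def correct_id(division_id: str, corrections: dict) -> str:
    """Return the corrected identifier, or the input if no correction applies."""
    return corrections.get(division_id, division_id)


# Hypothetical sample data: a hyphenation difference.
SAMPLE = (
    "incorrectId,id,note\n"
    "ocd-division/country:us/state:nc/place:winston_salem,"
    "ocd-division/country:us/state:nc/place:winston-salem,"
    "difference in hyphenation\n"
)

corrections = load_corrections(io.StringIO(SAMPLE))
```

Identifiers that do not appear in the incorrectId column pass through `correct_id` unchanged.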
- All OCD identifiers are first-class. However, if it is necessary for a system to choose a "primary" or "preferred" identifier for a division, it should use those identifiers with an empty sameAs column.
- The sameAs relationship is symmetric and transitive. The sameAs relationship is not true for all time; it is only true in the present.
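One way a system could pick a preferred identifier and test whether two identifiers name the same division is sketched below. The function names are hypothetical, and rows are assumed to be dicts with an `id` key and an optional `sameAs` key. Because the target of a sameAs value must itself have a blank sameAs, a single lookup is enough to reach the preferred identifier, and two aliases of the same target compare as equal, which reflects the symmetric and transitive reading.

```python
def build_preferred_index(rows: list) -> dict:
    """Map every identifier to its preferred identifier.

    Raises ValueError if a sameAs target has its own sameAs value
    (daisy-chaining), which the column rules forbid.
    """
    same_as = {row["id"]: (row.get("sameAs") or "") for row in rows}
    index = {}
    for division_id, target in same_as.items():
        if target and same_as.get(target):
            raise ValueError(
                f"{division_id} points at {target}, which has its own sameAs"
            )
        # An empty sameAs means the identifier is itself preferred.
        index[division_id] = target or division_id
    return index


def are_same_division(a: str, b: str, index: dict) -> bool:
    """Two identifiers match if they share a preferred identifier."""
    return index.get(a, a) == index.get(b, b)
```

The index reflects only the current state of the CSV, in line with the note that sameAs is only true in the present.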
This project has an informal governance structure, led by the project's early contributors and informed by the Open Civic Data Google Group. Responsibility for a country's identifiers may be assigned to country-specific organizations.
- United States
- ocd-division/country:us
- North Carolina
- ocd-division/country:us/state:nc
- North Carolina 2nd Congressional District
- ocd-division/country:us/state:nc/cd:2
- North Carolina State Lower Legislative District 1
- ocd-division/country:us/state:nc/sldl:1
- Wake County, North Carolina
- ocd-division/country:us/state:nc/county:wake
- Cary, North Carolina (note that Wake County is not included, even though Cary is within it, because the county is not an identifying feature)
- ocd-division/country:us/state:nc/place:cary
- Kildaire Farms Homeowners Association, Cary, North Carolina
- ocd-division/country:us/state:nc/place:cary/hoa:kildaire_farms
- Washington DC, Ward 8
- ocd-division/country:us/district:dc/ward:8
- Washington DC, ANC 4A
- ocd-division/country:us/district:dc/anc:4a
- Washington DC, ANC 4A, section 08 (note: this is a strict subset of the ANC for purposes of representation)
- ocd-division/country:us/district:dc/anc:4a/section:8
- New York City, City Council District 36 (happens to be in Brooklyn, but that is not significant enough to include in the id)
- ocd-division/country:us/state:ny/place:new_york/council_district:36
- Canadian Federal Electoral District 13004, aka Fundy Royal (known as Royal from 1914-1966, Fundy-Royal from 1966-2003, and Fundy from 2003-2004, hence the use of a numeric identifier assigned by the government)
- ocd-division/country:ca/ed:13004
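The examples above can be checked against the identifier format with a short parser. This is an approximate sketch: `parse_division_id` is a hypothetical name, the country code is assumed to be written in lowercase as in the examples, and the `[^\W\d_]` character class is used as a stand-in for "letter", so it may accept a few characters that a stricter reading of the rules would reject.

```python
import re

_LETTER = r"[^\W\d_]"  # approximates "any Unicode letter"
_TYPE = rf"(?:{_LETTER}|[-_])+"
_TYPE_ID = rf"(?:{_LETTER}|[0-9._~-])+"
_PATTERN = re.compile(
    rf"ocd-division/country:([a-z]{{2}})((?:/{_TYPE}:{_TYPE_ID})*)"
)


def parse_division_id(division_id: str) -> list:
    """Split an identifier into (type, type_id) pairs, country first.

    Raises ValueError if the string does not fit the format or
    contains uppercase characters.
    """
    match = _PATTERN.fullmatch(division_id)
    if match is None or division_id != division_id.lower():
        raise ValueError(f"not a valid division identifier: {division_id!r}")
    pairs = [("country", match.group(1))]
    # group(2) is "" or "/type:id/type:id...", so drop the empty first piece.
    for segment in match.group(2).split("/")[1:]:
        type_, _, type_id = segment.partition(":")
        pairs.append((type_, type_id))
    return pairs
```

For example, `parse_division_id("ocd-division/country:us/district:dc/anc:4a")` returns the pairs `("country", "us")`, `("district", "dc")` and `("anc", "4a")`.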
This document has been placed in the public domain per the Creative Commons CC0 1.0 Universal license (http://creativecommons.org/publicdomain/zero/1.0/deed).