
Tasks: Updating Census data

James McKinney edited this page Nov 3, 2017 · 7 revisions

Every five years, we must update the Census data on which Represent relies. The most recent update was for the 2016 Census.

First, manually compare the tables for Census division and subdivision types across Census years. Note any new types to be integrated. Then, update the software and spreadsheets as described below.

Software

The following repositories may require updates, in this order:

ocd-division-ids

  • utils.rb: Update census_division_type_names and census_subdivision_type_names
  • ca_census_divisions.rb: Update names
  • ca_census_subdivisions.rb: Update names, name
  • ca_municipal_subdivisions.rb: Update names, posts_count, has_children, type_map, census_subdivisions_on, census_subdivisions_sk
  • ca_provinces_and_territories.rb: Update rows
  • ca_regions.rb: Update rows
  • classes.rb: Update normalize, @name_mappings, @type_patterns

Also, in scripts/country-ca/, grep for \b\d{7}\b|ocd-division/country:ca to find hard-coded constants, such as seven-digit Census subdivision codes and OCD division identifiers.
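Several patterns on this page contain [^\n...] character classes. Those classes only exclude line breaks if the pattern runs over whole file contents rather than one line at a time, as in an editor's multi-line find-in-files. Below is a minimal Python sketch of such a search. find_constants is a hypothetical helper, not part of any of these repositories.

```python
import re
from pathlib import Path


def find_constants(root, pattern):
    r"""Yield (path, line_number, line) for each match of pattern under root.

    Hypothetical helper: the regex is applied to each file's whole contents,
    so classes like [^\n] can see line breaks, as in a multi-line search.
    """
    regex = re.compile(pattern)
    root = Path(root)
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside hidden folders such as .git.
        if not path.is_file() or any(
            part.startswith(".") for part in path.relative_to(root).parts
        ):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        lines = text.split("\n")
        for match in regex.finditer(text):
            # Counting through match.start() + 1 reports the next line when a
            # match begins with a newline consumed by a class such as [^/].
            line_number = text.count("\n", 0, match.start() + 1) + 1
            yield path, line_number, lines[line_number - 1]
```

For example, iterating over find_constants("scripts/country-ca", r"\b\d{7}\b|ocd-division/country:ca") and printing each path, line number and line gives grep-style output. For a quick line-by-line search, GNU grep's Perl-compatible mode also works: grep -rnP '\b\d{7}\b|ocd-division/country:ca' scripts/country-ca/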

Regenerate the identifiers.

represent-canada

If a province or territory is added or removed, update key_map in finder/static/js/data.js.

Also, grep for ocd-division/country:ca to find constants.

represent-canada-data

Update country-ca.csv:

curl -O https://raw.githubusercontent.com/opencivicdata/ocd-division-ids/master/identifiers/country-ca.csv
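Before committing the downloaded file, it can help to list which division identifiers were added or removed, since those are the divisions that need attention in the steps below. This is a minimal Python sketch that assumes country-ca.csv has a header row with an id column (check the downloaded file). read_ids and diff_ids are hypothetical helpers.

```python
import csv
import io


def read_ids(csv_text):
    """Return the set of identifiers in the "id" column (assumed header)."""
    return {row["id"] for row in csv.DictReader(io.StringIO(csv_text))}


def diff_ids(old_text, new_text):
    """Return (added, removed) identifiers, each as a sorted list."""
    old_ids = read_ids(old_text)
    new_ids = read_ids(new_text)
    return sorted(new_ids - old_ids), sorted(old_ids - new_ids)
```

The old contents can come from git show HEAD:country-ca.csv, and the new contents from the file curl just downloaded.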
  • tasks.py: In the spreadsheet task, update the StatCan URLs and the for-loops that follow
  • constants.py: Update municipal_subdivisions
  • boundaries/ca_cd/definition.py: Update URLs and run invoke shapefiles --base=boundaries/ca_cd
  • boundaries/ca_csd/definition.py: Update URLs and run invoke shapefiles --base=boundaries/ca_csd

Also, grep for (?<!'division_id': )'ocd-division/country:ca|[^\n,:-]\b\d{7}\b[^&.<]|[^\n',/]ocd-division/country:ca|[^/]cs?d: to find constants. The pattern skips matches in the division_id keys of definition files, in the manifest, in country-ca.csv, in file paths and in data files.
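The exclusions in this pattern are easier to see on sample lines. This Python sketch compiles the pattern verbatim, split at each alternative for readability, and assumes (as the [^\n...] classes suggest) that it is applied to whole file contents. The sample strings are made up.

```python
import re

# The represent-canada-data pattern above, split at each "|" alternative.
PATTERN = re.compile(
    # Quoted ID, unless it is the value of a 'division_id' key.
    r"(?<!'division_id': )'ocd-division/country:ca"
    # 7-digit code not preceded by newline, comma, colon or hyphen,
    # and not followed by &, . or <.
    r"|[^\n,:-]\b\d{7}\b[^&.<]"
    # ID not preceded by a newline, quote, comma or slash.
    r"|[^\n',/]ocd-division/country:ca"
    # cd: or csd: not preceded by a slash.
    r"|[^/]cs?d:"
)


def has_constant(text):
    """Return True if the pattern finds a candidate constant in text."""
    return PATTERN.search(text) is not None


# Made-up sample lines mapped to whether the pattern reports them.
SAMPLES = {
    "    'division_id': 'ocd-division/country:ca/csd:1234567',\n": False,
    "id,name\nocd-division/country:ca/csd:1234567,Example\n": False,
    "path = 'data/1234567.zip'\n": False,
    "url = 'https://example.com/?code=1234567&lang=eng'\n": False,
    "CODE = 1234567\n": True,
    "BASE = 'ocd-division/country:ca/csd:1234567'\n": True,
}

if __name__ == "__main__":
    for text, expected in SAMPLES.items():
        print(repr(text), has_constant(text), expected)
```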

Run:

  • ruby boundaries/ca_qc_districts/sets.rb and its following steps
  • invoke definitions
  • ../represent-canada/manage.py analyzeshapefiles -d . > manifest
  • invoke spreadsheet --base=. --private-base=../represent-canada-private-data

Note: The geographic codes in the following files are validated by the definitions task:

  • boundaries/ca_nb_wards/definition.py
  • boundaries/ca_ns_districts/definition.py
  • boundaries/ca_on_waterloo_wards/definition.py
  • boundaries/ca_qc_districts/definition.py

represent-canada-private-data

From represent-canada-data, run invoke definitions --base=../represent-canada-private-data

Also, grep for (?<!'division_id': )'ocd-division/country:ca|[^\n,:-]\b\d{7}\b[^&.<]|[^\n',/]ocd-division/country:ca|[^/]\bcs?d: to find constants.
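This pattern is identical to the represent-canada-data pattern except for its last alternative, [^/]\bcs?d:. The added word boundary means that a cd: or csd: ending a longer word (such as a hypothetical abcd: key) is no longer reported. A small Python check of just that alternative, on made-up strings:

```python
import re

DATA_ALT = re.compile(r"[^/]cs?d:")  # last alternative, represent-canada-data
PRIVATE_ALT = re.compile(r"[^/]\bcs?d:")  # last alternative, private data


def matches(text):
    """Return (data_pattern_matches, private_pattern_matches) for text."""
    return bool(DATA_ALT.search(text)), bool(PRIVATE_ALT.search(text))


if __name__ == "__main__":
    for text in ["type = 'csd:'", "abcd: 1", "ocd-division/country:ca/csd:1234567"]:
        print(repr(text), matches(text))
```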

Note: The geographic codes in the following files are validated by the definitions task:

  • boundaries/ca_sk_divisions/definition.py

scrapers-ca

Update country-ca.csv:

curl -O https://raw.githubusercontent.com/opencivicdata/ocd-division-ids/master/identifiers/country-ca.csv
  • tasks.py: In get_definition, update the StatCan URLs and the for-loops that follow

Also, grep for (?<!division_id = )'ocd-division/country:ca|[^:]\b\d{7}\b|[^\n',]ocd-division/country:ca to find constants except in division_id variables of __init__.py files.
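In scrapers-ca, the lookbehind targets Python assignments to division_id instead of dictionary keys, and the 7-digit alternative no longer needs a trailing character. A Python sketch with made-up sample lines (DIVISION and CODE are hypothetical names), assuming the pattern is applied to whole file contents:

```python
import re

# The scrapers-ca pattern above, split at each "|" alternative.
PATTERN = re.compile(
    # Quoted ID, unless it is assigned with "division_id = ".
    r"(?<!division_id = )'ocd-division/country:ca"
    # 7-digit code not preceded by a colon.
    r"|[^:]\b\d{7}\b"
    # ID preceded by a character other than a newline, quote or comma.
    r"|[^\n',]ocd-division/country:ca"
)

# Made-up sample lines mapped to whether the pattern reports them.
SAMPLES = {
    "    division_id = 'ocd-division/country:ca/csd:1234567'\n": False,
    "DIVISION = 'ocd-division/country:ca/csd:1234567'\n": True,
    "CODE = 1234567\n": True,
}

if __name__ == "__main__":
    for text, expected in SAMPLES.items():
        print(repr(text), PATTERN.search(text) is not None, expected)
```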

Run invoke tidy

scrapers_ca_app

  • reports/management/commands/status.py: Update the StatCan URLs and the for-loops that follow

Also, in reports/, grep for [^:]\b\d{7}\b|ocd-division/country:ca to find constants.

Run heroku run pupa dbinit ca

Spreadsheets

The following spreadsheets store Census codes, names and populations:

The Boundaries data request progress spreadsheet is validated by the spreadsheet task in represent-canada-data. To validate and update the others, run the following from scrapers-ca:

invoke validate_spreadsheet 'https://docs.google.com/spreadsheets/d/1AmLQD2KwSpz3B4eStLUPmUQJmOOjRLI3ZUZSD5xUTWM/pub?gid=0&single=true&output=csv' Code Name
invoke validate_spreadsheet 'https://docs.google.com/spreadsheets/d/1AmLQD2KwSpz3B4eStLUPmUQJmOOjRLI3ZUZSD5xUTWM/pub?gid=743638453&single=true&output=csv' Code Name
invoke validate_spreadsheet 'https://docs.google.com/spreadsheets/d/11qUKd5bHeG5KIzXYERtVgs3hKcd9yuZlt-tCTLBFRpI/pub?gid=0&single=true&output=csv' Identifier Name
invoke validate_spreadsheet 'https://docs.google.com/spreadsheets/d/11qUKd5bHeG5KIzXYERtVgs3hKcd9yuZlt-tCTLBFRpI/pub?gid=1&single=true&output=csv' Identifier Name
invoke validate_spreadsheet 'https://docs.google.com/spreadsheets/d/11qUKd5bHeG5KIzXYERtVgs3hKcd9yuZlt-tCTLBFRpI/pub?gid=2&single=true&output=csv' Identifier Name
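These five commands differ only in the spreadsheet key, the sheet gid and the column names. If a sheet is added or a gid changes, the commands can be regenerated with a sketch like the following. validate_commands and csv_url are hypothetical helpers; shlex.join adds the single quotes the shell needs around URLs containing & and ?.

```python
import shlex

# Spreadsheet keys, sheet gids and column names copied from the commands above.
SHEETS = [
    ("1AmLQD2KwSpz3B4eStLUPmUQJmOOjRLI3ZUZSD5xUTWM", "0", "Code", "Name"),
    ("1AmLQD2KwSpz3B4eStLUPmUQJmOOjRLI3ZUZSD5xUTWM", "743638453", "Code", "Name"),
    ("11qUKd5bHeG5KIzXYERtVgs3hKcd9yuZlt-tCTLBFRpI", "0", "Identifier", "Name"),
    ("11qUKd5bHeG5KIzXYERtVgs3hKcd9yuZlt-tCTLBFRpI", "1", "Identifier", "Name"),
    ("11qUKd5bHeG5KIzXYERtVgs3hKcd9yuZlt-tCTLBFRpI", "2", "Identifier", "Name"),
]


def csv_url(key, gid):
    """Build the published-CSV URL for one sheet, in the format used above."""
    return (
        f"https://docs.google.com/spreadsheets/d/{key}/pub"
        f"?gid={gid}&single=true&output=csv"
    )


def validate_commands():
    """Return one shell-quoted validate_spreadsheet command per sheet."""
    return [
        shlex.join(["invoke", "validate_spreadsheet", csv_url(key, gid), code, name])
        for key, gid, code, name in SHEETS
    ]
```

Each command can then be run from scrapers-ca with subprocess.run(shlex.split(command), check=True), which passes the URL as a single argument without needing a shell.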

You must also update the populations in the Data catalog contact information and Boundaries data request progress spreadsheets.