Add country field to Address. Deprecate countryName. #1372

Merged
merged 7 commits into from
Aug 10, 2021

duncandewhurst
Contributor

@duncandewhurst duncandewhurst commented Aug 3, 2021

Closes #524

Once merged:

@duncandewhurst duncandewhurst added this to the 1.2.0 milestone Aug 3, 2021
@duncandewhurst
Contributor Author

@jpmckinney @yolile a couple of questions to resolve before this PR is ready for review:

  1. Currently, this PR deprecates Address.countryName based on the justification in Add Address.country field (with ISO3166-1 alpha-2 codelist) #524 (comment). On reflection, I have two concerns about this:
    1. I anticipate some publishers in lower-tech contexts will collect country as a free-text field, so we will have helpdesk requests about where to put this data in OCDS.
    2. ISO3166-1 is not freely/readily available, so we may be creating a barrier to data use by removing the human-readable country name from the standard.
  2. I've integrated the script from https://github.com/open-contracting-extensions/ocds_countryCode_extension/tree/master/script into manage.py. Is https://ocds-standard-development-handbook.readthedocs.io/en/latest/standard/technical/deployment.html#perform-periodic-updates the right place to put the instructions for downloading ISO3166 in machine-readable form?

@yolile
Member

yolile commented Aug 3, 2021

> I anticipate some publishers in lower-tech contexts will collect country as a free-text field, so we will have helpdesk requests about where to put this data in OCDS.

Hmm, true, but maybe, before publishing OCDS data (e.g. as part of the OCDS transformation process), they can map the names to their ISO codes?

> ISO3166-1 is not freely/readily available, so we may be creating a barrier to data use by removing the human-readable country name from the standard.

Also true, but as I understand it, we will publish the CSV with the codes and names, right? I guess it also depends on the use case. For example, if the use case is to show the data in a website or mobile app, the developers can do the mapping and show the human-readable name using our CSV as the base and, as mentioned in this #524 (comment), having the codes is better for that case. For another use case, let's say you want to analyze suppliers by country or similar: it is easier to do that if the names/codes are already standardized, and I don't think it matters much whether the value is a human-readable name or a code (the same applies if you are planning to use the data for an AI project, etc.).
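
As a rough illustration of the analysis use case, here is a minimal Python sketch that counts suppliers by country code and then swaps in human-readable names from a codes-and-names CSV. The `Code` and `Title` column names, the sample rows and the `suppliers` list are assumptions made for the example, not taken from this PR.

```python
import csv
import io
from collections import Counter

# Stand-in for the published codes-and-names CSV.
# Column names and rows are assumptions for illustration only.
CODELIST_CSV = """Code,Title
GB,United Kingdom
PY,Paraguay
"""


def load_titles(text):
    """Build a code -> human-readable name lookup from CSV text."""
    return {row["Code"]: row["Title"] for row in csv.DictReader(io.StringIO(text))}


titles = load_titles(CODELIST_CSV)

# Hypothetical supplier addresses that use the code field.
suppliers = [{"country": "PY"}, {"country": "GB"}, {"country": "PY"}]

# Aggregate on the standardized code, then label with the name for display.
counts = Counter(s["country"] for s in suppliers)
report = {titles.get(code, code): n for code, n in counts.items()}
print(report)
```

The aggregation works on codes alone; the lookup is only needed at the point where names are shown to a person.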

> I've integrated the script from https://github.com/open-contracting-extensions/ocds_countryCode_extension/tree/master/script into manage.py. Is https://ocds-standard-development-handbook.readthedocs.io/en/latest/standard/technical/deployment.html#perform-periodic-updates the right place to put the instructions for downloading ISO3166 in machine-readable form?

Yes, I think that is a good place to put it.

@duncandewhurst
Contributor Author

> > I anticipate some publishers in lower-tech contexts will collect country as a free-text field, so we will have helpdesk requests about where to put this data in OCDS.
>
> Hmm, true, but maybe, before publishing OCDS data (e.g. as part of the OCDS transformation process), they can map the names to their ISO codes?

Ideally, yes. But, if the values are free-text (i.e. users type the country names rather than choosing them from a list) the mapping would only cover historic data - users might enter new acronyms, abbreviations or misspellings after OCDS is implemented.
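
To make the limitation concrete, here is a minimal Python sketch of such a mapping step. The `NAME_TO_CODE` table and the function names are hypothetical; the point is only that a lookup built from historic values leaves new misspellings unmapped.

```python
import unicodedata

# Hypothetical lookup built from historic free-text values; not part of OCDS.
NAME_TO_CODE = {
    "united kingdom": "GB",
    "uk": "GB",
    "paraguay": "PY",
}


def normalize(name):
    """Lowercase the text and strip accents and surrounding whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.strip().lower()


def map_countries(values):
    """Return (mapped, unmapped): codes for known names, raw text for the rest."""
    mapped, unmapped = {}, []
    for value in values:
        code = NAME_TO_CODE.get(normalize(value))
        if code is None:
            unmapped.append(value)
        else:
            mapped[value] = code
    return mapped, unmapped


# "Untied Kingdom" is a new misspelling that the historic table does not cover.
mapped, unmapped = map_countries(["UK", " Paraguay ", "Untied Kingdom"])
print(mapped, unmapped)
```

Anything that lands in `unmapped` still needs somewhere to go in the published data, which is the helpdesk question raised above.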

> > ISO3166-1 is not freely/readily available, so we may be creating a barrier to data use by removing the human-readable country name from the standard.
>
> Also true, but as I understand it, we will publish the CSV with the codes and names, right? I guess it also depends on the use case. For example, if the use case is to show the data in a website or mobile app, the developers can do the mapping and show the human-readable name using our CSV as the base and, as mentioned in this #524 (comment), having the codes is better for that case. For another use case, let's say you want to analyze suppliers by country or similar: it is easier to do that if the names/codes are already standardized, and I don't think it matters much whether the value is a human-readable name or a code (the same applies if you are planning to use the data for an AI project, etc.).

Good points, I forgot we were publishing the names ourselves. I agree that, depending on the application, codes might be better; I wasn't suggesting that we not add the code field. I also agree that it's not much of a barrier for more capable users to download the CSV and look up the human-readable names. However, from a spreadsheet user / low-capacity perspective, users are likely to want the names in the results of their analysis, and importing and joining the CSV might be a barrier.

> > I've integrated the script from https://github.com/open-contracting-extensions/ocds_countryCode_extension/tree/master/script into manage.py. Is https://ocds-standard-development-handbook.readthedocs.io/en/latest/standard/technical/deployment.html#perform-periodic-updates the right place to put the instructions for downloading ISO3166 in machine-readable form?
>
> Yes, I think that is a good place to put it.

Great. I'll do that once this PR is merged.

Member

@jpmckinney jpmckinney left a comment

Regarding whether to have two fields or one: This is similar to the discussion in #1295 about valueEntered. When data is known to have quality issues, it sometimes makes sense to segment the "good" data from the "possibly bad" data.

For fields like numerical values, this is important, because major use cases become more difficult if the quality is mixed.

For address data, if the quality is bad, you just don't know the address and can't geolocate it. So, no use cases are really harmed by putting non-ISO codes (like full country names) in the country field. Whether the bad data is segmented or not, your use case is frustrated.

That said, if a system has only non-standardized country names, then putting their bad data in the only available field will cause lots of structural errors. (But still, the deprecated field can instead be used, which yields a deprecation warning.)
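
The difference between the two outcomes can be sketched in a few lines of Python. This is not the actual OCDS validation tooling: `check_address`, the severity labels and the two-code `CODES` set are all assumptions made for illustration.

```python
# Sample subset of codes; a real check would load the full codelist.
CODES = {"GB", "PY"}


def check_address(address, codes=CODES):
    """Return a list of (severity, message) pairs for the country fields."""
    issues = []
    country = address.get("country")
    # A value outside the codelist in the code field is a structural error.
    if country is not None and country not in codes:
        issues.append(("error", f"country {country!r} is not in the codelist"))
    # The deprecated free-text field is still accepted, with a warning.
    if "countryName" in address:
        issues.append(("warning", "countryName is deprecated"))
    return issues


print(check_address({"country": "Paraguay"}))
print(check_address({"countryName": "Paraguay"}))
print(check_address({"country": "PY"}))
```

Under these assumptions, a full country name in `country` produces an error, the same text in `countryName` produces only a warning, and a listed code produces nothing.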

The choice really depends on the prevalence of systems that have free-text data entry for country names. We can perhaps deprecate it for now, and consult on whether the deprecation should stand. I added it to my note here: https://crm.open-contracting.org/issues/6532#note-14

duncandewhurst and others added 3 commits August 10, 2021 16:14
update: don't invoke update_country. update_country: use file argument and schemadir.
@duncandewhurst duncandewhurst marked this pull request as ready for review August 10, 2021 04:35
Member

@jpmckinney jpmckinney left a comment

I added a couple commits.

The instructions for update-country are now part of the --help message, so the handbook can just remind the reader to follow the instructions in ./manage.py update-country --help

@duncandewhurst
Contributor Author

PR for handbook update: open-contracting/standard-development-handbook#247

@duncandewhurst duncandewhurst merged commit 4915837 into 1.2-dev Aug 10, 2021
@duncandewhurst duncandewhurst deleted the 524-add-country-code branch August 10, 2021 21:10