
Add a language field, provide guidance on how to publish in a language other than English #348

Closed
duncandewhurst opened this issue Jan 16, 2022 · 10 comments · Fixed by #362
Labels: documentation (this issue relates to the documentation), schema (this issue relates to the schema)

@duncandewhurst
Contributor

OC4IDS lacks a language field to declare the language of free-text fields and doesn't provide any guidance on language or translation.

Unlike OCDS, in-file translation isn't supported (patternProperties were removed in #52).

Proposal

  1. Add a language field, using the same description and codelist from OCDS.

  2. Add guidance on language and translation to cover:

  • How to publish in your own language (i.e. set language, publish free-text fields in your own language, don't translate codes or field names)
  • How to publish translations/multiple languages (as above, but publish separate files for each language)
@duncandewhurst duncandewhurst added schema This issue relates to the schema documentation This issue relates to the documentation labels Jan 16, 2022
@duncandewhurst
Contributor Author

duncandewhurst commented Jan 16, 2022

Sharing some guidance I prepared for CoST Thailand:


You can publish the values of free-text fields (e.g. title, description and parties/name) in your own language.

In order for your data to be interoperable and compatible with OC4IDS tools and methodologies, you cannot:

  • Translate codes from OC4IDS's codelists, e.g. the type field uses values from the ProjectType codelist, like 'construction', which cannot be translated.
  • Translate field names (keys), e.g. you cannot translate title to ชื่อ.

The following JSON snippet is valid OC4IDS data. title can be published in Thai ("ตัวอย่างโครงการ" means "example project") because it is a free-text field:

{
  "id": "1",
  "title": "ตัวอย่างโครงการ",
  "type": "construction"
}

The following JSON snippet is not valid because "การก่อสร้าง" (Thai for "construction") is not a valid code from the ProjectType codelist:

{
  "id": "1",
  "title": "ตัวอย่างโครงการ",
  "type": "การก่อสร้าง"
}

The following JSON snippet is not valid because "ชื่อ" ("name") and "พิมพ์" ("print" or "type", in the sense of typing) are not valid field names in OC4IDS:

{
  "id": "1",
  "ชื่อ": "ตัวอย่างโครงการ",
  "พิมพ์": "construction"
}
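The rules behind the three snippets above can be checked mechanically. The following is a minimal hand-rolled sketch, not the OC4IDS schema or any official validator: ALLOWED_KEYS and PROJECT_TYPE_CODES are hypothetical, illustrative subsets (the real schema defines many more fields and the ProjectType codelist has more codes), and problems is a hypothetical helper name.

```python
# Illustrative subsets only (hypothetical): the real OC4IDS schema defines
# many more fields, and the ProjectType codelist has more codes.
ALLOWED_KEYS = {"id", "title", "type"}
PROJECT_TYPE_CODES = {"construction"}


def problems(project: dict) -> list:
    """Return the reasons why `project` uses translated keys or codes."""
    issues = []
    for key in project:
        # Field names (keys) must stay in English.
        if key not in ALLOWED_KEYS:
            issues.append(f"unknown field name: {key}")
    # Codes must come from the codelist, untranslated.
    if "type" in project and project["type"] not in PROJECT_TYPE_CODES:
        issues.append(f"invalid ProjectType code: {project['type']}")
    return issues


valid = {"id": "1", "title": "ตัวอย่างโครงการ", "type": "construction"}
translated_code = {"id": "1", "title": "ตัวอย่างโครงการ", "type": "การก่อสร้าง"}
translated_keys = {"id": "1", "ชื่อ": "ตัวอย่างโครงการ", "พิมพ์": "construction"}

for name, snippet in [
    ("valid", valid),
    ("translated_code", translated_code),
    ("translated_keys", translated_keys),
]:
    print(name, problems(snippet) or "OK")
```

Note that the Thai title passes in every case: only keys and codes are checked, because free-text values may be in any language.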

In order to ease access for non-English speakers, you can publish a spreadsheet or CSV file with field and code titles from an OC4IDS translation. Currently, OC4IDS is available in English and Spanish. If you would like to translate the schema into your own language, please contact the OC4IDS Helpdesk.

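The following CSV excerpt uses field and code titles from the Spanish translation of OC4IDS ("Identificador o Referencia", "Título del Proyecto" and "Tipo de proyecto" are the titles of id, title and type; "Construcción" is the title of the construction code):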

Identificador o Referencia,Título del Proyecto,Tipo de proyecto
1,proyecto de ejemplo,Construcción

You can use Flatten Tool to generate a spreadsheet or CSV file with translated field titles; for example, the following command converts the example OC4IDS JSON file to xlsx format using field titles from the Spanish schema:

flatten-tool flatten -s https://standard.open-contracting.org/infrastructure/0.9/es/_downloads/f53c05d8f3cfd5c65a3b33cdf80c5079/project-schema.json -f xlsx --use-titles --root-id=id --root-list-path=projects example.json
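If Flatten Tool doesn't fit a publisher's workflow, the Spanish CSV excerpt above can also be produced directly. A minimal sketch using Python's standard csv module, assuming hypothetical FIELD_TITLES_ES and PROJECT_TYPE_TITLES_ES mappings copied from the excerpt (a real script would read the titles from the translated schema and codelists); to_titled_csv is a hypothetical helper name.

```python
import csv
import io

# Hypothetical mappings copied from the excerpt above; in practice the titles
# would come from the Spanish translation of the schema and codelists.
FIELD_TITLES_ES = {
    "id": "Identificador o Referencia",
    "title": "Título del Proyecto",
    "type": "Tipo de proyecto",
}
PROJECT_TYPE_TITLES_ES = {"construction": "Construcción"}


def to_titled_csv(projects: list) -> str:
    """Write projects as CSV text with Spanish field titles and code titles."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(list(FIELD_TITLES_ES.values()))
    for project in projects:
        row = []
        for key in FIELD_TITLES_ES:
            value = project.get(key, "")
            if key == "type":
                # Show the code's title; the JSON data itself keeps the code.
                value = PROJECT_TYPE_TITLES_ES.get(value, value)
            row.append(value)
        writer.writerow(row)
    return buffer.getvalue()


print(to_titled_csv([{"id": "1", "title": "proyecto de ejemplo", "type": "construction"}]))
```

The titles only change how the spreadsheet reads; the JSON publication still uses the English field names and codes.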

@duncandewhurst
Contributor Author

CoST Thailand requested a list of which fields can and cannot be translated. I've annotated a copy of the flattened schema with that information. We could consider including this in the documentation and/or adding these columns to the output of OCDS Kit's mapping-sheet command.

@duncandewhurst duncandewhurst added this to the 0.9.4 milestone Jul 13, 2022
@duncandewhurst
Contributor Author

Adding to the 0.9.4 milestone since the schema change is simple; the guidance can be improved iteratively in the meantime.

> We could consider including this in the documentation and/or adding these columns to the output of OCDS Kit's mapping-sheet command.

Another option is to add a property to the schema, which could be done manually, or using a pre-commit script that applies the logic used in the spreadsheet linked in the previous comment.

@duncandewhurst
Contributor Author

@jpmckinney are you happy with the proposal in this issue and do you have an opinion on how best to indicate which fields can and cannot be translated?

@jpmckinney
Member

The proposal in the issue description looks good, and the following comment looks like a good draft of a guidance page.

As far as I know, any field that does not have a codelist and has a type of string is translatable. If that's the case, then no additional property needs to be added to the schema. That said, you can have a pre-commit script calculate a CSV from the schema, and display that CSV as a table in the docs using a Sphinx directive – if you think that'll help readers.

@duncandewhurst
Contributor Author

> As far as I know, any field that does not have a codelist and has a type of string is translatable. If that's the case, then no additional property needs to be added to the schema. That said, you can have a pre-commit script calculate a CSV from the schema, and display that CSV as a table in the docs using a Sphinx directive – if you think that'll help readers.

There are some exceptions to that rule, which can be included in a pre-commit script:

  • fields with a format of date-time
  • fields with a format of uri (the iri format was not introduced until JSON Schema draft 07)
  • id, contractingProcesses/id and contractingProcesses/summary/ocid - the identifier prefix must be in Latin script (e.g. oc4ids-bu3kcz); the remainder of the identifier can be translated.
  • forecasts/observations/measure and metrics/observations/measure - numeric measures cannot be translated; string measures can be translated.

I think it's easier for readers to check if a field is translatable by looking it up in a table than by applying a set of rules so I'll add the script as suggested.
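A rough sketch of such a pre-commit script, walking a JSON Schema and writing one CSV row per field. It applies the rule above (a string type and no codelist) plus the date-time and uri exceptions, and treats every identifier as untranslatable, following the later agreement in this thread. Everything here is an assumption rather than the OC4IDS repository's code: codelist fields are assumed to carry a "codelist" keyword as in OCDS-style schemas, "$ref"s are assumed to be local ("#/definitions/Name"), identifiers are assumed to be fields named id or ocid, and the function names and example schema are hypothetical.

```python
import csv
import io

# Assumptions for this sketch (not taken from the OC4IDS repository):
# - codelist fields carry a "codelist" keyword, as in OCDS-style schemas;
# - all "$ref"s are local, of the form "#/definitions/Name";
# - identifier fields are those whose last path segment is "id" or "ocid".
EXCLUDED_FORMATS = {"date-time", "uri"}
IDENTIFIER_NAMES = {"id", "ocid"}


def resolve(schema: dict, node: dict) -> dict:
    """Follow a local "$ref" to its definition, if the node has one."""
    ref = node.get("$ref", "")
    if ref.startswith("#/definitions/"):
        return schema["definitions"][ref.split("/")[-1]]
    return node


def types_of(node: dict) -> list:
    """Normalize "type" to a list, since it may be a string or a list."""
    value = node.get("type", [])
    return [value] if isinstance(value, str) else list(value)


def is_translatable(path: str, field: dict, items: dict) -> bool:
    # For arrays of scalars, judge the item type rather than "array".
    leaf = items if "array" in types_of(field) else field
    return (
        "string" in types_of(leaf)
        and "codelist" not in field
        and "codelist" not in leaf
        and leaf.get("format") not in EXCLUDED_FORMATS
        and path.split("/")[-1] not in IDENTIFIER_NAMES
    )


def walk(schema: dict, node: dict, prefix: str = ""):
    """Yield (path, translatable) for every leaf field under `node`."""
    for name, field in resolve(schema, node).get("properties", {}).items():
        path = prefix + name
        field = resolve(schema, field)
        items = resolve(schema, field.get("items", {}))
        if "properties" in field:
            yield from walk(schema, field, path + "/")
        elif "properties" in items:
            yield from walk(schema, items, path + "/")
        else:
            yield path, is_translatable(path, field, items)


def translatable_csv(schema: dict) -> str:
    """Render the walk as CSV text, e.g. for a table in the docs."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["path", "translatable"])
    for path, ok in walk(schema, schema):
        writer.writerow([path, "yes" if ok else "no"])
    return buffer.getvalue()


# A tiny hypothetical schema fragment, for illustration only.
example_schema = {
    "properties": {
        "id": {"type": "string"},
        "title": {"type": ["string", "null"]},
        "type": {"type": ["string", "null"], "codelist": "projectType.csv"},
        "updated": {"type": "string", "format": "date-time"},
        "parties": {"type": "array", "items": {"$ref": "#/definitions/Organization"}},
    },
    "definitions": {
        "Organization": {
            "properties": {"id": {"type": "string"}, "name": {"type": "string"}}
        }
    },
}

print(translatable_csv(example_schema))
```

The resulting file could be rendered in the docs with docutils' csv-table directive, whose :file: option reads an external CSV. Fields typed as both string and number (such as the measure fields above) come out as translatable here; whether a particular value is numeric can't be decided from the schema alone.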

@duncandewhurst duncandewhurst self-assigned this Aug 3, 2022
@jpmckinney
Member

jpmckinney commented Aug 3, 2022

Sounds good. Looks like it can be determined algorithmically at least.

Edit: I'm not sure that id fields should be translatable. It'll make it difficult to find the translation of an object.
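To illustrate the concern: if each language is published as a separate file, as proposed in the issue description, a consumer can match an object to its translation by id, which only works if the id is identical in both files. A minimal sketch; the package contents, the id values and the pair_by_id helper are hypothetical.

```python
# Hypothetical English and Thai publications of the same project, published
# as separate files (one per language); "หนึ่ง" is Thai for "one".
english = {"projects": [{"id": "oc4ids-bu3kcz-1", "title": "Example project"}]}
thai = {"projects": [{"id": "oc4ids-bu3kcz-1", "title": "ตัวอย่างโครงการ"}]}
thai_translated_id = {
    "projects": [{"id": "oc4ids-bu3kcz-หนึ่ง", "title": "ตัวอย่างโครงการ"}]
}


def pair_by_id(first: dict, second: dict) -> dict:
    """Map each project id found in both packages to its (first, second) pair."""
    others = {project["id"]: project for project in second["projects"]}
    return {
        project["id"]: (project, others[project["id"]])
        for project in first["projects"]
        if project["id"] in others
    }


# Identical ids: the Thai title is found for the English project.
print(pair_by_id(english, thai)["oc4ids-bu3kcz-1"][1]["title"])
# Translated id: nothing matches, so the translation can't be found.
print(pair_by_id(english, thai_translated_id))
```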

@duncandewhurst
Contributor Author

> Edit: I'm not sure that id fields should be translatable. It'll make it difficult to find the translation of an object.

Good point. The question from CoST Thailand that originally prompted this issue was about publishing in Thai, rather than translating an English publication. I've perhaps muddled the issue by using 'translate' as a synonym for 'publish in your own language', so I'll make sure to be clear about that in the guidance and the script output.

@jpmckinney
Member

Aha, right, so any field value that can contain non-English text can also be translated, except for IDs. (I'm just checking whether we're on the same page – the guidance can be more clearly written than that.)

@duncandewhurst
Contributor Author

Yep, that sounds right to me.

@duncandewhurst duncandewhurst moved this to Todo in OC4IDS 0.9.4 Aug 15, 2022
@duncandewhurst duncandewhurst moved this from Todo to In Progress in OC4IDS 0.9.4 Aug 15, 2022
Repository owner moved this from In Progress to Done in OC4IDS 0.9.4 Oct 20, 2022