Commit d05cecf

Merge pull request #1665 from open-contracting/229-translations
Abandon in-file translations, document full-file translation guidance
2 parents 096e4cf + 37a024a commit d05cecf

10 files changed: +249 -2238 lines changed

docs/examples/language_localization/language.csv

-2
This file was deleted.

docs/examples/language_localization/language.json

-9
This file was deleted.

docs/guidance/map.md

+3
@@ -50,10 +50,13 @@ Once complete, you can:

 Before mapping individual fields and codes, consider whether to first [localize OCDS](map/localization) to your context. Localization can be useful when you need to map several different systems, or when multiple organizations will work on implementing OCDS in your country.

+For guidance on how to publish data in multiple languages, see [translations](map/translations).
+
 ```{toctree}
 :hidden:

 map/localization
+map/translations
 ```

 ## Download the mapping templates

docs/guidance/map/translations.md

+166
@@ -0,0 +1,166 @@
+# Translations
+
+If your data sources have separate data elements for different language versions of the same content, you can publish each language as a separate OCDS dataset.
+
+```{admonition} What does "same content" mean?
+Two texts have the same content if they describe the same thing: for example, "United Kingdom" in English and "Royaume-Uni" in French. On the other hand, if your contracting process has different contact points for different language speakers, that content is *not* the same. Your OCDS data should therefore contain one contact point for each, using the [Additional Contact Points](https://extensions.open-contracting.org/en/extensions/additionalContactPoint/master/) extension.
+```
+
+## What you can translate
+
+You can publish the values of these fields in any language:
+
+% STARTLIST
+- `description`, in any location
+- `legalName`, in any location
+- `name`, in any location
+- `rationale`, in any location
+- `statusDetails`, in any location
+- `title`, in any location
+- `parties/address/locality`
+- `parties/address/region`
+- `parties/address/streetAddress`
+- `planning/budget/project`
+- `tender/awardCriteriaDetails`
+- `tender/exclusionGrounds`
+- `tender/procurementMethodDetails`
+- `tender/procurementMethodRationale`
+- `tender/selectionCriteria`
+- `tender/submissionMethodDetails`
+% ENDLIST
+
+When publishing OCDS data in different languages, remember to set the `language` field. For example:
+
+`````{tab-set}
+
+````{tab-item} English
+```{code-block} json
+:emphasize-lines: 8,11
+{
+  "ocid": "ocds-213czf-000-00001",
+  "id": "1",
+  "date": "2024-01-01T00:00:00Z",
+  "tag": [
+    "tender"
+  ],
+  "language": "en",
+  "tender": {
+    "id": "1",
+    "title": "Purchase of office supplies"
+  }
+}
+```
+````
+
+````{tab-item} Spanish
+```{code-block} json
+:emphasize-lines: 8,11
+{
+  "ocid": "ocds-213czf-000-00001",
+  "id": "1",
+  "date": "2024-01-01T00:00:00Z",
+  "tag": [
+    "tender"
+  ],
+  "language": "es",
+  "tender": {
+    "id": "1",
+    "title": "Compra de material de oficina"
+  }
+}
+```
+````
+
+`````
+
+## What you cannot translate
+
+The names of fields, and the values of all fields not listed above, need to be the same across your translated OCDS datasets, in order to support interoperability, which is the purpose of standardization. These cover:
+
+- **Identifiers**, like `ocid`, `id`, etc.
+- **Codes**, like release tags, identifier schemes and milestone codes
+- **Formatted values**, like URLs, dates, email addresses, telephone numbers and postal codes
+- **Non-text fields**, like numbers and booleans
+
+For example, the name of the `tag` field needs to be "tag", and its value needs to be a list of codes from the [release tag codelist](../../schema/codelists.md#release-tag).
+
+`````{tab-set}
+
+````{tab-item} Valid
+```{code-block} json
+:emphasize-lines: 5-7
+{
+  "ocid": "ocds-213czf-000-00001",
+  "id": "1",
+  "date": "2024-01-01T00:00:00Z",
+  "tag": [
+    "tender"
+  ],
+  "language": "es",
+  "tender": {
+    "id": "1",
+    "title": "Compra de material de oficina"
+  }
+}
+```
+````
+
+````{tab-item} Invalid (incorrect name)
+```{code-block} json
+:emphasize-lines: 5
+{
+  "ocid": "ocds-213czf-000-00001",
+  "id": "1",
+  "date": "2024-01-01T00:00:00Z",
+  "etiqueta": [
+    "tender"
+  ],
+  "language": "es",
+  "tender": {
+    "id": "1",
+    "title": "Compra de material de oficina"
+  }
+}
+```
+````
+
+````{tab-item} Invalid (incorrect value)
+```{code-block} json
+:emphasize-lines: 6
+{
+  "ocid": "ocds-213czf-000-00001",
+  "id": "1",
+  "date": "2024-01-01T00:00:00Z",
+  "tag": [
+    "licitación"
+  ],
+  "language": "es",
+  "tender": {
+    "id": "1",
+    "title": "Compra de material de oficina"
+  }
+}
+```
+````
+
+`````
+
+## Translating headers in spreadsheets
+
+To ease access for non-English speakers, instead of using field *names* as column headers (which are always in English), you can use field *titles*, which are translated in [OCDS translations](localization.md#translating-the-standard).
+
+For example, this CSV excerpt uses field titles from the Spanish translation:
+
+| ID de Entrega | Fecha de entrega |
+| ------------- | -------------------- |
+| 1 | 2024-01-01T00:00:00Z |
+
+You can use [Flatten Tool](https://flatten-tool.readthedocs.io/en/latest/) to generate files with field titles. For example, this command converts an OCDS release package to XLSX format, using field titles from the release schema:
+
+```bash
+flatten-tool flatten release_package.json --output-format=xlsx --use-titles --schema=release-schema.json --root-id=ocid --root-list-path=releases
+```
+
+```{note}
+Field titles are available in English, Spanish and French. To translate titles to another language, [contact the Data Support Team](mailto:[email protected]).
+```
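A rough way to check the rule in "What you cannot translate" is to compare two translated releases and report every path at which they differ outside the translatable fields. The sketch below is illustrative only and is not part of this commit: the function name `untranslatable_differences` is hypothetical, the field lists are copied from the bullet list in `translations.md`, and treating the top-level `language` field as expected to differ is an assumption.

```python
# Field names listed as translatable "in any location" in translations.md.
TRANSLATABLE_NAMES = {"description", "legalName", "name", "rationale", "statusDetails", "title"}
# Fields listed as translatable at one specific path (array indexes are omitted from paths).
TRANSLATABLE_PATHS = {
    "parties/address/locality",
    "parties/address/region",
    "parties/address/streetAddress",
    "planning/budget/project",
    "tender/awardCriteriaDetails",
    "tender/exclusionGrounds",
    "tender/procurementMethodDetails",
    "tender/procurementMethodRationale",
    "tender/selectionCriteria",
    "tender/submissionMethodDetails",
}
# Assumption: the release's own language code is expected to differ between translations.
EXPECTED_TO_DIFFER = {"language"}


def untranslatable_differences(a, b, path=""):
    """Yield the paths at which two translations of a release differ in a non-translatable way."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(a.keys() | b.keys()):
            child = f"{path}/{key}" if path else key
            if key not in a or key not in b:
                yield child  # present on one side only, e.g. a translated field name
            elif key in TRANSLATABLE_NAMES or child in TRANSLATABLE_PATHS | EXPECTED_TO_DIFFER:
                continue  # free text (or the language code) is allowed to differ
            else:
                yield from untranslatable_differences(a[key], b[key], child)
    elif isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            yield path
        for item_a, item_b in zip(a, b):
            yield from untranslatable_differences(item_a, item_b, path)
    elif a != b:
        yield path


english = {
    "ocid": "ocds-213czf-000-00001", "id": "1", "date": "2024-01-01T00:00:00Z",
    "tag": ["tender"], "language": "en",
    "tender": {"id": "1", "title": "Purchase of office supplies"},
}
spanish = {
    "ocid": "ocds-213czf-000-00001", "id": "1", "date": "2024-01-01T00:00:00Z",
    "tag": ["tender"], "language": "es",
    "tender": {"id": "1", "title": "Compra de material de oficina"},
}
# The two invalid variants from the tab-set above.
invalid_name = {("etiqueta" if key == "tag" else key): value for key, value in spanish.items()}
invalid_value = {**spanish, "tag": ["licitación"]}
```

With these inputs, comparing `english` with `spanish` yields no paths, while the two invalid variants are reported at `tag` (and at `etiqueta` for the renamed field). A real check would probably also want array indexes in the reported paths; they are left out here so that paths match the style of the generated list.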

docs/history/changelog.md

+24-14
@@ -293,27 +293,37 @@ Per the [normative and non-normative content and changes policy](../governance/n

 * [#1450](https://github.com/open-contracting/standard/pull/1450) Replace a repeated example in schema/merging/ with a link to guidance/build/merging/.

+* [#1665](https://github.com/open-contracting/standard/pull/1665) Abandon in-file translations.
+
 ### Documentation

-* [#1094](https://github.com/open-contracting/standard/pull/1094) Add guidance on populating `Organization.id` for parties without an organization identifier.
 * [#1115](https://github.com/open-contracting/standard/pull/1115) Add guidance on when having multiple suppliers per award.
 * [#1161](https://github.com/open-contracting/standard/pull/1161) Change recommendation for unknown time component.
-* [#1189](https://github.com/open-contracting/standard/pull/1189) Add recommendations about publishing and referencing documents in the document reference section.
 * [#1208](https://github.com/open-contracting/standard/pull/1208) Update guidance with new field definitions.
 * [#1216](https://github.com/open-contracting/standard/pull/1216) Update definitions of contracting process, record, and ocid. Introduce definition of planning process.
-* [#1307](https://github.com/open-contracting/standard/pull/1307) Clarify uniqueness rules for records.
-* [#1315](https://github.com/open-contracting/standard/pull/1315) Add rules on setting `id` and `date` for compiled releases to the merging specification.
-* [#1344](https://github.com/open-contracting/standard/pull/1344) Add contract suspension worked example.
-* [#1375](https://github.com/open-contracting/standard/pull/1375) Update guidance for empty fields in the merging documentation.
-* [#1466](https://github.com/open-contracting/standard/pull/1466) Reference worked examples in release and record reference documentation.
-* [#1466](https://github.com/open-contracting/standard/pull/1482) Add examples in release reference documentation.
-* [#1618](https://github.com/open-contracting/standard/pull/1618) Add conformance rule about normative statements.
-* [#1618](https://github.com/open-contracting/standard/pull/1618) Remove validator and application conformance rules.
 * [#1618](https://github.com/open-contracting/standard/pull/1618) Move governance policies from Google Docs, updating references for OCDS 1.1.5 and OCDS 1.2.0, and removing references to GitHub issues.
-* [#1643](https://github.com/open-contracting/standard/pull/1643) Update identifier section in release reference.
-* [#1655](https://github.com/open-contracting/standard/pull/1655) Rewrite identifiers reference and examples for clarity.
-* [#1659](https://github.com/open-contracting/standard/pull/1659) Add `Record` definition schema table to record reference.
-* [#1664](https://github.com/open-contracting/standard/pull/1664) Recommend linking to alternative representations using `documents`.
+* Records
+  * [#1307](https://github.com/open-contracting/standard/pull/1307) Clarify uniqueness rules for records.
+  * [#1659](https://github.com/open-contracting/standard/pull/1659) Add `Record` definition schema table to record reference.
+* Merging
+  * [#1315](https://github.com/open-contracting/standard/pull/1315) Add rules on setting `id` and `date` for compiled releases to the merging specification.
+  * [#1375](https://github.com/open-contracting/standard/pull/1375) Update guidance for empty fields in the merging documentation.
+* Identifiers
+  * [#1094](https://github.com/open-contracting/standard/pull/1094) Add guidance on populating `Organization.id` for parties without an organization identifier.
+  * [#1643](https://github.com/open-contracting/standard/pull/1643) Update identifier section in release reference.
+  * [#1655](https://github.com/open-contracting/standard/pull/1655) Rewrite identifiers reference and examples for clarity.
+* Documents
+  * [#1189](https://github.com/open-contracting/standard/pull/1189) Add recommendations about publishing and referencing documents in the document reference section.
+  * [#1664](https://github.com/open-contracting/standard/pull/1664) Recommend linking to alternative representations using `documents`.
+* Conformance
+  * [#1618](https://github.com/open-contracting/standard/pull/1618) Add conformance rule about normative statements.
+  * [#1618](https://github.com/open-contracting/standard/pull/1618) Remove validator and application conformance rules.
+* Examples
+  * Add examples:
+    * [#1344](https://github.com/open-contracting/standard/pull/1344) Contract suspension
+    * [#1665](https://github.com/open-contracting/standard/pull/1665) Translations
+  * [#1466](https://github.com/open-contracting/standard/pull/1466) Reference examples in release and record reference documentation.
+  * [#1482](https://github.com/open-contracting/standard/pull/1482) Add examples in release reference documentation.

 ## [1.1.5] - 2020-08-20

docs/schema/reference.md

+2-28
@@ -34,34 +34,8 @@ There can be cases where a publisher needs to remove, rather than update, a valu

 ## Language

-Many publishers need to be able to share key data in multiple languages. All free-text title and description fields in the Open Contracting Data Standard can be given in one or more languages.
-
-Language variations are included by a copy of multi-lingual fields, suffixed with a language code.
-
-E.g. `title` and `title_es`
-
-In order to allow users to identify the language used in non-suffixed fields, OCDS release and records should declare the default language in the `language` field.
-
-Languages must be identified using language tags taken from [BCP47](https://tools.ietf.org/html/bcp47). The specification allows BCP47 values in order to accommodate variations in dialect where this is important. However, publishers **should** use the lowercase two-letter [ISO639-1 language tags](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) in the vast majority of circumstances, to avoid users having to distinguish between sub-tag variations (for example, OCDS publishers should use 'en' instead of 'en_US' or 'en_GB').
-
-To include a language variation of a field, the field name must be suffixed with _ and the appropriate language tag. For example: `title_es` for Spanish.
-
-### Worked example
-
-A contract for ‘Software consultancy services’ is published in a release with the default language sent to ‘en’ (the ISO639-1 code for English). The following examples give the description of an item as English, French and Spanish.
-
-**json**
-
-```{jsoninclude} ../examples/language_localization/language.json
-:jsonpointer:
-:expand: tender,item
-```
-
-**csv**
-
-```{csv-table-no-translate}
-:header-rows: 1
-:file: ../examples/language_localization/language.csv
+```{deprecated} 1.2
+OCDS 1.1 allowed data to be published in multiple languages by suffixing a language code to a field name: for example, `title` for the default language and `title_es` for Spanish. OCDS 1.2 uses [full-file translations](../guidance/map/translations), instead.
 ```

 ## Release structure
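For publishers migrating from the deprecated OCDS 1.1 approach, the suffixed fields can be split mechanically into one release per language. The following is a minimal sketch and not part of this commit: `to_full_file_translation` and `_translate` are hypothetical helpers, and the code assumes that suffixes are lowercase two-letter ISO 639-1 codes and that no ordinary field name ends in an underscore followed by two lowercase letters.

```python
import re

# Assumption: in-file translations use a two-letter suffix, as in `title_es`.
SUFFIX = re.compile(r"^(?P<name>.+)_(?P<lang>[a-z]{2})$")


def _translate(data, lang):
    """Return a copy of `data` with `<name>_<lang>` values moved to `<name>` and all suffixed fields dropped."""
    if isinstance(data, list):
        return [_translate(item, lang) for item in data]
    if not isinstance(data, dict):
        return data
    # First keep the unsuffixed fields (the default language), recursing into nested values.
    result = {key: _translate(value, lang) for key, value in data.items() if not SUFFIX.match(key)}
    # Then let the requested language override the default-language value, if it exists.
    for key, value in data.items():
        match = SUFFIX.match(key)
        if match and match["lang"] == lang:
            result[match["name"]] = value
    return result


def to_full_file_translation(release, lang):
    """Build the release for one language and set its `language` field accordingly."""
    translated = _translate(release, lang)
    translated["language"] = lang
    return translated


release_1_1 = {
    "ocid": "ocds-213czf-000-00001",
    "language": "en",
    "tender": {
        "id": "1",
        "title": "Purchase of office supplies",
        "title_es": "Compra de material de oficina",
    },
}
spanish = to_full_file_translation(release_1_1, "es")
english = to_full_file_translation(release_1_1, "en")
```

Where a field has no value for the requested language, this sketch falls back to the default-language text. Whether that is acceptable, or whether the field should be omitted instead, is a publisher decision that the guidance above does not settle.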

manage.py

+54-1
@@ -7,6 +7,7 @@
 import re
 import sys
 import warnings
+from collections import defaultdict
 from contextlib import contextmanager
 from copy import deepcopy
 from glob import glob
@@ -22,6 +23,7 @@
 from babel.messages.pofile import read_po
 from docutils.utils import relative_path
 from lxml import etree
+from ocdskit.schema import get_schema_fields

 basedir = Path(__file__).resolve().parent
 schemadir = basedir / 'schema'
@@ -508,16 +510,67 @@ def missing_changelog(ignore_base):
 @cli.command()
 def pre_commit():
     """
-    Update derivative schema files.
+    Update derivative schema files, and generate a CSV file of multilingual fields.

     \b
     - meta-schema.json
     - dereferenced-release-schema.json
     - versioned-release-validation-schema.json
     """
+    nonmultilingual = {
+        # Identifiers.
+        'amendsReleaseID', 'id', 'identifier', 'ocid', 'relatedItems', 'releaseID',
+        # Missing format properties. https://github.com/open-contracting/standard/issues/881
+        'email',
+        # Publisher-defined formats.
+        'faxNumber', 'postalCode', 'telephone',
+        # Publisher-defined codelists.
+        'code', 'scheme',
+    }
+
     release_schema = json_load('release-schema.json')
     jsonref_release_schema = json_load('release-schema.json', jsonref, merge_props=True)

+    counts = defaultdict(list)
+    for field in get_schema_fields(jsonref_release_schema):
+        name = field.path_components[-1]
+        # Skip definitions (output dereferenced properties only). Skip deprecated fields.
+        if field.definition_pointer_components or field.deprecated:
+            continue
+        multilingual = (
+            # If a field can be a non-string, it is not multilingual.
+            not any(t in field.schema['type'] for t in ('boolean', 'integer', 'number', 'object'))
+            # If a field's value is constrained to a codelist or format, it is not multilingual.
+            and not any(prop in field.schema for prop in ('codelist', 'format'))
+            # If an array can contain non-strings, it is not multilingual.
+            and not ('array' in field.schema['type'] and 'object' in field.schema['items']['type'])
+            # Specific exceptions.
+            and name not in nonmultilingual
+        )
+        field.sep = '/'
+        if name in counts and bool(counts[name]) ^ multilingual:
+            if multilingual:
+                raise Exception(f'{name} is multilingual at {field.path}, but not elsewhere')
+            else:
+                raise Exception(f'{name} is multilingual at {" & ".join(counts[name])}, but not at {field.path}')
+        if multilingual:
+            counts[name].append(field.path)
+        else:
+            counts[name] = []
+
+    bulletlist = [
+        '% STARTLIST',
+        *sorted([f'- `{name}`, in any location' for name, paths in counts.items() if len(paths) > 1]),
+        *sorted([f'- `{paths[0]}`' for _, paths in counts.items() if len(paths) == 1]),
+        '% ENDLIST',
+    ]
+
+    path = basedir / 'docs' / 'guidance' / 'map' / 'translations.md'
+    with path.open() as f:
+        contents = f.read()
+    with path.open('w') as f:
+        f.write(re.sub(r'% STARTLIST.+% ENDLIST', '\n'.join(bulletlist), contents, flags=re.DOTALL))
+
     json_dump('meta-schema.json', get_metaschema())
     json_dump('dereferenced-release-schema.json', jsonref_release_schema)
     json_dump('versioned-release-validation-schema.json', get_versioned_release_schema(release_schema))
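The last added block in `pre_commit` rewrites the generated list in `translations.md` in place, between the `% STARTLIST` and `% ENDLIST` comment markers. In isolation, the same technique might look like the sketch below. The function name `replace_between_markers` is hypothetical. Unlike the commit, it passes a function as the replacement so that `re.sub` does not interpret backslashes in the new text; that is a defensive choice, not something the current field names require.

```python
import re


def replace_between_markers(contents, lines, start="% STARTLIST", end="% ENDLIST"):
    """Replace everything from the start marker to the end marker, keeping both markers."""
    block = "\n".join([start, *lines, end])
    pattern = re.escape(start) + r".+" + re.escape(end)
    # re.DOTALL lets `.` match newlines, so the pattern spans the whole multi-line list.
    # The lambda returns `block` verbatim, so backslashes in it are not treated as escapes.
    return re.sub(pattern, lambda _match: block, contents, flags=re.DOTALL)


before = "intro\n% STARTLIST\n- `old`\n% ENDLIST\noutro\n"
after = replace_between_markers(before, ["- `title`, in any location"])
```

Because the markers are written back along with the list, the replacement can be run repeatedly: a second run with the same lines leaves the file unchanged.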
