Commit 343ff2b

guidance/map/translations: Rewrite with simpler structure
- Clarify that this page is for sources with distinct fields for each language. If a field mixes languages, this page is not relevant.
- Clarify that translated text must have the same content across translations.
- Instead of a long list of repetitive field names, summarize the patterns.
1 parent c298091 commit 343ff2b

3 files changed

+105
-126
lines changed

docs/_static/i18n.csv

-71
This file was deleted.

docs/guidance/map/translations.md

+63-33
@@ -1,15 +1,41 @@
11
# Translations
22

3-
If you need to publish data in multiple languages, you can publish full-file translations. That is, separate OCDS releases for each language.
3+
If your data sources have separate data elements for different language versions of the same content, you can publish each language as a separate OCDS dataset.
44

5-
You can publish the values of free-text fields – like titles and descriptions – in any language, but you need to ensure that the values of `id` fields are consistent across languages so that users can find the translations of objects. You ought to set the `language` field to the language used in free-text fields.
5+
```{admonition} What does "same content" mean?
6+
Two texts have the same content if they describe the same thing: for example, "United Kingdom" in English and "Royaume-Uni" in French. On the other hand, if your contracting process has different contact points for different language speakers, that content is *not* the same. Your OCDS data should therefore contain one contact point for each, using the [Additional Contact Points](https://extensions.open-contracting.org/en/extensions/additionalContactPoint/master/) extension.
7+
```
68

7-
For example, for data published in English and Spanish:
9+
## What you can translate
10+
11+
You can publish the values of these fields in any language:
12+
13+
% STARTLIST
14+
- `description`, in any location
15+
- `legalName`, in any location
16+
- `name`, in any location
17+
- `rationale`, in any location
18+
- `statusDetails`, in any location
19+
- `title`, in any location
20+
- `parties/address/locality`
21+
- `parties/address/region`
22+
- `parties/address/streetAddress`
23+
- `planning/budget/project`
24+
- `tender/awardCriteriaDetails`
25+
- `tender/exclusionGrounds`
26+
- `tender/procurementMethodDetails`
27+
- `tender/procurementMethodRationale`
28+
- `tender/selectionCriteria`
29+
- `tender/submissionMethodDetails`
30+
% ENDLIST
31+
32+
When publishing OCDS data in different languages, remember to set the `language` field. For example:
833

934
`````{tab-set}
1035
1136
````{tab-item} English
12-
```json
37+
```{code-block} json
38+
:emphasize-lines: 8,11
1339
{
1440
"ocid": "ocds-213czf-000-00001",
1541
"id": "1",
@@ -20,14 +46,15 @@ For example, for data published in English and Spanish:
2046
"language": "en",
2147
"tender": {
2248
"id": "1",
23-
"title": "Purchase of office supplies",
49+
"title": "Purchase of office supplies"
2450
}
2551
}
2652
```
2753
````
2854
2955
````{tab-item} Spanish
30-
```json
56+
```{code-block} json
57+
:emphasize-lines: 8,11
3158
{
3259
"ocid": "ocds-213czf-000-00001",
3360
"id": "1",
@@ -38,20 +65,30 @@ For example, for data published in English and Spanish:
3865
"language": "es",
3966
"tender": {
4067
"id": "1",
41-
"title": "Compra de material de oficina",
68+
"title": "Compra de material de oficina"
4269
}
4370
}
4471
```
4572
````
4673
4774
`````
4875

49-
In order for your data to be interoperable and compatible with OCDS tools and methodologies, you cannot translate field names (keys) or codes from codelists. For example, the name of the `tag` field cannot be translated and its items need to be codes from the [release tag codelist](../../schema/codelists.md#release-tag), like 'tender':
76+
## What you cannot translate
77+
78+
The names of fields, and the values of all fields not listed above, need to be the same across your translated OCDS datasets, in order to support interoperability, which is the purpose of standardization. These cover:
79+
80+
- **Identifiers**, like `ocid`, `id`, etc.
81+
- **Codes**, like release tags, identifier schemes and milestone codes
82+
- **Formatted values**, like URLs, dates, email addresses, telephone numbers and postal codes
83+
- **Non-text fields**, like numbers and booleans
84+
85+
For example, the name of the `tag` field needs to be "tag", and its value needs to be a list of codes from the [release tag codelist](../../schema/codelists.md#release-tag).
5086

5187
`````{tab-set}
5288
53-
````{tab-item} Valid data
54-
```json
89+
````{tab-item} Valid
90+
```{code-block} json
91+
:emphasize-lines: 5-7
5592
{
5693
"ocid": "ocds-213czf-000-00001",
5794
"id": "1",
@@ -62,14 +99,15 @@ In order for your data to be interoperable and compatible with OCDS tools and me
6299
"language": "es",
63100
"tender": {
64101
"id": "1",
65-
"title": "Compra de material de oficina",
102+
"title": "Compra de material de oficina"
66103
}
67104
}
68105
```
69106
````
70107
71-
````{tab-item} Invalid data (translated field name)
72-
```json
108+
````{tab-item} Invalid (incorrect name)
109+
```{code-block} json
110+
:emphasize-lines: 5
73111
{
74112
"ocid": "ocds-213czf-000-00001",
75113
"id": "1",
@@ -80,14 +118,15 @@ In order for your data to be interoperable and compatible with OCDS tools and me
80118
"language": "es",
81119
"tender": {
82120
"id": "1",
83-
"title": "Compra de material de oficina",
121+
"title": "Compra de material de oficina"
84122
}
85123
}
86124
```
87125
````
88126
89-
````{tab-item} Invalid data (translated code)
90-
```json
127+
````{tab-item} Invalid (incorrect value)
128+
```{code-block} json
129+
:emphasize-lines: 6
91130
{
92131
"ocid": "ocds-213czf-000-00001",
93132
"id": "1",
@@ -98,39 +137,30 @@ In order for your data to be interoperable and compatible with OCDS tools and me
98137
"language": "es",
99138
"tender": {
100139
"id": "1",
101-
"title": "Compra de material de oficina",
140+
"title": "Compra de material de oficina"
102141
}
103142
}
104143
```
105144
````
106145
107146
`````
108147

109-
The fields whose values can be translated are listed in the [internationalization lookup table](#internationalization-lookup-table).
148+
## Translating headers in spreadsheets
110149

111-
## Translating headers in spreadsheets/CSVs
150+
To ease access for non-English speakers, instead of using field *names* as column headers (which are always in English), you can use field *titles*, which are translated in [OCDS translations](localization.md#translating-the-standard).
112151

113-
In order to ease access for non-English speakers, instead of using field *names* as column headers (which are always in English), you can use field *titles*.
114-
115-
The titles are currently available in English, Spanish and French. If you would like to translate the titles to your own language, please [contact the OCDS Helpdesk](mailto:[email protected]).
116-
117-
For example, this CSV excerpt uses field titles from the Spanish translation of OCDS:
152+
For example, this CSV excerpt uses field titles from the Spanish translation:
118153

119154
| ID de Entrega | Fecha de entrega |
120155
| ------------- | -------------------- |
121156
| 1 | 2024-01-01T00:00:00Z |
122157

123-
You can use [Flatten Tool](https://flatten-tool.readthedocs.io/en/latest/) to generate files with translated field titles. For example, this command converts an OCDS release package to XLSX format, using field titles from the schema:
158+
You can use [Flatten Tool](https://flatten-tool.readthedocs.io/en/latest/) to generate files with field titles. For example, this command converts an OCDS release package to XLSX format, using field titles from the release schema:
124159

125160
```bash
126-
flatten-tool flatten -s release-schema.json -f xlsx --use-titles --root-id=id --root-list-path=releases release_package.json
161+
flatten-tool flatten release_package.json --output-format=xlsx --use-titles --schema=release-schema.json --root-id=ocid --root-list-path=releases
127162
```
128163

129-
## Internationalization lookup table
130-
131-
Use the following table to check whether a field can be translated. You can download the table as a [CSV spreadsheet](../../_static/i18n.csv).
132-
133-
```{csv-table}
134-
:file: ../../_static/i18n.csv
135-
:header-rows: 1
164+
```{note}
165+
Field titles are available in English, Spanish and French. To translate titles to another language, [contact the Data Support Team](mailto:[email protected]).
136166
```
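
Reviewer note: the "What you cannot translate" rule in this file can be checked mechanically. The following is a minimal sketch, assuming two translated releases are already loaded as Python dictionaries. The function `find_mismatches` and the three constants are hypothetical names introduced here for illustration; they are not part of OCDS tooling. The translatable names and paths are copied from the list between `% STARTLIST` and `% ENDLIST` above.

```python
# Hypothetical helper, not part of any OCDS library.
# Field names that the guidance lists as translatable "in any location".
TRANSLATABLE_NAMES = {
    "description", "legalName", "name", "rationale", "statusDetails", "title",
}
# Full paths that the guidance lists as translatable.
TRANSLATABLE_PATHS = {
    "parties/address/locality",
    "parties/address/region",
    "parties/address/streetAddress",
    "planning/budget/project",
    "tender/awardCriteriaDetails",
    "tender/exclusionGrounds",
    "tender/procurementMethodDetails",
    "tender/procurementMethodRationale",
    "tender/selectionCriteria",
    "tender/submissionMethodDetails",
}
# The `language` field is expected to differ between translations.
EXPECTED_TO_DIFFER = {"language"}


def find_mismatches(a, b, path=""):
    """Yield the paths at which two translated releases differ in
    content that the guidance says must be identical."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(a.keys() | b.keys()):
            child = f"{path}/{key}" if path else key
            if key not in a or key not in b:
                # A key present on one side only, e.g. a translated field name.
                yield child
            else:
                yield from find_mismatches(a[key], b[key], child)
    elif isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            yield path
        else:
            # Array indices are omitted so paths match the list's notation.
            for x, y in zip(a, b):
                yield from find_mismatches(x, y, path)
    elif a != b:
        name = path.rsplit("/", 1)[-1]
        translatable = name in TRANSLATABLE_NAMES or path in TRANSLATABLE_PATHS
        if not translatable and path not in EXPECTED_TO_DIFFER:
            yield path


en = {
    "ocid": "ocds-213czf-000-00001",
    "id": "1",
    "tag": ["tender"],
    "language": "en",
    "tender": {"id": "1", "title": "Purchase of office supplies"},
}
es = {
    "ocid": "ocds-213czf-000-00001",
    "id": "1",
    "tag": ["tender"],
    "language": "es",
    "tender": {"id": "1", "title": "Compra de material de oficina"},
}
print(list(find_mismatches(en, es)))
```

The sketch compares array items by position, which assumes both translations list their objects in the same order; matching objects by `id` would be more robust but is left out to keep the example short.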

manage.py

+42-22
@@ -7,6 +7,7 @@
77
import re
88
import sys
99
import warnings
10+
from collections import defaultdict
1011
from contextlib import contextmanager
1112
from copy import deepcopy
1213
from glob import glob
@@ -509,14 +510,14 @@ def missing_changelog(ignore_base):
509510
@cli.command()
510511
def pre_commit():
511512
"""
512-
Update derivative schema files, and generate a CSV file of translatable fields.
513+
Update derivative schema files, and update the list of multilingual fields in the translations guidance.
513514
514515
\b
515516
- meta-schema.json
516517
- dereferenced-release-schema.json
517518
- versioned-release-validation-schema.json
518519
"""
519-
nontranslatable = {
520+
nonmultilingual = {
520521
# Identifiers.
521522
'amendsReleaseID', 'id', 'identifier', 'ocid', 'relatedItems', 'releaseID',
522523
# Missing format properties. https://github.com/open-contracting/standard/issues/881
@@ -528,28 +529,47 @@ def pre_commit():
528529
}
529530

530531
release_schema = json_load('release-schema.json')
531-
532532
jsonref_release_schema = json_load('release-schema.json', jsonref, merge_props=True)
533533

534-
with (basedir / 'docs' / '_static' / 'i18n.csv').open('w') as f:
535-
writer = csv.writer(f, lineterminator='\n')
536-
writer.writerow(['path', 'title'])
537-
for field in get_schema_fields(jsonref_release_schema):
538-
if (
539-
# Output dereferenced properties, not definitions.
540-
not field.definition_pointer_components
541-
# Skip deprecated fields.
542-
and not field.deprecated
543-
# If a field can be a non-string, it is not translatable.
544-
and not any(t in field.schema['type'] for t in ('boolean', 'integer', 'number', 'object'))
545-
# If a field's value is constrained to a codelist or format, it is not translatable.
546-
and not any(prop in field.schema for prop in ('codelist', 'format'))
547-
# If an array can contain non-strings, it is not translatable.
548-
and not ('array' in field.schema['type'] and 'object' in field.schema['items']['type'])
549-
# Specific exceptions.
550-
and not field.path_components[-1] in nontranslatable
551-
):
552-
writer.writerow([field.path.replace('.', '/'), field.schema['title']])
534+
counts = defaultdict(list)
535+
for field in get_schema_fields(jsonref_release_schema):
536+
name = field.path_components[-1]
537+
# Skip definitions (output dereferenced properties only). Skip deprecated fields.
538+
if field.definition_pointer_components or field.deprecated:
539+
continue
540+
multilingual = (
541+
# If a field can be a non-string, it is not multilingual.
542+
not any(t in field.schema['type'] for t in ('boolean', 'integer', 'number', 'object'))
543+
# If a field's value is constrained to a codelist or format, it is not multilingual.
544+
and not any(prop in field.schema for prop in ('codelist', 'format'))
545+
# If an array can contain non-strings, it is not multilingual.
546+
and not ('array' in field.schema['type'] and 'object' in field.schema['items']['type'])
547+
# Specific exceptions.
548+
and name not in nonmultilingual
549+
)
550+
field.sep = '/'
551+
if name in counts and bool(counts[name]) ^ multilingual:
552+
if multilingual:
553+
raise Exception(f'{name} is multilingual at {field.path}, but not elsewhere')
554+
else:
555+
raise Exception(f'{name} is multilingual at {" & ".join(counts[name])}, but not at {field.path}')
556+
if multilingual:
557+
counts[name].append(field.path)
558+
else:
559+
counts[name] = []
560+
561+
bulletlist = [
562+
'% STARTLIST',
563+
*sorted([f'- `{name}`, in any location' for name, paths in counts.items() if len(paths) > 1]),
564+
*sorted([f'- `{paths[0]}`' for _, paths in counts.items() if len(paths) == 1]),
565+
'% ENDLIST',
566+
]
567+
568+
path = basedir / 'docs' / 'guidance' / 'map' / 'translations.md'
569+
with path.open() as f:
570+
contents = f.read()
571+
with path.open('w') as f:
572+
f.write(re.sub(r'% STARTLIST.+% ENDLIST', '\n'.join(bulletlist), contents, flags=re.DOTALL))
553573

554574
json_dump('meta-schema.json', get_metaschema())
555575
json_dump('dereferenced-release-schema.json', jsonref_release_schema)
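
Reviewer note: the new logic in `pre_commit` can be exercised without the schema. The following is a minimal sketch, assuming the per-field results are already available as `(path, multilingual)` pairs. The functions `build_bullet_list` and `replace_list` are hypothetical names introduced here for illustration, and `ValueError` stands in for the bare `Exception` raised in the commit.

```python
import re
from collections import defaultdict


def build_bullet_list(fields):
    """Group multilingual paths by field name and render the Markdown list.

    `fields` is an iterable of (path, multilingual) pairs, with "/" as the
    path separator. A name must be multilingual everywhere or nowhere.
    """
    counts = defaultdict(list)
    seen = set()
    for path, multilingual in fields:
        name = path.rsplit("/", 1)[-1]
        # Same consistency rule as the commit: a mismatch with earlier
        # occurrences of the same field name is an error.
        if name in seen and bool(counts[name]) != multilingual:
            raise ValueError(f"{name} is not consistently multilingual (at {path})")
        seen.add(name)
        if multilingual:
            counts[name].append(path)
    return [
        "% STARTLIST",
        # Names that occur at several paths are summarized once.
        *sorted(f"- `{name}`, in any location" for name, paths in counts.items() if len(paths) > 1),
        # Names that occur at a single path are listed by full path.
        *sorted(f"- `{paths[0]}`" for paths in counts.values() if len(paths) == 1),
        "% ENDLIST",
    ]


def replace_list(contents, bullet_list):
    """Replace everything from the start marker to the end marker."""
    replacement = "\n".join(bullet_list)
    # A function replacement keeps backslashes in the list literal.
    return re.sub(r"% STARTLIST.+% ENDLIST", lambda m: replacement, contents, flags=re.DOTALL)


fields = [
    ("tender/title", True),
    ("awards/title", True),
    ("tender/procurementMethodDetails", True),
    ("tender/id", False),
    ("awards/id", False),
]
page = "intro\n% STARTLIST\n- old entry\n% ENDLIST\noutro"
print(replace_list(page, build_bullet_list(fields)))
```

Passing a function as the replacement is a defensive choice in this sketch only; the commit passes the string directly, which behaves the same as long as no field path contains a backslash.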
