Abandon in-file translations, document full-file translation guidance #229

timgdavies · 2015-08-25T08:52:40Z

The current schema uses an approach to indicate language based on suffixes to property names.

{
    "language": "en",
    "tender": {
        "item": {
                "description":"Software consultancy services",
                "description_es":"Servicios de consultoria en software",
                "description_fr":"Services de conseil en logiciels"
        }
    }
}
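
As a rough illustration of how a data user might read such suffixed fields, the sketch below looks up a translation and falls back to the unsuffixed value. The function name `get_text` is hypothetical and not part of any OCDS tooling.

```python
def get_text(obj, field, lang=None):
    """Return obj[field] in the requested language, else the default value.

    Assumes translations use the `<field>_<lang>` suffix convention shown above.
    """
    if lang is not None:
        translated = obj.get(f"{field}_{lang}")
        if translated is not None:
            return translated
    return obj.get(field)


item = {
    "description": "Software consultancy services",
    "description_es": "Servicios de consultoria en software",
    "description_fr": "Services de conseil en logiciels",
}
```

Calling `get_text(item, "description", "es")` returns the Spanish text, while an unknown code such as `"de"` falls back to the default-language value.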

This was discussed in #21 during the beta, and was chosen on the understanding that we wanted to avoid depth where we could (for easier flat renderings of the data). However, on reflection, the approach we have adopted may not have been the most appropriate.

In particular, @elf-pavlik has pointed out here in #40 that JSON-LD uses language maps, and it would be much easier to create a JSON LD rendering of our data this way.

{
    "@context": { "@language": "en" },
    "tender": {
        "item": {
            "description": {
                "en": "Software consultancy services",
                "es": "Servicios de consultoria en software",
                "fr": "Services de conseil en logiciels"
            }
        }
    }
}

Our flattening approach would also just render this to 'description/en', 'description/es' and so on, which seems fairly intuitive, and our earlier fear about over-use of objects does not appear to be a major one.
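
To make the flattening point concrete, here is a minimal sketch that joins nested object keys with '/'. It is not the actual flattening tool: the `flatten` function is hypothetical, and arrays are deliberately left unhandled.

```python
def flatten(value, path=""):
    """Yield (path, scalar) pairs, joining nested object keys with '/'."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}/{key}" if path else key
            yield from flatten(child, child_path)
    else:
        yield path, value


data = {
    "tender": {
        "item": {
            "description": {
                "en": "Software consultancy services",
                "es": "Servicios de consultoria en software",
            }
        }
    }
}
flat = dict(flatten(data))
```

Under these assumptions, a language map simply becomes one extra path segment, e.g. `tender/item/description/en`.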

However, to change to language maps would potentially be a backwards incompatible update.

It might make sense to look at doing it early, but it would definitely need wide discussion.

akuckartz · 2015-08-27T03:33:53Z

👍

Bjwebb · 2015-09-14T14:03:52Z

In particular, @elf-pavlik has pointed out here in #40 that JSON-LD uses language maps, and it would be much easier to create a JSON LD rendering of our data this way.

Looks to me like JSON-LD can also handle the current OCDS structure - #40 (comment)

However, to change to language maps would potentially be a backwards incompatible update.

This would almost certainly break backwards compatibility, at least for data users. Is there a way of changing this that wouldn't?

timgdavies · 2016-05-20T12:49:25Z

We can't currently see a way of changing language which does not involve a backwards-incompatible change, so this would have to go on the stack for a 2.0 update right now.

timgdavies · 2016-09-18T14:02:34Z

After further team discussions, we agreed that this would not likely be a desirable change, even for 2.0, so closing.

akuckartz · 2016-09-18T14:26:14Z

we agreed that this would not likely be a desirable change, even for 2.0, so closing

Can the reasons be summarized?

timgdavies · 2016-09-18T14:30:17Z

I've unfortunately lost my full notes of the discussion, so @Bjwebb and others may be able to add, but from memory:

The backwards incompatibility is a big concern to many of our publishers;
We want to avoid fields that can have two types (string or object), so we would need to switch to a model in which there is no 'default language' (i.e. all strings are always a language map object) which makes things more difficult for data users;
We can still map to JSON LD using our current approach;

kindly · 2016-11-10T17:25:53Z

Reopening this issue as nobody seems too happy with the current approach.

My suggestion would be:

In the core schema, have a Translation definition, keeping its properties empty for now. It would look like this:

"definitions": {
    "Translation": {
        "type": "object",
        "properties": {}
    }
}

For any field that requires versions in different languages, you would add a reference to that object, i.e.

{
    "description_translation": {"$ref": "#/definitions/Translation"},
    "title_translation": {"$ref": "#/definitions/Translation"}
}

You would need this for every field that needs translating, and it would replace the pattern properties currently used.
Extensions would do the same thing: if they needed a field translated, they would just add their field as above.

There would need to be a repo with extensions for every language. These extensions would place a lang code into the Translation definition.

For French, say, the extension's release-schema.json would look like:

"definitions": {
    "Translation": {
        "properties": {
            "fr": {
                "type": "string"
            }
        }
    }
}

Doing it this way means that if you apply the language extensions, they will work for all other extensions (i.e. they are composable and do not require the patches to be applied to the schema in any special order), as long as those extensions define translatable fields the same way as above.
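
The order-independence claim can be sketched with a simple recursive merge, similar in spirit to a JSON merge patch. This is only an illustration: `deep_merge` is a hypothetical helper, and the way OCDS tooling actually applies extensions may differ.

```python
import copy


def deep_merge(target, patch):
    """Recursively merge `patch` into a copy of `target` (objects only)."""
    result = copy.deepcopy(target)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


core = {"definitions": {"Translation": {"type": "object", "properties": {}}}}
french = {"definitions": {"Translation": {"properties": {"fr": {"type": "string"}}}}}
spanish = {"definitions": {"Translation": {"properties": {"es": {"type": "string"}}}}}

# Apply the two language extensions in both orders.
fr_then_es = deep_merge(deep_merge(core, french), spanish)
es_then_fr = deep_merge(deep_merge(core, spanish), french)
```

Because each language extension only adds its own key under `Translation.properties`, both orders should give the same schema.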

Downsides for this approach are:

1. Not entirely backward compatible, but better than replacing all text fields with an object. It would be very easy to convert to the new format using an upgrade tool, though.
2. I do not like the label "translations"; any suggestions as to what else to call them?
3. Cases like Canada, where two languages are considered equal, are still not very well supported. The main fields (the ones being translated) would still be mandatory, i.e. description would be mandatory.
4. It is also not easy to make the translated fields required, as required fields are defined as a list.
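
On the upgrade tool mentioned in the first downside, a minimal sketch might look like the following. The `upgrade` function and its `translatable` argument are hypothetical; a real tool would presumably derive the list of translatable fields from the schema, and this version only recognises suffixes without underscores.

```python
def upgrade(obj, translatable=("title", "description")):
    """Move `<field>_<lang>` values into `<field>_translation` objects.

    `translatable` is an assumed list of field names, not taken from the schema.
    """
    if isinstance(obj, list):
        return [upgrade(entry, translatable) for entry in obj]
    if not isinstance(obj, dict):
        return obj
    result = {}
    for key, value in obj.items():
        # Split "description_es" into ("description", "_", "es").
        field, _, lang = key.rpartition("_")
        if field in translatable and lang and isinstance(value, str):
            result.setdefault(f"{field}_translation", {})[lang] = value
        else:
            result[key] = upgrade(value, translatable)
    return result


old = {
    "tender": {
        "item": {
            "description": "Software consultancy services",
            "description_es": "Servicios de consultoria en software",
        }
    }
}
new = upgrade(old)
```

The unsuffixed `description` is left in place, which matches the point that the main fields stay mandatory.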

timgdavies · 2016-11-12T15:40:35Z

I think downside 3 (multiple equal languages) will be a particular blocker to this issue.

This does make me wonder whether we should take an entirely different track - and say that OCDS does not support in-file translations, and that translations should be provided in a separate file for each language - served up via content negotiation or alternative URI structure....
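
For the content negotiation idea, a server could pick the per-language file from the request's Accept-Language header. The sketch below is a simplified parser, not a complete implementation of HTTP language negotiation; the function names and the `releases.<lang>.json` naming scheme are assumptions.

```python
def pick_language(accept_language, available, default="en"):
    """Choose the best available language for an Accept-Language header value."""
    candidates = []
    for part in accept_language.split(","):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        candidates.append((quality, tag))
    # Highest quality first; the sort is stable, so header order breaks ties.
    for quality, tag in sorted(candidates, key=lambda c: -c[0]):
        primary = tag.split("-")[0]
        if quality > 0 and primary in available:
            return primary
    return default


def file_for(accept_language, available=("en", "fr")):
    # Hypothetical naming scheme: one package file per language.
    return f"releases.{pick_language(accept_language, available)}.json"
```

An alternative URI structure (for example a language segment in the path) would avoid header parsing altogether, at the cost of one URL per language.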

mpostelnicu · 2016-11-13T13:59:17Z

Hey,

Just an idea of handling this using oneOf draft 04 schema keyword http://json-schema.org/example2.html.

It is fully backwards compatible with our (old) non translated schema. I've made an example of the Item element below:

"Item": {
    "type": "object",
    "description": "A good, service, or work to be contracted.",
    "required": ["id"],
    "properties": {
        "id": {
            "description": "A local identifier to reference and merge the items by. Must be unique within a given array of items.",
            "type": ["string", "integer"],
            "mergeStrategy": "overwrite"
        },
        "description": {
            "description": "A description of the goods, services to be provided.",
            "mergeStrategy": "ocdsVersion",
            "type": ["array", "string"],
            "oneOf": [
                {
                    "type": "string"
                },
                {
                    "description": "An array of translated strings, for multilingual support",
                    "type": "array",
                    "minItems": 1,
                    "items": {
                        "$ref": "#/definitions/TranslatedString"
                    }
                }
            ]
        }
    }
}

......

"definitions": {
    "TranslatedString": {
        "type": "object",
        "properties": {
            "languageCode": {
                "description": "ISO 639 language code",
                "type": "string",
                "minLength": 2,
                "maxLength": 2
            },
            "value": {
                "description": "The value of the string in the specified language",
                "type": "string"
            }
        }
    }
}

Using this schema change, you could validate both this:

{
    "id": "1",
    "date": "2016-09-22T06:38:12Z",
    "tag": [
        "planning"
    ],
    "initiationType": "tender",
    "ocid": "ocds-11",
    "tender": {
        "id": "ocds-11",
        "status": "active",
        "items": [
            {
                "id": "213966",
                "description": [
                    {
                        "languageCode": "ro",
                        "value": "Descriere"
                    },
                    {
                        "languageCode": "en",
                        "value": "Some Description"
                    }
                ],
                "classification": {
                    "id": "5",
                    "description": "Five"
                }
            }
        ]
    }
}

and this (the old format)

{
    "id": "1",
    "date": "2016-09-22T06:38:12Z",
    "tag": [
        "planning"
    ],
    "initiationType": "tender",
    "ocid": "ocds-11",
    "tender": {
        "id": "ocds-11",
        "status": "active",
        "items": [
            {
                "id": "213966",
                "description": "Description",
                "classification": {
                    "id": "5",
                    "description": "Five"
                }
            }
        ]
    }
}

I tested all the JSON snippets above and they are valid (against both the old and this new schema).
Hope it is helpful.

timgdavies · 2016-11-14T09:23:05Z

Thanks @mpostelnicu

I can see this works well from a schema perspective - but my fear is that it is difficult for users to deal with - if they have to anticipate either a string or an object.

I guess this can be handled by:

Providing technically skilled users with guidance on the need to wrap any requests for string values in some sort of helper function which will handle language maps;
Providing more basic users with some conversion tooling which will simplify a file into a single-language version, or otherwise make it easy to work with language maps
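
A helper of the kind described in the first point could be sketched as follows. The name `localized` and its fallback order are assumptions; it accepts a plain string, a language map, or the array form from the proposal above.

```python
def localized(value, lang, default_lang="en"):
    """Return a plain string from a value that may be a string, a language
    map ({"en": ...}) or a list of {"languageCode", "value"} objects."""
    if value is None or isinstance(value, str):
        return value
    if isinstance(value, list):
        # Convert the array form into a language map first.
        value = {entry["languageCode"]: entry["value"] for entry in value}
    if lang in value:
        return value[lang]
    if default_lang in value:
        return value[default_lang]
    # Fall back to any available translation rather than failing.
    return next(iter(value.values()), None)
```

Data users would then call something like `localized(item["description"], "ro")` everywhere instead of reading the field directly, which is exactly the extra burden being weighed here.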

However, your proposal is particularly interesting with respect to backwards compatibility: data valid against 1.0 would still be valid against 1.1 if it included language maps in this way.

The pattern properties (title_es etc) could be deprecated, but not removed until 2.0.

kindly · 2016-11-22T16:09:22Z

Thank you @mpostelnicu

I think using the oneOf is the only way to get where we want, even though it would be a pain for some data consumers, as @timgdavies said. We also already use it in the record package schema.

However, I would not have the languages as a list, as it causes issues in making a flattened representation and limits validation options (i.e. it is hard to validate whether people have put the same language code twice).
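
To illustrate the validation concern: with the list form, spotting a repeated language code needs a separate check in code, along the lines of the sketch below (the function name is hypothetical). With an object keyed by language code, each code can only appear once as a key.

```python
from collections import Counter


def duplicate_language_codes(translations):
    """Return language codes that occur more than once in a list of
    {"languageCode", "value"} objects."""
    counts = Counter(entry["languageCode"] for entry in translations)
    return sorted(code for code, n in counts.items() if n > 1)


as_list = [
    {"languageCode": "ro", "value": "Descriere"},
    {"languageCode": "en", "value": "Some Description"},
    {"languageCode": "en", "value": "Another description"},
]
```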

I would suggest doing it like:

"definitions": {
    "multilingualString": {
        "oneOf": [
            {"type": "string"},
            {"type": "object", "properties": {"en": {"type": "string"}, ...}}
        ]
    }
}

(Here "..." stands for the other language codes; alternatively, use patternProperties.)

So any translatable fields would look like:

{
    "description": {"$ref": "#/definitions/multilingualString"},
    "title": {"$ref": "#/definitions/multilingualString"}
}
}

Use cases like Canada would patch the schema (in one place) to make en and fr required and to limit the properties to just those two (so any other language code would look like an extra field).
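
As a sketch of what such a one-place patch might do, the code below rewrites the object branch of a multilingualString definition so that only the given languages are declared and all of them are required. The `restrict_languages` helper and the two-letter pattern in the starting definition are assumptions made for illustration.

```python
import copy

# Assumed starting point: a patternProperties variant of the definition.
multilingual_string = {
    "oneOf": [
        {"type": "string"},
        {"type": "object", "patternProperties": {"^[a-z]{2}$": {"type": "string"}}},
    ]
}


def restrict_languages(definition, languages):
    """Return a copy whose object branch declares and requires `languages`."""
    patched = copy.deepcopy(definition)
    for branch in patched["oneOf"]:
        if branch.get("type") == "object":
            branch.pop("patternProperties", None)
            branch["properties"] = {lang: {"type": "string"} for lang in languages}
            branch["required"] = list(languages)
    return patched


canada = restrict_languages(multilingual_string, ["en", "fr"])
```

Since every translatable field references the same definition, patching it once would apply to all of them.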

timgdavies · 2017-01-27T16:06:33Z

Discussed with @kindly

We will work this up and put this forward to peer reviewers for 1.1, for a view on accepting or pushing to 2.0 (as it creates some backwards compatibility issues)

kindly · 2017-02-09T12:55:44Z

A first attempt at a patch for this can be found here.

https://github.com/open-contracting/ocds_upgrade_1_1_patches/blob/master/229_language_map/release-schema.json

At the moment this uses patternProperties for what can be in the Language Map, but we could rely on extensions to add particular languages to the map instead, or as well. Having the pattern properties lowers the initial barrier to using the language map but is less explicit.

More importantly, this patch breaks backwards compatibility and for that reason needs a lot of consideration before inclusion in a non-major release.

timgdavies · 2017-06-15T09:55:16Z

Moving to 2.0.

This was postponed beyond 1.1. Whilst this change is considered a good move long-run, it involves a backwards incompatible change, and would require substantial refactoring of documentation generation tools and other resources.

duncandewhurst · 2020-06-22T02:35:33Z

Jordan expressed a preference for translating entire releases, rather than in-file translations of individual fields.

jpmckinney · 2023-06-07T04:32:47Z

I checked using the aggregated data at https://ocdsdata.fra1.digitaloceanspaces.com/metadata/stats.json. It's missing changes to spiders since a year ago, but in short the main users of _xx suffixes are (I don't match the full pattern, because that pattern matches almost any non-translation field):

armenia (lapsed)
canada_buyandsell (pilot)

Then there are a few fields for:

honduras_portal_api_records 2023-06-06 classifications_ga
honduras_portal_api_records 2023-06-06 classifications_ue
honduras_portal_api_releases 2023-05-30 classifications_ga
honduras_portal_api_releases 2023-05-30 classifications_ue
honduras_portal_bulk 2023-06-04 classifications_ga
honduras_portal_bulk 2023-06-04 classifications_ue
kyrgyzstan 2023-01-06 identifier_legalName_kg

So, I think it's okay to abandon in-file translations as a standardized method. We can recommend full-file translations for publishers like Canada, and for the few fields above, there can be local extensions.
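
For anyone wanting to repeat a check like this, a simplified suffix match could look like the sketch below. It assumes a plain list of field names as input (the structure of stats.json is not reproduced here), and the two names not taken from the list above are made-up samples.

```python
import re
from collections import Counter

# Simplified check: a field name ending in an underscore and two lowercase
# letters. The full language-tag pattern is deliberately not used here.
SUFFIX = re.compile(r"_([a-z]{2})$")


def count_suffixed_fields(field_names):
    """Count field names by their two-letter suffix."""
    counts = Counter()
    for name in field_names:
        match = SUFFIX.search(name)
        if match:
            counts[match.group(1)] += 1
    return counts


names = [
    "description",
    "description_fr",  # made-up sample
    "title_es",  # made-up sample
    "classifications_ga",
    "classifications_ue",
    "identifier_legalName_kg",
]
counts = count_suffixed_fields(names)
```

A match only shows that a name looks like a suffixed field; deciding whether it is a real translation still needs a manual look.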

What do you think, @yolile?

jpmckinney · 2023-06-07T18:10:25Z

Noting that we can add some guidance relating to this (#1064).

yolile · 2023-06-07T18:35:16Z

The Honduras ones look more like typos than actual use of in-file translations. So I think it is fine to abandon them.

duncandewhurst · 2023-06-18T23:53:04Z

Abandoning in-file translations sounds good to me. OC4IDS 0.9.4 adds some guidance that might be relevant: https://standard.open-contracting.org/staging/infrastructure/0.9-dev/en/guidance/language/#publishing-data-in-your-own-language

Clarifying what is required to close this issue:

1. Identify and remove mechanisms for in-file translations in the schema
2. Identify and remove normative and non-normative documentation about in-file translations
3. Add non-normative documentation about publishing full-file translations

Sound good?

jpmckinney · 2023-06-19T17:58:03Z

Sounds good! (1) is mostly removing the patternProperties fields.
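
As a rough sketch of that removal, a script could walk the schema and drop every `patternProperties` key. The function name is hypothetical, the pattern in the sample is a simplified stand-in for the real one, and a property that is itself named `patternProperties` would also be dropped by this naive version.

```python
def remove_pattern_properties(schema):
    """Return a copy of `schema` with every `patternProperties` key dropped."""
    if isinstance(schema, dict):
        return {
            key: remove_pattern_properties(value)
            for key, value in schema.items()
            if key != "patternProperties"
        }
    if isinstance(schema, list):
        return [remove_pattern_properties(entry) for entry in schema]
    return schema


schema = {
    "definitions": {
        "Item": {
            "type": "object",
            "properties": {"description": {"type": ["string", "null"]}},
            # Simplified stand-in for the real language-suffix pattern.
            "patternProperties": {
                "^description_[a-z]{2}$": {"type": ["string", "null"]}
            },
        }
    }
}
cleaned = remove_pattern_properties(schema)
```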

duncandewhurst · 2023-11-28T01:57:53Z

@jpmckinney for (3), would reframing the OC4IDS guidance on publishing data in your own language as a worked example on translations be sufficient?

jpmckinney · 2023-11-28T22:55:08Z

Yes, that would be a good start. I think for the lookup we can just have a list of translatable fields (as it is otherwise very long).

duncandewhurst · 2023-11-29T22:21:11Z

Sounds good! (1) is mostly removing the patternProperties fields.

I'm assuming that we actually want to deprecate the patternProperties fields. Let me know if not.

jpmckinney · 2023-11-29T23:29:21Z

lib-cove doesn't check patternProperties in _get_schema_deprecated_paths. I think better to just remove, as it will solve a bug related to patternProperties (open-contracting/lib-cove-ocds#73). Users will just get warnings about additional fields instead of warnings about deprecated fields – it's about equivalent.
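
To show the effect in the simplest terms: once the pattern is gone, a suffixed field is just a key the schema does not declare. The sketch below is not how lib-cove works internally; `additional_fields` is a hypothetical one-level check.

```python
def additional_fields(data, schema):
    """List keys of `data` that the schema's `properties` do not declare."""
    declared = schema.get("properties", {})
    return sorted(key for key in data if key not in declared)


item_schema = {"properties": {"id": {}, "description": {}}}
item = {"id": "1", "description": "Description", "description_es": "Descripcion"}
```

Here `additional_fields(item, item_schema)` reports `description_es`, which is the kind of warning users would see in place of a deprecation notice.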

duncandewhurst · 2023-11-29T23:34:36Z

Sounds good! Shall I remove or deprecate the language section of the schema reference page?

jpmckinney · 2023-11-29T23:39:06Z

I think just delete, since that content can't be followed if patternProperties is removed. Implementers of 1.1 can navigate to the docs for 1.1.

timgdavies added the 1 - Discussion label May 20, 2016

timgdavies added this to the Version 2.0 milestone Jul 25, 2016

timgdavies assigned kindly and timgdavies Aug 3, 2016

timgdavies closed this as completed Sep 18, 2016

kindly reopened this Nov 10, 2016

timgdavies modified the milestones: Version 1.1, Version 2.0 Jan 27, 2017

timgdavies added 3 - Ready and removed 1 - Discussion labels Feb 8, 2017

duncandewhurst mentioned this issue Apr 28, 2017

Adding support for language suffixes to address fields #451

Closed

timgdavies removed this from the Version 1.1 milestone Jun 15, 2017

timgdavies added 1 - Discussion and removed 3 - Ready labels Jun 15, 2017

timgdavies added this to the Version 2.0 milestone Jun 15, 2017

jpmckinney mentioned this issue Aug 6, 2017

Archive some more repositories #529

Closed

6 tasks

jpmckinney mentioned this issue Sep 24, 2019

Guidance: Publishing OCDS in your language #925

Closed

jpmckinney changed the title ~~Use language maps (2.0) or abandon in-file translations (1.2)~~ Discussion: Use language maps (2.0) or abandon in-file translations (1.2) Jul 17, 2020

jpmckinney changed the title ~~Discussion: Use language maps (2.0) or abandon in-file translations (1.2)~~ Discussion: Abandon in-file translations (1.2) or use language maps (2.0) Jul 17, 2020

jpmckinney removed the Focus - Internationalization label Jul 18, 2020

jpmckinney mentioned this issue Sep 17, 2020

Fixed eligibilityCriteria description, added .tender.selectionCriteria #1072

Closed

jpmckinney added the discussion label Oct 24, 2020

jpmckinney changed the title ~~Discussion: Abandon in-file translations (1.2) or use language maps (2.0)~~ Abandon in-file translations Jun 7, 2023

jpmckinney removed the discussion label Jun 7, 2023

jpmckinney mentioned this issue Jun 7, 2023

Guidance: Publishing OCDS in your language #1064

Open

duncandewhurst changed the title ~~Abandon in-file translations~~ Abandon in-file translations, document full-file translation guidance Jun 19, 2023

duncandewhurst mentioned this issue Nov 29, 2023

Abandon in-file translations, document full-file translation guidance #1665

Merged

jpmckinney closed this as completed in #1665 Dec 14, 2023

jpmckinney added this to OCDS 1.2 Jul 2, 2024

jpmckinney moved this to Done in OCDS 1.2 Jul 2, 2024
