Abandon in-file translations, document full-file translation guidance #229
Comments
👍 |
Looks to me like JSON-LD can also handle the current OCDS structure - #40 (comment)
This would almost certainly break backwards compatibility, at least for data users. Is there a way of changing this that wouldn't? |
We can't currently see a way of changing language which does not involve a backwards-incompatible change, so this would have to go on the stack for a 2.0 update right now. |
After further team discussions, we agreed that this would not likely be a desirable change, even for 2.0, so closing. |
Can the reasons be summarized? |
I've unfortunately lost my full notes of the discussion, so @Bjwebb and others may be able to add, but from memory:
|
Reopening this issue as nobody seems too happy with the current approach. My suggestion would be to have a Translation definition in the core schema, keeping its properties field blank for now. It would look like this:
"definitions": {
  "Translation": {
    "type": "object",
    "properties": {}
  }
}
For any field that requires different language versions, you would add a reference to that object, i.e.
{
  "description_translation": {"$ref": "#/definitions/Translation"},
  "title_translation": {"$ref": "#/definitions/Translation"}
}
You would need this for every field that needs translating, and it would replace the pattern properties used currently. There would need to be a repo with extensions for every language. These extensions would place a language code into the Translation definition. For French, say, the release-schema.json would look like:
"definitions": {
  "Translation": {
    "properties": {
      "fr": {
        "type": "string"
      }
    }
  }
}
Doing it this way means that if you apply the language extensions, they will work with all other extensions (i.e. they are composable and do not require special ordering of the patches applied to the schema), as long as those extensions define translatable fields the same way as above. Downsides for this approach are:
|
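To illustrate the composability claim in the proposal above, here is a minimal sketch, assuming language extensions are applied to the core schema as JSON Merge Patches (RFC 7396). The `merge_patch` helper and the Spanish extension are illustrative and not part of the proposal; only the core and French fragments come from the comment.

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) and return the patched copy."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target outright.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in a merge patch means "remove this member".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Core schema fragment with an empty Translation definition.
core = {"definitions": {"Translation": {"type": "object", "properties": {}}}}

# Per-language extensions; the Spanish one is a hypothetical second extension.
french = {"definitions": {"Translation": {"properties": {"fr": {"type": "string"}}}}}
spanish = {"definitions": {"Translation": {"properties": {"es": {"type": "string"}}}}}

# Applying the extensions in either order gives the same schema.
fr_then_es = merge_patch(merge_patch(core, french), spanish)
es_then_fr = merge_patch(merge_patch(core, spanish), french)
```

Because each extension only adds its own key under `properties`, the order of application does not matter in this sketch.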
I think downside 3 (multiple equal languages) will be a particular blocker to this issue. This does make me wonder whether we should take an entirely different track - and say that OCDS does not support in-file translations, and that translations should be provided in a separate file for each language - served up via content negotiation or alternative URI structure.... |
Hey, Just an idea of handling this using It is fully backwards compatible with our (old) non translated schema. I've made an example of the Item element below:
Using this schema change, you could validate both this:
and this (the old format)
I tested all the json snippets above and they are valid (against both old and this new schema). |
Thanks @mpostelnicu I can see this works well from a schema perspective - but my fear is that it is difficult for users to deal with - if they have to anticipate either a string or an object. I guess this can be handled by:
However, your proposal is particularly interesting with respect to backwards compatibility: data valid against 1.0 would still be valid against 1.1 if it included language maps in this way. The pattern properties ( |
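As a rough illustration of what "anticipating either a string or an object" could mean for a data user, here is a minimal sketch. It assumes the translated form is an object keyed by language code; the `get_text` helper, its `preferred` parameter and the sample values are hypothetical and not part of any proposal in this thread.

```python
def get_text(value, preferred=("en",)):
    """Return one string from a field that is either a plain string
    or a language map such as {"en": "...", "es": "..."}."""
    if value is None:
        return None
    if isinstance(value, str):
        # Old format: the field is already a single string.
        return value
    if isinstance(value, dict):
        # New format: pick the first preferred language that is present.
        for lang in preferred:
            if lang in value:
                return value[lang]
        # Otherwise fall back to whichever translation comes first.
        return next(iter(value.values()), None)
    raise TypeError(f"expected a string or a language map, got {type(value).__name__}")


plain = get_text("Road works")
spanish = get_text({"en": "Road works", "es": "Obras viales"}, preferred=("es",))
fallback = get_text({"fr": "Travaux routiers"})
```

A consumer would call such a helper on every translatable field, which is the extra burden described above.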
Thank you @mpostelnicu. I think using oneOf is the only way to get where we want, even though it would be a pain for some data consumers, as @timgdavies said. We also already use it in the record package schema. However, I would not have the language as a list, as that causes issues in making a flattened representation of it and limits validation options (i.e. it is hard to validate whether people have put the same language code twice). I would suggest doing it like:
So any translatable field would look like
Use cases like Canada's could patch the schema (in one place) to make en and fr required and to limit the properties to just those two (so any other language code would look like an extra field). |
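A minimal sketch of the kind of check a Canada-style patch would enforce, assuming a language map is an object keyed by language code with string values. The `check_language_map` function and its parameters are hypothetical; in practice this would be expressed in the schema (required properties, no additional properties) rather than in code.

```python
def check_language_map(value, required=(), allowed=None):
    """Return a list of problems found in a language map (empty if none)."""
    if not isinstance(value, dict):
        return ["language map must be an object"]
    problems = []
    for lang in required:
        if lang not in value:
            problems.append(f"missing required language: {lang}")
    for lang, text in value.items():
        # With an allowed set, any other code is treated as an extra field.
        if allowed is not None and lang not in allowed:
            problems.append(f"unexpected language: {lang}")
        if not isinstance(text, str):
            problems.append(f"value for {lang} must be a string")
    return problems


ok = check_language_map(
    {"en": "Road works", "fr": "Travaux routiers"},
    required=("en", "fr"),
    allowed={"en", "fr"},
)
bad = check_language_map(
    {"en": "Road works", "es": "Obras viales"},
    required=("en", "fr"),
    allowed={"en", "fr"},
)
```

Since an object cannot hold the same key twice, the duplicate-language problem of the list form does not arise here.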
Discussed with @kindly. We will work this up and put it forward to peer reviewers for 1.1, for a view on accepting it or pushing it to 2.0 (as it creates some backwards compatibility issues). |
A first attempt at a patch for this can be found here. At the moment this uses patternProperties for what can be in the Language Map, but we could rely on extensions to add particular languages to the map instead, or as well. Having the pattern properties lowers the initial barrier to using the language map but is less explicit. More importantly, this patch breaks backwards compatibility, and for that reason needs a lot of consideration before inclusion in a non-major release. |
Moving to 2.0. This was postponed beyond 1.1. Whilst this change is considered a good move long-run, it involves a backwards incompatible change, and would require substantial refactoring of documentation generation tools and other resources. |
Jordan expressed a preference for translating entire releases, rather than in-file translations of individual fields. |
I checked using the aggregated data at https://ocdsdata.fra1.digitaloceanspaces.com/metadata/stats.json. It's missing changes to spiders since a year ago, but in short the main users of _xx suffixes are (I don't match the full pattern, because that pattern matches almost any non-translation field):
Then there are a few fields for:
So, I think it's okay to abandon in-file translations as a standardized method. We can recommend full-file translations for publishers like Canada, and for the few fields above, there can be local extensions. What do you think, @yolile ? |
Noting that we can add some guidance relating to this (#1064). |
The Honduras ones are more like typos than actual use of in-file translations. So I think it is fine to abandon them. |
Abandoning in-file translations sounds good to me. OC4IDS 0.9.4 adds some guidance that might be relevant: https://standard.open-contracting.org/staging/infrastructure/0.9-dev/en/guidance/language/#publishing-data-in-your-own-language Clarifying what is required to close this issue:
Sound good? |
Sounds good! (1) is mostly removing the patternProperties fields. |
@jpmckinney for (3), would reframing the OC4IDS guidance on publishing data in your own language as a worked example on translations be sufficient? |
Yes, that would be a good start. I think for the lookup we can just have a list of translatable fields (as it is otherwise very long). |
I'm assuming that we actually want to deprecate the patternProperties fields. Let me know if not. |
lib-cove doesn't check |
Sounds good! Shall I remove or deprecate the language section of the schema reference page? |
I think just delete, since that content can't be followed if |
The current schema uses an approach to indicate language based on suffixes to property names.
This was discussed in #21 during the beta, and was chosen on the understanding that we wanted to avoid depth where we could (for easier flat renderings of the data). However, on reflection, the approach we have adopted may not have been the most appropriate.
In particular, @elf-pavlik has pointed out in #40 that JSON-LD uses language maps, and it would be much easier to create a JSON-LD rendering of our data this way.
Our flattening approach would also just render this to 'description/en', 'description/es' and so on, which seems fairly intuitive, and our earlier fear about over-use of objects does not appear to be a major one.
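As a minimal sketch of that flattening (the `flatten` function and the sample item are illustrative, not the actual flattening tool), nested objects are turned into slash-separated column names:

```python
def flatten(obj, parent=""):
    """Flatten nested objects into a single dict with '/'-joined keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{parent}/{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, such as a language map.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical item whose description is a language map.
item = {
    "id": "1",
    "description": {"en": "Road repairs", "es": "Reparación de carreteras"},
}
flat = flatten(item)
```

Here `flat` has the keys `id`, `description/en` and `description/es`, one column per language.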
However, to change to language maps would potentially be a backwards incompatible update.
It might make sense to look at doing it early: but would definitely need wide discussion.