Language maps on literals with no language don't play well together #480

Closed
gkellogg opened this issue Apr 12, 2017 · 23 comments


@gkellogg
Member

As described in this email, mixing literals having language, with those not having a language in a language map creates an odd structure when compacting:

If a string is associated with a property defined as a languageMap, but
does not have a language associated with it, it creates two keys in the
JSON ... the langStrings go in one, and the non-langStrings in another.
This is unintuitive and exposes some of the weirdest weirdness of RDF
(langStrings) to unsuspecting JSON developers.

A proposal to discuss:

If compaction would result in an attempt to add a string without an
associated language into a LanguageMap, then the processor SHOULD assign
the undefined language code UND as the key in the array.

Thus,
_:x rdfs:label "Fish"@en, "Poisson"@fr, "51234" .

Would result in:

{
  "@id": "_:x",
  "label": {"en": "Fish", "fr": "Poisson", "UND": "51234"}
}

Rather than the current compaction result:

{
  "@id": "_:x",
  "label": {"en": "Fish", "fr": "Poisson"},
  "rdfs:label": "51234"
}

Notes:

  • PHP does not support "" as a key in a dictionary, and thus UND as the key
  • This does not propose an inverse expansion rule, in case someone has an
    explicit @und langString [seems unlikely], where it should not become a
    regular xsd:string
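
To make the two outcomes concrete, here is a minimal Python sketch of the compaction step for a single language-mapped property. It is not taken from any JSON-LD processor: the function name `compact_language_map`, the `fallback_key` parameter, and the simplified value-object shape (one value per language) are assumptions made for illustration.

```python
from typing import Optional


def compact_language_map(values, term, iri, fallback_key: Optional[str] = None):
    """Group expanded value objects for one property into compacted output.

    values: list of dicts such as {"@value": "Fish", "@language": "en"}.
    term: the compact term defined with "@container": "@language".
    iri: the compact IRI used when a value cannot go into the language map.
    fallback_key: if given, language-less strings are stored under this key
        (the proposal); if None, they fall out under `iri` (current behaviour).
    Simplification: a later value for the same language overwrites an earlier one.
    """
    lang_map = {}
    leftovers = []
    for v in values:
        lang = v.get("@language")
        if lang is not None:
            lang_map[lang] = v["@value"]
        elif fallback_key is not None:
            lang_map[fallback_key] = v["@value"]
        else:
            leftovers.append(v["@value"])

    result = {term: lang_map}
    if leftovers:
        # A single leftover is emitted as a scalar, several as a list.
        result[iri] = leftovers[0] if len(leftovers) == 1 else leftovers
    return result


labels = [
    {"@value": "Fish", "@language": "en"},
    {"@value": "Poisson", "@language": "fr"},
    {"@value": "51234"},
]

current = compact_language_map(labels, "label", "rdfs:label")
proposed = compact_language_map(labels, "label", "rdfs:label", fallback_key="UND")
```

With `fallback_key` left unset the result has the two keys shown under "current compaction result"; with `fallback_key="UND"` everything lands in the single `label` map.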

References:

digitalbazaar/jsonld.js#151
IIIF/api#755

@gkellogg gkellogg added this to the JSON-LD 1.1 milestone Apr 12, 2017
@gkellogg gkellogg self-assigned this Apr 12, 2017
@gkellogg
Member Author

Jakob Voß notes that "und" actually means "Unknown language", and "zxx" may be more appropriate, which means "No linguistic content/Not applicable".

Perhaps this (or something similar) would be a way to address this. When trying to use a language-mapped term with a literal with no language, the language tag "zxx" is used as a key. {"@value": "foo", "@language": "zxx"} should probably also work and perhaps just expand to {"@value": "foo"}.

cc/ @azaroth42

@azaroth42
Contributor

Thanks for making the issue!

UND is "Undetermined" ... which, to me, captures both "q2341234"@UND where there is no linguistic content, and "fish"@UND where there is, but the language isn't known.

As such, as a fallback for when there isn't a language specified, it makes more sense to me to use UND than the more explicit ZXX which would be incorrect for the "fish" case.

@azaroth42
Contributor

And, with all due respect to Jakob ... the I18N group agrees:

https://www.w3.org/International/questions/qa-no-language

@workergnome

workergnome commented Apr 12, 2017 via email

@gkellogg
Member Author

@workergnome Yes, but we need to consider the results of expanding and removing the concept of a language-map, or compacting something that didn't originally have a language map.

But we could decide that, for JSON-LD, the use of a language-map term is sufficient to indicate that literal values without an explicit @type are considered to have an unknown language (und); in most cases, this is probably true.

@gkellogg
Member Author

Jakob Voß writes:

"" is a legal object key in JSON and PHP just happens to be one programming language with full support of JSON, so why design JSON-LD with focus on a particular choice of implementation in PHP? The language can deal with "" keys in JSON data pretty well:

<?php
$json = '{"":1}';
$data = json_decode($json, true);
echo $data[""]; # prints '1'
?>

If you prefer PHP objects over PHP arrays, the "" can internally be replaced by another special value but that's a personal choice of implementation and out of the scope of JSON-LD.

Using "" for non-language strings looks like the cleanest and most obvious solution to me.
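
For comparison, the same check in Python (used here only as a neutral second language; the thread itself does not discuss it) suggests that an empty-string key survives a JSON round trip in a language whose dictionaries accept any string key. The variable names are illustrative.

```python
import json

# "" is a legal JSON object key, so it parses like any other key.
data = json.loads('{"": 1, "en": 2}')
empty_value = data[""]

# Serialize back to JSON text and parse again to confirm nothing is lost.
round_tripped = json.loads(json.dumps(data))
```

This only covers the dictionary case; it says nothing about the PHP object representation discussed in the replies that follow.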

@dlongley
Member

Cross-posting from the mailing list in response to Jakob Voß's comments:

I don't remember all of the implementation trouble related to the use of empty strings in PHP, but I do remember it being more complex than is being hinted at here. I'm pretty sure that you can't use PHP arrays because you lose the ability to easily distinguish between arrays and objects, which is a requirement for proper implementation.

In any event, how difficult it is to implement a syntax's processing rules in common programming languages should absolutely be a factor in related design decisions -- otherwise adoption could be harmed in a significant way.

It would indeed be nice if there were no issue here, but, unfortunately, I'm not yet convinced that's true. My memory is that the two PHP implementers agreed that this was a pain point worth avoiding. If we can get a new implementer to step forward (or PRs to the existing implementations) that clearly demonstrate that this problem can be fully avoided without serious detriment to performance or significantly increased implementation complexity, then I'll agree that we should no longer consider it when making future design decisions.

@dlongley
Member

Related: https://bugs.php.net/bug.php?id=46600

It seems that this limitation in PHP may have finally been dealt with in June of last year -- we'll need to figure out which version of PHP has the fix (and how prevalent), if true.

@dlongley
Member

PHP 7.1 (the latest version) now supports empty string properties in objects:

http://php.net/manual/en/migration71.incompatible.php

Decoding an empty key now results in an empty property name, rather than _empty_ as a property name.

@azaroth42
Contributor

Fantastic ... back to the list for a revised proposal...

@dlongley
Member

@azaroth42,

Fantastic ... back to the list for a revised proposal...

Keep in mind that hardly anyone is using 7.1 yet -- I suspect it may take a while for it to get sufficient adoption. So that's a consideration here. Long term, however, I think we can rid ourselves of this particular annoyance. :)

@azaroth42
Contributor

True, but by the time JSON-LD 1.1 hits TR, hopefully that will have changed. And if we can push people in the right direction if they need a particular feature in a particular language, that seems like an acceptable situation to me.

@gkellogg
Member Author

gkellogg commented Apr 14, 2017 via email

@lanthaler
Member

Leaving the PHP issue aside, this would fundamentally change how compaction works. Till now, it didn't implicitly introduce values. This would change that. To be consistent, we would need to do the same with other containers... such as @index.

The solution I normally recommend for this is to define an additional term like fallbackLabel.

@azaroth42
Contributor

With the "" option, it would continue to not introduce values. It's just a significantly more convenient syntax than needing both label_with_language and label_without_language keys.

I do agree that we should consider consistency with other containers, however. Will work on that.

@azaroth42
Contributor

After re-reading @index a couple of times, I'm sorry but I don't see how this applies?

@index containers explicitly persist through compaction and expansion, and there's no way to generate them from RDF directly. It seems to me like the only ramification is that we would allow "" as a key in an @index? Could you expand your comment a little please @lanthaler?

@gkellogg
Member Author

gkellogg commented May 3, 2017

@lanthaler the idea is to not need to use different properties for such cases. If you have some dc:title properties, some with language, and some without, splitting these between different properties makes this less convenient, not more convenient for developers.

One possibility is to use the language tag @none as a stand-in for no language, which might look like the following:

{
  "title": {
    "en": "The Queen",
    "@none": "alternate value, without language"
  }
}

I would say that this is equivalent to zxx ("No linguistic content/Not applicable"), but more consistent with keyword use in JSON-LD (although this comes from framing).

Furthermore, to follow Postel's Law, we might treat @none, zxx, and und as equivalent when expanding a language map (and possibly a value object), so that each expands to a value object with no @language (or @type). Compacting to a language map would also consider literals without type or language and gather them under the @none key.
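
A rough Python sketch of the lenient expansion floated in this comment follows. It reflects only this suggestion, not any published algorithm; the function name `expand_language_map` and the `NO_LANGUAGE_KEYS` set are assumptions for illustration.

```python
# Keys treated as "no language" under the lenient reading suggested above.
# This set is an assumption taken from the comment, not from any specification.
NO_LANGUAGE_KEYS = {"@none", "zxx", "und"}


def expand_language_map(lang_map):
    """Expand {"en": "The Queen", "@none": "..."} into a list of value objects."""
    expanded = []
    for key, value in lang_map.items():
        # A language map entry may hold one string or a list of strings.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if key.lower() in NO_LANGUAGE_KEYS:
                # No @language entry is attached for the stand-in keys.
                expanded.append({"@value": item})
            else:
                expanded.append({"@value": item, "@language": key})
    return expanded


title = {"en": "The Queen", "@none": "alternate value, without language"}
expanded_title = expand_language_map(title)
```

Note that this reading is lossy for data that really does carry a zxx or und tag, which is the open question raised in the next paragraph.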

We may want to consider what happens with RDF literals having zxx or und language tags, which I have never seen in real world data.

@lanthaler
Member

@lanthaler the idea is to not need to use different properties for such cases. If you have some dc:title properties, some with language, and some without, splitting these between different properties makes this less convenient, not more convenient for developers.

I'd argue that's not the case, at least not for the "" proposal. That feels very unnatural (to say the least) in basically every programming language. @none looks better but I feel that this is such a corner case with a not-too-bad workaround that I'd opt for consciously not solving it with syntactic sugar.

@lanthaler
Member

After re-reading @index a couple of times, I'm sorry but I don't see how this applies?

The proposal was to implicitly tag a string to have a language tag of "" or @none even though it wasn't there in the expanded form. To be consistent, we would need to do the same for index maps which I strongly think we should not do.

@index containers explicitly persist through compaction and expansion, and there's no way to generate them from RDF directly.

JSON-LD is a superset of RDF. We decided to guarantee lossless compaction/expansion but not roundtrips to RDF.

@gkellogg
Member Author

gkellogg commented May 3, 2017

Empty-string issues aside, I disagree that using @none in an index or language map is unnatural; I think we need to get some votes from other parties. Please 👍 or 👎 this comment based on your agreement with the proposal.

PROPOSAL: Index Maps and Language Maps may include the @none key; When expanding, values do not receive an @index or @language entry. When compacting, value objects having neither @language, @index, or @type are included within the mapped values.

The alternative, which exists currently, is that such values cannot be serialized using such a term, and are serialized either using another matching term, or using an absolute IRI.

@gkellogg
Member Author

gkellogg commented May 15, 2017

RESOLVED: Index Maps and Language Maps may include the @none key; When expanding, values do not receive an @index or @language entry. When compacting, value objects having neither @language, @index, or @type are included within the mapped values.
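
One way to read the compaction half of this resolution for a language map is sketched below in Python. This is an interpretation under simplifying assumptions, not the specification's algorithm: the function name `compact_to_language_map`, the returned tuple shape, and the example values are hypothetical.

```python
def compact_to_language_map(values):
    """Split expanded value objects into a language map and the rest.

    Returns (lang_map, remaining): values with "@language" are keyed by it,
    values with none of "@language", "@index" or "@type" go under "@none",
    and anything else (for example typed literals) is left in `remaining`
    for some other term or an absolute IRI.
    """
    lang_map = {}
    remaining = []
    for v in values:
        if "@language" in v and "@index" not in v:
            key = v["@language"]
        elif not any(k in v for k in ("@language", "@index", "@type")):
            key = "@none"
        else:
            remaining.append(v)
            continue
        if key in lang_map:
            # Collect several values for one key into a list.
            existing = lang_map[key]
            as_list = existing if isinstance(existing, list) else [existing]
            lang_map[key] = as_list + [v["@value"]]
        else:
            lang_map[key] = v["@value"]
    return lang_map, remaining


values = [
    {"@value": "The Queen", "@language": "en"},
    {"@value": "alternate value, without language"},
    {"@value": "1952", "@type": "http://www.w3.org/2001/XMLSchema#gYear"},
]
lang_map, remaining = compact_to_language_map(values)
```

The typed literal stays out of the map, which matches the requirement that only values lacking @language, @index, and @type are gathered under @none.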

@niklasl
Member

niklasl commented Jul 7, 2017

There might be a possible conflict here between language containers and the use of terms with an explicit language of null (i.e. string values). That is, for compaction in JSON-LD 1.0 you can define a language container term (e.g. labelByLang: {@id: rdfs:label, @container: @language}) and then a "companion term" catching the non-lang-tagged strings (e.g. labelString: {@id: rdfs:label, @language: null}) . It would be great if this feature doesn't interfere with that (and it would be backwards incompatible otherwise).
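
The JSON-LD 1.0 pattern described here can be sketched in Python as follows, to show the output that a backwards-compatible design would need to keep producing. The helper names `pick_terms` and `compact_labels` are hypothetical, and the term selection is deliberately naive rather than a rendering of the real term selection algorithm.

```python
# Context following the pattern described above: a language-map term plus a
# companion term with an explicit null language for plain strings.
context = {
    "labelByLang": {"@id": "rdfs:label", "@container": "@language"},
    "labelString": {"@id": "rdfs:label", "@language": None},
}


def pick_terms(context):
    """Return (language-map term, plain-string companion term) from a context."""
    lang_term = next(
        t for t, d in context.items() if d.get("@container") == "@language"
    )
    plain_term = next(
        t for t, d in context.items() if "@language" in d and d["@language"] is None
    )
    return lang_term, plain_term


def compact_labels(values, context):
    """Send language-tagged strings to the map and plain strings to the companion."""
    lang_term, plain_term = pick_terms(context)
    result = {}
    for v in values:
        if "@language" in v:
            result.setdefault(lang_term, {})[v["@language"]] = v["@value"]
        elif "@type" not in v:
            result.setdefault(plain_term, []).append(v["@value"])
        # Typed literals are outside this sketch and are skipped.
    return result


labels = [
    {"@value": "Fish", "@language": "en"},
    {"@value": "Poisson", "@language": "fr"},
    {"@value": "51234"},
]
compacted = compact_labels(labels, context)
```

If the @none key took precedence instead, "51234" would move into `labelByLang`, which is the incompatibility this comment warns about.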

gkellogg pushed a commit that referenced this issue Dec 7, 2017
* Add @graph container tests.

* A couple of more graph expansion and compaction tests for corner cases.

* Add tests for expanding and compacting named graphs where term definition includes `@graphid`.

* Expand and compact `@container: @index` where value is a graph.

* Disable highlight.js, and update our use of the "highlight" class to "hl-bold". This makes rendered JavaScript examples slightly less pretty, but restores the specific highlighting used with "****" in examples.

* Sort many definition lists automatically using `@data-sort`.

* Add @version to context definitions in examples using 1.1 features.

* Syntax updates to describe Graph Containers.

* Add inline ednotes for places that are affected by issue #480.

* Add compaction and expansion tests for `@graph` with `@index` and `@id`.

* Add syntax for graph maps and API algorithms for all graph containers.
gkellogg added a commit that referenced this issue Jan 17, 2018
… containers.

Changes behavior for id maps to use `@none` instead of a blank node identifier.

Fixes #480
@gkellogg
Member Author

Replaced by #569.
