Support for rdf:dirLangString #49

Closed
gkellogg opened this issue Jan 23, 2025 · 18 comments
Labels
spec:new-feature Change adding new functionality (class 4)

Comments

@gkellogg
Member

gkellogg commented Jan 23, 2025

Although HTML has a dir attribute that can be used to specify the initial text direction of text elements, XML has no such attribute defined.

In HTML, you can say something like the following (from here):

<blockquote dir="ltr" lang="en" cite="Romeo and Juliet (II, ii, 1-2)">But,
  soft! What light through yonder window breaks?
  It is the east, and Juliet is the sun.</blockquote>

In RDF/XML we would need to define an attribute to do what HTML dir does for RDF/XML literals. This could be by adding an attribute to the existing rdf namespace, or using some other namespace. SPARQL XML Results uses the Internationalization Tag Set its (xmlns:its="http://www.w3.org/2005/11/its").

Propagation rules would be like those for xml:lang, with the attribute defining how strings in descendant elements are treated. For instance, Example 8 might be updated as follows:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:its="http://www.w3.org/2005/11/its">

  <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar">
    <dc:title>RDF 1.2 XML Syntax</dc:title>
    <dc:title xml:lang="en">RDF 1.2 XML Syntax</dc:title>
    <dc:title xml:lang="en-US">RDF 1.2 XML Syntax</dc:title>
    <dc:title xml:lang="he" its:dir="rtl" its:version="2.0">HTML היא שפת סימון.</dc:title>
  </rdf:Description>

  <rdf:Description rdf:about="http://example.org/buecher/baum" xml:lang="de">
    <dc:title>Der Baum</dc:title>
    <dc:description>Das Buch ist außergewöhnlich</dc:description>
    <dc:title xml:lang="en">The Tree</dc:title>
  </rdf:Description>

</rdf:RDF>
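
As an illustration of the proposed propagation rule, here is a minimal Python sketch (not part of the proposal) in which its:dir is inherited by descendant elements the same way xml:lang is, and a nearer declaration overrides an inherited one. The function name `inherited_text_context` and the sample document are hypothetical; the sketch only tracks inheritance and does not implement RDF/XML parsing.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes namespaced attributes in "{namespace}local" form.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ITS_DIR = "{http://www.w3.org/2005/11/its}dir"

def inherited_text_context(elem, lang=None, direction=None):
    """Yield (tag, text, lang, direction) for every leaf element with text.

    Hypothetical helper: language and direction are inherited from
    ancestors unless the element sets its own value.
    """
    lang = elem.get(XML_LANG, lang)
    direction = elem.get(ITS_DIR, direction)
    children = list(elem)
    if not children and elem.text and elem.text.strip():
        # xml:lang="" resets the language, so map the empty string to None.
        yield (elem.tag, elem.text.strip(), lang or None, direction)
    for child in children:
        yield from inherited_text_context(child, lang, direction)

# Assumed sample input: direction and language set on the root element,
# then overridden on the second title.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:its="http://www.w3.org/2005/11/its"
         xml:lang="he" its:dir="rtl">
  <rdf:Description rdf:about="http://example.org/a">
    <dc:title>one</dc:title>
    <dc:title xml:lang="en" its:dir="ltr">two</dc:title>
  </rdf:Description>
</rdf:RDF>"""

for row in inherited_text_context(ET.fromstring(DOC)):
    print(row)
```

The first title inherits both values from the root element, while the second uses its own attributes.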
@gkellogg gkellogg added the spec:new-feature Change adding new functionality (class 4) label Jan 23, 2025
@Tpt

Tpt commented Jan 23, 2025

+1 to @gkellogg. I would use its for consistency with SPARQL XML Results.

However, this might slightly break backward compatibility:

A document:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.com/"
         xmlns:its="http://www.w3.org/2005/11/its">
  <rdf:Description rdf:about="foo" xml:lang="en" its:dir="ltr">
    <ex:title>Foo</ex:title>
  </rdf:Description>
</rdf:RDF>

that is parsed in RDF/XML 1.1 as:

ex:foo ex:title "Foo"@en ; its:dir "ltr" .

is now parsed:

ex:foo ex:title "Foo"@en--ltr .

@gkellogg
Member Author

However, this might slightly break backward compatibility:

Arguably, any use of its:dir in an RDF/XML document would have been broken already. But, pretty much any additional elements/attributes we provide could be thought of as breaking backwards compatibility if they happened to be in an old document where they were not taken into consideration.

@pchampin
Contributor

About backward compatibility:
In #50 you suggest adding version information. This could also be a solution to this backward compatibility issue:
If the version is explicitly 1.2 (or greater), then the http://www.w3.org/2005/11/its namespace is treated as a special one. Otherwise, it is treated as a regular prefix (for better or for worse).

@gkellogg
Member Author

Maybe we define an rdf:version attribute to be used at the top-level, where the rdf namespace is introduced. RDF 1.2 features are enabled when the processor sees rdf:version="1.2". This could also allow us to handle rdf:ID differently, to create a reifier instead of the statement triples, but that would be for another issue.

@afs
Contributor

afs commented Jan 25, 2025

A property of its:dir will be http://www.w3.org/2005/11/itsdir (no separator) so we might say "possible but acceptably unlikely".
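
To make the "no separator" point concrete, here is a small Python sketch, assuming the RDF/XML convention of forming a property IRI by concatenating the namespace name and the local name. The helper `property_iris` is hypothetical and is not an RDF/XML parser; it only shows the IRI that each attribute name would map to. The sample document is the one from the earlier comment.

```python
import xml.etree.ElementTree as ET

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.com/"
         xmlns:its="http://www.w3.org/2005/11/its">
  <rdf:Description rdf:about="foo" xml:lang="en" its:dir="ltr">
    <ex:title>Foo</ex:title>
  </rdf:Description>
</rdf:RDF>"""

def property_iris(elem):
    """Return namespace + local name for each namespaced attribute.

    Hypothetical helper: ElementTree stores names as "{namespace}local",
    so the IRI is the two parts joined with nothing in between.
    """
    iris = []
    for name in elem.attrib:
        if name.startswith("{"):
            namespace, local = name[1:].split("}", 1)
            iris.append(namespace + local)
    return iris

description = ET.fromstring(DOC)[0]
print(property_iris(description))
```

Because the ITS namespace does not end in `#` or `/`, the result for its:dir runs the two parts together, which is why a deliberate use of that IRI as a property in existing data seems unlikely.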

We can have dir -- a non-namespaced attribute is not allowed as a property. It is a 1.1 parse error - like parseType="Triple".

@rubensworks proposed its:dir in SPARQL XML Results, and because of that, and also the exemplar of xml:lang, in JSON results, "its:dir".

@gkellogg
Member Author

A property of its:dir will be http://www.w3.org/2005/11/itsdir (no separator) so we might say "possible but acceptably unlikely".

Note that its:dir isn't treated as an RDF property, but as an XML attribute, so its expansion isn't significant for the emitted triples.

We can have dir -- a non-namespaced attribute is not allowed as a property. It is a 1.1 parse error - like parseType="Triple".

+1

@rubensworks proposed its:dir in SPARQL XML Results, and because of that, and also the exemplar of xml:lang, in JSON results, "its:dir".

+1

@gkellogg
Member Author

  • Use of direction in literals is predicated on rdf:version being set at the top, with value at least "1.2".
  • its:dir also needs to be used with its:version="2.0", but Postel's Law would say we should accept it without the version being specified, although serializers MUST use it.

Here are some potential test cases:

Language with no direction

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ex="http://example.org/" xml:lang="en" rdf:version="1.2">
  <rdf:Description rdf:about="http://example.org/joe" ex:name="bar"/>
</rdf:RDF>

Should result in <http://example.org/joe> <http://example.org/name> "bar"@en .

Language with direction

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ex="http://example.org/" xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0" its:dir="ltr" xml:lang="en" rdf:version="1.2">
  <rdf:Description rdf:about="http://example.org/joe" ex:name="bar"/>
</rdf:RDF>

Should result in <http://example.org/joe> <http://example.org/name> "bar"@en--ltr .

Language with direction and no RDF version

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ex="http://example.org/" xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0" its:dir="ltr" xml:lang="en">
  <rdf:Description rdf:about="http://example.org/joe" ex:name="bar"/>
</rdf:RDF>

Should result in <http://example.org/joe> <http://example.org/name> "bar"@en . Note that direction is unused because rdf:version is not specified.

Language with direction and no ITS version

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ex="http://example.org/" xmlns:its="http://www.w3.org/2005/11/its" its:dir="ltr" xml:lang="en" rdf:version="1.2">
  <rdf:Description rdf:about="http://example.org/joe" ex:name="bar"/>
</rdf:RDF>

Could result in <http://example.org/joe> <http://example.org/name> "bar"@en--ltr . if we decide its:version is not required; otherwise, it would silently (or with a warning) generate <http://example.org/joe> <http://example.org/name> "bar"@en .

Language with direction on element directly

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ex="http://example.org/" xmlns:its="http://www.w3.org/2005/11/its" rdf:version="1.2">
  <rdf:Description rdf:about="http://example.org/joe">
    <ex:name xml:lang="en" its:version="2.0" its:dir="ltr">bar</ex:name>
  </rdf:Description>
</rdf:RDF>

Different form, with attributes on element, rather than inherited from ancestors.

Direction with no language

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ex="http://example.org/" xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0" its:dir="ltr" rdf:version="1.2">
  <rdf:Description rdf:about="http://example.org/joe" ex:name="bar"/>
</rdf:RDF>

Should result in <http://example.org/joe> <http://example.org/name> "bar" . as direction can't be used without language.
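
The expected results in the test cases above can be summarized in a short Python sketch. This is only an illustration under stated assumptions: the function names `version_at_least_1_2` and `literal_for` are hypothetical, its:version is not checked (the lenient reading), and the value is not escaped as real N-Triples output would require.

```python
def version_at_least_1_2(version):
    """Return True if a version string such as "1.2" is 1.2 or later."""
    if version is None:
        return False
    parts = version.split(".")
    try:
        major = int(parts[0])
        minor = int(parts[1]) if len(parts) > 1 else 0
    except ValueError:
        return False
    return (major, minor) >= (1, 2)

def literal_for(value, lang=None, direction=None, rdf_version=None):
    """Render a literal the way the test cases above describe.

    Assumptions: direction needs both a language and rdf:version >= 1.2;
    otherwise it is silently dropped. No string escaping is performed.
    """
    if not lang:
        # Direction cannot be used without a language.
        return f'"{value}"'
    if direction and version_at_least_1_2(rdf_version):
        return f'"{value}"@{lang}--{direction}'
    return f'"{value}"@{lang}'

print(literal_for("bar", lang="en", rdf_version="1.2"))
print(literal_for("bar", lang="en", direction="ltr", rdf_version="1.2"))
print(literal_for("bar", lang="en", direction="ltr"))
print(literal_for("bar", direction="ltr", rdf_version="1.2"))
```

If the group decided that its:version is mandatory, the sketch would need one more parameter and one more condition before emitting the direction.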

@rubensworks
Member

About backward compatibility:
in #50 you suggest to add a version information. This could also be a solution to this backward compatibility issue:
If the version is explicitly 1.2 (or greater), then the http://www.w3.org/2005/11/its namespace is treated as a special one. Otherwise, it is treated as a regular namespace (for better or for worse).

Just mentioning w3c/rdf-star-wg#141 here as well regarding the issue of backwards-compatibility, to have the discussions linked.

@afs
Contributor

afs commented Jan 27, 2025

Adding a version (here, Turtle, TriG) is a possibility and might be a good idea generally for the future.

RDF 1.2 does not change the vast majority of RDF data. Triple terms and base direction will be uncommon.

My concern is that we end up "splitting the world" into "RDF 1.1" and "RDF 1.2". All RDF 1.1 is valid RDF 1.2 and RDF 1.2 features will be uncommon (IMO).

If an RDF 1.2 system has to get clever (scan the data first, or implement additional content negotiation when it currently uses a plain framework for HTTP), I think we will see slower migration, or systems that are RDF 1.2 only, to the exclusion of RDF 1.1 consumers (e.g. new media types, with RDF 1.2 systems supporting only those media types; cf. XML 1.1).

@afs
Contributor

afs commented Jan 27, 2025

@gkellogg Thank you for the examples.

I can't work out from the spec if adding new vocabulary - rdf:version - is a parse error or not to rdf/xml 1.1. What do deployed parsers do here? "Language with direction" parses as legal RDF 1.1 if rdf:version is ignored.

We have to decide whether that is OK. Base direction is a somewhat different case to triple terms in this respect.

There is a fear of syntax errors. But rejecting requests (profiles, content-negotiation cases) is bad as well. There is no perfect answer.

I agree that base direction, no language should be ignored.

@gkellogg
Member Author

My concern is that we end up "splitting the world" into "RDF 1.1" and "RDF 1.2". All RDF 1.1 is valid RDF 1.2 and RDF 1.2 features will be uncommon (IMO).

I sympathize, and we faced similar discussions with JSON-LD 1.1 and its use of @version.

If an RDF 1.2 system has to get clever (scan the data first, or implement additional content negotiation when it currently uses a plain framework for HTTP), I think we will see slower migration, or systems that are RDF 1.2 only, to the exclusion of RDF 1.1 consumers (e.g. new media types, with RDF 1.2 systems supporting only those media types; cf. XML 1.1).

For JSON-LD, we made version 1.1 the default behavior, with a way to specify it so that 1.0 versions would recognize this. We could, of course, do away with rdf:version and leave it that systems which recognize its:dir operate differently. Even for Triple Terms, 1.1 systems that see parseType="Triple" should blow up. It's if we want to change the behavior of rdf:ID to create Triple Terms rather than reification triples that an explicit version is helpful; or, we could adopt another property (e.g., rdf:reifies) that would also be ignored by older implementations. But, that's for another PR.

@gkellogg Thank you for the examples.

I can't work out from the spec if adding new vocabulary - rdf:version - is a parse error or not to rdf/xml 1.1. What do deployed parsers do here? "Language with direction" parses as legal RDF 1.1 if rdf:version is ignored.

My parser (currently) just ignores this as an unknown attribute.

5.1.4 Attribute Event indicates that local names other than ID, about, resource, parseType, or type are forbidden, so this creates an opportunity for an extension point.

We have to decide whether that is OK. Base direction is a somewhat different case to triple terms in this respect.

xml:base is not treated as other attributes.

RDF/XML allows further abbreviating IRIs in XML attributes in two ways. The XML Infoset provides a base URI attribute xml:base that sets the base URI for resolving relative IRI references, otherwise the base URI is that of the document. The base URI applies to all RDF/XML attributes that deal with IRIs which are rdf:about, rdf:resource, rdf:ID and rdf:datatype.

@gkellogg gkellogg added the needs discussion Proposed for discussion in an upcoming meeting label Jan 27, 2025
@TallTed
Member

TallTed commented Jan 28, 2025

could adopt another property (e.g., rdf:reifies) that would also be ignored by older implementations

I would not want such predicates to be ignored, but rather, to be treated as just as opaque as any other unrecognized predicate.

Similarly, I would want RDF 1.1 systems to treat triple terms as literals, which, it occurs to me, could be done just by wrapping whatever markup code we finalize on in quotation marks, e.g. —

ex:subg rdf:reifies "<< ex:subj2 ex:pred ex:obj >>"

Hey, presto, it's a quoted triple and a quoted triple term! Maybe it warrants a specific literal type, like —

ex:subg rdf:reifies "<< ex:subj2 ex:pred ex:obj >>"^^rdf:tripleterm

@gkellogg
Member Author

could adopt another property (e.g., rdf:reifies) that would also be ignored by older implementations

I would not want such predicates to be ignored, but rather, to be treated as just as opaque as any other unrecognized predicate.

We don't have a choice about what 1.1 implementations do when faced with an attribute from the RDF namespace, as the spec says these are forbidden. Some implementations may throw an error, others may ignore it, and others may (erroneously) output a triple. It's the 1.2 behavior we can control, and it's an open question whether we use rdf:version and what the behavior is if it is not >= 1.2 or does not exist.

Similarly, I would want RDF 1.1 systems to treat triple terms as literals, which, it occurs to me, could be done just by wrapping whatever markup code we finalize on in quotation marks, e.g. —

ex:subg rdf:reifies "<< ex:subj2 ex:pred ex:obj >>"

Hey, presto, it's a quoted triple and a quoted triple term! Maybe it warrants a specific literal type, like —

ex:subg rdf:reifies "<< ex:subj2 ex:pred ex:obj >>"^^rdf:tripleterm

Again, we can't change the behavior of 1.1 systems, and 1.2 systems should output a Triple Term.

@TallTed
Member

TallTed commented Jan 28, 2025

Again, we can't change the behavior of 1.1 systems,

Yes. But what is the correct behavior of 1.1 systems that encounter this triple?

ex:subg rdf:reifies "<< ex:subj2 ex:pred ex:obj >>"^^rdf:tripleterm .

If they are required to choke on rdf:tripleterm because of the prefix, then change the prefix to ex: or rdfs: or whatever doesn't cause a choke but simply gets treated as "unknown", so that it remains an unparsed but validly typed literal.

I am thinking that an RDF&SPARQL 1.2-aware application could use SPARQL 1.1 to query a SPARQL 1.1 engine atop a RDF 1.1 datastore that holds data such as the above, and get output including the above triple, and handle the literal typed as tripleterm as a tripleterm. SPARQL UPDATE 1.1 queries could be used to insert/update that data.

@afs
Contributor

afs commented Jan 29, 2025

See the schema in the RDF/XML spec.

I use Emacs nXML mode, which is RDF/XML schema aware. The XML attribute rdf:version on rdf:RDF is flagged as invalid. It is not flagged as an error on a nodeElement.

@afs
Contributor

afs commented Jan 29, 2025

I would want RDF 1.1 systems to treat triple terms as literals

But what is the correct behavior of 1.1 systems that encounter this triple?

ex:subg rdf:reifies "<< ex:subj2 ex:pred ex:obj >>"^^rdf:tripleterm .

That is well-defined in RDF 1.1 and would be the same in RDF 1.2 (due to the charter). The triple term would be in the value space (if possible at all), not part of the RDF abstract data model. Triple terms as datatype literals has been discussed before.

It is not related to RDF/XML - discussion needs to be in the unstar discussions.

@FabienGandon

I believe that it would be good practice (from now on) to encourage the explicit mention of the RDF version, as in rdf:version="1.2".

As for the direction I agree with @gkellogg : its:dir used with its:version="2.0" with Postel's Law.

In other words I like the proposal of recommending
its:version="2.0" its:dir="ltr" xml:lang="en" rdf:version="1.2"
but accepting to parse
its:dir="ltr" xml:lang="en" rdf:version="1.2"

@gkellogg gkellogg removed the needs discussion Proposed for discussion in an upcoming meeting label Feb 20, 2025
@gkellogg
Member Author

Closed via #54.


7 participants