
Drop the requirement to support ill-typed literals with recognized datatype IRIs #60

Open
wouterbeek opened this issue Aug 30, 2023 · 33 comments · May be fixed by #124
Labels
needs discussion: Proposed for discussion in an upcoming meeting
spec:enhancement: Change to enhance the spec without affecting conformance (class 2); see also spec:editorial

Comments

@wouterbeek

Observation

RDF 1.1 requires that implementations support ill-typed literals, including ill-typed literals with recognized datatype IRIs.

Ill-typed literals with recognized datatype IRIs do not have any known use cases. They are semantically inconsistent, do not denote anything, have no value, and any triple that contains them is false in every interpretation.

Notice that there is nothing wrong with requiring implementations to support ill-typed literals with unrecognized datatype IRIs. For example, it is good that RDF implementations are required to support literals like [1] that have a datatype IRI that is not broadly recognized.

[1] '### Header'^^<https://example.com/markdown>

However, it is unclear why implementations are allowed to support, let alone are required to support, ill-typed literals with recognized datatype IRIs.

Example

Suppose a triple store recognizes the RDF datatype IRIs + the XSD datatype IRIs + the GeoSPARQL datatype IRIs. Such a triple store can, upon data ingest, immediately detect that [2] and [3] are ill-typed literals with a recognized datatype IRI.

[2] 'Yes'^^xsd:boolean
[3] 'The sea is everything. It covers seven tenths of the terrestrial globe.'^^xsd:boolean

The RDF 1.1 standard forbids triple stores from throwing an error upon encountering data that contains [2] or [3], even though this may be the preferred data quality approach for many users.
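To make the example concrete, here is a minimal Python sketch of the check such a store could run at ingest time for xsd:boolean. The function name is hypothetical, and the sketch assumes the store compares the lexical form against the XML Schema lexical space exactly, without any whitespace normalization.

```python
# Lexical space of xsd:boolean (XML Schema Part 2): exactly these four strings.
# Assumption: no whitespace normalization is applied, so " true" counts as ill-typed.
BOOLEAN_LEXICAL_SPACE = frozenset({"true", "false", "1", "0"})


def is_ill_typed_boolean(lexical_form: str) -> bool:
    """True if lexical_form is NOT in the lexical space of xsd:boolean."""
    return lexical_form not in BOOLEAN_LEXICAL_SPACE


# Literals [2] and [3] from the example, plus a well-typed control case.
samples = [
    "Yes",
    "The sea is everything. It covers seven tenths of the terrestrial globe.",
    "true",
]
for lexical in samples:
    status = "ill-typed" if is_ill_typed_boolean(lexical) else "well-typed"
    print(f"{lexical!r}^^xsd:boolean -> {status}")
```

A store that recognizes xsd:boolean could run a check like this per literal and then decide, by policy, whether to reject, warn, or keep.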

Suggestion

In RDF 1.2, let's weaken the RDF 1.1 phrase "Implementations MUST accept ill-typed literals" to:

  1. "Implementations MUST support ill-typed literals with unrecognized datatype IRIs."
  2. "Implementations MAY support ill-typed literals with recognized datatype IRIs."

Implementations MUST support the RDF datatype IRIs, and MAY support any other datatype IRIs that they believe important enough for their users. The notion "recognized datatype IRI" is used as defined in RDF 1.1 Semantics.

Ramifications

The proposed change makes it possible for RDF 1.2 data to be accepted in one implementation, but not in another implementation. For example, it is possible to upload data that contains literals [2] and [3] into an implementation that does not recognize the xsd:boolean datatype IRI. But it is not possible to upload the same data into an implementation that does recognize the xsd:boolean datatype IRI.

This differentiation is a good thing, because it allows stricter implementations to be created, rather than requiring all implementations to support the exact same ill-typed nonsense data.

Notice that RDF 1.1 Semantics already allows implementations to differ from one another in their support for more/fewer recognized datatype IRIs. Implementations that differ in their recognized datatype IRIs already differ in their behavior in RDF 1.1.

@wouterbeek wouterbeek changed the title Allow implementations that only support s Drop the requirement to support ill-typed literals with recognized datatype IRIs Aug 30, 2023
@afs
Contributor

afs commented Aug 31, 2023

The current text is a bit strange.

Implementations MUST accept ill-typed literals and produce RDF graphs from them.

I don't think that the "MUST" can be meaningful if the literals are outside RDF-semantics. In RDF concepts, the text can be dropped, or replaced with non-defining descriptive/advice text (after the numbered list), and refer to RDF Semantics.

Implementations MAY produce warnings when encountering ill-typed literals.

Any system can issue warnings for anything regardless of this text, so it can be dropped or made advice text as encouragement to do that.

For RDF Concepts, can we just say:
"Implementations SHOULD accept ill-typed literals"

which allows variation when there's justification.

("Support" is stronger than "accept". "Accept" is about RDF terms (correct syntax). I would read "support" as being about acting, e.g. on the values; cf. D-entailment.)

@afs afs added the discuss-f2f Proposed for discussion during the next face-to-face meeting label Sep 6, 2023
@ktk ktk removed the discuss-f2f Proposed for discussion during the next face-to-face meeting label Oct 3, 2023
@pchampin
Contributor

Implementations MUST accept ill-typed literals and produce RDF graphs from them.

I don't think that the "MUST" can be meaningful if the literals are outside RDF-semantics. In RDF concepts, the text can be dropped, or replaced with non-defining descriptive/advice text (after the numbered list), and refer to RDF Semantics.

+1

Actually, I consider this bit of RDF Concepts to contradict RDF Semantics §7.2, which says:

RDF processors MAY treat an unsatisfiable graph as signaling an error condition, but this is not required.

and in fact some implementations already do :)

This makes a strong case for replacing this MUST with a MAY in RDF-syntax, IMO.

@gkellogg gkellogg added the spec:enhancement Change to enhance the spec without affecting conformance (class 2) –see also spec:editorial label Jan 30, 2024
@afs
Contributor

afs commented Jan 30, 2024

MAY is weak IMO.

It would be nice to encourage the behavior of passing through syntactically correct data with "SHOULD accept ill-typed literals".

@csarven
Member

csarven commented Jan 30, 2024

This can be expressed as an advisory in the specification as a Note or within the Considerations section providing additional context for implementations to evaluate advantages and pitfalls.

@pfps pfps added the needs discussion Proposed for discussion in an upcoming meeting label Jun 27, 2024
@pchampin
Contributor

pchampin commented Nov 4, 2024

This was discussed during the rdf-star meeting on 31 October 2024.


Drop the requirement to support ill-typed literals with recognized datatype IRIs 2

pfps: I agree with what Andy says in the issue
… the wording should change from MUST to SHOULD

AndyS: it depends what "support" really means here
… I don't think an ill-typed literal making the whole graph invalid is very useful

AZ: I also want to ask what is meant by "support". If you have a system that does not recognize a datatype IRI
… if you want to move that to another triplestore, you might lose something.
… I'm not sure what support means. It should pass as syntactically correct.
… By the semantics of ill-typed literals, since RDF 1.1, if you have an ill-typed literal in a graph, it makes the graph inconsistent, unsatisfiable.

<AndyS> RDF concepts -- "The list of datatypes supported by an implementation is determined by its recognized datatype IRIs." seems to be the nearest to defining "support".

AZ: If you say this kind of graphs may not be supported, what about other kinds of inconsistencies. Should any such graph not be supported?
… I'm not sure if I agree with this proposal.

pfps: one option would be to tweak the wording

<pfps> One option is that implementations MUST accept input documents with ill-typed literals and SHOULD include the resultant triple in the RDF graph.

gkellogg: it makes no sense to talk about an ill-typed literal for unrecognized datatypes
… it all depends on what "support" means

<pfps> That is - parsing MUST NOT stop at an ill-typed literal but the system MAY choose to not include the triple in the resultant graph.

gkellogg: I think the idea is to be able to only retain well-typed literals

<pfps> I would add that if an implementation drops the triple then it MUST produce a warning.

gkellogg: it would be reasonable for RDF systems to not deal with ill-typed literals

TallTed: the current text is "MUST accept", not "MUST support"
… "accept" means it can evolve
… triplestores should be able to take any literals
… but then it may deal with the literal adequately for some processes
… you can do almost anything with RDF and unless there is a strong argument against that, we should keep it like this

ktk: how are different implementations dealing with this?

AndyS: in SPARQL, there are cases when you need to assign a value, so it does not work with ill-typed literals, but that is a SPARQL process
… there could be wording to make this a little more flexible with "MAY"
… it's difficult to make it a "MUST"

<pfps> agreed that it is difficult to require a warning

james: We are very accepting (in our implementation) and it has been very useful
… I think it should be a "MUST" for reasons of interoperability

<AndyS> "SHOULD accept" -- MUST for warnings is a bit strange. We don't have a "warning" mechanism in the specs.

james: but it's personal opinion

Souri: when we find an ill-typed literal, we separate it
… we continue even if we find error and they get reported
… we do not accept it in that form, so for us, a MUST would not work

AndyS: choosing which datatypes to handle is something you do when you use the data
… at loading time, you may not have decided

TallTed: I'm concerned to hear that some implementations are not conformant
… It's blocking evolution, because new datatypes may become supported in the future
… The reasoning I see is that the proposal is done because there are implementations that are not conformant

<niklasl> +1 for evolution (with the caveat that I prefer opt-in "drop unrecognized" modes to avoid sending inexplicable data onward).

Souri: if we have an xsd:integer with "abc" lexical form, we don't accept it, but if you have ex:mytype, we don't do anything
… we report the problem and users can decide what to do with this problem

<Zakim> pfps, you wanted to say that implementations that reject unrecognized datatypes are broken but ones that do not fully accept known ill-typed literals are not so bad

james: we do two kinds of things: one on the values, for efficient operations, and one that just takes any literal transparently
… in the past, we did not do anything with time, then it evolved to handle it appropriately

ktk: what do we do with this issue? We don't really have a conclusion

<TallTed> "MUST accept" is current text

Souri: in Oracle, we don't want to have, e.g., 31st February, so we reject it
… we do not hide it, we report it
… I would not like to have "MUST accept"

TallTed: not accepting data is bad but you can handle the ill-typed literals after they are loaded
… in the future, there could be a change that makes a lexical form acceptable

tl: I like the idea that there are several phases, 1st you parse and put in store, then other processes
… then the user can be informed of problematic literals
… you would get an error if you use reasoner on the data

AndyS: I find the use cases of rejecting or not rejecting both reasonable
… the problem is when an entire graph is rejected

<AndyS> My pref is change "MUST accept" to "SHOULD accept". All the described handling cases seem reasonable for their different cases.

Souri: we do not reject the entire graph, just the triples with ill-typed literals

<Dominik_T> +1 for SHOULD

Souri: the earlier the problems can be pointed out the better
… customers are also happy with this

<ktk> Strawpoll: "Implementations MUST accept ill-typed literals" gets changed to "Implementations SHOULD accept ill-typed literals"

<Dominik_T> +1

<gtw> +1

<ktk> +1

<pfps> +1

<AndyS> +1

<Souri> +1

<gkellogg> +1

<AZ> -0.3141592

<enrico> 0

<TallTed> -0.5

<james> 0

<niklasl> +0.5 (I might prefer some "SHOULD by default, MUST if asked to accept"...)

TallTed: if we make this change, we have to be really clear how errors are dealt with

AndyS: I don't think we should go into how errors and warnings are handled

<TallTed> An ill typed literal is not a syntax error.

<TallTed> An ill typed literal conforms to syntax.

<Souri> +1 to AndyS

AndyS: there's a historical example (??) where specs mentioned what to do with errors; it took up a lot of space and was eventually dropped

niklasl: I have experienced cases of systems rejecting things that I would have liked to be accepted, because things evolved
… although I'm sympathetic to the arguments (thus my +0.5 vote)
… it could be something that users can opt-in or out

ktk: there could be a note that explains what pitfalls etc. occur and how to deal with them


@pfps
Contributor

pfps commented Nov 13, 2024

A message, https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Nov/0008.html, was sent to the WG mailing list with a proposal to resolve this issue. The contents of the message are:

PROPOSAL: Change the requirements for handling ill-typed literals so that
implementations may drop triples containing literals that are ill-typed for
datatypes that they recognize.

The relevant wording is in RDF 1.2 Concepts and Syntax:

If the literal's datatype IRI is in the set of recognized datatype IRIs, let d
be the referent of the datatype IRI.
If the literal's lexical form is in the lexical space of d, then the literal
value is the result of applying the lexical-to-value mapping of d to the
lexical form.
Otherwise, the literal is ill-typed and no literal value can be associated
with the literal. Such a case produces a semantic inconsistency but is not
syntactically ill-formed. Implementations MUST accept ill-typed literals and
produce RDF graphs from them. Implementations MAY produce warnings when
encountering ill-typed literals.

A possible change if the proposal is accepted is:

...
Otherwise, the literal is ill-typed and no literal value can be associated
with the literal. Such a case produces a semantic inconsistency but is not
syntactically ill-formed. Implementations MUST accept ill-typed literals from
their recognized datatypes and produce RDF graphs for inputs that contain
them. Implementations MAY produce warnings when encountering ill-typed
literals from their recognized datatypes. Implementations MAY exclude triples
that contain ill-typed literals from their recognized datatypes from the RDF
graphs they produce and SHOULD produce warnings when this happens.
NOTE: Implementations MUST accept all literals with datatypes that are not in
their recognized datatypes.

@pfps
Contributor

pfps commented Nov 13, 2024

The note in the possible wording above should be expanded to read:

NOTE: Implementations MUST NOT exclude triples that do not contain literals with datatypes that are in their recognized datatypes from the RDF graphs they produce.
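The following minimal Python sketch illustrates one reading of the possible wording proposed above, together with the expanded note: parsing never halts on an ill-typed literal, triples whose literals have unrecognized datatype IRIs are always kept, and a triple containing an ill-typed literal of a recognized datatype may be excluded, with a warning, when a strictness switch is on. The names `Literal`, `load`, `RECOGNIZED` and `drop_ill_typed` are hypothetical, IRIs are plain strings, and the recognized set is a small made-up example rather than any implementation's actual list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple, Union

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str  # datatype IRI


Term = Union[str, Literal]  # IRIs are plain strings in this sketch
Triple = Tuple[Term, Term, Term]

# Hypothetical recognized datatype IRIs, each with a lexical-space test.
RECOGNIZED: Dict[str, Callable[[str], bool]] = {
    XSD + "boolean": lambda s: s in {"true", "false", "1", "0"},
    XSD + "string": lambda s: True,
}


def is_ill_typed(term: Term, recognized: Dict[str, Callable[[str], bool]]) -> bool:
    # Only a literal whose datatype IRI is recognized can be ill-typed.
    return (isinstance(term, Literal)
            and term.datatype in recognized
            and not recognized[term.datatype](term.lexical))


def load(triples: Iterable[Triple],
         recognized: Dict[str, Callable[[str], bool]] = RECOGNIZED,
         drop_ill_typed: bool = False) -> Tuple[Set[Triple], List[str]]:
    """Never halts on an ill-typed literal; optionally excludes its triple and warns."""
    graph: Set[Triple] = set()
    warnings: List[str] = []
    for triple in triples:
        if any(is_ill_typed(term, recognized) for term in triple):
            if drop_ill_typed:
                warnings.append(f"excluded triple with ill-typed literal: {triple}")
                continue
            warnings.append(f"kept triple with ill-typed literal: {triple}")
        graph.add(triple)
    return graph, warnings


data = [
    ("ex:a", "ex:p", Literal("Yes", XSD + "boolean")),                   # ill-typed
    ("ex:b", "ex:p", Literal("### Header", "https://example.com/markdown")),  # unrecognized
    ("ex:c", "ex:p", Literal("true", XSD + "boolean")),                  # well-typed
]
strict_graph, strict_warnings = load(data, drop_ill_typed=True)
```

Calling `load(data, recognized={}, drop_ill_typed=True)` keeps all three triples, which mirrors the ramification raised in the original post: the same data can be fully kept by one implementation and partly excluded by another, depending only on which datatype IRIs each recognizes.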

@afs
Contributor

afs commented Nov 13, 2024

A possible change if the proposal is accepted is:
...
Implementations MUST accept ill-typed literals from
their recognized datatypes and produce RDF graphs for inputs that contain
them. ... Implementations MAY exclude triples
that contain ill-typed literals from their recognized datatypes from the RDF
graphs they produce

This seems to be contradictory.

Is "accept" meaning "can parse and continue" here? and then "produce ..." needs changing.

Or is the MUST supposed to be SHOULD? "Implementations SHOULD accept ill-typed literals"

@pfps
Contributor

pfps commented Nov 13, 2024

@afs I don't think so. Implementations MUST accept ill-typed literals, in that they MUST NOT halt when they encounter an ill-typed literal, but they MAY decide to not include triples with ill-typed literals in the RDF graphs they produce.

@rat10

rat10 commented Nov 13, 2024

I agree with @afs that this can seem contradictory.

I read the "produce RDF graphs for inputs" in

Implementations MUST accept ill-typed literals from their recognized datatypes and produce RDF graphs for inputs that contain them.

as requiring a conforming application to be able to return the graph with all its ill-typed literals intact, or otherwise it can't claim to be returning the original graph. Any other graph it produces, e.g. one in which ill-typed literals are dropped, can not claim to be the original graph.

I read the second sentence

Implementations MAY exclude triples that contain ill-typed literals from their recognized datatypes from the RDF graphs they produce

as honoring the fact that implementations are free to change graphs according to their needs. That is evident, but then the proposal adds the requirement that implementations SHOULD warn users if they drop ill-typed literals, and that seems sensible.

If my interpretation is correct I support the design of the proposal, but maybe the wording could be made clearer?

@lisp

lisp commented Nov 14, 2024

i suggest that the proposed change not be made for at least these reasons:

  • as it is phrased, it conceals a contradiction. (see https://chatgpt.com/share/6735385d-fa10-8001-8a67-590ab0c1a087) this is likely to lead to confusion in its interpretation.
  • the arguments in favor of the issue itself - in essence to replace a "MUST" with a "SHOULD", are not adequately motivated: a type structure with capacity for ill-typed literals is not significantly more complex than one without and does not significantly affect execution resources.
  • as interoperability is more important than product differentiation for rdf as an interchange medium, its core recommendations should prescribe the former rather than catering to the latter.
  • an implementation is free to assert non-conformance to whatever degree its market dictates. we perform universal d-entailment and temporal value normalization, even though this is non-conformant, because global enterprise services would be much harder to implement without it. on the other hand, we accept all literals into the graphs that we construct.

@pfps
Contributor

pfps commented Nov 14, 2024

I don't see much loss in having literals like "a"^^xsd:int not being preserved.

@rat10

rat10 commented Nov 14, 2024

And "I love you!"^^xsd:int?

@TallTed
Member

TallTed commented Nov 14, 2024

I don't see much loss in having literals like "a"^^xsd:int not being preserved.

And "I love you!"^^xsd:int?

It's good that you don't have such data to deal with. That's no guarantee that such data will never be encountered.

What if the (apparently xsd:string) were significantly longer? Might you consider it worth preservation? How do you, and more importantly how do you tell others to, draw the line?

What if the apparent ill-typing is something like "5.7"^^xsd:int? Should implementers be told a priori that such data should be changed on load, forcing it to "6"^^xsd:int, if the datatype is considered most important, or to "5.7"^^xsd:decimal or "5.7"^^xsd:string, if the literal is considered most important?

(I think) I'm fine with an implementation declaring that such data will be handled in that way, especially if some kind of alert is raised when it happens.

I don't think I'm OK with our spec dictating any of the above handlings. Significantly, this would break from RDF 1.1 and 1.0, and I daresay, some datasets and stores would become non-compliant.

I think there is little difference between rejecting literal data that doesn't match my internal definition of its declared data type, and rejecting literal data that has a declared data type that I don't recognize. I believe both should be accepted and stored. Errors may arise when some comparison function is applied to the literal based on its declared type and which fails because the literal does not actually fit that declared type. That is OK! This is the point at which the user may decide to change the type of that literal, or change the literal to suit the type, or some combination of the two.

@ktk

ktk commented Nov 21, 2024

Section in current version of RDF 1.2 Concepts and Abstract Syntax: https://www.w3.org/TR/rdf12-concepts/#section-Graph-Literal

It says:

If the literal's datatype IRI is in the set of recognized datatype IRIs, let d be the referent of the datatype IRI.

  • If the literal's lexical form is in the lexical space of d, then the literal value is the result of applying the lexical-to-value mapping of d to the lexical form.
  • Otherwise, the literal is ill-typed and no literal value can be associated with the literal. Such a case produces a semantic inconsistency but is not syntactically ill-formed. Implementations MUST accept ill-typed literals and produce RDF graphs from them. Implementations MAY produce warnings when encountering ill-typed literals.

If the literal's datatype IRI is not in the set of recognized datatype IRIs, then the literal value is not defined by this specification.

The proposal is to include all possible variations being considered as comment here, so we can conduct a straw poll on them.
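The three cases in the quoted section (recognized and well-typed, recognized and ill-typed, unrecognized) can be sketched as a small Python function over a datatype map, in the RDF 1.0 sense of a mapping from recognized IRIs to datatypes. The map contents and the names `DATATYPE_MAP` and `literal_value` are hypothetical illustrations, and only two XSD datatypes are modeled.

```python
import re
from typing import Callable, Dict, Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical datatype map: recognized datatype IRI ->
# (lexical-space test, lexical-to-value mapping).
DatatypeEntry = Tuple[Callable[[str], bool], Callable[[str], object]]
DATATYPE_MAP: Dict[str, DatatypeEntry] = {
    XSD + "integer": (lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None, int),
    XSD + "boolean": (lambda s: s in {"true", "false", "1", "0"},
                      lambda s: s in {"true", "1"}),
}


def literal_value(lexical: str, datatype_iri: str,
                  datatype_map: Dict[str, DatatypeEntry] = DATATYPE_MAP
                  ) -> Tuple[str, Optional[object]]:
    """Classify a literal following the three cases of the quoted section."""
    if datatype_iri not in datatype_map:
        # Unrecognized datatype IRI: the value is not defined by the spec.
        return ("undefined", None)
    in_lexical_space, lexical_to_value = datatype_map[datatype_iri]
    if in_lexical_space(lexical):
        # Well-typed: apply the lexical-to-value mapping.
        return ("value", lexical_to_value(lexical))
    # Ill-typed: no literal value can be associated with the literal.
    return ("ill-typed", None)
```

Note that "ill-typed" only arises for IRIs in the map; an empty map makes every literal "undefined", which matches the observation that nothing in RDF Concepts forces an implementation to recognize any datatype.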

@afs
Contributor

afs commented Nov 21, 2024

Minimal change: Change to "Implementations SHOULD accept ..."

where SHOULD is defined by RFC 2119:

This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.

This is then not a purely optional feature - that would use the word "MAY".

@TallTed
Member

TallTed commented Dec 5, 2024

I think that if we change this MUST to SHOULD, we should at least add a requirement that implementations MUST document their handling, probably to include some alert about dropping/ignoring, of such literals during LOAD, etc.

@w3cbot

w3cbot commented Dec 5, 2024

This was discussed during the #rdf-star meeting on 05 December 2024.


Drop the requirement to support ill-typed literals with recognized datatype IRIs 2

ora: from outside the group

ktk: last time (3 weeks ago), .... now we have a proposal.

<ktk> w3c/rdf-concepts#60 (comment)

<gb> Issue 60 Drop the requirement to support ill-typed literals with recognized datatype IRIs (by wouterbeek) [needs discussion] [spec:enhancement]

james: as a matter of process, why is this being worked on?

pchampin: W3C hat off -- while there was no errata, it seems a low hanging fruit and some existing implementation implement "should" - codify existing practice.

<pchampin> I still need to write down an answer, but I have some data about that

james: I asked for a catalog of implementations

pchampin: I have data

james: I am concerned because of interoperability

ora: table until next time (US "table" meaning)

gkellogg: it relates to "supported datatypes" and hence for the instance and interoperability is not related to "supported datatypes"
… could label tests

tallted: not an erratum until previous work reviewed

pchampin: not all systems do the same
… does not impact interoperability because ill-typed literals do not have a meaning
… to ted - datatype are not opaque in the same way as general IRIs

james: could interpret it as an errata in the other direction
… lisp has "undef" clauses in the spec

<pchampin> +1 to provide a way for implementation to be explicit about what they are doing with ill-formed literals

ora: suggest we see what data pchampin has collected

<Zakim> TallTed, you wanted to ask about application-specific datatypes, which might be subtypes (or supertypes) of xsd

TallTed: at the moment, anyone can define a datatype
… works until it encounters a system that drops them

<niklasl> https://www.w3.org/TR/rdf12-concepts/#dfn-recognized-datatype-iri

AndyS: the text refers to "recognized datatypes" that a system must declare.
… So I don't think that TallTed's concern will happen.
… I don't know systems that declare the datatypes they recognize.

<ora> STRAWPOLL: table (@en-us) until we have seen pchampin's data

w3c/rdf-concepts#60

<gb> Issue 60 Drop the requirement to support ill-typed literals with recognized datatype IRIs (by wouterbeek) [needs discussion] [spec:enhancement]

<ora> STRAWPOLL: table (@en-us) w3c/rdf-concepts#60 until we have seen pchampin's data

<gtw> +1

<ora> +1

<gkellogg> +1

<tl> +1

<pchampin> +1

<ktk> +1

<Souri> +1

<james> +1

<niklasl> +1

<TallTed> +1

<AndyS> +1

<doerthe> +1

<william_vw> +1

<AZ> +1

<eBremer> +1

<Tpt> +1

<olaf> +1


@afs
Contributor

afs commented Dec 5, 2024

@pchampin
Contributor

pchampin commented Dec 5, 2024

As a quick way of testing how different implementations handle ill-typed literals (for recognized datatypes), I tested the following SPARQL query on different SPARQL endpoints/libraries:

SELECT ("a"^^<http://www.w3.org/2001/XMLSchema#integer> as ?x) {}

The following implementations return "a"^^xsd:integer:

The following implementations raise an error

I'm not able to test on other implementations at this moment, but if others can test on Amazon Neptune, Oxigraph, others... I'm interested.

edited thanks to @lisp for providing more results

@afs
Contributor

afs commented Dec 6, 2024

Useful background.

RDF 1.1 Concepts: 5.4

RDF processors are not required to recognize datatype IRIs

The only spec constraint is not redefining xsd:*, rdf:XMLLiteral, rdf:HTML or rdf:JSON.

Unless a system advertises (in its documentation?) the set of recognized datatypes, we have to presume there aren't any.

A web search didn't turn up any hits for recognized datatypes.

Any "MUST" or "SHOULD" applies to the elements of the set D, and since D being non-empty is optional, it isn't a strong condition. It exists for building higher layers of the stack but not SPARQL.

@TallTed
Member

TallTed commented Dec 6, 2024

Since you included version info for (almost) all the other engines, please note that the Virtuoso instance currently hosting DBpedia is version 08.03.3332 (53ce9926d2). This info is in the footer of the /sparql page, as well as being queryable via SPARQL.

@jaw111

jaw111 commented Dec 7, 2024

Dydra

$ curl https://dydra.com/nlv01111/gsp/sparql \
  -H "Accept: application/sparql-results+json" \
  -H "Content-Type: application/sparql-query" \
  --data-binary 'SELECT ("a"^^<http://www.w3.org/2001/XMLSchema#integer> as ?x) {}' \
  -is
HTTP/1.1 200 OK
Server: nginx/1.23.3
Date: Sat, 07 Dec 2024 12:55:57 GMT
Content-Type: application/sparql-results+json; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Last-Modified: Tue, 27 Dec 2022 13:50:14 GMT
Etag: 6BAC9724-9A0B-5941-8ADA-3B4C44295565
Vary: Accept, Accept-Datetime, Accept-Encoding, Origin, Revision
Accept-Ranges: none
Cache-Control: private
Request-Id: 99E41F7C-B49A-11EF-BF49-F02F7494A8ED
Strict-Transport-Security: max-age=31536000
Access-Control-Max-Age: 86400
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: *
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: Accept, Accept-Asynchronous, Accept-Datetime, Asynchronous-Content-Type, Asynchronous-Location, Asynchronous-Method, Authorization, Content-Disposition, Content-Encoding, Content-Type, Etag, Graph, Introspection-Content-Type, Link, Location, Revision, X-Requested-With
Access-Control-Expose-Headers: *

{ "head": { "vars": [ "x" ] },
   "results": {
   "bindings": [
 { "x": {"type":"literal", "datatype":"http://www.w3.org/2001/XMLSchema#integer", "value":"a"} } ] } }

@jaw111

jaw111 commented Dec 8, 2024

Neptune (1.2.1.0)

$ curl "https://${NEPTUNE_HOST}:8182/sparql" \
  --aws-sigv4 "aws:amz:eu-west-1:neptune-db" \
  --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
  -H "x-amz-security-token: ${AWS_SESSION_TOKEN}" \
  -H "Accept: application/sparql-results+json" \
  -H "Content-Type: application/sparql-query" \
  --data-binary 'SELECT ("a"^^<http://www.w3.org/2001/XMLSchema#integer> as ?x) {}' \
  -is
HTTP/1.1 200 OK
transfer-encoding: chunked
content-type: application/sparql-results+json; charset=UTF-8

{
  "head" : {
    "vars" : [ "x" ]
  },
  "results" : {
    "bindings" : [ {
      "x" : {
        "datatype" : "http://www.w3.org/2001/XMLSchema#integer",
        "type" : "literal",
        "value" : "a"
      }
    } ]
  }
}
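When comparing results like the Dydra and Neptune responses above, a small Python helper can check whether an endpoint echoed the ill-typed literal back unchanged. It assumes the standard SPARQL 1.1 Query Results JSON Format (`head.vars`, `results.bindings`, terms with `type`, `value`, `datatype`); the function name is hypothetical, it does not handle the older `typed-literal` term type, and endpoints that raise an error return an HTTP error instead, so they never reach this check.

```python
import json

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def echoes_ill_typed_literal(results_json: str, var: str = "x",
                             lexical: str = "a",
                             datatype: str = XSD_INTEGER) -> bool:
    """True if any solution binds `var` to the literal lexical^^datatype unchanged."""
    results = json.loads(results_json)
    for solution in results.get("results", {}).get("bindings", []):
        term = solution.get(var)
        if (term is not None
                and term.get("type") == "literal"
                and term.get("value") == lexical
                and term.get("datatype") == datatype):
            return True
    return False


# The Neptune response shown above, compacted onto one line.
neptune = ('{"head": {"vars": ["x"]}, "results": {"bindings": [{"x": '
           '{"datatype": "http://www.w3.org/2001/XMLSchema#integer", '
           '"type": "literal", "value": "a"}}]}}')
print(echoes_ill_typed_literal(neptune))  # True
```

An endpoint that answers 200 but leaves ?x unbound would yield `False`, which separates "accepted and returned" from "accepted but not bound".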

@Tpt
Contributor

Tpt commented Dec 8, 2024

Blazegraph returns an error: https://w.wiki/CM8o

Oxigraph returns "a"^^xsd:integer as expected

@afs
Contributor

afs commented Dec 12, 2024

More background:
https://www.w3.org/TR/rdf12-semantics/#datatypes

RDF processors are not required to recognize any datatype IRIs other than rdf:langString and xsd:string ...

@afs
Contributor

afs commented Dec 12, 2024

An alternative approach is to put the "recognized datatypes" concept into RDF Semantics.

  • The terminology RDF Processor isn't defined.
  • RDF Concepts does not require any datatypes to be recognized.
  • RDF Semantics needs recognized datatypes as a sound basis for semantic extensions.

Idea: define "RDF Processor" as a system that implements RDF Semantics (and not simple semantics). We could have, informally, RDFS processor, D-entailment processor, simple processor etc.

This would go in RDF Semantics where it is needed.

Remove the "recognized datatype" content from RDF Concepts.

SPARQL is not affected nor is any SPARQL implementation's handling of datatypes.

@rat10

rat10 commented Dec 12, 2024

Why do we need to continue this discussion?

  • AFAICT the collected data from real-world implementations suggests that the vast majority of implementations follow the specification in that they accept literals with ill-formed datatype declarations. So the working assumption that the spec should be corrected to follow standard practice seems to be refuted.
  • There have been arguments brought forward why the present arrangement is useful, e.g. that the data may still be useful even if the datatype declaration doesn't fit the content.
  • The request has been made that before we ponder such a change we should check why the MUST was put up in the first place, i.e. dig up the respective discussions in RDF 1999/1.0/1.1 WG archives. That seems like a reasonable request to me, however it also looks like a lot of work, and we still have a lot of other work to do.

@afs
Contributor

afs commented Dec 12, 2024

Recognized datatype comes from RDF MT (Semantics) datatype maps.

RDF New, section "4. Semantics"

Datatype entailment formally refers to a set of 'recognized' datatypes, replacing datatype maps in RDF 1.0 Semantics, but this does not have any effect on implementation.

Datatype entailment formally refers to a set of 'recognized' datatype IRIs. The RDF 1.0 Semantics used the concept of a datatype map: in the new semantic description, this is the mapping from recognized IRIs to the datatypes they identify. This change does not have any effect on implementation or semantic entailments.

@ktk

ktk commented Dec 14, 2024

I don't think the SPARQL test alone is accurate. Stardog seems to accept it in SPARQL, but by default it will refuse it on loading, as "strict parsing" is true by default. See docs here https://docs.stardog.com/operating-stardog/database-administration/database-configuration#strictparsing

@afs
Contributor

afs commented Dec 15, 2024

Other tests would be:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ("500"^^xsd:byte as ?x) {}

xsd:byte has the same lexical representation as integer; it has minInclusive (-128) and maxInclusive (127) restrictions.

To see if there is a numeric value:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ("500"^^xsd:byte - 400 as ?x2) {}
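The xsd:byte test hinges on range facets rather than on the shape of the digits. A minimal Python sketch of that check, assuming the XML Schema integer lexical form (optional sign, then digits) and the inclusive bounds of a few integer-derived types, shows why "500"^^xsd:byte is ill-typed even though "500"^^xsd:integer is not. The names `BOUNDS` and `integer_value` are hypothetical.

```python
import re
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

# Inclusive value-space bounds (None = unbounded) for a few integer-derived XSD types.
BOUNDS = {
    XSD + "integer": (None, None),
    XSD + "int": (-2147483648, 2147483647),
    XSD + "short": (-32768, 32767),
    XSD + "byte": (-128, 127),
}
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def integer_value(lexical: str, datatype_iri: str) -> Optional[int]:
    """Return the value of an integer-derived literal, or None if it is ill-typed."""
    if INTEGER_LEXICAL.fullmatch(lexical) is None:
        return None  # not even an xsd:integer lexical form, e.g. "5.7"
    value = int(lexical)
    low, high = BOUNDS[datatype_iri]
    if (low is not None and value < low) or (high is not None and value > high):
        return None  # well-formed digits, but outside this datatype's range
    return value
```

Under this check "500"^^xsd:byte has no value, so an engine that treats it as ill-typed would be expected to hit an expression error in the second query and leave ?x2 unbound (SPARQL leaves a variable unbound when its SELECT expression errors), whereas an engine that ignores the facets would compute 100. The same check also treats "5.7"^^xsd:int, mentioned earlier in the thread, as ill-typed.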

@w3cbot

w3cbot commented Dec 19, 2024

This was discussed during the #rdf-star meeting on 19 December 2024.


Drop the requirement to support ill-typed literals with recognized datatype IRIs 3

ktk: what is the conclusion on this one?

AndyS: I dug around in the history a bit. Found that rdf 1.1 mentions this.
… the mention in concepts was new in 1.1. The datatype map from 1.0 was in the model theory.
… can find no use of required datatypes other than in semantics.
… semantics needs it to give us a base for building extensions on top of.
… can find no system whose documentation lists "supported datatypes".
… my proposal is to put it back into 1.1 semantics.
… certain datatypes required, not mentioned in concepts.

gkellogg: we're not modifying 1.1 semantics.

AndyS: we're keeping it in 1.2 semantics, but not having it in 1.2 concepts.

<Zakim> pfps, you wanted to ask whether we are talking about required or recognized data types

pfps: I'm confused. Are we talking about recognized or required datatypes.

AndyS: phrase is "required"

pfps: items have "recognized".
… item we are on is "Drop the requirement to support ill-typed literals with recognized datatype IRIs", not "required datatypes".

AndyS: should have said "recognized". semantics does put requirements on the "recognized" set.

pfps: not sure why that is relevant.

AndyS: I am suggesting taking it out of concepts. Seems to have no purpose except to support extensions.

pfps: why is that relevant to discussion of "recognized" datatypes?

pchampin: the normative statement we are discussing is from rdf concepts.
… what AndyS is suggesting is that we remove that statement as well as all other mentions of "recognized datatypes", because they serve no purpose in rdf concepts.
… would solve issue and make specs more consistent.

ora2: the only required datatypes are string and something else...?

AndyS: in rdf semantics, string and langString. I presume we would add langString as well.
… in rdf 1.0, it's not described like that, but there was also rdf:XMLLiteral, which was mandated.
… that's now a datatype like any other.

gkellogg: as well as rdf:HTML

AndyS: and a few others. those were never required.

james: I'm trying to understand the consequence of removing the requirement for recognized datatypes.
… would that mean that treatment of datatypes of terms where the syntax does not match that specified for the datatype, that treatment becomes undefined?

AndyS: no, because in RDF concepts, we only have datatype terms. we don't have any meaning to the datatypes.
… it's just a lexical form and a datatype.
… if you're not matching the datatype, whether you understand it or not, it's outside of rdf concepts.
… base to build simple semantics. from there, other entailments defined in 1.2 document...

gkellogg: I think it comes down to the XML Schema built-in datatypes being recommended but not required in concepts.
… has the notion of value space as well as lexical space.
… concern is about xsd types that are malformed.
… given that they are recommended in concepts, implementations do not need to recognize these datatypes but we don't say what the behavior should be if they are recognized.
… especially regarding ill-typed.

AndyS: xsd datatypes are really there for different purpose.
… e.g. if we're going to have numbers/dates/etc., let's use that instead of a new way to do that.

pchampin: looking at rdf 1.1 semantics, I seem to remember there was something that gave some implementation leeway when the graph is unsatisfiable.
… if I'm right, then this would contradict the statement we're talking about in concepts.
… an ill-typed literal would make the graph unsatisfiable.
… that's why I like the proposal to defer this to semantics. the reason to not support ill-formed recognized datatypes is since the graph is inconsistent, the impl may not want to bother with dealing with that and just throw an error.

<niklasl> "If the literal is ill-typed then the L2V(I(aaa)) mapping has no value, and so the literal cannot denote anything. In this case, any triple containing the literal must be false. Thus, any triple, and hence any graph, containing an ill-typed literal will be D-unsatisfiable, i.e. false in every D-interpretation." from

<niklasl> https://www.w3.org/TR/rdf12-semantics/#D_interpretations
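The condition niklasl quotes can be checked by looking at literals alone. A minimal Python sketch, assuming a hypothetical recognized set that contains only xsd:boolean and representing literals as (lexical form, datatype IRI) pairs. It tests only the literal condition from the quote; it is not a general D-entailment reasoner.

```python
from typing import Callable, Iterable, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical recognized set containing only xsd:boolean.
RECOGNIZED: dict[str, Callable[[str], Optional[bool]]] = {
    XSD + "boolean": lambda lex: {"true": True, "1": True,
                                  "false": False, "0": False}.get(lex),
}


def has_ill_typed_recognized_literal(graph: Iterable[tuple]) -> bool:
    """True if any triple holds a literal whose datatype IRI is recognized but
    whose lexical form has no value; per the quoted text, such a graph is
    D-unsatisfiable for this recognized set.

    Terms: IRIs are plain str; literals are (lexical form, datatype IRI) tuples.
    """
    for triple in graph:
        for term in triple:
            if isinstance(term, tuple):
                lexical, datatype = term
                l2v = RECOGNIZED.get(datatype)
                # Unrecognized datatype IRIs never make the graph unsatisfiable.
                if l2v is not None and l2v(lexical) is None:
                    return True
    return False
```

A graph containing "maybe"^^xsd:boolean would be flagged, whereas one containing '### Header'^^<https://example.com/markdown> would not, because that datatype IRI is not in the recognized set.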

pchampin: I think that makes sense. It is somehow permitted by rdf semantics at this point, so why should it be forbidden by concepts?

pfps: in rdf 1.1 semantics, there was a change.
… in 1.0 ill-typed literals did not make the graph inconsistent.
… in 1.1, they do.

pchampin: maybe this statement in concepts is related to that in 1.0.

ora2: this makes me think since this is hard for us, we should somehow spell out exactly what it is that we are hoping/expecting implementations to do.

AndyS: my hope is it makes no difference.
… the set of recognized datatypes really has to be something implementations declare.
… as far as i can see, nobody is doing that.
… not the case of syntactic conformance to the datatype.
… in XSD, the numbers have value constraints as well as lexical constraints.

ora2: how are implementations supposed to declare this?

AndyS: we don't have mechanisms to do that.

ora2: the only remotely reminiscent mechanism is SPARQL implementations disclosing entailment regimes.

AndyS: most of it in SPARQL happens because you perform arithmetic on datatypes. rules from F&O apply even when you're in simple entailment.
… that's why it's difficult to probe what SPARQL systems do.

ora2: let's for a moment ignore that we don't have such a mechanism. pretend there is one. we should have a clear recipe for implementations as to how they should behave.
… or not behave.

AndyS: at the moment, it says "MUST accept ill-typed literals".

<AZ> Re. what pchampin said: In RDF 1.1 Semantics, Section 7.2 says "RDF processors MAY treat an unsatisfiable graph as signaling an error condition, but this is not required."

<pchampin> https://www.w3.org/TR/rdf11-semantics/#D_entailment

pchampin: I found what I had remembered. section 7.2.
… rdf 1.1 concepts and semantics are inconsistent with each other.
… "you MUST support" vs. "you MAY signal an error"

ora2: that and the fact that ill-formed literals lead to unsatisfiable graph.

pchampin: yes. since rdf 1.1.
… RDF processors may signal an error.

james: MAY do that if they're doing D-entailment is what you read.

pchampin: only makes sense with D-entailment.

james: not required anywhere that they do D-entailment.
… different for SPARQL processors vs. RDF stores.

pchampin: I agree. The phrasing refers to "recognized types", which I think is only defined with respect to D-entailment.

ora2: we would do a service to the community if we spelled this out for people.

AndyS: if we remove it, we're doing what everyone has been doing up to now.

ora2: not only spell it out but define anomalous behavior.

AndyS: nowhere in the specs touches on this as a round-trip requirement.
… just "RDF processors" with undefined behavior.
… valuable for products to make product decisions about what is allowed.

james: the simplest formulation might be: if you are doing D-entailment, you MUST reject. If not, you MUST accept.
… people might not conform for product reasons, but would improve expectations.

AZ: I think what RDF concepts and semantics say is not necessarily inconsistent.
… may be talking about different things.

<pchampin> james, I'm not sure I like "improving expectations" if we envision that tools won't meet these expectations

AZ: example: ontologies in OWL can be inconsistent. You can have tools like Protégé that open inconsistent ontologies.
… no problem. But when you try to do reasoning, it raises an error.
… maybe what concepts is talking about is different than what semantics is talking about.

<pfps> I agree with Antoine, actually the reasoning might not even signal an "error" but just complain about inconsistency and proceed.

AZ: the parser should handle any literal. but if you want to do some processing later on the data you parsed, it may raise an error. but that's at a different level.
… this is what should be specified more clearly.

ora2: so a layered approach. parsing can turn into a graph, but can't do whatever reasoning you want. different things.

<pchampin> +1 AZ, the problem of this MUST is that it is not granular enough

ora2: still, I think we need to be clear on this and spell it out.

AndyS: don't we have the structure for saying that? Isn't that simple entailment?

ora2: if you don't do D-entailment, isn't that what we concluded a moment ago?

AndyS: everything must be built on simple entailment. that defines truth.

pchampin: I really like AZ's suggestion.
… talking about simple entailment is not the same thing.
… some components don't do entailment. parser is just here to transform a string into triples.
… like the idea that a parser should forward any triple it finds that is syntactically valid.
… but then the storage layer may reject based on some storage constraints.
… I don't like the MUST because it's not granular enough.
… if we talk about entailment, we are already pinning this to a specific kind of component.
… I don't think that's the right thing to do.
… I wouldn't like my turtle parser to choke on ill-formed literals. Happy for storage to not allow me to put them in.
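The layered split that AZ and pchampin describe (a parser that forwards every syntactically valid literal, and a storage layer that may refuse some) can be sketched in Python. Everything here is hypothetical and not taken from any spec or product: the names `parse_literal` and `Store`, the `strict` flag, and a recognized set containing only a simplified xsd:byte.

```python
import re
from typing import Callable, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


def _byte(lex: str) -> Optional[int]:
    # Simplified xsd:byte: integer syntax plus the range -128..127.
    if re.fullmatch(r"[+-]?[0-9]+", lex) and -128 <= int(lex) <= 127:
        return int(lex)
    return None


# Hypothetical recognized set used by the storage layer.
RECOGNIZED: dict[str, Callable[[str], Optional[int]]] = {XSD + "byte": _byte}


def parse_literal(lexical: str, datatype_iri: str) -> tuple[str, str]:
    """Parser layer: forwards any syntactically valid literal, never type-checks."""
    return (lexical, datatype_iri)


class Store:
    """Storage layer: may apply a stricter policy to what it keeps."""

    def __init__(self, strict: bool) -> None:
        self.strict = strict
        self.literals: list[tuple[str, str]] = []

    def add(self, literal: tuple[str, str]) -> None:
        lexical, datatype = literal
        l2v = RECOGNIZED.get(datatype)
        # Only ill-typed literals with a recognized datatype IRI are rejected;
        # literals with unrecognized datatype IRIs are always stored as-is.
        if self.strict and l2v is not None and l2v(lexical) is None:
            raise ValueError(f"ill-typed literal {lexical!r}^^<{datatype}>")
        self.literals.append(literal)
```

In this arrangement the parser never chokes on "500"^^xsd:byte; whether the literal is kept is a policy decision of the store, which is the granularity pchampin is asking for.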

niklasl: I agree with that from user perspective.

<james> pchampin: clarity as to which component supports what would reduce ambiguity.

niklasl: if I had a system with entailment built in, i.e. if I know that my implementation has D-entailment...

<pchampin> and that's where "SHOULD support" would leave it to implementers discretion

niklasl: it would be more of a problem to have it keep around boolean literals with invalid values. That's not the contract I expect it to fulfill.

ora2: I will try to take a stab at writing this out in an understandable way.

AndyS: related thing is are we going to take it out of concepts? leave it just in semantics.

ora2: I don't know how to answer that. Think we need a paragraph or two somewhere that spells it out.

<TallTed> I think moving it to semantics makes most sense

<TallTed> yes, spelled out better would also be good. :-) I'll try to help with text once it hits a PR.

<niklasl> +1


@afs
Contributor

afs commented Dec 21, 2024

Remove the "recognized datatype" content from RDF Concepts.

RDF-Semantics does not reference RDF-Concepts for recognized datatypes.
It has its own definition dfn-recognized.
