Clarify data model regarding binary data #942

handrews · 2020-05-31T22:33:07Z

It is advantageous in hypermedia environments to apply untyped schemas to binary data, despite the lack of a truly suitable mapping into the data model. This removes some JSON-specific language and provides additional guidance on media types.

See OAI/OpenAPI-Specification#2200 for the detailed use case. Prior to that PR, OAS was modeling unencoded binary data as strings with a custom format.

The expansion of contentMediaType is motivated by the need to indicate media types for binary resources in hypermedia environments.

Existing usage (e.g. OpenAPI 3.0) considered unencoded binary data to be strings, but such data violates the expectations of JSON strings. A better approach is to purely indicate the media type and avoid constraining the instance by JSON type as no JSON type is suitable.

awwright · 2020-06-01T01:29:18Z

I'm trying to understand what this gets us.

Trying to use JSON Schema with non-JSON media types isn't prohibited, just undefined. Other people may attempt to define behavior in these cases, though. I don't think it needs any special treatment.

There's also this paragraph that specifies how JSON Schema applies to media types merely compatible with JSON:

JSON Schema is only defined over JSON documents. However, any document or memory structure that can be parsed into or processed according to the JSON Schema data model can be interpreted against a JSON Schema, including media types like CBOR.
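
As a rough illustration of the quoted paragraph, the sketch below maps in-memory Python values onto the six primitive types of the data model. The function name `data_model_type` is hypothetical, and treating `bytes` as unmapped is an assumption made for illustration, not something the specification prescribes.

```python
from typing import Any, Optional


def data_model_type(value: Any) -> Optional[str]:
    """Return the data model type name for a Python value, or None if unmapped."""
    if value is None:
        return "null"
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (list, tuple)):
        return "array"
    if isinstance(value, dict):
        return "object"
    # Assumption: anything else (for example raw bytes) has no place in the model.
    return None
```

A structure decoded from a JSON-compatible format would map cleanly, while `data_model_type(b"\x89PNG")` returns `None`, which is the gap this PR is discussing.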

handrews · 2020-06-01T02:45:46Z

@awwright did you look at the OAS PR that I linked?

ssilverman

I like that this now describes that there can exist an application-defined mapping of arbitrary data into the data model. I've tried to highlight what I believe needs a little more clarification for this concept.

ssilverman · 2020-06-11T15:11:16Z

jsonschema-core.xml

+                    <t>
+                        Binary data MAY be treated as an instance, however no data type in the data model
+                        is suitable.  Therefore, only schemas such as the empty schema that do not
+                        constrain the type can be considered to pass.  Rationales for and behavior of


I think commas may be required: "Rationales for, and behavior of,"

Not in this instance.

ssilverman · 2020-06-11T15:18:41Z

jsonschema-core.xml

@@ -219,6 +223,12 @@
                            <t hangText="string:">A string of Unicode code points, from the JSON "string" value</t>
                        </list>
                    </t>
+                    <t>
+                        Binary data MAY be treated as an instance, however no data type in the data model
+                        is suitable.  Therefore, only schemas such as the empty schema that do not


The added paragraph above this one says that it's possible there's some application-defined mapping from binary data to the JSON data model. The wording of this paragraph seems to waffle between "MAY be treated as an instance" and "can't do anything unless it's an empty schema". Maybe add some language that suggests that a schema can be applied if that data model is applied. For example (changes in bold), "Binary data MAY be treated as an instance, however no data type in the data model is **directly** suitable. Therefore, only schemas such as the empty schema that do not constrain the type can be considered to pass **outright**."

I'm sure there's some better language, but this paragraph feels like it's fighting with itself if there's no acknowledgement that there exists that "application-defined mapping" in the previous paragraph.
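
To make the "only schemas that do not constrain the type can pass" point concrete, here is a minimal sketch that evaluates nothing but the "type" keyword. The names `TYPE_CHECKS` and `type_keyword_passes` are hypothetical, and representing binary data as Python `bytes` is an assumption for illustration only.

```python
from typing import Any, Callable, Dict


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_integer(value: Any) -> bool:
    return _is_number(value) and (isinstance(value, int) or value.is_integer())


TYPE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "number": _is_number,
    "integer": _is_integer,
    "string": lambda v: isinstance(v, str),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def type_keyword_passes(schema: Dict[str, Any], instance: Any) -> bool:
    """Evaluate only the "type" keyword; all other keywords are ignored."""
    if "type" not in schema:
        return True  # nothing constrains the type, so any instance passes
    names = schema["type"]
    if isinstance(names, str):
        names = [names]
    return any(TYPE_CHECKS[name](instance) for name in names)
```

Under these assumptions a `bytes` instance passes `{}` and `{"contentMediaType": "image/png"}`, but fails any schema that names one of the existing types.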

ssilverman · 2020-06-11T15:37:49Z

jsonschema-validation.xml

+                    All keywords in this section generally apply to strings, and have no
+                    effect on other JSON data types.  Additionally, they MAY be used without
+                    type information when describing resources of other media types, subject
+                    to certain restrictions.


Could one or two examples of restrictions be given here?

ssilverman · 2020-06-11T15:46:56Z

jsonschema-validation.xml

+                    to fail validation.
+                </t>
+                <t>
+                    The optional automatic decoding, parsing, and validating
                    process SHOULD be equivalent to fully evaluating the instance against
                    the original schema, followed by using the annotations to decode, parse,


Should this say instead, "The optional automatic decoding, parsing, and validating process SHOULD be equivalent to fully evaluating an instance against a schema, followed by using the annotations to decode, parse, ..."? The thing is, this section is about applying schemas or rules to encoded "content". If we use "evaluating the instance against the original schema", then it feels like there's confusion between the schema containing all this stuff vs. the stuff inside "content". Does this make sense?
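
The two-stage process under discussion can be sketched roughly as follows. This is an illustration only: `process_content` and `decode_and_parse` are hypothetical names, the `validate` callback stands in for a real validator, and only base64 and `application/json` are handled.

```python
import base64
import json
from typing import Any, Callable, Dict, Tuple

CONTENT_KEYWORDS = ("contentEncoding", "contentMediaType", "contentSchema")


def decode_and_parse(instance: str, annotations: Dict[str, Any]) -> Any:
    """Decode and parse a string instance according to collected annotations."""
    data = instance.encode("utf-8")
    if annotations.get("contentEncoding") == "base64":
        data = base64.b64decode(instance, validate=True)
    if annotations.get("contentMediaType") == "application/json":
        return json.loads(data.decode("utf-8"))
    return data  # unknown media type: hand back the raw bytes


def process_content(
    instance: str,
    schema: Dict[str, Any],
    validate: Callable[[Dict[str, Any], Any], bool],
) -> Tuple[Any, bool]:
    # Stage 1 (simplified): a real implementation would fully evaluate the
    # instance against the outer schema; here we only collect content* annotations.
    annotations = {k: schema[k] for k in CONTENT_KEYWORDS if k in schema}
    # Stage 2: use the annotations to decode and parse the embedded document.
    document = decode_and_parse(instance, annotations)
    # Stage 3: validate the embedded document against "contentSchema", if any.
    content_schema = annotations.get("contentSchema")
    ok = True if content_schema is None else validate(content_schema, document)
    return document, ok
```

Keeping the outer evaluation and the embedded-document validation as separate stages mirrors the distinction the comment above is asking the text to make.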

ssilverman · 2020-06-11T15:48:47Z

jsonschema-validation.xml

@@ -1015,8 +1031,9 @@

            <section title="contentSchema">
                <t>
-                    If the instance is a string, and if "contentMediaType" is present, this
-                    property contains a schema which describes the structure of the string.
+                    If the instance is a string or an untyped binary resource,


What about: "If the instance is a string or an untyped binary resource having an application-defined mapping to the data model"

Relequestual · 2020-06-11T16:40:00Z

I've read this PR, its content, and the associated OAS PR, but I still can't understand what this is about.
I can explain which parts I don't understand, but I don't know if doing so is a good use of your time.

That being said, if you feel you'd like me to understand and approve this seemingly simple PR, I'm happy to detail what I'm missing.

I'd like to understand it at some point... =D

Relequestual

Approved! Finally get it.

I think you previously assumed knowledge of the OpenAPI spec.
I was looking for examples of the actual binary data in the OAS examples, but it wasn't there, which is why I was confused.

handrews · 2020-06-25T20:48:30Z

Closes #941 (which I forgot to link)

awwright

Overall I think this is adding a lot of language that is not going to be relevant to most cases, which is confusing. It's describing a case that wasn't even invalid before, just undefined (which means another specification, like OpenAPI, can define more specific behavior instead).

Do we specifically have to call out "binary data"? Or can we talk about a superset of JSON instead, like YAML or native ECMAScript values (which look similar, but have strange cases like NaN and recursive references)?

Maybe say something like

It is possible to apply JSON Schema to a superset of this data model; in this case, an instance may be outside any of the six primitive types. Implementations may invent new types that are a superset of existing types (similar to how "number" is a superset of "integer"), or a type disjoint from all six types altogether (like "binary").

awwright · 2020-06-28T07:29:38Z

jsonschema-validation.xml

+                    this property describes the decoded string.  If the "type" keyword is
+                    absent, this keyword MAY be interpreted as describing an unencoded binary
+                    resource.  The exact meaning and behavior of this untyped usage is
+                    application-defined.


This is effectively permitting values outside the data model, which sort of defeats the point of having a data model, doesn't it?

@awwright I like your idea of simply extending the data model. I'd probably recommend that we just add binary rather than throwing it entirely open, but I could be persuaded on that point.

What's going on here is that I'm recognizing a thing that a significant number of people (OpenAPI users) were already doing in the wild, in a way that was more problematic. OAS 3.0 treats binary data as "type": "string" plus a custom format, which is definitely wrong. Unencoded binary data cannot be considered a JSON string in any reasonable universe.

Since OAS 3.1 is picking up the content* keywords, they were willing to drop the custom format and use of "type": "string" as long as there was something that they could recommend for this use case. This is a scenario where nothing's going to be validated, but a JSON Schema is used for descriptive purposes. Which is fine with content* because they don't do validation anyway. They are annotations, so like all annotations, their behavior is application-defined. This is particularly unusual behavior so it is called out. Also, we want it called out so people don't come up with worse "solutions" again.
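
The two modeling styles contrasted here can be written out as schema objects. This is a sketch under assumptions: the `binary` format name comes from OAS 3.0 rather than from this thread, `image/png` is an arbitrary example media type, and `describe_body` is a hypothetical helper showing one application-defined reading of the annotations.

```python
from typing import Any, Dict

# OAS 3.0 style: binary data modeled as a string plus a custom format.
OAS_30_STYLE: Dict[str, Any] = {"type": "string", "format": "binary"}

# Untyped style proposed in this PR: no "type", only a media type annotation.
UNTYPED_STYLE: Dict[str, Any] = {"contentMediaType": "image/png"}


def describe_body(schema: Dict[str, Any]) -> str:
    """Describe a body from annotations only; nothing is validated here."""
    media_type = schema.get("contentMediaType")
    if media_type is not None and "type" not in schema:
        return f"unencoded binary resource of media type {media_type}"
    if media_type is not None:
        return f"string holding {media_type} content"
    return "no content annotations"
```

Because the content* keywords are annotations, a helper like this only reports what the schema says about the resource; it never rejects an instance.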

I would argue, let's just call binary for now, and if other use cases present, then open it up later.
I'd like to avoid speculation and stay grounded on use cases right in front of our faces first.
=]

@Relequestual @awwright just to clarify, we're now proposing to add binary as a type?

I feel like this should get an issue for visibility so that others can comment on it. It's a major change from what I worked out with OAS. I'm reasonably OK with making the change, but it's kind of a big deal.

There are also alternatives such as a "binary": true approach. This would avoid breaking compatibility with anything that hardcodes the current set of valid type values.

I will file an issue for this.
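
The compatibility concern about hardcoded type values can be illustrated with a small sketch. `KNOWN_TYPES` and `unknown_type_names` are hypothetical names, and the check below is only an assumption about how a tool might hardcode the current type vocabulary.

```python
from typing import Any, Dict, List

# The type names a tool might hardcode today.
KNOWN_TYPES = frozenset(
    {"null", "boolean", "object", "array", "number", "integer", "string"}
)


def unknown_type_names(schema: Dict[str, Any]) -> List[str]:
    """Return any "type" values the hardcoded vocabulary does not recognize."""
    declared = schema.get("type", [])
    names = [declared] if isinstance(declared, str) else list(declared)
    return [name for name in names if name not in KNOWN_TYPES]
```

Under this assumption, `{"type": "binary"}` trips the check, while a schema using a separate `"binary": true` keyword leaves "type" untouched and so passes it. How such a tool treats the unrecognized `binary` keyword itself is a separate question.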

awwright · 2020-07-19T18:55:50Z

(re #942 (comment) from review)

@handrews I don't think we should add things to JSON Schema that are not supported by application/json. And JSON does not support binary data, of course. So the specification defining "type": "binary" I think is out.

However whatever we write, the potential applications should be clear. If we just say "binary data" it's not clear to me how you get binary data into an instance. So a brief example may be in order.

Or we just say "Schemas may be applied to supersets of the data model defined here; we won't go into all the details but all of the rules still work as written: type-specific rules such as "minimum" don't apply, but generic keywords such as "allOf" and "format" do."
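
A minimal sketch of that claim, assuming binary data is represented as Python `bytes` and supporting only a tiny keyword subset. The `evaluate` function is hypothetical and is not a conforming validator.

```python
from typing import Any, Dict


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def evaluate(schema: Dict[str, Any], instance: Any) -> bool:
    """Evaluate "type" (string/number only), "minimum", and "allOf"."""
    declared = schema.get("type")
    if declared == "string" and not isinstance(instance, str):
        return False
    if declared == "number" and not _is_number(instance):
        return False
    # "minimum" is type-specific: it constrains numbers and ignores anything else.
    if "minimum" in schema and _is_number(instance) and instance < schema["minimum"]:
        return False
    # "allOf" is generic: it applies its subschemas to any instance at all.
    return all(evaluate(sub, instance) for sub in schema.get("allOf", []))
```

With this sketch, `{"minimum": 5}` rejects `3` but has no effect on `b"\x00"`, while `{"allOf": [{"type": "string"}]}` still rejects the `bytes` value because the generic keyword applies regardless of type.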

handrews · 2020-07-19T19:04:09Z

@awwright I'm trying to figure out how that's different than what I'm doing. Is the main goal here to change this:

Documents of other media types MAY be treated as instances
if a suitable application-defined mapping of the media type into the
data model can be determined.

to say "superset of the data model"?

awwright · 2020-07-19T22:32:14Z

@handrews That paragraph, as it is, doesn't suggest to me that JSON Schema supports a superset of the data model/domain.

So, to rephrase, I have one of two suggestions:

1. Remove the reference to "binary data" and just say "superset", and let other documents like OAI get to decide what that means, or
2. Include some example of a document that supports "binary data" because it's not obvious what that means in a document that's only supposed to be talking about application/json (or other strictly equivalent serializations).

Relequestual · 2020-07-31T21:42:13Z

@awwright given @handrews thumbed up your suggestion, care to make a change suggestion to this PR?
When making a review on code, clicking the +- button allows you to make direct change suggestions, which can then be easily added to the PR.

handrews · 2020-07-31T23:33:42Z

@Relequestual @awwright yes, please feel free to do this. I'll get to it at some point but TBH personal things keep coming up 😐

awwright · 2020-08-01T06:33:55Z

Mission accepted

awwright · 2020-08-15T22:42:49Z

I haven't forgotten about this... I've re-read Core a couple times and I don't think there's anything that prohibits using annotations for non-JSON documents. So would anyone be averse to me modifying the Data Model and some other sections to offer better advice for these situations?

I'd issue this as a new PR because that'll be easier for me than trying to figure out what parts of this PR to amend.

handrews · 2020-08-15T23:17:30Z

@awwright sounds good to me, I'll close this out.

handrews added 2 commits May 31, 2020 15:19

Generalize data model language

02c7e74

It is advantageous in hypermedia environments to apply untyped schemas to binary data, despite the lack of a truly suitable mapping into the data model.

handrews added core annotation labels May 31, 2020

handrews added this to the draft-08-patch1 milestone May 31, 2020

handrews requested review from philsturgeon, awwright, Relequestual and ssilverman May 31, 2020 22:33

ssilverman reviewed Jun 11, 2020

Relequestual approved these changes Jun 23, 2020

awwright requested changes Jun 28, 2020

Relequestual linked an issue Jul 16, 2020 that may be closed by this pull request

Allow "contentMediaType" without "type" #941

Closed

handrews added the Status: Blocked label Jul 19, 2020

Relequestual assigned awwright Jul 31, 2020

handrews closed this Aug 15, 2020

awwright mentioned this pull request Aug 17, 2020

Clarify how JSON Schema works with a superset of the defined data model #970

Merged

handrews deleted the bin-content branch October 1, 2020 02:38

Relequestual mentioned this pull request Dec 1, 2020

Clarify contains when applying to an empty array #1042

Merged

handrews Jul 3, 2020

Relequestual Jul 14, 2020

handrews Jul 19, 2020