Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify data model regarding binary data #942

Closed
wants to merge 2 commits into from

Conversation

handrews
Copy link
Contributor

It is advantageous in hypermedia environments to apply untyped schemas to binary data, despite the lack of a truly suitable mapping into the data model. This removes some JSON-specific language and provides additional guidance on media types.

See OAI/OpenAPI-Specification#2200 for the detailed use case. Prior to that PR, OAS was modeling unencoded binary data as strings with a custom format.

The expansion of contentMediaType is motivated by the need to indicate media types for binary resources in hypermedia environments.

Existing usage (e.g. OpenAPI 3.0) considered unencoded binary data to be strings, but such data violates the expectations of JSON strings. A better approach is to purely indicate the media type and avoid constraining the instance by JSON type as no JSON type is suitable.

It is advantageous in hypermedia environments to apply untyped
schemas to binary data, despite the lack of a truly suitable
mapping into the data model.
This expansion of contentMediaType is motivated by the need to
indicate media types for binary resources in hypermedia environments.

Existing usage (e.g. OpenAPI 3.0) considered unencoded binary data
to be strings, but such data violates the expectations of JSON strings.
A better approach is to purely indicate the media type and avoid
constraining the instance by JSON type as no JSON type is suitable.
@awwright
Copy link
Member

awwright commented Jun 1, 2020

I'm trying to understand what this gets us.

Trying to use JSON Schema with non-JSON media types isn't prohibited, just undefined. Other people may attempt to define behavior in these cases, though. I don't think it needs any special treatment.

There's also this paragraph that specifies how JSON Schema applies to media types merely compatible with JSON:

JSON Schema is only defined over JSON documents. However, any document or memory structure that can be parsed into or processed according to the JSON Schema data model can be interpreted against a JSON Schema, including media types like CBOR.

@handrews
Copy link
Contributor Author

handrews commented Jun 1, 2020

@awwright did you look at the OAS PR that I linked?

Copy link
Member

@ssilverman ssilverman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this now describes that there can exist an application-defined mapping of arbitrary data into the data model. I've tried to highlight what I believe needs a little more clarification for this concept.

<t>
Binary data MAY be treated as an instance, however no data type in the data model
is suitable. Therefore, only schemas such as the empty schema that do not
constrain the type can be considered to pass. Rationales for and behavior of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think commas may be required: "Rationales for, and behavior of,"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in this instance.

@@ -219,6 +223,12 @@
<t hangText="string:">A string of Unicode code points, from the JSON "string" value</t>
</list>
</t>
<t>
Binary data MAY be treated as an instance, however no data type in the data model
is suitable. Therefore, only schemas such as the empty schema that do not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added paragraph above this one says that it's possible there's some application-defined mapping from binary data to the JSON data model. The wording of this paragraph seems to waffle between "MAY be treated as an instance" and "can't do anything unless it's an empty schema". Maybe add some language that suggests that a schema can be applied if that data model is applied. For example (changes in bold), "Binary data MAY be treated as an instance, however no data type in the data model is directly suitable. Therefore, only schemas such as the empty schema that do not constrain the type can be considered to pass outright."

I'm sure there's some better language, but this paragraph feels like it's fighting with itself if there's no acknowledgement that there exists that "application-defined mapping" in the previous paragraph.

All keywords in this section generally apply to strings, and have no
effect on other JSON data types. Additionally, they MAY be used without
type information when describing resources of other media types, subject
to certain restrictions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could one or two examples of restrictions be given here?

to fail validation.
</t>
<t>
The optional automatic decoding, parsing, and validating
process SHOULD be equivalent to fully evaluating the instance against
the original schema, followed by using the annotations to decode, parse,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say instead, "The optional automatic decoding, parsing, and validating process SHOULD be equivalent to fully evaluating the instance against process SHOULD be equivalent to fully evaluating an instance against a schema, followed by using the annotations to decode, parse the original schema". The thing is, this section is about applying schemas or rules to encoded "content". if we use "evaluating the instance against the original schema", then it feels like there's confusion between the schema containing all this stuff vs. the stuff inside "content". Does this make sense?

jsonschema-validation.xml Show resolved Hide resolved
@@ -1015,8 +1031,9 @@

<section title="contentSchema">
<t>
If the instance is a string, and if "contentMediaType" is present, this
property contains a schema which describes the structure of the string.
If the instance is a string or an untyped binary resource,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about: "If the instance is a string or an untyped binary resource having an application-defined mapping to the data model"

@Relequestual
Copy link
Member

Relequestual commented Jun 11, 2020

I've read this PR, it's content, and the associated OAS PR, but I still can't understand what this is about.
I can explain which parts I don't understand, but I don't know if doing so is a good use of your time.

That being said, if you feel you'd like me to understand and approve this seemingly simple PR, I'm happy to detail what I'm missing.

I'd like to understand it at some point... =D

Copy link
Member

@Relequestual Relequestual left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved! Finally get it.

I think you previously assumed knowledge of the OpenAPI spec.
I was looking for examples of the actual binary data in the OAS examples, but it wasn't there, which is why I was confused.

@handrews
Copy link
Contributor Author

Closes #941 (which I forgot to link)

Copy link
Member

@awwright awwright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this is adding a lot of language that is not going to be relevant to most cases, which is confusing. It's describing a case that weren't even invalid before, just undefined (which means another specification, like OpenAPI, can define more specific behavior instead).

Do we specifically have to call out "binary data"? Or can we talk about a superset of JSON instead, like YAML or native ECMAScript values (which looks similar, but has strange cases like NaN and recursive references).

Maybe say something like

It is possible to apply JSON Schema to a superset of this data model; in this case, an instance may be outside any of the six primitive types. Implementations may invent new types that are a superset of existing types (similar to how "number" is a superset of "integer"), or a type disjoint from all six types altogether (like "binary").

Comment on lines +1017 to +1020
this property describes the decoded string. If the "type" keyword is
absent, this keyword MAY be interpreted as describing an unencoded binary
resource. The exact meaning and behavior of this untyped usage is
application-defined.
Copy link
Member

@awwright awwright Jun 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is effectively permitting values outside the data model, which sort of defeats the point of having a data model, doesn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@awwright I like your idea of simply extending the data model. I'd probably recommend that we just add binary rather than throwing it entirely open, but I could be persuaded on that point.

What's going on here is that I'm recognizing a thing that a significant number of people (OpenAPI users) were already doing in the wild, in a way that was more problematic. OAS 3.0 treats binary data as "type": "string" plus a custom format, which is definitely wrong. Unencoded binary data cannot be considered a JSON string in any reasonable universe.

Since OAS 3.1 is picking up the content* keywords, they were willing to drop the custom format and use of "type": "string" as long as there was something that they could recommend for this use case. This is a scenario where nothing's going to be validated, but a JSON Schema is used for descriptive purposes. Which is fine with content* because they don't do validation anyway. They are annotations, so like all annotations, their behavior is application-defined. This is particularly unusual behavior so it is called out. Also, we want it called out so people don't come up with worse "solutions" again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue, let's just call binary for now, and if other use cases present, then open it up later.
I'd like to avoid speculation and stay grounded on use cases right infront of our faces first.
=]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Relequestual @awwright just to clarify, we're now proposing to add binary as a type?

I feel like this should get an issue for visibility so that others can comment on it. It's a major change from what I worked out with OAS. I'm reasonably OK with making the change, but it's kind of a big deal.

There are also alternatives such as a "binary": true approach. This would avoid breaking compatibility with anything that hardcodes the current set of valid type values.

I will file an issue for this.

@Relequestual Relequestual linked an issue Jul 16, 2020 that may be closed by this pull request
@awwright
Copy link
Member

awwright commented Jul 19, 2020

(re #942 (comment) from review)

@handrews I don't think we should add things to JSON Schema that are not supported by application/json. And JSON does not support binary data, of course. So the specification defining "type": "binary" I think is out.

However whatever we write, the potential applications should be clear. If we just say "binary data" it's not clear to me how you get binary data into an instance. So a brief example may be in order.

Or we just say "Schemas may be applied to supersets of the data model defined here; we won't go into all the details but all of the rules still work as written: type-specific rules such as "minimum" don't apply, but generic keywords such as "allOf" and "format" do."

@handrews
Copy link
Contributor Author

@awwright I'm trying to figure out how that's different than what I'm doing. Is the main goal here to change this:

Documents of other media types MAY be treated as instances
if a suitable application-defined mapping of the media type into the
data model can be determined.

to say "superset of the data model"?

@awwright
Copy link
Member

@handrews That paragraph, as it is, doesn't suggest to me that JSON Schema supports a superset of the data model/domain.

So, to rephrase, I have one of two suggestions:

  1. Remove the reference to "binary data" and just say "superset", and let other documents like OAI get to decide what that means, or

  2. Include some example of a document that supports "binary data" because it's not obvious what that means in a document that's only supposed to be talking about application/json (or other strictly equivalent serializations).

@Relequestual
Copy link
Member

@awwright given @handrews tumbedup your suggestion, care to make a change suggestion to this PR?
When making a review on code, clicking the +- button allows you to make direct change suggestions, which can then be easily added to the PR.

@handrews
Copy link
Contributor Author

@Relequestual @awwright yes, please feel free to do this. I'll get to it at some point but TBH personal things keep coming up 😐

@awwright
Copy link
Member

awwright commented Aug 1, 2020

Mission accepted

@awwright
Copy link
Member

I haven't forgotten about this... I've re-read Core a couple times and I don't think there's anything that prohibits using annotations for non-JSON documents. So would anyone be adverse to me modifying the the Data Model and some other sections to offer better advice for these situations?

I'd issue this as a new PR because that'll be easier for me, than trying to figure out what parts of this PR to amend.

@handrews
Copy link
Contributor Author

@awwright sounds good to me, I'll close this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow "contentMediaType" without "type"
4 participants