-
-
Notifications
You must be signed in to change notification settings - Fork 266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify data model regarding binary data #942
Conversation
It is advantageous in hypermedia environments to apply untyped schemas to binary data, despite the lack of a truly suitable mapping into the data model.
This expansion of contentMediaType is motivated by the need to indicate media types for binary resources in hypermedia environments. Existing usage (e.g. OpenAPI 3.0) considered unencoded binary data to be strings, but such data violates the expectations of JSON strings. A better approach is to purely indicate the media type and avoid constraining the instance by JSON type as no JSON type is suitable.
I'm trying to understand what this gets us. Trying to use JSON Schema with non-JSON media types isn't prohibited, just undefined. Other people may attempt to define behavior in these cases, though. I don't think it needs any special treatment. There's also this paragraph that specifies how JSON Schema applies to media types merely compatible with JSON:
|
@awwright did you look at the OAS PR that I linked? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that this now describes that there can exist an application-defined mapping of arbitrary data into the data model. I've tried to highlight what I believe needs a little more clarification for this concept.
<t> | ||
Binary data MAY be treated as an instance, however no data type in the data model | ||
is suitable. Therefore, only schemas such as the empty schema that do not | ||
constrain the type can be considered to pass. Rationales for and behavior of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think commas may be required: "Rationales for, and behavior of,"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this instance.
@@ -219,6 +223,12 @@ | |||
<t hangText="string:">A string of Unicode code points, from the JSON "string" value</t> | |||
</list> | |||
</t> | |||
<t> | |||
Binary data MAY be treated as an instance, however no data type in the data model | |||
is suitable. Therefore, only schemas such as the empty schema that do not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added paragraph above this one says that it's possible there's some application-defined mapping from binary data to the JSON data model. The wording of this paragraph seems to waffle between "MAY be treated as an instance" and "can't do anything unless it's an empty schema". Maybe add some language that suggests that a schema can be applied if that data model is applied. For example (changes in bold), "Binary data MAY be treated as an instance, however no data type in the data model is directly suitable. Therefore, only schemas such as the empty schema that do not constrain the type can be considered to pass outright."
I'm sure there's some better language, but this paragraph feels like it's fighting with itself if there's no acknowledgement that there exists that "application-defined mapping" in the previous paragraph.
All keywords in this section generally apply to strings, and have no | ||
effect on other JSON data types. Additionally, they MAY be used without | ||
type information when describing resources of other media types, subject | ||
to certain restrictions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could one or two examples of restrictions be given here?
to fail validation. | ||
</t> | ||
<t> | ||
The optional automatic decoding, parsing, and validating | ||
process SHOULD be equivalent to fully evaluating the instance against | ||
the original schema, followed by using the annotations to decode, parse, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this say instead, "The optional automatic decoding, parsing, and validating process SHOULD be equivalent to fully evaluating the instance against process SHOULD be equivalent to fully evaluating an instance against a schema, followed by using the annotations to decode, parse the original schema". The thing is, this section is about applying schemas or rules to encoded "content". if we use "evaluating the instance against the original schema", then it feels like there's confusion between the schema containing all this stuff vs. the stuff inside "content". Does this make sense?
@@ -1015,8 +1031,9 @@ | |||
|
|||
<section title="contentSchema"> | |||
<t> | |||
If the instance is a string, and if "contentMediaType" is present, this | |||
property contains a schema which describes the structure of the string. | |||
If the instance is a string or an untyped binary resource, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about: "If the instance is a string or an untyped binary resource having an application-defined mapping to the data model"
I've read this PR, it's content, and the associated OAS PR, but I still can't understand what this is about. That being said, if you feel you'd like me to understand and approve this seemingly simple PR, I'm happy to detail what I'm missing. I'd like to understand it at some point... =D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved! Finally get it.
I think you previously assumed knowledge of the OpenAPI spec.
I was looking for examples of the actual binary data in the OAS examples, but it wasn't there, which is why I was confused.
Closes #941 (which I forgot to link) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this is adding a lot of language that is not going to be relevant to most cases, which is confusing. It's describing a case that weren't even invalid before, just undefined (which means another specification, like OpenAPI, can define more specific behavior instead).
Do we specifically have to call out "binary data"? Or can we talk about a superset of JSON instead, like YAML or native ECMAScript values (which looks similar, but has strange cases like NaN and recursive references).
Maybe say something like
It is possible to apply JSON Schema to a superset of this data model; in this case, an instance may be outside any of the six primitive types. Implementations may invent new types that are a superset of existing types (similar to how "number" is a superset of "integer"), or a type disjoint from all six types altogether (like "binary").
this property describes the decoded string. If the "type" keyword is | ||
absent, this keyword MAY be interpreted as describing an unencoded binary | ||
resource. The exact meaning and behavior of this untyped usage is | ||
application-defined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is effectively permitting values outside the data model, which sort of defeats the point of having a data model, doesn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@awwright I like your idea of simply extending the data model. I'd probably recommend that we just add binary
rather than throwing it entirely open, but I could be persuaded on that point.
What's going on here is that I'm recognizing a thing that a significant number of people (OpenAPI users) were already doing in the wild, in a way that was more problematic. OAS 3.0 treats binary data as "type": "string"
plus a custom format
, which is definitely wrong. Unencoded binary data cannot be considered a JSON string in any reasonable universe.
Since OAS 3.1 is picking up the content*
keywords, they were willing to drop the custom format
and use of "type": "string"
as long as there was something that they could recommend for this use case. This is a scenario where nothing's going to be validated, but a JSON Schema is used for descriptive purposes. Which is fine with content*
because they don't do validation anyway. They are annotations, so like all annotations, their behavior is application-defined. This is particularly unusual behavior so it is called out. Also, we want it called out so people don't come up with worse "solutions" again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue, let's just call binary for now, and if other use cases present, then open it up later.
I'd like to avoid speculation and stay grounded on use cases right infront of our faces first.
=]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Relequestual @awwright just to clarify, we're now proposing to add binary
as a type?
I feel like this should get an issue for visibility so that others can comment on it. It's a major change from what I worked out with OAS. I'm reasonably OK with making the change, but it's kind of a big deal.
There are also alternatives such as a "binary": true
approach. This would avoid breaking compatibility with anything that hardcodes the current set of valid type
values.
I will file an issue for this.
(re #942 (comment) from review) @handrews I don't think we should add things to JSON Schema that are not supported by However whatever we write, the potential applications should be clear. If we just say "binary data" it's not clear to me how you get binary data into an instance. So a brief example may be in order. Or we just say "Schemas may be applied to supersets of the data model defined here; we won't go into all the details but all of the rules still work as written: type-specific rules such as |
@awwright I'm trying to figure out how that's different than what I'm doing. Is the main goal here to change this:
to say "superset of the data model"? |
@handrews That paragraph, as it is, doesn't suggest to me that JSON Schema supports a superset of the data model/domain. So, to rephrase, I have one of two suggestions:
|
@Relequestual @awwright yes, please feel free to do this. I'll get to it at some point but TBH personal things keep coming up 😐 |
Mission accepted |
I haven't forgotten about this... I've re-read Core a couple times and I don't think there's anything that prohibits using annotations for non-JSON documents. So would anyone be adverse to me modifying the the Data Model and some other sections to offer better advice for these situations? I'd issue this as a new PR because that'll be easier for me, than trying to figure out what parts of this PR to amend. |
@awwright sounds good to me, I'll close this out. |
It is advantageous in hypermedia environments to apply untyped schemas to binary data, despite the lack of a truly suitable mapping into the data model. This removes some JSON-specific language and provides additional guidance on media types.
See OAI/OpenAPI-Specification#2200 for the detailed use case. Prior to that PR, OAS was modeling unencoded binary data as strings with a custom
format
.The expansion of contentMediaType is motivated by the need to indicate media types for binary resources in hypermedia environments.
Existing usage (e.g. OpenAPI 3.0) considered unencoded binary data to be strings, but such data violates the expectations of JSON strings. A better approach is to purely indicate the media type and avoid constraining the instance by JSON type as no JSON type is suitable.