Clarify 'byte' format #50

Closed
webron opened this issue Apr 29, 2014 · 22 comments

@webron
Member

webron commented Apr 29, 2014

Basically, should be a binary value encoded in base64.

@webron
Member Author

webron commented Mar 5, 2015

@fehguy - just making sure, is that correct? 'byte' is a binary value encoded in base64?

@fehguy
Contributor

fehguy commented Mar 5, 2015

Yes I believe so. I'm not positive that there can't be different encoding based on media type though. Might keep this open for further clarification
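
As a minimal illustration of the reading confirmed above (a 'byte' value is binary data carried as a base64 string), the following Python sketch round-trips a few bytes through a JSON document. The field name `data` and the sample bytes are made up for illustration; nothing here comes from the spec itself.

```python
import base64
import json

raw = bytes([0x00, 0xFF, 0x10, 0x80])  # arbitrary sample binary data

# Encode to base64 text so the value fits in a JSON string
# ("type": "string", "format": "byte").
encoded = base64.b64encode(raw).decode("ascii")
payload = json.dumps({"data": encoded})  # "data" is a hypothetical field name

# The receiver reverses the steps to recover the original bytes.
decoded = base64.b64decode(json.loads(payload)["data"])
```

The JSON document only ever contains ASCII characters, which is why this representation is safe for text-based media types.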

@wing328

wing328 commented May 11, 2015

@fehguy @webron recently a user reported using byte (type=string, format=byte) to document an API. He expected byte to translate to byte[] in CSharp (both consumes and produces for that endpoint are set to application/octet-stream), but currently byte is mapped to string.

In view of the conversation above, could you confirm again that byte (type=string, format=byte) is mapped to a base64-encoded string?

@webron
Member Author

webron commented May 11, 2015

@wing328 - do you remember if the user used it for input, output or both?

@wing328

wing328 commented May 11, 2015

@webron both as mentioned in swagger-api/swagger-codegen#733

@webron
Member Author

webron commented May 12, 2015

@wing328 - FWIW, I think the proper way to document it with Swagger 2.0 would be to set the type to file. That doesn't mean this topic shouldn't be resolved or that we shouldn't find a better representation in the next version of the spec.

@boazsapir

I would like to propose a way to resolve this and support both raw binary and base64-encoded input and output, depending on the content type. Please see the detailed proposal below.
I also plan to open a pull request with a Java code generation implementation that supports binary data, based on this proposed spec update and on what was offered by my colleague in swagger-api/swagger-codegen#669

Swagger Spec Update Proposal – Handling of Binary Input and Output

Goals

  • Support raw as well as base64-encoded binary data as input and output of operations documented by Swagger.
  • Clarify the usage of the “byte” format which is currently part of the spec but is not well understood.

Scope

This proposal deals only with binary data passed to the server as a byte array in a POST or PUT request body or returned to the client as the response body of a request. It does not cover the topic of file uploads and downloads.

From the specification language point of view, this proposal only affects operations whose body parameter and/or response is of type "string" and format "byte".

For example:

Parameter definition:
{
    "name": "BinaryData",
    "in": "body",
    "required": true,
    "schema": {
        "type": "string",
        "format": "byte"
    }
}
Response definition:
    "200": {
        "description": "Encrypted data",
        "schema": {
            "type": "string",
            "format": "byte"
        }
    }

Principle

The expected data format to be passed in the cases mentioned above is determined by the content type specified in the "consumes" / "produces" properties of the operation.

Spec Details

When the body parameter is of type "string" and format "byte", there are two cases depending on the value of the "consumes" property of the operation:

  • For
"consumes": [
    "application/octet-stream"
]
the input data passed by the parameter to the server is expected to be “raw” binary data (not encoded).
  • For any other value of "consumes", the data should be base64 encoded.

In a similar way, the response data will depend on the value of the "produces" property:

  • For
"produces": [
    "application/octet-stream"
]
the expected response body is “raw” binary data.
  • For any other value of "produces", the data will be base64 encoded.

Note that there is no change in the behavior in case of a "string" body parameter or "string" response without the "byte" format.

Code Generation (Java as a non-normative example)

The wrapper method generated by swagger-codegen should have a byte[] input parameter in any case of a "type": "string", "format": "byte" body parameter. The method's code should call base64encode in order to convert the data in case an encoded string is expected by the server (i.e. the consumed content type is not "application/octet-stream"). No conversion should happen in case binary data is expected (i.e. the consumed content type is "application/octet-stream").

Similarly, the wrapper method should have a byte[] return value in any case of a "type": "string", "format": "byte" response. The method's code should call base64decode in order to convert the data in case an encoded string is expected from the server (i.e. the produced content type is not "application/octet-stream"). No conversion should happen in case binary data is expected (i.e. the produced content type is "application/octet-stream").
Note that the decision whether to convert the returned data is based on the content type declared in the spec and not on the content type specified in the actual response header.
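
The conversion rule described in this proposal can be sketched roughly as follows. This is an illustration in Python rather than the Java that swagger-codegen would emit, and it is not the actual generated code: the function names `encode_request_body` and `decode_response_body` are hypothetical, and the sketch assumes a single content type has already been selected from the operation's "consumes" or "produces" list.

```python
import base64

OCTET_STREAM = "application/octet-stream"


def encode_request_body(data: bytes, consumed_type: str) -> bytes:
    """Return the bytes to send for a string/byte body parameter."""
    if consumed_type == OCTET_STREAM:
        return data  # raw binary expected by the server: no conversion
    return base64.b64encode(data)  # any other type: base64-encoded text


def decode_response_body(body: bytes, produced_type: str) -> bytes:
    """Return the binary payload of a string/byte response."""
    if produced_type == OCTET_STREAM:
        return body  # raw binary returned by the server: no conversion
    return base64.b64decode(body)  # any other type: decode the base64 text
```

In both directions the caller only ever handles raw bytes; whether base64 is applied is decided from the content type declared in the spec, as the proposal requires.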

@webron
Member Author

webron commented Jun 5, 2015

@boazsapir - thanks for the elaborate proposal.

We're trying to gather some more information here. The original intent was that { "type": "string", "format": "byte" } is Base64 encoding of binary data as previously mentioned. We can't quite change the definition, but we can find ways to address your needs even with the current specification.

What would really help us is if you can find us official resources describing sending binary data using application/octet-stream or with any other mime-type for that matter. Are you aware of any documentation or able to find one?

@boazsapir

@webron please refer to this official documentation: http://www.iana.org/assignments/media-types/application/octet-stream

@casualjim
Contributor

Shouldn't base64 encoding be an artifact of the thing that consumes the particular format?

For example if my endpoint does:

consumes:
  - application/x-protobuf
  - application/json

There are essentially two types of formats that we send: binary and text based.

For a text based format like json or xml when you specify an array of bytes it has to resort to base64 encoding to make it fit in the wire format.
But for binary formats like protobuf, an encoding step like that is not necessary; those formats can just transfer a byte array as bytes untouched.

With application/octet-stream you're also saying you want a binary format so the same rule applies.
But because it's so general it probably requires extra documentation or some other form of knowledge sharing on how you decode that blob of bytes into something useful.
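
The distinction between text-based and binary wire formats can be illustrated with a small Python sketch. It relies on the behavior of Python's standard `json` module, which rejects `bytes` values; the key name `blob` and the sample data are invented for the example.

```python
import base64
import json

blob = b"\x89PNG\r\n"  # arbitrary sample binary data

# A text format such as JSON has no native byte type, so the standard
# json module refuses to serialize raw bytes.
try:
    json.dumps({"blob": blob})
    json_accepts_bytes = True
except TypeError:
    json_accepts_bytes = False

# Base64 is the usual way to make the bytes fit into a JSON string.
json_body = json.dumps({"blob": base64.b64encode(blob).decode("ascii")})

# A binary body (for example application/octet-stream) can carry the
# bytes untouched, with no encoding step.
binary_body = blob
```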

@boazsapir

@casualjim do you agree that in the case of application/octet-stream the automatically generated code (in swagger-codegen) should always send/receive a byte array without encoding?

@webron
Member Author

webron commented Jun 15, 2015

@boazsapir - we can't really change the spec. The combination of string+byte doesn't really make sense as a binary representation. We can make clarifications though and can try and find workarounds given the limitations. I'm not yet sure what would be the right course of action here.

@webron
Member Author

webron commented Jun 15, 2015

Okay, I've searched a bit around the JSON Schema world and came across a reference to the JSON Schema Hypermedia which we don't really support (for now) but has a possible related solution via the media property.

If you look here and follow the imgData property definition, you'll come across the media property and the binaryEncoding definition. The type of imgData is still string which works well with our own definition.

binaryEncoding accepts values as defined in RFC2045, which provides the values: 7bit, 8bit, binary, quoted-printable, base64 and other tokens as defined by that RFC. The RFC's definition of binary is: "Binary data" refers to data where any sequence of octets whatsoever is allowed. So... bingo?

The first catch is that we are not in a position to change the structure of the current spec. For this, I would say let's stick to format and the combination of "type": "string", "format": "binary" would indicate the binary data. This falls within the restrictions of the current structure, and adding that information to the spec will not harm it but allow for clarification.

The second catch is the meaning of 'byte' as a 'format'. Since the original intent was base64, I suggest we keep the meaning as-is right now to avoid conflicts.

I find this could be a good start for better defining file transfers in the next version of the spec, and believe it should satisfy the current needs of the people involved.

@boazsapir, @fehguy, @wing328 - would love to hear your opinions, and please try to find faults with it so we can fix those if they exist prior to moving forward with the tooling support.
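
One way a tool might act on the two formats discussed in this comment is sketched below in Python. The helper name `serialize_string_value` is hypothetical, and the sketch assumes the schema is available as a plain dictionary; it is not taken from any existing Swagger tooling.

```python
import base64


def serialize_string_value(data: bytes, schema: dict) -> bytes:
    """Hypothetical helper: pick the wire form from the schema's "format"."""
    if schema.get("type") != "string":
        raise ValueError("expected a schema of type 'string'")
    fmt = schema.get("format")
    if fmt == "binary":
        return data  # any sequence of octets, sent as-is
    if fmt == "byte":
        return base64.b64encode(data)  # base64-encoded characters
    raise ValueError(f"unsupported format: {fmt!r}")
```

Keeping 'byte' as base64 and adding 'binary' for raw octets means the choice is readable from the schema alone, without consulting "consumes" or "produces".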

@boazsapir

@webron this seems to solve my issue of supporting raw binary data, and is even simpler than my proposed spec change as it does not involve differentiation by content type (consumes/produces).
Do you think that, in the case of "format" : "binary", the content type "octet-stream" should be mandatory?

@webron
Member Author

webron commented Jun 16, 2015

I don't think we should impose it in this version of the spec as we're just adding clarifications and not adding restrictions. The way it would be interpreted is tool-dependent, and we can say that combinations of binary and most mime types would simply not make sense and would not be supported by the tools. That should be fine.

@boazsapir

ok, so from my perspective your proposal is good

@boazsapir

I can provide an implementation by modifying the pull requests I submitted on Sunday

@webron
Member Author

webron commented Jun 16, 2015

@boazsapir - I'd like to get @fehguy's feedback before we finalize it, as someone who knows the spec well, to make sure we didn't miss any obvious problems. We also need to make sure this gets cross-tool support.

@wing328

wing328 commented Jun 18, 2015

@webron your proposed change looks good to me. Thanks!

@fehguy
Contributor

fehguy commented Jul 30, 2015

@webron @boazsapir sorry for the lag on this. I think the proposal from @webron is good

@webron
Member Author

webron commented Jul 31, 2015

Thanks everyone for the feedback. The spec has been modified to reflect the clarifications in this issue. I'll close this issue as it relates to the byte format and open a new one so we can explore better file upload/download support in the future.

@webron
Member Author

webron commented Mar 16, 2016

We should possibly take this into account in issues #565 and #579.
