Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-Introducing Protocol Buffer Representation #626

Merged
merged 18 commits into from
Sep 30, 2020
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
.idea/
gen/
.vscode/
gen/
.DS_Store
4 changes: 4 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ The following documents are available:
| Kafka Protocol Binding | [v1.0](https://github.com/cloudevents/spec/blob/v1.0/kafka-protocol-binding.md) | [master](https://github.com/cloudevents/spec/blob/master/kafka-protocol-binding.md) |
| MQTT Protocol Binding | [v1.0](https://github.com/cloudevents/spec/blob/v1.0/mqtt-protocol-binding.md) | [master](https://github.com/cloudevents/spec/blob/master/mqtt-protocol-binding.md) |
| NATS Protocol Binding | [v1.0](https://github.com/cloudevents/spec/blob/v1.0/nats-protocol-binding.md) | [master](https://github.com/cloudevents/spec/blob/master/nats-protocol-binding.md) |
| Protobuf Event Format | | [master][proto-working] |
| Web hook | [v1.0](https://github.com/cloudevents/spec/blob/v1.0/http-webhook.md) | [master](https://github.com/cloudevents/spec/blob/master/http-webhook.md) |
| |
| **Additional Documentation:** |
Expand Down Expand Up @@ -153,3 +154,6 @@ Periodically, the group may have in-person meetings that coincide with a major
conference. Please see the
[meeting minutes](https://docs.google.com/document/d/1OVF68rpuPK5shIHILK9JOqlZBbfe91RNzQ7u_P7YCDE/edit#)
for any future plans.

[proto-working]: ./protobuf-format.md
[proto-latest]: ./protobuf-format.md
227 changes: 227 additions & 0 deletions protobuf-format.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,227 @@
# Protobuf Event Format for CloudEvents - Version 1.0

## Abstract

[Protocol Buffers][proto-home] is a mechanism for marshalling structured data,
this document defines how CloudEvents are represented using [version 3][proto-3]
of that specification.

In this document the terms *Protocol Buffers*, *protobuf*, and *proto* are used
interchangeably.

## Status of this document

This document is a working draft.

## Table of Contents

1. [Introduction](#1-introduction)
2. [Attributes](#2-attributes)
3. [Data](#3-data)
4. [Transport](#4-transport)
5. [Examples](#5-examples)

## 1. Introduction

[CloudEvents][ce] is a standardized and protocol-agnostic definition of the
structure and metadata description of events. This specification defines how the
elements defined in the CloudEvents specification are are represented using
a protobuf schema.

The [Attributes](#2-attributes) section describes the naming conventions and
data type mappings for CloudEvents attributes for use as protobuf message
properties.

The [Data](#3-data) section describes how the event payload is carried.

### 1.1. Conformance

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in [RFC2119][rfc2119].

### 1.2 Content-Type

There is no official IANA *media-type* designation for protobuf, as such this
specification uses 'application/protobuf' to identify such content.

## 2. Attributes

This section defines how CloudEvents attributes are represented in the protobuf
[schema][proto-schema].

## 2.1 Type System

The CloudEvents type system is mapped to protobuf as follows :

| CloudEvents | protobuf |
| ------------- | ---------------------------------------------------------------------- |
| Boolean | [boolean][proto-scalars] |
| Integer | [int32][proto-scalars] |
| String | [string][proto-scalars] |
| Binary | [bytes][proto-scalars] |
| URI | [string][proto-scalars] following [RFC 3986 §4.3][rfc3986-section43]|
| URI-reference | [string][proto-scalars] following [RFC 3986 §4.1][rfc3986-section41] |
| Timestamp | [Timestamp][proto-timestamp] |

## 2.3 Required Attributes

Required attributes are represented explicitly as protobuf fields.

## 2.4 Optional Attributes & Extensions

Optional and extension attributes are represented using a map construct enabling
direct support of the CloudEvent [type system][ce-types].

```proto
map<string, CloudEventAttributeValue> attribute = 1;

message CloudEventAttributeValue {

oneof attr_oneof {
bool ce_boolean = 1;
int32 ce_integer = 2;
string ce_string = 3;
bytes ce_binary = 4;
string ce_uri = 5;
string ce_uri_reference = 6;
google.protobuf.Timestamp ce_timestamp = 7;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are unknown attribute keys handled? If I get a string for "datakey" with value "/path", should it be encoded as ce_string, ce_binary, or ce_uri_reference? (The same is also true for timestamp)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The originator of the event presumably know the 'type', so they wold populate the extension attribute accordingly.

I'll add an example...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a component which is adapting structured JSON input from e.g. AWS Kinesis to gRPC transport? This sort of intermediate component may not be aware of the types of all extensions (by design), but without type information from the upstream transport, a datakey attribute might be transcribed to a different encoding than the recipient is expecting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an example of what this looks like in another typed transport see the AMQP spec, where clients need to be prepared to accept either string or typed values for each attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're conflating 'transport' and 'format' .. this spec is only applicable in structured mode and in those situations the payload is opaque (apologies if I mis-stated that somewhere, now you've got me worried).

In AMQP this would be transported as a application/cloudevents+x-protobuf content type.

It'd be the gRPC transport specification (if we consider it a transport) that would define how binary and structured payloads would be exchanged.

Switching between different structured formats (JSON->Proto, Avro->JSON) is a higher-order function and not a concern of the transport binding I believe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I thought this was handled, but I'm looking back, and it looks like we talked past each other.

Assume the following structured JSON CloudEvent with no payload:

{
    "specversion" : "1.0",
    "type" : "com.example.someevent",
    "source" : "/mycontext",
    "id" : "A234-1234-1234",
    "time" : "2018-04-05T17:31:00Z",
    "comexampleextension1" : "2018-04-05T17:31:00Z",
}

What does the equivalent structured Proto CloudEvent look like? (using text format for readability)

id: "A234-1234-1234"
source: "/mycontext"
spec_version: "1.0"
type: "com.example.someevent"
attributes: [{
  key: "time"
  value: {
    ce_timestamp: {...}
}, {
  key: "comexampleextension1"
  value: {
    ce_string: "...."
  }
}]

Alternately, comexamplextension1 could be encoded as:

{
  key: "comexampleextension1"
  value: {
    ce_timestamp: { ... }
  }
}

If there is a generic JSON -> Proto CloudEvents bridge:

  • How does it know how to encode comexampleextension1?
  • Do all downstream clients need to be able to handle both ce_timestamp and ce_string format?
  • Does the same issue apply to the other types?

This is somewhat of a generic problem with the CloudEvents extension model, but mentioning it again here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use google.protobuf.Timestamp here, we will always be using UTC for the timezone, while the spec says that it should be a timezoned datetime (which cares an extra information and can be used to align the notifications schedule, for example).

Copy link
Contributor Author

@JemDay JemDay May 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with respect to your prior comment (which I can't comment on directly)..

1 - I think I updated the spec after you commented, the context attributes defined in the spec are now represented explicitly.
2 - Again, with the re-structure I don't think that's an issue. In Proto typically everything is optional;i believe there is a way to flag 'required' but Proto purists may object :-)
3 - For me it's about retaining the type information, a receiver can determine that the context is URI because the value is in that field.

This might be a question for the larger group, my interpretation was that the intention was to ensure timezone information was included so representing the value as UTC met that requirement from my point of view.

On closer inspection there's an implication that timezones MUST be communicated as strings - which seems a bit inefficient especially when native representations are available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proto2 has a notion of optional/required semantics. These were dropped in proto3 due to operational experience -- as protocol buffers evolved over time, fields which were previously "required" might no longer make sense, but converting a required field to optional was an extensive multi-stage rollout process across all consumers of the protocol buffer, even those which did not use the fields explicitly (like intermediate proxies).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two notes here on details, while I'm still looking at the broader conversation. (Sorry for parachuting in; a colleague pointed me at this issue, and this particular subthread touches on things I know about.)

While proto3 still doesn't have "required" semantics, it has very recently regained the idea of "optional" fields. The word "optional" is pretty much historical now, but it's really about "does this field support the notion of being implicitly set vs explicitly set". An optional int32 proto3 field that has been explicitly set to 0 by user code is serialized even though 0 is the default value. A regular int32 proto3 field will never have its default value serialized, whether it was set by the user or not.

Separately, on time zones, I'd like us to be really clear on terminology. RFC 3339 does not allow the time zone to be conveyed. It allows a UTC offset to be specified. They're not the same thing, and that affects scheduling. If I send an event with timestamp that is in UTC+1 (as the UK is right now), and I promise to send another event one day later in local time, you can't tell whether that will be another timestamp with an offset of +1 or not. It's unfortunate that RFC 3339 (and ISO-8601 before it) conflates UTC offsets and time zones, but let's try not to in any CloudEvents documentation :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part definitely needs more discussion - if the format needs to be able to store a timestamp with offset from UTC as per RFC 3339, Timestamp will not do that on its own. We could create our own message for that concept, but it wouldn't be as easy to handle - Timestamp has built-in goodness in terms of a readable JSON representation, and utility methods in at least some languages (e.g. C#).

I would suggest that in retrospect, the CE spec deciding that a timestamp doesn't just represent an instant in time may have been a mistake - but it's unclear what we need to do now. Being able to convert between formats without loss of information would certainly be a useful property...

}
}
```

In this model an attribute's name is used as the map *key* and is
associated with its *value* stored in the appropriately typed property.

This approach allows for attributes to be represented and transported
with no loss of *type* information.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike the JSON format, presumably - where for an extension attribute (where we don't necessarily know the type inherently in the way that we do for the built-in attributes) we don't know whether a given attribute value is a string that also happens to be a URI but shouldn't be treated as one, or is really intended to be a URI. (Hope that makes sense...)

I think we're missing some detail in terms of requirements of a CloudEvent format - when designing one, it would be nice to have guidance about what's required, desirable etc.


## 3. Data

The specification allows for data payloads of the following types to be explicitly represented:

* string
* bytes
* protobuf object/message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was a json structure (possibly equivalent to protobuf Struct).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure i follow your comment, My intention was to allow the transportation of Proto event data in a way that is more idiomatic for Proto. ie there's no need to marshal the data into a separate field, the pack/unpack feature of Any supports this directly.

I'll add an example..


```proto
oneof data_oneof {

// Binary data
bytes binary_data = 2;

// String data
string text_data = 3;

// Protobuf Message data
google.protobuf.Any proto_data = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you want to use Any here; Any is basically a wrapper for a string type and an enveloped message. For CloudEvents, I think it would be more natural to put the message bytes into raw_data, and the message type into a CE extension attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my previous comment ... if I'm dealing natively with protobuf it's not very idiomatic to marshal it separately.

I don't believe the spec calls for all event data to be encoded as a string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the various binary encodings, they encode the payload as a simple bytestream (not a string): HTTP, Kafka, MQTT, AMQP. AVRO has some funniness, but I wouldn't try to emulate it.

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's at least worth considering a field of

Value json_data = 5;

Value is intended to be "any JSON" value; compare that with Struct which is "a JSON object".

If we're trying to convert from a JSON formatted event with a JSON payload to a Protobuf-formatted event with the same payload, Value would be a more "native" representation than string in my view. But we really need to work out the use cases here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note that this would avoid any confusion between a text payload that happens to be valid JSON, and one which was designed to be JSON.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting one ... probably should have discussed yesterday.

```

* Where the data is a protobuf message it MUST be stored in the `proto_data` property.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evankanderson I think part of my concern is caused by this clause. If a passthrough intermediary (that doesn't know about the target type) receives the message, what's the point in "trying" to populate the 'proto_data' property? Why not just populate the 'binary_data' with the bytes of the serialized proto and pass it on?

I understand the previous comment @JemDay made about natively working with protos and I like that idea. However, I think the spec should allow intermediaries to opt-out when the concrete type is not known.

Copy link
Contributor Author

@JemDay JemDay Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be my most contentious proposal ;-) It becomes relevant when moving between formats (eg JSON <-> Proto).

Assuming a JSON formatted CloudEvent (containing a protobuf data payload) is received by an intermediary that wants to output a CE in proto format ...

The required attributes map directly (and hopefully simply).

  • The received CE has a datacontenttype of application/protobuf
  • The received CE has a data_base64 containing the encoded serialized representation.
  • The received CE has a dataschema of something like proto:io.cloudevent.examples.SomeData (it needs to be a URI)

The intermediary decodes the data_base64 and that becomes the Any.Builder.setValue(), the path portion of dataschema is passed into Any.Builder.setTypeUrl().

I believe that works ...

* `datacontenttype` MAY be populated with `application/protobuf`
* `dataschema` SHOULD be populated with the type URL of the protobuf data message.

* When the type of the data is text, the value MUST be stored in the `text_data` property.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really clear what's meant by "the type of the data" here - does JSON data count as text by default, for example?

* `datacontenttype` SHOULD be popuplated with the appropriate media-type.

* When the type of the data is binary the value MUST be stored in the `binary_data` property.
* `datacontenttype` SHOULD be populated with the appropriate media-type.



## 4. Transport

Transports that support content identification MUST use the following designation:

```text
application/cloudevents+protobuf
```
Comment on lines +133 to +139
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me what kind of wire format are we targeting for transporting protobufs? Are you implicitly referring to marshalling protos to binary encoding (https://developers.google.com/protocol-buffers/docs/encoding)?

Alternatively, we can always convert protos to JSON when putting them on wire, or maybe even use textprotos.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was assuming that this was the binary encoding, given that JSON is already covered.

I'd strongly discourage using textprotos - they're not supported in all languages, and not well-defined externally as far as I'm aware.


## 5. Examples

The following code-snippets shows how proto representations might be constucted asuming the availability of some convenience methods ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The following code-snippets shows how proto representations might be constucted asuming the availability of some convenience methods ...
The following code-snippets shows how proto representations might be constructed assuming the availability of some convenience methods ...


### 5.1 Plain Text event data

```java
public static Spec.CloudEvent plainTextExample() {
Spec.CloudEvent.Builder ceBuilder = Spec.CloudEvent.newBuilder();

ceBuilder
//-- Required Attributes.
.setId(UUID.randomUUID().toString())
.setSpecVersion("1.0")
.setType("io.cloudevent.example")
.setSource("producer-1")

//-- Data.
.setTextData("This is a plain text message");

//-- Optional Attributes
withCurrentTime(ceBuilder, "time");
withAttribute(ceBuilder, "datacontenttype", "text/plain");

// Build it.
return ceBuilder.build();
}

```

### 5.2 Proto message as event data

Where the event data payload is itself a protobuf message (with its own schema)
a protocol buffer idiomatic method can be used to carry the data.

```java
private static Spec.CloudEvent protoExample() {

//-- Build an event data protobuf object.
Test.SomeData.Builder dataBuilder = Test.SomeData.newBuilder();

dataBuilder
.setSomeText("this is an important message")
.setIsImportant(true);

//-- Build the CloudEvent.
Spec.CloudEvent.Builder ceBuilder = Spec.CloudEvent.newBuilder();

ceBuilder
.setId(UUID.randomUUID().toString())
.setSpecVersion("1.0")
.setType("io.cloudevent.example")
.setSource("producer-2")

// Add the proto data into the CloudEvent envelope.
.setProtoData(Any.pack(dataBuilder.build()));

// Add the protto type URL
withAttribute(ceBuilder, "dataschema", ceBuilder.getProtoData().getTypeUrl());

// Set Content-Type (Optional)
withAttribute(ceBuilder, "datacontenttype", "application/protobuf");

//-- Done.
return ceBuilder.build();

}
```

## References

* [Protocol Buffer 3 Specification][proto-3]
* [CloudEvents Protocol Buffers format schema][proto-schema]

[Proto-3]: https://developers.google.com/protocol-buffers/docs/reference/proto3-spec
[proto-home]: https://developers.google.com/protocol-buffers
[proto-scalars]: https://developers.google.com/protocol-buffers/docs/proto3#scalar
[proto-wellknown]: https://developers.google.com/protocol-buffers/docs/reference/google.protobuf
[proto-timestamp]: https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#google.protobuf.Timestamp
[proto-schema]: ./spec.proto
[json-format]: ./json-format.md
[ce]: ./spec.md
[ce-types]: ./spec.md#type-system
[rfc2119]: https://tools.ietf.org/html/rfc2119
[rfc3986-section41]: https://tools.ietf.org/html/rfc3986#section-4.1
[rfc3986-section43]: https://tools.ietf.org/html/rfc3986#section-4.3
[rfc3339]: https://tools.ietf.org/html/rfc3339
59 changes: 59 additions & 0 deletions spec.proto
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
/**
* CloudEvent Protobuf Format
*
* - Required context attributes are explicity represented.
* - Optional and Extension context attributes are carried in a map structure.
* - Data may be represented as binary, text, or protobuf messages.
*/

syntax = "proto3";

package io.cloudevents.v1;

import "google/protobuf/any.proto";
import "google/protobuf/timestamp.proto";

option csharp_namespace = "Io.CloudEvents.V1.Proto";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be decided for sure later on (on a per language basis) but the CloudEvents C# SDK uses a namespace of CloudNative.CloudEvents. We probably want to fit in with that. (So CloudNative.CloudEvents.V1.Proto perhaps? Or CloudNative.CloudEvents.V1.Protobuf?

option go_package = "cloudevents.io/genproto/v1";
option java_package = "io.cloudvents.v1.proto";
option php_namespace = "Io\\CloudEvents\\V1\\Proto";
option ruby_package = "Io::CloudEvents::V1::Proto";

message CloudEvent {

// -- CloudEvent Context Attributes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should revisit all the comments. I'd expect a comment for every field and message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack ... Will address...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally it would be good to get this done before it's merged, but I'm okay for us to do it later.


// Required Attributes
string id = 1;
string source = 2; // URI-reference
string spec_version = 3;
string type = 4;

// Optional & Extension Attributes
map<string, CloudEventAttributeValue> attribute = 5;

// -- CloudEvent Data (Bytes, Text, or Proto)
oneof data_oneof {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
oneof data_oneof {
oneof data {

I believe that will generate nicer code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping on this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still no sign of this or the later oneof being fixed.

bytes binary_data = 6;
string text_data = 7;
google.protobuf.Any proto_data = 8;
}

/**
* The CloudEvent specification defines
* seven attribute value types...
*/

message CloudEventAttributeValue {

oneof attr_oneof {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
oneof attr_oneof {
oneof value {

(Same comment as before about _oneof not being necessary... and there's no need for the "attr" side given that this is already within a CloudEventAttribute message.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And again.

bool ce_boolean = 1;
int32 ce_integer = 2;
string ce_string = 3;
bytes ce_bytes = 4;
string ce_uri = 5;
string ce_uri_ref = 6;
google.protobuf.Timestamp ce_timestamp = 7;
}
}
}