
Re-Introducing Protocol Buffer Representation #626

Merged: 18 commits, Sep 30, 2020

Conversation

JemDay
Contributor

@JemDay JemDay commented May 20, 2020

Re-introducing protocol buffer support; this takes a slightly different approach to the one originally proposed.

I've used one-of constructs, which I'm aware have the potential to introduce some processing overhead, but they do enable a level of structural correctness.

It's been a while since I've used Proto on a day-to-day basis so I'd appreciate comments.

Signed-off-by: Day, Jeremy(jday) [email protected]
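For readers skimming the thread, a minimal sketch of the one-of approach being described, with illustrative field names and tag numbers (assumptions, not the normative spec.proto proposed in this PR):

```proto
// Illustrative sketch only -- names and tags are assumptions, not the
// normative schema proposed in this PR's spec.proto.
syntax = "proto3";

package io.cloudevents.v1.sketch;

import "google/protobuf/any.proto";

message CloudEventSketch {
  // Required context attributes.
  string id = 1;
  string source = 2;
  string spec_version = 3;
  string type = 4;

  // At most one data representation may be set; the oneof buys structural
  // correctness at the cost of a little decoding overhead.
  oneof data {
    bytes binary_data = 5;
    string text_data = 6;
    google.protobuf.Any proto_data = 7;
  }
}
```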

@duglin
Collaborator

duglin commented May 21, 2020

ping @evankanderson

@clemensv
Contributor

Is the ownership story regarding Protobuf resolved, i.e. did Google put it into CNCF, yet?

@markfisher

ideally this could replace the LiiklusEvent as defined here: https://github.com/bsideup/liiklus/blob/master/protocol/src/main/proto/LiiklusService.proto

and possibly, we could then discuss Liiklus' proto service moving into the CloudEvents scope as well (along with unary service perhaps)

cc @bsideup

@JemDay
Contributor Author

JemDay commented May 21, 2020

@markfisher - I toyed with adding gRPC but that might be more of a transport specification thing.

Do we consider gRPC a transport?

I've updated the schema to represent the required and optional attributes as first-class fields.

@evankanderson
Contributor

I believe gRPC is part of CNCF, so a proto format for CloudEvents would align with gRPC transport definitions.

I'll take a look later today; I think there was also an earlier protobuf definition PR, so we might want to reconcile the two.


NOTE: Field assignment numbers are non-normative.
Contributor

Field assignment numbers absolutely have to be defined by the spec. Proto wire format encodes tag + type as a prefix, so if the tag doesn't match the expected type, the entire message will be discarded during decoding.
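To illustrate the point (hypothetical messages, not part of the proposal): the binary wire format identifies fields only by tag number and wire type, never by name, so a writer and a reader that disagree on tag assignments silently misinterpret each other's fields.

```proto
// Hypothetical illustration. The wire format carries only (tag, wire type),
// never field names, so these two views of the "same" message disagree:
syntax = "proto3";

message WriterView {
  string id = 1;      // serialized as field 1
  string source = 2;  // serialized as field 2
}

message ReaderView {
  string source = 1;  // a WriterView's id is decoded here
  string id = 2;      // and its source lands here
}
```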

Contributor Author

I think my comment was misleading ... what I was trying to imply was that the content in this document is 'representative', but the actual schema is defined in the spec.proto file. I'll clean that up.


* string
* bytes
* protobuf object/message
Contributor

I thought this was a json structure (possibly equivalent to protobuf Struct).

Contributor Author

Not sure I follow your comment. My intention was to allow the transportation of Proto event data in a way that is more idiomatic for Proto, i.e. there's no need to marshal the data into a separate field; the pack/unpack feature of Any supports this directly.

I'll add an example.

string text_data = 3;

// Protobuf Message data
google.protobuf.Any proto_data = 4;
Contributor

I don't think you want to use Any here; Any is basically a wrapper for a string type and an enveloped message. For CloudEvents, I think it would be more natural to put the message bytes into raw_data, and the message type into a CE extension attribute.
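For context, google.protobuf.Any (from any.proto) is essentially just a type URL plus the serialized bytes of the enveloped message, which is the wrapping being described here:

```proto
// Abridged from google/protobuf/any.proto: an Any is only a type URL
// plus the enveloped message's serialized bytes.
message Any {
  // Identifies the type of the serialized message, e.g.
  // "type.googleapis.com/google.protobuf.Duration".
  string type_url = 1;

  // The serialized bytes of the enveloped message.
  bytes value = 2;
}
```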

Contributor Author

See my previous comment ... if I'm dealing natively with protobuf it's not very idiomatic to marshal it separately.

I don't believe the spec calls for all event data to be encoded as a string.

Contributor

If you look at the various binary encodings, they encode the payload as a simple bytestream (not a string): HTTP, Kafka, MQTT, AMQP. AVRO has some funniness, but I wouldn't try to emulate it.

of the content. This can be determined by examining the data for invalid UTF-8
sequences or by consulting the `datacontenttype` attribute.

* Where the data is a protobuf message it MUST be stored in the `proto_data`
Contributor

Why complicate this vs. just using bytes (and possibly needing to decode the payload a second time), vs needing to scan the payload to determine which field to put the content into?

Also, is it an error to receive a utf-8 string or a proto message as raw_data? If not, this seems like a case where the recipient will know what to do best.

Contributor Author

I get your point but I'm not sure it complicates things. My thought process was that it was a bit odd to put textual content into a byte array when Proto allows us to be explicit.

Contributor

FWIW, the binary encoding of protobuf lets you "Pun" two different protos between different length-delimited field types, like so:

```proto
// Use this type if you are an intermediate router and don't need to care about interpreting or validating payload.
message Envelope {
  string name = 1;
  bytes payload = 2;
}

message Data {
  map<string, int32> values = 1;
  double magnitude = 2;
}

// If you are an endpoint node, decode Envelope as Decoded and you can interpret payload as the Data message.
message Decoded {
  string name = 1;
  Data data = 2;
}
```

The main benefit of using string vs bytes is that Proto will generate the correct types in the auto-generated code, possibly verifying that the string is valid UTF-8 along the way (I don't recall if all implementations verify the UTF-8-ness of the bytestream). The cost is that any proto interpretation which understands the message will unpack and verify all the inner payload fields, whether or not they are used by the application.


```java
public static Spec.CloudEvent plainTextExample() {
  Spec.CloudEvent.Builder ceBuilder = Spec.CloudEvent.newBuilder();
```
Contributor

You never define the enclosing CloudEvent type, only extensions and data_oneof.

Contributor Author

Spec.CloudEvent is the enclosing type, the example method returns an instance of a CloudEvent.

bytes ce_binary = 4;
string ce_uri = 5;
string ce_uri_reference = 6;
google.protobuf.Timestamp ce_timestamp = 7;
Contributor

How are unknown attribute keys handled? If I get a string for "datakey" with value "/path", should it be encoded as ce_string, ce_binary, or ce_uri_reference? (The same is also true for timestamp)

Contributor Author

The originator of the event presumably knows the 'type', so they would populate the extension attribute accordingly.

I'll add an example...

Contributor

What about a component which is adapting structured JSON input from e.g. AWS Kinesis to gRPC transport? This sort of intermediate component may not be aware of the types of all extensions (by design), but without type information from the upstream transport, a datakey attribute might be transcribed to a different encoding than the recipient is expecting.

Contributor

For an example of what this looks like in another typed transport see the AMQP spec, where clients need to be prepared to accept either string or typed values for each attribute.

Contributor Author

I think we're conflating 'transport' and 'format' ... this spec is only applicable in structured mode, and in those situations the payload is opaque (apologies if I misstated that somewhere, now you've got me worried).

In AMQP this would be transported as an application/cloudevents+x-protobuf content type.

It'd be the gRPC transport specification (if we consider it a transport) that would define how binary and structured payloads would be exchanged.

Switching between different structured formats (JSON->Proto, Avro->JSON) is a higher-order function and not a concern of the transport binding, I believe.

Contributor

Sorry, I thought this was handled, but I'm looking back, and it looks like we talked past each other.

Assume the following structured JSON CloudEvent with no payload:

{
    "specversion" : "1.0",
    "type" : "com.example.someevent",
    "source" : "/mycontext",
    "id" : "A234-1234-1234",
    "time" : "2018-04-05T17:31:00Z",
    "comexampleextension1" : "2018-04-05T17:31:00Z"
}

What does the equivalent structured Proto CloudEvent look like? (using text format for readability)

id: "A234-1234-1234"
source: "/mycontext"
spec_version: "1.0"
type: "com.example.someevent"
attributes: [{
  key: "time"
  value: {
    ce_timestamp: {...}
}, {
  key: "comexampleextension1"
  value: {
    ce_string: "...."
  }
}]

Alternatively, comexampleextension1 could be encoded as:

{
  key: "comexampleextension1"
  value: {
    ce_timestamp: { ... }
  }
}

If there is a generic JSON -> Proto CloudEvents bridge:

  • How does it know how to encode comexampleextension1?
  • Do all downstream clients need to be able to handle both ce_timestamp and ce_string format?
  • Does the same issue apply to the other types?

This is somewhat of a generic problem with the CloudEvents extension model, but mentioning it again here.

direct support of the CloudEvent [type system][ce-types].

```proto
map<string, CloudEventAttribute> extensions = 1;
```
Contributor

This map is labeled extensions. Are the core attributes handled the same way, or by unique tags?

The example at the end suggests that known attributes are assigned unique tags.

Contributor Author

Originally I had all the defined attributes in a map structure; I flipped to a model where the defined attributes are explicit (as in the JSON schema).

So in that model only extension attributes need to be carried in the map.

Contributor

My concern is about what happens when core attributes are added, e.g. in a 1.1 spec.

Clients which were built before the 1.1 spec will map these attributes into the extensions map, while post-1.1 clients will map them into a unique tag-assigned field. Recipients of messages will then need to write complicated logic to map pre-1.1 messages to post-1.1 messages, losing most of the benefit of code auto-generation for protocol buffers.

Yes, this is a "perfect is the enemy of the good" case, but my experience has been that not planning for expansion tends to lead to "everything was good until we tried to touch something, and now it's all terrible."

Another acceptable scenario is to accept that we can't add further attributes to the 1.0 proto message definition, and all additional attributes (whether currently documented extensions or not-yet-defined extensions) must be stored in extensions.

The last option is to keep a global tag number registry mapping extension names to tags, but that only helps humans and doesn't really help conversion gateways, which would need to be able to load the list dynamically when transforming a message between proto and other encodings.

Contributor Author

Maybe there's a half-way house solution where all the 'required' attributes are explicitly defined but the optional ones aren't, on the basis that we can't add any required attributes without changing the major version.
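A rough sketch of that half-way house, with only the required context attributes as dedicated fields and everything else in the map (names and tag numbers are illustrative assumptions, not the merged spec.proto):

```proto
// Illustrative sketch of the "half-way house" model described above;
// names and tag numbers are assumptions, not the merged schema.
syntax = "proto3";

message CloudEventHalfway {
  // Required context attributes get dedicated, tag-assigned fields.
  string id = 1;
  string source = 2;
  string spec_version = 3;
  string type = 4;

  // Optional and extension attributes travel here, keyed by attribute name,
  // so new optional attributes never need new tag numbers.
  map<string, AttributeValue> attributes = 5;
}

message AttributeValue {
  oneof attr {
    string ce_string = 1;
    bytes ce_bytes = 2;
    bool ce_boolean = 3;
    int32 ce_integer = 4;
  }
}
```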

spec.proto Outdated
google.protobuf.Timestamp time = 8;

// -- CloudEvent Extension Attributes
map<string, CloudEventAttribute> extensions = 9;
Contributor

See also my comments in #576 (comment)

You may want to define all optional attributes (core and extension) as part of the same string-keyed map, rather than putting a few as tags but not later-defined attributes.

Contributor Author

@JemDay JemDay May 22, 2020

That's an interesting point ... we don't do that in the other format definitions, but it is something to consider.

It does have the potential to introduce complexity where consumers/producers might be using different versions of the spec as new extensions are added - Proto can cope with that but it would increase complexity for consumers.

I'd argue for sticking with the defined 'required' and 'optional' context attributes.

@evankanderson
Contributor

I'd expect a follow-on to the proto definition would be a set of gRPC transports. At a minimum, this would be unary request-response and streaming request-response, but there might be other useful profiles.

@evankanderson
Contributor

(oh, and see also #576)

@bsideup
Contributor

bsideup commented May 22, 2020

@JemDay Nice! I have a couple of questions since I think I'm missing some important details:

  1. I wonder why we have attributes as map<string, value> while they are strictly defined in the spec?
  2. With a map, how do we plan to deprecate certain attributes? Or define required/optional?
  3. Does it really make sense to have ce_uri and ce_uri_reference, given that we're not gaining any type safety here, since protobuf is missing a URI type? Perhaps we should add a message URI?

Thanks in advance!

bytes ce_binary = 4;
string ce_uri = 5;
string ce_uri_reference = 6;
google.protobuf.Timestamp ce_timestamp = 7;
Contributor

If we use google.protobuf.Timestamp here, we will always be using UTC for the timezone, while the spec says that it should be a timezoned datetime (which carries extra information and can be used to align the notification schedule, for example).

Contributor Author

@JemDay JemDay May 22, 2020

With respect to your prior comment (which I can't comment on directly):

1 - I think I updated the spec after you commented; the context attributes defined in the spec are now represented explicitly.
2 - Again, with the restructure I don't think that's an issue. In Proto typically everything is optional; I believe there is a way to flag 'required' but Proto purists may object :-)
3 - For me it's about retaining the type information; a receiver can determine that the value is a URI because it is in that field.

This might be a question for the larger group; my interpretation was that the intention was to ensure timezone information was included, so representing the value as UTC met that requirement from my point of view.

On closer inspection there's an implication that timezones MUST be communicated as strings - which seems a bit inefficient, especially when native representations are available.

Contributor

Proto2 has a notion of optional/required semantics. These were dropped in proto3 due to operational experience -- as protocol buffers evolved over time, fields which were previously "required" might no longer make sense, but converting a required field to optional was an extensive multi-stage rollout process across all consumers of the protocol buffer, even those which did not use the fields explicitly (like intermediate proxies).

Contributor

Two notes here on details, while I'm still looking at the broader conversation. (Sorry for parachuting in; a colleague pointed me at this issue, and this particular subthread touches on things I know about.)

While proto3 still doesn't have "required" semantics, it has very recently regained the idea of "optional" fields. The word "optional" is pretty much historical now, but it's really about "does this field support the notion of being implicitly set vs explicitly set". An optional int32 proto3 field that has been explicitly set to 0 by user code is serialized even though 0 is the default value. A regular int32 proto3 field will never have its default value serialized, whether it was set by the user or not.
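A small sketch of the field-presence distinction described above (illustrative field names, not spec fields):

```proto
// Illustrative only. With `optional`, proto3 tracks explicit presence:
// an explicitly-set 0 is serialized and the generated has_* accessor
// reports true. Without it, 0 is indistinguishable from "never set"
// and is never serialized.
syntax = "proto3";

message PresenceSketch {
  optional int32 explicit_count = 1;  // explicit presence (has_/clear_ generated)
  int32 implicit_count = 2;           // default value is never serialized
}
```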

Separately, on time zones, I'd like us to be really clear on terminology. RFC 3339 does not allow the time zone to be conveyed. It allows a UTC offset to be specified. They're not the same thing, and that affects scheduling. If I send an event with timestamp that is in UTC+1 (as the UK is right now), and I promise to send another event one day later in local time, you can't tell whether that will be another timestamp with an offset of +1 or not. It's unfortunate that RFC 3339 (and ISO-8601 before it) conflates UTC offsets and time zones, but let's try not to in any CloudEvents documentation :)

Contributor

This part definitely needs more discussion - if the format needs to be able to store a timestamp with offset from UTC as per RFC 3339, Timestamp will not do that on its own. We could create our own message for that concept, but it wouldn't be as easy to handle - Timestamp has built-in goodness in terms of a readable JSON representation, and utility methods in at least some languages (e.g. C#).

I would suggest that in retrospect, the CE spec deciding that a timestamp doesn't just represent an instant in time may have been a mistake - but it's unclear what we need to do now. Being able to convert between formats without loss of information would certainly be a useful property...
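For the sake of discussion, one hypothetical shape for "our own message for that concept" (not something proposed in this PR) would pair the instant with the RFC 3339 UTC offset:

```proto
// Hypothetical only -- NOT proposed in this PR. Retains the RFC 3339 UTC
// offset alongside the instant, at the cost of losing Timestamp's built-in
// JSON formatting and per-language helper methods.
syntax = "proto3";

import "google/protobuf/timestamp.proto";

message OffsetTimestamp {
  google.protobuf.Timestamp instant = 1;  // the instant in time
  sint32 utc_offset_minutes = 2;          // e.g. 60 for +01:00
}
```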

@duglin
Collaborator

duglin commented May 26, 2020

@JemDay is this one ready for review?

@JemDay
Contributor Author

JemDay commented May 26, 2020

@duglin - Probably ready for a broader conversation, I do need to go through the comments from @evankanderson a bit more.

Items to think about:

  • representing attributes explicitly or as a simple map (I believe I saw an issue/question relating to how this is done in Avro)
  • Do we consider gRPC to be a transport, or is it a special case like HTTP webhooks?

### 1.2 Content-Type

There is no official IANA *media-type* designation for protobuf, as such this
specification uses 'application/x-protobuf' to identify such content.
Contributor

There was a draft for protobuf media type here: https://tools.ietf.org/html/draft-rfernando-protocol-buffers-00.
We recently decided in our company to use the media type application/protobuf proposed in the draft above.
Would you mind switching to that definition instead of the proposed application/x-protobuf?

Suggested change:
- specification uses 'application/x-protobuf' to identify such content.
+ specification uses 'application/protobuf' to identify such content based on the draft [Encoding rules and MIME type for Protocol Buffers](https://tools.ietf.org/html/draft-rfernando-protocol-buffers-00).

Contributor Author

@JemDay JemDay Jun 2, 2020

Hi,

I certainly think this warrants further discussion; I've seen both application/protobuf and application/x-protobuf used.

Looking at Spring, for example, they appear to use x-protobuf.

I'm not sure there's one correct answer to this :-(

Contributor

Sorry, my mistake - we mainly thought about the datacontenttype attribute.

I think the proposal made in the call today to use application/cloudevents+protobuf for the Content-Type http header field for structured mode is reasonable.

@JemDay : You can ignore my suggested change.

- Only required attributes are formally represented.
- Optional & Extension attributes are carried in the map.
- Documentation & Examples changed as appropiate.
- Added version marker to proto & java package.

Signed-off-by: Day, Jeremy(jday) <[email protected]>
Signed-off-by: Day, Jeremy(jday) <[email protected]>
@duglin
Collaborator

duglin commented Jun 4, 2020

@JemDay is this one ready for consideration on tomorrow's call?

@JemDay
Contributor Author

JemDay commented Jun 4, 2020

@evankanderson - I think all of the concerns have been covered now and I believe this is pretty closely aligned with the commentary in Issue #576.

Any comments?

Contributor

@evankanderson evankanderson left a comment

One larger concern around the attribute model, and it looks like the payload question hasn't been resolved. (See this comment for reference to the other mechanisms for handling payloads.)

IMO, the complexity of deeply handling the payload structure in a format which supports binary strings is probably not worth it compared with the simplicity of having an envelope which can nest another proto within.

Secondarily, it seems like the datacontenttype and Any message may be providing conflicting guidance about the expected content of the message if you use Any, since it encodes a type URL along with a byte payload.

@alexvanboxel alexvanboxel left a comment

I just came across this PR today; what was the outcome of the discussion on the call yesterday? I see a few missed opportunities here:

  1. adding the well-known optional attributes like subject, datacontenttype, dataschema and type. This means that you only add extensions in the attributes field (I also agree with @evankanderson that it's probably better to have a repeated field instead of a map)
  2. not using the one_of for the data; the funny thing is that the 3 proposed representations are all encoded exactly the same in the proto wire format
  3. and most importantly: align the proto tag numbers with the Any type, giving string dataschema = 1 and bytes data = 2. This would make the CloudEvent 100% compatible with Any if both are filled correctly, which gives a lot of flexibility if you know that you are sending Any types (see the sketch below).
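A sketch of the tag alignment suggested in point 3 (illustrative, not what was merged). Because google.protobuf.Any is `string type_url = 1; bytes value = 2;`, a message that puts a string at tag 1 and bytes at tag 2 shares Any's wire shape for those two fields:

```proto
// Illustrative sketch of point 3 above, not the merged schema.
// google.protobuf.Any is (string type_url = 1; bytes value = 2), so a
// message with the same tags and wire types for its first two fields can
// be parsed as an Any (remaining fields end up as unknown fields).
syntax = "proto3";

message AnyAlignedCloudEvent {
  string dataschema = 1;  // same tag and wire type as Any.type_url
  bytes data = 2;         // same tag and wire type as Any.value

  // Other context attributes would use higher tag numbers.
  string id = 3;
  string source = 4;
}
```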

@JemDay
Contributor Author

JemDay commented Jun 9, 2020

@evankanderson ... A decision was taken prior to 1.0 not to encode type information for attributes, so in your example, if an intermediary were to translate between JSON and Proto formats, the comexampleextension1 attribute can only be interpreted as a String, and as such your first encoding example would be correct. This is a general issue with custom extensions when attempting to move between different formats, or indeed from binary->structured transport representations.

I'm not clear on the thrust of your question regarding downstream clients - I would assume there's some sort of negotiation prior to receiving events; if I can't handle events in proto format (or Avro) then I wouldn't subscribe to them. If I understand proto then I would understand the type system it exposes. If I were an SDK writer and I exposed a 'setAttribute(type, value)' type method, a protobuf encoder can retain the type information.

JSON <-> Proto Format Bridge - I think this would be doable.

I don't believe the type issue is specific to proto; it also exists in a pure JSON model where custom extensions are present: the receiver only sees a string and can't opine on it being a URI, a timestamp, etc.

I'll comment on the payload concern separately...

@JemDay
Contributor Author

JemDay commented Jun 9, 2020

@alexvanboxel ... Re your comments:

    • What you describe was my original submission; conversations led to the current model where only required attributes are formally represented. Primarily this was to allow new optional attributes to be added in future minor spec revisions and supported in a consistent fashion.
    • I think it's more idiomatic to support variant representations in this fashion to simplify SDK development. I don't see this as any different from the JSON Format supporting data payloads of binary (in base64), string, or Object. We could have required JSON data payloads to be encoded as a string but chose to be idiomatic in that format.
    • I may be missing the point, but are you suggesting that instead of an explicit one_of you'd prefer data to simply be an Any, which would then natively support bytes, String, or indeed a proto message? I can buy into that if that's the thrust of your comment ;-)

Comment on lines +129 to +135
## 4. Transport

Transports that support content identification MUST use the following designation:

```text
application/cloudevents+protobuf
```

It's unclear to me what kind of wire format we are targeting for transporting protobufs. Are you implicitly referring to marshalling protos to the binary encoding (https://developers.google.com/protocol-buffers/docs/encoding)?

Alternatively, we can always convert protos to JSON when putting them on the wire, or maybe even use textprotos.

Contributor

I was assuming that this was the binary encoding, given that JSON is already covered.

I'd strongly discourage using textprotos - they're not supported in all languages, and not well-defined externally as far as I'm aware.

@JemDay
Contributor Author

JemDay commented Sep 15, 2020

I've added some language-specific package options as per the example from @jskeet - I'm not conversant in some of these languages, so if anybody can sanity-check them that would be good.

@jskeet - On the subject of Timestamp, my understanding is that the specification simply demands a time instant. I don't believe there's any requirement that any original offsets be retained across processing boundaries so long as the semantic is retained.

@jskeet
Contributor

jskeet commented Sep 16, 2020

@JemDay: The specification isn't as clear on this as it might be. The description is "Date and time expression using the Gregorian Calendar" - which is interesting given that the calendar system is actually irrelevant for an instant in time until you choose a text representation. The text representation is specified to be RFC 3339, which does allow for UTC offsets.

I'm happy to create a PR to clarify the specification here, in terms of:

  • It's just intended to be an instant in time
  • Any UTC offset present in a local representation need not be preserved across processing boundaries
  • We should specify a required level of precision (milliseconds?) but again allow for more information to be present-but-not-necessarily-preserved

Once all of those are in place (or at least agreed to in principle), I'm fine using Timestamp from protos.

Does that sound like the kind of PR that would be welcomed?

(I don't know whether it's worth mentioning leap seconds in there at all. I've tried to avoid them in all the date/time work I've done, but they could definitely be relevant for some CloudEvents users...)

@JemDay
Contributor Author

JemDay commented Sep 16, 2020

@jskeet - I think this is a specification clarity issue; I believe it is meant to be an "instant in time", and that we follow RFC 3339 when encoding into a string. Might be simpler to discuss it tomorrow in the WG call before doing a PR (I could even make it part of this one).

With respect to your suggestion regarding using the 'Value' type as a more first-class way to support JSON data: I understand the model, but I'm unclear on how this would get realized in code. Looking at the Java language spec I don't see an option to do something like mergeFrom(javax.json.JsonValue jsonValue); on the Value builder. Without these sorts of constructs I'm not sure what we gain.

@jskeet
Contributor

jskeet commented Sep 16, 2020

@JemDay: Timestamps - I'll try to find a WG call that I can come to, but it definitely won't be tomorrow. (The WG timing tends to clash with all kinds of things for me - which is to be expected for an international meeting, of course.)

For the Value part supporting JSON data - based on last week's call, I don't think it's worth doing.

@JemDay
Contributor Author

JemDay commented Sep 16, 2020

@jskeet: Thanks, I believe we're aligned.

I chatted with @clemensv and there is no requirement for the offset information to be propagated; I'll try tweaking the spec to clarify that - so we're good to go with the Timestamp usage as defined.

@jskeet
Contributor

jskeet commented Sep 16, 2020

@JemDay: Excellent. Please cc me on the PR; as a date/time nerd I'm happy to be picky :)

@JemDay
Contributor Author

JemDay commented Sep 21, 2020

All - I believe we've reached consensus, so could I ask for a few LGTMs to allow this PR to be merged and closed? I will open a separate PR to clarify the expectations related to Timestamps in the specification; that does not impact this proposal.

Contributor

@jskeet jskeet left a comment

There are a few things I'd still like to see addressed - but if others want to get the first draft in and then iterate, that's fine by me. It's still only going to be a draft, right?

spec.proto Outdated
import "google/protobuf/any.proto";
import "google/protobuf/timestamp.proto";

option csharp_namespace = "Io.CloudEvents.V1.Proto";
Contributor

This can be decided for sure later on (on a per-language basis) but the CloudEvents C# SDK uses a namespace of CloudNative.CloudEvents. We probably want to fit in with that. (So CloudNative.CloudEvents.V1.Proto perhaps? Or CloudNative.CloudEvents.V1.Protobuf?)


message CloudEvent {

// -- CloudEvent Context Attributes
Contributor

Ideally it would be good to get this done before it's merged, but I'm okay for us to do it later.

spec.proto Outdated
map<string, CloudEventAttribute> attribute = 5;

// -- CloudEvent Data (Bytes, Text, or Proto)
oneof data_oneof {
Contributor

Ping on this one.

spec.proto Outdated

message CloudEventAttribute {

oneof attr_oneof {
Contributor

And again.

- Added Java directive for separate files.
- re-pluralized 'attributes' to follow proto convention.

Signed-off-by: Day, Jeremy(jday) <[email protected]>
@JemDay
Contributor Author

JemDay commented Sep 22, 2020

@jskeet - I believe your concerns are covered now. I re-pluralized the 'attributes' and removed the C# directive; we can add that in as a separate PR. I'm no C# expert but I believe the major version must be encoded into the package to enable V1 and V2 generated code to live side-by-side. That packaging should be independent of (but not clash with) the C# SDK packages.

@jskeet
Contributor

jskeet commented Sep 23, 2020

Definitely happy to sort out the C# namespace later, and I agree that we should include the version in the namespace. Not entirely convinced about it being independent of the SDK, but we can decide that a little later.

@duglin
Collaborator

duglin commented Sep 24, 2020

@jskeet are you ok with this PR as-is? It sounds like there might be some follow-on discussions, but I wanted to make sure we didn't merge it if there are still things that need to be resolved first.

@jskeet
Contributor

jskeet commented Sep 24, 2020

@duglin: I would prefer to get the oneof names nicer to start with - is there any controversy about those? I don't have any fundamental objections though.

@duglin
Collaborator

duglin commented Sep 24, 2020

I'll need @JemDay to speak to that :-)

@JemDay
Contributor Author

JemDay commented Sep 24, 2020

@jskeet - Can you suggest alternatives ... we need to differentiate between a "binary", "text", and "Proto Message" data payload.

@jskeet
Contributor

jskeet commented Sep 24, 2020

@JemDay: All I'm suggesting is removing the _oneof suffix from data_oneof and attr_oneof. Keep the oneofs themselves - it's just the name that will be nicer without the suffix.

@evankanderson
Contributor

Note that if you're only talking about binary Protobuf representation, then changing the name of data_oneof and attr_oneof is a wire-compatible change, since the name of the oneof is not transmitted and is only an affordance for library code.

This also means those naming fixups could be made in a subsequent PR if desired.
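Concretely (illustrative field names), these two definitions serialize to identical bytes, since only the member fields' tags reach the wire:

```proto
// Illustrative: a oneof's name never appears on the wire, only its member
// field tags, so renaming data_oneof to data is wire-compatible.
syntax = "proto3";

message BeforeRename {
  oneof data_oneof {
    bytes binary_data = 1;
    string text_data = 2;
  }
}

message AfterRename {
  oneof data {
    bytes binary_data = 1;
    string text_data = 2;
  }
}
```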

@jskeet
Contributor

jskeet commented Sep 24, 2020

I'm assuming anything can still be changed in the near term - merging this isn't defining a 1.0 for the protobuf representation is it? Because if it is, that should include source compatibility as well, which would require the names to be finalized.

@evankanderson
Contributor

There are two different things -- this is defining the wire-format protocol; there are separately per-language SDKs. The Golang SDK, for example is at v2.2.0 supporting the v1 and v0.3 wire-format specs.

@jskeet
Contributor

jskeet commented Sep 24, 2020

this is defining the wire-format protocol

Well it's defining a .proto file, which I'd expect to be the canonical source for SDKs. It would be pretty silly for someone to create a wire-compatible-but-separate proto file. So why not aim to get both right to start with?

@jskeet
Contributor

jskeet commented Sep 24, 2020

(I'd also suggest that we should determine that the JSON serialization format of the protobuf data is also standardized - that doesn't affect oneof names as far as I remember, but it does affect tag names, basically making tag renaming a breaking change.)

Signed-off-by: Day, Jeremy(jday) <[email protected]>
@JemDay
Contributor Author

JemDay commented Sep 24, 2020

@jskeet - Tidied the names as suggested.

The JSON format has its own specification, we could tweak that if needed - I don't think we did that for Avro.

@jskeet
Contributor

jskeet commented Sep 24, 2020

I'm not suggesting a change to the standard JSON CloudEvent format. I'm suggesting that we make sure that the proto representation doesn't change in any way that breaks source or proto3-JSON format, once we've declared it as 1.0. That's important because it should be reasonable for folks to use the proto3-JSON format in test data, and not be broken by changes.

I don't think it's a major imposition - it just means that we'd be standardizing the .proto file rather than just the binary wire format. I'm definitely up for discussing that, but it would be my recommendation.

Will double-check the oneofs and LGTM this PR though.

Contributor

@jskeet jskeet left a comment

I'm happy to have a pass at reformatting comments and adding more of them in a future PR if that would be helpful.

@duglin
Collaborator

duglin commented Sep 24, 2020

We're in kind of a weird situation. The PR was approved on today's call with the assumption that all parties involved in the PR were ok with the current state. After a few more commits it looks like that is true. However, we technically didn't approve this one, which means we should really wait. BUT, based on the lack of comments on today's call, I suspect the people who really care are part of this PR's thread and not the weekly call. So, rather than waiting for next week's call I'm going to let this sit until EOD tomorrow to see if there are any lingering concerns - then I'll merge.

Hope that's ok with everyone.

@duglin
Collaborator

duglin commented Sep 30, 2020

ok ... sat long enough. Merging!!!
Thanks to all, especially @JemDay , for all of the effort on pushing this thru!

Approved on the 9/24 call

@duglin duglin merged commit 6cbb44a into cloudevents:master Sep 30, 2020
n3wscott pushed a commit to n3wscott/cloudevents-spec that referenced this pull request Nov 30, 2020
* Re-Introducing Protocol Buffer Representation

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* required and optional attributes are explicitly represented

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* tweaks and examples to address review comments

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* Updated following discussion during 6/28 weekly meeting.

- Only required attributes are formally represented.
- Optional & Extension attributes are carried in the map.
- Documentation & Examples changed as appropiate.
- Added version marker to proto & java package.

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* tweaks

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* Include spec reference

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* Changed media-type designation as-per weekly call discussion

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* Fixed links to work before merging to master

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* Really fixing the links this time

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* DO NOT MERGE: data representation proposal, attribute type naming change

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* Moved back to one_of representation for Data, updated docs

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* Minor tweaks based on review commentary

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* - Modified proto map property to increase readability of generated code.
- Clarified usage of `dataschema` CloudEvent attribute.
- Updated examples.

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* Changes based on review comments

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* Added language specific package definitions

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* - Removed c# directive, can be added back in a later PR.
- Added Java directive for separate files.
- re-pluralized 'attributes' to follow proto convention.

Signed-off-by: Day, Jeremy(jday) <[email protected]>

* Tidied 'one-of' names

Signed-off-by: Day, Jeremy(jday) <[email protected]>
@duglin duglin mentioned this pull request Dec 7, 2020