Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avro Event Format for CloudEvents #463

Merged
merged 11 commits into from
Jul 25, 2019
Merged

Avro Event Format for CloudEvents #463

merged 11 commits into from
Jul 25, 2019

Conversation

fabiojose
Copy link
Contributor

Signed-off-by: Fabio José [email protected]

Closes #426

Accidentally I've closed the original PR, but open this new one with the right changes.

@NickLarsenNZ
Copy link

NickLarsenNZ commented Jun 20, 2019

Original PR with comments trail: #438

@duglin
Copy link
Collaborator

duglin commented Jun 20, 2019

@fabiojose can you update the README to point to this doc?

avro-format.md Outdated Show resolved Hide resolved
avro-format.md Outdated Show resolved Hide resolved
spec.avsc Outdated Show resolved Hide resolved
Signed-off-by: Fabio José <[email protected]>
README.md Outdated Show resolved Hide resolved
@duglin
Copy link
Collaborator

duglin commented Jun 20, 2019

We tentatively approved this on today's call, but people asked for one more week to make sure. Please look it over so we can resolve it next week

@duglin
Copy link
Collaborator

duglin commented Jun 27, 2019

@JemDay I believe you asked for one more week,last week. You ok with this one?

avro-format.md Outdated Show resolved Hide resolved
avro-format.md Outdated Show resolved Hide resolved
avro-format.md Outdated Show resolved Hide resolved
Signed-off-by: Fabio José <[email protected]>
@clemensv
Copy link
Contributor

clemensv commented Jul 3, 2019

I don't think the definition of a CloudEvent as a "record" satisfies our need for having extensibility, because record requires all fields to be known for encoding. I haven't yet done any deeper investigation, but I just stumbled over this post here and this schema in one of the answers seems clever:

{
    "name": "metadata",
    "type": {
        "type": "map",
        "values": [
            "null",
            "int",
            "float",
            "string",
            "boolean",
            "long",
            {
                "type": "map",
                "values": [
                    "null",
                    "int",
                    "float",
                    "string",
                    "boolean",
                    "long"
                ]
            }
        ]
    }
}

@JemDay
Copy link
Contributor

JemDay commented Jul 3, 2019

I think the prior comment from @clemensv may be touching slightly on what i'd been thinking about given proposal was very structured (which in a way i like) vs the approach taken with proto which is much more minimalistic. The proto schema is more aligned with the example @clemensv cited.

As it stands with the current schema proposal a mechanism to carry 'extensions' is required.

@fabiojose
Copy link
Contributor Author

Very clever @clemensv

I tested it and got the following schema:

{
  "namespace":"io.cloudevents",
  "type":"record",
  "name":"CloudEvent",
  "version":"0.4-wip",
  "doc":"Avro Event Format fo CloudEvents",
  "fields":[
    {
      "name":"metadata",
      "type":{
        "type":"map",
        "values":[
          "null",
          "int",
          "float",
          "string",
          "boolean",
          "long",
          "CloudEvent"
        ]
      }
    }
  ]
}

It looks more flexible than the first version.

@evankanderson
Copy link
Contributor

evankanderson commented Jul 11, 2019

We don't have float or boolean types defined at the moment.

We may also need to support binary types. I don't know how "binary" is spelled in Avro, possibly bytes.

- remove: float, boolean and long
- change field name: metadata to attribute

Signed-off-by: Fabio José <[email protected]>
@fabiojose
Copy link
Contributor Author

fabiojose commented Jul 12, 2019

Good points @evankanderson!

Done!

{
  "namespace":"io.cloudevents",
  "type":"record",
  "name":"CloudEvent",
  "version":"0.4-wip",
  "doc":"Avro Event Format for CloudEvents",
  "fields":[
    {
      "name":"attribute",
      "type":{
        "type":"map",
        "values":[
          "null",
          "int",
          "string",
          "bytes",
          "CloudEvent"
        ]
      }
    }
  ]
}

@duglin
Copy link
Collaborator

duglin commented Jul 15, 2019

@clemensv @evankanderson does this look ok now?

@JemDay
Copy link
Contributor

JemDay commented Jul 25, 2019

Disclaimer : Not an avro expert..

For consistency reasons with protobuf...

  • Could we change the name of the definition file to cloudevent.avsc
  • Does avro allow you to follow the same model as used in cloudevent.proto

To be fair i would have preferred the message in proto to have been called CloudEvent as opposed to CloudEventMap - i guess that's a different PR

Otherwise +1

@fabiojose
Copy link
Contributor Author

Hi @JemDay

"Could we change the name of the definition file to cloudevent.avsc": Sure!

  • I had used the spec.avsc to follow the same naming convention used by spec.json.

"Does avro allow you to follow the same model as used in cloudevent.proto"

  • unfortunately, not

@JemDay
Copy link
Contributor

JemDay commented Jul 25, 2019

Thanks ... thinking a bit more, maybe aligning around spec.format-suffix is better

Still +1

@fabiojose
Copy link
Contributor Author

Good point! I will change the file name.

@duglin
Copy link
Collaborator

duglin commented Jul 25, 2019

at this point I think consistency with the others is more important tho

@duglin
Copy link
Collaborator

duglin commented Jul 25, 2019

@fabiojose is this one ready to go? it was approved to but I don't want to merge if you still have a pending change

@fabiojose
Copy link
Contributor Author

Done, no pending changes!

The file will be spec.avsc, just like before.

@duglin duglin merged commit 6a71c04 into cloudevents:master Jul 25, 2019
@catkins-miso
Copy link

catkins-miso commented Jan 21, 2020

It seems there would have been a performance impact removing the first-class context attributes in favor of the attribute map approach. I understand the point of adding the attribute map for extensions, but I think the schema-on-read approach could have been isolated without impacting other use cases. In particular, if I'm listening to a firehose topic (events of many types)--or implementing an intermediary--I need to quickly evaluate the type context attribute. Not to mention Avro has good mechanics for forward and backwards compatibility which I think could have addressed concerns about introducing new context attributes.

@evankanderson
Copy link
Contributor

I think the attributes may have been simplified in the interim for v1 (when this was written, they were a json-style string -> value or object mapping, rather than a string -> simple type mapping.

Unfortunately, I think the v1.0 spec ship has sailed; I'm not sure what to suggest here.

@catkins-miso
Copy link

Thanks for the response @evankanderson. I did a little preliminary benchmarking (Apache Avro 1.9 dotnetcore), and I am seeing no meaningful difference in the binary serde performance from the attributes and having distinct fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avro format enconding
7 participants