V2 roadmap proposal #230
# V2 implementation proposal

## Why this proposal

The current implementation uses a different `CloudEvent` class for each protocol
binding. This adds complexity to the integration of new technologies like `pydantic`.

The separation of responsibilities in the package is not clear. We have multiple JSON
implementations, while the format specification is the same for all bindings in the
[`CloudEvent` spec](https://github.com/cloudevents/spec/blob/v1.0.2/cloudevents/formats/json-format.md):

* the Pydantic implementation calls the HTTP implementation for this, adding unnecessary overhead
* the Kafka binding uses its own implementation, creating code duplication and unnecessary maintenance effort

We should have a single `CloudEvent` class used for all bindings/formats, since the event
is the Python representation of the event data, which is the same no matter what binding
is used (we could potentially receive an HTTP event and forward it as-is using Kafka).

The specific binding and format implementations should be separate from the `CloudEvent`
class.

This proposal contains some base rules for the package architecture and how we could
implement the `CloudEvent` spec in this package.

## Requirements

In order to transform a Python CloudEvent to/from a specific
protocol binding, this SDK has the following responsibilities:

* Identifying which attributes have to be marshalled/unmarshalled (identifying the CloudEvents extensions)
* Validating required fields and generating defaults
* Marshalling/unmarshalling the CloudEvents core and extension fields
* Marshalling/unmarshalling the `data` field
* Encoding/decoding the marshalled fields to/from the specific binding format

## Modules

We should separate the implementation of these requirements so that they can be tested
independently and each component can be used for composition.

Ideally we should have these modules (naming TBC):

* `cloudevents.event` - This would contain the `CloudEvent` base class. This class is the same for all bindings/formats.
* `cloudevents.formats` - This would contain the structured format implementations (JSON, Avro, Protobuf)
* `cloudevents.bindings` - This would contain the specific binding implementations (Kafka, HTTP)

> **Review comment (on the module layout):** I like the idea of moving forward and scoping modules like they are scoped in the spec. Looks like a good go-to solution to me.

This matches the naming and concepts we have in the `CloudEvent` spec.

## Matching requirements and modules

### The `cloudevents.event` module and `CloudEvent` class

The `CloudEvent` class would satisfy the following requirements:

* Validating required fields and generating default values
* Marshalling/unmarshalling the `data` field
* Identifying which attributes have to be marshalled/unmarshalled

The `data` field depends on the specific event and the `datacontenttype` attribute, and
its marshalling/unmarshalling logic should be the same for any binding/format. It seems
a `CloudEvent` class responsibility to take care of this. (We should implement this logic
in overridable methods.)
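As a sketch of this idea, the base class could expose the `data` logic as overridable hooks. All names here (`marshall_data`, `unmarshall_data`, `to_dict`, `from_dict`) are illustrative, not an agreed API:

```python
from typing import Any


class CloudEvent:
    """Sketch of a single event class shared by all bindings/formats."""

    def __init__(self, attributes: dict, data: Any = None):
        self.attributes = attributes
        self.data = data

    def marshall_data(self) -> Any:
        # Overridable hook: the default passes `data` through untouched.
        return self.data

    @classmethod
    def unmarshall_data(cls, raw: Any) -> Any:
        # Overridable hook for subclasses with typed payloads.
        return raw

    def to_dict(self) -> dict:
        return {**self.attributes, "data": self.marshall_data()}

    @classmethod
    def from_dict(cls, event: dict) -> "CloudEvent":
        attributes = {k: v for k, v in event.items() if k != "data"}
        return cls(attributes, cls.unmarshall_data(event.get("data")))
```

A subclass with a typed payload would then only override the two hooks, leaving `to_dict`/`from_dict` untouched.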

> **Review comment:** So from one perspective I do agree with the statement, but from the other, it sometimes is beneficial to have the marshalling/unmarshalling logic moved elsewhere. IMO we can achieve that by using mixin classes. Again, this is more of an implementation detail.

Ideally the end user will be able to extend the `CloudEvent` class for their needs, e.g.:

* adding new extension fields
* adding fields/methods not meant to be marshalled (internal fields)
* implementing new defaults (e.g. creating a `MyEvent` class with a `type="myevent"` default)

> **Review comment:** Extend or create a separate one from scratch, and just the behavior should be the same. I guess we may benefit from having a […]
>
> **Review comment (on extension fields):** It's the responsibility of the library's CloudEvent implementation to handle all the possible extension fields as far as I can tell. There may be special extension-specific mixins that provide additional behavior beneficial for this or that implementation, but overall we should be able to digest any extension fields with just the base implementation.
>
> **Reply:** Yes, I used the word "extend" in a generic meaning. It could also be a mixin with some custom implementation. The idea is that it should be possible (and easy) to create a […]
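A minimal sketch of such a subclass (all names are hypothetical; the base class here is a trimmed stand-in for the proposed `CloudEvent`):

```python
from typing import Any


class CloudEvent:
    # Minimal stand-in for the proposed base class.
    def __init__(self, attributes: dict, data: Any = None):
        self.attributes = attributes
        self.data = data


class MyEvent(CloudEvent):
    """Hypothetical subclass: a default `type`, plus an internal field."""

    def __init__(self, attributes: dict, data: Any = None):
        # New default: `type` falls back to "myevent" unless provided.
        attributes = {"type": "myevent", **attributes}
        super().__init__(attributes, data)
        # Internal field, not meant to be marshalled.
        self._received_at = None
```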

We also want a DTO that can be accepted by the most common libraries (e.g. the JSON ones)
to keep the `CloudEvent` class decoupled from the formats/bindings. Using a dictionary
seems a good idea.
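For illustration, such a dictionary could mirror the member names used by the JSON event format spec (the attribute names come from the CloudEvents spec; the values are made up):

```python
# Illustrative only: one event as the proposed intermediate dictionary.
event_as_dict = {
    "specversion": "1.0",
    "id": "A234-1234-1234",
    "source": "https://example.com/source",
    "type": "com.example.someevent",
    "datacontenttype": "application/json",
    "myextension": "custom value",  # extensions sit alongside core attributes
    "data": {"message": "hello"},
}
```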

> **Review comment:** So what we want is a canonical representation. That's what […] But I am also not sure if we want to always keep a canonical representation as a dict. Maybe it's better to have a dataclass instead, or expose some immutable data structure over dict.
>
> **Reply:** I would expect the end user to never deal with this DTO. They should always rely on either the […] This DTO structure was thought to be used only internally, to move data between the base class, the formats and the bindings. I thought about dictionaries because they are fast, and libraries we might want to use for formats and bindings will probably support them out of the box (the classic example would be the JSON library that maps JSON objects to dicts).
>
> **Review comment:** Maybe we'll then use the Protocol definition as the base DTO. It already is immutable, well defined in the spec, as well as has a […] I'm kinda just thinking out loud here as I don't have a strong opinion on this at the moment. Also depending on whether the canonical representation is gonna be part of the […]
>
> **Reply:** Ideally I think it would be more maintainable to keep the formats implementation separate from the base class, so that they can be implemented and tested independently. The more formats we implement, the bigger the class will become, and the created objects as a consequence. I am thinking about performance in systems where we want to handle a high number of events/sec, where allocating memory for larger objects could become impactful.
>
> **Review comment:** We could also keep the existing implementation interface, where we have `get_attributes` and `get_data`, and use them in bindings and formats. If we find at some point that having a dict in the middle of the pipeline might help, we can introduce it.

We should implement `to_dict`/`from_dict` functionality in the `CloudEvent` class.

QUESTION: Can we define a data type for the `data` field after it's been marshalled?
(i.e. can we say it will be either `str` or binary data?)

> **Review comment:** I believe the type should be known in advance.

### The `cloudevents.formats` module

Formats solve a single responsibility:

* Marshalling/unmarshalling the CloudEvents core and extension fields

Their implementations have to be used by the bindings when implementing their
structured input/output, to make sure the implementations stay compatible.

Each format accepts the dictionary from the `CloudEvent` as a parameter
and produces an output based on the format itself, e.g.:

```python
def to_json(event: dict) -> str: ...
def from_json(event: str) -> dict: ...
```

TODO: Review typing, as JSON produces a string but other formats might differ.
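A minimal sketch of these two functions, assuming the stdlib `json` module as the first underlying implementation (swappable later):

```python
import json


def to_json(event: dict) -> str:
    # Marshal the intermediate dictionary into the JSON structured format.
    return json.dumps(event)


def from_json(event: str) -> dict:
    # Unmarshal a JSON structured payload back into the intermediate dictionary.
    return json.loads(event)
```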

> **Review comment:** This part is also a little bit tricky. I'd rather avoid doing a […] The next part is the API. As we've faced in the current implementation, it should be structured in a way that it's super easy to swap the underlying implementation (e.g. replace native […]).

### The `cloudevents.bindings` module

Bindings solve these responsibilities:

* Marshalling/unmarshalling the CloudEvents core and extension fields **(if binary format)**
* Encoding/decoding the marshalled fields to/from the specific binding format

They also have the responsibility of coordinating the other modules.

> **Review comment (on the binary-format note):** Bindings sometimes support structured formats as well.
>
> **Reply:** The idea is for structured formats to use the formats implementation to marshall the specific fields. (The assumption here is that formats are the same for all the binding implementations.) I.e. if I need to marshall a CloudEvent into a JSON structured HTTP event, I expect the SDK to: […]
>
> **Review comment:** I believe that assumption is not true taking into account the spec. I'm not quite sure I fully understand the example, but in an HTTP event, attributes are converted to headers and data is the HTTP request payload AFAIK.
>
> **Reply:** In the HTTP structured mode the whole event (attributes and data) is marshalled into the body. Headers MAY include the attributes, but attributes MUST be in the body. For JSON conversion I'd expect something like this:
>
> ```python
> def to_http_json(cloudevent: CloudEvent):
>     dict_event = cloudevent.to_dict()
>     body = formats.to_json(dict_event)
>     headers = []  # compose headers according to spec, eventually grabbing data from dict_event
>     return headers, body
> ```

This is a **concept** example for the `cloudevents.bindings.http` module:

```python
from dataclasses import dataclass
from typing import Any

from cloudevents.event import CloudEvent
# Aliased import: avoids shadowing between this module's `to_json`/`from_json`
# and the format helpers of the same name.
from cloudevents.formats import json as json_format


# Shared dataclass to help with typing; might be a dictionary or a named tuple (not important)
@dataclass
class HTTPCloudEvent:
    headers: list
    body: Any  # unsure about typing here


def to_binary(event: CloudEvent) -> HTTPCloudEvent:
    e = event.to_dict()
    # logic to marshall into http binary output
    return HTTPCloudEvent(headers=[], body=[])


def to_json(event: CloudEvent) -> HTTPCloudEvent:
    e = json_format.to_json(event.to_dict())
    # logic to compose the JSON response
    return HTTPCloudEvent(headers=[], body=[])


def from_binary(headers: list, body: dict) -> CloudEvent:
    e = {}
    # specific logic to unmarshall raw headers into a dictionary
    return CloudEvent.from_dict(e)


def from_json(body: str) -> CloudEvent:
    e = json_format.from_json(body)
    return CloudEvent.from_dict(e)


def from_http(headers: list, body: dict) -> CloudEvent:
    # identify if structured or binary based on headers / body
    # and use the proper `from_` function
    # We should also handle batches here
    ...
```

Other bindings would behave similarly, and the end user would only need
to use the relevant module in the `cloudevents.bindings` namespace.
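To make the binary mode concrete, here is a self-contained sketch that works directly on the intermediate dictionary. Per the HTTP protocol binding spec, `datacontenttype` maps to the `Content-Type` header and every other attribute to a `ce-` prefixed header; the function bodies here are assumptions, not the final implementation:

```python
from typing import Any

CE_PREFIX = "ce-"


def to_binary(event: dict) -> tuple:
    # HTTP binary mode: attributes become ce-* headers, `data` becomes the body.
    headers = {}
    for name, value in event.items():
        if name == "data":
            continue
        if name == "datacontenttype":
            headers["Content-Type"] = value
        else:
            headers[CE_PREFIX + name] = str(value)
    return headers, event.get("data")


def from_binary(headers: dict, body: Any) -> dict:
    # Reverse mapping back into the intermediate dictionary.
    event = {"data": body}
    for name, value in headers.items():
        if name.lower() == "content-type":
            event["datacontenttype"] = value
        elif name.lower().startswith(CE_PREFIX):
            event[name[len(CE_PREFIX):]] = value
    return event
```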
|
||
## CloudEvent subclasses | ||
|
||
We'll want SDK users to be able to extend the `CloudEvent` class, so they will | ||
be able to manage marshalling/unmarshalling of the `data` field for different | ||
event types and their python typing. | ||
|
||
This could be easily done by introducing a mapping between the `type` field and | ||
`CloudEvent` subclasses as a parameter in the `from_` functions. Ie. | ||

```python
def from_binary(headers: list, body: dict, event_types: Optional[dict[str, type]] = None) -> CloudEvent:
    event_as_dict = {}
    # specific logic to unmarshall raw headers into a dictionary
    if event_types:
        # Use the subclass mapped to this event type, falling back to the base class
        return event_types.get(event_as_dict['type'], CloudEvent).from_dict(event_as_dict)
    else:
        return CloudEvent.from_dict(event_as_dict)
```
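A self-contained sketch of just the dispatch step (the classes and the `dispatch` helper are stand-ins, not the real API):

```python
from typing import Optional


class CloudEvent:
    # Minimal stand-in for the proposed base class.
    @classmethod
    def from_dict(cls, event: dict) -> "CloudEvent":
        instance = cls()
        instance.attributes = event
        return instance


class OrderCreated(CloudEvent):
    # Hypothetical user-defined subclass for a specific event type.
    pass


def dispatch(event_as_dict: dict, event_types: Optional[dict] = None) -> CloudEvent:
    # The `type` attribute selects which subclass unmarshals the event,
    # falling back to the base CloudEvent class.
    if event_types:
        return event_types.get(event_as_dict["type"], CloudEvent).from_dict(event_as_dict)
    return CloudEvent.from_dict(event_as_dict)
```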

## Pydantic support

Judging by its conversion module and tests, the current Pydantic implementation was
written as a possible substitute for the HTTP event class, but it is used by calling
the functions in the conversion module.

Pydantic offers good support and performance for data validation, defaults and
JSON serialization (some of the requirements we have identified).

We need to make a decision:

* We use pydantic as a base for the `CloudEvent` class:
  * PROs
    * No need to implement data validation
    * Performance (the pydantic V2 core is written in Rust)
    * We can lay down some base logic for `data` field serialization (e.g. nested pydantic models)
    * Enables users to use pydantic for automatic documentation functionalities
  * CONs
    * We introduce a new package dependency, and depend on its implementation
    * _minor_ If we use pydantic JSON serialization, it is implemented in the model class (slightly different from what is proposed, but we can use wrapper functions in the JSON format module)
* We remove pydantic support:
  * PROs
    * We have more control over the validation and defaults implementation
    * We don't depend on other packages
  * CONs
    * Performance on validation and defaults (we can still use libraries like `orjson` for JSON performance)
    * Additional effort in writing and maintaining data validation and required/default fields

...
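To make the "remove pydantic" option concrete, a hand-rolled sketch of required-field validation and default generation; the helper name and the defaults policy are assumptions, only the required attribute names come from the spec:

```python
import uuid

# Required context attributes per the CloudEvents spec.
REQUIRED = ("specversion", "id", "source", "type")


def validate_attributes(attributes: dict) -> dict:
    # What pydantic would otherwise give us for free:
    # default generation plus required-field validation.
    attributes = {"specversion": "1.0", "id": str(uuid.uuid4()), **attributes}
    missing = [name for name in REQUIRED if not attributes.get(name)]
    if missing:
        raise ValueError(f"Missing required attributes: {missing}")
    return attributes
```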

> **Review comment:** Even though I love Pydantic, I don't think we should have it as a core dependency, and I'd rather have a Pydantic-specific extension/implementation of CloudEvents. I do agree that it kinda solves a lot of questions out of the box, but it still is a dependency we can't drag in, especially while there are multiple projects that are still using Pydantic v1 while v2 is kinda the mainstream already.
>
> **Reply:** I don't have a strong opinion here. I just thought it might be easier to maintain a single approach for more consistency.
>
> **Review comment:** It definitely might be, but as I mentioned, some libraries currently depend on CloudEvents and they may not be able to move forward if CloudEvents brings Pydantic as a required dependency.

> **Review comment:** I'd arguably say this is the responsibility of the specific binding format extension rather than the core library.
>
> **Reply:** Yes, this is a list of all the operations needed to transform a Python class into a specific event format implementation.