The Kafka Protocol Binding for CloudEvents defines how events are mapped to Kafka messages.
- 1.1. Conformance
- 1.2. Relation to Kafka
- 1.3. Content Modes
- 1.4. Event Formats
- 1.5. Security
- 2.1. data
- 3.1. Key Attribute
- 3.2. Binary Content Mode
- 3.3. Structured Content Mode
CloudEvents is a standardized and protocol-agnostic definition of the structure and metadata description of events. This specification defines how the elements defined in the CloudEvents specification are to be used in the Kafka protocol as Kafka messages (aka Kafka records).
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119.
This specification does not prescribe rules constraining transfer or settlement of event messages with Kafka; it solely defines how CloudEvents are expressed in the Kafka protocol as Kafka messages.
The specification defines two content modes for transferring events: structured and binary.
In the structured content mode, event metadata attributes and event data are placed into the Kafka message value section using an event format.
In the binary content mode, the value of the event data
MUST be placed into
the Kafka message's value section as-is, with the content-type
header value
declaring its media type; all other event attributes MUST be mapped to the
Kafka message's header section.
Implementations that use Kafka 0.11.0.0 and above MAY use either binary or structured modes. Implementations that use Kafka 0.10.x.x and below MUST use only use structured mode and encode the event in JSON. This is because older versions of Kafka lacked support for message level headers.
Event formats, used with the structured content mode, define how an event is expressed in a particular data format. All implementations of this specification MUST support the JSON event format.
This specification does not introduce any new security features for Kafka, or mandate specific existing features to be used.
This specification does not further define any of the CloudEvents event attributes.
data
is assumed to contain opaque application data that is
encoded as declared by the datacontenttype
attribute.
An application is free to hold the information in any in-memory representation of its choosing, but as the value is transposed into Kafka as defined in this specification, core Kafka provides data available as a sequence of bytes.
For instance, if the declared datacontenttype
is
application/json;charset=utf-8
, the expectation is that the data
value is made available as UTF-8 encoded JSON text.
With Kafka 0.11.0.0 and above, the content mode is chosen by the sender of the event. Protocol usage patterns that might allow solicitation of events using a particular content mode might be defined by an application, but are not defined here.
The receiver of the event can distinguish between the two content modes by
inspecting the content-type
Header of the
Kafka message. If the header is present and its value is prefixed with the
CloudEvents media type application/cloudevents
, indicating the use of a known
event format, the receiver uses structured mode,
otherwise it defaults to binary mode.
If a receiver finds a CloudEvents media type as per the above rule, but with an
event format that it cannot handle, for instance
application/cloudevents+avro
, it MAY still treat the event as binary and
forward it to another party as-is.
If the content-type
header is not present then the receiver uses
structured mode with the JSON event format.
The 'key' attribute is populated by a partitionKeyExtractor function. The partitionKeyExtractor is a protocol specific function that contains bespoke logic to extract and populate the value. A default implementation of the extractor will use the Partitioning extension value.
The binary content mode accommodates any shape of event data, and allows for efficient transfer and without transcoding effort.
For the binary mode, the header content-type
property MUST be mapped
directly to the CloudEvents datacontenttype
attribute.
The data
byte-sequence MUST be used as the value of the Kafka
message.
All CloudEvents attributes and
CloudEvent Attributes Extensions
with exception of data
MUST be individually mapped to and from the Header
fields in the Kafka message.
CloudEvent attributes are prefixed with ce_
for use in the
message-headers section.
Examples:
* `time` maps to `ce_time`
* `id` maps to `ce_id`
* `specversion` maps to `ce_specversion`
The value for each Kafka header is constructed from the respective header's Kafka representation, compliant with the Kafka message format specification.
This example shows the binary mode mapping of an event into the
Kafka message. All other CloudEvents attributes
are mapped to Kafka Header fields with prefix ce_
.
Mind that ce_
here does refer to the event data
content carried in the payload.
------------------ Message -------------------
Topic Name: mytopic
------------------- key ----------------------
Key: mykey
------------------ headers -------------------
ce_specversion: "1.0"
ce_type: "com.example.someevent"
ce_source: "/mycontext/subcontext"
ce_id: "1234-1234-1234"
ce_time: "2018-04-05T03:56:24Z"
content-type: application/avro
.... further attributes ...
------------------- value --------------------
... application data encoded in Avro ...
-----------------------------------------------
The structured content mode keeps event metadata and data together in the payload, allowing simple forwarding of the same event across multiple routing hops, and across multiple protocols.
If present, the Kafka message header property content-type
MUST be set to the
media type of an event format.
Example for the JSON format:
content-type: application/cloudevents+json; charset=UTF-8
The chosen event format defines how all attributes,
and data
, are represented.
The event metadata and data are then rendered in accordance with the event format specification and the resulting data becomes the Kafka application data section.
Implementations MAY include the same Kafka headers as defined for the binary mode.
This example shows a JSON event format encoded event:
------------------ Message -------------------
Topic Name: mytopic
------------------- key ----------------------
Key: mykey
------------------ headers -------------------
content-type: application/cloudevents+json; charset=UTF-8
------------------- value --------------------
{
"specversion" : "1.0",
"type" : "com.example.someevent",
"source" : "/mycontext/subcontext",
"id" : "1234-1234-1234",
"time" : "2018-04-05T03:56:24Z",
"datacontenttype" : "application/xml",
... further attributes omitted ...
"data" : {
... application data encoded in XML ...
}
}
-----------------------------------------------
- Kafka The distributed stream platform
- Kafka-Message-Format The Kafka format message
- RFC2046 Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types
- RFC2119 Key words for use in RFCs to Indicate Requirement Levels
- RFC3629 UTF-8, a transformation format of ISO 10646
- RFC7159 The JavaScript Object Notation (JSON) Data Interchange Format