Request
As the IR format has evolved, an IR stream (ignoring the preamble and end-of-stream byte) is no longer just a sequence of serialized unstructured log events. In addition to log events, we’ve introduced other concepts that may change the stream's state without producing a log event. For clarity, we’ll refer to these “concepts,” including log events, as IR units. For example, to support loggers that change time zones, we’ve added an IR unit that indicates a UTC offset change. These new IR units may appear between log-event IR units, and their position is unpredictable: we cannot say, for instance, that one will appear after every three log events. The IR units in the latest IR format are:
Log event: user-decided information stored as key-value pairs
UTC offset change: change of UTC offset indicating the time zone information
Schema tree node insertion: insert new nodes to grow the schema tree
End of stream indicator: end of stream
Note that although our current IR streams can be stateful, that state has always been updated with each log event. For instance, the four-byte-integer encoding IR stream stores the timestamp in each log event as a timestamp delta; thus, an IR stream reader needs to keep track of the absolute timestamp of the last log event so that it can calculate the absolute timestamp of the next log event as last_log_event_abs_timestamp + next_log_event_timestamp_delta. This stream state is updated after deserializing each log event. However, as mentioned before, a UTC-offset-change IR unit may appear after any number of log events, and it then affects every log event deserialized afterward. (Although we discuss reading/deserializing IR streams above, the process is similar for writing/serializing.)
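To make the two kinds of state concrete, here is a minimal Python sketch. It is not the actual IR implementation: the `LogEventUnit` and `UtcOffsetChangeUnit` classes, the `decode_units` function, and the `reference_timestamp` parameter are hypothetical names chosen for illustration, and the units are assumed to be already parsed into objects rather than read from bytes.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class LogEventUnit:
    """Hypothetical log-event IR unit carrying a timestamp delta."""
    timestamp_delta: int
    message: str


@dataclass
class UtcOffsetChangeUnit:
    """Hypothetical UTC-offset-change IR unit."""
    utc_offset: int


IrUnit = Union[LogEventUnit, UtcOffsetChangeUnit]


def decode_units(
    units: List[IrUnit], reference_timestamp: int
) -> List[Tuple[int, int, str]]:
    """Return (absolute_timestamp, utc_offset, message) for each log event."""
    # State that is updated by every log event.
    last_log_event_abs_timestamp = reference_timestamp
    # State that is updated only by UTC-offset-change units.
    utc_offset = 0
    decoded = []
    for unit in units:
        if isinstance(unit, UtcOffsetChangeUnit):
            # Produces no log event, but affects every later log event.
            utc_offset = unit.utc_offset
        else:
            last_log_event_abs_timestamp += unit.timestamp_delta
            decoded.append(
                (last_log_event_abs_timestamp, utc_offset, unit.message)
            )
    return decoded
```

In this sketch the timestamp state changes once per log event, whereas the UTC offset can change zero or more times between two log events.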
The current IR stream reader APIs make it easy to read log events, but have several limitations for IR formats that include additional IR units like UTC offset changes. Currently, when the caller calls the API to read a log event, the reader will read all IR units up to and including the next log event. For instance, if there are one or more UTC offset changes before the next log event, each would be read—updating the reader’s state—and then the log event would be read and returned. The limitations of this design are as follows:
When the IR stream reader encounters an error, the reader needs to revert any state changes it made and reset the read head (i.e., each read needs to operate like a transaction). This is for two reasons:
A common error scenario is a truncated read in our FFI libraries—the caller passes in a buffer of IR stream data to be read, and that buffer may not contain enough content to completely read up to and including the next log event.
We want to leave the error handling decision to the caller rather than swallowing it.
If the caller cares about any state changes, they need to manually compare the reader’s state before and after reading a log event in order to determine what state has been updated.
Callers also can’t determine if there were consecutive updates to a piece of state (e.g., a UTC offset change).
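The limitations above can be illustrated with a simplified sketch of a reader in the current style. This is an assumption-laden model, not the existing API: `LegacyStyleReader`, `IncompleteStreamError`, and the `(kind, payload)` tuple representation of IR units are all hypothetical, and running off the end of the list stands in for a truncated buffer.

```python
from typing import Any, List, Tuple


class IncompleteStreamError(Exception):
    """Hypothetical error for a buffer that ends before the next log event."""


class LegacyStyleReader:
    """Reads every IR unit up to and including the next log event."""

    def __init__(self, units: List[Tuple[str, Any]]) -> None:
        self._units = units
        self._pos = 0  # read head
        self.utc_offset = 0

    def read_next_log_event(self) -> Any:
        # Snapshot the state so a failed read behaves like a transaction.
        saved_pos, saved_utc_offset = self._pos, self.utc_offset
        try:
            while True:
                if self._pos >= len(self._units):
                    raise IncompleteStreamError("truncated before a log event")
                kind, payload = self._units[self._pos]
                self._pos += 1
                if kind == "utc_offset_change":
                    # Applied silently; the caller is never told about it.
                    self.utc_offset = payload
                elif kind == "log_event":
                    return payload
        except IncompleteStreamError:
            # Revert the state changes and reset the read head, then let
            # the caller decide how to handle the error.
            self._pos, self.utc_offset = saved_pos, saved_utc_offset
            raise
```

With two consecutive UTC offset changes before a log event, a caller of this sketch sees only the final offset, and can find out that anything changed only by comparing `utc_offset` before and after the call.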
Thus, we propose redesigning the reader’s APIs to solve these issues.
Possible implementation
To read IR streams, we propose a class structure that consists of a deserializer class and optional user-defined IR unit handlers. Intuitively, the deserializer will be responsible for deserializing IR units from the stream. Users of the deserializer can pass in IR unit handlers for the IR units they are interested in. When the deserializer deserializes one of these IR units, it will call the relevant IR unit handler, allowing the user to perform any additional handling for the IR unit.
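One way the proposed structure could look is sketched below. This is only an illustration under stated assumptions, not the proposed API itself: the `Deserializer` class, the `deserialize_next_ir_unit` method, the string unit kinds, and the `(kind, payload)` tuples are hypothetical, and handlers are modeled as plain callables keyed by unit kind.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

IrUnitHandler = Callable[[Any], None]


class Deserializer:
    """Deserializes one IR unit per call and dispatches it to a handler."""

    def __init__(
        self,
        units: Iterable[Tuple[str, Any]],
        handlers: Optional[Dict[str, IrUnitHandler]] = None,
    ) -> None:
        self._units = iter(units)
        # Handlers are optional; unit kinds without one are still consumed.
        self._handlers = dict(handlers or {})

    def deserialize_next_ir_unit(self) -> Optional[str]:
        """Return the kind of the unit just read, or None if none remain."""
        unit = next(self._units, None)
        if unit is None:
            return None
        kind, payload = unit
        handler = self._handlers.get(kind)
        if handler is not None:
            handler(payload)
        return kind


# Usage: the caller registers handlers only for the units it cares about.
utc_offsets_seen = []
log_events_seen = []
deserializer = Deserializer(
    [
        ("utc_offset_change", 3600),
        ("utc_offset_change", 7200),
        ("log_event", "a"),
        ("end_of_stream", None),
    ],
    handlers={
        "utc_offset_change": utc_offsets_seen.append,
        "log_event": log_events_seen.append,
    },
)
while deserializer.deserialize_next_ir_unit() is not None:
    pass
```

Because each call in this sketch consumes exactly one IR unit, every UTC offset change reaches its handler individually, including consecutive ones, and a failed read would have at most one unit's worth of state to undo.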