Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ffi: Add Deserializer class to deserialize log events from key-value pair IR format. #511

Merged
merged 33 commits into from
Aug 25, 2024

Conversation

LinZhihao-723
Copy link
Member

Description

This PR implements Deserializer to deserialize log events from a key-value pair IR format. Deserializer is a class that stores all the necessary states of a key-value pair IR stream (currently including the schema tree and the UTC offset). It also provides API to read until the next key-value pair log event. The read operation is designed to be a transaction: if a failure happens during the deserialization, all the states will be reverted.
Notice that the class is designed to take any arbitrary ReadInterface to read bytes from a stream. The caller needs to maintain the stream to get the expected deserialization results. Due to the current limitations on our ffi layer design, we can't bind this class to one particular reader until all ffi libraries have implemented their reader wrapper.

Validation performed

  • Ensure the current code built with all the workflows
  • Add simple unit test to ensure: (1) the deserialization on the serialized byte stream will not return error; (2) in the deserialization result, the number of kv pairs should equal to the number of leaf nodes in the input map

@LinZhihao-723 LinZhihao-723 changed the title [WIP] ffi: Add Deserializer class to deserialize log events from key-value pair IR format. ffi: Add Deserializer class to deserialize log events from key-value pair IR format. Aug 16, 2024
Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paused my review so you can take a look at the docstrings again and make sure they are complete and follow our conventions.

Comment on lines 17 to 19
* This class:
* - maintains all the necessary internal data structure to track deserialization state;
* - provide APIs to deserialize log events from an IR stream.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is obvious, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I've added more details to explain the APIs should work as a transaction

[[nodiscard]] auto deserialize_schema(
ReaderInterface& reader,
encoded_tag_t& tag,
std::vector<SchemaTreeNode::id_t>& schema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for someone new to this code, schema might be a confusing/overloaded term (and it's not something that we've documented well in SchemaTree). So perhaps we should create a type alias and add a docstring for it explaining that it's a collection of schema tree leaf node IDs that represents the schema of a KV-pair log event.

}

auto schema_tree_node_tag_to_type(encoded_tag_t tag) -> std::optional<SchemaTreeNode::Type> {
std::optional<SchemaTreeNode::Type> type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think SchemaTreeNode::Type is a trivial type, so could we return it directly from the method without triggering a copy?

Comment on lines 399 to 410
encoded_tag_t str_packet_tag{};
if (auto const err{deserialize_tag(reader, str_packet_tag)};
IRErrorCode::IRErrorCode_Success != err)
{
return err;
}
std::string key_name;
if (auto const err{deserialize_string(reader, str_packet_tag, key_name)};
IRErrorCode::IRErrorCode_Success != err)
{
return err;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extract this into a method? Then you also wouldn't need the comment.

@LinZhihao-723
Copy link
Member Author

The centOS build failed due to #521 and it is irrelevant to this PR.

@LinZhihao-723 LinZhihao-723 merged commit ea7678f into y-scope:main Aug 25, 2024
12 of 13 checks passed
gibber9809 pushed a commit to gibber9809/clp that referenced this pull request Oct 25, 2024
jackluo923 pushed a commit to jackluo923/clp that referenced this pull request Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants