-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ffi: Extract IR unit deserialization methods from Deserializer
into standalone functions to prepare for a templatized Deserializer
.
#544
Conversation
Warning Rate limit exceeded@LinZhihao-723 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 4 minutes and 54 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughThe changes introduce new functionality related to deserialization methods for Intermediate Representation (IR) units. This includes the addition of new source files for handling schema tree nodes and key-value pair log events. Significant modifications were made to the Changes
Sequence Diagram(s)sequenceDiagram
participant Reader
participant Deserializer
participant SchemaTree
participant LogEvent
Reader->>Deserializer: Read data stream
Deserializer->>SchemaTree: Deserialize schema tree nodes
Deserializer->>LogEvent: Deserialize key-value pair log events
Deserializer->>Reader: Handle EOF conditions
Deserializer->>Reader: Return error if no messages available
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Outside diff range and nitpick comments (5)
components/core/src/clp/ffi/ir_stream/utils.cpp (1)
55-67
: New functionir_error_code_to_errc
looks good, with a minor suggestion.The function provides a clear mapping between custom IR error codes and standard error codes. The use of a switch statement is appropriate, and the default case ensures all possible input values are handled.
Consider adding a comment explaining the rationale behind each mapping, especially for cases where multiple IR error codes map to the same std::errc. This would improve maintainability and make it easier for other developers to understand or modify the mappings in the future.
Here's an example of how you could add comments:
auto ir_error_code_to_errc(IRErrorCode ir_error_code) -> std::errc { switch (ir_error_code) { case IRErrorCode_Incomplete_IR: // Incomplete IR suggests the result is partial, hence out of range return std::errc::result_out_of_range; case IRErrorCode_Corrupted_IR: case IRErrorCode_Decode_Error: // Both corrupted IR and decode errors indicate issues with the protocol return std::errc::protocol_error; case IRErrorCode_Eof: // End of file means no more messages are available return std::errc::no_message_available; default: // For any unhandled error codes, return not supported return std::errc::not_supported; } }components/core/src/clp/ffi/ir_stream/utils.hpp (1)
73-77
: LGTM: Well-designed error code conversion functionThe new
ir_error_code_to_errc
function is a valuable addition for error handling. The use of[[nodiscard]]
is appropriate to ensure the error code is not ignored. The function signature is clear and follows the existing coding style.Consider adding a brief comment explaining the purpose of this function, similar to other functions in this file. For example:
/** * Converts an internal IRErrorCode to its equivalent std::errc. * @param ir_error_code The internal error code to convert. * @return Equivalent `std::errc` code indicating the same error type. */ [[nodiscard]] auto ir_error_code_to_errc(IRErrorCode ir_error_code) -> std::errc;This would maintain consistency with the documentation style of other functions in the file.
components/core/CMakeLists.txt (1)
336-337
: LGTM! Consider grouping related files together.The addition of the new source files for IR unit deserialization methods is appropriate and follows the project's conventions. These files will be correctly included in the unit test build.
For improved readability and organization, consider grouping these new files with other related IR stream files in the
SOURCE_FILES_unitTest
list. This could make it easier to locate and manage related files in the future.components/core/src/clp/ffi/ir_stream/Deserializer.cpp (2)
133-133
: Limit scope ofschema_tree_node_key_name
variableThe variable
schema_tree_node_key_name
is declared outside the loop but is only used within theif (is_schema_tree_node_tag(tag))
block. To enhance code readability and maintainability, consider declaring it within this block to limit its scope.Apply this diff to move the declaration:
- std::string schema_tree_node_key_name; while (true) { if (auto const err{deserialize_tag(reader, tag)}; IRErrorCode::IRErrorCode_Success != err) { return ir_error_code_to_errc(err); } if (cProtocol::Eof == tag) { return std::errc::no_message_available; } if (is_schema_tree_node_tag(tag)) { + std::string schema_tree_node_key_name; auto const result{deserialize_ir_unit_schema_tree_node_insertion( reader, tag, schema_tree_node_key_name )}; // ...
177-177
: Avoid unnecessarystd::move
in return statementUsing
std::move
inreturn std::move(result);
may inhibit copy elision (Return Value Optimization). Sinceresult
is a local variable being returned, the compiler can optimize the return without the need forstd::move
. Consider removingstd::move
to potentially improve performance.Apply this diff to remove the unnecessary
std::move
:- return std::move(result); + return result;
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (8)
- components/core/CMakeLists.txt (1 hunks)
- components/core/src/clp/ffi/ir_stream/Deserializer.cpp (2 hunks)
- components/core/src/clp/ffi/ir_stream/Deserializer.hpp (1 hunks)
- components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1 hunks)
- components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (1 hunks)
- components/core/src/clp/ffi/ir_stream/utils.cpp (2 hunks)
- components/core/src/clp/ffi/ir_stream/utils.hpp (3 hunks)
- components/core/tests/test-ir_encoding_methods.cpp (2 hunks)
Files skipped from review due to trivial changes (1)
- components/core/src/clp/ffi/ir_stream/Deserializer.hpp
Additional comments not posted (9)
components/core/src/clp/ffi/ir_stream/utils.cpp (1)
5-5
: New includes look good.The addition of <system_error> and "decoding_methods.hpp" headers is appropriate for the new functionality introduced in this file.
Also applies to: 11-11
components/core/src/clp/ffi/ir_stream/utils.hpp (1)
8-8
: LGTM: Appropriate include for new functionalityThe addition of
#include <system_error>
is necessary and appropriate for the newir_error_code_to_errc
function that returnsstd::errc
. This change doesn't affect existing code and supports the new error handling capabilities.components/core/tests/test-ir_encoding_methods.cpp (3)
1126-1126
: LGTM: EOF marker added correctly.The addition of the EOF marker to the ir_buf is appropriate and consistent with the IR protocol.
1147-1147
: Improved code readability.The addition of this blank line enhances the code's readability by clearly separating different sections of the test.
1148-1150
: Excellent addition: EOF handling test implemented.This new test is crucial for verifying the deserializer's behaviour when it encounters the EOF marker. It ensures that the correct error (std::errc::no_message_available) is returned, which is essential for robust error handling in the deserialization process.
components/core/src/clp/ffi/ir_stream/Deserializer.cpp (1)
134-171
: Deserialization loop logic is correct and robustThe modifications to the deserialization loop provide a clear and efficient mechanism for processing IR units. Error handling and control flow are appropriately managed, ensuring that the deserializer updates its state correctly.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (3)
188-212
: Ensure consistent error handling indeserialize_schema_tree_node_parent_id
When deserializing
parent_id
, unrecognized tags result inIRErrorCode::IRErrorCode_Corrupted_IR
. Confirm that this aligns with the overall error-handling strategy, and consider providing more descriptive error messages if necessary.
424-458
: Ensure correct deserialization flow indeserialize_value_and_construct_node_id_value_pairs
The logic for reading the next tag is conditional on the sizes of
schema
andnode_id_value_pairs
. Verify that tags are deserialized at the appropriate times to maintain synchronization and prevent deserialization errors.
498-534
: Validate handling of empty schemas indeserialize_ir_unit_kv_pair_log_event
When the schema is empty, the code expects the
tag
to becProtocol::Payload::ValueEmpty
. Ensure this expectation is always met and handle any anomalies that might occur due to corrupted or unexpected data streams.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
Show resolved
Hide resolved
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me. Lets just fix the doc strings and update the PR description regarding the long-term plan for public APIs as we discussed.
|
||
namespace clp::ffi::ir_stream { | ||
/** | ||
* Deserializes schema tree node insertion IR unit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This suggestion applies to many other doc strings in this PR. Some are already correct thought.
* Deserializes schema tree node insertion IR unit. | |
* Deserializes a schema tree node insertion IR unit. |
* indicating the failure: | ||
* - std::errc::result_out_of_range if the IR stream is truncated. | ||
* - std::errc::protocol_error if the deserialized node type isn't supported. | ||
* - Same as `deserialize_schema_tree_node_parent_id` or `deserialize_schema_tree_node_key_name`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per offline discussion, lets change "Same as" to the following style to match the new guidelines.
* - Same as `deserialize_schema_tree_node_parent_id` or `deserialize_schema_tree_node_key_name`. | |
* - Forwards `deserialize_schema_tree_node_key_name`'s return values. | |
* - Forwards `deserialize_schema_tree_node_parent_id`'s return values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
426-460
: Clarify the logic in the value deserialization loop for better readabilityIn
deserialize_value_and_construct_node_id_value_pairs
, the conditionif (schema.size() != node_id_value_pairs.size())
within the loop may not be immediately clear to readers. Adding comments to explain why the tag is deserialized again only when more values are expected could improve code clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- components/core/src/clp/ffi/ir_stream/Deserializer.hpp (2 hunks)
- components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1 hunks)
- components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- components/core/src/clp/ffi/ir_stream/Deserializer.hpp
- components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp
🧰 Additional context used
📓 Learnings (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (2)
Learnt from: LinZhihao-723 PR: y-scope/clp#544 File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:321-393 Timestamp: 2024-09-24T22:40:28.731Z Learning: In the codebase, the check for duplicate `node_id`s is performed at the caller end, so it is not necessary to check for duplicates in `deserialize_value_and_insert_to_node_id_value_pairs`.
Learnt from: LinZhihao-723 PR: y-scope/clp#544 File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:170-186 Timestamp: 2024-09-24T22:33:59.488Z Learning: Only trivially destructible global objects are allowed in the coding standard, so avoid suggesting the use of non-trivially destructible global objects like `static std::unordered_map`.
🔇 Additional comments (3)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (3)
172-189
: Functionschema_tree_node_tag_to_type
is implemented correctlyThe function accurately maps encoded tags to
SchemaTreeNode::Type
using a switch-case statement. The use ofstd::optional
to handle unknown tags is appropriate and enhances robustness.
323-395
: Deserialization of various value types is correctly implementedThe function
deserialize_value_and_insert_to_node_id_value_pairs
effectively handles the deserialization of different value types and appropriately inserts them intonode_id_value_pairs
. The switch-case structure is clear, and error handling is thorough.
500-536
: Deserialization of key-value pair log events is properly handledThe function
deserialize_ir_unit_kv_pair_log_event
correctly deserializes key-value pair log events, effectively managing schemas and value pairs. Error handling ensures robustness against incomplete or corrupted IR streams.
auto deserialize_string(ReaderInterface& reader, encoded_tag_t tag, std::string& deserialized_str) | ||
-> IRErrorCode { | ||
size_t str_length{}; | ||
if (cProtocol::Payload::StrLenUByte == tag) { | ||
uint8_t length{}; | ||
if (false == deserialize_int(reader, length)) { | ||
return IRErrorCode::IRErrorCode_Incomplete_IR; | ||
} | ||
str_length = static_cast<size_t>(length); | ||
} else if (cProtocol::Payload::StrLenUShort == tag) { | ||
uint16_t length{}; | ||
if (false == deserialize_int(reader, length)) { | ||
return IRErrorCode::IRErrorCode_Incomplete_IR; | ||
} | ||
str_length = static_cast<size_t>(length); | ||
} else if (cProtocol::Payload::StrLenUInt == tag) { | ||
uint32_t length{}; | ||
if (false == deserialize_int(reader, length)) { | ||
return IRErrorCode::IRErrorCode_Incomplete_IR; | ||
} | ||
str_length = static_cast<size_t>(length); | ||
} else { | ||
return IRErrorCode::IRErrorCode_Corrupted_IR; | ||
} | ||
if (clp::ErrorCode_Success != reader.try_read_string(str_length, deserialized_str)) { | ||
return IRErrorCode::IRErrorCode_Incomplete_IR; | ||
} | ||
return IRErrorCode::IRErrorCode_Success; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a maximum limit to str_length
to prevent potential resource exhaustion
In the function deserialize_string
, str_length
is derived from deserialized data and could potentially be very large. To prevent possible denial-of-service attacks or excessive memory allocation, consider adding a check to enforce a maximum allowable length for str_length
.
Apply this diff to add a maximum length check:
+ constexpr size_t MAX_STRING_LENGTH = 1024 * 1024; // 1 MB limit, adjust as necessary
...
if (clp::ErrorCode_Success != reader.try_read_string(str_length, deserialized_str)) {
+ if (str_length > MAX_STRING_LENGTH) {
+ return IRErrorCode::IRErrorCode_Corrupted_IR;
+ }
return IRErrorCode::IRErrorCode_Incomplete_IR;
}
Committable suggestion was skipped due to low confidence.
Deserializer
into standalone functions to prepare for a templatized Deserializer
.
… standalone functions to prepare for a templatized `Deserializer`. (y-scope#544)
… standalone functions to prepare for a templatized `Deserializer`. (y-scope#544)
Description
This PR is working towards #539
The workflow of the new deserializer API as discussed in #539 will have the following steps:
Before this PR, step 2 & 3 are in the same method. But thinking the deserializer as a high-level IR unit processor, it's cleaner to split these methods out into different methods. In an ideal implementation, we should make these methods as private methods of the deserializer. However, the deserializer will be implemented as a template to receive an IR unit handler defined in #540, which means we need to implement all the methods from step 2 to step 3 in the header file if we want to make them as private members of the deserializer. This will make the header too messy to manage.
As a solution for now, we decided to split the methods from step 2 and move them into a new set of the files named "ir_unit_deserialization_methods". This allows us to: (1). make things cleaner according to the workflow discussed above; (2). split most of the low-level deserialization into an implementation file. The downside of this solution is that these low-level APIs will be exposed directly to users as they are public C-style APIs. In the future, we should formally compile FFI core as a library and decide what public headers users can include.
For now, we use these new methods to re-implement the current
deserialize_to_next_log_event
method to consume the IR stream unit by unit.Validation performed
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Deserializer
class for better clarity on method functionalities.Tests