Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split StreamReader into an interface and implementation to prepare for adding another IR stream reader. #26

Merged
merged 42 commits into from
Nov 7, 2024
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
3dae806
Refactor StreamReader to modularize decoding logic.
junhaoliao Oct 15, 2024
ab32045
Add a comment section for methods in StreamReader.hpp.
junhaoliao Oct 15, 2024
8dc3dff
Add `get_version` to decoding methods.
junhaoliao Oct 15, 2024
eba1d30
Move `#include <emscripten/bind.h>` from StreamReader.hpp to StreamRe…
junhaoliao Oct 15, 2024
5398d1d
Rename StreamReader -> IRStreamReader.
junhaoliao Oct 15, 2024
cfd55ca
Rename ClpIrStreamReader to ClpIRStreamReader in bindings.
junhaoliao Oct 15, 2024
d1ed2f0
Extract a StreamReader base class from IRStreamReader.
junhaoliao Oct 15, 2024
bf5e4c9
Rename IRStreamReader -> IrStreamReader.
junhaoliao Oct 15, 2024
ad306f2
merging main into junhao PR
davemarco Oct 31, 2024
c474bb4
some restructure
davemarco Nov 1, 2024
f4a6896
fix cmake
davemarco Nov 1, 2024
dba3910
more changes after test
davemarco Nov 1, 2024
9ddd974
junhao changes
davemarco Nov 4, 2024
a5cb529
fix lint
davemarco Nov 4, 2024
cd50c9a
version methodology change + better error handling
davemarco Nov 4, 2024
d2cf122
lint fix
davemarco Nov 4, 2024
7fe3a58
Update src/clp_ffi_js/ir/decoding_methods.hpp
davemarco Nov 5, 2024
9b67be9
fix print statement
davemarco Nov 5, 2024
fb2b290
fix lint
davemarco Nov 5, 2024
464cf31
forgot one comment, adding type for log level filter
davemarco Nov 5, 2024
6e8d025
Update src/clp_ffi_js/ir/StreamReader.hpp
davemarco Nov 5, 2024
19e41b9
Update src/clp_ffi_js/ir/StreamReader.cpp
davemarco Nov 5, 2024
4988949
Update src/clp_ffi_js/ir/decoding_methods.hpp
davemarco Nov 5, 2024
c79a83f
kirk review comments
davemarco Nov 5, 2024
ef0b86b
fix lint
davemarco Nov 5, 2024
d9e9e97
Apply suggestions from code review
davemarco Nov 6, 2024
efaeedd
kirk review
davemarco Nov 6, 2024
3764298
rename
davemarco Nov 6, 2024
c81cd4b
cleanup
davemarco Nov 6, 2024
28d4250
fix seek issue
davemarco Nov 6, 2024
c5db9a6
Update src/clp_ffi_js/ir/StreamReader.cpp
davemarco Nov 6, 2024
4a83bce
Update src/clp_ffi_js/ir/StreamReader.hpp
davemarco Nov 6, 2024
de7a2f8
Update src/clp_ffi_js/ir/StreamReader.hpp
davemarco Nov 6, 2024
1b31f08
Apply suggestions from code review
davemarco Nov 6, 2024
226ab3a
Update src/clp_ffi_js/ir/StreamReader.cpp
davemarco Nov 6, 2024
fa4debd
change to pass by value
davemarco Nov 6, 2024
4108398
kirk changes
davemarco Nov 6, 2024
c7a8fbf
change submodule
davemarco Nov 6, 2024
d5cad52
Merge branch 'main' into extractBaseClass
davemarco Nov 6, 2024
bfea301
Apply suggestions from code review
davemarco Nov 6, 2024
72bdf6e
revert submodule change
davemarco Nov 6, 2024
119de6c
use see
davemarco Nov 7, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
264 changes: 264 additions & 0 deletions src/clp_ffi_js/ir/IrStreamReader.cpp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides renaming the class from StreamReader to IrStreamReader, did we make any other changes to this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed create() since we have effectively moved this to base class. I removed encripten bindings and put them all in base class.

Original file line number Diff line number Diff line change
@@ -0,0 +1,264 @@
#include "IrStreamReader.hpp"

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <memory>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>
#include <type_traits>
#include <utility>
#include <vector>

#include <clp/Array.hpp>
#include <clp/ErrorCode.hpp>
#include <clp/ir/LogEventDeserializer.hpp>
#include <clp/ir/types.hpp>
#include <clp/streaming_compression/zstd/Decompressor.hpp>
#include <clp/TraceableException.hpp>
#include <clp/type_utils.hpp>
#include <emscripten/bind.h>
#include <emscripten/em_asm.h>
#include <emscripten/val.h>
#include <spdlog/spdlog.h>

#include <clp_ffi_js/ClpFfiJsException.hpp>
#include <clp_ffi_js/constants.hpp>
#include <clp_ffi_js/ir/decoding_methods.hpp>
#include <clp_ffi_js/ir/LogEventWithLevel.hpp>
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StreamReaderDataContext.hpp>

using namespace std::literals::string_literals;
using clp::ir::four_byte_encoded_variable_t;

namespace clp_ffi_js::ir {
auto IrStreamReader::create(DataArrayTsType const& data_array) -> IrStreamReader {
auto const length{data_array["length"].as<size_t>()};
SPDLOG_INFO("IrStreamReader::create: got buffer of length={}", length);

// Copy array from JavaScript to C++
clp::Array<char> data_buffer{length};
// NOLINTBEGIN(cppcoreguidelines-pro-type-reinterpret-cast)
emscripten::val::module_property("HEAPU8")
.call<void>("set", data_array, reinterpret_cast<uintptr_t>(data_buffer.data()));
// NOLINTEND(cppcoreguidelines-pro-type-reinterpret-cast)

auto zstd_decompressor{std::make_unique<clp::streaming_compression::zstd::Decompressor>()};
zstd_decompressor->open(data_buffer.data(), length);

auto stream_reader_data_context{create_deserializer_and_data_context(
std::move(zstd_decompressor),
std::move(data_buffer)
)};
return IrStreamReader{std::move(stream_reader_data_context)};
}

auto IrStreamReader::get_num_events_buffered() const -> size_t {
return m_encoded_log_events.size();
}

auto IrStreamReader::get_filtered_log_event_map() const -> FilteredLogEventMapTsType {
if (false == m_filtered_log_event_map.has_value()) {
return FilteredLogEventMapTsType{emscripten::val::null()};
}

return FilteredLogEventMapTsType{emscripten::val::array(m_filtered_log_event_map.value())};
}

void IrStreamReader::filter_log_events(emscripten::val const& log_level_filter) {
if (log_level_filter.isNull()) {
m_filtered_log_event_map.reset();
return;
}

m_filtered_log_event_map.emplace();
auto filter_levels{emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter
)};
Copy link

@coderabbitai coderabbitai bot Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add error handling for invalid log_level_filter input

Currently, the function assumes that log_level_filter is a valid array. To prevent potential errors if a non-array is passed from JavaScript, consider adding a type check and throwing an exception or returning an error if the input is invalid.

Apply this diff to enhance input validation:

 void IrStreamReader::filter_log_events(emscripten::val const& log_level_filter) {
     if (log_level_filter.isNull()) {
         m_filtered_log_event_map.reset();
         return;
     }
+    if (!log_level_filter.isArray()) {
+        throw ClpFfiJsException{
+            clp::ErrorCode::ErrorCode_BadParam,
+            __FILENAME__,
+            __LINE__,
+            "Expected an array for log_level_filter"
+        };
+    }

     m_filtered_log_event_map.emplace();
     auto filter_levels{emscripten::vecFromJSArray<std::underlying_type_t<LogLevel>>(log_level_filter
     )};

Committable suggestion was skipped due to low confidence.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now this is skippable since the wrapper code is typed with TypeScript and any usage with non-array would fail static checks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, understood. Since the wrapper code is typed with TypeScript and non-array inputs would fail static checks, additional error handling here is not necessary.


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.367Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually sorry for my oversight, I think we want to define a type (similar to FilteredLogEventMapTsType) instead of directly using emscripten::val here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, thank you for the clarification. Defining a specific type for log_level_filter will enhance type safety and improve the code's readability. Here's how you might define a type similar to FilteredLogEventMapTsType:

using LogLevelFilterTsType = emscripten::val; // Define a type representing the log level filter

void IrStreamReader::filter_log_events(LogLevelFilterTsType const& log_level_filter) {
    // Implementation remains the same
}

This approach makes the expected input type explicit in the method signature and aligns with the existing type definitions.

Since this updates our previous understanding and may contradict the earlier learning (ID: 940b05fa-811e-47b0-98a2-bfbbe962ccf3) regarding the unnecessary runtime error handling for log_level_filter, would you like me to remove this learning? You can also manage learnings directly using the CodeRabbit UI learnings page.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

for (size_t log_event_idx = 0; log_event_idx < m_encoded_log_events.size(); ++log_event_idx) {
auto const& log_event = m_encoded_log_events[log_event_idx];
if (std::ranges::find(
filter_levels,
clp::enum_to_underlying_type(log_event.get_log_level())
)
!= filter_levels.end())
{
m_filtered_log_event_map->emplace_back(log_event_idx);
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Consider adding a type check for log_level_filter

Currently, the function assumes that log_level_filter is a valid array. To prevent potential runtime errors if a non-array is passed from JavaScript, consider adding a type check and throwing an exception if the input is invalid.


auto IrStreamReader::deserialize_stream() -> size_t {
if (nullptr == m_stream_reader_data_context) {
return m_encoded_log_events.size();
}

constexpr size_t cDefaultNumReservedLogEvents{500'000};
m_encoded_log_events.reserve(cDefaultNumReservedLogEvents);

while (true) {
auto result{m_stream_reader_data_context->get_deserializer().deserialize_log_event()};
if (result.has_error()) {
auto const error{result.error()};
if (std::errc::no_message_available == error) {
break;
}
if (std::errc::result_out_of_range == error) {
SPDLOG_ERROR("File contains an incomplete IR stream");
break;
}
Copy link

@coderabbitai coderabbitai bot Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Uniform error handling upon deserialization failure

When encountering std::errc::result_out_of_range, the code logs an error and breaks the loop. To maintain consistency and ensure the caller is properly informed of all deserialization failures, consider throwing an exception here as well.

Here's a suggested change:

 if (std::errc::result_out_of_range == error) {
     SPDLOG_ERROR("File contains an incomplete IR stream");
-    break;
+    throw ClpFfiJsException{
+        clp::ErrorCode::ErrorCode_Corrupt,
+        __FILENAME__,
+        __LINE__,
+        "File contains an incomplete IR stream"
+    };
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (std::errc::no_message_available == error) {
break;
}
if (std::errc::result_out_of_range == error) {
SPDLOG_ERROR("File contains an incomplete IR stream");
break;
}
if (std::errc::no_message_available == error) {
break;
}
if (std::errc::result_out_of_range == error) {
SPDLOG_ERROR("File contains an incomplete IR stream");
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Corrupt,
__FILENAME__,
__LINE__,
"File contains an incomplete IR stream"
};
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not throw an exception here because we intend to handle missing End-of-Stream packet gracefully.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, understood. Thank you for the clarification.


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:85-91
Timestamp: 2024-11-04T19:37:26.993Z
Learning: In the `IrStreamReader::deserialize_stream` method, we handle missing End-of-Stream packets gracefully by logging an error and breaking the loop when encountering `std::errc::result_out_of_range`, rather than throwing an exception.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Corrupt,
__FILENAME__,
__LINE__,
"Failed to deserialize: "s + error.category().name() + ":" + error.message()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use std::format?

};
}
auto const& log_event = result.value();
auto const& message = log_event.get_message();

auto const& logtype = message.get_logtype();
constexpr size_t cLogLevelPositionInMessages{1};
LogLevel log_level{LogLevel::NONE};
if (logtype.length() > cLogLevelPositionInMessages) {
// NOLINTNEXTLINE(readability-qualified-auto)
auto const log_level_name_it{std::find_if(
cLogLevelNames.begin() + static_cast<size_t>(cValidLogLevelsBeginIdx),
cLogLevelNames.end(),
[&](std::string_view level) {
return logtype.substr(cLogLevelPositionInMessages).starts_with(level);
}
)};
if (log_level_name_it != cLogLevelNames.end()) {
log_level = static_cast<LogLevel>(
std::distance(cLogLevelNames.begin(), log_level_name_it)
);
}
}

auto log_viewer_event{LogEventWithLevel<four_byte_encoded_variable_t>(
log_event.get_timestamp(),
log_event.get_utc_offset(),
message,
log_level
)};
m_encoded_log_events.emplace_back(std::move(log_viewer_event));
}
m_stream_reader_data_context.reset(nullptr);
return m_encoded_log_events.size();
}

auto IrStreamReader::decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType {
if (use_filter && false == m_filtered_log_event_map.has_value()) {
return DecodedResultsTsType{emscripten::val::null()};
}

size_t length{0};
if (use_filter) {
length = m_filtered_log_event_map->size();
} else {
length = m_encoded_log_events.size();
}
if (length < end_idx || begin_idx > end_idx) {
return DecodedResultsTsType{emscripten::val::null()};
}

std::string message;
constexpr size_t cDefaultReservedMessageLength{512};
message.reserve(cDefaultReservedMessageLength);
auto const results{emscripten::val::array()};

for (size_t i = begin_idx; i < end_idx; ++i) {
size_t log_event_idx{0};
if (use_filter) {
log_event_idx = m_filtered_log_event_map->at(i);
} else {
log_event_idx = i;
}
auto const& log_event{m_encoded_log_events[log_event_idx]};

auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
SPDLOG_ERROR("Failed to decode message.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Necessary since we have the exception?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Necessary since we have the exception?

break;
}
message = parsed.value();
Copy link

@coderabbitai coderabbitai bot Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle message decoding failures explicitly

In the decode_range method, when message decoding fails, the code logs an error and breaks the loop. This may result in partial results without informing the caller of the failure. Consider throwing an exception to notify the caller about the decoding issue.

Apply this diff to throw an exception upon decoding failure:

     auto const parsed{log_event.get_message().decode_and_unparse()};
     if (false == parsed.has_value()) {
         SPDLOG_ERROR("Failed to decode message.");
-        break;
+        throw ClpFfiJsException{
+            clp::ErrorCode::ErrorCode_Corrupt,
+            __FILENAME__,
+            __LINE__,
+            "Failed to decode message"
+        };
     }
     message = parsed.value();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
SPDLOG_ERROR("Failed to decode message.");
break;
}
message = parsed.value();
auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
SPDLOG_ERROR("Failed to decode message.");
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Corrupt,
__FILENAME__,
__LINE__,
"Failed to decode message"
};
}
message = parsed.value();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add this. Messages decoding is not expected to fail unless the file has been corrupted / the deserializer in CLP core does not handle edge cases.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, understood. Throwing an exception upon message decoding failure is appropriate here since message decoding failures are not expected unless the file has been corrupted or the deserializer in CLP core does not handle edge cases. I'll note this for future reference.


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:162-168
Timestamp: 2024-11-04T19:55:26.917Z
Learning: In the `IrStreamReader::decode_range` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, message decoding failures are not expected unless the file is corrupted or the deserializer in CLP core does not handle edge cases, so it's appropriate to throw an exception upon message decoding failure.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

Copy link

@coderabbitai coderabbitai bot Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle message decoding failures more explicitly

In the decode_range method, when message decoding fails, the code logs an error and breaks the loop. This might result in partial results without informing the caller of the failure. Consider throwing an exception to notify the caller about the decoding issue.

You can apply this diff to throw an exception upon decoding failure:

 auto const parsed{log_event.get_message().decode_and_unparse()};
 if (false == parsed.has_value()) {
     SPDLOG_ERROR("Failed to decode message.");
-    break;
+    throw ClpFfiJsException{
+        clp::ErrorCode::ErrorCode_Corrupt,
+        __FILENAME__,
+        __LINE__,
+        "Failed to decode message"
+    };
 }
 message = parsed.value();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
SPDLOG_ERROR("Failed to decode message.");
break;
}
message = parsed.value();
auto const parsed{log_event.get_message().decode_and_unparse()};
if (false == parsed.has_value()) {
SPDLOG_ERROR("Failed to decode message.");
throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Corrupt,
__FILENAME__,
__LINE__,
"Failed to decode message"
};
}
message = parsed.value();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be the same as the above comment.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, thank you for pointing that out. It seems the comment was duplicated. Sorry for any confusion.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.


m_ts_pattern.insert_formatted_timestamp(log_event.get_timestamp(), message);

EM_ASM(
{ Emval.toValue($0).push([UTF8ToString($1), $2, $3, $4]); },
results.as_handle(),
message.c_str(),
log_event.get_timestamp(),
log_event.get_log_level(),
log_event_idx + 1
);
}

Copy link

@coderabbitai coderabbitai bot Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Optimize the loop in decode_range

Instead of manually constructing the results array using EM_ASM, consider using a more direct approach with Emscripten bindings to improve readability and maintainability.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be explored in the future.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, understood.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#26
File: src/clp_ffi_js/ir/IrStreamReader.cpp:58-59
Timestamp: 2024-11-04T19:35:32.602Z
Learning: In the `IrStreamReader::filter_log_events` method in `src/clp_ffi_js/ir/IrStreamReader.cpp`, the wrapper code is typed with TypeScript, ensuring that non-array inputs for `log_level_filter` would fail static checks, so additional runtime error handling is unnecessary.

return DecodedResultsTsType(results);
}

IrStreamReader::IrStreamReader(
StreamReaderDataContext<four_byte_encoded_variable_t>&& stream_reader_data_context
)
: m_stream_reader_data_context{std::make_unique<
StreamReaderDataContext<four_byte_encoded_variable_t>>(
std::move(stream_reader_data_context)
)},
m_ts_pattern{m_stream_reader_data_context->get_deserializer().get_timestamp_pattern()} {}

auto IrStreamReader::create_data_context(
std::unique_ptr<clp::streaming_compression::zstd::Decompressor>&& zstd_decompressor,
clp::Array<char>&& data_buffer
) -> StreamReaderDataContext<four_byte_encoded_variable_t> {
rewind_reader_and_validate_encoding_type(*zstd_decompressor);

auto result{
clp::ir::LogEventDeserializer<four_byte_encoded_variable_t>::create(*zstd_decompressor)
};
if (result.has_error()) {
auto const error_code{result.error()};
SPDLOG_CRITICAL(
"Failed to create deserializer: {}:{}",
error_code.category().name(),
error_code.message()
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge this log into the the exception?

throw ClpFfiJsException{
clp::ErrorCode::ErrorCode_Failure,
__FILENAME__,
__LINE__,
"Failed to create deserializer"
};
}

return {std::move(data_buffer), std::move(zstd_decompressor), std::move(result.value())};
}
} // namespace clp_ffi_js::ir

namespace {
EMSCRIPTEN_BINDINGS(ClpIrStreamReader) {
emscripten::class_<
clp_ffi_js::ir::IrStreamReader,
emscripten::base<clp_ffi_js::ir::StreamReader>>("ClpIrStreamReader")
.constructor(
&clp_ffi_js::ir::IrStreamReader::create,
emscripten::return_value_policy::take_ownership()
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use return_value_policy::move for efficiency

In the Emscripten bindings, the constructor uses return_value_policy::take_ownership, which is typically used for pointers. Since IrStreamReader::create returns by value and supports move semantics, it's more appropriate to use return_value_policy::move to avoid unnecessary copies.

Apply this diff to adjust the return value policy:

             .constructor(
                     &clp_ffi_js::ir::IrStreamReader::create,
-                    emscripten::return_value_policy::take_ownership()
+                    emscripten::return_value_policy::move
             )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
.constructor(
&clp_ffi_js::ir::IrStreamReader::create,
emscripten::return_value_policy::take_ownership()
)
.constructor(
&clp_ffi_js::ir::IrStreamReader::create,
emscripten::return_value_policy::move
)

.function(
"getNumEventsBuffered",
&clp_ffi_js::ir::IrStreamReader::get_num_events_buffered
)
.function(
"getFilteredLogEventMap",
&clp_ffi_js::ir::IrStreamReader::get_filtered_log_event_map
)
.function("filterLogEvents", &clp_ffi_js::ir::IrStreamReader::filter_log_events)
.function("deserializeStream", &clp_ffi_js::ir::IrStreamReader::deserialize_stream)
.function("decodeRange", &clp_ffi_js::ir::IrStreamReader::decode_range);
}
} // namespace
122 changes: 122 additions & 0 deletions src/clp_ffi_js/ir/IrStreamReader.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
#ifndef CLP_FFI_JS_IR_IR_STREAM_READER_HPP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a linter for this soon, but the format of header defines should be <NAMESPACE>_<FILENAME>_HPP, which in this case would be CLP_FFI_JS_IR_IRSTREAMREADER_HPP.

#define CLP_FFI_JS_IR_IR_STREAM_READER_HPP

#include <Array.hpp>
#include <cstddef>
#include <memory>
#include <optional>
#include <streaming_compression/zstd/Decompressor.hpp>
#include <vector>

#include <clp/ir/types.hpp>
#include <clp/TimestampPattern.hpp>
#include <emscripten/val.h>

#include <clp_ffi_js/ir/LogEventWithLevel.hpp>
#include <clp_ffi_js/ir/StreamReader.hpp>
#include <clp_ffi_js/ir/StreamReaderDataContext.hpp>

using clp::ir::four_byte_encoded_variable_t;

namespace clp_ffi_js::ir {
/**
* Class to deserialize and decode Zstandard-compressed CLP IR streams as well as format decoded
* log events.
*/
class IrStreamReader : public StreamReader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about calling this TextIrStreamReader and StreamReader -> IrStreamReader?

Copy link
Contributor Author

@davemarco davemarco Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you like UnstructuredStreamReader for irv1 and StructuredStreamReader for irv2? I think junhao prefers StreamReader vs. IrStreamReader since he wants to one day have an ffi for single file archive. Text seems a bit vague, I'm not sure if you have a reason for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about inserting Ir before Stream---UnstructuredIrStreamReader and StructuredIrStreamReader? (Once the new IR format supports backward compatibility, hopefully we can revert back to a single IrStreamReader class.)

since he wants to one day have an ffi for single file archive.

I see. I guess we can keep StreamReader then.

Text seems a bit vague

It's how we differentiate unstructured and structured for the casual user. "Text" for unstructured and "JSON" for structured.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Anyways i changed for now. I think this ffi is not for the layman

friend StreamReader;

public:
/**
* Mapping between an index in the filtered log events collection to an index in the unfiltered
* log events collection.
*/
using FilteredLogEventsMap = std::optional<std::vector<size_t>>;

/**
* Creates a IrStreamReader to read from the given array.
*
* @param data_array An array containing a Zstandard-compressed IR stream.
* @return The created instance.
* @throw ClpFfiJsException if any error occurs.
*/
[[nodiscard]] static auto create(DataArrayTsType const& data_array) -> IrStreamReader;

// Destructor
~IrStreamReader() override = default;

// Disable copy constructor and assignment operator
IrStreamReader(IrStreamReader const&) = delete;
auto operator=(IrStreamReader const&) -> IrStreamReader& = delete;

// Define default move constructor
IrStreamReader(IrStreamReader&&) = default;
// Delete move assignment operator since it's also disabled in `clp::ir::LogEventDeserializer`.
auto operator=(IrStreamReader&&) -> IrStreamReader& = delete;
davemarco marked this conversation as resolved.
Show resolved Hide resolved

// Methods
/**
* @return The number of events buffered.
*/
davemarco marked this conversation as resolved.
Show resolved Hide resolved
[[nodiscard]] auto get_num_events_buffered() const -> size_t override;

/**
* @return The filtered log events map.
*/
davemarco marked this conversation as resolved.
Show resolved Hide resolved
[[nodiscard]] auto get_filtered_log_event_map() const -> FilteredLogEventMapTsType override;

/**
* Generates a filtered collection from all log events.
*
* @param log_level_filter Array of selected log levels
*/
davemarco marked this conversation as resolved.
Show resolved Hide resolved
void filter_log_events(emscripten::val const& log_level_filter) override;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Ensure appropriate passing of 'emscripten::val'

Consider passing emscripten::val by value rather than by const reference in filter_log_events. Since emscripten::val is a lightweight handle to a JavaScript value, passing it by value might be more suitable and aligns with common practices.


/**
* Deserializes all log events in the stream. After the stream has been exhausted, it will be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deallocation seems like an implementation decision, so perhaps we can keep this note in the implementation's docstring but not in the interface's. @junhaoliao, what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I removed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I see you've deleted this docstring. I'm fine keeping it this way, but if @junhaoliao wants to keep the part about deallocating the stream after exhaustion, then we need to add the docstring back in UnstructuredIrStreamReader.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just to be sure I understand the discussion -

  • in StreamReader which serves an interface, I agree we do not want to mention After the stream has been exhausted, it will be deallocated..
  • in UnstructuredIrStreamReader which is an implementation of the interface, I prefer that we mention After the stream has been exhausted, it will be deallocated. so that we can remind any future developer about the stream data lifecycle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. One way to do this is:

    /**
     * Deserializes all log events in the stream. After the stream has been exhausted, it will be
     * deallocated.
     *
     * @return The number of successfully deserialized ("valid") log events.
     */

Another way is:

    /**
     * See `StreamReader::deserialize_stream`.
     * 
     * After the stream has been exhausted, it will be deallocated.
     *
     * @return See `StreamReader::deserialize_stream`.
     */

We usually go with the first approach, but we don't have too many instances of this case, so we could change conventions.

Copy link
Collaborator

@junhaoliao junhaoliao Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can go with the second approach to avoid repetition and potentially docs going out of sync any time in the future.

For the second approach, can we use @sa or @see? (i'm not too sure if this is how the command is intended to be used: https://www.doxygen.nl/manual/commands.html#cmdsa

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know Doxygen had see. Yeah, let's use @see.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davemarco, can you make the change?

* deallocated.
*
* @return The number of successfully deserialized ("valid") log events.
*/
[[nodiscard]] auto deserialize_stream() -> size_t override;

/**
* Decodes log events in the range `[beginIdx, endIdx)` of the filtered or unfiltered
* (depending on the value of `useFilter`) log events collection.
*
* @param begin_idx
* @param end_idx
* @param use_filter Whether to decode from the filtered or unfiltered log events collection.
* @return An array where each element is a decoded log event represented by an array of:
* - The log event's message
* - The log event's timestamp as milliseconds since the Unix epoch
* - The log event's log level as an integer that indexes into `cLogLevelNames`
* - The log event's number (1-indexed) in the stream
* @return null if any log event in the range doesn't exist (e.g. the range exceeds the number
* of log events in the collection).
*/
davemarco marked this conversation as resolved.
Show resolved Hide resolved
[[nodiscard]] auto decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType override;

private:
// Constructor
explicit IrStreamReader(
StreamReaderDataContext<four_byte_encoded_variable_t>&& stream_reader_data_context
);

// Methods
[[nodiscard]] static auto create_deserializer_and_data_context(
std::unique_ptr<clp::streaming_compression::zstd::Decompressor>&& zstd_decompressor,
clp::Array<char>&& data_buffer
) -> StreamReaderDataContext<four_byte_encoded_variable_t>;

// Variables
std::vector<LogEventWithLevel<four_byte_encoded_variable_t>> m_encoded_log_events;
std::unique_ptr<StreamReaderDataContext<four_byte_encoded_variable_t>>
m_stream_reader_data_context;
FilteredLogEventsMap m_filtered_log_event_map;
clp::TimestampPattern m_ts_pattern;
};
} // namespace clp_ffi_js::ir

#endif // CLP_FFI_JS_IR_IR_STREAM_READER_HPP
Loading
Loading