Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support search logs by timestamp for structured and unstructured logs. #42

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion src/clp_ffi_js/ir/StreamReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,11 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) {
)
.function("filterLogEvents", &clp_ffi_js::ir::StreamReader::filter_log_events)
.function("deserializeStream", &clp_ffi_js::ir::StreamReader::deserialize_stream)
.function("decodeRange", &clp_ffi_js::ir::StreamReader::decode_range);
.function("decodeRange", &clp_ffi_js::ir::StreamReader::decode_range)
.function(
"getLogEventIndexByTimestamp",
&clp_ffi_js::ir::StreamReader::find_timestamp_last_occurrence
Henry8192 marked this conversation as resolved.
Show resolved Hide resolved
);
}
} // namespace

Expand Down
5 changes: 5 additions & 0 deletions src/clp_ffi_js/ir/StreamReader.hpp
Henry8192 marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
#include <concepts>
#include <cstddef>
#include <cstdint>
#include <ir/types.hpp>
Henry8192 marked this conversation as resolved.
Show resolved Hide resolved
#include <memory>
#include <optional>
#include <string>
Expand Down Expand Up @@ -124,6 +125,10 @@ class StreamReader {
[[nodiscard]] virtual auto decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType = 0;

[[nodiscard]] virtual auto find_timestamp_last_occurrence(
Henry8192 marked this conversation as resolved.
Show resolved Hide resolved
clp::ir::epoch_time_ms_t input_timestamp
Henry8192 marked this conversation as resolved.
Show resolved Hide resolved
) -> std::ptrdiff_t = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline that there would be some changes in both clp-ffi-js and ylv to return a nullable-integer instead, we ended up with a temporary conclusion that using a const value -1 was sufficient to indicate that no matching log event can be found.

However, returning null may only seem necessary now I realize that we have been (implicitly) using size_t as the type of LogEventIdx and such values are always non-negative. Can we declare a TS type and change the return type to it? You may refer to to FilteredLogEventMapTsType for such type declarations and usages; let me know if you need more help.


protected:
explicit StreamReader() = default;

Expand Down
24 changes: 24 additions & 0 deletions src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,30 @@ auto StructuredIrStreamReader::decode_range(size_t begin_idx, size_t end_idx, bo
);
}

auto StructuredIrStreamReader::find_timestamp_last_occurrence(
clp::ir::epoch_time_ms_t input_timestamp
) -> std::ptrdiff_t {
// Use std::lower_bound with a custom comparator
Henry8192 marked this conversation as resolved.
Show resolved Hide resolved
auto it = std::lower_bound(
m_deserialized_log_events->begin(),
m_deserialized_log_events->end(),
input_timestamp,
[](const LogEventWithFilterData<StructuredLogEvent>& event, clp::ir::epoch_time_ms_t timestamp) {
return event.get_timestamp() <= timestamp;
}
);

// Adjust the iterator to find the last valid index
if (it == m_deserialized_log_events->end() || it->get_timestamp() > input_timestamp) {
if (it == m_deserialized_log_events->begin()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to check against m_deserialized_log_events->empty() at the beginning of the method and early return, so that we can avoid checking it == m_deserialized_log_events->end() && it == m_deserialized_log_events->begin()?

return -1; // No element satisfies the condition
}
--it;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use std::range::upper_bound instead and always it-- after, so that we can avoid checking (it == m_deserialized_log_events->end() || it->get_timestamp() > input_timestamp)?

}

return std::distance(m_deserialized_log_events->begin(), it);
}

StructuredIrStreamReader::StructuredIrStreamReader(
StreamReaderDataContext<StructuredIrDeserializer>&& stream_reader_data_context,
std::shared_ptr<StructuredLogEvents> deserialized_log_events
Expand Down
3 changes: 3 additions & 0 deletions src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
Henry8192 marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,9 @@ class StructuredIrStreamReader : public StreamReader {
[[nodiscard]] auto decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType override;

[[nodiscard]] auto find_timestamp_last_occurrence(clp::ir::epoch_time_ms_t input_timestamp
) -> std::ptrdiff_t override;

private:
// Constructor
explicit StructuredIrStreamReader(
Expand Down
23 changes: 23 additions & 0 deletions src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,29 @@ auto UnstructuredIrStreamReader::decode_range(size_t begin_idx, size_t end_idx,
);
}

auto UnstructuredIrStreamReader::find_timestamp_last_occurrence(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's deduplicate this with StructuredIrStreamReader::find_timestamp_last_occurrence.
You may refer to generic_decode_range.

clp::ir::epoch_time_ms_t input_timestamp
) -> std::ptrdiff_t {
// Use std::lower_bound with a custom comparator
auto it = std::lower_bound(
m_encoded_log_events.begin(),
m_encoded_log_events.end(),
input_timestamp,
[](LogEventWithFilterData<UnstructuredLogEvent> const& event,
clp::ir::epoch_time_ms_t timestamp) { return event.get_timestamp() <= timestamp; }
);

// Adjust the iterator to find the last valid index
if (it == m_encoded_log_events.end() || it->get_timestamp() > input_timestamp) {
if (it == m_encoded_log_events.begin()) {
return -1; // No element satisfies the condition
}
--it;
}

return std::distance(m_encoded_log_events.begin(), it);
}

UnstructuredIrStreamReader::UnstructuredIrStreamReader(
StreamReaderDataContext<UnstructuredIrDeserializer>&& stream_reader_data_context
)
Expand Down
3 changes: 3 additions & 0 deletions src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
Henry8192 marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,9 @@ class UnstructuredIrStreamReader : public StreamReader {
[[nodiscard]] auto decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType override;

[[nodiscard]] auto find_timestamp_last_occurrence(clp::ir::epoch_time_ms_t input_timestamp
) -> std::ptrdiff_t override;

private:
// Constructor
explicit UnstructuredIrStreamReader(
Expand Down
Loading