-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add register handler to the Dfa
class.
#66
base: main
Are you sure you want to change the base?
Conversation
…pp files; Also remove RegexDFA.tpp file.
…ector instead of set.
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
… one in case merging caused issues.
WalkthroughThe changes in this pull request primarily involve renaming and restructuring files and classes related to finite automata within the Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Dfa
class.Dfa
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (17)
src/log_surgeon/Lexer.hpp (1)
41-44
: Consider passingid
by value instead of by const referenceThe parameter
uint32_t const& id
inadd_rule
can be passed by value since it is a small integral type. Passing by value is more efficient and avoids unnecessary indirection.Apply this diff to update the parameter:
-auto add_rule( - uint32_t const& id, +auto add_rule( + uint32_t id, std::unique_ptr<finite_automata::RegexAST<TypedNfaState>> rule ) -> void;src/log_surgeon/finite_automata/Dfa.hpp (1)
17-19
: Consider using rvalue reference to accept NFA in constructorThe constructor accepts
Nfa<NfaStateType> nfa
by value, which may lead to unnecessary copying. If ownership is intended to be transferred, consider using an rvalue reference andstd::move
.Apply this diff to modify the constructor signature:
-template <typename NfaStateType> -explicit Dfa(Nfa<NfaStateType> nfa); +template <typename NfaStateType> +explicit Dfa(Nfa<NfaStateType>&& nfa);src/log_surgeon/LogParser.cpp (2)
Line range hint
46-74
: Consider extracting delimiter validation logicThe delimiter validation logic in
add_rules
could be moved to a separate method for better code organization and reusability.+auto validate_delimiters(const std::vector<uint32_t>& delimiters) -> void { + if (delimiters.empty()) { + throw runtime_error("When using --schema-path, \"delimiters:\" line must be used."); + } +} auto LogParser::add_rules(std::unique_ptr<SchemaAST> schema_ast) -> void { for (auto const& delimiters : schema_ast->m_delimiters) { add_delimiters(delimiters); } vector<uint32_t> delimiters; for (uint32_t i = 0; i < cSizeOfByte; i++) { if (m_lexer.is_delimiter(i)) { delimiters.push_back(i); } } - if (delimiters.empty()) { - throw runtime_error("When using --schema-path, \"delimiters:\" line must be used."); - } + validate_delimiters(delimiters);
65-85
: Consider using a helper function for regex rule creationThe pattern of creating regex rules with timestamps appears twice with similar logic. Consider extracting this into a helper function.
+auto create_timestamp_rule( + const std::string& rule_name, + std::unique_ptr<RegexAST<ByteNfaState>> regex_ast, + char prefix_char +) -> void { + auto prefix = make_unique<RegexASTLiteral<ByteNfaState>>(prefix_char); + return add_rule( + rule_name, + make_unique<RegexASTCat<ByteNfaState>>( + std::move(prefix), + std::move(regex_ast) + ) + ); +}src/log_surgeon/Parser.hpp (1)
8-20
: LGTM! Consider adding template parameter documentation.The template parameter renaming improves clarity by better reflecting their purpose. However, it would be beneficial to add documentation explaining the requirements and constraints for
TypedNfaState
andTypedDfaState
template parameters.Add documentation above the class declaration:
/** * @tparam TypedNfaState Type representing NFA state, must satisfy... * @tparam TypedDfaState Type representing DFA state, must satisfy... */ template <typename TypedNfaState, typename TypedDfaState> class Parser {src/log_surgeon/LexicalRule.hpp (2)
28-31
: Address const correctness TODO.The TODO comment indicates a need to make the returned pointer constant. This should be addressed to prevent unintended modifications to the internal regex.
Would you like me to propose a const-correct implementation?
Line range hint
38-44
: Document implementation details of add_to_nfa.The method implementation would benefit from documentation explaining the state creation and tag handling logic.
Add implementation documentation:
/** * Adds this rule's regex to the NFA by: * 1. Creating an accepting end state with the rule's variable ID * 2. Adding regex transitions with negative tags to the NFA * * @param nfa Pointer to the NFA being constructed */ template <typename TypedNfaState> void LexicalRule<TypedNfaState>::add_to_nfa(finite_automata::Nfa<TypedNfaState>* nfa) const {src/log_surgeon/finite_automata/DfaState.hpp (1)
53-56
: Consider documenting the empty class optimizationThe use of std::tuple<> as an empty class optimization is clever. Consider adding a brief comment about the space optimization aspect.
// NOTE: We don't need m_tree_transitions for the `state_type == DfaStateType::Byte` case, so we -// use an empty class (`std::tuple<>`) in that case. +// use an empty class (`std::tuple<>`) in that case to optimize memory usage through empty base optimization.src/log_surgeon/finite_automata/TaggedTransition.hpp (2)
Line range hint
67-74
: Consider using std::ranges::none_of for better readabilityThe null check could be more idiomatic using
std::ranges::none_of
.-if (std::ranges::any_of(tags, [](Tag const* tag) { return nullptr == tag; })) { +if (std::ranges::none_of(tags, [](Tag const* tag) { return nullptr != tag; })) { throw std::invalid_argument("Tags cannot contain null elements"); }
Line range hint
83-93
: Consider caching the transformed tag namesFor better performance when
serialize
is called multiple times, consider caching the transformed tag names.private: std::vector<Tag const*> m_tags; + mutable std::optional<std::vector<std::string>> m_cached_tag_names; TypedNfaState const* m_dest_state; + + [[nodiscard]] auto get_tag_names() const { + if (!m_cached_tag_names) { + auto tag_names = m_tags | std::ranges::views::transform(&Tag::get_name); + m_cached_tag_names = std::vector<std::string>(tag_names.begin(), tag_names.end()); + } + return *m_cached_tag_names; + }src/log_surgeon/LogEvent.cpp (1)
45-47
: Consider explicit type declarations for better readabilityWhile using
auto&
can improve maintainability, explicit type declarations make the code more self-documenting. Consider keepingToken&
for better readability, especially in public methods where the type information is valuable for API users.-auto& token = m_log_output_buffer->get_mutable_token(i); +Token& token = m_log_output_buffer->get_mutable_token(i);Also applies to: 54-64
src/log_surgeon/finite_automata/Nfa.hpp (1)
Line range hint
158-158
: Follow coding guidelines for boolean comparisons.The comparison
false == state_queue.empty()
follows the guideline, but it can be simplified.- while (false == state_queue.empty()) { + while (false == state_queue.empty()) {src/log_surgeon/Lalr1Parser.hpp (1)
277-277
: Consider documenting the commented-out code.The commented-out member variable declaration should either be removed or documented to explain why it's kept.
src/log_surgeon/Lexer.tpp (2)
69-72
: Consider using more descriptive variable names.Variables like
next
andnext_char
could be more descriptive. Consider names likenext_state
andcurrent_char
to better convey their purpose.Also applies to: 85-85, 201-216, 244-244, 257-258, 261-261, 278-278
380-382
: Document the TODO comment about DFA ignoring tags.The TODO comment about DFA ignoring tags should be expanded to explain the impact and potential solutions.
Would you like me to help create a GitHub issue to track this TODO item?
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Line range hint
894-940
: Consider moving the documentation to a separate file.The TODO comment suggests moving the NFA construction documentation to a separate file. This would improve code organization and maintainability.
Would you like me to help create a documentation file and move this content there?
Line range hint
844-878
: Consider extracting NFA state management into helper methods.The
add_to_nfa
method inRegexASTMultiplication
contains complex state management logic. Consider extracting the repeated patterns into helper methods to improve readability and maintainability.Example helper methods:
private: void add_repeated_transitions(Nfa<TypedNfaState>* nfa, TypedNfaState* end_state, uint32_t count) const; void add_optional_transitions(Nfa<TypedNfaState>* nfa, TypedNfaState* end_state, uint32_t start, uint32_t end) const;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (35)
CMakeLists.txt
(2 hunks)examples/intersect-test.cpp
(4 hunks)src/log_surgeon/Buffer.hpp
(1 hunks)src/log_surgeon/BufferParser.hpp
(2 hunks)src/log_surgeon/Lalr1Parser.cpp
(2 hunks)src/log_surgeon/Lalr1Parser.hpp
(8 hunks)src/log_surgeon/Lalr1Parser.tpp
(28 hunks)src/log_surgeon/Lexer.hpp
(4 hunks)src/log_surgeon/Lexer.tpp
(16 hunks)src/log_surgeon/LexicalRule.hpp
(2 hunks)src/log_surgeon/LogEvent.cpp
(2 hunks)src/log_surgeon/LogParser.cpp
(6 hunks)src/log_surgeon/LogParser.hpp
(4 hunks)src/log_surgeon/Parser.hpp
(1 hunks)src/log_surgeon/Parser.tpp
(2 hunks)src/log_surgeon/ParserInputBuffer.cpp
(2 hunks)src/log_surgeon/ReaderParser.hpp
(2 hunks)src/log_surgeon/Schema.cpp
(1 hunks)src/log_surgeon/SchemaParser.cpp
(3 hunks)src/log_surgeon/SchemaParser.hpp
(4 hunks)src/log_surgeon/finite_automata/Dfa.hpp
(1 hunks)src/log_surgeon/finite_automata/DfaState.hpp
(4 hunks)src/log_surgeon/finite_automata/DfaStatePair.hpp
(5 hunks)src/log_surgeon/finite_automata/DfaStateType.hpp
(1 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(6 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/NfaStateType.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(29 hunks)src/log_surgeon/finite_automata/RegexDFA.hpp
(0 hunks)src/log_surgeon/finite_automata/RegexDFAStateType.hpp
(0 hunks)src/log_surgeon/finite_automata/RegexNFAStateType.hpp
(0 hunks)src/log_surgeon/finite_automata/TaggedTransition.hpp
(5 hunks)tests/CMakeLists.txt
(1 hunks)tests/test-NFA.cpp
(1 hunks)tests/test-lexer.cpp
(2 hunks)
💤 Files with no reviewable changes (3)
- src/log_surgeon/finite_automata/RegexDFAStateType.hpp
- src/log_surgeon/finite_automata/RegexNFAStateType.hpp
- src/log_surgeon/finite_automata/RegexDFA.hpp
✅ Files skipped from review due to trivial changes (5)
- src/log_surgeon/ReaderParser.hpp
- src/log_surgeon/BufferParser.hpp
- src/log_surgeon/finite_automata/NfaStateType.hpp
- src/log_surgeon/finite_automata/DfaStateType.hpp
- src/log_surgeon/ParserInputBuffer.cpp
🧰 Additional context used
📓 Path-based instructions (22)
src/log_surgeon/Lalr1Parser.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/LogEvent.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Buffer.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/LogParser.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/LogParser.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Schema.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Parser.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Dfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/SchemaParser.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/DfaStatePair.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/DfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Lalr1Parser.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/SchemaParser.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/TaggedTransition.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
examples/intersect-test.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/LexicalRule.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
tests/test-lexer.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
tests/test-NFA.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Nfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/RegexAST.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Lexer.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (10)
CMakeLists.txt (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/DfaStatePair.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/DfaState.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/TaggedTransition.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
tests/test-lexer.cpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#38
File: src/log_surgeon/finite_automata/RegexAST.hpp:698-705
Timestamp: 2024-11-10T16:46:53.300Z
Learning: Using `fmt::join` on a `vector<uint32_t>` containing digits correctly concatenates them into a string representation.
tests/test-NFA.cpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/Lexer.tpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
src/log_surgeon/finite_automata/Nfa.hpp (4)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:99-105
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/RegexNFA.hpp`, it's acceptable to have constructors without the `explicit` specifier. Do not suggest adding `explicit` to constructors in this file.
src/log_surgeon/finite_automata/RegexAST.hpp (4)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/NfaState.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (47)
examples/intersect-test.cpp (5)
8-11
: LGTM: Updated type aliases reflect new class structure
The using
declarations correctly update the type aliases to ByteDfaState
, ByteNfaState
, Dfa
, and Nfa
, aligning with the refactored classes.
20-20
: LGTM: Updated ByteLexicalRule
alias
The ByteLexicalRule
type alias is appropriately updated to use ByteNfaState
.
24-24
: LGTM: Updated function parameter to new DFA type
The parameter dfa1
is correctly updated to Dfa<ByteDfaState> const&
, reflecting the new Dfa
class usage.
43-44
: LGTM: Correct instantiation of Nfa
and Dfa
The Nfa
and Dfa
objects are properly instantiated using std::move
, transferring ownership of resources as intended.
81-82
: LGTM: Correct instantiation of Nfa
and Dfa
in main
In the main
function, the Nfa
and Dfa
are correctly constructed with the updated types and usage.
src/log_surgeon/Lexer.hpp (3)
14-16
: LGTM: Updated includes for new finite automata classes
The include directives are updated appropriately to include Dfa.hpp
, DfaState.hpp
, and Nfa.hpp
, reflecting the refactored class names.
23-23
: LGTM: Renamed template parameters for clarity
Renaming the template parameters to TypedNfaState
and TypedDfaState
improves code readability and consistency.
154-155
: LGTM: Updated type aliases for ByteLexer
and Utf8Lexer
The using
declarations for ByteLexer
and Utf8Lexer
are updated correctly to use the new state types, ensuring consistency across the codebase.
src/log_surgeon/finite_automata/Dfa.hpp (2)
47-108
: LGTM: Correct DFA construction from NFA
The constructor successfully builds the DFA from the provided NFA, correctly implementing the subset construction algorithm.
125-142
: LGTM: Implemented get_intersect
method correctly
The get_intersect
method accurately computes the set of schema types reachable by another DFA, properly handling state pairs and transitions.
src/log_surgeon/finite_automata/NfaState.hpp (5)
1-2
: LGTM: Renamed to NfaState
and updated include guards
The class and include guards are appropriately renamed to NfaState
, reflecting the refactoring from RegexNFAState
.
24-25
: LGTM: Defined ByteNfaState
and Utf8NfaState
type aliases
The type aliases for ByteNfaState
and Utf8NfaState
are correctly defined, improving code clarity.
37-38
: Consistency with prior updates in tag handling
Using std::vector<Tag const*> tags
in the constructor aligns with previous learnings to prefer vectors over sets for performance unless uniqueness is required.
127-138
: LGTM: Conditional handling of intervals based on state type
The add_interval
method correctly uses if constexpr
to handle byte transitions for Byte
state type and interval tree transitions for Utf8
state type.
174-207
: Note on treating tagged transitions as epsilon transitions
The epsilon_closure
method currently treats tagged transitions as epsilon transitions, as indicated by the TODO
comment. This might lead to incorrect behaviour if tagged transitions have different semantics.
Would you like assistance in implementing proper handling for tagged transitions, or should we open a GitHub issue to track this task?
src/log_surgeon/LogParser.cpp (1)
20-21
: LGTM: Type aliases are properly updated
The type aliases for ByteDfaState
and ByteNfaState
from the finite_automata
namespace are correctly introduced, aligning with the broader refactoring effort.
src/log_surgeon/Lalr1Parser.tpp (1)
Line range hint 56-74
: LGTM: Template parameter updates are consistent
The template parameter updates from NFAStateType
and DFAStateType
to TypedNfaState
and TypedDfaState
are consistently applied throughout the class definition and implementation.
src/log_surgeon/Lalr1Parser.cpp (1)
11-14
: LGTM: Member initialization is simplified
The initialization of member variables in the NonTerminal
constructor is properly simplified by removing unnecessary class qualifiers.
src/log_surgeon/Parser.tpp (3)
10-11
: LGTM: Template parameter renaming improves clarity
The renaming from NFAStateType/DFAStateType to TypedNfaState/TypedDfaState better reflects their purpose as typed state classes.
32-36
: LGTM: Method signature updates follow modern C++ conventions
The changes consistently apply the new template parameter names and adopt trailing return type syntax, improving code style consistency.
44-47
: LGTM: Consistent method signature updates
The changes maintain consistency with the template parameter renaming and modern C++ style conventions.
src/log_surgeon/finite_automata/DfaState.hpp (2)
16-20
: LGTM: Enhanced type safety with enum-based state types
The use of DfaStateType enum and clear type aliases improves type safety and code readability.
Line range hint 59-73
: LGTM: Efficient state transition implementation
The implementation efficiently handles both byte and UTF-8 cases using constexpr if, with proper validation of tree search results.
src/log_surgeon/finite_automata/DfaStatePair.hpp (2)
Line range hint 22-40
: LGTM: Consistent template parameter renaming
The changes maintain consistency with the project-wide renaming while preserving the existing functionality.
Line range hint 66-81
: Address UTF-8 support for register handler feature
The TODO comment indicates missing UTF-8 support. Since this PR adds register handler functionality, consider implementing UTF-8 support to ensure complete functionality.
Let's check if there are any UTF-8 related tests or implementations:
Would you like help implementing the UTF-8 support for the get_reachable_pairs method?
src/log_surgeon/Buffer.hpp (1)
75-76
: LGTM: Consistent use of trailing return type syntax
The modification improves code style consistency by using the modern trailing return type syntax.
src/log_surgeon/finite_automata/TaggedTransition.hpp (1)
19-21
: LGTM: Improved template parameter naming
The template parameter renaming from NFAStateType
to TypedNfaState
enhances code clarity and aligns with the project's naming conventions.
src/log_surgeon/SchemaParser.hpp (2)
71-72
: LGTM: Base class inheritance update
The update to use Lalr1Parser
with ByteNfaState
and ByteDfaState
is consistent with the new naming convention.
49-49
: Verify type parameter changes across the codebase
The type parameter changes from RegexNFAByteState
to ByteNfaState
look good, but we should verify that all dependent code has been updated accordingly.
Also applies to: 58-58
✅ Verification successful
Type parameter changes have been consistently updated
The codebase shows consistent usage of ByteNfaState
across all relevant files, including:
- Schema parser implementation and header files
- Log parser implementation and header files
- Test files
- Example files
All template instantiations of RegexAST
and its derived classes are using the new type parameter. No instances of the old type name RegexNFAByteState
were found.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any remaining references to old type names
rg "RegexNFAByteState|RegexDFAByteState" --type cpp
Length of output: 53
Script:
#!/bin/bash
# Search for any usage of ByteNfaState to verify the new type is being used consistently
rg "ByteNfaState" --type cpp -A 2
# Also search for any usage of RegexAST with type parameters
ast-grep --pattern 'RegexAST<$_>'
Length of output: 9142
src/log_surgeon/LogParser.hpp (2)
18-18
: LGTM: Type parameter update
The update to use ByteNfaState
and ByteDfaState
is consistent with the new naming convention and matches the changes in SchemaParser.hpp.
29-29
: Documentation properly updated
The exception documentation has been correctly updated to reflect the new Lalr1Parser
naming.
Also applies to: 38-38
CMakeLists.txt (1)
71-73
: LGTM: File renames are consistent
The renaming of files from LALR1Parser to Lalr1Parser and RegexDFA/NFA to Dfa/Nfa is consistent with the code changes. All necessary files are included in the build.
Also applies to: 99-105
✅ Verification successful
File renames are consistent and properly referenced
All renamed files exist in their expected locations, and references to these files have been properly updated throughout the codebase. The remaining occurrences of "Lalr1Parser" are correct as they refer to the new naming convention.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify all renamed files exist
for file in "src/log_surgeon/Lalr1Parser."{cpp,hpp,tpp} \
"src/log_surgeon/finite_automata/"{Dfa,DfaState,DfaStatePair,DfaStateType}.hpp \
"src/log_surgeon/finite_automata/"{Nfa,NfaState,NfaStateType}.hpp; do
if [ ! -f "$file" ]; then
echo "Missing file: $file"
exit 1
fi
done
Length of output: 2612
Script:
#!/bin/bash
# Check for any remaining references to old file names
rg -i "LALR1Parser|RegexDFA|RegexNFA" --type cpp --type cmake
Length of output: 1626
src/log_surgeon/finite_automata/Nfa.hpp (3)
Line range hint 20-105
: LGTM! Well-structured class design.
The class follows RAII principles with proper ownership semantics using unique_ptr for state management. The constructor correctly initializes the NFA and processes the lexical rules.
106-129
: LGTM! Well-designed state creation methods.
The state creation methods follow a consistent pattern and maintain proper ownership semantics. The raw pointer usage is appropriate as ownership is clearly maintained by m_states.
Line range hint 193-212
: LGTM! Clean serialization implementation.
The serialization logic is well-implemented using fmt::format for efficient string formatting.
tests/test-NFA.cpp (1)
10-31
: LGTM! Type aliases correctly updated.
The type aliases have been consistently updated to use ByteNfaState, maintaining compatibility with the core implementation changes.
tests/test-lexer.cpp (1)
21-32
: LGTM! Consistent type alias updates.
The type aliases have been updated consistently with the changes in other files, maintaining the test coverage and functionality.
src/log_surgeon/Lalr1Parser.hpp (5)
123-123
: LGTM! Method implementations are concise and clear.
The direct member access without this->
prefix improves readability.
Also applies to: 161-161, 168-168
203-204
: LGTM! Class renaming follows proper naming conventions.
The renaming from LALR1Parser
to Lalr1Parser
aligns with standard C++ naming conventions where acronyms are treated as words.
215-215
: LGTM! Method signatures are well-defined.
The parameter types are clear and consistent with the template parameters.
Also applies to: 225-225
397-398
: LGTM! Using declaration is properly placed.
The using declaration for the base class member is correctly placed in the protected section.
410-410
: LGTM! Include statement is correctly updated.
The include statement matches the new filename.
src/log_surgeon/Lexer.tpp (1)
21-22
: LGTM! Method declarations are consistent and well-structured.
The template parameter renaming is consistently applied across all method declarations, improving code clarity.
Also applies to: 42-45, 171-177, 304-308, 321-322, 335-336, 346-347, 359-362, 367-369, 378-383
src/log_surgeon/SchemaParser.cpp (1)
419-419
: LGTM! Method declarations use consistent return type syntax.
The trailing return type syntax (-> void
) is consistently applied, improving readability and maintainability.
Also applies to: 484-484
src/log_surgeon/finite_automata/RegexAST.hpp (3)
27-28
: LGTM! Template parameter and method signature changes are consistent.
The changes to rename the template parameter from NFAStateType
to TypedNfaState
and update the method signatures are well-implemented. The conditional check follows the coding guidelines.
Also applies to: 43-45, 78-78, 111-112
160-161
: LGTM! Template parameter changes are consistent across derived classes.
The template parameter has been consistently updated from NFAStateType
to TypedNfaState
across all derived classes. The changes maintain code consistency and align with the PR objectives.
Also applies to: 239-240, 294-295, 427-428, 490-491, 553-554, 631-632
Line range hint 956-962
: Verify the existence and content of README-Schema.md.
Multiple error messages reference README-Schema.md for documentation about escaping special characters in bracket expressions.
Also applies to: 976-983, 996-1004
@@ -124,20 +118,14 @@ class Lexer { | |||
} | |||
|
|||
[[nodiscard]] auto get_dfa( | |||
) const -> std::unique_ptr<finite_automata::RegexDFA<DFAStateType>> const& { | |||
) const -> std::unique_ptr<finite_automata::Dfa<TypedDfaState>> const& { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid returning a const reference to a std::unique_ptr
Returning a const reference to a std::unique_ptr
in get_dfa
can be misleading regarding ownership semantics. Consider returning a raw pointer or a reference to the managed object instead.
Apply this diff to modify the return type:
-) const -> std::unique_ptr<finite_automata::Dfa<TypedDfaState>> const& {
+) const -> finite_automata::Dfa<TypedDfaState> const& {
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
) const -> std::unique_ptr<finite_automata::Dfa<TypedDfaState>> const& { | |
) const -> finite_automata::Dfa<TypedDfaState> const& { |
template <typename TypedNfaState, typename TypedDfaState> | ||
auto Lalr1Parser<TypedNfaState, TypedDfaState>::generate_lalr1_goto() -> void { | ||
// done already at end of generate_lr1_item_sets()? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Remove empty method or add implementation
The generate_lalr1_goto
method appears to be empty with a comment suggesting its functionality is handled elsewhere. Consider removing this method if it's not needed or implementing the required functionality.
-template <typename TypedNfaState, typename TypedDfaState>
-auto Lalr1Parser<TypedNfaState, TypedDfaState>::generate_lalr1_goto() -> void {
- // done already at end of generate_lr1_item_sets()?
-}
auto const schema_ast = SchemaParser::try_schema_string(var_schema); | ||
m_schema_ast->add_schema_var(std::move(schema_ast->m_schema_vars[0]), priority); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding validation for schema_ast
The method should validate that schema_ast->m_schema_vars
is not empty before accessing its first element.
auto Schema::add_variable(std::string_view const var_schema, int const priority) const -> void {
auto const schema_ast = SchemaParser::try_schema_string(var_schema);
+ if (schema_ast->m_schema_vars.empty()) {
+ throw std::runtime_error("Schema variables cannot be empty");
+ }
m_schema_ast->add_schema_var(std::move(schema_ast->m_schema_vars[0]), priority);
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
auto const schema_ast = SchemaParser::try_schema_string(var_schema); | |
m_schema_ast->add_schema_var(std::move(schema_ast->m_schema_vars[0]), priority); | |
auto const schema_ast = SchemaParser::try_schema_string(var_schema); | |
if (schema_ast->m_schema_vars.empty()) { | |
throw std::runtime_error("Schema variables cannot be empty"); | |
} | |
m_schema_ast->add_schema_var(std::move(schema_ast->m_schema_vars[0]), priority); |
../src/log_surgeon/finite_automata/Nfa.hpp | ||
../src/log_surgeon/finite_automata/NfaState.hpp | ||
../src/log_surgeon/finite_automata/NfaStateType.hpp | ||
../src/log_surgeon/finite_automata/RegisterHandler.hpp | ||
../src/log_surgeon/finite_automata/Tag.hpp | ||
../src/log_surgeon/finite_automata/TaggedTransition.hpp | ||
../src/log_surgeon/LALR1Parser.cpp | ||
../src/log_surgeon/LALR1Parser.hpp | ||
../src/log_surgeon/LALR1Parser.tpp | ||
../src/log_surgeon/Lalr1Parser.cpp | ||
../src/log_surgeon/Lalr1Parser.hpp | ||
../src/log_surgeon/Lalr1Parser.tpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Additional test coverage needed for NFA components
The current test coverage focuses mainly on NFA construction and serialization through a single complex test case. However, the following components lack dedicated test coverage:
- NfaState functionality and state types
- RegisterHandler operations
- Tag and TaggedTransition behaviours
🔗 Analysis chain
Verify test coverage for new Nfa implementation.
While the source files have been updated to use the new Nfa implementation, we should ensure adequate test coverage exists.
Let me check the content of the test file to verify the coverage.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for test files covering new Nfa implementation
# Search for test files related to Nfa
echo "Searching for Nfa test files..."
fd -e cpp -e hpp test.*nfa -i
# Search for Nfa-related test cases in existing test files
echo -e "\nSearching for Nfa test cases..."
rg -i "TEST_CASE.*nfa" "tests/"
Length of output: 288
Script:
#!/bin/bash
# Check content of NFA test file
echo "Content of test-NFA.cpp:"
cat tests/test-NFA.cpp
# List all the new NFA-related source files for comparison
echo -e "\nNFA-related source files:"
fd -e hpp -e cpp nfa -i src/
Length of output: 8621
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (5)
src/log_surgeon/finite_automata/Dfa.hpp (5)
14-14
: Consider forward declaration of RegisterHandlerSince
RegisterHandler
is only used as a private member and not in the public interface, consider forward declaring it instead of including the header to reduce compilation dependencies.-#include <log_surgeon/finite_automata/RegisterHandler.hpp> + +namespace log_surgeon::finite_automata { +class RegisterHandler; +}
78-86
: Track UTF-8 support requirementThe TODO comment indicates missing UTF-8 support. This should be tracked for future implementation.
Would you like me to create a GitHub issue to track the UTF-8 support requirement?
65-65
: Consider simplifying the comparisonThe comparison
false == unmarked_sets.empty()
follows the coding guideline but could be more readable as!unmarked_sets.empty()
.
133-133
: Consider consolidating UTF-8 related tasksThis is another TODO for UTF-8 support. Consider consolidating all UTF-8 related tasks into a single tracking issue.
Would you like me to create a consolidated GitHub issue for all UTF-8 related tasks?
134-134
: Consider simplifying the comparisonSimilar to the previous instance, the comparison
false == unvisited_pairs.empty()
follows the coding guideline but could be more readable as!unvisited_pairs.empty()
.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/log_surgeon/finite_automata/Dfa.hpp
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/log_surgeon/finite_automata/Dfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
🔇 Additional comments (2)
src/log_surgeon/finite_automata/Dfa.hpp (2)
113-125
: Implementation looks good!
The method correctly creates DFA states using smart pointers and properly handles the transfer of matching variable IDs.
45-45
: Initialize RegisterHandler member
The m_register_handler
member is added but not initialized in the constructor. Consider whether it needs specific initialization.
References
Description
Dfa
class.state
variable.Validation performed
Dfa
file.Summary by CodeRabbit
Summary by CodeRabbit
Release Notes
New Features
Dfa
class for representing Deterministic Finite Automata (DFA).ByteNfaState
andByteDfaState
.Refactor
Bug Fixes
LogEvent
class.Documentation
Tests
Nfa
implementation and updated type usages.