-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Remove redundant regex prefix from class names and fix capitalization. #59
base: main
Are you sure you want to change the base?
Conversation
…Should fix this so the ordering is determinstic.
…ename tag members to make it clear they are positions.
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
…eterministic. This is fine as tags are unique across disjoint subtrees.
Co-authored-by: Lin Zhihao <[email protected]>
…ypedefs to the top of the file to fix compilation error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (9)
src/log_surgeon/finite_automata/Dfa.hpp (1)
58-58
: Address the TODO: Implement UTF-8 handlingThe
get_intersect
method contains a TODO comment indicating that UTF-8 (multi-byte transitions) handling needs to be implemented. Please consider adding support for UTF-8 to ensure correct functionality with multi-byte character transitions.Would you like assistance in generating the implementation for UTF-8 handling?
src/log_surgeon/finite_automata/DfaStatePair.hpp (1)
71-71
: Address the TODO: Implement UTF-8 handlingThe
get_reachable_pairs
method contains a TODO comment about handling UTF-8 (multi-byte transitions). Please consider implementing support for UTF-8 to properly handle multi-byte character transitions in DFA state pairs.Would you like assistance in implementing UTF-8 handling in this method?
src/log_surgeon/Lexer.hpp (1)
49-49
: Pass 'uint32_t' by value instead of by const referenceSince
uint32_t
is a small fundamental type, passing it by value is more efficient and clearer. Change the parameteruint32_t const& id
touint32_t id
.Apply this change:
- auto add_rule(uint32_t const& id, std::unique_ptr<finite_automata::RegexAST<NfaStateType>> rule) + auto add_rule(uint32_t id, std::unique_ptr<finite_automata::RegexAST<NfaStateType>> rule)src/log_surgeon/finite_automata/Nfa.hpp (1)
Line range hint
164-164
: TODO comment needs attention.The comment indicates missing UTF-8 case handling. This could affect the functionality of the NFA for UTF-8 encoded input.
Would you like me to help implement the UTF-8 case handling or create a GitHub issue to track this task?
src/log_surgeon/finite_automata/NfaState.hpp (1)
130-130
: Consider using enum class member directly.Instead of
NfaStateType::Utf8 == state_type
, considerstate_type == NfaStateType::Utf8
for consistency with the style used elsewhere in the file.- if constexpr (NfaStateType::Utf8 == state_type) { + if constexpr (state_type == NfaStateType::Utf8) {src/log_surgeon/Lalr1Parser.hpp (1)
279-279
: Remove commented-out code.The commented-out member variable declaration should be removed as it adds noise to the codebase.
- /* Lexer<NfaStateType, DfaStateType> m_lexer; */
src/log_surgeon/Lexer.tpp (1)
Line range hint
44-83
: Consider extracting state transition logic.The state machine implementation in the scan method is quite complex. Consider extracting the state transition logic into a separate method to improve maintainability.
+ auto transition_state(DfaStateType const* current_state, unsigned char next_char) -> DfaStateType* { + if (m_is_delimiter[next_char] || input_buffer.log_fully_consumed() || !m_has_delimiters) { + if (current_state->is_accepting()) { + m_match = true; + m_type_ids = &(current_state->get_matching_variable_ids()); + m_match_pos = prev_byte_buf_pos; + m_match_line = m_line; + } + } + return current_state->next(next_char); + }src/log_surgeon/Lalr1Parser.tpp (1)
Line range hint
580-628
: Consider using string streams for error reporting.The current string concatenation approach in error reporting could be inefficient. Consider using
std::ostringstream
for better performance and maintainability.- std::string error_string = "Schema:" + std::to_string(line_num + 1) + ":" - + std::to_string(consumed_input.size() + 1) - + ": error: " + error_type + "\n"; + std::ostringstream error_stream; + error_stream << "Schema:" << (line_num + 1) << ":" + << (consumed_input.size() + 1) + << ": error: " << error_type << "\n";src/log_surgeon/finite_automata/RegexAST.hpp (1)
716-717
: LGTM! Implementation details are consistently updated.The template parameter renaming has been correctly applied across all implementation methods. However, there's a minor opportunity for improvement in the error messages.
Consider updating the error messages in the
RegexASTGroup
constructors to use consistent formatting and avoid repetition of the explanation about escaping special characters.Extract the common error message to a constant:
static constexpr char const* const INVALID_BRACKET_EXPRESSION_ERROR = "A bracket expression in the schema contains illegal characters, " "remember to escape special characters. " "Refer to README-Schema.md for more details.";Then use it in the constructors:
- throw std::runtime_error("RegexASTGroup1: right == nullptr: A bracket expression in the " - "schema contains illegal characters, remember to escape special " - "characters. Refer to README-Schema.md for more details."); + throw std::runtime_error(fmt::format("RegexASTGroup1: right == nullptr: {}", + INVALID_BRACKET_EXPRESSION_ERROR));Also applies to: 721-722, 739-740, 770-771, 799-800, 830-831, 892-893, 949-950, 989-990, 1006-1007, 1014-1015, 1020-1021, 1041-1042, 1058-1059, 1074-1075
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (32)
CMakeLists.txt
(2 hunks)examples/intersect-test.cpp
(4 hunks)src/log_surgeon/BufferParser.hpp
(2 hunks)src/log_surgeon/Lalr1Parser.cpp
(2 hunks)src/log_surgeon/Lalr1Parser.hpp
(5 hunks)src/log_surgeon/Lalr1Parser.tpp
(20 hunks)src/log_surgeon/Lexer.hpp
(6 hunks)src/log_surgeon/Lexer.tpp
(18 hunks)src/log_surgeon/LexicalRule.hpp
(2 hunks)src/log_surgeon/LogParser.cpp
(5 hunks)src/log_surgeon/LogParser.hpp
(4 hunks)src/log_surgeon/Parser.hpp
(1 hunks)src/log_surgeon/Parser.tpp
(3 hunks)src/log_surgeon/ReaderParser.hpp
(2 hunks)src/log_surgeon/SchemaParser.cpp
(1 hunks)src/log_surgeon/SchemaParser.hpp
(4 hunks)src/log_surgeon/finite_automata/Dfa.hpp
(1 hunks)src/log_surgeon/finite_automata/DfaState.hpp
(1 hunks)src/log_surgeon/finite_automata/DfaStatePair.hpp
(1 hunks)src/log_surgeon/finite_automata/DfaStateType.hpp
(1 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(6 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/NfaStateType.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(31 hunks)src/log_surgeon/finite_automata/RegexDFA.hpp
(0 hunks)src/log_surgeon/finite_automata/RegexDFA.tpp
(0 hunks)src/log_surgeon/finite_automata/RegexDFAStateType.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexNFAStateType.hpp
(0 hunks)src/log_surgeon/finite_automata/TaggedTransition.hpp
(4 hunks)tests/CMakeLists.txt
(1 hunks)tests/test-NFA.cpp
(1 hunks)tests/test-lexer.cpp
(2 hunks)
💤 Files with no reviewable changes (3)
- src/log_surgeon/finite_automata/RegexNFAStateType.hpp
- src/log_surgeon/finite_automata/RegexDFA.tpp
- src/log_surgeon/finite_automata/RegexDFA.hpp
✅ Files skipped from review due to trivial changes (5)
- src/log_surgeon/finite_automata/RegexDFAStateType.hpp
- src/log_surgeon/finite_automata/DfaStateType.hpp
- src/log_surgeon/finite_automata/NfaStateType.hpp
- src/log_surgeon/ReaderParser.hpp
- src/log_surgeon/BufferParser.hpp
🧰 Additional context used
📓 Path-based instructions (19)
src/log_surgeon/Lalr1Parser.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Parser.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/LogParser.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Dfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/SchemaParser.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/LexicalRule.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/DfaStatePair.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Lalr1Parser.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/DfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/LogParser.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/TaggedTransition.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
examples/intersect-test.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
tests/test-NFA.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
tests/test-lexer.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Nfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Lexer.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/RegexAST.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/SchemaParser.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (9)
src/log_surgeon/LexicalRule.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
CMakeLists.txt (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/TaggedTransition.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
examples/intersect-test.cpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
tests/test-NFA.cpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
tests/test-lexer.cpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#38
File: src/log_surgeon/finite_automata/RegexAST.hpp:698-705
Timestamp: 2024-11-10T16:46:53.300Z
Learning: Using `fmt::join` on a `vector<uint32_t>` containing digits correctly concatenates them into a string representation.
src/log_surgeon/finite_automata/Nfa.hpp (5)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:99-105
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/RegexNFA.hpp`, it's acceptable to have constructors without the `explicit` specifier. Do not suggest adding `explicit` to constructors in this file.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:442-456
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In the C++ file `src/log_surgeon/finite_automata/RegexNFA.hpp`, for the `RegexNFA::serialize()` function, prioritize code clarity over efficiency when handling string operations.
src/log_surgeon/finite_automata/NfaState.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/RegexAST.hpp (5)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#38
File: src/log_surgeon/finite_automata/RegexAST.hpp:663-669
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `RegexASTLiteral::serialize()`, to properly handle Unicode characters beyond the ASCII range, cast `m_character` to `char32_t` and use `U"{}{}"` in `fmt::format`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (43)
src/log_surgeon/finite_automata/DfaState.hpp (1)
40-40
: Use of 'false == expression' aligns with coding guidelines
The use of false == m_matching_variable_ids.empty()
complies with the coding guideline to prefer false == <expression>
over !<expression>
.
src/log_surgeon/LogParser.cpp (5)
20-21
: Type Aliases Updated for Consistency
The using
statements for finite_automata::DfaByteState
and finite_automata::NfaByteState
have been correctly updated to reflect the new naming conventions, promoting consistency across the codebase.
65-68
: Constructor Template Parameters Updated Appropriately
The usage of RegexAST<NfaByteState>
and RegexASTLiteral<NfaByteState>
in initializing first_timestamp_regex_ast
and r1
correctly reflects the updated type names, ensuring consistency with the new NfaByteState
type.
143-145
: Delimiter Group Construction Uses Updated Types
The creation of RegexASTGroup<NfaByteState>
and the subsequent concatenation with rule->m_regex_ptr
using RegexASTCat<NfaByteState>
correctly apply the new naming conventions, maintaining consistency throughout the finite automata implementation.
196-196
: Lexer Reference Updated with New Type Names
The assignment to token.m_type_ids_ptr
now references Lexer<NfaByteState, DfaByteState>::cTokenUncaughtStringTypes
, reflecting the updated type aliases and ensuring correct linkage to the lexer constants.
262-262
: Consistent Update of Lexer Reference
Similarly, the assignment to curr_token.m_type_ids_ptr
correctly references Lexer<NfaByteState, DfaByteState>::cTokenUncaughtStringTypes
, maintaining consistency with the updated type names.
src/log_surgeon/Lalr1Parser.cpp (2)
1-1
: Header Include Updated to Reflect Renamed File
The include directive has been updated to #include "Lalr1Parser.hpp"
, aligning with the new file name and ensuring that the correct header file is included.
11-14
: Simplified Member Variable Initialization
The constructor for NonTerminal
now initializes m_children_start
using m_next_children_start
without class qualification, enhancing code readability. The increment of m_next_children_start
is also simplified, which is appropriate since both are static member variables of the same class.
src/log_surgeon/Parser.hpp (3)
8-8
: Template Parameter Names Standardized
The template parameters have been renamed to NfaStateType
and DfaStateType
, promoting consistent and clear naming conventions across the codebase.
15-15
: Method Signature Updated with New Template Parameters
The add_rule
method now correctly uses std::unique_ptr<finite_automata::RegexAST<NfaStateType>>
for the rule
parameter, reflecting the updated template parameter names and ensuring type consistency.
20-20
: Member Variable Type Updated for Consistency
The member variable m_lexer
is now declared as Lexer<NfaStateType, DfaStateType>
, aligning with the updated template parameter names and maintaining consistency throughout the parser implementation.
src/log_surgeon/LexicalRule.hpp (6)
9-9
: Template Parameter Renamed for Clarity
The template parameter has been renamed from NFAStateType
to NfaStateType
, improving readability and aligning with the naming conventions used across the project.
15-15
: Constructor Parameter Type Updated
The constructor for LexicalRule
now accepts a std::unique_ptr<finite_automata::RegexAST<NfaStateType>>
for the regex
parameter, correctly reflecting the updated template parameter name.
24-24
: Method Signature Adjusted to New NFA Type
The add_to_nfa
method signature has been updated to accept a pointer to finite_automata::Nfa<NfaStateType>
, ensuring compatibility with the renamed NFA class and maintaining consistency in the finite automata usage.
28-28
: Return Type Updated in get_regex
Method
The return type of the get_regex
method is now finite_automata::RegexAST<NfaStateType>*
, reflecting the updated template parameter and providing the correct type information.
35-35
: Member Variable Type Correctly Updated
The member variable m_regex
is now of type std::unique_ptr<finite_automata::RegexAST<NfaStateType>>
, aligning with the updated template parameter and ensuring type consistency within the class.
38-39
: Definition of add_to_nfa
Method Updated
The definition of the add_to_nfa
method now uses the updated template parameter NfaStateType
and accepts a pointer to finite_automata::Nfa<NfaStateType>
, maintaining consistency with the class declaration and the finite automata implementation.
tests/CMakeLists.txt (1)
8-10
: LGTM! File renaming changes are consistent with PR objectives.
The updates to source file paths correctly reflect the removal of the "Regex" prefix and the capitalization changes.
Also applies to: 14-16
src/log_surgeon/Parser.tpp (1)
10-11
: LGTM! Template parameter naming changes are consistent.
The capitalization changes in template parameters align with the PR objectives while maintaining the code's functionality.
Also applies to: 32-35, 44-46
src/log_surgeon/finite_automata/TaggedTransition.hpp (2)
18-20
: LGTM! Template parameter naming changes are consistent.
The capitalization changes in template parameters align with the PR objectives while maintaining the code's functionality.
Also applies to: 28-28, 32-32, 39-39, 50-50, 56-59, 66-66, 75-75, 82-82, 96-96
18-20
: Verify consistent usage of NfaStateType across the codebase.
Let's ensure the new naming convention is consistently applied across all related files.
✅ Verification successful
NfaStateType template parameter is consistently used across the codebase
The verification shows that NfaStateType
is used consistently throughout the codebase, particularly in the finite automata implementation. The template parameter is properly used in:
- Template class declarations
- Function parameters
- Type aliases
- Documentation comments
All occurrences follow the same CamelCase naming convention, and there are no instances of inconsistent naming patterns (like NFAStateType or nfa_state_type) in the template parameter usage.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining instances of NFAStateType in the codebase
# that might have been missed during the renaming.
echo "Checking for remaining NFAStateType occurrences..."
rg "NFAStateType" --type cpp --type hpp
echo "Checking for any mixed usage of NfaStateType/NFAStateType..."
rg -i "nfastatetype" --type cpp --type hpp
Length of output: 380
Script:
#!/bin/bash
# Let's try with correct file extensions and more specific search
echo "Checking for any NFA state type usage..."
rg -i "nfastatetype|nfa_state_type" -g "*.h" -g "*.hpp" -g "*.cpp"
# Let's also check template parameters specifically
echo -e "\nChecking template parameters..."
ast-grep --pattern 'template <$$$> class $_ { $$$ }'
Length of output: 26668
examples/intersect-test.cpp (2)
8-11
: LGTM! Using declarations updated consistently
The using declarations have been updated to use the new class names (Dfa, DfaByteState, Nfa, NfaByteState), aligning with the PR objectives.
20-20
: LGTM! Type declarations updated consistently
All type declarations and variable instantiations have been updated to use the new class names:
- ByteLexicalRule using NfaByteState
- Function parameter using Dfa
- Variable declarations using Nfa
The changes maintain consistency with the PR objectives while preserving the original functionality.
Also applies to: 24-24, 43-43, 45-45, 81-81
src/log_surgeon/SchemaParser.hpp (3)
8-8
: LGTM! Include path updated
The include path has been updated to use the renamed Lalr1Parser.hpp, maintaining consistency with the capitalization fixes.
49-49
: LGTM! Type declarations in SchemaVarAST updated
The RegexAST template parameter has been updated to use NfaByteState instead of RegexNFAByteState, aligning with the PR objectives.
Also applies to: 58-58
71-72
: LGTM! Class inheritance updated
The SchemaParser class inheritance has been updated to use the new state type names (NfaByteState, DfaByteState) and properly formatted with consistent indentation.
src/log_surgeon/LogParser.hpp (3)
9-9
: LGTM! Include path updated
The include path has been updated to use the renamed Lalr1Parser.hpp, maintaining consistency with the capitalization fixes.
18-18
: LGTM! Class inheritance updated
The LogParser class inheritance has been updated to use the new state type names (NfaByteState, DfaByteState), aligning with the PR objectives.
29-29
: LGTM! Documentation updated
The exception documentation has been updated to reference Lalr1Parser instead of LALR1Parser, maintaining consistency with the new naming convention.
Also applies to: 38-38
CMakeLists.txt (1)
71-73
: LGTM! File renaming is consistent.
The renaming of files follows a consistent pattern:
- Removes redundant "Regex" prefix
- Fixes capitalization (LALR1 -> Lalr1, DFA -> Dfa, NFA -> Nfa)
- Updates all related files together
Also applies to: 99-105
src/log_surgeon/finite_automata/Nfa.hpp (1)
1-2
: LGTM! Class and header guard renaming is consistent.
The renaming follows the PR objectives:
- Removes redundant "Regex" prefix
- Updates header guard to match
- Template parameter uses consistent capitalization (NfaStateType)
Also applies to: 20-21
tests/test-NFA.cpp (1)
10-10
: LGTM! Type aliases updated consistently.
The changes maintain consistency with the class renaming:
- Include path updated to new Nfa.hpp
- All type aliases updated to use Nfa and NfaByteState
- No functional changes to the test case required
Also applies to: 16-16, 23-31
tests/test-lexer.cpp (1)
21-32
: LGTM! Type alias updates are consistent with PR objectives.
The changes correctly update the type aliases to use the new naming convention, removing the redundant "Regex" prefix while maintaining the same functionality.
src/log_surgeon/finite_automata/NfaState.hpp (2)
1-2
: LGTM! Header guard follows the new naming convention.
The header guard has been correctly updated to match the new file name.
20-24
: LGTM! Class and type alias updates are consistent.
The changes correctly:
- Rename the class template from RegexNFAState to NfaState
- Update type aliases to use the new naming convention
- Fix capitalization (UTF8 → Utf8)
src/log_surgeon/Lalr1Parser.hpp (1)
205-206
: LGTM! Template parameter names follow the new convention.
The template parameters have been correctly updated to use the new naming convention, maintaining consistency with other files.
src/log_surgeon/Lexer.tpp (3)
20-21
: LGTM! Template parameter renaming is consistent.
The renaming of template parameters from NFAStateType
/DFAStateType
to NfaStateType
/DfaStateType
follows proper naming conventions and is consistently applied across all method signatures.
Also applies to: 41-42, 169-170, 302-303, 319-320, 333-334, 344-345, 357-358, 376-377
377-381
: LGTM! Type usage is correctly updated.
The type usage in the generate()
method has been properly updated to use the new type names while maintaining the same functionality.
Line range hint 255-276
: Verify stack usage in wildcard handling.
The stack-based DFA state traversal could potentially consume excessive memory for complex patterns. Consider:
- Adding a maximum stack depth limit
- Implementing stack size checks
src/log_surgeon/SchemaParser.cpp (1)
19-37
: LGTM! Type alias updates are consistent.
The type alias updates from RegexNFAByteState
to NfaByteState
are consistently applied across all declarations and maintain proper type relationships.
src/log_surgeon/Lalr1Parser.tpp (1)
56-57
: LGTM! Template parameter renaming is consistent throughout the parser implementation.
The renaming of template parameters maintains consistency with the changes in other files and follows the project's new naming conventions.
Also applies to: 68-69, 77-78, 85-86, 112-113, 153-154, 163-164, 193-194, 213-214, 236-237, 269-270, 301-302, 385-386, 421-422, 427-428, 433-434
src/log_surgeon/finite_automata/RegexAST.hpp (2)
27-28
: LGTM! The renaming changes are consistent.
The renaming of the template parameter from NFAStateType
to NfaStateType
and the corresponding updates to method signatures maintain consistency with the PR objectives while preserving the existing functionality.
Also applies to: 43-45, 73-78
160-161
: LGTM! Template parameter renaming is consistent across derived classes.
The renaming changes have been consistently applied across all derived classes:
RegexASTEmpty
RegexASTLiteral
RegexASTInteger
RegexASTGroup
RegexASTOr
RegexASTCat
RegexASTMultiplication
RegexASTCapture
The changes maintain the existing class hierarchy and relationships.
Also applies to: 190-191, 239-240, 294-295, 427-428, 490-491, 553-554, 631-632
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (10)
src/log_surgeon/finite_automata/NfaState.hpp (4)
51-53
: Ensure consistent use ofconst
qualifiers in function parametersThe functions
add_positive_tagged_start_transition
andadd_epsilon_transition
have differentconst
qualifiers for theirdest_state
parameters (NfaState const*
vs.NfaState*
). For consistency and to prevent unintended modifications, consider usingNfaState const*
for all parameters that are not modified within the function.Apply this diff to harmonize the
const
qualifiers:auto add_epsilon_transition(NfaState const* epsilon_transition) -> void { m_epsilon_transitions.push_back(epsilon_transition); }
70-72
: Uniformly applyconst
qualifiers to function parametersSimilarly, in the
add_byte_transition
function, consider changing thedest_state
parameter toNfaState const*
to maintain consistency and ensure immutability of the pointed object within the function.Apply this diff:
auto add_byte_transition(uint8_t byte, NfaState const* dest_state) -> void { m_bytes_transitions[byte].push_back(dest_state); }
74-76
: Useconst
pointers in return types for getter functionsIn the
get_epsilon_transitions
function, you return a vector ofNfaState*
. To maintain const correctness and prevent unintended modifications to the states, consider returningstd::vector<NfaState const*> const&
.Apply this diff:
[[nodiscard]] auto get_epsilon_transitions() const -> std::vector<NfaState const*> const& { return m_epsilon_transitions; }Also, update the member variable accordingly:
- std::vector<NfaState*> m_epsilon_transitions; + std::vector<NfaState const*> m_epsilon_transitions;
Line range hint
168-236
: Refactor theserialize
method for improved readabilityThe
serialize
method is lengthy and contains repetitive code for handling optional values and error checking. Refactoring parts of this method into smaller helper functions would enhance readability and maintainability.src/log_surgeon/finite_automata/DfaStatePair.hpp (1)
71-71
: Offer assistance for TODO: Handle UTF-8 transitionsThere's a TODO comment indicating the need to handle UTF-8 (multi-byte transitions) in the
get_reachable_pairs
method. Would you like assistance in implementing this functionality or opening a new GitHub issue to track this task?src/log_surgeon/finite_automata/Dfa.hpp (1)
58-58
: Offer assistance for TODO: Handle UTF-8 transitionsThere's a TODO comment indicating the need to handle UTF-8 (multi-byte transitions) in the
get_intersect
method. Would you like assistance in implementing this functionality or opening a new GitHub issue to track this task?src/log_surgeon/Lexer.hpp (1)
126-128
: Avoid returning const references to primitive typesMethods such as
get_has_delimiters()
,is_delimiter(uint8_t byte)
, andis_first_char(uint8_t byte)
returnbool const&
. Returning a const reference to a primitive type likebool
is unnecessary and can lead to inefficiencies. Consider returningbool
by value instead.Also applies to: 131-135
src/log_surgeon/Lexer.tpp (3)
Line range hint
169-301
: Consider extracting common logic between scan methods.The
scan_with_wildcard
method largely duplicates code from thescan
method. Consider extracting the common logic into a private helper method to improve maintainability and reduce code duplication.
459-466
: Address or remove commented-out UTF-8 code.The commented-out UTF-8 specific code in the
nfa_to_dfa
method should either be implemented or removed if no longer needed. If this is a planned feature, consider creating a tracking issue.Would you like me to help create a GitHub issue to track the UTF-8 support implementation?
Also applies to: 484-491
Tagged transitions should not be treated as epsilon transitions
Based on the codebase analysis, treating tagged transitions as epsilon transitions is incorrect. The implementation shows that tagged transitions serve distinct purposes:
- Positive tagged start/end transitions are used for capture group boundaries
- Negative tagged transitions are used for negative lookahead assertions
- Epsilon transitions are separate and have their own dedicated implementation
The TODO comment should be removed and tagged transitions should be handled according to their specific semantics rather than being treated as epsilon transitions. This affects pattern matching behaviour, particularly for:
- Capture group boundaries tracking
- Negative lookahead assertion evaluation
- State reachability computation
🔗 Analysis chain
Line range hint
407-407
: Review epsilon transition handling for tagged transitions.The current implementation treats tagged transitions as epsilon transitions. This might not be the intended behavior and could affect pattern matching accuracy.
Let me search for more context about the implementation of tagged transitions.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for usage of tagged transitions in tests rg -A 5 "tagged.*transition" "tests/"Length of output: 7665
Script:
#!/bin/bash # Search for implementation of tagged transitions in source files rg -A 5 "tagged.*transition" "src/" # Search for any epsilon transition handling rg -A 5 "epsilon.*transition" "src/"Length of output: 27502
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (14)
src/log_surgeon/Lalr1Parser.hpp
(5 hunks)src/log_surgeon/Lalr1Parser.tpp
(20 hunks)src/log_surgeon/Lexer.hpp
(6 hunks)src/log_surgeon/Lexer.tpp
(18 hunks)src/log_surgeon/LexicalRule.hpp
(2 hunks)src/log_surgeon/Parser.hpp
(1 hunks)src/log_surgeon/Parser.tpp
(3 hunks)src/log_surgeon/finite_automata/Dfa.hpp
(1 hunks)src/log_surgeon/finite_automata/DfaStatePair.hpp
(1 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(6 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(31 hunks)src/log_surgeon/finite_automata/RegexDFAStateType.hpp
(1 hunks)src/log_surgeon/finite_automata/TaggedTransition.hpp
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- src/log_surgeon/Parser.hpp
- src/log_surgeon/Parser.tpp
- src/log_surgeon/LexicalRule.hpp
- src/log_surgeon/Lalr1Parser.hpp
- src/log_surgeon/finite_automata/TaggedTransition.hpp
- src/log_surgeon/finite_automata/Nfa.hpp
- src/log_surgeon/finite_automata/RegexAST.hpp
- src/log_surgeon/Lalr1Parser.tpp
🧰 Additional context used
📓 Path-based instructions (5)
src/log_surgeon/finite_automata/RegexDFAStateType.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Dfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/DfaStatePair.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Lexer.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (2)
src/log_surgeon/finite_automata/RegexDFAStateType.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
src/log_surgeon/finite_automata/NfaState.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (4)
src/log_surgeon/finite_automata/NfaState.hpp (1)
Line range hint 125-145
: Verify condition in add_interval
to avoid excluding zero
The condition if (interval.first != 0 && interval.first <= interval.second)
might inadvertently exclude intervals starting at zero. This could lead to missed transitions for intervals that begin with zero. Please verify that this condition behaves as intended.
Please double-check the logic of this condition to ensure intervals starting at zero are correctly handled, or adjust the condition if necessary.
src/log_surgeon/Lexer.hpp (1)
127-129
: Avoid returning a const reference to a std::unique_ptr
Returning a const&
to a std::unique_ptr
in get_dfa()
can lead to confusion and potential misuse. Consider returning a raw pointer or a reference to the managed object instead.
src/log_surgeon/Lexer.tpp (2)
20-21
: LGTM! Consistent template parameter renaming.
The template parameter renaming from NFAStateType
/DFAStateType
to TypedNfaState
/TypedDfaState
has been applied consistently throughout the file, aligning with the project's naming conventions.
Also applies to: 41-42, 169-170, 302-303, 319-320, 333-334, 344-345, 357-358, 365-366, 376-377, 391-392, 428-429
379-379
: Verify the impact of DFA ignoring tags.
The TODO comment indicates that the DFA ignores tags in patterns like "capture:user=(?<user_id>\d+)". This could lead to incorrect pattern matching behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we proceed, a quick question: do we really need to differentiate DFA state type and NFA state type? Can we have a generic type to mark whether the input is byte or unicode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any header guard issue within this file should be tracked in #65
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should ensure the namings in NFA unit tests are also sync with this PR:
- Rename
test-NFA.cpp
totest-Nfa.cpp
- Rename
ByteNFA
toByteNfa
in the unit test file
Just to clarify: we don't necessarily do this in the current PR, but it's some design decision we should make |
I thought one of the other PRs already posted combine these into a single type, but it seems that's not the case. I'll add a PR to combine the types. |
Do u think we can do it in this PR? 'cuz reviewing the current renaming changes and then reviewing another PR that has a lot of overlapped overwrite would duplicate the work |
References
Description
Additions:
<algorithm>
toTaggedTransition.hpp
as reordering of includes caused by file name changes causes previous clang-tidy warning to become a compiler error.Renames:
DFAByteState
toByteDfaState
DFAUTF8State
toUtf8DfaState
NFAByteState
toByteNfaState
NFAUTF8State
toUtf8NfaState
DfaState
/DfaStateType
toTypedDfaState
NfaStateType
toTypedNfaState
RegexDFA*
toDfa*
RegexNFA*
toNfa*
NFA*
toNfa*
DFA*
toDfa*
LALR1Parser
toLalr1Parser
Does not rename:
*AST*
as there are too many instances of this and it will create conflicts with future PRs that are already submitted. This will be fixed in separate clang-tidy PRs for each file in the future.Validation performed
Previously existing tests still succeed.
Summary by CodeRabbit
Release Notes
New Features
Dfa
class for representing Deterministic Finite Automata.Dfa
class.Renaming and Refactoring
RegexNFA
toNfa
andRegexDFA
toDfa
across the codebase for consistency.LALR1Parser
toLalr1Parser
for uniformity.Documentation
LALR1Parser
toLalr1Parser
.Bug Fixes
These changes enhance clarity and maintainability while introducing new capabilities for finite automata handling.