-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract RegexNFAState
and tagged transition classes into their own files.
#47
Conversation
… tag to matching_variable_id; Use full names for vars (r->rule); Clarify if states are NFA or DFA
…for clairty that nothing is shared b/w tests
…egexASTgroup with min = 1 OR'd with RegexASTEmpty
…iteral arguments; Use const& for non-literals; Use auto where possible; Use uint32_t over int for ids; replace begin() and end() with cbegin() and cend()
…(); Add docstrign to RegexDFAStatePair
…NFA; Made add to nfa functions const
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
WalkthroughThe pull request introduces changes aimed at enhancing the finite automata functionality within the log parsing library. It includes the addition of new header files for Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (5)
src/log_surgeon/finite_automata/RegexNFAStateType.hpp (2)
7-10
: Consider adding documentation for the enum class values.Please add documentation explaining the distinction between
Byte
andUTF8
state types and their specific use cases in the NFA implementation.namespace log_surgeon::finite_automata { +/** + * Represents the type of state in a Regex NFA. + * + * Byte: Processes input one byte at a time + * UTF8: Processes input as UTF-8 encoded characters + */ enum class RegexNFAStateType : uint8_t { Byte, UTF8 };
12-13
: Add documentation for the template class.Consider adding documentation explaining the purpose of
RegexNFAState
and how the state type parameter affects its behaviour.+/** + * Represents a state in a Regex NFA with specific character handling behaviour. + * + * @tparam state_type Determines how the state processes input characters + * (byte-by-byte or as UTF-8 sequences) + */ template <RegexNFAStateType state_type> class RegexNFAState;src/log_surgeon/finite_automata/TaggedTransition.hpp (2)
18-20
: Clarify ownership semantics ofdest_state
; consider using references or smart pointersCurrently,
dest_state
is a raw pointer to a constNFAStateType
. If the transition classes do not owndest_state
, consider using a const reference (NFAStateType const&
) to indicate non-ownership, enhancing clarity and safety. If ownership needs to be managed, using smart pointers likestd::shared_ptr
orstd::unique_ptr
can prevent potential memory issues.Also applies to: 42-44
71-71
: Confirm that the serialization format and tag ordering meet system requirementsWhen serializing tags using
fmt::join(m_tags, ",")
, the tags are ordered according tostd::set
, resulting in an ascending order. Please verify that this ordering aligns with the expected serialization format. If tag order is not significant or a different order is required, adjustments to the container type or serialization logic may be necessary.Also applies to: 82-82
src/log_surgeon/finite_automata/RegexNFA.hpp (1)
Line range hint
78-78
: Reminder: Address the TODO to handle the UTF-8 case.Handling the UTF-8 case is important to ensure proper processing of Unicode inputs.
Would you like assistance in implementing support for the UTF-8 case or opening a GitHub issue to track this task?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
CMakeLists.txt
(1 hunks)src/log_surgeon/finite_automata/RegexNFA.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexNFAState.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexNFAStateType.hpp
(1 hunks)src/log_surgeon/finite_automata/TaggedTransition.hpp
(1 hunks)tests/CMakeLists.txt
(1 hunks)
🔇 Additional comments (6)
src/log_surgeon/finite_automata/RegexNFAStateType.hpp (1)
1-19
: Well-structured header with good separation of concerns.
The header file is well-organized with:
- Proper include guards
- Minimal includes
- Clear type definitions
- Good use of type safety with enum class
- Appropriate use of templates and type aliases
tests/CMakeLists.txt (1)
7-9
: LGTM! The new header files are properly integrated.
The addition of these header files aligns with the PR objective of separating classes from RegexNFA.hpp. The files are correctly placed in the finite_automata directory and included in the build configuration.
Let's verify that these files exist and were properly separated from RegexNFA.hpp:
✅ Verification successful
The new header files are properly separated and integrated
The verification confirms that:
- All three header files exist in the correct location
- The classes have been properly separated:
- RegexNFAState is now in its own file
- RegexNFAStateType contains the enum class definition
- TaggedTransition contains both PositiveTaggedTransition and NegativeTaggedTransition classes
- RegexNFA.hpp correctly includes the new RegexNFAState header
- The classes are no longer present in RegexNFA.hpp
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence of new header files and their separation from RegexNFA.hpp
# Test 1: Check if the new files exist
echo "Checking if new files exist..."
fd -t f "RegexNFAState.hpp|RegexNFAStateType.hpp|TaggedTransition.hpp" src/log_surgeon/finite_automata/
# Test 2: Verify that these classes are no longer in RegexNFA.hpp
echo "Verifying separation from RegexNFA.hpp..."
rg -i "class\s+(RegexNFAState|TaggedTransition)" src/log_surgeon/finite_automata/RegexNFA.hpp
Length of output: 522
Script:
#!/bin/bash
# Let's verify the content of these files and their relationships
# Check if RegexNFA.hpp includes these new headers
echo "Checking includes in RegexNFA.hpp..."
rg -A 1 "#include.*?(RegexNFAState|RegexNFAStateType|TaggedTransition)" src/log_surgeon/finite_automata/RegexNFA.hpp
# Verify that the classes are properly defined in their respective files
echo "Checking class definitions in new files..."
for file in RegexNFAState.hpp RegexNFAStateType.hpp TaggedTransition.hpp; do
echo "=== Content of $file ==="
rg "class\s+\w+" "src/log_surgeon/finite_automata/$file" -A 2
done
# Check if these files are properly guarded
echo "Checking header guards..."
for file in RegexNFAState.hpp RegexNFAStateType.hpp TaggedTransition.hpp; do
echo "=== Header guards in $file ==="
rg "#ifndef.*?_HPP|#define.*?_HPP" "src/log_surgeon/finite_automata/$file"
done
Length of output: 2449
CMakeLists.txt (1)
100-102
: LGTM! The new header files are properly integrated.
The addition of these header files aligns with the PR objective of separating classes from RegexNFA.hpp. The files are correctly placed in the finite_automata directory and will be included in the installation.
Let's verify that all the new header files exist:
✅ Verification successful
All new header files are present in the correct location
All three header files have been successfully verified to exist in the src/log_surgeon/finite_automata/
directory:
- RegexNFAState.hpp
- RegexNFAStateType.hpp
- TaggedTransition.hpp
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence of the new header files
# Test: Check if all new header files exist. Expect: All files should be found.
for file in "RegexNFAState.hpp" "RegexNFAStateType.hpp" "TaggedTransition.hpp"; do
fd --type f "^${file}$" || echo "Missing file: ${file}"
done
Length of output: 514
src/log_surgeon/finite_automata/RegexNFA.hpp (2)
19-19
: LGTM!
Including <log_surgeon/finite_automata/RegexNFAState.hpp>
ensures that the necessary definitions are available after moving RegexNFAState
into its own file.
Line range hint 109-112
: Verify the safety of unchecked optional access in the serialize
method.
The use of .value()
on state->serialize(state_ids)
assumes it always returns a valid value. Please ensure that serialize(state_ids)
cannot return an empty std::optional
, to avoid potential exceptions.
src/log_surgeon/finite_automata/RegexNFAState.hpp (1)
100-100
:
Undefined identifier cSizeOfByte
The identifier cSizeOfByte
is used but not defined within this file or included headers. Ensure that cSizeOfByte
is defined or properly included.
To verify where cSizeOfByte
is defined, you can run:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small changes before we merge
Co-authored-by: Lin Zhihao <[email protected]>
…ypedefs to the top of the file to fix compilation error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
src/log_surgeon/finite_automata/RegexNFAState.hpp (1)
114-115
: Consider usingauto
for the return type.Since the return type is
void
, you can simplify the function declaration by usingauto
for the return type:-auto RegexNFAState<state_type>::add_interval(Interval interval, RegexNFAState* dest_state) -> void { +auto RegexNFAState<state_type>::add_interval(Interval interval, RegexNFAState* dest_state) {This is a minor nitpick and a matter of personal preference. Using
auto
forvoid
return types can make the code more concise without sacrificing readability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
src/log_surgeon/finite_automata/RegexNFAState.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexNFAStateType.hpp
(1 hunks)src/log_surgeon/finite_automata/TaggedTransition.hpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/log_surgeon/finite_automata/RegexNFAStateType.hpp
🧰 Additional context used
📓 Learnings (2)
src/log_surgeon/finite_automata/RegexNFAState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-02T09:13:56.755Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
src/log_surgeon/finite_automata/TaggedTransition.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-02T09:18:31.046Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
🔇 Additional comments (3)
src/log_surgeon/finite_automata/TaggedTransition.hpp (2)
16-43
: Well-structured implementation of PositiveTaggedTransition
.
The PositiveTaggedTransition
class is cleanly implemented with clear constructors and accessor methods. The use of std::optional
in the serialize
method appropriately handles cases where the destination state is not found.
46-73
: Well-structured implementation of NegativeTaggedTransition
.
The NegativeTaggedTransition
class is well-designed, and the handling of multiple tags using std::set<uint32_t>
is appropriate for ensuring uniqueness and order. The serialize
method effectively generates a string representation of the transition.
src/log_surgeon/finite_automata/RegexNFAState.hpp (1)
1-215
: LGTM!
The RegexNFAState
class is well-structured and provides a comprehensive set of methods for managing NFA states and transitions. The code follows good practices such as using const
and [[nodiscard]]
where appropriate, and the error handling in the serialize
method is robust. The class is properly templated to handle different state types, and the type aliases RegexNFAByteState
and RegexNFAUTF8State
provide convenient shortcuts. Overall, the code is clean, modular, and easy to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
src/log_surgeon/finite_automata/RegexNFAState.hpp (2)
37-38
: Consider passingtags
by value and moving.The constructor takes
tags
by value and then moves it intom_negative_tagged_transition
. To avoid an unnecessary copy, consider takingtags
by rvalue reference (std::set<uint32_t>&&
) and moving it directly.Apply this change:
- RegexNFAState(std::set<uint32_t> tags, RegexNFAState const* dest_state) + RegexNFAState(std::set<uint32_t>&& tags, RegexNFAState const* dest_state)
79-79
: Consider marking the method asconst
.The
get_tree_transitions
method returns aconst
reference tom_tree_transitions
but is not marked asconst
itself. Consider adding theconst
qualifier to align with the return type and allow calling it onconst
instances.Apply this change:
- auto get_tree_transitions() -> Tree const& { return m_tree_transitions; } + [[nodiscard]] auto get_tree_transitions() const -> Tree const& { return m_tree_transitions; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
src/log_surgeon/finite_automata/RegexNFAState.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexNFAStateType.hpp
(1 hunks)src/log_surgeon/finite_automata/TaggedTransition.hpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/log_surgeon/finite_automata/RegexNFAStateType.hpp
🧰 Additional context used
📓 Learnings (2)
src/log_surgeon/finite_automata/RegexNFAState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-02T09:13:56.755Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
src/log_surgeon/finite_automata/TaggedTransition.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-02T09:18:31.046Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
🔇 Additional comments (17)
src/log_surgeon/finite_automata/RegexNFAState.hpp (13)
21-22
: LGTM!
The template class declaration looks good. The class is parameterized by the RegexNFAStateType
enum, allowing for different state types.
24-25
: LGTM!
The type aliases RegexNFAByteState
and RegexNFAUTF8State
are defined correctly using the RegexNFAState
template class with the appropriate RegexNFAStateType
values.
32-32
: LGTM!
The default constructor is defined correctly.
34-35
: LGTM!
The constructor taking a single tag and destination state is implemented correctly, initializing the m_positive_tagged_transitions
member.
40-42
: LGTM!
The set_accepting
and is_accepting
methods are implemented correctly, allowing the acceptance status to be set and retrieved.
44-50
: LGTM!
The set_matching_variable_id
and get_matching_variable_id
methods are implemented correctly, allowing the matching variable ID to be set and retrieved.
52-60
: LGTM!
The get_positive_tagged_transitions
and get_negative_tagged_transition
methods are implemented correctly, returning the respective tagged transitions.
62-68
: LGTM!
The add_epsilon_transition
and get_epsilon_transitions
methods are implemented correctly, allowing epsilon transitions to be added and retrieved.
70-77
: LGTM!
The add_byte_transition
and get_byte_transitions
methods are implemented correctly, allowing byte transitions to be added and retrieved.
87-87
: LGTM!
The add_interval
method declaration looks good. It takes an Interval
and a destination state pointer as parameters.
97-98
: LGTM!
The serialize
method declaration looks good. It takes a const
reference to a map of states to their unique identifiers and returns an optional string.
115-159
: LGTM!
The add_interval
method implementation looks correct. It handles adding transitions based on the provided interval, either as byte transitions or tree transitions depending on the state type.
161-212
: LGTM!
The serialize
method implementation looks correct. It serializes the NFA state into a string representation, handling byte transitions, epsilon transitions, positive and negative tagged transitions, and the accepting tag. It also performs error handling by returning std::nullopt
if the serialization of any transition fails.
src/log_surgeon/finite_automata/TaggedTransition.hpp (4)
1-76
: LGTM!
The new TaggedTransition.hpp
file is well-structured and follows best practices. The classes PositiveTaggedTransition
and NegativeTaggedTransition
are properly encapsulated and provide the necessary functionality for representing tagged transitions in a finite automaton.
16-43
: ****
As discussed in a previous review, the classes PositiveTaggedTransition
and NegativeTaggedTransition
currently do not share enough functionality to justify refactoring into a common base class.
Also applies to: 46-73
71-72
: ****
As mentioned in a previous review, using std::set<uint32_t>
for m_tags
is appropriate for now, as this is non-critical code. The change to std::vector<uint32_t>
will be tracked in a future PR.
26-38
: Verify error handling for missing destination states.
The serialize
methods in both PositiveTaggedTransition
and NegativeTaggedTransition
return std::nullopt
if the destination state is not found in the provided state_ids
map. Ensure that the calling code appropriately handles this case to avoid unexpected behaviour.
Run the following script to verify the error handling:
Also applies to: 56-68
✅ Verification successful
Let me analyze the verification results. The code shows that:
-
The
serialize
method inRegexNFAState
properly handlesstd::nullopt
returns from both positive and negative tagged transitions by propagating thestd::nullopt
up the call chain. -
The main caller of
serialize
is inRegexNFA::serialize()
which generates state IDs from a BFS traversal order, ensuring all states are included in thestate_ids
map. This is evident from the comment: "state_ids
is well-formed as its generated fromget_bfs_traversal_order
". -
The error handling for missing destination states is properly implemented at both the transition level (in
TaggedTransition.hpp
) and state level (inRegexNFAState.hpp
).
Error handling for missing destination states is properly implemented.
The codebase shows a robust error handling chain:
- Tagged transitions return
std::nullopt
if destination state is not found - States propagate the
std::nullopt
from transitions - The main serialization code ensures state IDs are well-formed through BFS traversal
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify error handling for missing destination states in `serialize` methods.
# Test 1: Search for `serialize` method calls. Expect: Calls are wrapped in error handling logic.
rg -A 10 $'auto.+serialize'
# Test 2: Search for `std::nullopt` usage. Expect: Appropriate handling of `std::nullopt` return values.
rg -A 10 $'std::nullopt'
Length of output: 37979
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the PR title, how about:
Extract `RegexNFAState` and tagged transition classes into their own files.
RegexNFAState
and tagged transition classes into their own files.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation