-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: separate the concept of captures and tags; lexer now tracks mapping from variables to capture to tags to registers. #72
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: Lin Zhihao <[email protected]>
…reflect functionality change.
WalkthroughThis pull request introduces a comprehensive refactoring of the log surgeon's finite automata and lexer components, primarily focusing on replacing the concept of "tags" with "captures". The changes span multiple files across the project, including CMakeLists.txt, header files, and test cases. The refactoring involves renaming classes, updating method signatures, and modifying how captures are managed throughout the codebase, with a particular emphasis on improving the representation and handling of regex capture groups. Changes
Sequence DiagramsequenceDiagram
participant Lexer
participant Nfa
participant RegexAST
participant Capture
Lexer->>Nfa: Create NFA with rules
Nfa->>RegexAST: Process regex rules
RegexAST->>Capture: Generate capture groups
Nfa-->>Lexer: Return NFA with capture information
Possibly related PRs
Suggested reviewers
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (9)
src/log_surgeon/finite_automata/Dfa.hpp (1)
Line range hint
121-134
: Consider adding error handling for invalid input inget_intersect
While the method correctly implements the intersection operation, it might benefit from validation of the input DFA pointer and appropriate error handling.
Consider adding input validation:
template <typename TypedDfaState> auto Dfa<TypedDfaState>::get_intersect(Dfa const* dfa_in) const -> std::set<uint32_t> { + if (nullptr == dfa_in) { + throw std::invalid_argument("Input DFA pointer cannot be null"); + } std::set<uint32_t> schema_types; // ... rest of the implementationsrc/log_surgeon/finite_automata/Nfa.hpp (1)
122-122
: Code formatting issue detectedThe code does not adhere to the project's code formatting guidelines. Please run
clang-format
on the affected file to resolve formatting issues.🧰 Tools
🪛 GitHub Actions: lint
[error] 122-122: Code should be clang-formatted
src/log_surgeon/finite_automata/RegexAST.hpp (1)
113-113
: Code formatting issue detectedThe code does not adhere to the project's code formatting guidelines at line 113. Please run
clang-format
on the affected file to resolve formatting issues.🧰 Tools
🪛 GitHub Actions: lint
[error] 113-113: Code should be clang-formatted
tests/test-capture.cpp (1)
23-34
: Consider adding more test cases for constructors.While the current tests cover basic copy and move operations, consider adding tests for:
- Copy/move with empty strings
- Self-assignment test
- Chain assignment test (a = b = c)
src/log_surgeon/Lexer.hpp (1)
131-169
: Consider adding validation for capture_id parameter.While the error handling is good, consider adding a debug assertion or validation for the
capture_id
parameter inget_registers_for_capture()
to catch potential programming errors early.[[nodiscard]] auto get_registers_for_capture(symbol_id_t capture_id ) const -> std::optional<std::pair<register_id_t, register_id_t>> { + assert(capture_id < m_id_symbol.size() && "Invalid capture_id"); auto const tag_ids = get_tag_ids_for_capture_id(capture_id);
tests/test-nfa.cpp (1)
47-48
: Consider usingstd::move
for better performance.The initialization of
ByteNfa
could be more efficient by movingrules
instead of copying it.- ByteNfa const nfa{rules}; + ByteNfa const nfa{std::move(rules)};🧰 Tools
🪛 GitHub Actions: lint
[error] 47-65: Code should be clang-formatted
src/log_surgeon/Lexer.tpp (1)
381-394
: Enhance the error message for duplicate captures.The error message could be more descriptive to help users identify which capture names are duplicated.
- throw std::invalid_argument("`m_rules` contains capture names that are not unique."); + throw std::invalid_argument(fmt::format("Duplicate capture name '{}' found in rules.", capture_name));🧰 Tools
🪛 GitHub Actions: lint
[error] 390-390: Code should be clang-formatted
tests/test-lexer.cpp (2)
215-297
: Consider extracting helper functions for better test readability.The test case could be more maintainable by extracting common setup and verification logic into helper functions.
// Helper function for lexer setup auto setup_lexer(string_view schema) -> log_surgeon::lexers::ByteLexer { log_surgeon::lexers::ByteLexer lexer; lexer.add_delimiters({' ', '\n'}); // ... rest of setup code ... return lexer; } // Helper function for token verification auto verify_token(const log_surgeon::Token& token, const log_surgeon::lexers::ByteLexer& lexer, string_view expected_type, string_view expected_value) -> void { REQUIRE(nullptr != token.m_type_ids_ptr); REQUIRE(1 == token.m_type_ids_ptr->size()); REQUIRE(expected_type == lexer.m_id_symbol[token.m_type_ids_ptr->at(0)]); REQUIRE(expected_value == token.to_string_view()); }
359-400
: Track missing test coverage for register functionality.The TODO comments indicate that tests for register-related functionality are pending implementation. This should be tracked to ensure complete test coverage.
Would you like me to create a GitHub issue to track the implementation of these test cases?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
CMakeLists.txt
(1 hunks)src/log_surgeon/Lexer.hpp
(5 hunks)src/log_surgeon/Lexer.tpp
(3 hunks)src/log_surgeon/LexicalRule.hpp
(1 hunks)src/log_surgeon/SchemaParser.cpp
(4 hunks)src/log_surgeon/finite_automata/Capture.hpp
(2 hunks)src/log_surgeon/finite_automata/Dfa.hpp
(3 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(3 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(3 hunks)src/log_surgeon/finite_automata/PrefixTree.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(20 hunks)src/log_surgeon/finite_automata/RegisterHandler.hpp
(2 hunks)src/log_surgeon/finite_automata/TaggedTransition.hpp
(3 hunks)tests/CMakeLists.txt
(2 hunks)tests/test-capture.cpp
(1 hunks)tests/test-lexer.cpp
(2 hunks)tests/test-nfa.cpp
(4 hunks)tests/test-tag.cpp
(0 hunks)
💤 Files with no reviewable changes (1)
- tests/test-tag.cpp
✅ Files skipped from review due to trivial changes (2)
- src/log_surgeon/finite_automata/PrefixTree.hpp
- src/log_surgeon/finite_automata/Capture.hpp
🧰 Additional context used
📓 Path-based instructions (12)
src/log_surgeon/finite_automata/RegisterHandler.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Dfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
tests/test-nfa.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
tests/test-capture.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/LexicalRule.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/SchemaParser.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
tests/test-lexer.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/TaggedTransition.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Nfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/RegexAST.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/Lexer.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (6)
src/log_surgeon/finite_automata/RegisterHandler.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#56
File: src/log_surgeon/finite_automata/RegisterHandler.hpp:0-0
Timestamp: 2024-11-27T22:25:35.608Z
Learning: In the `RegisterHandler` class in `src/log_surgeon/finite_automata/RegisterHandler.hpp`, the methods `add_register` and `append_position` rely on `emplace_back` and `m_prefix_tree.insert` to handle exceptions correctly and maintain consistent state without requiring additional exception handling.
src/log_surgeon/SchemaParser.cpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
src/log_surgeon/finite_automata/NfaState.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/TaggedTransition.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
src/log_surgeon/finite_automata/RegexAST.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/Lexer.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#56
File: src/log_surgeon/finite_automata/RegisterHandler.hpp:0-0
Timestamp: 2024-11-27T22:25:35.608Z
Learning: In the `RegisterHandler` class in `src/log_surgeon/finite_automata/RegisterHandler.hpp`, the methods `add_register` and `append_position` rely on `emplace_back` and `m_prefix_tree.insert` to handle exceptions correctly and maintain consistent state without requiring additional exception handling.
🪛 GitHub Actions: lint
tests/test-nfa.cpp
[error] 47-65: Code should be clang-formatted
src/log_surgeon/LexicalRule.hpp
[error] 28-28: Code should be clang-formatted
src/log_surgeon/SchemaParser.cpp
[error] 1-1: Code should be clang-formatted
src/log_surgeon/Lexer.tpp
[error] 4-4: Code should be clang-formatted
[error] 390-390: Code should be clang-formatted
src/log_surgeon/finite_automata/Nfa.hpp
[error] 122-122: Code should be clang-formatted
src/log_surgeon/finite_automata/RegexAST.hpp
[error] 113-113: Code should be clang-formatted
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build (ubuntu-latest, Release)
- GitHub Check: build (ubuntu-latest, Debug)
🔇 Additional comments (29)
src/log_surgeon/finite_automata/Dfa.hpp (2)
5-5
: Well-structured include additions!Making the dependencies on
<map>
,<stack>
, andRegisterHandler.hpp
explicit improves maintainability by clearly showing the required headers.Also applies to: 8-8, 14-14
82-82
: Good defensive programming practice!Initializing the
state
pointer tonullptr
is a good practice that prevents undefined behaviour and makes debugging easier.src/log_surgeon/finite_automata/TaggedTransition.hpp (1)
14-15
: Introducedtag_id_t
typedef for clarityDefining
tag_id_t
as an alias forstd::uint32_t
improves code readability and maintainability.src/log_surgeon/finite_automata/Nfa.hpp (1)
43-43
: Optimized constructor by passing vector by const referenceChanging the constructor parameter to accept
rules
byconst
reference avoids unnecessary copying, improving performance.tests/test-capture.cpp (3)
8-11
: LGTM! Basic test case is well-implemented.The test verifies the fundamental functionality of the
get_name()
method.
13-16
: LGTM! Empty string handling is properly tested.The test ensures that empty capture names are handled correctly.
18-21
: LGTM! Special characters test is comprehensive.The test verifies that special characters in capture names are preserved correctly.
src/log_surgeon/LexicalRule.hpp (1)
26-28
: LGTM! Method implementation is correct.The
get_captures()
method is well-designed:
- Properly marked with [[nodiscard]]
- Returns const pointers to prevent modification
- Correctly delegates to the underlying regex implementation
🧰 Tools
🪛 GitHub Actions: lint
[error] 28-28: Code should be clang-formatted
src/log_surgeon/finite_automata/RegisterHandler.hpp (2)
5-5
: LGTM! Include is properly placed.The
<cstdint>
include is correctly added to support the new type alias.
21-21
: LGTM! Type alias enhances code clarity.The
register_id_t
type alias improves code readability and maintainability by providing a meaningful name for the register identifier type.src/log_surgeon/Lexer.hpp (3)
27-29
: LGTM! Type aliases enhance code clarity.The type aliases improve code readability and maintainability by providing meaningful names for different identifier types.
59-61
: LGTM! Exception documentation is clear.The comment properly documents the exception that can be thrown when duplicate capture names are encountered.
197-199
: LGTM! Member variables support the new mapping functionality.The new member variables properly support the mapping between variables, captures, tags, and registers.
tests/test-nfa.cpp (3)
58-66
: LGTM! Test expectations updated to match the new capture-to-tag mapping.The changes correctly reflect the new implementation where captures are mapped to numerical tag indices.
Line range hint
69-101
: LGTM! Consistent update of test expectations.The changes maintain consistency with the new capture-to-tag mapping implementation.
Line range hint
106-132
: LGTM! Test expectations properly updated.The changes complete the consistent update of test expectations for the new capture-to-tag mapping implementation.
src/log_surgeon/finite_automata/NfaState.hpp (2)
35-40
: LGTM! Good use of value types over pointers.The change from
Tag const*
totag_id_t
reduces memory management complexity and aligns with modern C++ practices.
54-56
: LGTM! Consistent use of value types.The method signature update maintains consistency with the constructor changes, using
tag_id_t
throughout the class.src/log_surgeon/Lexer.tpp (2)
396-401
: LGTM! Proper implementation of capture-to-tag mapping.The code correctly processes and stores the relationships between captures and their corresponding tag IDs.
403-403
: Verify the impact of DFA ignoring captures.The TODO comment indicates that DFA treats capture groups as regular expressions. Please verify if this limitation affects the correctness of capture group functionality.
✅ Verification successful
DFA capture group limitation is a known design choice
The TODO comment accurately reflects the current implementation. The DFA treats capture expressions like "capture:user=(?<user_id>\d+)" as simple patterns "capture:user=\d+". This is a known limitation in the design, not a bug:
- Tests in
test-lexer.cpp
show that capture groups are supported in the API- The DFA implementation intentionally simplifies these patterns
- Capture group processing likely happens at a different stage of processing
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Search for tests that verify capture group functionality with DFA. rg -A 5 "TEST_CASE.*capture.*DFA" tests/Length of output: 42
Script:
#!/bin/bash # Search for capture-related code and tests echo "=== Capture-related tests ===" rg -l "TEST_CASE.*capture" tests/ echo -e "\n=== Capture implementation ===" ast-grep --pattern 'capture' echo -e "\n=== DFA implementation ===" rg -l "DFA|deterministic.*automaton" src/Length of output: 2361
Script:
#!/bin/bash # Examine capture tests and DFA implementation echo "=== Capture Tests ===" rg -A 10 "TEST_CASE.*capture" tests/test-lexer.cpp tests/test-capture.cpp echo -e "\n=== DFA Capture Handling ===" rg -A 5 "capture" src/log_surgeon/finite_automata/Dfa.hpp echo -e "\n=== NFA to DFA Conversion ===" rg -A 10 "convert.*to.*dfa|to_dfa" src/log_surgeon/finite_automata/Length of output: 1133
tests/test-lexer.cpp (1)
299-358
: LGTM! Comprehensive test coverage for capture group functionality.The test case thoroughly verifies the new capture-to-tag mapping features, including variable IDs, capture IDs, and tag IDs relationships.
src/log_surgeon/SchemaParser.cpp (4)
13-13
: LGTM!The include statement for
Capture.hpp
is correctly placed and aligns with the transition from tags to captures.
205-205
: LGTM!The comment has been correctly updated to use "captures" terminology, maintaining consistency with the codebase changes.
251-251
: LGTM!The comment has been correctly updated to use "captures" terminology, maintaining consistency with the codebase changes.
170-170
: Verify non-null requirement for Capture.The change from Tag to Capture is consistent. However, based on previous learnings, ensure that the Capture instance is always non-null when passed to RegexASTCapture.
Run this script to verify the non-null requirement:
✅ Verification successful
Non-null requirement is enforced by design
The Capture class design and RegexASTCapture's constructor guarantees that Capture instances cannot be null. The
make_unique<Capture>(r4->m_name)
call is safe as it satisfies the required string parameter constructor.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that Capture instances are always non-null when passed to RegexASTCapture. # Search for any null checks or assignments of Capture instances ast-grep --pattern 'RegexASTCapture($$$, nullptr)' || ast-grep --pattern 'RegexASTCapture($$$, NULL)' # Search for any potential null assignments to Capture variables rg -A 2 'Capture.*=.*null|Capture.*=.*nullptr|Capture.*=.*NULL'Length of output: 118
Script:
#!/bin/bash # Find Capture class definition and usage ast-grep --pattern 'class Capture { $$$ }' # Find RegexASTCapture constructor definition ast-grep --pattern 'RegexASTCapture::RegexASTCapture($$$)' # Find all Capture construction patterns ast-grep --pattern 'new Capture($$$)' || ast-grep --pattern 'make_unique<Capture>($$$)' # Find all RegexASTCapture construction patterns ast-grep --pattern 'new RegexASTCapture($$$)' || ast-grep --pattern 'make_unique<RegexASTCapture>($$$)'Length of output: 287
Script:
#!/bin/bash # Search for Capture and RegexASTCapture in header files fd -e hpp -e h --exec cat {} \; | rg -A 10 'class (Capture|RegexASTCapture)' # Find all make_unique constructions rg -A 2 'make_unique<(Capture|RegexASTCapture)>' # Find constructor implementations rg -A 5 '(Capture|RegexASTCapture)::\1'Length of output: 1320
tests/CMakeLists.txt (2)
27-34
: LGTM!The addition of test-capture.cpp and the improved formatting enhance the maintainability of the test suite.
5-7
: Verify source file list consistency.The addition of Capture.hpp aligns with the transition from tags to captures. However, Nfa.hpp and NfaState.hpp appear to be duplicated in the source list.
Run this script to verify the source file list:
CMakeLists.txt (2)
96-96
: LGTM!The addition of Capture.hpp is correctly placed within the finite_automata section.
102-104
: Verify the necessity of restored files.The restoration of PrefixTree and RegexAST files should be verified against the capture functionality requirements.
Run this script to verify file dependencies:
✅ Verification successful
Restoration of files is necessary and correct
The PrefixTree and RegexAST files are essential components of the system:
- PrefixTree is required by RegisterHandler for position tracking
- RegexAST provides critical capture group functionality
- Together they form a cohesive pattern matching and position tracking system
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check dependencies between restored files and Capture functionality. # Search for Capture-related usage in restored files echo "Checking PrefixTree.cpp for Capture usage:" rg -A 3 "Capture" "src/log_surgeon/finite_automata/PrefixTree.cpp" || true echo "Checking PrefixTree.hpp for Capture usage:" rg -A 3 "Capture" "src/log_surgeon/finite_automata/PrefixTree.hpp" || true echo "Checking RegexAST.hpp for Capture usage:" rg -A 3 "Capture" "src/log_surgeon/finite_automata/RegexAST.hpp" || trueLength of output: 4173
Script:
#!/bin/bash # Check relationships between PrefixTree and RegexAST/Capture components echo "Checking RegexAST.hpp for PrefixTree usage:" rg -A 3 "PrefixTree" "src/log_surgeon/finite_automata/RegexAST.hpp" || true echo "Checking for files using PrefixTree:" rg --files-with-matches "PrefixTree" || true echo "Checking implementation files for relationships:" rg -l "class.*(?:RegexAST|PrefixTree)" || trueLength of output: 770
Script:
#!/bin/bash # Examine RegisterHandler implementation echo "Checking RegisterHandler for component relationships:" rg -A 5 "class RegisterHandler" "src/log_surgeon/finite_automata/RegisterHandler.hpp" || true echo "Checking RegisterHandler usage of components:" rg -A 3 "(?:PrefixTree|RegexAST|Capture)" "src/log_surgeon/finite_automata/RegisterHandler.hpp" || true echo "Checking test implementation for component integration:" rg -A 3 "(?:PrefixTree|RegexAST|Capture)" "tests/test-register-handler.cpp" || trueLength of output: 2057
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/log_surgeon/finite_automata/RegexAST.hpp (1)
650-660
:⚠️ Potential issueAvoid throwing exceptions in the initializer list
Throwing exceptions in the member initializer list can lead to issues with object construction and exception safety. It's safer to perform null checks in the constructor body and throw exceptions from there.
Apply this diff to move the null checks into the constructor body:
RegexASTCapture( std::unique_ptr<RegexAST<TypedNfaState>> group_regex_ast, std::unique_ptr<Capture> capture ) - : m_group_regex_ast{( - nullptr == group_regex_ast - ? throw std::invalid_argument("Group regex AST cannot be null") - : std::move(group_regex_ast) - )}, - m_capture{ - nullptr == capture ? throw std::invalid_argument("Capture cannot be null") - : std::move(capture) - } { + { + if (nullptr == group_regex_ast) { + throw std::invalid_argument("Group regex AST cannot be null"); + } + if (nullptr == capture) { + throw std::invalid_argument("Capture cannot be null"); + } + m_group_regex_ast = std::move(group_regex_ast); + m_capture = std::move(capture); RegexAST<TypedNfaState>::set_subtree_positive_captures( m_group_regex_ast->get_subtree_positive_captures() ); RegexAST<TypedNfaState>::add_subtree_positive_captures({m_capture.get()}); }
🧹 Nitpick comments (4)
tests/test-lexer.cpp (3)
215-297
: Consider adding test cases for edge cases.The test case thoroughly validates the basic functionality but could benefit from additional test cases for:
- Empty input
- Multiple consecutive delimiters
- Input containing only delimiters
359-360
: Track TODOs in the issue tracker.The TODOs indicate pending implementation of determinization and simulation features. These should be tracked in the issue tracker for better visibility and follow-up.
Would you like me to create GitHub issues to track these TODOs?
Also applies to: 399-400
342-357
: Consider adding negative test cases for mapping validation.The mapping validation tests only cover the happy path. Consider adding test cases for:
- Non-existent variable IDs
- Non-existent capture IDs
- Invalid tag IDs
src/log_surgeon/finite_automata/Nfa.hpp (1)
95-99
: Consider using stable identifiers instead of raw pointers as unordered_map keysUsing
Capture const*
as keys in anunordered_map
relies on the stability of pointer values, which can be error-prone if objects are moved or deleted. To enhance safety and maintainability, consider using unique IDs or smart pointers with custom hash and equality functions as keys.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/log_surgeon/Lexer.hpp
(5 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(3 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(21 hunks)tests/test-lexer.cpp
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/log_surgeon/Lexer.hpp
🧰 Additional context used
📓 Path-based instructions (3)
tests/test-lexer.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Nfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/RegexAST.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: build (ubuntu-latest, Release)
- GitHub Check: build (ubuntu-latest, Debug)
- GitHub Check: build (macos-latest, Release)
- GitHub Check: build (macos-latest, Debug)
🔇 Additional comments (4)
tests/test-lexer.cpp (2)
9-9
: LGTM!The new includes and using declarations are appropriate for the added test cases.
Also applies to: 15-15
362-397
: LGTM!The token recognition tests with captures are thorough and well-structured.
src/log_surgeon/finite_automata/Nfa.hpp (2)
43-43
: Efficient parameter passing by const referenceGood use of a constant reference in the
Nfa
constructor to avoid unnecessary copies ofrules
.
20-28
: Verify thread safety ofUniqueIdGenerator
If
UniqueIdGenerator
is used in multithreaded contexts, thegenerate_id()
method is not thread-safe due to the use of the post-increment operator. Consider usingstd::atomic<uint32_t>
forcurrent_id
or synchronizing access with a mutex to ensure thread safety.Run the following script to check for usage of
UniqueIdGenerator
in multithreaded contexts:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/log_surgeon/finite_automata/TaggedTransition.hpp (1)
55-56
: 🛠️ Refactor suggestionPass vector by const reference to improve efficiency.
Consider passing
tag_ids
by const reference to avoid unnecessary copying of the vector.-NegativeTaggedTransition(std::vector<tag_id_t> tag_ids, TypedNfaState const* dest_state) +NegativeTaggedTransition(std::vector<tag_id_t> const& tag_ids, TypedNfaState const* dest_state)
🧹 Nitpick comments (1)
src/log_surgeon/Lexer.tpp (1)
381-394
: Enhance the error message for duplicate capture names.The error message could be more descriptive by including the duplicate capture name.
Apply this diff to improve the error message:
- throw std::invalid_argument("`m_rules` contains capture names that are not unique." + throw std::invalid_argument(fmt::format( + "`m_rules` contains duplicate capture name: '{}'", capture_name) );
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/log_surgeon/Lexer.tpp
(3 hunks)src/log_surgeon/LexicalRule.hpp
(1 hunks)src/log_surgeon/SchemaParser.cpp
(4 hunks)src/log_surgeon/finite_automata/PrefixTree.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(21 hunks)src/log_surgeon/finite_automata/TaggedTransition.hpp
(3 hunks)tests/test-nfa.cpp
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- src/log_surgeon/finite_automata/PrefixTree.hpp
- src/log_surgeon/LexicalRule.hpp
- tests/test-nfa.cpp
- src/log_surgeon/SchemaParser.cpp
🧰 Additional context used
📓 Path-based instructions (2)
src/log_surgeon/finite_automata/TaggedTransition.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/RegexAST.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (2)
src/log_surgeon/finite_automata/TaggedTransition.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build (ubuntu-latest, Release)
- GitHub Check: build (ubuntu-latest, Debug)
- GitHub Check: build (macos-latest, Debug)
🔇 Additional comments (9)
src/log_surgeon/finite_automata/TaggedTransition.hpp (2)
14-14
: LGTM! Good use of type alias.The introduction of
tag_id_t
improves code readability and provides better type safety.
23-24
: LGTM! Good transition from pointers to IDs.The changes effectively replace
Tag
pointers withtag_id_t
, which:
- Reduces memory usage
- Simplifies memory management
- Maintains the same functionality
Also applies to: 40-40, 44-44
src/log_surgeon/Lexer.tpp (3)
7-7
: LGTM!The addition of
<stdexcept>
header is appropriate for usingstd::invalid_argument
exception.
362-374
: LGTM!The parameter type change from
uint32_t
tosymbol_id_t
improves type safety and semantic clarity.
397-402
: LGTM!The mapping between captures and tag IDs is well-structured and maintains the relationship necessary for tracking ownership transfer after parsing.
src/log_surgeon/finite_automata/RegexAST.hpp (4)
33-38
: LGTM!The documentation updates accurately reflect the transition from tags to captures and clearly explain the role of captures in the AST.
155-157
: LGTM!The member variable changes from tags to captures are consistent and well-named, reflecting the new design.
650-664
: Move null checks to constructor body.As previously discussed, throwing exceptions in the initializer list can lead to issues with object construction and exception safety.
Apply this diff to move the null checks into the constructor body:
RegexASTCapture( std::unique_ptr<RegexAST<TypedNfaState>> group_regex_ast, std::unique_ptr<Capture> capture -) - : m_group_regex_ast{( - nullptr == group_regex_ast - ? throw std::invalid_argument("Group regex AST cannot be null") - : std::move(group_regex_ast) - )}, - m_capture{ - nullptr == capture ? throw std::invalid_argument("Capture cannot be null") - : std::move(capture) - } { +) { + if (nullptr == group_regex_ast) { + throw std::invalid_argument("Group regex AST cannot be null"); + } + if (nullptr == capture) { + throw std::invalid_argument("Capture cannot be null"); + } + m_group_regex_ast = std::move(group_regex_ast); + m_capture = std::move(capture); RegexAST<TypedNfaState>::set_subtree_positive_captures( m_group_regex_ast->get_subtree_positive_captures() ); RegexAST<TypedNfaState>::add_subtree_positive_captures({m_capture.get()}); }
954-961
: LGTM!The serialization changes properly handle captures and maintain Unicode support through
std::u32string
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/test-lexer.cpp (2)
65-72
: Track the TODO comment about register value checks.The
test_scanning_input
function is well-implemented, but there's a TODO comment about checking register values that should be tracked.Would you like me to create a GitHub issue to track the implementation of register value checks when simulation is implemented?
Also applies to: 128-153
312-346
: Track the TODO comment about register-related checks.The test case provides good coverage for capture groups, but there's a TODO comment about checking
get_register_for_tag_id
andget_registers_for_capture
that should be tracked.Would you like me to create a GitHub issue to track the implementation of these register-related checks when determinization is implemented?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/test-lexer.cpp
(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
tests/test-lexer.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: build (ubuntu-latest, Release)
- GitHub Check: build (ubuntu-latest, Debug)
- GitHub Check: build (macos-latest, Release)
- GitHub Check: build (macos-latest, Debug)
🔇 Additional comments (4)
tests/test-lexer.cpp (4)
9-9
: LGTM! Well-organized header inclusions and using declarations.The header inclusions follow a good organization pattern (standard library headers first, followed by project headers), and the using declarations are properly scoped.
Also applies to: 15-25
58-63
: LGTM! Well-structured utility function for lexer initialization.The
initialize_lexer
function is well-documented and follows good practices:
- Clear parameter documentation
- Proper handling of delimiters
- Good error handling for duplicate symbols
Also applies to: 93-126
296-310
: LGTM! Comprehensive test coverage for basic lexer functionality.The test case provides good coverage for:
- Token recognition for valid patterns
- Handling of uncaught strings
347-349
: LGTM! Good test coverage for token matching scenarios.The test provides comprehensive coverage for:
- Exact pattern matching
- Handling of non-matching patterns
- Handling of partial matches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First round of review. It is highly recommended to configure clang-tidy in CLion or the editor you're using so that many errors can be caught before code review.
auto const capture_ids{m_var_id_to_capture_ids.find(var_id)}; | ||
if (m_var_id_to_capture_ids.end() == capture_ids) { | ||
return std::nullopt; | ||
} | ||
return capture_ids->second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto const capture_ids{m_var_id_to_capture_ids.find(var_id)}; | |
if (m_var_id_to_capture_ids.end() == capture_ids) { | |
return std::nullopt; | |
} | |
return capture_ids->second; | |
if (m_var_id_to_capture_ids.contains(var_id)) { | |
return m_var_id_to_capture_ids.at(var_id); | |
} | |
return std::nullopt; |
How about rewriting in this way to avoid using iterators (shouldn't have any performance difference).
Same for the rest
} | ||
|
||
[[nodiscard]] auto get_tag_ids_for_capture_id(symbol_id_t const capture_id | ||
) const -> std::optional<std::pair<tag_id_t, tag_id_t>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Headers for std::optional
and std::pair
are not directly included
@@ -122,8 +128,48 @@ class Lexer { | |||
return m_dfa; | |||
} | |||
|
|||
std::unordered_map<std::string, uint32_t> m_symbol_id; | |||
std::unordered_map<uint32_t, std::string> m_id_symbol; | |||
[[nodiscard]] auto get_capture_ids_for_var_id(symbol_id_t const var_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about renaming to get_capture_ids_from_var_id
return tag_ids->second; | ||
} | ||
|
||
[[nodiscard]] auto get_register_for_tag_id(tag_id_t const tag_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about get_reg_id_xxx
instead of get_register
since the return value is an ID.
auto const tag_ids{get_tag_ids_for_capture_id(capture_id)}; | ||
if (tag_ids.has_value()) { | ||
auto const start_reg{get_register_for_tag_id(tag_ids.value().first())}; | ||
auto const end_reg{get_register_for_tag_id(tag_ids.value().second())}; | ||
if (start_reg.has_value() && end_reg.has_value()) { | ||
return std::make_pair(start_reg.value(), end_reg.value()); | ||
} | ||
} | ||
return std::nullopt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto const tag_ids{get_tag_ids_for_capture_id(capture_id)}; | |
if (tag_ids.has_value()) { | |
auto const start_reg{get_register_for_tag_id(tag_ids.value().first())}; | |
auto const end_reg{get_register_for_tag_id(tag_ids.value().second())}; | |
if (start_reg.has_value() && end_reg.has_value()) { | |
return std::make_pair(start_reg.value(), end_reg.value()); | |
} | |
} | |
return std::nullopt; | |
auto const optional_tag_ids{get_tag_ids_for_capture_id(capture_id)}; | |
if (false == optional_tag_ids.has_value()) { | |
return std::nullopt; | |
} | |
auto const [start_tag_id, end_tag_id]{optional_tag_ids.value()}; | |
auto const optional_start_reg_id{get_register_for_tag_id(start_tag_id)}; | |
if (false == optional_tag_ids.has_value()) { | |
return std::nullopt; | |
} | |
auto const optional_end_reg_id{get_register_for_tag_id(end_tag_id)}; | |
if (false == optional_tag_ids.has_value()) { | |
return std::nullopt; | |
} | |
return {optional_start_reg_id.value(), optional_end_reg_id.value()}; |
How about rewriting in this way to:
- Follow the convention of using
optional_
prefix for optional variables. - Reduce the level of nested if statements.
REQUIRE(lexer.m_symbol_id.end() != var_id); | ||
|
||
string captureName{cCaptureName}; | ||
auto const capture_id{lexer.m_symbol_id.find(captureName)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, capture_id_it
, or use contains
and at
instead
auto const capture_id{lexer.m_symbol_id.find(captureName)}; | ||
REQUIRE(lexer.m_symbol_id.end() != capture_id); | ||
|
||
auto capture_ids{lexer.get_capture_ids_for_var_id(var_id->second)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional_capture_ids
auto const
REQUIRE(1 == capture_ids.value().size()); | ||
REQUIRE(capture_id->second == capture_ids.value()[0]); | ||
|
||
auto tag_ids{lexer.get_tag_ids_for_capture_id(capture_ids.value()[0])}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional_tag_ids
auto const
auto initialize_lexer(Schema schema, ByteLexer& lexer) -> void; | ||
|
||
/** | ||
* Scans the given input to ensure the correct behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need a more detailed doc string to explain what this helper checks.
// TODO: add check for get_register_for_tag_id and get_registers_for_capture when | ||
// determinization is implemented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: add check for get_register_for_tag_id and get_registers_for_capture when | |
// determinization is implemented. | |
// TODO: Add check for `get_register_for_tag_id` and `get_registers_for_capture` when | |
// TDFA's determination is implemented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
src/log_surgeon/UniqueIdGenerator.hpp (1)
5-13
: Improve class implementation with member initialization and nodiscard attribute.The class can be made more concise and safer by using member initialization and adding the nodiscard attribute to prevent accidental discarding of generated IDs.
namespace log_surgeon { class UniqueIdGenerator { public: - UniqueIdGenerator() : current_id{0} {} - [[nodiscard]] auto generate_id() -> uint32_t { return current_id++; } private: - uint32_t current_id; + uint32_t current_id{0}; }; } // namespace log_surgeonCMakeLists.txt (2)
Line range hint
1-1
: Address CMake deprecation warning.Update the CMake minimum version to use the range syntax for better compatibility.
-cmake_minimum_required(VERSION 3.5.1) +cmake_minimum_required(VERSION 3.5.1...3.27)
71-84
: Improve file ordering in SOURCE_FILES.The finite automata files should be alphabetically ordered for better maintainability.
src/log_surgeon/FileReader.hpp src/log_surgeon/finite_automata/Capture.hpp - src/log_surgeon/finite_automata/Dfa.hpp - src/log_surgeon/finite_automata/DfaState.hpp - src/log_surgeon/finite_automata/DfaStatePair.hpp - src/log_surgeon/finite_automata/Nfa.hpp - src/log_surgeon/finite_automata/NfaState.hpp - src/log_surgeon/finite_automata/PrefixTree.cpp - src/log_surgeon/finite_automata/PrefixTree.hpp - src/log_surgeon/finite_automata/RegexAST.hpp - src/log_surgeon/finite_automata/RegisterHandler.hpp - src/log_surgeon/finite_automata/StateType.hpp - src/log_surgeon/finite_automata/TaggedTransition.hpp - src/log_surgeon/finite_automata/UnicodeIntervalTree.hpp - src/log_surgeon/finite_automata/UnicodeIntervalTree.tpp + src/log_surgeon/finite_automata/Dfa.hpp + src/log_surgeon/finite_automata/DfaState.hpp + src/log_surgeon/finite_automata/DfaStatePair.hpp + src/log_surgeon/finite_automata/Nfa.hpp + src/log_surgeon/finite_automata/NfaState.hpp + src/log_surgeon/finite_automata/PrefixTree.cpp + src/log_surgeon/finite_automata/PrefixTree.hpp + src/log_surgeon/finite_automata/RegexAST.hpp + src/log_surgeon/finite_automata/RegisterHandler.hpp + src/log_surgeon/finite_automata/StateType.hpp + src/log_surgeon/finite_automata/TaggedTransition.hpp + src/log_surgeon/finite_automata/UnicodeIntervalTree.hpp + src/log_surgeon/finite_automata/UnicodeIntervalTree.tppsrc/log_surgeon/finite_automata/Nfa.hpp (2)
120-124
: Initialize all member variables in constructor.Consider explicitly initializing all member variables in the initialization list for better clarity.
Nfa<TypedNfaState>::Nfa(std::vector<LexicalRule<TypedNfaState>> const& rules) - : m_root{new_state()} { + : m_states{} + , m_capture_to_tag_ids{} + , m_root{new_state()} + , m_unique_id_generator{} { for (auto const& rule : rules) { rule.add_to_nfa(this); }
112-113
: Track TODO for lexer limitation.The TODO comment indicates a design limitation in the lexer that should be addressed.
Would you like me to create an issue to track this TODO for improving the lexer's handling of capture group names?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
CMakeLists.txt
(2 hunks)src/log_surgeon/UniqueIdGenerator.hpp
(1 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(3 hunks)tests/CMakeLists.txt
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (2)
src/log_surgeon/UniqueIdGenerator.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/Nfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
🪛 GitHub Actions: Build
CMakeLists.txt
[warning] 1-1: CMake Deprecation Warning: Compatibility with CMake < 3.10 will be removed from a future version of CMake. Update the VERSION argument or use ... syntax.
🔇 Additional comments (2)
src/log_surgeon/finite_automata/Nfa.hpp (2)
21-28
: Improve class documentation.The documentation needs to be more precise and follow proper formatting.
/** - * Represents a NFA(non-deterministic finite automata) for recognizing a language based on the set - * of rules used during initialization. Currently use as an intermediate model for generating the - * DFA. + * Represents a Non-Deterministic Finite Automaton (NFA) designed to recognize a language based on + * a set of rules provided during initialization. This class serves as an intermediate + * representation used for generating the corresponding Deterministic Finite Automaton (DFA). * - * Currently we assume all capture groups have unique names. + * NOTE: It is assumed that all capture groups have unique names, even across different rules. * @tparam TypedNfaState */
128-139
: Improve implementation of get_or_create_capture_tags.The method can be made more concise and safer with const variables.
auto Nfa<TypedNfaState>::get_or_create_capture_tags(Capture const* capture ) -> std::pair<tag_id_t, tag_id_t> { - auto const existing_tags{m_capture_to_tag_ids.find(capture)}; - if (m_capture_to_tag_ids.end() == existing_tags) { - auto start_tag{m_unique_id_generator.generate_id()}; - auto end_tag{m_unique_id_generator.generate_id()}; - auto new_tags{std::make_pair(start_tag, end_tag)}; - m_capture_to_tag_ids.emplace(capture, new_tags); - return new_tags; + if (false == m_capture_to_tag_ids.contains(capture)) { + m_capture_to_tag_ids.emplace( + capture, + m_unique_id_generator.generate_id(), + m_unique_id_generator.generate_id() + ); } - return existing_tags->second; + return m_capture_to_tag_ids.at(capture); }
Description
*_id_t
aliases to make it more clear what the intended purpose of the maps are.Validation performed
Summary by CodeRabbit
Based on the comprehensive changes across multiple files, here are the updated release notes:
New Features
Capture
class to replace the previousTag
system.Improvements
symbol_id_t
andregister_id_t
.Refactoring
Testing
Capture
functionality.These changes represent a significant architectural improvement in the log parsing and regex handling system.