Extract RegexNFAState and tagged transition classes into their own files. #47

Merged on Nov 7, 2024 (172 commits). The diff below shows changes from 8 commits.

Commits
a6274ec
Bug-fix for unicode array sizes
SharafMohamed Sep 12, 2024
186d239
Merge remote-tracking branch 'upstream/main' into nfa-cleanup-pr
SharafMohamed Sep 12, 2024
4f122c6
Move LexicalRule to its own class; Change name to variable_id; Change…
SharafMohamed Sep 12, 2024
c24f6e1
Additional fix for swapping meaning of tag
SharafMohamed Sep 12, 2024
33582da
Another additional fix for swapping meaning of tag
SharafMohamed Sep 12, 2024
3338ec7
Fix up some comments
SharafMohamed Sep 12, 2024
3cd3c0f
Fix comment grammar
SharafMohamed Sep 12, 2024
e05acbb
Add tags to AST; Serialize AST for testing; Add unit-test for testing…
SharafMohamed Sep 13, 2024
5e61e83
Use using to condense code; Use a unique schema object for each test …
SharafMohamed Sep 13, 2024
082090d
Add has_capture_groups(); Add unit-test for has_capture_groups()
SharafMohamed Sep 13, 2024
2c6d94e
Create and use RegexASTEmpty to split RegexASTgroup with min=0 into R…
SharafMohamed Sep 13, 2024
4e02f24
Add unit-test for 0 repetition regex
SharafMohamed Sep 13, 2024
bb3c543
Add more tests for repetition regex
SharafMohamed Sep 13, 2024
54027ad
Return by value in literal getters; Use const instead of const& for l…
SharafMohamed Sep 16, 2024
e58274f
Refactor new_state()
SharafMohamed Sep 16, 2024
1321871
Rename get_first_matching_variable_ids() to get_matching_variable_ids…
SharafMohamed Sep 16, 2024
c904755
Remove redundant docstrings
SharafMohamed Sep 16, 2024
ffe9a0f
Remove has_capture_groups()
SharafMohamed Sep 16, 2024
913ed1a
Const and auto changes
SharafMohamed Sep 16, 2024
795add3
Add tagged-nfa
SharafMohamed Sep 16, 2024
6e45657
Clarify that the add functions are adding to the nfa; Make add to nfa…
SharafMohamed Sep 17, 2024
7aa8a92
Changed AST add functions to indicate the AST are being added to the …
SharafMohamed Sep 17, 2024
d1d87e7
Merged with previous PR
SharafMohamed Sep 17, 2024
f386a3b
Merge branch 'tagged-ast' into pre-tagged-nfa-cleanup
SharafMohamed Sep 17, 2024
0c600d7
Merge branch 'pre-tagged-nfa-cleanup' into regex-ast-empty
SharafMohamed Sep 17, 2024
bedad75
Change add in RegexASTEmpty to add_to_nfa
SharafMohamed Sep 17, 2024
c78f79c
Merge with previous PR
SharafMohamed Sep 17, 2024
cd54e64
Fix and refactor NFA unit-test
SharafMohamed Sep 17, 2024
06c7066
Merge with previous PRs and update some ints to uints.
SharafMohamed Oct 8, 2024
38ab6fe
Fix compiler error.
SharafMohamed Oct 8, 2024
4c6b9c6
Fix compiler error where macOS considers a struct default constructor…
SharafMohamed Oct 8, 2024
2e71aaa
Add state_type explicitly.
SharafMohamed Oct 8, 2024
c062a2c
Add state_type explicitly.
SharafMohamed Oct 8, 2024
eaa5674
Remove commented out code.
SharafMohamed Oct 8, 2024
f150474
Remove errant +=.
SharafMohamed Oct 8, 2024
bdafe10
Replace constructors with aggregate initialization.
SharafMohamed Oct 8, 2024
335bb34
Replace static inline with static constexpr.
SharafMohamed Oct 8, 2024
8446390
Undo last commit.
SharafMohamed Oct 8, 2024
73d8e46
Fix comment.
SharafMohamed Oct 8, 2024
7871f80
Finish changes of int to uint32_t for SymbolID.
SharafMohamed Oct 8, 2024
56483c9
Added comment explaining use of uint32_t for SymbolID.
SharafMohamed Oct 8, 2024
cafa973
Finish removing ints that should be uint32_t.
SharafMohamed Oct 8, 2024
a2b1bfd
Fix formatting.
SharafMohamed Oct 8, 2024
2fb4831
Rename SymbolID to SymbolId; Remove redundant ID for SymbolIds enum v…
SharafMohamed Oct 10, 2024
79482b1
Use docstring instead of inline comment.
SharafMohamed Oct 10, 2024
0237854
Use `auto`.
SharafMohamed Oct 10, 2024
c935af5
Use `const` for error code.
SharafMohamed Oct 10, 2024
91b5e78
Use `auto` and `const` for `add_to_nfa_with_negative_tags`.
SharafMohamed Oct 10, 2024
f6c86ec
Use `auto` for `intermediate_state`.
SharafMohamed Oct 10, 2024
8fd70d7
Replace `find` with `at`.
SharafMohamed Oct 10, 2024
dd03a35
Use `auto` for `intermediate_state`.
SharafMohamed Oct 10, 2024
fd6bb02
Added constructors for tagged transition classes.
SharafMohamed Oct 10, 2024
65861c3
Add getters to tagged transition classes.
SharafMohamed Oct 10, 2024
dfb7dcf
Use emplace_back instead of push_back for tagged transitions.
SharafMohamed Oct 10, 2024
473787e
Use `const` for `factor`.
SharafMohamed Oct 10, 2024
2a08121
Use `const` for `sub_factor`.
SharafMohamed Oct 10, 2024
0b7e38b
Use list initialization for `rule`.
SharafMohamed Oct 10, 2024
f3b0f6a
Use list initialization for `var_schema`.
SharafMohamed Oct 10, 2024
74793b3
Group `visited_states` modifications together.
SharafMohamed Oct 10, 2024
d2f38fa
Use unordered_map instead of map for state_ids.
SharafMohamed Oct 10, 2024
a7f7a14
Make add_to_queue lambda a helper called add_to_queue_and_visited.
SharafMohamed Oct 10, 2024
d244a80
Replace const& with std::move when dealing with negative_tags.
SharafMohamed Oct 10, 2024
158df37
Run auto-formatter.
SharafMohamed Oct 10, 2024
f82b46f
Remove incorrect comment.
SharafMohamed Oct 16, 2024
c87caf9
Move LexicalRule to its own class; Pass rules into NFA construction; …
SharafMohamed Oct 20, 2024
abe55e2
Add tagged transitions during RegexNFAState construction; Remove unus…
SharafMohamed Oct 20, 2024
6d1db10
Fix compiler errors in intersect-test.
SharafMohamed Oct 20, 2024
a5413d0
Run linter.
SharafMohamed Oct 20, 2024
a4a4ab7
Fix header guard comment in LexicalRule.hpp.
SharafMohamed Oct 20, 2024
aa93847
Run linter.
SharafMohamed Oct 20, 2024
abb2656
Improve naming of intermediate state for positive and negative tagged …
SharafMohamed Oct 20, 2024
2f1c588
Move serialize method from test into classes; Clean up serialize code…
SharafMohamed Oct 20, 2024
9835eb0
Fix compiler error.
SharafMohamed Oct 20, 2024
dcd79a6
Improve var naming; Improve docstring.
SharafMohamed Oct 20, 2024
73300e7
Improve docstrings for serialize() methods.
SharafMohamed Oct 20, 2024
8548bd9
Add get_traversal_order() to NFA; Fix docstrings.
SharafMohamed Oct 20, 2024
38720f7
Add missing include to test-intersect.
SharafMohamed Oct 20, 2024
b700d99
Update src/log_surgeon/LexicalRule.hpp
SharafMohamed Oct 23, 2024
12e930c
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
0a104ff
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
7c126eb
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
7e43f99
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
5957bfb
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
98b5242
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
06742ba
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
021ac00
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
29e9c43
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
bd6081b
Update docstring for get_travel_order().
SharafMohamed Oct 23, 2024
16edf6f
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
b4b0b63
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
d108697
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
eef79d2
Update tests/test-NFA.cpp
SharafMohamed Oct 23, 2024
0d599cb
Update tests/test-NFA.cpp
SharafMohamed Oct 23, 2024
2807141
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
8e225cd
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
ecb84fb
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
6fc6030
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
f83ac5f
Rename get_traversal_order() to get_bfs_traversal_order() and update …
SharafMohamed Oct 23, 2024
e3214f1
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
d7d6dbe
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
fc55354
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
fbc25c8
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
df070c3
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
84cd573
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
a35f61f
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
53ba56a
Remove unused using.
SharafMohamed Oct 23, 2024
8a677e3
Remove empty namespace.
SharafMohamed Oct 23, 2024
f17f752
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
cbe1d39
Make traversal_order const.
SharafMohamed Oct 23, 2024
77bf2e0
Merge branch 'tagged-nfa-new' of https://github.com/SharafMohamed/log…
SharafMohamed Oct 23, 2024
a9d0ef3
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
43ec3f0
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
45372df
Add missing using for std::move.
SharafMohamed Oct 23, 2024
d0ba724
Merge branch 'tagged-nfa-new' of https://github.com/SharafMohamed/log…
SharafMohamed Oct 23, 2024
f69aa86
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
723eabb
Use move semantic for NFA constructor.
SharafMohamed Oct 23, 2024
8d40656
Merge branch 'tagged-nfa-new' of https://github.com/SharafMohamed/log…
SharafMohamed Oct 23, 2024
d20e391
Move add_to_queue_and_visited() to lambda.
SharafMohamed Oct 23, 2024
6a312e9
Fix compiler error in intersect-test.
SharafMohamed Oct 23, 2024
f8e5f8f
Simplify new_state().
SharafMohamed Oct 24, 2024
fc25f00
Remove using for std::move, and explicitly add namespace.
SharafMohamed Oct 24, 2024
cdab650
Update serialize docstring.
SharafMohamed Oct 24, 2024
e8db277
Have internal serialize() functions for RegexNFA (states and tagged t…
SharafMohamed Oct 24, 2024
337cead
Reserve space during BFS; Run linter.
SharafMohamed Oct 24, 2024
4a30fdc
Add braced initialization to nfa.
SharafMohamed Oct 27, 2024
0203038
Update docstring for positive tag serialization.
SharafMohamed Oct 27, 2024
633acc4
Update docstring for negative tag serialization.
SharafMohamed Oct 27, 2024
4db7b82
Use return statement for full docstring of get_bfs_traversal_order.
SharafMohamed Oct 27, 2024
01f8b14
Update NFA serialize() docstring.
SharafMohamed Oct 27, 2024
d047624
Add long form of BFS for first use.
SharafMohamed Oct 27, 2024
f9c4f46
Use const for state_id_it.
SharafMohamed Oct 27, 2024
bd77c78
Update docstring for NFA state serialize.
SharafMohamed Oct 27, 2024
f2d8049
Combine the two failure cases in NFA state serialize's docstring to m…
SharafMohamed Oct 27, 2024
4cb560f
Use const for state_id_it.
SharafMohamed Oct 27, 2024
95b7497
For NFA state serialize flip order of failure checks to reduce indent…
SharafMohamed Oct 27, 2024
e187445
Merge branch 'tagged-nfa-new' of https://github.com/SharafMohamed/log…
SharafMohamed Oct 27, 2024
8b85511
Use const& for passing rules into the NFA as rules are never stored, …
SharafMohamed Oct 28, 2024
0756794
Use braced initialization for NFA.
SharafMohamed Oct 28, 2024
6ab439a
Remove warning for not checking std::optional when we know the function …
SharafMohamed Oct 28, 2024
9244812
Remove redundant initialization of member variables in tagged transiti…
SharafMohamed Oct 28, 2024
0d151a4
Use member initialization lists for constructing NFA state from tagge…
SharafMohamed Oct 28, 2024
ac63713
Switch to using optional prefix for optional return types.
SharafMohamed Oct 28, 2024
b57b93f
Make negative tagged transition singular as you can never have more t…
SharafMohamed Oct 28, 2024
c3fb16d
Add missing param for new_state_with_negative_tagged_transitions.
SharafMohamed Oct 28, 2024
8a41367
Move RegexNFAStateType, RegexNFAState, and PositiveTaggedTransition/N…
SharafMohamed Oct 28, 2024
ac7260f
Run linter.
SharafMohamed Oct 29, 2024
40a8206
Merge branch 'main' into singular-negative-transition
SharafMohamed Oct 31, 2024
c2eea21
Change t to curr_state and u to dest_state.
SharafMohamed Oct 31, 2024
629fce9
Change curr_state to current_state; Remove extraneous *; Add newline …
SharafMohamed Oct 31, 2024
aed62b2
Add TODO for utf8 case in BFS.
SharafMohamed Oct 31, 2024
34522a7
Use auto and fix order of const wrt *.
SharafMohamed Oct 31, 2024
332af35
Initialize m_dest_state to nullptr.
SharafMohamed Oct 31, 2024
748e794
Change negative_tagged_transition to negative_tagged_transition_string.
SharafMohamed Oct 31, 2024
38dc22b
Change negative tag transitions to singular.
SharafMohamed Oct 31, 2024
5a30ed8
Switch transitions to singular where applicable.
SharafMohamed Oct 31, 2024
c8bf9e6
Merge changes with previous PR manually. Still missing changes to pre…
SharafMohamed Oct 31, 2024
90edf77
Auto linter.
SharafMohamed Oct 31, 2024
fd765f7
Merge branch 'singular-negative-transition' into individual-files
SharafMohamed Oct 31, 2024
2d0157e
Reduce indentation of epsilon closure by using continue.
SharafMohamed Oct 31, 2024
1cabafd
Use optional for negative transitions in RegexNFAState.
SharafMohamed Oct 31, 2024
dc2c637
Add missing headers; Remove unused headers.
SharafMohamed Nov 1, 2024
7c5cfc0
Assign optional_negative_tagged_transition to a reference.
SharafMohamed Nov 1, 2024
4e8d290
Assign optional_negative_tagged_transition to a reference again.
SharafMohamed Nov 1, 2024
fde9037
Add <stack> to Lexer.tpp.
SharafMohamed Nov 1, 2024
e63637e
Fix comment grammar.
SharafMohamed Nov 1, 2024
08e7d5e
Update with previous PR.
SharafMohamed Nov 1, 2024
ef95061
Sync with previous PR.
SharafMohamed Nov 2, 2024
9da470d
Merge branch 'main' into individual-files
SharafMohamed Nov 2, 2024
b451651
Move RegexNFAXState typedef into RegexNFAState.hpp
SharafMohamed Nov 6, 2024
f71348b
Switch void to auto -> void.
SharafMohamed Nov 6, 2024
21e80b9
Merge branch 'individual-files' of https://github.com/SharafMohamed/l…
SharafMohamed Nov 6, 2024
4576d7d
Move short functions into the class definition; Move RegexNFAXState t…
SharafMohamed Nov 6, 2024
6 changes: 3 additions & 3 deletions src/log_surgeon/Lexer.hpp
@@ -33,9 +33,9 @@ class LexicalRule {
      * Adds AST representing the lexical rule to the NFA
      * @param nfa
      */
-    auto add_ast(finite_automata::RegexNFA<NFAStateType>* nfa) const -> void;
+    auto add_to_nfa(finite_automata::RegexNFA<NFAStateType>* nfa) const -> void;

-    [[nodiscard]] auto get_variable_id() const -> uint32_t const& { return m_variable_id; }
+    [[nodiscard]] auto get_variable_id() const -> uint32_t { return m_variable_id; }

     [[nodiscard]] auto get_regex() const -> finite_automata::RegexAST<NFAStateType>* {
         return m_regex.get();
@@ -81,7 +81,7 @@ class Lexer {
      * @param variable_id
      * @return finite_automata::RegexAST*
      */
-    auto get_rule(uint32_t const& variable_id) -> finite_automata::RegexAST<NFAStateType>*;
+    auto get_rule(uint32_t variable_id) -> finite_automata::RegexAST<NFAStateType>*;

     /**
      * Generate DFA for lexer
16 changes: 8 additions & 8 deletions src/log_surgeon/Lexer.tpp
@@ -362,9 +362,9 @@ void Lexer<NFAStateType, DFAStateType>::add_rule(
 }

 template <typename NFAStateType, typename DFAStateType>
-auto Lexer<NFAStateType, DFAStateType>::get_rule(uint32_t const& variable_id
+auto Lexer<NFAStateType, DFAStateType>::get_rule(uint32_t const variable_id
 ) -> finite_automata::RegexAST<NFAStateType>* {
-    for (auto& rule : m_rules) {
+    for (auto const& rule : m_rules) {
         if (rule.get_variable_id() == variable_id) {
             return rule.get_regex();
         }
@@ -375,8 +375,8 @@ auto Lexer<NFAStateType, DFAStateType>::get_rule(uint32_t const& variable_id
 template <typename NFAStateType, typename DFAStateType>
 void Lexer<NFAStateType, DFAStateType>::generate() {
     finite_automata::RegexNFA<NFAStateType> nfa;
-    for (auto& rule : m_rules) {
-        rule.add_ast(&nfa);
+    for (auto const& rule : m_rules) {
+        rule.add_to_nfa(&nfa);
     }
     m_dfa = nfa_to_dfa(nfa);
     DFAStateType const* state = m_dfa->get_root();
@@ -392,7 +392,7 @@ void Lexer<NFAStateType, DFAStateType>::generate() {
 template <typename NFAStateType, typename DFAStateType>
 void Lexer<NFAStateType, DFAStateType>::generate_reverse() {
     finite_automata::RegexNFA<NFAStateType> nfa;
-    for (auto& rule : m_rules) {
+    for (auto const& rule : m_rules) {
         rule.add_ast(&nfa);
     }
     nfa.reverse();
@@ -408,11 +408,11 @@ void Lexer<NFAStateType, DFAStateType>::generate_reverse() {
 }

 template <typename NFAStateType>
-void LexicalRule<NFAStateType>::add_ast(finite_automata::RegexNFA<NFAStateType>* nfa) const {
-    NFAStateType* end_state = nfa->new_state();
+void LexicalRule<NFAStateType>::add_to_nfa(finite_automata::RegexNFA<NFAStateType>* nfa) const {
+    auto* end_state = nfa->new_state();
     end_state->set_accepting(true);
     end_state->set_matching_variable_id(m_variable_id);
-    m_regex->add(nfa, end_state);
+    m_regex->add_to_nfa(nfa, end_state);
 }

 template <typename NFAStateType, typename DFAStateType>
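The `Lexer.tpp` changes above show how the lexer's NFA is assembled: `generate()` calls `add_to_nfa()` on every rule, and each rule creates an accepting end state tagged with its variable id before its regex adds the states that lead there. The following is a minimal sketch of that flow under one simplifying assumption, namely that each rule's regex is a plain literal string. `State`, `Nfa`, `LexicalRule`, and `get_matching_variable_ids()` are hypothetical stand-ins, not log-surgeon's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified NFA state with byte transitions only.
struct State {
    bool accepting{false};
    uint32_t matching_variable_id{0};
    std::vector<std::pair<char, State*>> byte_transitions;
};

// Hypothetical NFA that owns its states; the root is the first state created.
class Nfa {
public:
    Nfa() : m_root{new_state()} {}

    auto new_state() -> State* {
        m_states.emplace_back(std::make_unique<State>());
        return m_states.back().get();
    }

    [[nodiscard]] auto get_root() const -> State* { return m_root; }

private:
    std::vector<std::unique_ptr<State>> m_states;  // declared first, so built before m_root
    State* m_root;
};

// Hypothetical rule whose "regex" is a non-empty literal string.
class LexicalRule {
public:
    LexicalRule(uint32_t const variable_id, std::string literal)
            : m_variable_id{variable_id},
              m_literal{std::move(literal)} {}

    // Same shape as the diff's add_to_nfa(): create an accepting end state tagged with this
    // rule's variable id, then add the regex's states between the NFA root and that end state.
    auto add_to_nfa(Nfa* nfa) const -> void {
        assert(false == m_literal.empty());
        auto* end_state = nfa->new_state();
        end_state->accepting = true;
        end_state->matching_variable_id = m_variable_id;
        auto* current_state = nfa->get_root();
        for (std::size_t i = 0; i + 1 < m_literal.size(); ++i) {
            auto* intermediate_state = nfa->new_state();
            current_state->byte_transitions.emplace_back(m_literal[i], intermediate_state);
            current_state = intermediate_state;
        }
        current_state->byte_transitions.emplace_back(m_literal.back(), end_state);
    }

private:
    uint32_t m_variable_id;
    std::string m_literal;
};

// Returns the variable ids of all rules accepting `input` by simulating the NFA on a state set.
auto get_matching_variable_ids(Nfa const& nfa, std::string const& input)
        -> std::vector<uint32_t> {
    std::vector<State const*> current_states{nfa.get_root()};
    for (char const c : input) {
        std::vector<State const*> next_states;
        for (auto const* state : current_states) {
            for (auto const& [byte, dest_state] : state->byte_transitions) {
                if (byte == c) {
                    next_states.push_back(dest_state);
                }
            }
        }
        current_states = std::move(next_states);
    }
    std::vector<uint32_t> variable_ids;
    for (auto const* state : current_states) {
        if (state->accepting) {
            variable_ids.push_back(state->matching_variable_id);
        }
    }
    return variable_ids;
}
```

Because each end state carries its rule's variable id, rules sharing a prefix (such as `int` and `in`) still report distinct ids. Iterating the rules with `auto const&` works only because `add_to_nfa()` is `const`, which mirrors the `auto&` to `auto const&` change in the diff.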
70 changes: 39 additions & 31 deletions src/log_surgeon/finite_automata/RegexAST.hpp
@@ -50,7 +50,7 @@ class RegexAST {
      * @param nfa
      * @param end_state
      */
-    virtual auto add(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) -> void = 0;
+    virtual auto add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) const -> void = 0;

     /**
      * Serialize the AST into a string
@@ -133,7 +133,7 @@ class RegexASTLiteral : public RegexAST<NFAStateType> {
      * @param nfa
      * @param end_state
      */
-    auto add(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) -> void override;
+    auto add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) const -> void override;

     /**
      * serialize the RegexASTLiteral into a string
@@ -199,7 +199,7 @@ class RegexASTInteger : public RegexAST<NFAStateType> {
      * @param nfa
      * @param end_state
      */
-    auto add(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) -> void override;
+    auto add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) const -> void override;

     /**
      * serialize the RegexASTInteger into a string
@@ -320,7 +320,7 @@ class RegexASTGroup : public RegexAST<NFAStateType> {
      * @param nfa
      * @param end_state
      */
-    auto add(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) -> void override;
+    auto add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) const -> void override;

     /**
      * serialize the RegexASTGroup into a string
@@ -423,7 +423,7 @@ class RegexASTOr : public RegexAST<NFAStateType> {
      * @param nfa
      * @param end_state
      */
-    auto add(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) -> void override;
+    auto add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) const -> void override;

     /**
      * serialize the RegexASTOr into a string
@@ -503,7 +503,7 @@ class RegexASTCat : public RegexAST<NFAStateType> {
      * @param nfa
      * @param end_state
      */
-    auto add(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) -> void override;
+    auto add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) const -> void override;

     /**
      * serialize the RegexASTCat into a string
@@ -584,7 +584,7 @@ class RegexASTMultiplication : public RegexAST<NFAStateType> {
      * @param nfa
      * @param end_state
      */
-    auto add(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) -> void override;
+    auto add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) const -> void override;

     /**
      * serialize the RegexASTMultiplication into a string
@@ -666,7 +666,7 @@ class RegexASTCapture : public RegexAST<NFAStateType> {
      * @param nfa
      * @param end_state
      */
-    auto add(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) -> void override;
+    auto add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) const -> void override;

     /**
      * serialize the RegexASTCapture into a string
@@ -699,7 +699,8 @@ template <typename NFAStateType>
 RegexASTLiteral<NFAStateType>::RegexASTLiteral(uint32_t character) : m_character(character) {}

 template <typename NFAStateType>
-void RegexASTLiteral<NFAStateType>::add(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) {
+void RegexASTLiteral<NFAStateType>::add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state)
+        const {
     nfa->add_root_interval(Interval(m_character, m_character), end_state);
 }

@@ -726,10 +727,10 @@ RegexASTInteger<NFAStateType>::RegexASTInteger(RegexASTInteger* left, uint32_t d
 }

 template <typename NFAStateType>
-void RegexASTInteger<NFAStateType>::add(
+void RegexASTInteger<NFAStateType>::add_to_nfa(
         [[maybe_unused]] RegexNFA<NFAStateType>* nfa,
         [[maybe_unused]] NFAStateType* end_state
-) {
+) const {
     throw std::runtime_error("Unsupported");
 }

@@ -754,9 +755,10 @@ RegexASTOr<NFAStateType>::RegexASTOr(
         m_right(std::move(right)) {}

 template <typename NFAStateType>
-void RegexASTOr<NFAStateType>::add(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) {
-    m_left->add(nfa, end_state);
-    m_right->add(nfa, end_state);
+void RegexASTOr<NFAStateType>::add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state)
+        const {
+    m_left->add_to_nfa(nfa, end_state);
+    m_right->add_to_nfa(nfa, end_state);
 }

 template <typename NFAStateType>
@@ -792,12 +794,13 @@ RegexASTCat<NFAStateType>::RegexASTCat(
         m_right(std::move(right)) {}

 template <typename NFAStateType>
-void RegexASTCat<NFAStateType>::add(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) {
+void RegexASTCat<NFAStateType>::add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state)
+        const {
     NFAStateType* saved_root = nfa->get_root();
     NFAStateType* intermediate_state = nfa->new_state();
-    m_left->add(nfa, intermediate_state);
+    m_left->add_to_nfa(nfa, intermediate_state);
     nfa->set_root(intermediate_state);
-    m_right->add(nfa, end_state);
+    m_right->add_to_nfa(nfa, end_state);
     nfa->set_root(saved_root);
 }

@@ -833,37 +836,37 @@ RegexASTMultiplication<NFAStateType>::RegexASTMultiplication(
         m_max(max) {}

 template <typename NFAStateType>
-void RegexASTMultiplication<NFAStateType>::add(
+void RegexASTMultiplication<NFAStateType>::add_to_nfa(
         RegexNFA<NFAStateType>* nfa,
         NFAStateType* end_state
-) {
+) const {
     NFAStateType* saved_root = nfa->get_root();
     if (this->m_min == 0) {
         nfa->get_root()->add_epsilon_transition(end_state);
     } else {
         for (uint32_t i = 1; i < this->m_min; i++) {
             NFAStateType* intermediate_state = nfa->new_state();
-            m_operand->add(nfa, intermediate_state);
+            m_operand->add_to_nfa(nfa, intermediate_state);
             nfa->set_root(intermediate_state);
         }
-        m_operand->add(nfa, end_state);
+        m_operand->add_to_nfa(nfa, end_state);
     }
     if (this->is_infinite()) {
         nfa->set_root(end_state);
-        m_operand->add(nfa, end_state);
+        m_operand->add_to_nfa(nfa, end_state);
     } else if (this->m_max > this->m_min) {
         if (this->m_min != 0) {
             NFAStateType* intermediate_state = nfa->new_state();
-            m_operand->add(nfa, intermediate_state);
+            m_operand->add_to_nfa(nfa, intermediate_state);
             nfa->set_root(intermediate_state);
         }
         for (uint32_t i = this->m_min + 1; i < this->m_max; ++i) {
-            m_operand->add(nfa, end_state);
+            m_operand->add_to_nfa(nfa, end_state);
             NFAStateType* intermediate_state = nfa->new_state();
-            m_operand->add(nfa, intermediate_state);
+            m_operand->add_to_nfa(nfa, intermediate_state);
             nfa->set_root(intermediate_state);
         }
-        m_operand->add(nfa, end_state);
+        m_operand->add_to_nfa(nfa, end_state);
     }
     nfa->set_root(saved_root);
 }
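The `add_to_nfa()` bodies above all build from the NFA's current root toward a given end state: `RegexASTCat` routes its left operand into a fresh intermediate state and temporarily moves the root there before adding the right operand, and `RegexASTMultiplication` with `m_min == 0` adds an epsilon transition from the root to the end state (then, if unbounded, loops the operand on the end state). `RegexASTOr` simply adds both operands toward the same end state, so it is omitted here for brevity. The sketch below reproduces the root-swapping scheme for literals, concatenation, and `x*`, plus a small set-based simulator to check the result; every type in it (`State`, `Nfa`, `Ast`, `Literal`, `Cat`, `Star`) is a hypothetical simplification, not log-surgeon's API.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical NFA whose movable `root` mirrors the diff's use of get_root()/set_root().
struct State {
    bool accepting{false};
    std::vector<std::pair<char, State*>> byte_transitions;
    std::vector<State*> epsilon_transitions;
};
struct Nfa {
    std::vector<std::unique_ptr<State>> states;
    State* root{new_state()};  // initialized after `states`, which is declared first
    auto new_state() -> State* {
        states.emplace_back(std::make_unique<State>());
        return states.back().get();
    }
};
struct Ast {
    virtual ~Ast() = default;
    // Builds from nfa->root to end_state; `const` because the AST itself is not modified.
    virtual auto add_to_nfa(Nfa* nfa, State* end_state) const -> void = 0;
};
struct Literal : Ast {
    explicit Literal(char const c) : m_c{c} {}
    auto add_to_nfa(Nfa* nfa, State* end_state) const -> void override {
        nfa->root->byte_transitions.emplace_back(m_c, end_state);
    }
    char m_c;
};
// Concatenation: left builds root -> intermediate, then the root moves so that right builds
// intermediate -> end; the original root is restored afterwards.
struct Cat : Ast {
    Cat(std::unique_ptr<Ast> left, std::unique_ptr<Ast> right)
            : m_left{std::move(left)},
              m_right{std::move(right)} {}
    auto add_to_nfa(Nfa* nfa, State* end_state) const -> void override {
        auto* saved_root = nfa->root;
        auto* intermediate_state = nfa->new_state();
        m_left->add_to_nfa(nfa, intermediate_state);
        nfa->root = intermediate_state;
        m_right->add_to_nfa(nfa, end_state);
        nfa->root = saved_root;
    }
    std::unique_ptr<Ast> m_left;
    std::unique_ptr<Ast> m_right;
};
// `x*` (min 0, unbounded): epsilon root -> end, then the operand loops from end back to end.
struct Star : Ast {
    explicit Star(std::unique_ptr<Ast> operand) : m_operand{std::move(operand)} {}
    auto add_to_nfa(Nfa* nfa, State* end_state) const -> void override {
        auto* saved_root = nfa->root;
        nfa->root->epsilon_transitions.push_back(end_state);
        nfa->root = end_state;
        m_operand->add_to_nfa(nfa, end_state);
        nfa->root = saved_root;
    }
    std::unique_ptr<Ast> m_operand;
};

auto epsilon_closure(std::set<State const*> states) -> std::set<State const*> {
    std::vector<State const*> stack(states.begin(), states.end());
    while (false == stack.empty()) {
        auto const* state = stack.back();
        stack.pop_back();
        for (auto const* dest_state : state->epsilon_transitions) {
            if (states.insert(dest_state).second) {
                stack.push_back(dest_state);
            }
        }
    }
    return states;
}
// Set-based simulation: follow matching byte transitions, then take the epsilon closure.
auto accepts(Nfa const& nfa, std::string const& input) -> bool {
    assert(nullptr != nfa.root);
    auto current_states = epsilon_closure({nfa.root});
    for (char const c : input) {
        std::set<State const*> next_states;
        for (auto const* state : current_states) {
            for (auto const& [byte, dest_state] : state->byte_transitions) {
                if (byte == c) {
                    next_states.insert(dest_state);
                }
            }
        }
        current_states = epsilon_closure(std::move(next_states));
    }
    return std::any_of(current_states.cbegin(), current_states.cend(), [](State const* state) {
        return state->accepting;
    });
}
```

Saving and restoring the root lets each node build its sub-automaton without knowing where it sits in the larger expression, which is also why the methods can become `const`: they only mutate the NFA, never the AST.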
@@ -891,8 +894,9 @@ auto RegexASTMultiplication<NFAStateType>::serialize(bool const with_tags) -> st
 }

 template <typename NFAStateType>
-void RegexASTCapture<NFAStateType>::add(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) {
-    m_group_regex_ast->add(nfa, end_state);
+void RegexASTCapture<NFAStateType>::add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state)
+        const {
+    m_group_regex_ast->add_to_nfa(nfa, end_state);
 }

 template <typename NFAStateType>
@@ -1033,9 +1037,13 @@ auto RegexASTGroup<NFAStateType>::complement(std::vector<Range> const& ranges
 }

 template <typename NFAStateType>
-void RegexASTGroup<NFAStateType>::add(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state) {
-    std::sort(this->m_ranges.begin(), this->m_ranges.end());
-    std::vector<Range> merged_ranges = RegexASTGroup::merge(this->m_ranges);
+void RegexASTGroup<NFAStateType>::add_to_nfa(RegexNFA<NFAStateType>* nfa, NFAStateType* end_state)
+        const {
+    // TODO: there should be a better way to do this with a set and keep m_ranges sorted, but we
+    // have to consider removing overlap + taking the complement.
+    std::vector<Range> merged_ranges = m_ranges;
+    std::sort(merged_ranges.begin(), merged_ranges.end());
+    merged_ranges = merge(merged_ranges);
     if (this->m_negate) {
         merged_ranges = complement(merged_ranges);
     }
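The new `RegexASTGroup::add_to_nfa()` sorts and merges a copy of `m_ranges` (so the method can be `const`) and complements the merged result when the group is negated. A minimal sketch of that preprocessing follows. It assumes `Range` is an inclusive pair of code points, that merging also joins adjacent ranges, and that complements are taken over `[0, 0x10FFFF]`; these assumptions and the helper names `merge_ranges()`, `complement_ranges()`, and `get_normalized_ranges()` are illustrative, not the library's `merge()`/`complement()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed stand-in for the diff's `Range`: an inclusive [first, second] interval of code points.
using Range = std::pair<uint32_t, uint32_t>;

// Assumed upper bound for complements (the largest Unicode code point); not taken from the PR.
constexpr uint32_t cMaxCodePoint{0x10FFFF};

// Merges overlapping or adjacent ranges. `ranges` must already be sorted.
auto merge_ranges(std::vector<Range> const& ranges) -> std::vector<Range> {
    std::vector<Range> merged_ranges;
    for (auto const& range : ranges) {
        assert(range.first <= range.second && range.second <= cMaxCodePoint);
        if (false == merged_ranges.empty() && range.first <= merged_ranges.back().second + 1) {
            merged_ranges.back().second = std::max(merged_ranges.back().second, range.second);
        } else {
            merged_ranges.push_back(range);
        }
    }
    return merged_ranges;
}

// Returns the gaps between sorted, non-overlapping ranges within [0, cMaxCodePoint].
auto complement_ranges(std::vector<Range> const& ranges) -> std::vector<Range> {
    std::vector<Range> complemented_ranges;
    uint32_t next_uncovered{0};
    for (auto const& [first, second] : ranges) {
        if (first > next_uncovered) {
            complemented_ranges.emplace_back(next_uncovered, first - 1);
        }
        next_uncovered = second + 1;
    }
    if (next_uncovered <= cMaxCodePoint) {
        complemented_ranges.emplace_back(next_uncovered, cMaxCodePoint);
    }
    return complemented_ranges;
}

// Mirrors the preprocessing in the new add_to_nfa(): sort and merge a copy of the ranges
// (leaving the const AST untouched), then complement them if the group is negated.
auto get_normalized_ranges(std::vector<Range> const& ranges, bool const negate)
        -> std::vector<Range> {
    std::vector<Range> merged_ranges = ranges;
    std::sort(merged_ranges.begin(), merged_ranges.end());
    merged_ranges = merge_ranges(merged_ranges);
    if (negate) {
        merged_ranges = complement_ranges(merged_ranges);
    }
    return merged_ranges;
}
```

Working on a copy costs one allocation per call but keeps `m_ranges` unchanged, which is what allows `add_to_nfa()` to be `const`; the TODO in the diff notes that a sorted set might avoid the re-sorting.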
21 changes: 11 additions & 10 deletions src/log_surgeon/finite_automata/RegexDFA.hpp
@@ -23,7 +23,7 @@ class RegexDFAState {
 public:
     using Tree = UnicodeIntervalTree<RegexDFAState<stateType>*>;

-    auto add_matching_variable_id(int const& variable_id) -> void {
+    auto add_matching_variable_id(uint32_t const variable_id) -> void {
         m_matching_variable_ids.push_back(variable_id);
     }

@@ -54,6 +54,13 @@ class RegexDFAState {
     std::conditional_t<stateType == RegexDFAStateType::UTF8, Tree, std::tuple<>> m_tree_transitions;
 };

+/**
+ * This class represents a pair of regex states. The intended use is for the two states in the pair
+ * to belong to unique DFAs. A pair is considered accepting if both states are accepting in
+ * their respective DFA. A different pair is considered reachable if both its states are reachable
+ * in their respective DFAs from this pair's states. The first state in the pair contains the
+ * variable types the pair matches.
+ */
 template <typename DFAState>
 class RegexDFAStatePair {
 public:
@@ -85,17 +92,11 @@ class RegexDFAStatePair {
             std::set<RegexDFAStatePair<DFAState>>& unvisited_pairs
     ) const -> void;

-    /**
-     * @return Whether both states are accepting
-     */
     [[nodiscard]] auto is_accepting() const -> bool {
         return m_state1->is_accepting() && m_state2->is_accepting();
     }

-    /**
-     * @return The matching variable ids of the first state of the pair
-     */
-    [[nodiscard]] auto get_first_matching_variable_ids() const -> std::vector<int> const& {
+    [[nodiscard]] auto get_matching_variable_ids() const -> std::vector<int> const& {
         return m_state1->get_matching_variable_ids();
     }

@@ -113,11 +114,11 @@ class RegexDFA {
     /**
      * Creates a new DFA state based on a set of NFA states and adds it to
      * m_states
-     * @param set
+     * @param nfa_state_set
      * @return DFAStateType*
      */
     template <typename NFAStateType>
-    auto new_state(std::set<NFAStateType*> const& set) -> DFAStateType*;
+    auto new_state(std::set<NFAStateType*> const& nfa_state_set) -> DFAStateType*;

     auto get_root() const -> DFAStateType const* { return m_states.at(0).get(); }

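The new `RegexDFAStatePair` docstring describes a product construction: walk pairs of states drawn from two DFAs, treat a pair as accepting when both states accept, and take the first state's variable ids as the pair's matches. The sketch below applies that idea the way `get_intersect()` in `RegexDFA.tpp` uses it, collecting variable ids from accepting pairs. `DfaState`, the index-based `Dfa` alias, and this free `get_intersect()` are hypothetical simplifications; the rule that both states must transition on the *same* byte is an assumption about `get_reachable_pairs()`, since the docstring only says both states must be reachable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified DFA state: sparse byte -> state-index transitions plus the variable
// ids it matches (non-empty exactly when the state is accepting).
struct DfaState {
    std::map<unsigned char, std::size_t> transitions;
    std::vector<uint32_t> matching_variable_ids;

    [[nodiscard]] auto is_accepting() const -> bool {
        return false == matching_variable_ids.empty();
    }
};

using Dfa = std::vector<DfaState>;  // state 0 is the root

// Returns the variable ids of `dfa1` that match some string also accepted by `dfa2`.
auto get_intersect(Dfa const& dfa1, Dfa const& dfa2) -> std::set<uint32_t> {
    assert(false == dfa1.empty() && false == dfa2.empty());
    using StatePair = std::pair<std::size_t, std::size_t>;
    std::set<uint32_t> schema_types;
    std::set<StatePair> visited_pairs;
    std::set<StatePair> unvisited_pairs{StatePair{0, 0}};
    while (false == unvisited_pairs.empty()) {
        auto const current_pair = *unvisited_pairs.begin();
        unvisited_pairs.erase(unvisited_pairs.begin());
        visited_pairs.insert(current_pair);
        auto const& state1 = dfa1.at(current_pair.first);
        auto const& state2 = dfa2.at(current_pair.second);
        // A pair accepts only if both states accept; the first state supplies the variable ids.
        if (state1.is_accepting() && state2.is_accepting()) {
            schema_types.insert(
                    state1.matching_variable_ids.cbegin(),
                    state1.matching_variable_ids.cend()
            );
        }
        // Assumed reachability rule: both states must transition on the same byte.
        for (auto const& [byte, next_state1] : state1.transitions) {
            auto const it = state2.transitions.find(byte);
            if (state2.transitions.cend() == it) {
                continue;
            }
            StatePair const next_pair{next_state1, it->second};
            if (0 == visited_pairs.count(next_pair)) {
                unvisited_pairs.insert(next_pair);
            }
        }
    }
    return schema_types;
}
```

For example, if `dfa1` accepts `a` (id 7) and `ab` (id 8) while `dfa2` accepts only `ab`, the only accepting pair is reached on `ab`, so the result is `{8}`. Swapping the arguments reports `dfa2`'s ids instead, which is why the docstring singles out the first state.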
11 changes: 5 additions & 6 deletions src/log_surgeon/finite_automata/RegexDFA.tpp
@@ -42,10 +42,9 @@ template <typename DFAStateType>
 template <typename NFAStateType>
 auto RegexDFA<DFAStateType>::new_state(std::set<NFAStateType*> const& nfa_state_set
 ) -> DFAStateType* {
-    std::unique_ptr<DFAStateType> ptr = std::make_unique<DFAStateType>();
-    m_states.push_back(std::move(ptr));
-    DFAStateType* dfa_state = m_states.back().get();
-    for (NFAStateType const* nfa_state : nfa_state_set) {
+    m_states.emplace_back(std::make_unique<DFAStateType>());
+    auto* dfa_state = m_states.back().get();
+    for (auto const* nfa_state : nfa_state_set) {
         if (nfa_state->is_accepting()) {
             dfa_state->add_matching_variable_id(nfa_state->get_matching_variable_id());
         }
@@ -64,8 +63,8 @@ auto RegexDFA<DFAStateType>::get_intersect(std::unique_ptr<RegexDFA> const& dfa_
     while (false == unvisited_pairs.empty()) {
         auto current_pair_it = unvisited_pairs.begin();
         if (current_pair_it->is_accepting()) {
-            auto& matching_variable_ids = current_pair_it->get_first_matching_variable_ids();
-            schema_types.insert(matching_variable_ids.begin(), matching_variable_ids.end());
+            auto const& matching_variable_ids = current_pair_it->get_matching_variable_ids();
+            schema_types.insert(matching_variable_ids.cbegin(), matching_variable_ids.cend());
         }
         visited_pairs.insert(*current_pair_it);
         current_pair_it->get_reachable_pairs(visited_pairs, unvisited_pairs);
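`RegexDFA::new_state()` turns one set of NFA states into a DFA state that records the variable ids of every accepting NFA state in the set, which is the core step of the subset construction. The following is a minimal sketch of how such a step fits into a full subset construction, assuming NFA states without epsilon transitions. `NfaState`, `DfaState`, and `build_dfa()` are hypothetical simplifications, and the `new_state` lambda only plays the role of the diff's `new_state()`; it is not that function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified NFA state: byte transitions only (no epsilon transitions).
struct NfaState {
    bool accepting{false};
    uint32_t matching_variable_id{0};
    std::multimap<char, NfaState const*> byte_transitions;
};

// Hypothetical DFA state: matched variable ids plus byte -> DFA-state-index transitions.
struct DfaState {
    std::vector<uint32_t> matching_variable_ids;
    std::map<char, std::size_t> transitions;
};

using NfaStateSet = std::set<NfaState const*>;

// Subset construction: each DFA state stands for a set of NFA states and is created on demand.
auto build_dfa(NfaState const* nfa_root) -> std::vector<DfaState> {
    assert(nullptr != nfa_root);
    std::vector<DfaState> dfa_states;
    std::map<NfaStateSet, std::size_t> dfa_state_ids;
    std::vector<NfaStateSet> unprocessed_sets;

    // Appends a DFA state matching the variable ids of every accepting NFA state in the set.
    auto new_state = [&](NfaStateSet const& nfa_state_set) -> std::size_t {
        auto& dfa_state = dfa_states.emplace_back();
        for (auto const* nfa_state : nfa_state_set) {
            if (nfa_state->accepting) {
                dfa_state.matching_variable_ids.push_back(nfa_state->matching_variable_id);
            }
        }
        auto const dfa_state_id = dfa_states.size() - 1;
        dfa_state_ids.emplace(nfa_state_set, dfa_state_id);
        unprocessed_sets.push_back(nfa_state_set);
        return dfa_state_id;
    };

    new_state(NfaStateSet{nfa_root});
    while (false == unprocessed_sets.empty()) {
        auto const current_set = unprocessed_sets.back();
        unprocessed_sets.pop_back();
        auto const current_id = dfa_state_ids.at(current_set);
        // Group the destinations of every NFA state in the set by input byte.
        std::map<char, NfaStateSet> next_sets;
        for (auto const* nfa_state : current_set) {
            for (auto const& [byte, dest_state] : nfa_state->byte_transitions) {
                next_sets[byte].insert(dest_state);
            }
        }
        for (auto const& [byte, next_set] : next_sets) {
            auto const it = dfa_state_ids.find(next_set);
            auto const next_id = (dfa_state_ids.end() != it) ? it->second : new_state(next_set);
            dfa_states.at(current_id).transitions[byte] = next_id;
        }
    }
    return dfa_states;
}
```

Like the diff's version, the sketch appends with `emplace_back` and takes the new state from the back of the container; it stores states by value and refers to them by index, so reallocation never invalidates an id.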
10 changes: 5 additions & 5 deletions src/log_surgeon/finite_automata/RegexNFA.hpp
@@ -31,11 +31,11 @@ class RegexNFAState {

     [[nodiscard]] auto is_accepting() const -> bool const& { return m_accepting; }

-    auto set_matching_variable_id(int const variable_id) -> void {
+    auto set_matching_variable_id(uint32_t const variable_id) -> void {
         m_matching_variable_id = variable_id;
     }

-    [[nodiscard]] auto get_matching_variable_id() const -> int const& {
+    [[nodiscard]] auto get_matching_variable_id() const -> uint32_t {
         return m_matching_variable_id;
     }

@@ -82,7 +82,7 @@ class RegexNFAState {

 private:
     bool m_accepting{false};
-    int m_matching_variable_id{0};
+    uint32_t m_matching_variable_id{0};
     std::vector<RegexNFAState*> m_epsilon_transitions;
     std::array<std::vector<RegexNFAState*>, cSizeOfByte> m_bytes_transitions;
     // NOTE: We don't need m_tree_transitions for the `stateType ==
@@ -224,7 +224,7 @@ void RegexNFA<NFAStateType>::reverse() {

     // propagate matching_variable_id from old accepting m_states
     for (NFAStateType* old_accepting_state : new_end->get_epsilon_transitions()) {
-        int matching_variable_id = old_accepting_state->get_matching_variable_id();
+        auto const matching_variable_id = old_accepting_state->get_matching_variable_id();
         std::stack<NFAStateType*> unvisited_states;
         std::set<NFAStateType*> visited_states;
         unvisited_states.push(old_accepting_state);
@@ -252,7 +252,7 @@ void RegexNFA<NFAStateType>::reverse() {
     for (int32_t i = m_states.size() - 1; i >= 0; --i) {
         std::unique_ptr<NFAStateType>& src_state_unique_ptr = m_states[i];
         NFAStateType* src_state = src_state_unique_ptr.get();
-        int matching_variable_id = src_state->get_matching_variable_id();
+        auto const matching_variable_id = src_state->get_matching_variable_id();
         for (uint32_t byte = 0; byte < cSizeOfByte; byte++) {
             std::vector<NFAStateType*> byte_transitions = src_state->get_byte_transitions(byte);
             for (int32_t j = byte_transitions.size() - 1; j >= 0; --j) {