feat: separate the concept of captures and tags; lexer now tracks mapping from variables to capture to tags to registers. #72

Open
wants to merge 510 commits into
base: main
Changes from 250 commits
08b7548
Add const for consitency with constructor.
SharafMohamed Nov 19, 2024
449133e
Update positive end transition to be optional instead of a vector.
SharafMohamed Nov 19, 2024
7b837bf
Rename new_state function correctly.
SharafMohamed Nov 19, 2024
f0eb56b
Update capture group AST state creation.
SharafMohamed Nov 19, 2024
a945915
Encapsulate new state for capture group.
SharafMohamed Nov 19, 2024
c757ded
Fix compiler error.
SharafMohamed Nov 19, 2024
2eb7477
Use singular for end transition getter function.
SharafMohamed Nov 20, 2024
08060ed
Void to auto -> void.
SharafMohamed Nov 20, 2024
0c2c1d1
Update new_capture_group_start_states to new_capture_group_states to …
SharafMohamed Nov 20, 2024
b0b951a
Linter.
SharafMohamed Nov 20, 2024
3c2a2ab
Update docstring for .
SharafMohamed Nov 20, 2024
98c5b95
Rename to new_start_and_end_states_with_positively_tagged_transitions.
SharafMohamed Nov 20, 2024
f59cf41
Rename to capture_X_state.
SharafMohamed Nov 20, 2024
85a2d69
Update docstring.
SharafMohamed Nov 20, 2024
4c602d4
Updated diagram to match vars used in code.
SharafMohamed Nov 20, 2024
2b01433
Rename vars to serialized_X.
SharafMohamed Nov 20, 2024
e37b29a
Run Linter.
SharafMohamed Nov 20, 2024
c5beca3
Fix typo.
SharafMohamed Nov 20, 2024
fe4a7b3
Update diagram for capture group NFA.
SharafMohamed Nov 20, 2024
8993088
Merge branch 'fixed-tagged-nfa' into register
SharafMohamed Nov 26, 2024
aaf720a
Merge branch 'main' into register
SharafMohamed Nov 26, 2024
0017512
Add register unit-tests, add PrefixTree with unit-tests.
SharafMohamed Nov 26, 2024
336f2ae
Finished with initial register implementation.
SharafMohamed Nov 26, 2024
3449df2
Linter.
SharafMohamed Nov 26, 2024
ef62df1
Linter.
SharafMohamed Nov 26, 2024
a085650
Docstring fixes.
SharafMohamed Nov 27, 2024
2be06c0
Add boundry test case.
SharafMohamed Nov 27, 2024
9ec01dd
Improve test cases for setting positions in prefix tree.
SharafMohamed Nov 27, 2024
019e675
Improve test cases for setting invalid positions in prefix tree.
SharafMohamed Nov 27, 2024
83a411a
Remove confusing description; Remove unused include.
SharafMohamed Nov 27, 2024
c88fbb5
Add edge case test to register unit-tests.
SharafMohamed Nov 27, 2024
7c91ddc
Update docstring for PrefixTreeNode.
SharafMohamed Nov 27, 2024
4c50769
Add comments to test-case; Add new test case for setting root value.
SharafMohamed Nov 27, 2024
98200b4
Update docstring to make it clear that any negative value of m_positi…
SharafMohamed Nov 27, 2024
afaf01a
Fix header gaurd.
SharafMohamed Nov 27, 2024
8dea476
Fix typo.
SharafMohamed Nov 27, 2024
dbb1e16
Remove newline in docstring.
SharafMohamed Nov 27, 2024
e054825
Improve throw consistency.
SharafMohamed Nov 27, 2024
792ce96
Update prefix tree insertion test cases.
SharafMohamed Nov 27, 2024
cab6e81
Fix test case.
SharafMohamed Nov 27, 2024
ffda5e6
Fix @throws doscstring for consistency; Improve insert() docstring.
SharafMohamed Nov 27, 2024
ff11672
Improve register handler test coverage.
SharafMohamed Nov 27, 2024
536b50b
Fix == ordering in test-cases; Fix vector initialization to remove re…
SharafMohamed Nov 27, 2024
77c20f7
Add const for consistency.
SharafMohamed Nov 27, 2024
f43759c
Add _HPP to header guards; Remove unused include.
SharafMohamed Nov 27, 2024
01e8881
Fix typo.
SharafMohamed Nov 27, 2024
fbb3d36
Remove blank line.
SharafMohamed Nov 27, 2024
e1f2b18
Rename to m_prefix_tree; Remove unused include.
SharafMohamed Nov 27, 2024
a51b49d
Add param descriptions to docstrings.
SharafMohamed Nov 27, 2024
002577e
Improve out of range check to be consistent.
SharafMohamed Nov 27, 2024
52a155c
Update set docstring.
SharafMohamed Nov 27, 2024
a6beafc
Punctuate docstrings.
SharafMohamed Nov 27, 2024
ec1f757
Update PregixTreeNode docstring.
SharafMohamed Nov 28, 2024
f35741f
Improve docstring for PrefixTree.
SharafMohamed Nov 28, 2024
e8e5e55
Change to use auto -> void; Punctuate out_of_range throws.
SharafMohamed Nov 28, 2024
f1ece30
Update Register docstring.
SharafMohamed Nov 28, 2024
08997ae
Update PrefixTree docstring.
SharafMohamed Nov 28, 2024
0910c62
Grammar fix.
SharafMohamed Nov 28, 2024
ede680e
Grammar fix.
SharafMohamed Nov 28, 2024
c7b047c
Use auto where possible.
SharafMohamed Nov 28, 2024
6fa8fcb
Use uniform initialization.
SharafMohamed Dec 2, 2024
18b9160
Add missing header.
SharafMohamed Dec 2, 2024
3f08fa3
Linter.
SharafMohamed Dec 2, 2024
e281f04
Fix spacing.
SharafMohamed Dec 2, 2024
a03734e
Make Node a member of PrefixTree.
SharafMohamed Dec 2, 2024
9123c7a
Rename index to prefix_tree_node_id.
SharafMohamed Dec 2, 2024
fe35fe0
Make it clear indicies in add_register are refering to prefix_tree no…
SharafMohamed Dec 3, 2024
de58e08
Linter.
SharafMohamed Dec 3, 2024
1426179
rename to reg_id.
SharafMohamed Dec 3, 2024
3301f14
Rename to reg_id.
SharafMohamed Dec 3, 2024
c9b1369
Use at().
SharafMohamed Dec 3, 2024
e2aee66
Remove Register class and use uint32_t instead; Rename vers to xxx_re…
SharafMohamed Dec 3, 2024
36c1810
Rename to reg_id.
SharafMohamed Dec 3, 2024
48df8b0
Remove unused header.
SharafMohamed Dec 3, 2024
a8605fc
Change pred index to be optional and nullopt for root.
SharafMohamed Dec 3, 2024
15cb1b6
Add and use node_id_t.
SharafMohamed Dec 3, 2024
6b787d0
Add position_t.
SharafMohamed Dec 3, 2024
cd8f4e3
Change to id_t.
SharafMohamed Dec 3, 2024
72da50c
Add is_root().
SharafMohamed Dec 4, 2024
3fc7ea7
Add missing header.
SharafMohamed Dec 4, 2024
6443d66
Update PrefixTree docstring.
SharafMohamed Dec 4, 2024
63aec4d
Removing node docstring as its redundant.
SharafMohamed Dec 4, 2024
295f3ee
Combine private section in PrefixTree.
SharafMohamed Dec 4, 2024
1186666
Add missing header; Remove copy paste error.
SharafMohamed Dec 4, 2024
06ee38e
Rename to node_id and parent_node_id.
SharafMohamed Dec 4, 2024
e103011
Update get_reversed_positions' docstring.
SharafMohamed Dec 4, 2024
31b0346
Update get_reversed positions' docstring to clarify exlcusivity of th…
SharafMohamed Dec 4, 2024
4005e41
Grammar fix.
SharafMohamed Dec 4, 2024
e38940c
Add maybe_unusued.
SharafMohamed Dec 4, 2024
d71368d
Update src/log_surgeon/finite_automata/RegisterHandler.hpp
SharafMohamed Dec 4, 2024
dd4b6e1
Update test case names to document code names better.
SharafMohamed Dec 4, 2024
7322852
Implicitily use auto in vectors.
SharafMohamed Dec 4, 2024
dba1a18
Explicitily use position_t for vectors.
SharafMohamed Dec 4, 2024
ee6efab
Update tests/test-register-handler.cpp
SharafMohamed Dec 4, 2024
9ba980c
Switch to size_t.
SharafMohamed Dec 4, 2024
27b324c
Clang-tidy: Remove magic numbers + Fix headers.
SharafMohamed Dec 4, 2024
f651a24
Reduce complexity for clang-tidy.
SharafMohamed Dec 4, 2024
fc6f426
Add negative pos test case in test-register-handler.cpp.
SharafMohamed Dec 4, 2024
c8fb570
Alternate b/w positive and negative positions in test-prefix-tree neg…
SharafMohamed Dec 4, 2024
1f66918
Add cRootId and size() to PrefixTree.
SharafMohamed Dec 4, 2024
a388c80
Update note.
SharafMohamed Dec 4, 2024
340eaf7
Update docstring.
SharafMohamed Dec 4, 2024
22cf931
Fix typo.
SharafMohamed Dec 4, 2024
e75c888
Merge branch 'register' into individual-dfa-files
SharafMohamed Dec 4, 2024
d5b20fe
Merge branch 'individual-dfa-files' into remove-redundant-typenames
SharafMohamed Dec 4, 2024
69a7ad1
Merge branch 'remove-redundant-typenames' into remove-regex-prefix
SharafMohamed Dec 4, 2024
0b7ef3c
Merge branch 'remove-regex-prefix' into remove-uneeded-this
SharafMohamed Dec 4, 2024
ada697f
Merge branch 'remove-uneeded-this' into use-auto
SharafMohamed Dec 4, 2024
4e08e2c
Merge branch 'use-auto' into refactor-nfa-to-dfa
SharafMohamed Dec 4, 2024
5916d33
Fix errors created by merge, need to still double check all the funct…
SharafMohamed Dec 5, 2024
c61f2d9
Update header for size_t.
SharafMohamed Dec 5, 2024
417bde8
Update src/log_surgeon/finite_automata/PrefixTree.hpp
SharafMohamed Dec 5, 2024
738876d
Update src/log_surgeon/finite_automata/PrefixTree.hpp
SharafMohamed Dec 5, 2024
93c03a0
Update src/log_surgeon/finite_automata/RegisterHandler.hpp
SharafMohamed Dec 5, 2024
6481e5f
Update tests/test-prefix-tree.cpp
SharafMohamed Dec 5, 2024
6a9a4a4
Clean up register initialization helper; Fix typo.
SharafMohamed Dec 5, 2024
052d86f
Update get_parent_id to clarify its unsafe and suppress warning.
SharafMohamed Dec 5, 2024
ed70bd5
Move constants in test-register-handler.hpp to minimize scope.
SharafMohamed Dec 5, 2024
fab801f
Merge branch 'register' into individual-dfa-files
SharafMohamed Dec 5, 2024
db84cb7
Merge branch 'individual-dfa-files' into remove-redundant-typenames
SharafMohamed Dec 5, 2024
7f6fcd9
Merge branch 'remove-redundant-typenames' into remove-regex-prefix
SharafMohamed Dec 5, 2024
a346104
Merge branch 'remove-regex-prefix' into remove-uneeded-this
SharafMohamed Dec 5, 2024
e90ae14
Merge branch 'remove-uneeded-this' into use-auto
SharafMohamed Dec 5, 2024
4d30509
Merge branch 'use-auto' into refactor-nfa-to-dfa
SharafMohamed Dec 5, 2024
1671e39
Move constants into scope for test-prefix-tree.cpp.
SharafMohamed Dec 5, 2024
748dfc5
Rename to handler_init and return handler.
SharafMohamed Dec 5, 2024
8abf35a
Add docstring for get_parent_id_unsafe().
SharafMohamed Dec 5, 2024
1e5fdcc
Linter.
SharafMohamed Dec 5, 2024
effac53
Merge branch 'individual-dfa-files' into remove-redundant-typenames
SharafMohamed Dec 5, 2024
43aa3be
Linter + Merge branch 'remove-redundant-typenames' into remove-regex-…
SharafMohamed Dec 5, 2024
c0920bf
Merge branch 'remove-regex-prefix' into remove-uneeded-this
SharafMohamed Dec 5, 2024
19fe130
Linter.
SharafMohamed Dec 5, 2024
61ceba6
Merge branch 'remove-uneeded-this' into use-auto
SharafMohamed Dec 5, 2024
0e1e2b2
Linter.
SharafMohamed Dec 5, 2024
9302ecb
Merge branch 'use-auto' into refactor-nfa-to-dfa
SharafMohamed Dec 5, 2024
71d926d
Merge branch 'register' into individual-dfa-files
SharafMohamed Dec 6, 2024
66ed13b
Merge branch 'main' into individual-dfa-files
SharafMohamed Dec 6, 2024
a12a360
Fix comment length.
SharafMohamed Dec 6, 2024
244d122
Initialize byte transitions.
SharafMohamed Dec 6, 2024
176391b
Use const* in place of unique_ptr reference; Update docstrings.
SharafMohamed Dec 7, 2024
012f61f
Update intersect test to compile.
SharafMohamed Dec 7, 2024
96a6363
Update next() docstring.
SharafMohamed Dec 7, 2024
7cd39f0
Merge branch 'individual-dfa-files' into remove-redundant-typenames
SharafMohamed Dec 7, 2024
a4a93b4
Rename to state_type.
SharafMohamed Dec 8, 2024
421c3de
Update headers.
SharafMohamed Dec 8, 2024
ecd4e4e
Merge branch 'individual-dfa-files' into remove-redundant-typenames
SharafMohamed Dec 8, 2024
1b945a1
Update Lexer headers.
SharafMohamed Dec 8, 2024
70cd43d
Merge branch 'individual-dfa-files' into remove-redundant-typenames
SharafMohamed Dec 8, 2024
78c4125
Add header for conditional_t.
SharafMohamed Dec 8, 2024
3ce0b30
Merge branch 'individual-dfa-files' into remove-redundant-typenames
SharafMohamed Dec 8, 2024
33623fa
Linter.
SharafMohamed Dec 8, 2024
bae0557
Merge branch 'individual-dfa-files' into remove-redundant-typenames
SharafMohamed Dec 8, 2024
5bbeafc
Linter.
SharafMohamed Dec 8, 2024
0decaf5
Change ! to false ==.
SharafMohamed Dec 8, 2024
0e2d593
Merge branch 'individual-dfa-files' into remove-redundant-typenames
SharafMohamed Dec 8, 2024
cf5980d
Merge branch 'remove-redundant-typenames' into remove-regex-prefix
SharafMohamed Dec 8, 2024
6e65a3e
LALR1Parser to Lalr1Parser.
SharafMohamed Dec 8, 2024
9c2ad81
Linter.
SharafMohamed Dec 8, 2024
c4fc96b
Rename templates to TypedDfaState and TypedNfaState.
SharafMohamed Dec 8, 2024
a6bbaef
Rename to Utf8*State and Byte*State.
SharafMohamed Dec 8, 2024
ff2dac3
Remove RegexDFAStateType.hpp.
SharafMohamed Dec 8, 2024
7a8982d
Linter.
SharafMohamed Dec 8, 2024
7b0a86c
Linter again.
SharafMohamed Dec 8, 2024
2632616
Add missing alogrithm header; Update test-NFA.
SharafMohamed Dec 8, 2024
992a2ec
Update intersect-test.cpp with new names.
SharafMohamed Dec 8, 2024
a5419f0
Linter on intersect-test.cpp.
SharafMohamed Dec 8, 2024
ae64f64
Merge branch 'remove-regex-prefix' into remove-uneeded-this
SharafMohamed Dec 9, 2024
9153a7c
Switch this->m_lexer to m_lexer by using Parser::m_lexer.
SharafMohamed Dec 9, 2024
fa0c098
Linter.
SharafMohamed Dec 9, 2024
845b088
Merge branch 'remove-uneeded-this' into use-auto
SharafMohamed Dec 9, 2024
2dd6f45
Lint.
SharafMohamed Dec 9, 2024
ee46719
More auto changes.
SharafMohamed Dec 9, 2024
4cebc06
Merge branch 'use-auto' into refactor-nfa-to-dfa
SharafMohamed Dec 11, 2024
827d39c
Merge branch 'main' into remove-redundant-typenames
SharafMohamed Dec 11, 2024
80979de
Remove old epsilon_closure after double checking it matches the moved…
SharafMohamed Dec 11, 2024
b38b6d7
Merge dfa_to_nfa() into Dfa::Dfa().
SharafMohamed Dec 11, 2024
1c2d7b3
Merge branch 'remove-redundant-typenames' into remove-regex-prefix
SharafMohamed Dec 11, 2024
11da5d5
Merge branch 'remove-regex-prefix' into remove-uneeded-this
SharafMohamed Dec 11, 2024
3f6966f
Merge branch 'remove-uneeded-this' into use-auto
SharafMohamed Dec 11, 2024
11eab2b
Merge branch 'use-auto' into refactor-nfa-to-dfa
SharafMohamed Dec 11, 2024
538ae8b
Fix intersect_test compiler errrors.
SharafMohamed Dec 11, 2024
74841b0
Add const.
SharafMohamed Dec 11, 2024
f84cd0b
Add register handler to DFA.
SharafMohamed Dec 12, 2024
efb7932
Add missing headers; Initialize variable.
SharafMohamed Dec 12, 2024
be3d483
Update src/log_surgeon/finite_automata/RegexDFAStatePair.hpp
SharafMohamed Dec 19, 2024
b8afafa
Update src/log_surgeon/finite_automata/RegexNFAState.hpp
SharafMohamed Dec 19, 2024
68ed2b1
Update src/log_surgeon/finite_automata/RegexDFAStatePair.hpp
SharafMohamed Dec 19, 2024
8bebee4
Add maps to lexer; Add lexer test case.
SharafMohamed Jan 6, 2025
beb59aa
NonTerminal::m_all_children to m_all_children.
SharafMohamed Jan 6, 2025
b4d25bb
Merge branch 'remove-redundant-typenames' of https://github.com/Shara…
SharafMohamed Jan 6, 2025
4e07f8d
Merge branch 'remove-redundant-typenames' into remove-regex-prefix
SharafMohamed Jan 6, 2025
8f7a355
Merge branch 'remove-regex-prefix' into remove-uneeded-this
SharafMohamed Jan 6, 2025
fe3b704
Merge branch 'remove-uneeded-this' into use-auto
SharafMohamed Jan 6, 2025
1380e53
Merge branch 'use-auto' into refactor-nfa-to-dfa
SharafMohamed Jan 6, 2025
6d5b920
Merge branch 'main' into remove-regex-prefix
SharafMohamed Jan 7, 2025
9a82620
Merge branch 'remove-regex-prefix' into remove-uneeded-this
SharafMohamed Jan 7, 2025
8abd2cf
Merge branch 'remove-uneeded-this' into use-auto
SharafMohamed Jan 7, 2025
a509323
Merge branch 'use-auto' into refactor-nfa-to-dfa
SharafMohamed Jan 7, 2025
6bf8c01
Fix test-NFA to test-nfa and ByteNFA to ByteNfa.
SharafMohamed Jan 7, 2025
5dc5953
Combine NfaStateType and DfaStateType into StateType.
SharafMohamed Jan 7, 2025
ed631b3
Merge branch 'remove-regex-prefix' into remove-uneeded-this
SharafMohamed Jan 7, 2025
8410fb1
Merge branch 'remove-uneeded-this' into use-auto
SharafMohamed Jan 7, 2025
ff92325
Merge branch 'use-auto' into refactor-nfa-to-dfa
SharafMohamed Jan 7, 2025
b26d22b
Remove out-dated comment.
SharafMohamed Jan 8, 2025
9af72f3
Remove old commented utf8 code.
SharafMohamed Jan 8, 2025
6f3ce6c
Merge branch 'remove-regex-prefix' into remove-uneeded-this
SharafMohamed Jan 8, 2025
1f1d3ea
Merge branch 'remove-uneeded-this' into use-auto
SharafMohamed Jan 8, 2025
d21e373
Merge branch 'use-auto' into refactor-nfa-to-dfa
SharafMohamed Jan 8, 2025
66cd3b3
Merge branch 'main' into remove-uneeded-this
SharafMohamed Jan 8, 2025
ea78ed2
Merge branch 'remove-uneeded-this' into use-auto
SharafMohamed Jan 8, 2025
25237a2
Merge branch 'use-auto' into refactor-nfa-to-dfa
SharafMohamed Jan 8, 2025
3a94829
Merge branch 'refactor-nfa-to-dfa' into restructure-tagging
SharafMohamed Jan 8, 2025
9902984
Fix file ordering in tests cmake.
SharafMohamed Jan 8, 2025
7ccdfd5
Merge branch 'main' into use-auto
SharafMohamed Jan 8, 2025
013997c
Explicitly use bool for initializations from true/false instead of auto.
SharafMohamed Jan 9, 2025
88e2cc2
Update src/log_surgeon/Lalr1Parser.tpp
SharafMohamed Jan 9, 2025
4ebbc82
Make bool typed explicitly.
SharafMohamed Jan 9, 2025
e6893d9
Update src/log_surgeon/Lalr1Parser.tpp
SharafMohamed Jan 9, 2025
425a8e6
Update src/log_surgeon/Lalr1Parser.tpp
SharafMohamed Jan 9, 2025
fd394af
Update src/log_surgeon/Lalr1Parser.tpp
SharafMohamed Jan 9, 2025
7fe39db
Merge branch 'use-auto' into refactor-nfa-to-dfa
SharafMohamed Jan 9, 2025
6cb5c23
Update src/log_surgeon/Lalr1Parser.tpp
SharafMohamed Jan 10, 2025
239872a
Merge branch 'use-auto' into refactor-nfa-to-dfa
SharafMohamed Jan 10, 2025
8bd9bae
Merge branch 'refactor-nfa-to-dfa' into restructure-tagging
SharafMohamed Jan 10, 2025
473f8e5
Merge branch 'main' into refactor-nfa-to-dfa
SharafMohamed Jan 10, 2025
626ad54
Merge branch 'refactor-nfa-to-dfa' into restructure-tagging
SharafMohamed Jan 10, 2025
0fb0758
Update test code.
SharafMohamed Jan 11, 2025
f7e1718
Switch typedef to using; Remove commented out code.
SharafMohamed Jan 11, 2025
18c3e25
Merge branch 'refactor-nfa-to-dfa' into restructure-tagging
SharafMohamed Jan 11, 2025
af2c8d4
Merge branch 'main' into restructure-tagging
SharafMohamed Jan 11, 2025
3504ffb
Fixed lexer test...the current way of using the lexer is insanely bad.
SharafMohamed Jan 11, 2025
8b43e83
Add capture group test case.
SharafMohamed Jan 12, 2025
719e722
Add getters and setters for lexer maps; Update unit-tests.
SharafMohamed Jan 12, 2025
082145e
Use capture instead of tag.
SharafMohamed Jan 13, 2025
e06cf44
Fix ownership of m_rules.
SharafMohamed Jan 13, 2025
2e99a91
Add tags.
SharafMohamed Jan 13, 2025
94db0d6
Lexer now enforced unique capture names.
SharafMohamed Jan 13, 2025
9731fa3
Update test-nfa to match added tags.
SharafMohamed Jan 13, 2025
bc78c49
Update the last lexer map that can be initialized currently.
SharafMohamed Jan 13, 2025
2c89216
Uniform initialization.
SharafMohamed Jan 13, 2025
f06031d
Indent comment.
SharafMohamed Jan 13, 2025
90ce695
Indent comment.
SharafMohamed Jan 13, 2025
40a90bd
Linter.
SharafMohamed Jan 13, 2025
d9c20fe
Linter.
SharafMohamed Jan 13, 2025
391607d
Linter.
SharafMohamed Jan 13, 2025
e6273b2
Linter.
SharafMohamed Jan 13, 2025
f88a780
Use moe semantics properly for negative transition.
SharafMohamed Jan 13, 2025
0cc6c24
Clean up testing code.
SharafMohamed Jan 13, 2025
54d9181
Move UniqueIdGenerator to its own file.
SharafMohamed Jan 29, 2025
d38393d
Fix accidental deletion.
SharafMohamed Jan 29, 2025
8 changes: 4 additions & 4 deletions CMakeLists.txt
@@ -93,17 +93,17 @@ set(SOURCE_FILES
src/log_surgeon/SchemaParser.hpp
src/log_surgeon/Token.cpp
src/log_surgeon/Token.hpp
src/log_surgeon/finite_automata/PrefixTree.cpp
src/log_surgeon/finite_automata/PrefixTree.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
src/log_surgeon/finite_automata/Capture.hpp
src/log_surgeon/finite_automata/Dfa.hpp
src/log_surgeon/finite_automata/DfaState.hpp
src/log_surgeon/finite_automata/DfaStatePair.hpp
src/log_surgeon/finite_automata/Nfa.hpp
src/log_surgeon/finite_automata/NfaState.hpp
src/log_surgeon/finite_automata/PrefixTree.cpp
src/log_surgeon/finite_automata/PrefixTree.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
src/log_surgeon/finite_automata/RegisterHandler.hpp
src/log_surgeon/finite_automata/StateType.hpp
src/log_surgeon/finite_automata/Tag.hpp
src/log_surgeon/finite_automata/TaggedTransition.hpp
src/log_surgeon/finite_automata/UnicodeIntervalTree.hpp
src/log_surgeon/finite_automata/UnicodeIntervalTree.tpp
55 changes: 52 additions & 3 deletions src/log_surgeon/Lexer.hpp
@@ -15,6 +15,7 @@
#include <log_surgeon/finite_automata/DfaState.hpp>
#include <log_surgeon/finite_automata/Nfa.hpp>
#include <log_surgeon/finite_automata/RegexAST.hpp>
#include <log_surgeon/finite_automata/RegisterHandler.hpp>
#include <log_surgeon/LexicalRule.hpp>
#include <log_surgeon/ParserInputBuffer.hpp>
#include <log_surgeon/Token.hpp>
@@ -23,6 +24,10 @@ namespace log_surgeon {
template <typename TypedNfaState, typename TypedDfaState>
class Lexer {
public:
using register_id_t = finite_automata::RegisterHandler::register_id_t;
using symbol_id_t = uint32_t;
using tag_id_t = finite_automata::tag_id_t;
Comment on lines +27 to +29
Member

Can we move these type definitions to one header?
The problem with aliasing here is that if we decide to rename one symbol, for example register_id_t -> reg_id_t, it takes more effort to apply the change everywhere, because every alias has to be updated as well.

Contributor Author

What I switched to in future PRs is introducing the aliases in the log_surgeon::finite_automata namespace. This way, if a file needs to use registers, it'll also have access to register_id_t. I feel like this is similar to moving them into a single header, while keeping each definition inside the file that it's most relevant to.

Does this make sense, or am I missing what you're trying to say here?

Member

I think the question is whether we really need an alias for the register ID type. Can we just have register_id_t defined in one place, with everyone else using it directly from one include? I'm fine with leaving the other two as they are in this PR.
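To make the two positions in this thread concrete, here is a minimal sketch of the "define once, use directly" option. The namespace `finite_automata_sketch`, the class `LexerSketch`, and the name `reg_id_t` are hypothetical stand-ins, not the project's actual declarations. The ID types are declared once at namespace scope, and the class names them directly instead of re-aliasing them.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

namespace finite_automata_sketch {
// Hypothetical: declared once at namespace scope; users include this header
// and refer to the types directly, so a rename touches a single declaration.
using reg_id_t = std::uint32_t;
using tag_id_t = std::uint32_t;
}  // namespace finite_automata_sketch

class LexerSketch {
public:
    auto set_reg_id(
            finite_automata_sketch::tag_id_t const tag_id,
            finite_automata_sketch::reg_id_t const reg_id
    ) -> void {
        m_tag_id_to_reg_id.insert_or_assign(tag_id, reg_id);
    }

    // Returns std::nullopt when no register was recorded for the tag.
    [[nodiscard]] auto get_reg_id(finite_automata_sketch::tag_id_t const tag_id
    ) const -> std::optional<finite_automata_sketch::reg_id_t> {
        auto const it{m_tag_id_to_reg_id.find(tag_id)};
        if (m_tag_id_to_reg_id.end() == it) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::unordered_map<finite_automata_sketch::tag_id_t, finite_automata_sketch::reg_id_t>
            m_tag_id_to_reg_id;
};
```

The cost is longer qualified names inside the class. The benefit is that no class-local alias can drift out of sync with the namespace-level type.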


static inline std::vector<uint32_t> const cTokenEndTypes = {(uint32_t)SymbolId::TokenEnd};
static inline std::vector<uint32_t> const cTokenUncaughtStringTypes
= {(uint32_t)SymbolId::TokenUncaughtString};
@@ -51,7 +56,8 @@ class Lexer {
auto get_rule(uint32_t variable_id) -> finite_automata::RegexAST<TypedNfaState>*;

/**
* Generate DFA for lexer
* Generate DFA for lexer.
* @throw std::invalid_argument if `m_rules` contains multiple captures with the same name.
*/
auto generate() -> void;

@@ -122,8 +128,48 @@ class Lexer {
return m_dfa;
}

std::unordered_map<std::string, uint32_t> m_symbol_id;
std::unordered_map<uint32_t, std::string> m_id_symbol;
[[nodiscard]] auto get_capture_ids_for_var_id(symbol_id_t const var_id
Member

How about renaming to get_capture_ids_from_var_id?

Member

Can we add doc strings for these newly added functions for:

  • Document what is the input parameter
  • Document what is the output (like when to return std::nullopt)

) const -> std::optional<std::vector<symbol_id_t>> {
auto const capture_ids{m_var_id_to_capture_ids.find(var_id)};
if (m_var_id_to_capture_ids.end() == capture_ids) {
return std::nullopt;
}
return capture_ids->second;
Comment on lines +133 to +137
Member

Suggested change
auto const capture_ids{m_var_id_to_capture_ids.find(var_id)};
if (m_var_id_to_capture_ids.end() == capture_ids) {
return std::nullopt;
}
return capture_ids->second;
if (m_var_id_to_capture_ids.contains(var_id)) {
return m_var_id_to_capture_ids.at(var_id);
}
return std::nullopt;

How about rewriting it this way to avoid using iterators? (There shouldn't be any performance difference.)
The same applies to the rest of the getters.
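Since the same lookup pattern repeats across the getters, one way to keep iterators out of each of them is a small shared helper. The sketch below is an illustration only: `find_optional` is a hypothetical name that does not appear in the PR. It uses `find` internally, so it performs a single hash lookup and does not depend on C++20's `contains`.

```cpp
#include <optional>
#include <unordered_map>

// Hypothetical helper: returns a copy of the mapped value, or std::nullopt
// when the key is absent. The map is never modified.
template <typename Map>
[[nodiscard]] auto find_optional(Map const& map, typename Map::key_type const& key)
        -> std::optional<typename Map::mapped_type> {
    auto const it{map.find(key)};
    if (map.end() == it) {
        return std::nullopt;
    }
    return it->second;
}
```

With a helper like this, each getter body would reduce to a single `return find_optional(m_some_map, key);` statement, where `m_some_map` stands for whichever member map the getter reads.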

}

[[nodiscard]] auto get_tag_ids_for_capture_id(symbol_id_t const capture_id
) const -> std::optional<std::pair<tag_id_t, tag_id_t>> {
Member

The headers for std::optional (<optional>) and std::pair (<utility>) are not directly included.

auto const tag_ids{m_capture_id_to_tag_ids.find(capture_id)};
if (m_capture_id_to_tag_ids.end() == tag_ids) {
return std::nullopt;
}
return tag_ids->second;
}

[[nodiscard]] auto get_register_for_tag_id(tag_id_t const tag_id
Member

How about get_reg_id_xxx instead of get_register since the return value is an ID.

) const -> std::optional<register_id_t> {
auto const it{m_tag_to_register_id.find(tag_id)};
if (m_tag_to_register_id.end() == it) {
return std::nullopt;
}
return it->second;
}

[[nodiscard]] auto get_registers_for_capture(symbol_id_t capture_id
) const -> std::optional<std::pair<register_id_t, register_id_t>> {
auto const tag_ids{get_tag_ids_for_capture_id(capture_id)};
if (tag_ids.has_value()) {
auto const start_reg{get_register_for_tag_id(tag_ids.value().first())};
auto const end_reg{get_register_for_tag_id(tag_ids.value().second())};
if (start_reg.has_value() && end_reg.has_value()) {
return std::make_pair(start_reg.value(), end_reg.value());
}
}
return std::nullopt;
Comment on lines +160 to +168
Member

Suggested change
auto const tag_ids{get_tag_ids_for_capture_id(capture_id)};
if (tag_ids.has_value()) {
auto const start_reg{get_register_for_tag_id(tag_ids.value().first())};
auto const end_reg{get_register_for_tag_id(tag_ids.value().second())};
if (start_reg.has_value() && end_reg.has_value()) {
return std::make_pair(start_reg.value(), end_reg.value());
}
}
return std::nullopt;
auto const optional_tag_ids{get_tag_ids_for_capture_id(capture_id)};
if (false == optional_tag_ids.has_value()) {
return std::nullopt;
}
auto const [start_tag_id, end_tag_id]{optional_tag_ids.value()};
auto const optional_start_reg_id{get_register_for_tag_id(start_tag_id)};
if (false == optional_start_reg_id.has_value()) {
return std::nullopt;
}
auto const optional_end_reg_id{get_register_for_tag_id(end_tag_id)};
if (false == optional_end_reg_id.has_value()) {
return std::nullopt;
}
return std::make_pair(optional_start_reg_id.value(), optional_end_reg_id.value());

How about rewriting in this way to:

  • Follow the convention of using optional_ prefix for optional variables.
  • Reduce the level of nested if statements.
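The early-return shape described above can be sketched as a free function over the two maps involved. Everything named here (`CaptureMaps`, `get_reg_ids_from_capture_id`, and the `*_id_t` aliases) is a hypothetical stand-in for the Lexer members, chosen so the snippet compiles on its own. Each lookup result is checked immediately after it is obtained, so the success path stays at a single indentation level.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>

using capture_id_t = std::uint32_t;
using tag_id_t = std::uint32_t;
using reg_id_t = std::uint32_t;

// Hypothetical stand-in for the lexer's capture -> tags -> registers maps.
struct CaptureMaps {
    std::unordered_map<capture_id_t, std::pair<tag_id_t, tag_id_t>> capture_id_to_tag_id_pair;
    std::unordered_map<tag_id_t, reg_id_t> tag_id_to_reg_id;
};

// Returns the (start, end) register IDs for a capture, or std::nullopt if the
// capture is unknown or either of its tags has no register assigned.
[[nodiscard]] inline auto
get_reg_ids_from_capture_id(CaptureMaps const& maps, capture_id_t const capture_id)
        -> std::optional<std::pair<reg_id_t, reg_id_t>> {
    auto const tag_pair_it{maps.capture_id_to_tag_id_pair.find(capture_id)};
    if (maps.capture_id_to_tag_id_pair.end() == tag_pair_it) {
        return std::nullopt;
    }
    auto const [start_tag_id, end_tag_id]{tag_pair_it->second};

    auto const start_reg_it{maps.tag_id_to_reg_id.find(start_tag_id)};
    if (maps.tag_id_to_reg_id.end() == start_reg_it) {
        return std::nullopt;
    }

    auto const end_reg_it{maps.tag_id_to_reg_id.find(end_tag_id)};
    if (maps.tag_id_to_reg_id.end() == end_reg_it) {
        return std::nullopt;
    }

    return std::make_pair(start_reg_it->second, end_reg_it->second);
}
```

A capture whose start tag has a register but whose end tag does not yields `std::nullopt`, the same as an unknown capture. Callers that need to tell these cases apart would need a richer return type.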

}

std::unordered_map<std::string, symbol_id_t> m_symbol_id;
std::unordered_map<symbol_id_t, std::string> m_id_symbol;
Comment on lines +171 to +172
Member

This is out of this PR's scope:

  1. We shouldn't have public members like these two.
  2. We may need to improve the naming of these two variables.

Contributor Author

Yeah, these are relics of the old code. I can create an issue for it unless it's covered in the clang-tidy issue.


private:
/**
@@ -148,6 +194,9 @@ class Lexer {
std::unique_ptr<finite_automata::Dfa<TypedDfaState>> m_dfa;
bool m_asked_for_more_data{false};
TypedDfaState const* m_prev_state{nullptr};
std::unordered_map<symbol_id_t, std::vector<symbol_id_t>> m_var_id_to_capture_ids;
std::unordered_map<symbol_id_t, std::pair<tag_id_t, tag_id_t>> m_capture_id_to_tag_ids;
Member

How about m_capture_id_to_tag_pair?

std::unordered_map<tag_id_t, register_id_t> m_tag_to_register_id;
};

namespace lexers {
35 changes: 29 additions & 6 deletions src/log_surgeon/Lexer.tpp
@@ -4,6 +4,7 @@
#include <cassert>
#include <memory>
#include <stack>
#include <stdexcept>
#include <string>
#include <vector>

@@ -358,17 +359,17 @@ void Lexer<TypedNfaState, TypedDfaState>::add_delimiters(std::vector<uint32_t> c

template <typename TypedNfaState, typename TypedDfaState>
void Lexer<TypedNfaState, TypedDfaState>::add_rule(
uint32_t const& id,
symbol_id_t const& var_id,
Member

  1. The header file is not updated accordingly
  2. This might be outside of this PR's scope: this var_id name is very confusing: should we call it rule_id? We can discuss the naming offline (as well as the re-design of how rules are stored), but the immediate TODO is we should probably have one type alias named rule_id_t, otherwise it's more confusing to have var_id to be symbol_id_t that actually represents a rule ID, lol. So I suggest symbol_id_t -> capture_id_t and rule_id_t to differentiate them more clearly. (Or maybe I missed some big picture of why we need a unified symbol_id_t)

std::unique_ptr<finite_automata::RegexAST<TypedNfaState>> rule
) {
m_rules.emplace_back(id, std::move(rule));
m_rules.emplace_back(var_id, std::move(rule));
}

template <typename TypedNfaState, typename TypedDfaState>
auto Lexer<TypedNfaState, TypedDfaState>::get_rule(uint32_t const variable_id
auto Lexer<TypedNfaState, TypedDfaState>::get_rule(symbol_id_t const var_id
) -> finite_automata::RegexAST<TypedNfaState>* {
for (auto const& rule : m_rules) {
if (rule.get_variable_id() == variable_id) {
if (rule.get_variable_id() == var_id) {
return rule.get_regex();
}
}
@@ -377,8 +378,30 @@ auto Lexer<TypedNfaState, TypedDfaState>::get_rule(uint32_t const variable_id

template <typename TypedNfaState, typename TypedDfaState>
void Lexer<TypedNfaState, TypedDfaState>::generate() {
finite_automata::Nfa<TypedNfaState> nfa{std::move(m_rules)};
// TODO: DFA ignores tags. E.g., treats "capture:user=(?<user_id>\d+)" as "capture:user=\d+"
for (auto const& rule : m_rules) {
for (auto* capture : rule.get_captures()) {
std::string const capture_name{capture->get_name()};
symbol_id_t capture_id{0};
if (m_symbol_id.find(capture_name) == m_symbol_id.end()) {
capture_id = m_symbol_id.size();
m_symbol_id[capture_name] = capture_id;
m_id_symbol[capture_id] = capture_name;
Comment on lines +387 to +388
Member

It is disallowed to use [] on map-like data structures (especially unordered maps), since the behavior is complicated and error-prone. I will ensure this is added to our coding guidelines.
Update: Added a section here with detailed reasons:
https://www.notion.so/yscope/WIP-Coding-Guidelines-9a308b847a5343958ba3cb97a850be66?pvs=4#b071a701e5c94bbdb6c4461fa10bbd26
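One well-known reason `operator[]` is error-prone on standard maps is that a read of a missing key silently inserts a value-initialized entry. The sketch below demonstrates that behavior next to a non-mutating lookup. The function names are hypothetical and the linked guideline may list other reasons, so treat this as an illustration of a single pitfall.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

using symbol_id_t = std::uint32_t;
using SymbolIdMap = std::unordered_map<std::string, symbol_id_t>;

// Read-only lookup: works on a const map and never changes its size.
[[nodiscard]] inline auto lookup_symbol_id(SymbolIdMap const& symbol_ids, std::string const& name)
        -> std::optional<symbol_id_t> {
    auto const it{symbol_ids.find(name)};
    if (symbol_ids.end() == it) {
        return std::nullopt;
    }
    return it->second;
}

// Reads `name` through operator[] and reports how many entries that "read"
// added to the map (1 if the key was missing, 0 otherwise).
inline auto entries_added_by_subscript_read(SymbolIdMap& symbol_ids, std::string const& name)
        -> std::size_t {
    auto const size_before{symbol_ids.size()};
    [[maybe_unused]] auto const value{symbol_ids[name]};  // Inserts {name, 0} if absent.
    return symbol_ids.size() - size_before;
}
```

After a subscript read of a missing name, a later lookup finds ID 0 for it, which is indistinguishable from a symbol that was genuinely assigned ID 0.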

Comment on lines +387 to +388
Member

This is very confusing: I don't really understand how m_symbol_id and m_id_symbol are used in the current code. It seems that all we need here is a capture_id <-> capture_name mapping. As mentioned, we should create a separate data structure to store this mapping unless we have a strong reason to fold it into the existing m_symbol_id and m_id_symbol. A separate structure would give us two advantages:

  • If a non-capture symbol has the same name as a capture, we currently hit a key conflict (which is unexpected, as I found in the unit test). A stand-alone mapping avoids this.
  • m_rules and the capture id <-> name mappings share the same life cycle, so the mapping could use std::string_view as the key type to be more lightweight.
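
A rough sketch of what such a stand-alone mapping might look like follows. `CaptureRegistry` and its member names are hypothetical and only illustrate the idea; the sketch assumes the strings behind the views (for example, names owned by the rules) outlive the registry.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <unordered_map>

// Hypothetical helper, not part of the PR: a capture-only registry kept apart
// from the general symbol tables, so capture names and other symbol names
// live in separate key spaces and cannot collide.
class CaptureRegistry {
public:
    // Registers `name` and returns its new id. Throws if `name` is already
    // registered. Only a view is stored, so the caller must keep the
    // underlying string alive for as long as the registry is used.
    auto add(std::string_view name) -> uint32_t {
        if (m_name_to_id.count(name) > 0) {
            throw std::invalid_argument("Capture names must be unique.");
        }
        auto const id{static_cast<uint32_t>(m_name_to_id.size())};
        m_name_to_id.emplace(name, id);
        m_id_to_name.emplace(id, name);
        assert(m_name_to_id.size() == m_id_to_name.size());
        return id;
    }

    [[nodiscard]] auto get_id(std::string_view name) const -> uint32_t {
        return m_name_to_id.at(name);
    }

    [[nodiscard]] auto get_name(uint32_t id) const -> std::string_view {
        return m_id_to_name.at(id);
    }

private:
    std::unordered_map<std::string_view, uint32_t> m_name_to_id;
    std::unordered_map<uint32_t, std::string_view> m_id_to_name;
};
```

The trade-off is the usual one for `std::string_view` keys: no per-entry string copies, in exchange for a lifetime requirement on whatever owns the names.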

} else {
throw std::invalid_argument("`m_rules` contains capture names that are not unique."
);
}
m_var_id_to_capture_ids[rule.get_variable_id()].push_back(capture_id);
Comment on lines +384 to +393
Member

Suggested change
symbol_id_t capture_id{0};
if (m_symbol_id.find(capture_name) == m_symbol_id.end()) {
capture_id = m_symbol_id.size();
m_symbol_id[capture_name] = capture_id;
m_id_symbol[capture_id] = capture_name;
} else {
throw std::invalid_argument("`m_rules` contains capture names that are not unique."
);
}
m_var_id_to_capture_ids[rule.get_variable_id()].push_back(capture_id);
if (m_symbol_id.contains(capture_name)) {
throw std::invalid_argument("`m_rules` contains capture names that are not unique."
);
}
auto const capture_id{m_symbol_id.size()};
m_symbol_id.emplace(capture_name, capture_id);
m_id_symbol.emplace(capture_id, capture_name);
auto const var_id{rule.get_variable_id()};
if (false == m_var_id_to_capture_ids.contains(var_id)) {
m_var_id_to_capture_ids.emplace(var_id, {});
}
m_var_id_to_capture_ids.at(var_id).push_back(capture_id);

I think we can rewrite it this way. The key change here (as in the other suggestions) is to short-circuit the error case in a stand-alone if check and keep the regular logic outside any nested if-else statements.
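
The same control flow can be sketched as a self-contained function. `register_capture` and its parameters are hypothetical stand-ins for the lexer members. This sketch uses `try_emplace(var_id)` for the "insert an empty vector if missing" step, because a bare `{}` argument cannot be deduced by `emplace`'s forwarding parameters; it also uses `count` rather than `contains` so that it does not require C++20.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

using symbol_id_t = uint32_t;  // Assumed alias for this sketch.

// Hypothetical free function mirroring the suggested control flow: reject the
// error case up front, then run the regular logic with no nested if-else.
inline auto register_capture(
        std::unordered_map<std::string, symbol_id_t>& name_to_id,
        std::unordered_map<symbol_id_t, std::vector<symbol_id_t>>& var_id_to_capture_ids,
        symbol_id_t var_id,
        std::string const& capture_name
) -> symbol_id_t {
    // Error case first: nothing has been modified when this throws.
    if (name_to_id.count(capture_name) > 0) {
        throw std::invalid_argument("Capture names must be unique.");
    }

    auto const capture_id{static_cast<symbol_id_t>(name_to_id.size())};
    name_to_id.emplace(capture_name, capture_id);
    assert(name_to_id.size() == static_cast<std::size_t>(capture_id) + 1);

    // try_emplace default-constructs the vector only if `var_id` is new and
    // returns an iterator to the new or existing entry either way.
    var_id_to_capture_ids.try_emplace(var_id).first->second.push_back(capture_id);
    return capture_id;
}
```

Because the uniqueness check runs before any insertion, a duplicate name leaves both maps untouched.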

}
}

finite_automata::Nfa<TypedNfaState> nfa{m_rules};
for (auto const& [capture, tag_ids] : nfa.get_capture_to_tag_ids()) {
std::string capture_name{capture->get_name()};
auto capture_id{m_symbol_id[capture_name]};
Comment on lines +399 to +400
Member

Suggested change
std::string capture_name{capture->get_name()};
auto capture_id{m_symbol_id[capture_name]};
std::string const capture_name{capture->get_name()};
auto const capture_id{m_symbol_id.at(capture_name)};

Note: if we change the mapping (currently m_symbol_id) to an unordered map keyed by std::string_view, we could use:

Suggested change
std::string capture_name{capture->get_name()};
auto capture_id{m_symbol_id[capture_name]};
auto const capture_id{m_symbol_id.at(capture->get_name())};

m_capture_id_to_tag_ids.emplace(capture_id, tag_ids);
}

// TODO: DFA ignores captures. E.g., treats "capture:user=(?<user_id>\d+)" as "capture:user=\d+"
m_dfa = std::make_unique<finite_automata::Dfa<TypedDfaState>>(std::move(nfa));
auto const* state = m_dfa->get_root();
for (uint32_t i = 0; i < cSizeOfByte; i++) {
4 changes: 4 additions & 0 deletions src/log_surgeon/LexicalRule.hpp
@@ -23,6 +23,10 @@ class LexicalRule {
*/
auto add_to_nfa(finite_automata::Nfa<TypedNfaState>* nfa) const -> void;

[[nodiscard]] auto get_captures() const -> std::vector<finite_automata::Capture const*> {
return m_regex->get_subtree_positive_captures();
Member

get_subtree_positive_captures returns a const reference. Can this function just forward that const reference?
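
For context, a small sketch of the difference between the two getter styles. `Ast` and `Rule` are simplified, hypothetical stand-ins for the regex AST and `LexicalRule`, with `int` elements in place of capture pointers.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the AST node that owns the capture list.
class Ast {
public:
    explicit Ast(std::vector<int> captures) : m_captures{std::move(captures)} {}

    [[nodiscard]] auto get_captures() const -> std::vector<int> const& { return m_captures; }

private:
    std::vector<int> m_captures;
};

// Hypothetical stand-in for the rule that wraps the AST.
class Rule {
public:
    explicit Rule(Ast ast) : m_ast{std::move(ast)} {}

    // Returns by value: every call copies the whole vector.
    [[nodiscard]] auto get_captures_copy() const -> std::vector<int> {
        return m_ast.get_captures();
    }

    // Forwards the reference: no copy, valid for as long as the rule is alive.
    [[nodiscard]] auto get_captures_ref() const -> std::vector<int> const& {
        return m_ast.get_captures();
    }

private:
    Ast m_ast;
};

inline auto demo_getters() -> void {
    Rule const rule{Ast{std::vector<int>{1, 2, 3}}};
    auto const& view{rule.get_captures_ref()};
    auto const copy{rule.get_captures_copy()};
    assert(&view == &rule.get_captures_ref());  // Same underlying storage.
    assert(&copy != &view);                     // Separate storage.
    assert(copy == view);                       // Same contents.
}
```

Forwarding the reference avoids the copy but ties the result's lifetime to the rule, which matters if a caller stores the result beyond the rule's lifetime.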

}
SharafMohamed marked this conversation as resolved.

[[nodiscard]] auto get_variable_id() const -> uint32_t { return m_variable_id; }

[[nodiscard]] auto get_regex() const -> finite_automata::RegexAST<TypedNfaState>* {
8 changes: 4 additions & 4 deletions src/log_surgeon/SchemaParser.cpp
@@ -9,8 +9,8 @@

#include <log_surgeon/Constants.hpp>
#include <log_surgeon/FileReader.hpp>
#include <log_surgeon/finite_automata/Capture.hpp>
#include <log_surgeon/finite_automata/RegexAST.hpp>
#include <log_surgeon/finite_automata/Tag.hpp>
#include <log_surgeon/Lalr1Parser.hpp>
#include <log_surgeon/Lexer.hpp>
#include <log_surgeon/utils.hpp>
@@ -167,7 +167,7 @@ static auto regex_capture_rule(NonTerminal const* m) -> std::unique_ptr<ParserAS
auto& r6 = m->non_terminal_cast(5)->get_parser_ast()->get<unique_ptr<RegexASTByte>>();
return std::make_unique<ParserValueRegex>(make_unique<RegexASTCaptureByte>(
std::move(r6),
std::make_unique<finite_automata::Tag>(r4->m_name)
std::make_unique<finite_automata::Capture>(r4->m_name)
));
}

@@ -202,7 +202,7 @@ static auto regex_or_rule(NonTerminal* m) -> unique_ptr<ParserAST> {
static auto regex_match_zero_or_more_rule(NonTerminal* m) -> unique_ptr<ParserAST> {
auto& r1 = m->non_terminal_cast(0)->get_parser_ast()->get<unique_ptr<RegexASTByte>>();

// To handle negative tags we treat `R*` as `R+ | ∅`.
// To handle negative captures we treat `R*` as `R+ | ∅`.
return make_unique<ParserValueRegex>(make_unique<RegexASTOrByte>(
make_unique<RegexASTEmptyByte>(),
make_unique<RegexASTMultiplicationByte>(std::move(r1), 1, 0)
@@ -248,7 +248,7 @@ static auto regex_match_range_rule(NonTerminal* m) -> unique_ptr<ParserAST> {
auto& r1 = m->non_terminal_cast(0)->get_parser_ast()->get<unique_ptr<RegexASTByte>>();

if (0 == min) {
// To handle negative tags we treat `R*` as `R+ | ∅`.
// To handle negative captures we treat `R*` as `R+ | ∅`.
return make_unique<ParserValueRegex>(make_unique<RegexASTOrByte>(
make_unique<RegexASTEmptyByte>(),
make_unique<RegexASTMultiplicationByte>(std::move(r1), 1, max)
@@ -1,14 +1,14 @@
#ifndef LOG_SURGEON_FINITE_AUTOMATA_TAG
#define LOG_SURGEON_FINITE_AUTOMATA_TAG
#ifndef LOG_SURGEON_FINITE_AUTOMATA_CAPTURE
#define LOG_SURGEON_FINITE_AUTOMATA_CAPTURE

#include <string>
#include <string_view>
#include <utility>

namespace log_surgeon::finite_automata {
class Tag {
class Capture {
public:
explicit Tag(std::string name) : m_name{std::move(name)} {}
explicit Capture(std::string name) : m_name{std::move(name)} {}

[[nodiscard]] auto get_name() const -> std::string_view { return m_name; }

@@ -17,4 +17,4 @@ class Tag {
};
} // namespace log_surgeon::finite_automata

#endif // LOG_SURGEON_FINITE_AUTOMATA_TAG
#endif // LOG_SURGEON_FINITE_AUTOMATA_CAPTURE
7 changes: 6 additions & 1 deletion src/log_surgeon/finite_automata/Dfa.hpp
Member

This file seems to be irrelevant to this PR. Can we defer these changes to a future PR?

@@ -2,12 +2,16 @@
#define LOG_SURGEON_FINITE_AUTOMATA_DFA_HPP

#include <cstdint>
#include <map>
#include <memory>
#include <set>
#include <stack>
#include <vector>

#include <log_surgeon/Constants.hpp>
#include <log_surgeon/finite_automata/DfaStatePair.hpp>
#include <log_surgeon/finite_automata/Nfa.hpp>
#include <log_surgeon/finite_automata/RegisterHandler.hpp>

namespace log_surgeon::finite_automata {
template <typename TypedDfaState>
@@ -38,6 +42,7 @@ class Dfa {

private:
std::vector<std::unique_ptr<TypedDfaState>> m_states;
RegisterHandler m_register_handler;
SharafMohamed marked this conversation as resolved.
};

template <typename TypedDfaState>
@@ -74,7 +79,7 @@ Dfa<TypedDfaState>::Dfa(Nfa<TypedNfaState> nfa) {
}
auto next_dfa_state
= [&dfa_states, &create_dfa_state](StateSet const& set) -> TypedDfaState* {
TypedDfaState* state;
TypedDfaState* state{nullptr};
auto it = dfa_states.find(set);
if (it == dfa_states.end()) {
state = create_dfa_state(set);