Add support for tagged NFA; Use `uint32_t` to replace `int` for IDs. #42
… tag to matching_variable_id; Use full names for vars (r->rule); Clarify if states are NFA or DFA
…for clarity that nothing is shared b/w tests
…egexASTgroup with min = 1 OR'd with RegexASTEmpty
…iteral arguments; Use const& for non-literals; Use auto where possible; Use uint32_t over int for ids; replace begin() and end() with cbegin() and cend()
…(); Add docstring to RegexDFAStatePair
…NFA; Made add to nfa functions const
…nor are parts of the rules stored, instead the rules are only read and used to build the NFA.
Co-authored-by: Lin Zhihao <[email protected]>
…call succeeds in NFA's serialize. Co-authored-by: Lin Zhihao <[email protected]>
…on classes when they are initialized in their constructor.
…d transitions instead of emplace back.
…oint of failure.
For PR title, how about:
Add support for tagged NFA; Use `uint32_t` to replace `int` for IDs.
Sorry mb, let's fix the latest comments from coderabbit
…dcoding to use bytes; Take state type instead of transition type as template for transition classes.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
src/log_surgeon/finite_automata/RegexNFA.hpp (2)
47-54: Consider adding `@throws` documentation for serialize methods.
The `serialize` methods' documentation should indicate whether they can throw exceptions (e.g., from `fmt::format`). This helps users handle potential exceptions appropriately.
Also applies to: 71-78
Line range hint 177-361: Document thread safety guarantees.
The class handles complex state management but lacks documentation about thread safety. Consider adding thread safety documentation to help users properly synchronize access in multi-threaded contexts.
std::vector<PositiveTaggedTransition<RegexNFAState>> m_positive_tagged_transitions;
std::vector<NegativeTaggedTransition<RegexNFAState>> m_negative_tagged_transitions;
🛠️ Refactor suggestion
Consider reserving capacity for transition vectors.
The vectors `m_positive_tagged_transitions` and `m_negative_tagged_transitions` might benefit from capacity reservation if you have an estimate of the typical number of transitions.
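As a rough illustration of the reservation suggested above, the sketch below reserves space once in a constructor. `SketchTransition`, `SketchState`, and the `expected_transitions` parameter are hypothetical stand-ins, not the real log_surgeon types, and the estimate is assumed to come from the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tagged transition (not the log_surgeon type).
struct SketchTransition {
    std::uint32_t tag;
    std::uint32_t dest_state_id;
};

// Hypothetical stand-in for an NFA state that owns its tagged transitions.
class SketchState {
public:
    // `expected_transitions` is an assumed, caller-supplied estimate.
    explicit SketchState(std::size_t expected_transitions) {
        // One up-front allocation instead of repeated growth on push_back.
        m_positive_tagged_transitions.reserve(expected_transitions);
    }

    auto add_positive_tagged_transition(std::uint32_t tag, std::uint32_t dest_state_id) -> void {
        m_positive_tagged_transitions.push_back({tag, dest_state_id});
    }

    [[nodiscard]] auto size() const -> std::size_t {
        return m_positive_tagged_transitions.size();
    }

    [[nodiscard]] auto capacity() const -> std::size_t {
        return m_positive_tagged_transitions.capacity();
    }

private:
    std::vector<SketchTransition> m_positive_tagged_transitions;
};
```

Whether this helps in practice depends on how many transitions a typical state holds; if most states have zero or one tagged transition, reserving may waste memory rather than save time.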
return fmt::format(
        "{}:{}byte_transitions={{{}}},epsilon_transitions={{{}}},positive_tagged_transitions={{"
        "{}}},negative_tagged_transitions={{{}}}",
        state_ids.at(this),
        accepting_tag_string,
        fmt::join(byte_transitions, ","),
        fmt::join(epsilon_transitions, ","),
        fmt::join(positive_tagged_transitions, ","),
        fmt::join(negative_tagged_transitions, ",")
);
🛠️ Refactor suggestion
Consider using fmt::join with a custom formatter.
Instead of building multiple intermediate strings with `fmt::join` and then combining them, consider creating a custom formatter to handle the entire state serialization in one pass. This could improve performance by reducing memory allocations.
Example approach:
struct StateFormatter {
    template<typename ParseContext>
    constexpr auto parse(ParseContext& ctx) { return ctx.begin(); }

    template<typename FormatContext>
    auto format(const State& state, FormatContext& ctx) {
        // Format all transitions in one pass
        return fmt::format_to(ctx.out(), "{}:{}", ...);
    }
};
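The single-pass idea behind this suggestion can also be shown without `fmt`. The sketch below appends directly into one output string using only the standard library. It swaps in plain `std::string` appends for the custom formatter, and `SimpleState`, `append_joined`, and `serialize_sketch` are hypothetical names with a simplified output format, not the PR's actual serialization.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified state: only destination state IDs are stored.
struct SimpleState {
    std::uint32_t id;
    std::vector<std::uint32_t> epsilon_dest_ids;
    std::vector<std::uint32_t> positive_tagged_dest_ids;
};

// Appends "a,b,c" to `out` without first building a separate joined string.
// Note: std::to_string still creates a small temporary per ID.
inline auto append_joined(std::string& out, std::vector<std::uint32_t> const& ids) -> void {
    bool first{true};
    for (auto const id : ids) {
        if (false == first) {
            out += ',';
        }
        out += std::to_string(id);
        first = false;
    }
}

// Serializes the whole state into a single buffer in one pass.
inline auto serialize_sketch(SimpleState const& state) -> std::string {
    std::string out;
    out += std::to_string(state.id);
    out += ":epsilon_transitions={";
    append_joined(out, state.epsilon_dest_ids);
    out += "},positive_tagged_transitions={";
    append_joined(out, state.positive_tagged_dest_ids);
    out += '}';
    return out;
}
```

Any gain over the `fmt::join` version would need to be measured; for small states the difference is likely negligible.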
References
Description
The way tagged-NFAs work is as follows:
Changes to implement tagged-NFA:
- `add_ast()` updated to add tags to each rule's regex and to build the NFA with tags.
- `add_with_negative_tags()` implemented to add a rule to the NFA while considering negative tags. This requires two passes of the AST: first positive tags are added, then negative tags are added, as they depend on knowing the positive tags of alternate paths in the AST.
- `add()` functions call `add_with_negative_tags()` such that the NFA recursively adds negative tags when traversing the AST.
- `add_negative_tagged_transition()` adds negative tags. Called at whichever AST node has negative tags.
- `add_positive_tagged_transitions()` adds a positive tag. Called for every capture group AST node.
Changes as a result of tagged-NFA:
- `generate_reverse()` commented out as it is currently unused (at least internally and in CLP) until it is fixed to work with tags.
Validation
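The two-pass tagging described in the Description (positive tags first, then negative tags derived from alternate paths) might be sketched roughly as below. This is an illustration under assumptions, not the PR's implementation: `Node`, `collect_positive_tags`, and `assign_negative_tags` are hypothetical names, and the sketch assumes that a branch of an alternation receives as negative tags the positive tags of its sibling branches.

```cpp
#include <cstdint>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, minimal regex AST node (not the log_surgeon RegexAST classes).
struct Node {
    enum class Kind { Literal, Capture, Or, Cat };
    Kind kind{Kind::Literal};
    std::uint32_t tag{0};                    // Only meaningful for Capture nodes.
    std::vector<std::unique_ptr<Node>> children;
    std::set<std::uint32_t> positive_tags;   // Filled by pass 1.
    std::set<std::uint32_t> negative_tags;   // Filled by pass 2.
};

inline auto make_literal() -> std::unique_ptr<Node> {
    return std::make_unique<Node>();
}

inline auto make_capture(std::uint32_t tag, std::unique_ptr<Node> child) -> std::unique_ptr<Node> {
    auto node = std::make_unique<Node>();
    node->kind = Node::Kind::Capture;
    node->tag = tag;
    node->children.push_back(std::move(child));
    return node;
}

inline auto make_or(std::unique_ptr<Node> left, std::unique_ptr<Node> right)
        -> std::unique_ptr<Node> {
    auto node = std::make_unique<Node>();
    node->kind = Node::Kind::Or;
    node->children.push_back(std::move(left));
    node->children.push_back(std::move(right));
    return node;
}

// Pass 1: each node records the tags of all capture groups in its subtree.
inline auto collect_positive_tags(Node& node) -> void {
    node.positive_tags.clear();
    for (auto& child : node.children) {
        collect_positive_tags(*child);
        node.positive_tags.insert(child->positive_tags.cbegin(), child->positive_tags.cend());
    }
    if (Node::Kind::Capture == node.kind) {
        node.positive_tags.insert(node.tag);
    }
}

// Pass 2 (assumed rule): each branch of an alternation gets the positive tags
// of the other branches as its negative tags. Requires pass 1 to have run.
inline auto assign_negative_tags(Node& node) -> void {
    if (Node::Kind::Or == node.kind) {
        for (auto& child : node.children) {
            for (auto const& sibling : node.children) {
                if (sibling.get() != child.get()) {
                    child->negative_tags.insert(
                            sibling->positive_tags.cbegin(),
                            sibling->positive_tags.cend()
                    );
                }
            }
        }
    }
    for (auto& child : node.children) {
        assign_negative_tags(*child);
    }
}
```

The second pass cannot be folded into the first in this sketch because a branch's negative tags depend on its siblings' complete positive-tag sets, which are only known once the whole alternation has been visited.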
Summary by CodeRabbit
New Features
- `LexicalRule` class to manage lexical rules in finite automata.
Bug Fixes