Add support for tracking matched and unmatched capture groups in RegexAST nodes using integer-based tags; Add support for serializing RegexAST nodes. #38

Merged
merged 77 commits into from
Sep 30, 2024
Changes from 65 commits
a6274ec
Bug-fix for unicode array sizes
SharafMohamed Sep 12, 2024
186d239
Merge remote-tracking branch 'upstream/main' into nfa-cleanup-pr
SharafMohamed Sep 12, 2024
4f122c6
Move LexicalRule to its own class; Change name to variable_id; Change…
SharafMohamed Sep 12, 2024
c24f6e1
Additional fix for swapping meaning of tag
SharafMohamed Sep 12, 2024
33582da
Another additional fix for swapping meaning of tag
SharafMohamed Sep 12, 2024
3338ec7
Fix up some comments
SharafMohamed Sep 12, 2024
3cd3c0f
Fix comment grammar
SharafMohamed Sep 12, 2024
e05acbb
Add tags to AST; Serialize AST for testing; Add unit-test for testing…
SharafMohamed Sep 13, 2024
54027ad
Return by value in literal getters; Use const instead of const& for l…
SharafMohamed Sep 16, 2024
e58274f
Refactor new_state()
SharafMohamed Sep 16, 2024
1321871
Rename get_first_matching_variable_ids() to get_matching_variable_ids…
SharafMohamed Sep 16, 2024
c904755
Remove redundant docstrings
SharafMohamed Sep 16, 2024
913ed1a
Const and auto changes
SharafMohamed Sep 16, 2024
7aa8a92
Changed AST add functions to indicate the AST are being added to the …
SharafMohamed Sep 17, 2024
77e44a5
Merge branch 'nfa-cleanup-pr' into comment-cleanup
SharafMohamed Sep 17, 2024
d1d87e7
Merged with previous PR
SharafMohamed Sep 17, 2024
053d057
Update src/log_surgeon/finite_automata/RegexAST.hpp
SharafMohamed Sep 18, 2024
a822307
updated examples to use
SharafMohamed Sep 18, 2024
0b9603a
Merge branch 'nfa-cleanup-pr' into comment-cleanup
SharafMohamed Sep 18, 2024
2ef84d1
TODO to clarify RegexAST class is actually nodes in the AST
SharafMohamed Sep 18, 2024
83bd518
Merge branch 'main' into comment-cleanup
SharafMohamed Sep 18, 2024
d3d815e
Merge branch 'comment-cleanup' of https://github.com/SharafMohamed/lo…
SharafMohamed Sep 18, 2024
168adb0
Grammar fix
SharafMohamed Sep 18, 2024
20b3421
Typo fix
SharafMohamed Sep 18, 2024
5231a4a
Fix var references in new comments
SharafMohamed Sep 18, 2024
e4ac215
Move DFA comment to RegexDFA.hpp
SharafMohamed Sep 19, 2024
4b8b13e
Merge branch 'comment-cleanup' into tagged-ast
SharafMohamed Sep 19, 2024
e2d05fa
Merge branch 'main' into tagged-ast
SharafMohamed Sep 19, 2024
660eb9b
Remove duplicate comments from child classes
SharafMohamed Sep 19, 2024
731b9fe
Change capture groups to assign tag when its originally added to the …
SharafMohamed Sep 19, 2024
81b4ffa
Assign negative tags during construction of AST nodes
SharafMohamed Sep 23, 2024
a4d29a5
Added docstring for RegexAST explaining what the class does
SharafMohamed Sep 23, 2024
9379447
Removed serializing AST without tags as this intermediate representat…
SharafMohamed Sep 23, 2024
6c4d933
Fix docstring tense; Add [[nodiscard]] and const to serialize functio…
SharafMohamed Sep 23, 2024
a2fdbf1
Fix linter error
SharafMohamed Sep 23, 2024
b0485f5
Make serialize_with_negative_tags() protected
SharafMohamed Sep 23, 2024
64da95c
Merge branch 'main' into tagged-ast
SharafMohamed Sep 23, 2024
e5bda43
Make it explicitly clear when parent method is used
SharafMohamed Sep 23, 2024
0ac7c43
Use fmt in serialize
SharafMohamed Sep 23, 2024
529dcb2
Have fmt fetched if its not found
SharafMohamed Sep 23, 2024
2e17bee
Remove fetching fmt
SharafMohamed Sep 23, 2024
013765b
Fetch fmt with QUIET
SharafMohamed Sep 23, 2024
14cbe97
Added debug prints to cmake
SharafMohamed Sep 23, 2024
20864d6
Find after fetch
SharafMohamed Sep 23, 2024
72e61f6
Modified fmt fetch to match GSL
SharafMohamed Sep 23, 2024
aadb290
Fix fmt tag in fetch
SharafMohamed Sep 23, 2024
c9d5510
Revert fmt fetch back to how it was before
SharafMohamed Sep 23, 2024
f138527
Switched approach for generating unique capture group ids
SharafMohamed Sep 23, 2024
4cc5e2a
Update fetch fmt
SharafMohamed Sep 23, 2024
e08e345
fmt debug cmake
SharafMohamed Sep 23, 2024
ea5121a
Fixed order of declaring and making available for fetched content in …
SharafMohamed Sep 23, 2024
2ff09b5
Force fmt to generate install rules
SharafMohamed Sep 23, 2024
69cdd1f
Add docstring for FMT_INSTALL; Remove debug prints
SharafMohamed Sep 23, 2024
f580cdd
Copy GSL when fetching fmt
SharafMohamed Sep 23, 2024
c0dd3fe
Fix type in ftm include
SharafMohamed Sep 23, 2024
df12855
Cmake test
SharafMohamed Sep 23, 2024
be00753
Cmake test2
SharafMohamed Sep 23, 2024
2f47770
Cmake test3
SharafMohamed Sep 23, 2024
a15365b
Cmake test4
SharafMohamed Sep 23, 2024
239ff76
Cmake test5
SharafMohamed Sep 23, 2024
2826461
Fix cmake indentation
SharafMohamed Sep 23, 2024
db42bd8
Fix cmake indentation
SharafMohamed Sep 23, 2024
d5eeb5a
Remove space in cmake
SharafMohamed Sep 23, 2024
ce2fc76
Use std::move for set assignment
SharafMohamed Sep 25, 2024
9643138
Update docstrings; Use move and merge for sets.
SharafMohamed Sep 25, 2024
550961e
For serialize functions use const and make nullptr checks explicit; F…
SharafMohamed Sep 25, 2024
74c46ad
Update RegexAST docstring.
SharafMohamed Sep 25, 2024
29e0777
Return raw const* instead of unique_ptr
SharafMohamed Sep 25, 2024
cde19a6
Fix compiler errors in previous commit
SharafMohamed Sep 25, 2024
9837c10
Remove <format>
SharafMohamed Sep 25, 2024
dac2122
Replace bind with a lambda function
SharafMohamed Sep 25, 2024
aa4a4e4
Add fmt to clan-format header libraries; Run clang-format
SharafMohamed Sep 25, 2024
e00fead
Add description of new test
SharafMohamed Sep 25, 2024
63fd9da
Use GIT_SHALLOW ON for fmt in cmakelists
SharafMohamed Sep 25, 2024
a5eae39
Handle 32bit unicode in AST node serialize()
SharafMohamed Sep 26, 2024
7b73929
Remove irrelevent comment from clang-tidy (also fixes the previously …
SharafMohamed Sep 26, 2024
1c62d8c
Merge branch 'main' into tagged-ast
SharafMohamed Sep 26, 2024
51 changes: 38 additions & 13 deletions CMakeLists.txt
@@ -7,6 +7,10 @@ project(log_surgeon
LANGUAGES CXX
)

if (POLICY CMP0077)
cmake_policy(SET CMP0077 NEW)
endif()

if (CMAKE_PROJECT_NAME STREQUAL PROJECT_NAME AND BUILD_TESTING)
find_package(Catch2 3 REQUIRED)
include(Catch)
@@ -24,9 +28,27 @@ if (NOT Microsoft.GSL_FOUND)
GIT_TAG "v4.0.0"
GIT_SHALLOW ON
)
endif()

find_package(fmt 8.0.1 QUIET)
if (NOT fmt_FOUND)
FetchContent_Declare(fmt
GIT_REPOSITORY https://github.com/fmtlib/fmt
GIT_TAG "8.0.1"
)
endif()

# Declare the details of all fetched content before making them available.
if (NOT Microsoft.GSL_FOUND)
FetchContent_MakeAvailable(GSL)
endif()

if (NOT fmt_FOUND)
# Force fmt to generate install rules
set(FMT_INSTALL ON CACHE BOOL "Enable installation for fmt." FORCE)
FetchContent_MakeAvailable(fmt)
endif()

set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

if (NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
@@ -83,20 +105,13 @@ set(LCHIP_THIRD_PARTY_INCLUDE_DIR "${LCHIP_INSTALL_INCLUDE_DIR}/log_surgeon/thir

add_library(log_surgeon ${SOURCE_FILES})
add_library(log_surgeon::log_surgeon ALIAS log_surgeon)

if (Microsoft.GSL_FOUND)
target_link_libraries(log_surgeon
PUBLIC
Microsoft.GSL::GSL
)
endif()
target_include_directories(log_surgeon
PUBLIC
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/src>
$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>
PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}/src
)
if (NOT Microsoft.GSL_FOUND)
PUBLIC
Microsoft.GSL::GSL
)
else()
# Since the user doesn't have GSL installed, use the GSL headers directly.
# NOTE:
# - We can't link against the `Microsoft.GSL::GSL` target since that would require adding `GSL`
@@ -106,9 +121,19 @@ if (NOT Microsoft.GSL_FOUND)
PUBLIC
$<BUILD_INTERFACE:${GSL_SOURCE_DIR}/include>
$<INSTALL_INTERFACE:${LCHIP_THIRD_PARTY_INCLUDE_DIR}>
)
)
endif()

target_link_libraries(log_surgeon PUBLIC fmt::fmt)

target_include_directories(log_surgeon
PUBLIC
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/src>
$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>
PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}/src
)

target_compile_features(log_surgeon
PRIVATE cxx_std_20
)
24 changes: 13 additions & 11 deletions src/log_surgeon/SchemaParser.cpp
@@ -12,6 +12,8 @@
#include <log_surgeon/Lexer.hpp>
#include <log_surgeon/utils.hpp>

using ParserValueRegex = log_surgeon::ParserValue<std::unique_ptr<
log_surgeon::finite_automata::RegexAST<log_surgeon::finite_automata::RegexNFAByteState>>>;
using RegexASTByte
= log_surgeon::finite_automata::RegexAST<log_surgeon::finite_automata::RegexNFAByteState>;
using RegexASTGroupByte = log_surgeon::finite_automata::RegexASTGroup<
@@ -154,14 +156,22 @@ auto SchemaParser::existing_schema_rule(NonTerminal* m) -> unique_ptr<SchemaAST>
return schema_ast;
}

auto SchemaParser::regex_capture_rule(NonTerminal* m) -> std::unique_ptr<ParserAST> {
auto* r4 = dynamic_cast<IdentifierAST*>(m->non_terminal_cast(3)->get_parser_ast().get());
auto& r6 = m->non_terminal_cast(5)->get_parser_ast()->get<unique_ptr<RegexASTByte>>();
return std::make_unique<ParserValueRegex>(make_unique<RegexASTCaptureByte>(
r4->m_name,
std::move(r6),
m_capture_group_id_generator.assign_next_id()
));
}

static auto identity_rule_ParserASTSchema(NonTerminal* m) -> unique_ptr<SchemaAST> {
unique_ptr<ParserAST>& r1 = m->non_terminal_cast(0)->get_parser_ast();
std::unique_ptr<SchemaAST> schema_ast(dynamic_cast<SchemaAST*>(r1.release()));
return schema_ast;
}

using ParserValueRegex = ParserValue<unique_ptr<RegexASTByte>>;

static auto regex_identity_rule(NonTerminal* m) -> unique_ptr<ParserAST> {
return unique_ptr<ParserAST>(new ParserValueRegex(
std::move(m->non_terminal_cast(0)->get_parser_ast()->get<unique_ptr<RegexASTByte>>())
@@ -283,14 +293,6 @@ static auto regex_range_rule(NonTerminal* m) -> unique_ptr<ParserAST> {
);
}

static auto regex_capture_rule(NonTerminal* m) -> unique_ptr<ParserAST> {
auto* r4 = dynamic_cast<IdentifierAST*>(m->non_terminal_cast(3)->get_parser_ast().get());
auto& r6 = m->non_terminal_cast(5)->get_parser_ast()->get<unique_ptr<RegexASTByte>>();
return std::make_unique<ParserValueRegex>(
make_unique<RegexASTCaptureByte>(r4->m_name, std::move(r6))
);
}

static auto regex_middle_identity_rule(NonTerminal* m) -> unique_ptr<ParserAST> {
return unique_ptr<ParserAST>(new ParserValueRegex(
std::move(m->non_terminal_cast(1)->get_parser_ast()->get<unique_ptr<RegexASTByte>>())
@@ -603,7 +605,7 @@ void SchemaParser::add_productions() {
add_production(
"Literal",
{"Lparen", "QuestionMark", "Langle", "Identifier", "Rangle", "Regex", "Rparen"},
regex_capture_rule
std::bind(&SchemaParser::regex_capture_rule, this, std::placeholders::_1)
);
add_production("Literal", {"Lparen", "Regex", "Rparen"}, regex_middle_identity_rule);
for (auto const& [special_regex_char, special_regex_name] : m_special_regex_characters) {
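
The change to `add_production` above follows from `regex_capture_rule` becoming a member function: it now reads `m_capture_group_id_generator`, so the callable handed to the parser has to carry `this`. The sketch below illustrates that idea in isolation, under stated assumptions. `MiniParser`, `Rule`, `capture_rule`, `run_rule`, and `demo` are hypothetical names, not part of log_surgeon, and the `name#id` string is an invented stand-in for a real AST node. It registers the same member function twice, once with `std::bind` as in this diff and once with a capturing lambda, which is the form a later commit in this PR ("Replace bind with a lambda function") refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a semantic-rule signature; not the log_surgeon type.
using Rule = std::function<std::string(std::string const&)>;

class MiniParser {
public:
    MiniParser() {
        // Form used in this diff: bind `this` as the first argument.
        m_rules.emplace_back(
                std::bind(&MiniParser::capture_rule, this, std::placeholders::_1)
        );
        // Equivalent form: a lambda that captures `this`.
        m_rules.emplace_back([this](std::string const& name) { return capture_rule(name); });
    }

    // Both callables store `this`, so copying or moving the parser would leave
    // them pointing at the old object. Deleting copies also suppresses moves.
    MiniParser(MiniParser const&) = delete;
    auto operator=(MiniParser const&) -> MiniParser& = delete;

    auto run_rule(std::size_t index, std::string const& name) -> std::string {
        return m_rules.at(index)(name);
    }

private:
    // Needs per-parser state, which is why it cannot be a free function.
    auto capture_rule(std::string const& name) -> std::string {
        return name + "#" + std::to_string(m_next_id++);
    }

    std::vector<Rule> m_rules;
    uint32_t m_next_id{0};
};

// Both registered rules share the same counter because they share `this`.
inline void demo() {
    MiniParser parser;
    assert(parser.run_rule(0, "a") == "a#0");
    assert(parser.run_rule(1, "b") == "b#1");
    assert(parser.run_rule(0, "c") == "c#2");
}
```

The two registrations behave the same here; the lambda form tends to be easier to read and to step through in a debugger, which may be why the PR later switched to it.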
20 changes: 20 additions & 0 deletions src/log_surgeon/SchemaParser.hpp
@@ -6,6 +6,17 @@
#include <log_surgeon/LALR1Parser.hpp>

namespace log_surgeon {
/**
* Class for generating monotonically increasing integer IDs.
*/
class UniqueIdGenerator {
public:
[[nodiscard]] auto assign_next_id() -> uint32_t { return m_next_id++; }

private:
uint32_t m_next_id{0};
};

// ASTs used in SchemaParser AST
class SchemaAST : public ParserAST {
public:
@@ -100,6 +111,13 @@ class SchemaParser : public LALR1Parser<
*/
auto existing_schema_rule(NonTerminal* m) -> std::unique_ptr<SchemaAST>;

/**
* A semantic rule for regex capture groups that needs access to `m_capture_group_id_generator`.
* @param m
* @return A unique pointer to the parsed regex capture group.
*/
auto regex_capture_rule(NonTerminal* m) -> std::unique_ptr<ParserAST>;

/**
* After lexing half of the buffer, reads into that half of the buffer and
* changes variables accordingly
@@ -126,6 +144,8 @@ class SchemaParser : public LALR1Parser<
auto generate_schema_ast(Reader& reader) -> std::unique_ptr<SchemaAST>;

static inline std::unordered_map<char, std::string> m_special_regex_characters;

UniqueIdGenerator m_capture_group_id_generator;
};
} // namespace log_surgeon
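
As a quick illustration of how the `UniqueIdGenerator` added in this header behaves, here is a minimal sketch. The class body is copied from the diff, with the namespace dropped for brevity. `draw_ids` and `demo` are hypothetical helpers added only for the example. Each generator instance starts at 0 and hands out consecutive values, so IDs are unique per generator (here, per `SchemaParser`), not globally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copied from the diff: returns 0, 1, 2, ... in call order.
class UniqueIdGenerator {
public:
    [[nodiscard]] auto assign_next_id() -> uint32_t { return m_next_id++; }

private:
    uint32_t m_next_id{0};
};

// Hypothetical helper (not in the PR): draws `count` IDs from one generator.
auto draw_ids(UniqueIdGenerator& generator, std::size_t count) -> std::vector<uint32_t> {
    std::vector<uint32_t> ids;
    ids.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        ids.push_back(generator.assign_next_id());
    }
    return ids;
}

inline void demo() {
    UniqueIdGenerator generator;
    auto const ids = draw_ids(generator, 3);
    assert((ids == std::vector<uint32_t>{0, 1, 2}));
    // The counter keeps advancing across calls on the same instance.
    assert(generator.assign_next_id() == 3);
    // A second instance has its own counter and starts again at 0.
    UniqueIdGenerator other;
    assert(other.assign_next_id() == 0);
}
```

Note that the counter is a plain `uint32_t` with no synchronization, so this sketch (like the class in the diff) assumes IDs are assigned from a single thread.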
