Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Remove redundant typenames and use standard naming for typenames. #58

Open
wants to merge 387 commits into
base: main
Choose a base branch
from

Conversation

SharafMohamed
Copy link
Contributor

@SharafMohamed SharafMohamed commented Dec 5, 2024

References

  • Depends on PR#57.
  • To review in parallel with PR#57, diff against PR#57 locally. In the repo run:
git fetch upstream pull/57/head:pr-57
git fetch upstream pull/58/head:pr-58
git diff pr-57 pr-58

Description

  • Remove redundant typenames.
  • Rename stateType to state_type.

Validation performed

Previously existing tests still succeed.

Summary by CodeRabbit

  • Documentation
    • Improved comments for clarity in various classes and methods.
  • Chores
    • Removed unnecessary include directives to simplify dependencies.
    • Added a TODO comment for future refactoring of the RegexNFA class.

SharafMohamed and others added 30 commits October 27, 2024 20:05
…call succeeds in NFA's serialize.

Co-authored-by: Lin Zhihao <[email protected]>
…on classes when they are initialized in their constructor.
…egativeTaggedTransition classes into their own files.
…just an id. This object is created and owned by the capture AST, and other AST and NFA states point to these tags.
…Should fix this so the ordering is determinstic.
…ename tag members to make it clear they are positions.
@SharafMohamed SharafMohamed changed the title refactor: Remove redundant typenames. refactor: Remove redundant typenames. Dec 7, 2024
@SharafMohamed SharafMohamed changed the title refactor: Remove redundant typenames. refactor: Update typenames to remove redundancy and follow naming standards. Dec 8, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
src/log_surgeon/finite_automata/RegexDFAState.hpp (2)

37-37: Prefer false == expression over !expression

According to the coding guidelines, prefer using false == expression rather than !expression. In line 37, consider changing !m_matching_variable_ids.empty() to false == m_matching_variable_ids.empty().


68-68: Prefer false == expression over !expression

As per the coding guidelines, it's recommended to use false == expression instead of !expression. In line 68, please consider changing !result->empty() to false == result->empty().

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 99b5b08 and a4a93b4.

📒 Files selected for processing (10)
  • CMakeLists.txt (1 hunks)
  • examples/intersect-test.cpp (1 hunks)
  • src/log_surgeon/LALR1Parser.cpp (1 hunks)
  • src/log_surgeon/finite_automata/RegexDFA.hpp (1 hunks)
  • src/log_surgeon/finite_automata/RegexDFA.tpp (0 hunks)
  • src/log_surgeon/finite_automata/RegexDFAState.hpp (1 hunks)
  • src/log_surgeon/finite_automata/RegexDFAStatePair.hpp (1 hunks)
  • src/log_surgeon/finite_automata/RegexDFAStateType.hpp (1 hunks)
  • src/log_surgeon/finite_automata/RegexNFA.hpp (0 hunks)
  • src/log_surgeon/finite_automata/RegexNFAState.hpp (1 hunks)
💤 Files with no reviewable changes (2)
  • src/log_surgeon/finite_automata/RegexNFA.hpp
  • src/log_surgeon/finite_automata/RegexDFA.tpp
✅ Files skipped from review due to trivial changes (1)
  • src/log_surgeon/finite_automata/RegexNFAState.hpp
🧰 Additional context used
📓 Path-based instructions (6)
examples/intersect-test.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/finite_automata/RegexDFAStateType.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/LALR1Parser.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/finite_automata/RegexDFA.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/finite_automata/RegexDFAStatePair.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/finite_automata/RegexDFAState.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (4)
src/log_surgeon/finite_automata/RegexDFAStateType.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
CMakeLists.txt (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
src/log_surgeon/finite_automata/RegexDFA.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/RegexDFAState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
🔇 Additional comments (7)
src/log_surgeon/finite_automata/RegexDFA.hpp (2)

40-52: Efficient implementation of new_state method

The updated new_state method efficiently creates new DFA states and correctly handles matching variable IDs. The use of std::make_unique ensures proper memory management.


54-72: Well-structured get_intersect method

The revised get_intersect method is well-organized and effectively computes the intersection of DFAs. The use of sets for tracking visited and unvisited pairs enhances readability and performance.

src/log_surgeon/finite_automata/RegexDFAStatePair.hpp (1)

1-84: Clear and logical implementation of RegexDFAStatePair

The RegexDFAStatePair class is well-designed, with appropriate methods for managing DFA state pairs. The operator overloading and reachability logic are correctly implemented.

src/log_surgeon/finite_automata/RegexDFAStateType.hpp (1)

1-13: Correct definition of RegexDFAStateType enumeration

The RegexDFAStateType enum is properly defined and will facilitate state specialization based on type. This enhances code clarity and maintainability.

src/log_surgeon/LALR1Parser.cpp (1)

11-14: LGTM: Constructor initialization is properly formatted

The initialization list follows good practices with clear member initialization.

examples/intersect-test.cpp (1)

45-45: LGTM: Proper usage of unique_ptr

The change to use dfa2.get() correctly passes the raw pointer while maintaining ownership semantics of the unique_ptr.

CMakeLists.txt (1)

100-102: LGTM: Improved organization of RegexDFA components

The separation into distinct header files (RegexDFAState.hpp, RegexDFAStatePair.hpp, and RegexDFAStateType.hpp) improves code organization and maintainability.

Let's verify that all necessary header files are included:

✅ Verification successful

All RegexDFA-related header files are properly included in CMakeLists.txt

The comparison shows that all header files in the finite_automata directory, including the newly added RegexDFA component files (RegexDFAState.hpp, RegexDFAStatePair.hpp, and RegexDFAStateType.hpp), are correctly listed in CMakeLists.txt. The file list is complete and up-to-date.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all necessary header files are included in CMakeLists.txt

# Check for any .hpp files in the finite_automata directory that might be missing
fd -e hpp -e tpp . src/log_surgeon/finite_automata/ | sort > /tmp/actual_files.txt
rg -o 'src/log_surgeon/finite_automata/[^[:space:]"]+\.(hpp|tpp)' CMakeLists.txt | sort > /tmp/listed_files.txt
diff /tmp/actual_files.txt /tmp/listed_files.txt

Length of output: 201


Script:

#!/bin/bash
# Let's try a different approach to list files
echo "=== Files in finite_automata directory ==="
fd . src/log_surgeon/finite_automata/ -t f

echo -e "\n=== Files listed in CMakeLists.txt ==="
rg "src/log_surgeon/finite_automata/" CMakeLists.txt

Length of output: 1841

@SharafMohamed SharafMohamed changed the title refactor: Update typenames to remove redundancy and follow naming standards. refactor: Remove redundant typenames and fix typename naming to follow standards. Dec 8, 2024
@SharafMohamed SharafMohamed changed the title refactor: Remove redundant typenames and fix typename naming to follow standards. refactor: Remove redundant typenames and use standard naming for typenames. Dec 8, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
src/log_surgeon/finite_automata/RegexDFAState.hpp (1)

62-64: Consider following the coding guideline for boolean expressions.

The condition RegexDFAStateType::Byte == state_type already follows the guideline of putting the constant on the left side of the comparison. However, for consistency, consider applying the same pattern throughout the method.

Apply this diff:

     if constexpr (RegexDFAStateType::Byte == state_type) {
         return m_bytes_transition[character];
     } else {
         if (character < cSizeOfByte) {
             return m_bytes_transition[character];
         }
         std::unique_ptr<std::vector<typename Tree::Data>> result
                 = m_tree_transitions.find(Interval(character, character));
         assert(result->size() <= 1);
-        if (false == result->empty()) {
+        if (true == result->empty()) {
+            return nullptr;
+        }
-            return result->front().m_value;
-        }
-        return nullptr;
+        return result->front().m_value;
     }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a4a93b4 and 827d39c.

📒 Files selected for processing (2)
  • src/log_surgeon/finite_automata/RegexDFAState.hpp (2 hunks)
  • src/log_surgeon/finite_automata/RegexDFAStatePair.hpp (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/log_surgeon/finite_automata/RegexDFAStatePair.hpp
🧰 Additional context used
📓 Path-based instructions (1)
src/log_surgeon/finite_automata/RegexDFAState.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
src/log_surgeon/finite_automata/RegexDFAState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (3)
src/log_surgeon/finite_automata/RegexDFAState.hpp (3)

22-22: LGTM! The template parameter renaming improves consistency.

The change from stateType to state_type follows standard C++ naming conventions, and the simplified type alias improves code readability.

Also applies to: 25-25


55-59: LGTM! Well-documented conditional type declaration.

The member variable declarations are clear and well-documented. The use of std::conditional_t for compile-time type selection is appropriate, and the comment effectively explains the empty tuple case.


43-43: LGTM! Method signatures are clear and consistent.

The simplified parameter types in method signatures improve readability while maintaining type safety.

Also applies to: 51-51

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is a comment for myself): since this PR has many different types of fixes, I think we need to have a multi-line commit message when we merge.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There're similar cases in LALR1Parser.hpp:

    [[nodiscard]] auto token_cast(uint32_t i) const -> Token* {
        assert(i < cSizeOfAllChildren);
        return &std::get<Token>(NonTerminal::m_all_children[m_children_start + i]);
    }

    /**
     * Return the ith child's (body of production) MatchedSymbol as a
     * NonTerminal. Note: only children are needed (and stored) for performing
     * semantic actions (for the AST)
     * @param i
     * @return NonTerminal*
     */
    [[nodiscard]] auto non_terminal_cast(uint32_t i) const -> NonTerminal* {
        assert(i < cSizeOfAllChildren);
        return &std::get<NonTerminal>(NonTerminal::m_all_children[m_children_start + i]);
    }

We could simplify NonTerminal::m_all_children to m_all_children

@@ -27,10 +27,10 @@ class RegexDFAStatePair {
m_state2(state2) {};

/**
* Used for ordering in a set by considering the states' addresses
* Used for ordering in a set by considering the states' addresses.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Used for ordering in a set by considering the states' addresses.

I think we can drop this description. Otherwise we need to make sure it matches the rule of doc string descriptions, sth like Orders two states required by std::set.

) const -> void {
// TODO: Handle UTF-8 (multi-byte transitions) as well
for (uint32_t i = 0; i < cSizeOfByte; i++) {
auto next_state1 = m_state1->next(i);
auto next_state2 = m_state2->next(i);
if (next_state1 != nullptr && next_state2 != nullptr) {
RegexDFAStatePair<DFAState> reachable_pair{next_state1, next_state2};
RegexDFAStatePair reachable_pair{next_state1, next_state2};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
RegexDFAStatePair reachable_pair{next_state1, next_state2};
RegexDFAStatePair const reachable_pair{next_state1, next_state2};

I understand this is out of the PR's scope, but I don't mind if we also include fixes like this
It's up to u whether to commit this change within this PR

Comment on lines +91 to +92
Add `dest_state` to `m_bytes_transitions` if all values in interval are a byte, otherwise add
`dest_state` to `m_tree_transitions`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Add `dest_state` to `m_bytes_transitions` if all values in interval are a byte, otherwise add
`dest_state` to `m_tree_transitions`.
* Add `dest_state` to `m_bytes_transitions` if all values in interval are a byte, otherwise add
* `dest_state` to `m_tree_transitions`.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants