
feat: add fuzz testing for parser and handle escaped colons #10

Merged
merged 3 commits into main from test/fuzz on Nov 29, 2024

Conversation

@laojianzi (Owner) commented Nov 29, 2024

Summary by Sourcery

Enhance the parser's robustness by introducing fuzz testing and improving the lexer to handle escaped characters more accurately. Add new test cases to cover edge cases and ensure the parser's stability against malformed input.

New Features:

  • Introduce fuzz testing for the parser to ensure robustness against unexpected input.

Enhancements:

  • Improve the lexer to handle escaped colons correctly and refine token collection logic.

Tests:

  • Add a new test suite for fuzz testing the parser, including a variety of seed inputs to test different query formats.
  • Add specific test cases to handle edge cases in parsing, such as unexpected tokens and escaped characters.

sourcery-ai bot commented Nov 29, 2024

Reviewer's Guide by Sourcery

This PR adds fuzzing tests to the parser package and fixes several bugs discovered through fuzzing. The changes include improvements to token handling in the lexer, parser modifications for better operator handling, and new test cases for previously uncovered edge cases.

Sequence diagram for fuzzing test process

sequenceDiagram
    actor Developer
    participant FuzzParser
    participant Parser
    participant Lexer
    Developer->>FuzzParser: Add seed queries
    FuzzParser->>Parser: Parse query
    Parser->>Lexer: Tokenize query
    Lexer-->>Parser: Return tokens
    Parser-->>FuzzParser: Return statement
    FuzzParser->>Parser: Parse statement.String()
    Parser->>Lexer: Tokenize statement
    Lexer-->>Parser: Return tokens
    Parser-->>FuzzParser: Return statement
    FuzzParser-->>Developer: Assert no errors and equality

Class diagram for parser and lexer changes

classDiagram
    class Parser {
        +Stmt() ast.Expr
    }
    class Lexer {
        +shouldBreak(i int, isString, withEscape bool, endChar rune) bool
        +collectNextToken(start int) string
    }
    class TestParserFuzzBugs {
        +TestParserFuzzBugs(t *testing.T)
    }
    class FuzzParser {
        +FuzzParser(f *testing.F)
    }
    Parser --> Lexer
    TestParserFuzzBugs --> Parser
    FuzzParser --> Parser
    note for Lexer "Improved token handling logic"
    note for Parser "Modified operator handling logic"

File-Level Changes

Change Details Files
Added fuzzing infrastructure and test cases
  • Implemented a new FuzzParser function with comprehensive seed inputs
  • Added basic safety checks for parsing, including no-panic verification
  • Included validation for String() output stability and re-parsing capability
parser/fuzz_test.go
parser/testdata/fuzz/FuzzParser/45eecbed51a417f5
parser/testdata/fuzz/FuzzParser/74ba8e9a0ce951ee
Enhanced lexer token handling
  • Improved character peeking logic for better escape sequence handling
  • Added special handling for colon characters in non-string contexts
  • Modified token collection break conditions
parser/lexer.go
Updated parser operator validation
  • Added additional check for field token kind in binary expression parsing
parser/parser.go
Added regression tests for fuzzing-discovered bugs
  • Created test cases for edge cases with escaped AND keywords
  • Added tests for various colon placement scenarios
  • Included error case validation for invalid syntax
parser/parser_test.go

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time. You can also use
    this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.


@sourcery-ai sourcery-ai bot changed the title @sourcery-ai feat: add fuzz testing for parser and handle escaped colons Nov 29, 2024
deepsource-io bot (Contributor) commented Nov 29, 2024

Here's the code health analysis summary for commits 733963f..d9796a1. View details on DeepSource ↗.

Analysis Summary

| Analyzer | Status | Summary | Link |
| --- | --- | --- | --- |
| Go | ✅ Success | | View Check ↗ |
| Test coverage | ✅ Success | ❗ 1 occurrence introduced | View Check ↗ |

Code Coverage Report

| Metric | Aggregate | Go |
| --- | --- | --- |
| Composite Coverage | 84.9% (down 0.4% from main) | 84.9% (down 0.4% from main) |
| Line Coverage | 84.9% (down 0.4% from main) | 84.9% (down 0.4% from main) |
| New Composite Coverage | 66.7% | 66.7% |
| New Line Coverage | 66.7% | 66.7% |

💡 If you’re a repository administrator, you can configure the quality gates from the settings.


@sourcery-ai sourcery-ai bot left a comment


Hey @laojianzi - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.


@llamapreview llamapreview bot left a comment


Auto Pull Request Review from LlamaPReview

1. Overview

1.1 PR Summary

  • Business value and requirements alignment: The PR enhances the parser's robustness by introducing fuzz testing and improving the lexer to handle escaped characters more accurately. This aligns with the business requirement to ensure the parser can handle a wider range of inputs, including edge cases and malformed queries.
  • Key components modified: The PR primarily modifies the parser, lexer, and test files.
  • Impact assessment: The improvements will make the parser more robust, accurate, and reliable, benefiting users by improving the overall reliability of the KQL parser.

1.2 Architecture Changes

  • System design modifications: Introduction of fuzz testing to the parser package.
  • Component interactions: The lexer's improved handling of escaped characters will directly impact the parser's ability to process queries accurately.
  • Integration points: The new fuzz testing infrastructure will integrate with the existing testing framework.
  • Dependency changes and implications: No new dependencies are introduced, but the testing suite is expanded with fuzz testing.

2. Detailed Technical Analysis

2.1 Code Logic Deep-Dive

parser/fuzz_test.go - FuzzParser

  • Submitted PR Code:
    func FuzzParser(f *testing.F) {
    	// Add initial corpus
    	seeds := []string{
    		"field:value",
    		"field: value",
    		"field : value",
    		`field: "value"`,
    		"field: *",
    		"field: value*",
    		"field: *value",
    		"field: *value*",
    		"field > 10",
    		"field >= 10",
    		"field < 10",
    		"field <= 10",
    		"field: true",
    		"field: false",
    		"field: null",
    		"field1: value1 AND field2: value2",
    		"field1: value1 OR field2: value2",
    		"NOT field: value",
    		"(field: value)",
    		"(field1: value1) AND (field2: value2)",
    		`field: "value with spaces"`,
    		`field: "value with \"escaped\" quotes"`,
    		`field: "value with
     newline"`,
    		"field1: value1 AND field2: value2 OR field3: value3",
    		"field1: (value1 OR value2) AND field2: value3",
    	}
    
    	for _, seed := range seeds {
    		f.Add(seed)
    	}
    
    	f.Fuzz(func(t *testing.T, query string) {
    		if strings.TrimSpace(query) == "" {
    			return
    		}
    
    		// TODO: Current fuzzing implementation has limitations in input/output validation.
    		// This test only covers basic safety checks:
    		// 1. No panics during parsing
    		// 2. String() output can be re-parsed
    		// 3. String() output remains stable
    		//
    		// Contributions welcome for better validation approaches :)
    		stmt, err := parser.New(query).Stmt()
    		if err != nil || stmt == nil {
    			return
    		}
    
    		stmt2, err := parser.New(stmt.String()).Stmt()
    		assert.NoError(t, err)
    		assert.Equal(t, stmt.String(), stmt2.String())
    	})
    }
    • Analysis:
      • Current logic and potential issues: The fuzz testing function adds a variety of seed inputs to test different query formats. The current implementation has basic safety checks but lacks comprehensive input/output validation.
      • Edge cases and error handling: The function handles empty queries by returning early. However, it does not cover all edge cases, such as malformed queries that might cause panics.
      • Cross-component impact: The fuzz testing will help identify issues in the parser and lexer, improving overall robustness.
      • Business logic considerations: Ensures the parser can handle a wide range of inputs, including edge cases and malformed queries.
    • LlamaPReview Suggested Improvements:
      // Enhance the TODO comment with specific suggestions for better validation
      // TODO: Current fuzzing implementation has limitations in input/output validation.
      // Suggestions:
      // 1. Implement a more robust validation mechanism to check the semantic correctness of the parsed statements.
      // 2. Add checks for known edge cases and malformed inputs.
      // 3. Consider using a validation library or framework to automate some of these checks.
    • Improvement rationale:
      • Technical benefits: Enhances the robustness of the fuzz testing by covering more edge cases and validating the semantic correctness of the parsed statements.
      • Business value: Improves the reliability of the parser, ensuring it can handle a wider range of inputs.
      • Risk assessment: Low risk, as the suggestions are mainly about enhancing the validation mechanism without changing the core logic.

parser/lexer.go - shouldBreak

  • Submitted PR Code:
    func (l *defaultLexer) shouldBreak(i int, isString, withEscape bool, endChar rune) bool {
    	ch := l.peek(i)
    	if isString && !withEscape && ch == endChar {
    		return true
    	}
    
    	if !isString && !withEscape {
    		if unicode.IsSpace(ch) || ch == ')' || ch == ':' {
    			return true
    		}
    	}
    
    	// not \:
    	if !isString && withEscape && ch == ':' && (!l.peekOk(i-1) || l.peek(i-1) != '\\') {
    		return true
    	}
    
    	return false
    }
    • Analysis:
      • Current logic and potential issues: The function checks if token collection should stop based on various conditions. The addition of handling escaped colons improves accuracy but might miss some edge cases.
      • Edge cases and error handling: The function handles escaped colons correctly but does not cover all possible escape sequences.
      • Cross-component impact: Improvements in the lexer will directly affect the parser's ability to handle queries accurately.
      • Business logic considerations: Ensures the lexer can handle escaped characters correctly, improving the parser's robustness.
    • LlamaPReview Suggested Improvements:
      // Add a comment explaining the purpose of the new condition
      // Handle escaped colons correctly to prevent premature token breaking
      if !isString && withEscape && ch == ':' && (!l.peekOk(i-1) || l.peek(i-1) != '\\') {
      	return true
      }
    • Improvement rationale:
      • Technical benefits: Clarifies the purpose of the new condition, making the code more maintainable.
      • Business value: Ensures the lexer handles escaped colons correctly, improving the parser's accuracy.
      • Risk assessment: Low risk, as the suggestion is about adding a comment for better code understanding.
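To see the new break rule in isolation, here is a self-contained sketch of the same decision. It is illustrative only: the real code is a method on the unexported defaultLexer using peek/peekOk, whereas shouldBreakAt, the rune-slice input, and the example query `foo\:bar: 1` are stand-ins, and the in-string endChar case is omitted.

```go
package main

import (
	"fmt"
	"unicode"
)

// shouldBreakAt is a standalone stand-in for the lexer method quoted above,
// reproducing only the non-string branches: an unescaped ':' ends the current
// token, while a colon preceded by '\' is kept as part of the token.
func shouldBreakAt(input []rune, i int, isString, withEscape bool) bool {
	ch := input[i]
	if !isString && !withEscape {
		return unicode.IsSpace(ch) || ch == ')' || ch == ':'
	}
	// not \: -- a colon NOT preceded by a backslash breaks the token
	if !isString && withEscape && ch == ':' && (i == 0 || input[i-1] != '\\') {
		return true
	}
	return false
}

func main() {
	query := []rune(`foo\:bar: 1`)
	// Index 4 is the colon right after the backslash: escaped, token continues.
	fmt.Println(shouldBreakAt(query, 4, false, true)) // false
	// Index 8 is the bare colon after "bar": unescaped, token ends here.
	fmt.Println(shouldBreakAt(query, 8, false, true)) // true
}
```

Under this rule, the whole of `foo\:bar` would be collected as a single field token before the real separator colon is reached.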

parser/parser.go - parseLiteral

  • Submitted PR Code:
    expr, err := p.parseLiteral()
    if err != nil {
    	return nil, err
    }
    
    if !hasNot {
    	pos = expr.Pos()
    }
    
    op := p.lexer.Token.Kind
    if !op.IsOperator() || !p.lexer.lastTokenKind.IsField() {
    	return ast.NewBinaryExpr(pos, "", 0, expr, hasNot), nil
    }
    • Analysis:
      • Current logic and potential issues: The parser checks if the current token is an operator and if the last token is a field. This ensures that only valid binary expressions are parsed. However, it does not handle cases where the operator is valid but the field is not.
      • Edge cases and error handling: The current implementation might miss edge cases where the operator is valid but the field is not, leading to incorrect parsing.
      • Cross-component impact: This change affects the parser's ability to handle queries accurately, impacting the overall robustness of the system.
      • Business logic considerations: Ensures that only valid binary expressions are parsed, improving the accuracy of query processing.
    • LlamaPReview Suggested Improvements:
      // Add a comment explaining the purpose of the new condition
      // Ensure that the operator is valid and the last token is a field
      if !op.IsOperator() || !p.lexer.lastTokenKind.IsField() {
      	return ast.NewBinaryExpr(pos, "", 0, expr, hasNot), nil
      }
    • Improvement rationale:
      • Technical benefits: Clarifies the purpose of the new condition, making the code more maintainable.
      • Business value: Ensures that only valid binary expressions are parsed, improving the accuracy of query processing.
      • Risk assessment: Low risk, as the suggestion is about adding a comment for better code understanding.
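As a small illustration of what that guard buys, the sketch below reproduces the check with stand-in token kinds. Everything here is a hypothetical simplification: kind, kindField, kindInt, and kindOperatorGtr stand in for the repo's token.Kind values, and only the two predicates quoted above are modeled.

```go
package main

import "fmt"

// kind is a hypothetical stand-in for the repo's token kinds; only what the
// guard needs is modeled.
type kind int

const (
	kindField kind = iota
	kindInt
	kindOperatorGtr
)

func (k kind) IsOperator() bool { return k == kindOperatorGtr }
func (k kind) IsField() bool    { return k == kindField }

// formsBinaryExpr mirrors the guard quoted above: a comparison operator only
// continues into a binary expression when the token before it was a field.
func formsBinaryExpr(op, lastTokenKind kind) bool {
	return op.IsOperator() && lastTokenKind.IsField()
}

func main() {
	// `field > 10`: the `>` follows a field token, so build a binary expression.
	fmt.Println(formsBinaryExpr(kindOperatorGtr, kindField)) // true
	// `10 > 20`: the `>` follows an int literal, so fall back to a bare literal.
	fmt.Println(formsBinaryExpr(kindOperatorGtr, kindInt)) // false
}
```

Without the second half of the guard, an operator following a non-field literal would be swallowed into a malformed binary expression instead of being rejected or treated as a literal.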

parser/parser_test.go - TestParserFuzzBugs

  • Submitted PR Code:
    func TestParserFuzzBugs(t *testing.T) {
    	tests := []struct {
    		query   string
    		want    ast.Expr
    		wantErr error
    	}{
    		{
    			"0 :0",
    			nil,
    			errors.New("line 0:2 expected keyword OR|AND|NOT, but got \":\"\n\n0 :0\n^\n"),
    		},
    		{
    			`\AND :0`,
    			ast.NewBinaryExpr(0, `\AND`, token.TokenKindOperatorEql, ast.NewLiteral(
    				6, 7, token.TokenKindInt, "0", nil,
    			), false),
    			nil,
    		},
    		{
    			`\AND: 0`,
    			ast.NewBinaryExpr(0, `\AND`, token.TokenKindOperatorEql, ast.NewLiteral(
    				6, 7, token.TokenKindInt, "0", nil,
    			), false),
    			nil,
    		},
    	}
    
    	for _, tt := range tests {
    		t.Run(tt.query, func(t *testing.T) {
    			stmt, err := parser.New(tt.query).Stmt()
    
    			if tt.wantErr == nil {
    				assert.NoError(t, err)
    			} else {
    				assert.EqualError(t, err, tt.wantErr.Error())
    			}
    
    			assert.EqualValues(t, tt.want, stmt)
    		})
    	}
    }

- **Analysis**:
  * **Current logic and potential issues**: The test cases cover various edge cases, including escaped AND keywords and colon placement scenarios. However, they do not cover all possible edge cases, such as malformed queries with invalid escape sequences.
  * **Edge cases and error handling**: The current implementation handles specific edge cases but might miss others, leading to incomplete test coverage.
  * **Cross-component impact**: These test cases help identify issues in the parser and lexer, improving overall robustness.
  * **Business logic considerations**: Ensures the parser can handle a wide range of inputs, including edge cases and malformed queries.
- **LlamaPReview Suggested Improvements**:
  ```go
  // Add more test cases to cover additional edge cases
  {
  	"invalid\\escape: 0",
  	nil,
  	errors.New("line 0:1 expected keyword OR|AND|NOT, but got \"invalid\\escape\"\ninvalid\\escape: 0\n^\n"),
  },
  {
  	"field:\\invalid",
  	nil,
  	errors.New("line 0:7 expected keyword OR|AND|NOT, but got \"\\invalid\"\nfield:\\invalid\n     ^\n"),
  },
  ```
- **Improvement rationale**:
  * **Technical benefits**: Enhances test coverage by including more edge cases, ensuring the parser can handle a wider range of inputs.
  * **Business value**: Improves the reliability of the parser, ensuring it can handle various edge cases and malformed queries.
  * **Risk assessment**: Low risk, as the suggestions are about adding more test cases for better coverage.

### 2.2 Implementation Quality
- **Code Structure**:
  * **Organization and modularity**: The code is well organized, with clear separation of concerns between the lexer, parser, and tests.
  * **Design pattern adherence**: The code follows Go idioms and best practices, such as using interfaces and structs for modularity.
  * **Reusability aspects**: The fuzz testing infrastructure can be reused for other components, enhancing overall test coverage.
  * **Maintainability factors**: The code is maintainable, with clear comments and a logical structure.

- **Error Handling**:
  * **Exception scenarios coverage**: The fuzz testing covers basic safety checks but lacks comprehensive input/output validation.
  * **Recovery mechanisms**: The code handles errors gracefully, returning early on invalid inputs.
  * **Logging and monitoring**: The code relies on test assertions; more detailed logging would help with monitoring.
  * **User experience impact**: Improved error handling makes the parser more robust for users.

- **Performance Considerations**:
  * **Resource utilization**: Fuzz testing may increase resource utilization due to the large number of generated inputs.
  * **Scalability aspects**: The lexer and parser improvements enhance scalability by handling more complex queries.
  * **Bottleneck analysis**: The current implementation might have bottlenecks when handling large inputs, which fuzz testing will help identify.
  * **Optimization opportunities**: Optimize the fuzz testing to cover more edge cases without significantly increasing resource utilization.

## 3. Critical Findings
<details>

### 3.1 Potential Issues
- **Issue**: The fuzz testing lacks comprehensive input/output validation, which might leave parser bugs undetected.
- **Impact**:
  * **Technical implications**: Potential undetected bugs in the parser that could lead to incorrect query processing.
  * **Business consequences**: Decreased reliability of the parser, affecting user trust.
  * **User experience effects**: Users might encounter unexpected behavior or errors when using the parser.
- **Suggested Resolution**:
  * **Technical changes**: Enhance the fuzz testing to include more comprehensive input/output validation.
  * **Configuration updates**: Update the fuzz testing configuration to cover more edge cases.
  * **Testing requirements**: Add more test cases to validate the semantic correctness of the parsed statements.

- **Issue**: The lexer's handling of escaped characters might miss some edge cases, leading to incorrect token collection.
- **Impact**:
  * **Performance implications**: Incorrect token collection might cause performance issues in query processing.
  * **Maintenance overhead**: Increased maintenance overhead due to potential bugs in the lexer.
  * **Future scalability**: Limits the parser's ability to handle complex queries.
- **Suggested Resolution**:
  * **Technical changes**: Enhance the lexer to handle more escape sequences accurately.
  * **Migration strategy**: Gradually introduce more comprehensive escape sequence handling in the lexer.
  * **Testing requirements**: Add test cases to cover the new escape sequences.

- **Issue**: The parser does not handle cases where the operator is valid but the field is not, leading to incorrect parsing.
- **Impact**:
  * **Accuracy implications**: Incorrect parsing might produce wrong query results, affecting the accuracy of data retrieval.
  * **User trust**: Decreased user trust in the parser due to incorrect query results.
  * **Maintenance burden**: Increased maintenance burden due to potential bugs in the parser.
- **Suggested Resolution**:
  * **Technical changes**: Enhance the parser to handle cases where the operator is valid but the field is not.
  * **Configuration updates**: Update the parser configuration to cover more edge cases.
  * **Testing requirements**: Add test cases to validate the handling of these edge cases.

- **Issue**: The test cases in `parser/parser_test.go` do not cover all possible edge cases, leaving test coverage incomplete.
- **Impact**:
  * **Robustness implications**: Incomplete test coverage might leave bugs in the parser undetected, affecting its robustness.
  * **User experience**: Users might encounter unexpected behavior or errors due to undetected bugs.
  * **Maintenance overhead**: Increased maintenance overhead due to potential bugs in the parser.
- **Suggested Resolution**:
  * **Technical changes**: Add more test cases covering additional edge cases.
  * **Testing requirements**: Update the test suite to include more comprehensive test cases.

</details>

## 4. Security Assessment
<details>

- **Authentication/Authorization impacts**: No significant security considerations in the current changes.
- **Data handling concerns**: No significant security considerations in the current changes.
- **Input validation**: The current fuzz testing implementation lacks comprehensive input/output validation, which might lead to undetected issues in the parser.
- **Security best practices**: Follow best practices for input validation and error handling to ensure the robustness of the parser.
- **Potential security risks**: The lack of comprehensive input/output validation might lead to undetected bugs in the parser, affecting its robustness and reliability.
- **Mitigation strategies**: Enhance the fuzz testing to include more comprehensive input/output validation.
- **Security testing requirements**: Add more test cases to validate the semantic correctness of the parsed statements.

</details>

## 5. Testing Strategy
<details>

- **Test Coverage**:
* **Unit test requirements**: Ensure comprehensive unit tests for the lexer and parser, covering all edge cases.
* **Integration test scenarios**: Add integration tests to validate the interaction between the lexer and parser.
* **Edge case validation**: The new test cases cover various edge cases, but more can be added for better coverage.
- **Test Recommendations**:
* **Suggested Test Cases**:
  ```go
  // Add more test cases to cover additional edge cases
  {
  	"invalid\\escape: 0",
  	nil,
  	errors.New("line 0:1 expected keyword OR|AND|NOT, but got \"invalid\\escape\"\ninvalid\\escape: 0\n^\n"),
  },
  {
  	"field:\\invalid",
  	nil,
  	errors.New("line 0:7 expected keyword OR|AND|NOT, but got \"\\invalid\"\nfield:\\invalid\n     ^\n"),
  },
  ```
* **Coverage improvements**: Ensure all critical paths in the lexer and parser are covered by tests.
* **Performance testing needs**: Monitor the performance impact of fuzz testing and optimize as needed.

</details>

## 6. Documentation & Maintenance
<details>

- **Documentation updates needed**: Update the documentation to reflect the new fuzz testing infrastructure and improvements in the lexer and parser.
- **Long-term maintenance considerations**: Ensure the code is well-documented and maintainable, with clear comments and logical structure.
- **Technical debt and monitoring requirements**: Monitor the technical debt and address any potential issues in the lexer and parser.

</details>

## 7. Deployment & Operations
<details>

- **Deployment impact and strategy**: The changes should be thoroughly tested before deployment to ensure they do not introduce new issues.
- **Key operational considerations**: Monitor the performance impact of fuzz testing and optimize as needed.

</details>

## 8. Summary & Recommendations
### 8.1 Key Action Items
1. **Enhance the fuzz testing** to include more comprehensive input/output validation.
2. **Update the lexer** to handle more escape sequences accurately.
3. **Improve the parser** to handle cases where the operator is valid but the field is not.
4. **Add more test cases** to cover additional edge cases and ensure comprehensive test coverage.

### 8.2 Future Considerations
- **Technical evolution path**: Continuously enhance the fuzz testing infrastructure and improve the lexer and parser to handle more complex queries.
- **Business capability evolution**: Ensure the parser can handle a wider range of inputs, including edge cases and malformed queries, to improve overall reliability.
- **System integration impacts**: Monitor the integration of the new fuzz testing infrastructure with the existing testing framework.

---
💡 **LlamaPReview Community**
Have feedback on this AI Code review tool? Join our [GitHub Discussions](https://github.com/JetXu-LLM/LlamaPReview-site/discussions) to share your thoughts and help shape the future of LlamaPReview.

@laojianzi laojianzi merged commit 68d76a1 into main Nov 29, 2024
4 checks passed
@laojianzi laojianzi deleted the test/fuzz branch November 29, 2024 08:22