Skip to content

Conversation

@dusantism-db
Copy link
Contributor

What changes were proposed in this pull request?

In this PR, support for REPEAT statement in SQL scripting is introduced.

Changes summary:

Grammar/parser changes

  • repeatStatement grammar rule
  • visitRepeatStatement rule visitor
  • RepeatStatement logical operetor

RepeatStatementExec execution node
Internal sates - Condition and Body
Iterator implementation - switch between body and condition until condition evaluates to true
SqlScriptingInterpreter - added logic to transform RepeatStatement logical operator to RepeatStatementExec execution node

Why are the changes needed?

This is a part of SQL Scripting introduced to Spark, REPEAT statement is a basic control flow construct in the SQL language.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New tests are introduced to all of the three scripting test suites: SqlScriptingParserSuite, SqlScriptingExecutionNodeSuite and SqlScriptingInterpreterSuite.

Was this patch authored or co-authored using generative AI tooling?

No

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to unify parts of the while and repeat? The only difference is starting state and reset() method. Does it make sense to do it for 2 loops only?

  • State enum is exactly the same
  • treeIterator is exactly the same
  • Constructor parameters are identical

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, treeIterator is not exactly the same because for while it iterates while condition is true, but for repeat it iterates while condition is false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I see it. We can always override some new abstraction like continueIteration(session, condition). From my side this is ok, I'm just thinking if there is need to avoid code duplication in cases like this. Let's wait to hear from other folks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good point, but let's aim to solve this in a follow-up PR to simplify things a bit in this PR, since logic is correct.

Copy link
Contributor

@davidm-db davidm-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor comments.

@dusantism-db dusantism-db force-pushed the sql-scripting-repeat-statement branch 2 times, most recently from 585dc0c to c3ef7a0 Compare September 5, 2024 16:23
Comment on lines +97 to +117
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The empty lines are a suggestion for better readability of this method

Comment on lines +292 to +295
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge this with the above case for WhileStatementContext? You have multiple options for this, using | operator or Either type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wouldn't be trivial, as there is no common supertype between RepeatStatementContext and WhileStatementContext with the beginLabel() method defined. Maybe we could have a labeledStatement grammar rule, or something similar, to abstract some of this label logic from all statements with labels (ifElse, while, repeat..)

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, fix indentations and it would be nice to add a little bit more negative tests.

@dusantism-db dusantism-db force-pushed the sql-scripting-repeat-statement branch from b572135 to e99566a Compare September 9, 2024 09:05
/**
* Executable node for RepeatStatement.
* @param condition Executable node for the condition.
* @param body Executable node for the body.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any contract/restriction for the body?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the type, not really. The body can be anything which extends CompoundBodyExec

curr = Some(condition)
condition.reset()
return retStmt
case _ =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which cases does this happen? Shall we raise an error here like unsupported or unexpected statement?

Copy link
Contributor Author

@dusantism-db dusantism-db Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This match checks if the returned statement is LEAVE or ITERATE, because in those cases it should return early. If it's not LEAVE or ITERATE, then the method will continue normally. Because we only check if we should return early, we shouldn't throw any error. Similar logic exists in WhileStatementExec.

@dusantism-db dusantism-db requested a review from MaxGekk September 9, 2024 14:19
@MaxGekk
Copy link
Member

MaxGekk commented Sep 11, 2024

+1, LGTM. Merging to master.
Thank you, @dusantism-db and @miland-db @davidm-db for review.

@MaxGekk MaxGekk closed this in da89322 Sep 11, 2024
@MaxGekk
Copy link
Member

MaxGekk commented Sep 11, 2024

@dusantism-db Congratulations with your first contribution to Apache Spark!

@dusantism-db
Copy link
Contributor Author

@dusantism-db Congratulations with your first contribution to Apache Spark!

Thanks for your help, Max!

MaxGekk added a commit that referenced this pull request Sep 11, 2024
…Class` in `SqlScriptingInterpreterSuite`

### What changes were proposed in this pull request?
In the PR, I propose to replace `errorClass` by `condition` in `SqlScriptingInterpreterSuite`

### Why are the changes needed?
The changes from the PR #47756 conflict to #48027

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the modified test:
```
$ build/sbt "test:testOnly *SqlScriptingInterpreterSuite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48072 from MaxGekk/fix-errorClass-REPEAT.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants