[SPARK-2053][SQL] Add Catalyst expressions for CASE WHEN. #1055

concretevitamin · 2014-06-11T22:27:05Z

JIRA ticket: https://issues.apache.org/jira/browse/SPARK-2053

This PR adds support for two types of CASE statements present in Hive. The first type has the form `CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END`, with semantics like a chain of if statements. The second type has the form `CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END`, with semantics like a switch statement on key `a`. Both forms are implemented in `CaseWhen`.
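To make the two forms concrete, here is a minimal, language-neutral sketch in Python. It is not the Catalyst implementation (which is Scala): the names `case_when`, `case_key_when`, `col`, and `lit` are hypothetical, expressions are modeled as plain functions over a row dictionary, and the NULL handling (an unmatched CASE without ELSE yields NULL, and a NULL comparison never matches) is an assumption based on standard SQL behavior.

```python
from typing import Any, Callable, Dict, Optional, Sequence, Tuple

Row = Dict[str, Any]
Expr = Callable[[Row], Any]  # hypothetical model: an expression is a function of a row


def case_when(branches: Sequence[Tuple[Expr, Expr]],
              else_value: Optional[Expr] = None) -> Expr:
    """CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END: the first true condition wins."""
    def evaluate(row: Row) -> Any:
        for condition, value in branches:
            # Only a genuine True selects the branch; False and NULL (None) fall through.
            if condition(row) is True:
                return value(row)
        # Assumed: no matching branch and no ELSE yields NULL.
        return else_value(row) if else_value is not None else None
    return evaluate


def case_key_when(key: Expr,
                  branches: Sequence[Tuple[Expr, Expr]],
                  else_value: Optional[Expr] = None) -> Expr:
    """CASE a WHEN b THEN c ... END, rewritten as a chain of `a = b` tests."""
    def equals(candidate: Expr) -> Expr:
        def test(row: Row) -> Any:
            k, c = key(row), candidate(row)
            # Assumed SQL-style equality: comparing with NULL gives NULL, not True.
            return None if k is None or c is None else k == c
        return test
    return case_when([(equals(c), v) for c, v in branches], else_value)


def col(name: str) -> Expr:
    """Hypothetical helper: read a column from the row."""
    return lambda row: row[name]


def lit(value: Any) -> Expr:
    """Hypothetical helper: a constant expression."""
    return lambda row: value


# CASE WHEN x > 10 THEN 'big' WHEN x > 0 THEN 'small' ELSE 'non-positive' END
searched = case_when(
    [(lambda r: r["x"] > 10, lit("big")), (lambda r: r["x"] > 0, lit("small"))],
    lit("non-positive"),
)

# CASE x WHEN 1 THEN 'one' WHEN 2 THEN 'two' END
keyed = case_key_when(col("x"), [(lit(1), lit("one")), (lit(2), lit("two"))])
```

Rewriting the keyed form in terms of the searched form mirrors the "Translate CaseKeyWhen to CaseWhen at parsing time" commit in spirit only; the real translation operates on Catalyst expression trees rather than closures.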

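[This link](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-ConditionalFunctions) contains more detailed descriptions of their semantics.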

Notes / Open issues:

* Please check whether any implicit contracts / invariants are broken in the implementations (especially for the operators). I am not very familiar with them and currently find them tricky to spot.
* We should decide whether a non-boolean condition is allowed in a branch of `CaseWhen`. Hive throws a `SemanticException` in this situation and I think it'd be good to mimic it -- the question is where in the Spark SQL pipeline we should signal an exception for such a query.
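As a rough illustration of the second open issue, the sketch below rejects a CASE expression whose WHEN conditions are not all boolean before anything is evaluated. It is a Python sketch under stated assumptions, not Catalyst code: `Expr`, `AnalysisError`, and `check_case_when` are hypothetical names, data types are modeled as plain strings, and `all_cond_booleans` only borrows its name from the `allCondBooleans` check mentioned in the commit list. Where such a check belongs in the Spark SQL pipeline is exactly the question left open above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for a typed expression: its SQL text and its data type."""
    sql: str
    data_type: str


class AnalysisError(Exception):
    """Hypothetical error type, playing the role of Hive's SemanticException."""


def all_cond_booleans(branches: List[Tuple[Expr, Expr]]) -> bool:
    """True when every WHEN condition has boolean type."""
    return all(condition.data_type == "boolean" for condition, _ in branches)


def check_case_when(branches: List[Tuple[Expr, Expr]]) -> None:
    """Raise before evaluation if any WHEN condition is not boolean."""
    for condition, _ in branches:
        if condition.data_type != "boolean":
            raise AnalysisError(
                f"WHEN condition `{condition.sql}` has type "
                f"{condition.data_type}, expected boolean"
            )
```

Failing at analysis time rather than inside `eval()` means a malformed query is rejected once, up front, instead of once per row.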

Conflicts: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala

AmplabJenkins · 2014-06-11T22:32:05Z

Merged build triggered.

AmplabJenkins · 2014-06-11T22:32:14Z

Merged build started.

AmplabJenkins · 2014-06-11T22:33:07Z

Merged build finished.

AmplabJenkins · 2014-06-11T22:33:08Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15689/

AmplabJenkins · 2014-06-11T22:52:06Z

Merged build triggered.

AmplabJenkins · 2014-06-11T22:52:14Z

Merged build started.

AmplabJenkins · 2014-06-12T00:12:06Z

Merged build finished.

AmplabJenkins · 2014-06-12T00:12:06Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15691/

marmbrus · 2014-06-12T01:03:05Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala

How about:

"Returns the [[DataType]] of the result of evaluating this expression. It is invalid to query the dataType of an unresolved expression (i.e., when resolved == false)."

AmplabJenkins · 2014-06-13T21:22:15Z

Build started.

AmplabJenkins · 2014-06-13T22:43:45Z

Build finished.

AmplabJenkins · 2014-06-13T22:43:46Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15772/

concretevitamin · 2014-06-13T22:55:50Z

The latest build only contains some PySpark failures I think.

On Fri, Jun 13, 2014 at 3:43 PM, UCB AMPLab wrote:

> Build finished.


marmbrus · 2014-06-14T02:44:06Z

Yeah, the Python problems should be fixed now, though. I think the problem is that this PR no longer merges cleanly, so you aren't picking up the Python fixes done by @pwendell. You can tell the merge failed because Jenkins said "Build started." instead of "Merged build started."

Please rebase :)

chenghao-intel · 2014-06-15T03:44:14Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala

Besides being something that could be optimized, this introduces a bug in a corner case. If the first child expression is non-deterministic (e.g., the UDF rand()), computing it multiple times may produce different values, which is not the semantics we want here.
We probably need to create a new expression for wrapping common subexpressions. I am OK with leaving that for a future improvement, but can you add more documentation for this?

concretevitamin · 2014-06-16

Yeah, that is a very good point. I have updated the comment accordingly and added this JIRA to track it.
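The corner case raised in this thread can be reproduced with a small Python sketch. It is illustrative only: `naive_keyed_case`, `cached_keyed_case`, and `make_unstable_key` are hypothetical names, and a counter stands in for a non-deterministic key such as `rand()` so that the behavior is repeatable.

```python
import itertools
from typing import Any, Callable, Optional, Sequence, Tuple

Branches = Sequence[Tuple[Any, Any]]


def naive_keyed_case(key: Callable[[], Any], branches: Branches,
                     default: Optional[Any] = None) -> Any:
    """Expand CASE key WHEN v THEN r ... into `key = v` tests, re-evaluating the key each time."""
    for candidate, result in branches:
        if key() == candidate:  # the key expression is evaluated once per branch
            return result
    return default


def cached_keyed_case(key: Callable[[], Any], branches: Branches,
                      default: Optional[Any] = None) -> Any:
    """Evaluate the key once and compare the cached value against every branch."""
    value = key()
    for candidate, result in branches:
        if value == candidate:
            return result
    return default


def make_unstable_key() -> Callable[[], Any]:
    """Stand-in for a non-deterministic key: returns 1, 2, 3, ... on successive calls."""
    counter = itertools.count(1)
    return lambda: next(counter)
```

With branches `[(2, "two"), (1, "one")]`, the naive expansion compares 1 against 2 and then 2 against 1, so no branch matches even though the key's first value was 1. The cached version evaluates the key once, gets 1, and selects `"one"`. Wrapping the shared subexpression so that it is computed a single time is the kind of fix suggested in the review comment.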

AmplabJenkins · 2014-06-16T18:09:45Z

Build triggered.

AmplabJenkins · 2014-06-16T18:09:52Z

Build started.

AmplabJenkins · 2014-06-16T19:26:19Z

Build finished.

AmplabJenkins · 2014-06-16T19:26:20Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15820/

Conflicts: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala

AmplabJenkins · 2014-06-16T22:19:45Z

Merged build triggered.

AmplabJenkins · 2014-06-16T22:19:51Z

Merged build started.

AmplabJenkins · 2014-06-16T23:40:42Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-16T23:40:42Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15826/

JIRA ticket: https://issues.apache.org/jira/browse/SPARK-2053

This PR adds support for two types of CASE statements present in Hive. The first type is of the form `CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END`, with the semantics like a chain of if statements. The second type is of the form `CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END`, with the semantics like a switch statement on key `a`. Both forms are implemented in `CaseWhen`.

[This link](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-ConditionalFunctions) contains more detailed descriptions on their semantics.

Notes / Open issues:

* Please check if any implicit contracts / invariants are broken in the implementations (especially for the operators). I am not very familiar with them and I currently find them tricky to spot.
* We should decide whether or not a non-boolean condition is allowed in a branch of `CaseWhen`. Hive throws a `SemanticException` for this situation and I think it'd be good to mimic it -- the question is where in the whole Spark SQL pipeline should we signal an exception for such a query.

Author: Zongheng Yang

Closes #1055 from concretevitamin/caseWhen and squashes the following commits:

4226eb9 [Zongheng Yang] Comment.
79d26fc [Zongheng Yang] Merge branch 'master' into caseWhen
caf9383 [Zongheng Yang] Update a FIXME.
9d26ab8 [Zongheng Yang] Add @transient marker.
788a0d9 [Zongheng Yang] Implement CastNulls, which fixes udf_case and udf_when.
7ef284f [Zongheng Yang] Refactors: remove redundant passes, improve toString, mark transient.
f47ae7b [Zongheng Yang] Modify queries in tests to have shorter golden files.
1c1fbfc [Zongheng Yang] Cleanups per review comments.
7d2b7e2 [Zongheng Yang] Translate CaseKeyWhen to CaseWhen at parsing time.
47d406a [Zongheng Yang] Do toArray once and lazily outside of eval().
bb3d109 [Zongheng Yang] Update scaladoc of a method.
aea3195 [Zongheng Yang] Fix bug that branchesArr is not used; remove unused import.
96870a8 [Zongheng Yang] Turn off scalastyle for some comments.
7392f3a [Zongheng Yang] Minor cleanup.
2cf08bb [Zongheng Yang] Merge branch 'master' into caseWhen
9f84b40 [Zongheng Yang] Add golden outputs from Hive.
db51a85 [Zongheng Yang] Add allCondBooleans check; uncomment tests.
3f9ef0a [Zongheng Yang] Cleanups and bug fixes (mainly in eval() and resolved).
be54bc8 [Zongheng Yang] Rewrite eval() to a low-level implementation. Separate two CASE stmts.
f2bcb9d [Zongheng Yang] WIP
5906f75 [Zongheng Yang] WIP
efd019b [Zongheng Yang] eval() and toString() bug fixes.
7d81e95 [Zongheng Yang] Clean up resolved.
a31d782 [Zongheng Yang] Finish up Case.

(cherry picked from commit e243c5f)
Signed-off-by: Michael Armbrust

marmbrus · 2014-06-17T11:36:20Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala

This is minor so I went ahead and merged anyway, but I think it would be better to enumerate these certain cases instead of leaving it up to the reader to understand the code below.

concretevitamin · 2014-06-17

Thanks for the note!

marmbrus · 2014-06-17T11:36:45Z

Thanks! merged into master and 1.0


concretevitamin added 11 commits June 10, 2014 13:11

Finish up Case.

a31d782

Clean up resolved.

7d81e95

eval() and toString() bug fixes.

efd019b

WIP

5906f75

WIP

f2bcb9d

Rewrite eval() to a low-level implementation. Separate two CASE stmts.

be54bc8

Cleanups and bug fixes (mainly in eval() and resolved).

3f9ef0a

Add allCondBooleans check; uncomment tests.

db51a85

Add golden outputs from Hive.

9f84b40

Merge branch 'master' into caseWhen

2cf08bb

Conflicts: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala

Minor cleanup.

7392f3a

concretevitamin changed the title ~~SPARK-2053: add Catalyst expressions for CASE WHEN.~~ [SPARK-2053][SQL] Add Catalyst expressions for CASE WHEN. Jun 11, 2014

Turn off scalastyle for some comments.

96870a8

Fix bug that branchesArr is not used; remove unused import.

aea3195

marmbrus reviewed Jun 12, 2014

chenghao-intel reviewed Jun 15, 2014

Update a FIXME.

caf9383

concretevitamin added 2 commits June 16, 2014 15:14

Merge branch 'master' into caseWhen

79d26fc

Conflicts: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala

Comment.

4226eb9

asfgit closed this in e243c5f Jun 17, 2014

marmbrus reviewed Jun 17, 2014

concretevitamin deleted the caseWhen branch June 17, 2014 18:22

wangyum pushed a commit that referenced this pull request May 26, 2023

[CARMEL-6152] [Follow-up] Calculate more accurate information in the …

97b0ece

…view (#1055)

udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024

MapR [SPARK-1115] Remove code duplicates in configure.sh (apache#1055)

4e27f7a

mapr-devops pushed a commit to mapr/spark that referenced this pull request May 8, 2025

MapR [SPARK-1115] Remove code duplicates in configure.sh (apache#1055)

c1647c7


5 participants