-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-33527][SQL] Extend the function of decode so as consistent with mainstream databases #30479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Kubernetes integration test starting |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Kubernetes integration test status failure |
|
Test build #132427 has finished for PR 30479 at commit
|
|
Test build #132426 has finished for PR 30479 at commit
|
| def createExpr(params: Seq[Expression]): Expression = { | ||
| params.length match { | ||
| case 0 => StringDecode(Literal.create(null, StringType), Literal.create(null, StringType)) | ||
| case 1 => StringDecode(params.head, Literal.create(null, StringType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really allow 0 or 1 parameter before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's preserve the AnalysisException(s"Invalid number of arguments for function decode. ")
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Just a question; the other systems don't have a function like Spark decode/encode? |
| case 0 | 1 => | ||
| throw new AnalysisException("Invalid number of arguments for function decode. " + | ||
| s"Expected: 2; Found: ${params.length}") | ||
| case 2 => StringDecode(params.head, params.last) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, IMHO it would be good to rename the existing decode func or assign a new name into the new decode func because the generated doc for decode will be complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If rename the existing decode func will break the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, so I think we need add a legacy config, update the migration guide, ... if we rename it. Anyway, the latter approach (assigning a new name into the new decode func) looks better if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR could avoid migration and support the new decode as well. cc @cloud-fan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably rename the function in ansi mode.
| |_FUNC_(expr, search, result [, search, result ] ... [, default]) - Decode compares expr | ||
| | to each search value one by one. If expr is equal to a search, returns the corresponding result. | ||
| | If no match is found, then Oracle returns default. If default is omitted, returns null. | ||
| """, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you follow the other format? e.g.,
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
Lines 479 to 486 in c88edda
| usage = "_FUNC_(str, search[, replace]) - Replaces all occurrences of `search` with `replace`.", | |
| arguments = """ | |
| Arguments: | |
| * str - a string expression | |
| * search - a string expression. If `search` is not found in `str`, `str` is returned unchanged. | |
| * replace - a string expression. If `replace` is not specified or is an empty string, nothing replaces | |
| the string that is removed from `str`. | |
| """, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new decode function is the same as other function uses RuntimeReplaceable. The parameter list is not fixed.
Only PostgreSQL have the function. |
| > SELECT _FUNC_(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle'); | ||
| NULL | ||
| """, | ||
| since = "3.1.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, new feature should arrive in 3.2.0 because branch-3.1 is out now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
|
Test build #132460 has finished for PR 30479 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #132484 has finished for PR 30479 at commit
|
|
thanks, merging to master! |
| """, | ||
| since = "3.2.0") | ||
| // scalastyle:on line.size.limit | ||
| case class Decode(params: Seq[Expression], child: Expression) extends RuntimeReplaceable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this replaces the existing decode in SQL side only, right? shall we update migration guide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new decode could keep compatible with old one.
| since = "1.5.0") | ||
| // scalastyle:on line.size.limit | ||
| case class Decode(bin: Expression, charset: Expression) | ||
| case class StringDecode(bin: Expression, charset: Expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an alternative for the previous decode after this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only renames the expression, not the function. This PR just adds an overload of the decode function when there are more than 3 parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest the following followup:
- add a new function name for
StringDecode. Probably "string_decode". - require
decodefunction to take more than 2 parameters, under ansi mode(or a legacy config)
|
@cloud-fan Thanks for your work |
### What changes were proposed in this pull request? This PR fixes the Scala 2.13 build failure brought by #30479 . ### Why are the changes needed? To pass Scala 2.13 build. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Should be done byGitHub Actions. Closes #30727 from sarutak/fix-scala213-build-failure. Authored-by: Kousuke Saruta <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
In Spark, decode(bin, charset) - Decodes the first argument using the second argument character set.
Unfortunately this is NOT what any other SQL vendor understands
DECODEto do.DECODEgenerally is a short hand for a simple case expression:There are some mainstream database support the syntax.
Oracle
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/DECODE.html#GUID-39341D91-3442-4730-BD34-D3CF5D4701CE
Vertica
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/DECODE.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CString%20Functions%7C_____10
DB2
https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1447.htm
Redshift
https://docs.aws.amazon.com/redshift/latest/dg/r_DECODE_expression.html
Pig
https://pig.apache.org/docs/latest/api/org/apache/pig/piggybank/evaluation/decode/Decode.html
Teradata
https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/jtCpCycpEaXESG4d63kMjg
Snowflake
https://docs.snowflake.com/en/sql-reference/functions/decode.html
Why are the changes needed?
It is very useful.
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Jenkins test.