Skip to content

Conversation

@beliefer
Copy link
Contributor

@beliefer beliefer commented Nov 24, 2020

What changes were proposed in this pull request?

In Spark, decode(bin, charset) - Decodes the first argument using the second argument character set.

Unfortunately this is NOT what any other SQL vendor understands DECODE to do.
DECODE generally is a short hand for a simple case expression:

SELECT DECODE(c1, 1, 'Hello', 2, 'World', '!') FROM (VALUES (1), (2), (3)) AS T(c1)
=> 
(Hello),
(World)
(!)

There are some mainstream database support the syntax.
Oracle
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/DECODE.html#GUID-39341D91-3442-4730-BD34-D3CF5D4701CE
Vertica
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/DECODE.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CString%20Functions%7C_____10
DB2
https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1447.htm
Redshift
https://docs.aws.amazon.com/redshift/latest/dg/r_DECODE_expression.html
Pig
https://pig.apache.org/docs/latest/api/org/apache/pig/piggybank/evaluation/decode/Decode.html
Teradata
https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/jtCpCycpEaXESG4d63kMjg
Snowflake
https://docs.snowflake.com/en/sql-reference/functions/decode.html

Why are the changes needed?

It is very useful.

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Jenkins test.

beliefer and others added 30 commits June 19, 2020 10:36
@SparkQA
Copy link

SparkQA commented Dec 8, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37026/

@SparkQA
Copy link

SparkQA commented Dec 8, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37027/

@SparkQA
Copy link

SparkQA commented Dec 8, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37026/

@SparkQA
Copy link

SparkQA commented Dec 8, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37027/

@SparkQA
Copy link

SparkQA commented Dec 8, 2020

Test build #132427 has finished for PR 30479 at commit 6a1b919.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 8, 2020

Test build #132426 has finished for PR 30479 at commit 80c56a1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Decode(params: Seq[Expression], child: Expression) extends RuntimeReplaceable

def createExpr(params: Seq[Expression]): Expression = {
params.length match {
case 0 => StringDecode(Literal.create(null, StringType), Literal.create(null, StringType))
case 1 => StringDecode(params.head, Literal.create(null, StringType))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really allow 0 or 1 parameter before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's preserve the AnalysisException(s"Invalid number of arguments for function decode. ")

@SparkQA
Copy link

SparkQA commented Dec 9, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37062/

@SparkQA
Copy link

SparkQA commented Dec 9, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37062/

@maropu
Copy link
Member

maropu commented Dec 9, 2020

Just a question; the other systems don't have a function like Spark decode/encode?

case 0 | 1 =>
throw new AnalysisException("Invalid number of arguments for function decode. " +
s"Expected: 2; Found: ${params.length}")
case 2 => StringDecode(params.head, params.last)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, IMHO it would be good to rename the existing decode func or assign a new name into the new decode func because the generated doc for decode will be complicated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If rename the existing decode func will break the behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, so I think we need add a legacy config, update the migration guide, ... if we rename it. Anyway, the latter approach (assigning a new name into the new decode func) looks better if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR could avoid migration and support the new decode as well. cc @cloud-fan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably rename the function in ansi mode.

|_FUNC_(expr, search, result [, search, result ] ... [, default]) - Decode compares expr
| to each search value one by one. If expr is equal to a search, returns the corresponding result.
| If no match is found, then Oracle returns default. If default is omitted, returns null.
""",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you follow the other format? e.g.,

usage = "_FUNC_(str, search[, replace]) - Replaces all occurrences of `search` with `replace`.",
arguments = """
Arguments:
* str - a string expression
* search - a string expression. If `search` is not found in `str`, `str` is returned unchanged.
* replace - a string expression. If `replace` is not specified or is an empty string, nothing replaces
the string that is removed from `str`.
""",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new decode function is the same as other function uses RuntimeReplaceable. The parameter list is not fixed.

@beliefer
Copy link
Contributor Author

beliefer commented Dec 9, 2020

Just a question; the other systems don't have a function like Spark decode/encode?

Only PostgreSQL have the function.

> SELECT _FUNC_(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle');
NULL
""",
since = "3.1.0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, new feature should arrive in 3.2.0 because branch-3.1 is out now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@SparkQA
Copy link

SparkQA commented Dec 9, 2020

Test build #132460 has finished for PR 30479 at commit 1614e62.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 9, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37086/

@SparkQA
Copy link

SparkQA commented Dec 9, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37086/

@SparkQA
Copy link

SparkQA commented Dec 9, 2020

Test build #132484 has finished for PR 30479 at commit eea967a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 24d7e45 Dec 11, 2020
""",
since = "3.2.0")
// scalastyle:on line.size.limit
case class Decode(params: Seq[Expression], child: Expression) extends RuntimeReplaceable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this replaces the existing decode in SQL side only, right? shall we update migration guide?

Copy link
Contributor Author

@beliefer beliefer Dec 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new decode could keep compatible with old one.

since = "1.5.0")
// scalastyle:on line.size.limit
case class Decode(bin: Expression, charset: Expression)
case class StringDecode(bin: Expression, charset: Expression)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an alternative for the previous decode after this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only renames the expression, not the function. This PR just adds an overload of the decode function when there are more than 3 parameters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest the following followup:

  1. add a new function name for StringDecode. Probably "string_decode".
  2. require decode function to take more than 2 parameters, under ansi mode(or a legacy config)

@beliefer
Copy link
Contributor Author

@cloud-fan Thanks for your work
@maropu @HyukjinKwon Thanks for your review too.

dongjoon-hyun pushed a commit that referenced this pull request Dec 11, 2020
### What changes were proposed in this pull request?

This PR fixes the Scala 2.13 build failure brought by #30479 .

### Why are the changes needed?

To pass Scala 2.13 build.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Should be done byGitHub Actions.

Closes #30727 from sarutak/fix-scala213-build-failure.

Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants