-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-33527][SQL] Extend the function of decode so as consistent with mainstream databases #30479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
4a6f903
96456e2
4314005
d6af4a7
f69094f
b86a42d
2ac5159
9021d6c
74a2ef4
9828158
9cd1aaf
abfcbb9
07c6c81
580130b
3712808
6107413
4b799b4
ee0ecbf
596bc61
0164e2f
90b79fc
2cef3a9
c26b64f
2e02cd2
a6d0741
82e5b2c
70bbf5d
126a51e
f2ceacd
5ad208f
cd81faf
338d28b
2c7a822
1cf326b
80c56a1
6a1b919
1614e62
eea967a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -26,6 +26,7 @@ import scala.collection.mutable.ArrayBuffer | |||||||||||||||||
|
|
||||||||||||||||||
| import org.apache.commons.codec.binary.{Base64 => CommonsBase64} | ||||||||||||||||||
|
|
||||||||||||||||||
| import org.apache.spark.sql.AnalysisException | ||||||||||||||||||
| import org.apache.spark.sql.catalyst.InternalRow | ||||||||||||||||||
| import org.apache.spark.sql.catalyst.analysis.{FunctionRegistry, TypeCheckResult} | ||||||||||||||||||
| import org.apache.spark.sql.catalyst.expressions.codegen._ | ||||||||||||||||||
|
|
@@ -2082,6 +2083,65 @@ case class UnBase64(child: Expression) | |||||||||||||||||
| } | ||||||||||||||||||
| } | ||||||||||||||||||
|
|
||||||||||||||||||
| object Decode { | ||||||||||||||||||
| def createExpr(params: Seq[Expression]): Expression = { | ||||||||||||||||||
| params.length match { | ||||||||||||||||||
| case 0 | 1 => | ||||||||||||||||||
| throw new AnalysisException("Invalid number of arguments for function decode. " + | ||||||||||||||||||
| s"Expected: 2; Found: ${params.length}") | ||||||||||||||||||
| case 2 => StringDecode(params.head, params.last) | ||||||||||||||||||
| case _ => | ||||||||||||||||||
| val input = params.head | ||||||||||||||||||
| val other = params.tail | ||||||||||||||||||
| val itr = other.iterator | ||||||||||||||||||
| var default: Expression = Literal.create(null, StringType) | ||||||||||||||||||
| val branches = ArrayBuffer.empty[(Expression, Expression)] | ||||||||||||||||||
| while (itr.hasNext) { | ||||||||||||||||||
| val search = itr.next | ||||||||||||||||||
| if (itr.hasNext) { | ||||||||||||||||||
| val condition = EqualTo(input, search) | ||||||||||||||||||
| branches += ((condition, itr.next)) | ||||||||||||||||||
| } else { | ||||||||||||||||||
| default = search | ||||||||||||||||||
| } | ||||||||||||||||||
| } | ||||||||||||||||||
| CaseWhen(branches.seq, default) | ||||||||||||||||||
| } | ||||||||||||||||||
| } | ||||||||||||||||||
| } | ||||||||||||||||||
|
|
||||||||||||||||||
| // scalastyle:off line.size.limit | ||||||||||||||||||
| @ExpressionDescription( | ||||||||||||||||||
| usage = """ | ||||||||||||||||||
| |_FUNC_(bin, charset) - Decodes the first argument using the second argument character set. | ||||||||||||||||||
| | | ||||||||||||||||||
| |_FUNC_(expr, search, result [, search, result ] ... [, default]) - Decode compares expr | ||||||||||||||||||
| | to each search value one by one. If expr is equal to a search, returns the corresponding result. | ||||||||||||||||||
| | If no match is found, then Oracle returns default. If default is omitted, returns null. | ||||||||||||||||||
| """, | ||||||||||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you follow the other format? e.g., spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala Lines 479 to 486 in c88edda
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The new |
||||||||||||||||||
| examples = """ | ||||||||||||||||||
| Examples: | ||||||||||||||||||
| > SELECT _FUNC_(encode('abc', 'utf-8'), 'utf-8'); | ||||||||||||||||||
| abc | ||||||||||||||||||
| > SELECT _FUNC_(2, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle', 'Non domestic'); | ||||||||||||||||||
| San Francisco | ||||||||||||||||||
| > SELECT _FUNC_(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle', 'Non domestic'); | ||||||||||||||||||
| Non domestic | ||||||||||||||||||
| > SELECT _FUNC_(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle'); | ||||||||||||||||||
| NULL | ||||||||||||||||||
| """, | ||||||||||||||||||
| since = "3.2.0") | ||||||||||||||||||
| // scalastyle:on line.size.limit | ||||||||||||||||||
| case class Decode(params: Seq[Expression], child: Expression) extends RuntimeReplaceable { | ||||||||||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So this replaces the existing
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The new |
||||||||||||||||||
|
|
||||||||||||||||||
| def this(params: Seq[Expression]) = { | ||||||||||||||||||
| this(params, Decode.createExpr(params)) | ||||||||||||||||||
| } | ||||||||||||||||||
|
|
||||||||||||||||||
| override def flatArguments: Iterator[Any] = Iterator(params) | ||||||||||||||||||
| override def exprsReplaced: Seq[Expression] = params | ||||||||||||||||||
| } | ||||||||||||||||||
|
|
||||||||||||||||||
| /** | ||||||||||||||||||
| * Decodes the first argument into a String using the provided character set | ||||||||||||||||||
| * (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). | ||||||||||||||||||
|
|
@@ -2097,7 +2157,7 @@ case class UnBase64(child: Expression) | |||||||||||||||||
| """, | ||||||||||||||||||
| since = "1.5.0") | ||||||||||||||||||
| // scalastyle:on line.size.limit | ||||||||||||||||||
| case class Decode(bin: Expression, charset: Expression) | ||||||||||||||||||
| case class StringDecode(bin: Expression, charset: Expression) | ||||||||||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we have an alternative for the previous
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This only renames the expression, not the function. This PR just adds an overload of the
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd suggest the following followup:
|
||||||||||||||||||
| extends BinaryExpression with ImplicitCastInputTypes with NullIntolerant { | ||||||||||||||||||
|
|
||||||||||||||||||
| override def left: Expression = bin | ||||||||||||||||||
|
|
||||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, IMHO it would be good to rename the existing
decodefunc or assign a new name into the newdecodefunc because the generated doc fordecodewill be complicated.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If rename the
existingdecode func will break the behavior.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, so I think we need add a legacy config, update the migration guide, ... if we rename it. Anyway, the latter approach (assigning a new name into the new
decodefunc) looks better if possible.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR could avoid migration and support the new
decodeas well. cc @cloud-fanThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably rename the function in ansi mode.