[SPARK-38225][SQL] Adjust input format of function to_binary#35533
[SPARK-38225][SQL] Adjust input format of function to_binary#35533xinrong-meng wants to merge 7 commits intoapache:masterfrom
format of function to_binary#35533Conversation
| val value = lit.eval() | ||
| if (value == null) Literal(null, BinaryType) | ||
| else { | ||
| else if (value.isInstanceOf[UTF8String]) { |
There was a problem hiding this comment.
Is it possible for value to be an instance of java.lang.String?
There was a problem hiding this comment.
no, it can only be UTF8String
There was a problem hiding this comment.
seems we can remove this check now, as we have checked the data type ahead
There was a problem hiding this comment.
Makes sense! Removed.
| -- !query schema | ||
| struct<> | ||
| -- !query output | ||
| org.apache.spark.sql.AnalysisException |
There was a problem hiding this comment.
Intentionally failed query.
| -- !query schema | ||
| struct<> | ||
| -- !query output | ||
| org.apache.spark.sql.AnalysisException |
There was a problem hiding this comment.
Intentionally failed query.
format of function to_binary
|
Would you please review it when you are free? Thanks! |
| @@ -2565,15 +2565,14 @@ case class ToBinary(expr: Expression, format: Option[Expression], child: Express | |||
| case lit if lit.foldable => | |||
There was a problem hiding this comment.
I'd put the string type check here, as to_binary(str_col, cast(null as int)) should fail instead of returning null.
There was a problem hiding this comment.
Makes sense.
May I confirm that the case below is acceptable?
spark-sql> select to_binary(cast(null as int), 'utf-8');
NULL
There was a problem hiding this comment.
yea, the type coercion can cast any atomic type to string type implicitly.
| def this(expr: Expression, format: Expression) = this(expr, Option(format), | ||
| format match { | ||
| case lit if lit.foldable => | ||
| case lit if (lit.foldable && Seq(StringType, NullType).contains(lit.dataType)) => |
There was a problem hiding this comment.
BTW, do other systems like snowflake support to_binary('abc', null)?
There was a problem hiding this comment.
Snowflake doesn't specify that in their document, but BigQuery and Teradata support to_binary('abc', null):
Expressions, functions, and operators | BigQuery | Google Cloud "If an operand is NULL, the result is NULL, with the exception of the IS operator."
Teradata Online Documentation | Quick access to technical manuals "If either instring or in_encoding is NULL, the result is NULL."
|
thanks, merging to master! |
What changes were proposed in this pull request?
Adjust input
formatof functionto_binary:formatparameterbase2format supportWhy are the changes needed?
Currently, function to_binary doesn't deal with the non-string
formatparameter properly.For example,
spark.sql("select to_binary('abc', 1)")raises casting error, rather than hint that encoding format is unsupported.In addition, the current result of the
base2format is incorrect compared to other DBMSes: ours useCast(expr, BinaryType)which utilizesutf-8format, whereas DBMSes (e.g. BigQuery)' convert a BASE2-encoded string to bytes.So we remove the
base2support now and will follow up with that later.Does this PR introduce any user-facing change?
Yes.
formatparameter. For example:From:
To:
base2format supportHow was this patch tested?
Unit test.