@bersprockets
Contributor

What changes were proposed in this pull request?

Change TryToBinary's constructor to test whether the format expression is foldable before creating the replacement expression.

Why are the changes needed?

When called with a non-foldable format expression, try_to_binary fails with an unhelpful internal error:

spark-sql> SELECT try_to_binary(col1, col2) from values ('abc', 'utf-8') as data(col1, col2);
[INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace.
org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace.
	at org.apache.spark.SparkException$.internalError(SparkException.scala:88)
...
Caused by: java.lang.AssertionError: assertion failed
	at scala.Predef$.assert(Predef.scala:208)
	at org.apache.spark.sql.catalyst.expressions.ToBinary.$anonfun$replacement$1(stringExpressions.scala:2597)
	at scala.Option.map(Option.scala:230)
	at org.apache.spark.sql.catalyst.expressions.ToBinary.replacement$lzycompute(stringExpressions.scala:2596)
	at org.apache.spark.sql.catalyst.expressions.ToBinary.replacement(stringExpressions.scala:2596)

TryToBinary creates an instance of ToBinary as a replacement expression. However, ToBinary's primary constructor does not check whether the format expression is foldable, so the code that creates ToBinary's own replacement expression fails its assertion.

ToBinary is not typically instantiated through the primary constructor. The function registry uses the second auxiliary constructor, which does check whether format is foldable. The code that creates ToBinary's replacement expression assumes this second auxiliary constructor is always used.

TryToBinary cannot use ToBinary's second auxiliary constructor because it needs to set nullOnInvalidFormat to true.
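
To make the failure mode concrete, here is a minimal, self-contained sketch of the constructor pattern described above. It does not use Spark: `Expr`, `Literal`, `ColumnRef`, `ToBinaryModel`, and `describe` are hypothetical stand-ins invented for illustration, and the `replacement` string is only a placeholder for the real replacement expression. The sketch assumes nothing beyond what this description states: the primary constructor skips the foldable check, the auxiliary constructor performs it, and the lazily built replacement asserts that the check already happened.

```scala
// Hypothetical stand-ins for Catalyst expressions; these are not Spark's classes.
sealed trait Expr { def foldable: Boolean }
final case class Literal(value: String) extends Expr { val foldable = true }
final case class ColumnRef(name: String) extends Expr { val foldable = false }

final case class ToBinaryModel(
    expr: Expr,
    format: Option[Expr],
    nullOnInvalidFormat: Boolean = false) {

  // Auxiliary constructor (the path a function registry would take in this model):
  // it validates the format eagerly, before the instance exists.
  def this(expr: Expr, format: Expr) = this(
    expr,
    Some({
      require(format.foldable, "The 'format' parameter needs to be a string literal.")
      format
    }),
    false)

  // Built lazily and only asserts foldability, assuming the auxiliary
  // constructor already validated the format. The string is a placeholder.
  lazy val replacement: String = format match {
    case Some(f) =>
      assert(f.foldable)
      s"convert($expr using $f)"
    case None =>
      s"convert($expr)"
  }
}

// Helper for the demo: reports which kind of failure (if any) a block raises.
def describe(body: => Any): String =
  try { body; "ok" }
  catch {
    case _: AssertionError           => "AssertionError"
    case _: IllegalArgumentException => "IllegalArgumentException"
  }

// Primary constructor with a non-foldable format: construction succeeds,
// and the problem only surfaces later, when the replacement is built.
val viaPrimary =
  ToBinaryModel(Literal("abc"), Some(ColumnRef("col2")), nullOnInvalidFormat = true)
val primaryOutcome = describe(viaPrimary.replacement)

// Auxiliary constructor with the same arguments: rejected up front.
val auxiliaryOutcome = describe(new ToBinaryModel(Literal("abc"), ColumnRef("col2")))
```

In this model the primary-constructor path ends in a bare assertion failure, while the auxiliary-constructor path fails early with a descriptive message. That mirrors the difference between the internal error above and the error message this PR aims for.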

After this PR, the above example will throw a more useful error message:

spark-sql> SELECT try_to_binary(col1, col2) from values ('abc', 'utf-8') as data(col1, col2);
The 'format' parameter of function 'try_to_binary' needs to be a string literal.; line 1 pos 7
spark-sql> 

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New tests.

@github-actions github-actions bot added the SQL label Nov 20, 2022
-- format must be foldable
SELECT try_to_binary(col1, col2) from values ('abc', 'utf-8') as data(col1, col2);
-- non-foldable input string
SELECT try_to_binary(col1, 'utf-8') from values ('abc') as data(col1);
Contributor Author

Not strictly relevant to this issue, but I noticed that there were no tests for non-foldable input values, so I added one.

def this(expr: Expression, formatExpression: Expression) =
this(expr, Some(formatExpression),
TryEval(ToBinary(expr, Some(formatExpression), nullOnInvalidFormat = true)))
TryEval(ToBinary(expr, Some(TryToBinary.checkFormat(formatExpression)),
Member

Could you please perform the check in checkInputDataTypes(), as we do in other expressions?

BTW, exceptions thrown from expression constructors are additionally wrapped in an AnalysisException, which is inconvenient for users.
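
As a rough illustration of this suggestion, the sketch below moves validation out of the constructor and into a separate check method that returns a result instead of throwing. Everything here is a simplified, hypothetical model (`Arg`, `Lit`, `Col`, `CheckResult`, `CheckSuccess`, `CheckFailure`, `TryToBinaryModel`). It is not Spark's actual `checkInputDataTypes()` contract or its result types, and the failure message is illustrative only.

```scala
// Hypothetical stand-ins; Spark's real expression and result types are richer.
sealed trait Arg { def foldable: Boolean }
final case class Lit(value: String) extends Arg { val foldable = true }
final case class Col(name: String) extends Arg { val foldable = false }

sealed trait CheckResult
case object CheckSuccess extends CheckResult
final case class CheckFailure(message: String) extends CheckResult

final case class TryToBinaryModel(input: Arg, format: Option[Arg]) {
  // Construction never throws in this model. Validation is a separate step
  // that a caller (for example, an analyzer) can run and report on.
  def checkInputDataTypes(): CheckResult = format match {
    case Some(f) if !f.foldable =>
      CheckFailure(s"the format should be a foldable string expression; however, got $f")
    case _ =>
      CheckSuccess
  }
}

// A non-foldable format is reported as a failed check rather than an exception.
val nonFoldableFormat = TryToBinaryModel(Col("col1"), Some(Col("col2"))).checkInputDataTypes()

// A literal format, or no format at all, passes the check.
val literalFormat = TryToBinaryModel(Col("col1"), Some(Lit("utf-8"))).checkInputDataTypes()
```

Returning a result lets the caller decide how to surface the problem, instead of having a constructor exception wrapped a second time on its way to the user.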

struct<>
-- !query output
org.apache.spark.sql.AnalysisException
The 'format' parameter of function 'to_binary' needs to be a string literal.; line 1 pos 7
Member

Once you move the exception out of the constructor, you should see an AnalysisException with an error class.

@bersprockets
Contributor Author

I will wait on PR #38737.

@bersprockets
Contributor Author

Tested PR #38737. That PR incidentally seems to fix this issue:

SELECT try_to_binary(col1, col2) from values ('abc', 'utf-8') as data(col1, col2);
[DATATYPE_MISMATCH.NON_FOLDABLE_INPUT] Cannot resolve "to_binary(col1, col2)" due to data type mismatch: the input fmt should be a foldable "STRING" expression; however, got "col2".; line 1 pos 7;
'Project [unresolvedalias(try_to_binary(col1#0, col2#1), None)]
+- SubqueryAlias data
   +- LocalRelation [col1#0, col2#1]

spark-sql> 

Therefore, closing this PR.

@bersprockets bersprockets deleted the try_to_binary_issue branch December 23, 2022 00:11