-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-31410][SPARK-30668][SQL][FOLLOWUP] Raise exception instead of silent change for new DateFormatter #27537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes will be reverted after #27524.
|
Test build #118205 has finished for PR 27537 at commit
|
|
Test build #118271 has finished for PR 27537 at commit
|
|
@xuanyuanking can you fix the conflicts? |
|
Sure, will also reuse the |
b9b3c8f to
c07d09d
Compare
|
Test build #118754 has finished for PR 27537 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
Outdated
Show resolved
Hide resolved
|
cc @MaxGekk |
|
Test build #118846 has finished for PR 27537 at commit
|
86313e5 to
6f51b13
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used to create DateFormatter and TimestampFormatter, and I'm pretty sure these 2 formatters are used in many expressions and other places like the json parser.
I think it's better to put the logic in the formatter, not in some expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, got it. I moved this logic into formatter in cc9fd4f.
|
Test build #118856 has finished for PR 27537 at commit
|
|
Test build #119021 has finished for PR 27537 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
different places can create different legacy formatter. It's better to always create the legacy formatter, and the new formatter just carry the instance of legacy formatter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done in 47d2f80
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC some places just call formatter and catch NonFatal. Can you do some checks and make sure we don't catch non-fatal blindly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reminding, fix for the non-codegen in fbac26d.
|
Test build #119034 has finished for PR 27537 at commit
|
|
Test build #119035 has finished for PR 27537 at commit
|
0242629 to
fbac26d
Compare
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
46dbf9b to
1a92e0f
Compare
| } | ||
|
|
||
| // When legacy time parser policy set to EXCEPTION, check whether we will get different results | ||
| // between legacy format and new format. For legacy parser, DateTimeParseException will not be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format -> parser?
|
|
||
| // When legacy time parser policy set to EXCEPTION, check whether we will get different results | ||
| // between legacy format and new format. For legacy parser, DateTimeParseException will not be | ||
| // thrown. On the contrary, if the legacy policy set to CORRECTED, DateTimeParseException will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For legacy parser, DateTimeParseException will not be thrown
I think it should be
If new parser fails but legacy parser works, throw a SparkUpgradeException.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the contrary, if the legacy policy set to CORRECTED ...
If the legacy policy set to CORRECTED, do nothing and let the exception propagate.
| case _: Throwable => None | ||
| } | ||
| if (res.nonEmpty) { | ||
| throw new SparkUpgradeException("3.0", s"Set ${SQLConf.LEGACY_TIME_PARSER_POLICY.key} to " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should explain what happened.
Fail to parse '$s' in the new parser. You can set ... to LEGACY to restore ..., or set it to CORRECTED and treat it as an invalid datetime string.
| val msg = intercept[SparkException] { | ||
| df2.collect() | ||
| }.getCause.getMessage | ||
| assert(msg.contains(s"Set ${SQLConf.LEGACY_TIME_PARSER_POLICY.key} to LEGACY to restore " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good enough to test if the cause is SparkUpgradeException
| val message = intercept[SparkException] { | ||
| df.collect() | ||
| }.getCause.getMessage | ||
| assert(message.contains(s"Set ${SQLConf.LEGACY_TIME_PARSER_POLICY.key} to LEGACY to restore " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| val msg = intercept[SparkException] { | ||
| csv.collect() | ||
| }.getCause.getMessage | ||
| assert(msg.contains(s"Set ${SQLConf.LEGACY_TIME_PARSER_POLICY.key} to LEGACY to restore " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| val msg = intercept[SparkException] { | ||
| json.collect() | ||
| }.getCause.getMessage | ||
| assert(msg.contains(s"Set ${SQLConf.LEGACY_TIME_PARSER_POLICY.key} to LEGACY to restore " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| * Exception thrown when Spark returns different result after upgrading to a new version. | ||
| */ | ||
| private[spark] class SparkUpgradeException(version: String, message: String, cause: Throwable) | ||
| extends SparkException(s"Exception for upgrading to Spark $version: $message", cause) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may get a different result due to the upgrading of Spark $version: $message
|
Test build #119299 has finished for PR 27537 at commit
|
| checkExceptionInExpression[SparkUpgradeException]( | ||
| GetTimestamp( | ||
| Literal("2020-01-27T20:06:11.847-0800"), | ||
| Literal("yyyy-MM-dd'T'HH:mm:ss.SSSz")), "Exception for upgrading to Spark 3.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Fail to parse"?
|
Test build #119306 has finished for PR 27537 at commit
|
|
Test build #119355 has finished for PR 27537 at commit
|
|
thanks, merging to master/3.0! |
… for new DateFormatter This is a follow-up work for #27441. For the cases of new TimestampFormatter return null while legacy formatter can return a value, we need to throw an exception instead of silent change. The legacy config will be referenced in the error message. Avoid silent result change for new behavior in 3.0. Yes, an exception is thrown when we detect legacy formatter can parse the string and the new formatter return null. Extend existing UT. Closes #27537 from xuanyuanking/SPARK-30668-follow. Authored-by: Yuanjian Li <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 7db0af5) Signed-off-by: Wenchen Fan <[email protected]>
|
Thanks for the review! |
… for new DateFormatter ### What changes were proposed in this pull request? This is a follow-up work for apache#27441. For the cases of new TimestampFormatter return null while legacy formatter can return a value, we need to throw an exception instead of silent change. The legacy config will be referenced in the error message. ### Why are the changes needed? Avoid silent result change for new behavior in 3.0. ### Does this PR introduce any user-facing change? Yes, an exception is thrown when we detect legacy formatter can parse the string and the new formatter return null. ### How was this patch tested? Extend existing UT. Closes apache#27537 from xuanyuanking/SPARK-30668-follow. Authored-by: Yuanjian Li <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This is a follow-up work for #27441. For the cases of new TimestampFormatter return null while legacy formatter can return a value, we need to throw an exception instead of silent change. The legacy config will be referenced in the error message.
Why are the changes needed?
Avoid silent result change for new behavior in 3.0.
Does this PR introduce any user-facing change?
Yes, an exception is thrown when we detect legacy formatter can parse the string and the new formatter return null.
How was this patch tested?
Extend existing UT.