@xuanyuanking
Member

@xuanyuanking xuanyuanking commented Feb 11, 2020

What changes were proposed in this pull request?

This is a follow-up to #27441. When the new TimestampFormatter returns null but the legacy formatter can return a value, we need to throw an exception instead of silently changing the result. The legacy config is referenced in the error message.
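
A minimal sketch of the check described above, not the code from this patch. It uses plain JDK classes, with `java.time.format.DateTimeFormatter` standing in for the new parser and `java.text.SimpleDateFormat` for the legacy one. `UpgradeException`, `StrictDateParser`, and `parseDate` are hypothetical names chosen for illustration.

```scala
import java.text.SimpleDateFormat
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeParseException}
import scala.util.Try

// Hypothetical stand-in for the exception type added by this PR.
class UpgradeException(version: String, message: String, cause: Throwable)
  extends RuntimeException(s"Result may differ after upgrading to Spark $version: $message", cause)

object StrictDateParser {
  def parseDate(s: String, pattern: String): LocalDate = {
    try {
      // New, stricter parser.
      LocalDate.parse(s, DateTimeFormatter.ofPattern(pattern))
    } catch {
      case e: DateTimeParseException =>
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        val legacyCanParse = Try(new SimpleDateFormat(pattern).parse(s)).isSuccess
        if (legacyCanParse) {
          // The two parsers disagree: surface it instead of silently returning null.
          throw new UpgradeException("3.0", s"Fail to parse '$s' in the new parser.", e)
        } else {
          // Both parsers reject the input, so the original failure stands.
          throw e
        }
    }
  }
}
```

For example, `"2020-1-5"` with the pattern `yyyy-MM-dd` is accepted by the lenient `SimpleDateFormat` but rejected by `DateTimeFormatter`, so the sketch raises `UpgradeException` for it.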

Why are the changes needed?

Avoid silent result changes caused by the new behavior in 3.0.

Does this PR introduce any user-facing change?

Yes, an exception is thrown when we detect that the legacy formatter can parse the string but the new formatter returns null.

How was this patch tested?

Extend existing UT.

Member Author

These changes will be reverted after #27524.

@SparkQA

SparkQA commented Feb 11, 2020

Test build #118205 has finished for PR 27537 at commit 4679600.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 12, 2020

Test build #118271 has finished for PR 27537 at commit b9b3c8f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

@xuanyuanking can you fix the conflicts?

@xuanyuanking
Member Author

Sure, will also reuse the LegacyBehaviorPolicy added in #27579

@xuanyuanking xuanyuanking changed the title from "[WIP][SPARK-30668][SQL][FOLLOWUP] Raise exception instead of silent change for new TimestampFormatter" to "[SPARK-30668][SQL][FOLLOWUP] Raise exception instead of silent change for new TimestampFormatter" Feb 21, 2020
@SparkQA

SparkQA commented Feb 21, 2020

Test build #118754 has finished for PR 27537 at commit c07d09d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

cc @MaxGekk

@SparkQA

SparkQA commented Feb 24, 2020

Test build #118846 has finished for PR 27537 at commit 86313e5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

This is used to create DateFormatter and TimestampFormatter, and I'm pretty sure these 2 formatters are used in many expressions and other places like the json parser.

I think it's better to put the logic in the formatter, not in some expressions.

Member Author

Thanks, got it. I moved this logic into the formatter in cc9fd4f.

@SparkQA

SparkQA commented Feb 24, 2020

Test build #118856 has finished for PR 27537 at commit 6f51b13.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 27, 2020

Test build #119021 has finished for PR 27537 at commit cc9fd4f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Different places can create different legacy formatters. It's better to always create the legacy formatter and have the new formatter just carry the instance of the legacy formatter.
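
A rough sketch of the structure suggested here, assuming a simplified parser interface. The names `SimpleDateParser`, `LegacyParser`, `NewParser`, and `DateParsers` are hypothetical, and `IllegalStateException` stands in for the upgrade exception. The factory picks the legacy parser, and the new parser only holds the instance it was given.

```scala
import java.text.SimpleDateFormat
import java.time.{Instant, LocalDate, ZoneId}
import java.time.format.{DateTimeFormatter, DateTimeParseException}
import scala.util.Try

// Hypothetical minimal interface; the real formatter traits are richer.
trait SimpleDateParser {
  def parse(s: String): LocalDate
}

// Legacy behavior backed by SimpleDateFormat (a fresh instance per call for thread safety).
class LegacyParser(pattern: String) extends SimpleDateParser {
  def parse(s: String): LocalDate = {
    val millis = new SimpleDateFormat(pattern).parse(s).getTime
    Instant.ofEpochMilli(millis).atZone(ZoneId.systemDefault()).toLocalDate
  }
}

// The new parser does not build a legacy parser itself; it carries the one passed in.
class NewParser(pattern: String, legacy: SimpleDateParser) extends SimpleDateParser {
  private val formatter = DateTimeFormatter.ofPattern(pattern)

  def parse(s: String): LocalDate =
    try LocalDate.parse(s, formatter)
    catch {
      case e: DateTimeParseException =>
        if (Try(legacy.parse(s)).isSuccess) {
          // Stand-in for the upgrade exception discussed in this PR.
          throw new IllegalStateException(s"Fail to parse '$s' in the new parser.", e)
        } else throw e
    }
}

object DateParsers {
  // The creation site decides which legacy parser applies and injects it.
  def create(pattern: String): SimpleDateParser =
    new NewParser(pattern, new LegacyParser(pattern))
}
```

With this shape, a caller that needs a different legacy implementation passes another `SimpleDateParser` to `NewParser` without touching the comparison logic.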

Member Author

Thanks, done in 47d2f80

Contributor

IIRC some places just call the formatter and catch NonFatal. Can you check and make sure we don't catch NonFatal blindly?
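
To illustrate the concern, here is a small sketch with hypothetical names (`UpgradeException`, `NullOnFailure`, `blind`, `careful`). A handler that matches `NonFatal` swallows the upgrade exception along with ordinary parse failures, so it has to be rethrown explicitly before the generic case.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for the exception type added by this PR.
class UpgradeException(message: String) extends RuntimeException(message)

object NullOnFailure {
  // Swallows every non-fatal error, including UpgradeException.
  def blind[T](parse: => T): Option[T] =
    try Some(parse)
    catch { case NonFatal(_) => None }

  // Lets UpgradeException escape, then maps other non-fatal errors to None.
  def careful[T](parse: => T): Option[T] =
    try Some(parse)
    catch {
      case e: UpgradeException => throw e
      case NonFatal(_) => None
    }
}
```

The order of the cases matters: `UpgradeException` is itself non-fatal, so it must be matched before `NonFatal(_)`.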

Member Author

Thanks for the reminder, fixed for the non-codegen path in fbac26d.

@SparkQA

SparkQA commented Feb 27, 2020

Test build #119034 has finished for PR 27537 at commit addfccc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 27, 2020

Test build #119035 has finished for PR 27537 at commit 5f28f5b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

// When legacy time parser policy set to EXCEPTION, check whether we will get different results
// between legacy format and new format. For legacy parser, DateTimeParseException will not be
Contributor

format -> parser?


// When legacy time parser policy set to EXCEPTION, check whether we will get different results
// between legacy format and new format. For legacy parser, DateTimeParseException will not be
// thrown. On the contrary, if the legacy policy set to CORRECTED, DateTimeParseException will
Contributor

"For legacy parser, DateTimeParseException will not be thrown"

I think it should be:

"If the new parser fails but the legacy parser works, throw a SparkUpgradeException."

Contributor

"On the contrary, if the legacy policy set to CORRECTED ..." ->

"If the legacy policy is set to CORRECTED, do nothing and let the exception propagate."
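
The three behaviors discussed in these comments can be sketched as follows. This is an illustration under assumptions, not the patch itself: `ParserPolicy` and `PolicyAwareParser` are hypothetical, the policy names mirror the values mentioned in this thread, and `IllegalStateException` stands in for SparkUpgradeException.

```scala
import java.text.SimpleDateFormat
import java.time.{Instant, LocalDate, ZoneId}
import java.time.format.{DateTimeFormatter, DateTimeParseException}
import scala.util.Try

// Hypothetical policy type with the three values named in the review.
sealed trait ParserPolicy
case object EXCEPTION extends ParserPolicy
case object LEGACY extends ParserPolicy
case object CORRECTED extends ParserPolicy

object PolicyAwareParser {
  private def legacyParse(s: String, pattern: String): LocalDate = {
    val millis = new SimpleDateFormat(pattern).parse(s).getTime
    Instant.ofEpochMilli(millis).atZone(ZoneId.systemDefault()).toLocalDate
  }

  private def newParse(s: String, pattern: String): LocalDate =
    LocalDate.parse(s, DateTimeFormatter.ofPattern(pattern))

  def parse(s: String, pattern: String, policy: ParserPolicy): LocalDate = policy match {
    // LEGACY: restore the old behavior.
    case LEGACY => legacyParse(s, pattern)
    // CORRECTED: do nothing special and let the new parser's exception propagate.
    case CORRECTED => newParse(s, pattern)
    // EXCEPTION: if the new parser fails but the legacy parser works, raise a distinct error.
    case EXCEPTION =>
      try newParse(s, pattern)
      catch {
        case e: DateTimeParseException if Try(legacyParse(s, pattern)).isSuccess =>
          throw new IllegalStateException(s"Fail to parse '$s' in the new parser.", e)
      }
  }
}
```

When the guard on the `catch` case is false, meaning the legacy parser also fails, the original `DateTimeParseException` propagates unchanged.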

case _: Throwable => None
}
if (res.nonEmpty) {
throw new SparkUpgradeException("3.0", s"Set ${SQLConf.LEGACY_TIME_PARSER_POLICY.key} to " +
Contributor

We should explain what happened.

Fail to parse '$s' in the new parser. You can set ... to LEGACY to restore ..., or set it to CORRECTED and treat it as an invalid datetime string.

val msg = intercept[SparkException] {
df2.collect()
}.getCause.getMessage
assert(msg.contains(s"Set ${SQLConf.LEGACY_TIME_PARSER_POLICY.key} to LEGACY to restore " +
Contributor

I think it's good enough to test if the cause is SparkUpgradeException

val message = intercept[SparkException] {
df.collect()
}.getCause.getMessage
assert(message.contains(s"Set ${SQLConf.LEGACY_TIME_PARSER_POLICY.key} to LEGACY to restore " +
Contributor

ditto

val msg = intercept[SparkException] {
csv.collect()
}.getCause.getMessage
assert(msg.contains(s"Set ${SQLConf.LEGACY_TIME_PARSER_POLICY.key} to LEGACY to restore " +
Contributor

ditto

val msg = intercept[SparkException] {
json.collect()
}.getCause.getMessage
assert(msg.contains(s"Set ${SQLConf.LEGACY_TIME_PARSER_POLICY.key} to LEGACY to restore " +
Contributor

ditto

* Exception thrown when Spark returns different result after upgrading to a new version.
*/
private[spark] class SparkUpgradeException(version: String, message: String, cause: Throwable)
extends SparkException(s"Exception for upgrading to Spark $version: $message", cause)
Contributor

You may get a different result due to the upgrading of Spark $version: $message

@SparkQA

SparkQA commented Mar 4, 2020

Test build #119299 has finished for PR 27537 at commit 1a92e0f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

checkExceptionInExpression[SparkUpgradeException](
GetTimestamp(
Literal("2020-01-27T20:06:11.847-0800"),
Literal("yyyy-MM-dd'T'HH:mm:ss.SSSz")), "Exception for upgrading to Spark 3.0")
Contributor

"Fail to parse"?

@SparkQA

SparkQA commented Mar 4, 2020

Test build #119306 has finished for PR 27537 at commit 0e543da.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 5, 2020

Test build #119355 has finished for PR 27537 at commit d01c750.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 7db0af5 Mar 5, 2020
cloud-fan pushed a commit that referenced this pull request Mar 5, 2020
… for new DateFormatter

Closes #27537 from xuanyuanking/SPARK-30668-follow.

Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 7db0af5)
Signed-off-by: Wenchen Fan <[email protected]>
@xuanyuanking
Member Author

Thanks for the review!

@xuanyuanking xuanyuanking deleted the SPARK-30668-follow branch March 5, 2020 10:20
@xuanyuanking xuanyuanking changed the title from "[SPARK-30668][SQL][FOLLOWUP] Raise exception instead of silent change for new DateFormatter" to "[SPARK-31410][SPARK-30668][SQL][FOLLOWUP] Raise exception instead of silent change for new DateFormatter" Apr 10, 2020
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
… for new DateFormatter

Closes apache#27537 from xuanyuanking/SPARK-30668-follow.

Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>