
Conversation

@MaxGekk
Member

@MaxGekk MaxGekk commented Oct 30, 2019

What changes were proposed in this pull request?

In this PR, I propose to extract the parsing of the seconds interval unit into a private method parseNanos in IntervalUtils, and to modify the code so that the fractional part of the seconds unit is parsed correctly in the following cases:

  • When the fractional part has fewer than 9 digits
  • When the seconds unit is negative
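The padding of short fractional parts can be sketched with a hypothetical helper (illustrative only, not the actual Spark code):

```scala
object FractionPadding {
  // Right-pad the fractional digits to 9 places (nanosecond precision),
  // so that "123456" is read as 123456000 ns rather than 123456 ns.
  def alignToNanos(fraction: String): Long =
    (fraction + "000000000").substring(0, 9).toLong
}
```

For example, alignToNanos("123456") yields 123456000, while a full 9-digit fraction such as "123456789" is left unchanged.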

Why are the changes needed?

The changes are needed to fix the following issues:

spark-sql> select interval '10.123456 seconds';
interval 10 seconds 123 microseconds

The correct result must be interval 10 seconds 123 milliseconds 456 microseconds.

spark-sql> select interval '-10.123456789 seconds';
interval -9 seconds -876 milliseconds -544 microseconds

The whole interval should be negated, so the result must be interval -10 seconds -123 milliseconds -456 microseconds, taking the truncation to microseconds into account.
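The expected value can be checked with plain arithmetic (a sketch of the truncation, not Spark code):

```scala
// Truncating '-10.123456789 seconds' to microsecond precision:
val fractionNanos  = 123456789L             // digits after the decimal point
val fractionMicros = fractionNanos / 1000L  // 123456, truncated toward zero
// The sign applies to the whole interval, not only to the integral seconds:
val totalMicros = -(10L * 1000000L + fractionMicros)
// totalMicros is -10123456, i.e. -10 s -123 ms -456 us
```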

Does this PR introduce any user-facing change?

Yes. After changes:

spark-sql> select interval '10.123456 seconds';
interval 10 seconds 123 milliseconds 456 microseconds
spark-sql> select interval '-10.123456789 seconds';
interval -10 seconds -123 milliseconds -456 microseconds

How was this patch tested?

By existing and new tests in ExpressionParserSuite.

@SparkQA

SparkQA commented Oct 30, 2019

Test build #112901 has finished for PR 26313 at commit d768468.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member

yaooqinn commented Oct 30, 2019

This might also be fixed by #26314; would you please take a look?

@MaxGekk
Member Author

MaxGekk commented Oct 30, 2019

This could probably be fixed by the new feature proposed in #26314, but the tests from this PR and the fix to the existing code are valuable as well, I think. @cloud-fan @dongjoon-hyun WDYT?

@MaxGekk
Member Author

MaxGekk commented Oct 30, 2019

The bugs have existed since version 2.0, and the fixes can be backported, but the feature in #26314 cannot, I think.

@dongjoon-hyun
Member

I agree with @MaxGekk.

@cloud-fan
Contributor

Not related to this PR, but I find 10 seconds 123 milliseconds 456 microseconds hard to read. Shall we display only a fractional value of seconds?

}

private def parseNanos(nanosStr: String, isNegative: Boolean): Long = {
val alignedStr = if (nanosStr == null) nanosStr else (nanosStr + "000000000").substring(0, 9)
Contributor


Shall we return 0 immediately if nanosStr is null? That is easier to read than relying on toLongWithRange to return 0.
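A minimal sketch of the method with the suggested early return (illustrative only; it assumes the method returns signed microseconds and omits the range check that the real code performs via toLongWithRange):

```scala
object IntervalSketch {
  def parseNanos(nanosStr: String, isNegative: Boolean): Long = {
    // Return 0 up front for a missing fractional part, per the suggestion.
    if (nanosStr == null) return 0L
    // Right-pad to 9 digits so short fractions keep their magnitude.
    val nanos = (nanosStr + "000000000").substring(0, 9).toLong
    // Truncate to microsecond precision and apply the interval's sign.
    val micros = nanos / 1000L
    if (isNegative) -micros else micros
  }
}
```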

@SparkQA

SparkQA commented Oct 30, 2019

Test build #112907 has finished for PR 26313 at commit 46f1464.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 30, 2019

Test build #112922 has finished for PR 26313 at commit bf324f1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 30, 2019

Test build #112914 has finished for PR 26313 at commit 182deef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 30, 2019

Test build #112923 has finished for PR 26313 at commit a9797db.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 30, 2019

Test build #112930 has finished for PR 26313 at commit 337e9af.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Oct 30, 2019

The failure #26313 (comment) is not related to the changes, it seems.

@MaxGekk
Member Author

MaxGekk commented Oct 30, 2019

jenkins, retest this, please

@dongjoon-hyun
Member

Yes. +1 for @cloud-fan's suggestion. Originally, I thought it was natural, but @cloud-fan's suggestion sounds more practical.

> Not related to this PR, but I find 10 seconds 123 milliseconds 456 microseconds hard to read. Shall we display only a fractional value of seconds?

@MaxGekk
Member Author

MaxGekk commented Oct 30, 2019

> Yes. +1 for @cloud-fan's suggestion. Originally, I thought it was natural, but @cloud-fan's suggestion sounds more practical.

Let's discuss this there: https://issues.apache.org/jira/browse/SPARK-29671

@SparkQA

SparkQA commented Oct 30, 2019

Test build #112959 has finished for PR 26313 at commit 337e9af.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 3206a99 Oct 31, 2019
@MaxGekk
Member Author

MaxGekk commented Oct 31, 2019

Should I backport this to 2.4?

@cloud-fan
Contributor

yes, please send a backport PR, thanks!

@MaxGekk
Member Author

MaxGekk commented Nov 1, 2019

Here is the backport to 2.4: #26355

@MaxGekk MaxGekk deleted the fix-interval-nanos-parsing branch June 5, 2020 19:40