Member

MaxGekk commented Jun 21, 2020

What changes were proposed in this pull request?

Replace Decimal operations with integer operations in the MakeInterval and MakeTimestamp expressions. For instance, (secs * Decimal(MICROS_PER_SECOND)).toLong can be replaced by the unscaled long of secs, because with scale 6 the unscaled value already is the number of microseconds.
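
The equivalence described above can be sketched outside Spark. This is a minimal illustration that uses java.math.BigDecimal as a stand-in for Spark's Decimal; the object and method names are hypothetical and are not part of the patch.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Sketch only: java.math.BigDecimal stands in for Spark's Decimal here.
object UnscaledMicrosSketch {
  val MicrosPerSecond = 1000000L

  // Decimal path: multiply the seconds by 10^6, then convert to a long.
  def microsViaMultiply(secs: JBigDecimal): Long =
    secs.multiply(JBigDecimal.valueOf(MicrosPerSecond)).longValueExact()

  // Integer path: with scale 6 the unscaled value is already the microsecond count.
  def microsViaUnscaled(secs: JBigDecimal): Long = {
    require(secs.scale == 6, "Seconds fractional must have 6 digits for microseconds")
    secs.unscaledValue.longValueExact()
  }

  def main(args: Array[String]): Unit = {
    val secs = new JBigDecimal("50.123456")
    println(microsViaMultiply(secs))
    println(microsViaUnscaled(secs))
  }
}
```

Both methods return the same value for an input with scale 6; the second one skips the decimal multiplication entirely, which is presumably where the time is saved.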

Why are the changes needed?

To improve performance.

Before:

make_timestamp():                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
...
make_timestamp(2019, 1, 2, 3, 4, 50.123456)             94             99           4         10.7          93.8      38.8X

After:

make_timestamp(2019, 1, 2, 3, 4, 50.123456)             76             92          15         13.1          76.5      48.1X

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • By the existing test suites, including IntervalExpressionsSuite and DateExpressionsSuite.
  • By re-generating the results of MakeDateTimeBenchmark in the following environment:
Item       Description
Region     us-west-2 (Oregon)
Instance   r3.xlarge
AMI        ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5)
Java       OpenJDK 64-Bit Server VM 1.8.0_252 and OpenJDK 64-Bit Server VM 11.0.7+10

SparkQA commented Jun 21, 2020

Test build #124337 has finished for PR 28886 at commit b9e2ee0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

MaxGekk commented Jun 21, 2020

@dongjoon-hyun While working on #28873, I realised that some Decimal ops are not actually needed and can be replaced by regular int ops. May I ask you to take a look at this?

val nanosPerSec = Decimal(NANOS_PER_SECOND, 10, 0)
val nanos = ((secAndNanos - secFloor) * nanosPerSec).toInt
val seconds = secFloor.toInt
assert(secAndMicros.scale == 6,

I doubt this is the right place to ask, but I tried to Google it and didn't find anything. I was just wondering what assert(secAndMicros.scale == 6 does exactly. I am assuming that it changes the Decimal's scale to 6, but what exactly does that help with? And is the added tail just a bunch of zeros? For instance, does 3.14 become 3.140000?
Thanks!

Member Author

I am assuming that it changes the Decimal's scale to 6

This assert doesn't change the scale, it just reads it.

And is the added tail just a bunch of zeros? For instance does 3.14 become 3.140000?

It shifts the decimal point. For example, 3.14 with precision 6 and scale 3 is stored as 003.140, that is, the digits 3140 with the decimal point three places from the right. If you keep those digits and set

  • scale to 0, it becomes 3140
  • scale to 4, it becomes 0.3140
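
The relationship between the unscaled digits and the scale can be tried out with java.math.BigDecimal, which exposes both parts directly. This is only an illustration with a standard-library class, not Spark's Decimal; the helper name withScale is made up for the example.

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

// Sketch only: shows how one set of unscaled digits reads under different scales.
object ScaleSketch {
  // Build a decimal from unscaled digits and a scale (digits after the point).
  def withScale(unscaled: Long, scale: Int): JBigDecimal =
    new JBigDecimal(BigInteger.valueOf(unscaled), scale)

  def main(args: Array[String]): Unit = {
    val d = new JBigDecimal("3.140")
    println(d.unscaledValue)      // 3140
    println(d.scale)              // 3
    println(withScale(3140L, 0))  // 3140
    println(withScale(3140L, 4))  // 0.3140
    // setScale keeps the numeric value and pads with zeros instead.
    println(new JBigDecimal("3.14").setScale(6)) // 3.140000
  }
}
```

Reading the scale, as the assert does, changes nothing. Padding 3.14 to 3.140000 is what an explicit rescale does (setScale in this stand-in class).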


Awesome, thank you so much!

Member Author

MaxGekk commented Jun 22, 2020

@cloud-fan @juliuszsompolski @HyukjinKwon Could you review this small perf improvement?

val totalMonths = Math.addExact(months, Math.multiplyExact(years, MONTHS_PER_YEAR))
val totalDays = Math.addExact(days, Math.multiplyExact(weeks, DAYS_PER_WEEK))
var micros = (secs * Decimal(MICROS_PER_SECOND)).toLong
assert(secs.scale == 6, "Seconds fractional must have 6 digits for microseconds")

Contributor

shall we check the precision as well?

Member Author

We can check it, but the precision value is not important for this code... or am I wrong?

Member Author

So, Decimal guarantees that precision >= scale (@Ngone51 correct?). That's enough for the code.

Member

Technically, I think we should say that DecimalType guarantees precision >= scale. But since any Decimal is directly or indirectly bound to a DecimalType, Decimal also guarantees precision >= scale.
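
Why the scale check matters more than the precision for the interval computation can be sketched as follows. This is a rough illustration under stated assumptions, not the code under review: java.math.BigDecimal stands in for Spark's Decimal, and the Interval case class, the constant names, and the makeInterval signature are hypothetical.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Sketch only: the scale decides what the unscaled value means,
// while the exact-arithmetic calls guard against overflow.
object IntervalFieldsSketch {
  val MonthsPerYear = 12
  val DaysPerWeek = 7
  val MicrosPerSecond = 1000000L
  val MicrosPerMinute = 60L * MicrosPerSecond
  val MicrosPerHour = 60L * MicrosPerMinute

  // Hypothetical result holder for the three interval fields.
  final case class Interval(months: Int, days: Int, micros: Long)

  def makeInterval(years: Int, months: Int, weeks: Int, days: Int,
                   hours: Int, mins: Int, secs: JBigDecimal): Interval = {
    require(secs.scale == 6, "Seconds fractional must have 6 digits for microseconds")
    val totalMonths = Math.addExact(months, Math.multiplyExact(years, MonthsPerYear))
    val totalDays = Math.addExact(days, Math.multiplyExact(weeks, DaysPerWeek))
    // With scale 6 the unscaled value is the microsecond count of the seconds field.
    var micros = secs.unscaledValue.longValueExact()
    micros = Math.addExact(micros, Math.multiplyExact(hours.toLong, MicrosPerHour))
    micros = Math.addExact(micros, Math.multiplyExact(mins.toLong, MicrosPerMinute))
    Interval(totalMonths, totalDays, micros)
  }
}
```

In this sketch a larger precision only makes the seconds value bigger; any overflow then surfaces as an ArithmeticException from the exact operations rather than as a silently wrong result.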

Member Author

I added an assert there to guarantee that the Int will not overflow. What would you like to assert here?
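
The Int overflow concern can be illustrated with a small sketch. It assumes the seconds value arrives as an unscaled microsecond count and splits it into whole seconds and nanoseconds; the object, method, and constant names are hypothetical, and Math.toIntExact is used here in place of an assert so that the example fails loudly on its own.

```scala
// Sketch only: splitting a microsecond count into seconds and nanoseconds with Int math.
object SecondsSplitSketch {
  val MicrosPerSecond = 1000000
  val NanosPerMicro = 1000

  // Returns (whole seconds, nanoseconds of the fractional part).
  def split(unscaledMicros: Long): (Int, Int) = {
    // Throws ArithmeticException instead of silently wrapping around.
    val totalMicros = Math.toIntExact(unscaledMicros)
    val seconds = Math.floorDiv(totalMicros, MicrosPerSecond)
    // At most 999999 * 1000 = 999999000, which fits in an Int.
    val nanos = Math.floorMod(totalMicros, MicrosPerSecond) * NanosPerMicro
    (seconds, nanos)
  }

  def main(args: Array[String]): Unit = {
    println(split(50123456L)) // (50,123456000)
  }
}
```

A value with at most 8 digits is far below Int.MaxValue (2147483647), so under that bound the conversion cannot fail.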

Contributor

As a sanity check? The precision must be <= 8 here, right?

Member Author

MaxGekk Jun 23, 2020

Not anymore; please take a look at #28873 and https://issues.apache.org/jira/browse/SPARK-32021

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in e00f43c Jun 23, 2020
@MaxGekk MaxGekk deleted the make_interval-opt-decimal branch December 11, 2020 20:26