Fix precision loss in from_unixtime(double) function#21899
Conversation
mbasmanova
left a comment
@hainenber Thank you for the fix.
Please review the Contributing guidelines at https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md and update the PR to comply.
Notice that the function name is from_unixtime, not from_unix. Make sure to use the right name in the PR title/description, commit message, Release Notes, etc.
Is TestExpressionCompiler.java the right place for this test? Shouldn't it go somewhere in com/facebook/presto/operator/scalar/TestDateTimeFunctionsBase.java?
Thanks, I've moved them to com/facebook/presto/operator/scalar/TestDateTimeFunctionsBase.java in the latest commit.
FROM_UNIX() result | FROM_UNIXTIME() result
Force-pushed 3ef4ef7 to 0484117 (Compare)
Commits squashed and now adhere to the commit message guideline.
Would you confirm that this test fails without the change?
Maybe add a comment that this particular double was causing loss of precision in the function before the fix and hence we are testing it here, in case anyone is wondering why this number was chosen.
Is this a fix for when the timestamp presented as a double has nanos? I cannot repro this issue when working with millis directly.
assertFunction("from_unixtime(1.7041507095805E9)"
My not-so-unit-test code:
public static double toUnixTimeTest(long timestamp)
{
    return timestamp / 1000.0;
}

public static long fromUnixTimeTestOld(double unixTime)
{
    return Math.round(unixTime * 1000);
}

public static long fromUnixTimeTestNew(double unixTime)
{
    return Math.round(Math.floor(unixTime) * 1000 + Math.round((unixTime - Math.floor(unixTime)) * 1000));
}

public static void main(String[] args)
{
    SqlTimestamp baselineTimestamp = sqlTimestampOf(LocalDateTime.of(2024, 1, 1, 23, 11, 49, millisToNanos(580)));
    long baselineMillis = baselineTimestamp.getMillis();
    for (int i = -1_000_000_000; i < 1_000_000_000; i++) {
        long expectedMillis = baselineMillis + i;
        double doubleValue = toUnixTimeTest(expectedMillis);
        long oldMillis = fromUnixTimeTestOld(doubleValue);
        long newMillis = fromUnixTimeTestNew(doubleValue);
        if (expectedMillis != oldMillis || oldMillis != newMillis) {
            SqlTimestamp tsOld = new SqlTimestamp(oldMillis, MILLISECONDS);
            SqlTimestamp tsNew = new SqlTimestamp(newMillis, MILLISECONDS);
            System.out.println(expectedMillis);
            System.out.println(baselineTimestamp);
            System.out.println(tsOld);
            System.out.println(tsNew);
            System.out.println();
        }
    }
}
@mbasmanova In my understanding, this change might cause checksum mismatches when running the verifier on Velox-written DWRF partitions, because Velox uses nanosecond precision in timestamps.
This is a tricky change. Future readers won't know why it is done in this particular way, and 'git log' won't help much because the commit message just says 'loses precision in some corner cases' without providing any details.
Would you add a comment here to explain what's going on and why the computation is done this way, and also update the commit message to clearly explain the problem and solution?
@hainenber
We could reuse some sentences from the issue and put them in comment.
I've amended the change with more details in the form of comments and the commit message.
spershin
left a comment
Thank you for tackling this change.
It is good to go.
Please add some comments to the new code, as reviewers suggested.
Force-pushed 0484117 to 3f0b9ef (Compare)
@hainenber
mbasmanova
left a comment
@hainenber Thank you for iterating on this PR. Looks good modulo some nits, and the commit message needs updating.
There are typos in the commit message, the title is too long, and the body has some lines that are too long. I suggest using the following as the commit message.
Fix precision loss in from_unixtime(double) function
from_unixtime(1.7041507095805E9) used to return 1704150709 seconds and 581
milliseconds. It should return 580 milliseconds.
Before this change, the function used Math.round(unixTime * 1000), which loses
precision in some cases.
In the above case, it pushes the resulting number to 1704150709580.500000000,
which after rounding becomes 1704150709581, i.e. 581 milliseconds.
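The failure mode described in this commit message can be reproduced outside Presto with a minimal standalone sketch. The class and method names below are illustrative, not the actual DateTimeFunctions code; the two bodies mirror the before/after computations quoted in this PR:

```java
public class FromUnixtimePrecision
{
    // Old behavior: a single multiply can land the intermediate double
    // exactly on x.5, which Math.round then bumps to the wrong millisecond.
    static long fromUnixTimeOld(double unixTime)
    {
        return Math.round(unixTime * 1000);
    }

    // Fixed behavior: convert the whole seconds exactly, then round only
    // the fractional part, keeping the error below half a millisecond.
    static long fromUnixTimeNew(double unixTime)
    {
        return Math.round(Math.floor(unixTime) * 1000
                + Math.round((unixTime - Math.floor(unixTime)) * 1000));
    }

    public static void main(String[] args)
    {
        double unixTime = 1.7041507095805E9;
        System.out.println(fromUnixTimeOld(unixTime)); // 1704150709581
        System.out.println(fromUnixTimeNew(unixTime)); // 1704150709580
    }
}
```

Running it shows the one-millisecond discrepancy for this input: the old formula yields 1704150709581 while the fixed one yields 1704150709580, matching the expected 580 milliseconds.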
and hence we are testing it here
Drop this phrase as it is redundant.
Let's move this comment inside the function.
The machine-representable double for 1.7041507095805E9 is 1704150709.58049988746643066406.
My understanding is that 1.7041507095805E9 cannot be represented in a computer. Are you saying that 1704150709.58049988746643066406 is the closest value that can be represented?
Yes, that's the original sentence from the author :D
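The value quoted above can be checked directly: the `BigDecimal(double)` constructor expands the binary64 value with no decimal rounding, so it prints exactly what the literal was rounded to. A standalone sketch (assuming nothing from the PR itself):

```java
import java.math.BigDecimal;

public class ExactDoubleValue
{
    public static void main(String[] args)
    {
        // The decimal literal 1.7041507095805E9 is first rounded to the
        // nearest binary64 value; BigDecimal then prints that value exactly.
        System.out.println(new BigDecimal(1.7041507095805E9).toPlainString());
        // Prints 1704150709.58049988746643066406..., confirming that the
        // closest representable double lies slightly below 1704150709.5805.
    }
}
```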
Codenotify: Notifying subscribers in CODENOTIFY files for diff 75ebaf1...5acdf21. No notifications.
Force-pushed 934f6c9 to b1c9942 (Compare)
spershin
left a comment
Looks good, thank you for working on this and addressing tons of comments!
Let's rebase and merge this change.
Hi there, other folks can take over this change. Sorry for the belated info!
from_unixtime(1.7041507095805E9) used to return 1704150709 seconds and 581 milliseconds. It should return 580 milliseconds. Before this change, the function used Math.round(unixTime * 1000), which loses precision in some cases. In the above case, it pushes the resulting number to 1704150709580.500000000, which after rounding becomes 1704150709581, i.e. 581 milliseconds.
Force-pushed b1c9942 to 5acdf21 (Compare)
@spershin rebased, could you please re-review?
Accepted; however, I'm not a committer, so a committer needs to review this too. cc @mbasmanova?
Description
Apply spershin's proposed change to fix precision loss for timestamps yielded from the FROM_UNIXTIME() function.
Motivation and Context
Fixes #21891
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.