
Fix precision loss in from_unixtime(double) function#21899

Merged
spershin merged 1 commit into prestodb:master from
hainenber:gh-21891-fix-precision-loss-for-fromunix-timestamp-result
Mar 12, 2024

Conversation

@hainenber
Contributor

@hainenber hainenber commented Feb 11, 2024

Description

Apply spershin's proposed change to fix precision loss in timestamps returned by the FROM_UNIXTIME() function.

Motivation and Context

Fixes #21891

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fix precision loss in timestamps returned by the `from_unixtime(double)` function.

@hainenber hainenber requested a review from a team as a code owner February 11, 2024 09:34
@hainenber hainenber requested a review from presto-oss February 11, 2024 09:34
@tdcmeehan tdcmeehan requested a review from spershin February 11, 2024 19:23
@tdcmeehan tdcmeehan self-assigned this Feb 11, 2024
Contributor

@mbasmanova mbasmanova left a comment


@hainenber Thank you for the fix.

Please, review Contributing guidelines at https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md and update the PR to comply.

Contributor


Notice that the function name is from_unixtime, not from_unix. Make sure to use the right name in the PR title/description, commit message, Release Notes, etc.

Is TestExpressionCompiler.java the right place for this test? Shouldn't it go somewhere in com/facebook/presto/operator/scalar/TestDateTimeFunctionsBase.java?

Contributor Author


Thanks, I've moved them to com/facebook/presto/operator/scalar/TestDateTimeFunctionsBase.java in the latest commit.

@hainenber hainenber changed the title fix(main/op/scalar): correct timestamp precision for FROM_UNIX() result fix(main/op/scalar): correct timestamp precision for FROM_UNIXTIME() result Feb 12, 2024
@hainenber hainenber force-pushed the gh-21891-fix-precision-loss-for-fromunix-timestamp-result branch from 3ef4ef7 to 0484117 Compare February 12, 2024 11:14
@hainenber
Contributor Author

Commits squashed and now adhering to the commit message guidelines.

Contributor


Would you confirm that this test fails without the change?

Contributor


Maybe add a comment that this particular double was causing loss of precision in the function before the fix, and hence we are testing it here, in case anyone wonders why this number was chosen.

Contributor Author


I can confirm the test fails without the changes.

(screenshot of the failing test)

I reused much of the data from your GH issue to provide the details on why the change is needed :D

Collaborator

@sdruzkin sdruzkin Feb 15, 2024


Is this a fix for when the timestamp presented as a double has nanos? I cannot repro this issue when working with millis directly.

`assertFunction("from_unixtime(1.7041507095805E9)"`

My not-so-unit-test code:

    // Mirror of to_unixtime: converts epoch millis to fractional seconds.
    public static double toUnixTimeTest(long timestamp)
    {
        return timestamp / 1000.0;
    }

    // Old from_unixtime rounding.
    public static long fromUnixTimeTestOld(double unixTime)
    {
        return Math.round(unixTime * 1000);
    }

    // New from_unixtime rounding from this PR.
    public static long fromUnixTimeTestNew(double unixTime)
    {
        return Math.round(Math.floor(unixTime) * 1000 + Math.round((unixTime - Math.floor(unixTime)) * 1000));
    }

    public static void main(String[] args)
    {
        SqlTimestamp baselineTimestamp = sqlTimestampOf(LocalDateTime.of(2024, 1, 1, 23, 11, 49, millisToNanos(580)));
        long baselineMillis = baselineTimestamp.getMillis();

        for (int i = -1_000_000_000; i < 1_000_000_000; i++) {
            long expectedMillis = baselineMillis + i;
            double doubleValue = toUnixTimeTest(expectedMillis);
            long oldMillis = fromUnixTimeTestOld(doubleValue);
            long newMillis = fromUnixTimeTestNew(doubleValue);

            if (expectedMillis != oldMillis || oldMillis != newMillis) {
                SqlTimestamp tsOld = new SqlTimestamp(oldMillis, MILLISECONDS);
                SqlTimestamp tsNew = new SqlTimestamp(newMillis, MILLISECONDS);

                System.out.println(expectedMillis);
                System.out.println(baselineTimestamp);
                System.out.println(tsOld);
                System.out.println(tsNew);
                System.out.println();
            }
        }
    }

Collaborator


@mbasmanova In my understanding, this change might cause checksum mismatches when running the verifier on Velox-written DWRF partitions, because Velox uses nanosecond precision in timestamps.

Contributor


This is a tricky change. Future readers won't know why this is done in this particular way, and 'git log' won't help much because the commit message just says 'loses precision in some corner cases' without providing any details.

Would you add a comment here to explain what's going on and why the computation is done this way, and also update the commit message to explain the problem and solution clearly?

Contributor


@hainenber
We could reuse some sentences from the issue and put them in comment.

Contributor Author


I've amended the change with more details in the form of comments and an updated commit message.

Contributor

@spershin spershin left a comment


@hainenber

Thank you for tackling this change.
It is good to go.
Please add some comments to the new code, as reviewers suggested.

@hainenber hainenber force-pushed the gh-21891-fix-precision-loss-for-fromunix-timestamp-result branch from 0484117 to 3f0b9ef Compare February 13, 2024 09:57
@spershin
Contributor

@hainenber
You might want to rebase your PR on the fresh master and resubmit - there was an update to prevent e2e tests from failing all the time.

Contributor

@mbasmanova mbasmanova left a comment


@hainenber Thank you for iterating on this PR. Looks good modulo some nits, and the commit message needs updating.

There are typos in the commit message, the title is too long, and some lines in the body are too long. I suggest using the following as the commit message.

Fix precision loss in from_unixtime(double) function

from_unixtime(1.7041507095805E9) used to return 1704150709 seconds and 581
milliseconds. It should return 580 milliseconds.

Before this change, the function used Math.round(unixTime * 1000), which loses
precision in some cases.

In the above case, it pushes the resulting number to 1704150709580.500000000,
which after rounding becomes 1704150709581, i.e. 581 milliseconds.

Current commit message: (screenshot)
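For future readers, the before/after rounding formulas quoted in the suggested commit message can be reproduced as a standalone Java sketch. The class and method names here are hypothetical, chosen only for illustration; the actual change lives in Presto's scalar date/time functions.

```java
public class FromUnixtimePrecision {
    // Old behavior: multiply first, then round. For the example input the
    // true product is 1704150709580.49988..., but the nearest double to it
    // is exactly 1704150709580.5, which Math.round then bumps up to ...581.
    public static long fromUnixTimeOld(double unixTime) {
        return Math.round(unixTime * 1000);
    }

    // Fixed behavior per this PR: round the whole seconds and the fractional
    // milliseconds separately, so the rounding error introduced by the
    // multiplication can no longer flip the final millisecond.
    public static long fromUnixTimeNew(double unixTime) {
        return Math.round(Math.floor(unixTime) * 1000
                + Math.round((unixTime - Math.floor(unixTime)) * 1000));
    }

    public static void main(String[] args) {
        double unixTime = 1.7041507095805E9;
        System.out.println(fromUnixTimeOld(unixTime)); // 1704150709581 (wrong: 581 ms)
        System.out.println(fromUnixTimeNew(unixTime)); // 1704150709580 (correct: 580 ms)
    }
}
```

The fix works because `Math.floor(unixTime) * 1000` is an exact integer in double arithmetic (well below 2^53), so only the sub-second part is subject to rounding, and its error is far smaller than half a millisecond.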

Contributor


> and hence we are testing it here

Drop this phrase as it is redundant.

Contributor


Let's move this comment inside the function.

Contributor


> Machine-representable double for the 1.7041507095805E9 is 1704150709.58049988746643066406.

My understanding is that 1.7041507095805E9 cannot be represented exactly as a double. Are you saying that 1704150709.58049988746643066406 is the closest value that can be represented?

Contributor Author


Yes, that's the original sentence from the author :D
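For what it's worth, this can be checked directly (a side sketch, not part of the PR): the `BigDecimal(double)` constructor preserves the exact binary value of a double, so printing it reveals the closest machine-representable number to the decimal literal.

```java
import java.math.BigDecimal;

public class NearestDouble {
    public static void main(String[] args) {
        // Prints the exact value of the double nearest to 1.7041507095805E9,
        // which begins 1704150709.58049988... (slightly below .5805).
        System.out.println(new BigDecimal(1.7041507095805E9));
    }
}
```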

@mbasmanova mbasmanova changed the title fix(main/op/scalar): correct timestamp precision for FROM_UNIXTIME() result Fix precision loss in from_unixtime(double) function Feb 14, 2024
@mbasmanova
Contributor

CC: @kaikalur @aditi-pandit

@github-actions

github-actions bot commented Feb 15, 2024

Codenotify: Notifying subscribers in CODENOTIFY files for diff 75ebaf1...5acdf21.

No notifications.

@hainenber hainenber force-pushed the gh-21891-fix-precision-loss-for-fromunix-timestamp-result branch from 934f6c9 to b1c9942 Compare February 15, 2024 09:06
Contributor

@spershin spershin left a comment


Looks good, thank you for working on this and addressing tons of comments!

@spershin
Contributor

@hainenber

Let's rebase and merge this change.
Thanks!

@mbasmanova mbasmanova requested a review from rschlussel March 7, 2024 17:56
@spershin
Contributor

spershin commented Mar 7, 2024

@hainenber

Let's rebase and merge this change.
Let us know, please, if you for some reason are unable to do this.
Thanks!

@hainenber
Contributor Author

Hi there, other folks can take over this change. Sorry for the belated info!

from_unixtime(1.7041507095805E9) used to return 1704150709 seconds and 581
milliseconds. It should return 580 milliseconds.

Before this change, the function used Math.round(unixTime * 1000), which loses
precision in some cases.

In the above case, it pushes the resulting number to 1704150709580.500000000,
which after rounding becomes 1704150709581, i.e. 581 milliseconds.
@tdcmeehan tdcmeehan force-pushed the gh-21891-fix-precision-loss-for-fromunix-timestamp-result branch from b1c9942 to 5acdf21 Compare March 8, 2024 16:21
@tdcmeehan
Contributor

@spershin rebased, could you please re-review?

Contributor

@spershin spershin left a comment


Thanks for the fix!

@spershin
Contributor

@tdcmeehan

Accepted, however, I'm not a committer, so a committer needs to review this too.

cc @mbasmanova ?

Contributor

@mbasmanova mbasmanova left a comment


Thanks.

@spershin spershin merged commit 6a9f3cd into prestodb:master Mar 12, 2024
@wanglinsong wanglinsong mentioned this pull request May 1, 2024
Successfully merging this pull request may close these issues.

FROM_UNIXTIME(double) implementation harbors double precision bug.
