Skip to content

Handle NaN when pushing filter for JDBC connector#21923

Closed
Praveen2112 wants to merge 3 commits intotrinodb:masterfrom
Praveen2112:praveen/nan_for_pg
Closed

Handle NaN when pushing filter for JDBC connector#21923
Praveen2112 wants to merge 3 commits intotrinodb:masterfrom
Praveen2112:praveen/nan_for_pg

Conversation

@Praveen2112
Copy link
Copy Markdown
Member

@Praveen2112 Praveen2112 commented May 10, 2024

Description

Fixes #21922
Some JDBC based sources like PotgreSql, Oracle, Ignite etc treats NaN values as equal,
and greater than all non-NaN values. This is different from Trino's behaviour where NaN values
are not equal, and they are not greater than or lesser than non-NaN values which results in in-correct
results when they are pushed down to the underlying datasource. So we push down additional condition
to ensure that NaN values are not considered in comparison operations.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label May 10, 2024
@Praveen2112 Praveen2112 force-pushed the praveen/nan_for_pg branch 5 times, most recently from 8e50017 to ded8441 Compare May 12, 2024 02:34
@Praveen2112 Praveen2112 added tests:all Run all tests snowflake Snowflake connector labels May 12, 2024
@Praveen2112 Praveen2112 force-pushed the praveen/nan_for_pg branch from ded8441 to 4391b08 Compare May 12, 2024 06:00
@Praveen2112 Praveen2112 marked this pull request as ready for review May 12, 2024 06:01
@Praveen2112
Copy link
Copy Markdown
Member Author

cc: @findepi I attempted a fix for it Please share your feedback for the same.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about <, <=, =, !=?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need this check of other operators as they work as expected - in case of Postgres NaN is placed about Infinity so we need to add them only for > or >= operator.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a code comment.

when you write it down say "in PostgreSQL ..." in a generic class, you'll realize that you're exploiting a (common) implementation detail.

  • what does the spec say about double and NaN ordering? (i think the spec omits existence of NaNs, but i may be wrong)
  • even if the spec said something definitive, implementation vary (for example, Trino and PostgreSQL have different behavior). Which means the behavior should be implemented in connector-specific manner.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'NaN' as a varchar?
are you sure this is portable SQL?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the tests 'NaN' is not parsed as a Varchar but considered as a NaN representation.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is used along with double/real columns I think it is coerced automatically and the tests also proves the same.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may work for PostgreSQL and may also work for some other databases, but I doubt this to be a really portable behavior (for example, Trino does not support comparing double values with 'NaN' literal, does it?) and this class is meant to be portable.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks wrong. why is it OK to skip this test for mysql?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why override? document

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

braces around left < right are redundant
braces around whole replaced expression are also redundant

(braces are inserted automatically by rewrite engine)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about other comparison operators?

@Praveen2112 Praveen2112 force-pushed the praveen/nan_for_pg branch 8 times, most recently from 1df9d6e to 0e5acfc Compare May 18, 2024 01:23
@Praveen2112 Praveen2112 changed the title Handle NaN when pushing filter for Postgres connector Handle NaN when pushing filter for JDBC connector May 18, 2024
@Praveen2112 Praveen2112 force-pushed the praveen/nan_for_pg branch 2 times, most recently from 22a1cf4 to 86a9211 Compare May 18, 2024 04:05
@Praveen2112 Praveen2112 force-pushed the praveen/nan_for_pg branch from 86a9211 to d3db6cc Compare May 20, 2024 15:48
For equals and not equals rewrite we capture them as GenericRewrite expression and usage of
RewriteComparison rule is redundant here.
Some JDBC based sources like PotgreSql, Oracle, Ignite etc treats NaN values as equal,
and greater than all non-NaN values. This is different from Trino's behaviour where NaN values
are not equal, and they are not greater than or lesser than non-NaN values which results in in-correct
results when they are pushed down to the underlying datasource. So we push down additional condition
to ensure that NaN values are not considered in comparison operations.
@Praveen2112 Praveen2112 force-pushed the praveen/nan_for_pg branch from d3db6cc to 1caffaf Compare May 20, 2024 16:16
@Praveen2112
Copy link
Copy Markdown
Member Author

@findepi Thanks for the review. AC

@findepi
Copy link
Copy Markdown
Member

findepi commented May 21, 2024

First two commits lgtm. You can merge them.

@github-actions
Copy link
Copy Markdown

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jun 11, 2024
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jul 3, 2024

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Jul 3, 2024
@mosabua mosabua reopened this Jul 4, 2024
@mosabua
Copy link
Copy Markdown
Member

mosabua commented Jul 4, 2024

@Praveen2112 @findepi how do you want to proceed on this. Looks like at least part of the PR is approved .. maybe update it all or cherrypick some commits into a separate PR and merge that?

@mosabua mosabua removed the stale label Jul 4, 2024
@github-actions
Copy link
Copy Markdown

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions
Copy link
Copy Markdown

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Aug 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed snowflake Snowflake connector stale tests:all Run all tests

Development

Successfully merging this pull request may close these issues.

Correctness issue for predicate pushdown on approx numeric column in Postgres

3 participants