Skip to content

Add support for timestamp type to Pinot connector#12145

Closed
ddcprg wants to merge 1 commit intotrinodb:masterfrom
ddcprg:issue_10199
Closed

Add support for timestamp type to Pinot connector#12145
ddcprg wants to merge 1 commit intotrinodb:masterfrom
ddcprg:issue_10199

Conversation

@ddcprg
Copy link
Member

@ddcprg ddcprg commented Apr 26, 2022

Description

Add support for TIMESTAMP data type in Pinot connector. Help fixing #10199

Is this change a fix, improvement, new feature, refactoring, or other?
Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)
Pinot connector

How would you describe this change to a non-technical end user or system administrator?
Include additional Pinot data types in Pinot connector

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
(X) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(X) Release notes entries required with the following suggested text:

# Section
* Add support for TIMESTAMP data type in Pinot connector. ({issue}`10199`)

@cla-bot cla-bot bot added the cla-signed label Apr 26, 2022
@github-actions github-actions bot added the docs label Apr 26, 2022
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@ddcprg ddcprg Apr 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add one more test column for multi-value timestamps - I need to check this is actually supported

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll get back to this once apache/pinot#8624 is resolved

@ebyhr ebyhr removed their request for review April 26, 2022 23:15
@hashhar
Copy link
Member

hashhar commented Apr 27, 2022

@elonazoulay can you please help with an initial review? I won't be able to review this thoroughly until ~9th May so expect some delays.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be JsonType or VarcharType? How JSON functions will be handled here e.g. json_extract ?

Copy link
Member Author

@ddcprg ddcprg Apr 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think JsonType is not part of the SPI https://github.com/trinodb/trino/tree/master/core/trino-spi/src/main/java/io/trino/spi/type and is a type only available to functions in query time, I'll double check that just in case I missed it. I was going to leave functions out of the scope of this PR but I'll check how easy it would be to include them otherwise I'll raise a separate issue for this. Probably the only 2 function that could be easily mapped are json_extract_scalar and json_format as each has an equivalent in Pinot. I'll get back to you on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@ddcprg ddcprg May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xiangfu0 I'll take a look at those. In the meantime it seems I've found an issue with multi-value timestamp columns, I've raised an issue in the pinot project apache/pinot#8624 I'll leave the test code for timestamp arrays commented out and refer to this issue in a comment for the time being. I'll try to raise PR to fix it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get JsonType via TypeManager. PostgreSqlClient has the example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to revert this change and implement in #13428

@nizarhejazi
Copy link

@elonazoulay can you please help with PR review?

@ddcprg ddcprg force-pushed the issue_10199 branch 3 times, most recently from 5a49dfe to 7a55c31 Compare May 2, 2022 16:28
@nizarhejazi
Copy link

nizarhejazi commented May 6, 2022

@ddcprg thanks for putting this together. @hashhar @elonazoulay @xiangfu0 Can we do another round of reviews on Monday May 9th? Thanks.

@nizarhejazi
Copy link

@hashhar @elonazoulay @xiangfu0 Can you review PR and sign off if no blocker?

@hashhar
Copy link
Member

hashhar commented May 11, 2022

I'll take a look during the end of this week.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid casts from Number. Byte, BigDecimal etc. are all subclasses of Number.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed this to Long

Copy link
Member Author

@ddcprg ddcprg May 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hashhar seems like the test fails because the aggregate function returns double instead of long, I need to dig further if we don't want to cast from Number

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hashhar @xiangfu0 Pinot's MAX function returns double https://docs.pinot.apache.org/users/user-guide-query/supported-aggregations I think the options are either keeping Number or having 2 conditionals: one for Double and one for Long. I rather keep checking for Number type, let me know your thoughts

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think number is ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having else if (value instanceof Double || value instanceof Long) is still useful in my opinion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now made this change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this return correct results if both a null and 1970-01-01T00:00:00 exist?

Copy link
Member Author

@ddcprg ddcprg May 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default null value for timestamps is epoch so this should return 1970-01-01T00:00:00 https://docs.pinot.apache.org/configuration-reference/schema#dimensionfieldspec

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I would argue it's unsafe to push-down aggregates in presence of nulls. Maybe we can instead use query-passthrough (by implementing a polymorphic table function as in #12325) for such cases to get Pinot semantics.

cc: @findepi Any opinions? i.e. DISTINCT on a column with null and epoch would return epoch since the null becomes epoch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does null get converted to epoch only when DISTINCT is applied?
If SELECT return null and epoch then SELECT DISTINCT should still do that, and SELECT count(DISTINCT ..) should return 2.

@hashhar worth adding correctness smoke tests for null vs 0 and null vs '' in BCT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to double check and get back to you. I think this is the same behaviour of other types like long and double

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, Pinot schema will assign the default null value. So this behavior is expected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't push down such queries then probably. While the results match semantics of Pinot they don't match Trino so anyone performing a JOIN or SELECT across two catalogs will see inconsistent results.

For preserving Pinot semantics we should implement something like https://trino.io/docs/current/connector/postgresql.html#query-varchar-table instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in pinot the only way to make distinction between null and non-null values is by using IS NULL and IS NOT NULL other than this pinot will assign the default value if null. Push-down already happens for other types, should this be addressed in a separate ticket? I could take a look at this once we merge this PR. I'll summon @elonazoulay on this one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this for filter pushdown - the behavior is not consistent with trino sql: you can select ... where is null and the result will be non null (i.e. the default value). Will take a look and see what can be done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the projection in Pinot will always return the default value if column is null

@nizarhejazi
Copy link

@ddcprg Can you address @hashhar feedback? thanks.

@ddcprg ddcprg force-pushed the issue_10199 branch 2 times, most recently from ead0d04 to 50cdb05 Compare May 23, 2022 15:06
@nizarhejazi
Copy link

@hashhar @xiangfu0 can you take another look? thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why write microseconds to this TIMESTAMP_MILLIS?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nizarhejazi @ddcprg can you check this?
Also make the CI passing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiangfu0 I'm absent at the moment. I'll retake this in 2 weeks time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reminder

Copy link
Member Author

@ddcprg ddcprg Jul 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a temporary conversion to micros to be able to invoke TIMESTAMP_MILLIS.writeLong(output, epochMicros); if there is a better way to write milliseconds to the output block please let me know. Please read the javadocs in ShortTimestampType

@nizarhejazi
Copy link

@ddcprg @xiangfu0 @hashhar @findepi any update on this PR? It has been in review for more than a month and half!

Copy link
Contributor

@xiangfu0 xiangfu0 Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a few tests for timestamp predicate and pushdown?
E.g. where timestamp_col > timestamp '2022-01-01 00:00:00.000'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiangfu0 thanks for suggesting this test! I found I had to add one more conversion

@ddcprg
Copy link
Member Author

ddcprg commented Jul 21, 2022

@ebyhr I can't reply to your comment about JSON type and using TypeManager. I will have to do a bit of refactoring to use that class, is it fine if we keep it like this for now? We can address in a future PR

@ebyhr
Copy link
Member

ebyhr commented Jul 21, 2022

is it fine if we keep it like this for now?

I would defer to @hashhar (I haven't reviewed this PR at all)

@hashhar
Copy link
Member

hashhar commented Jul 21, 2022

Sorry for the delay here. I'll take a look over the weekend.

@hashhar hashhar self-requested a review July 21, 2022 10:35
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only for ShortTimestampType i.e. precision <= 6.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make an additional note

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Pinot only support timestamps with precision <= 6? If not we should probably add a verify here to make sure we don't silently do incorrect things with higher precision values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elonazoulay
Copy link
Member

Hi @ddcprg ! I have a pr, we use json support internally here for some time but I just pushed this, didn't realize you are doing something similar: #13428 - would it make sense to just implement the timestamp support in this pr? We have a really good use case for it but did not implement it. lmk if you want to chat offline, I can help test this as well. Also if you could take a look at #13428 whenever you have some time. lmk what works for you.

@ddcprg
Copy link
Member Author

ddcprg commented Aug 2, 2022

@elonazoulay @ebyhr I'm happy to remove the JSON type from this PR, you've done a lot of refactoring in the other PR which what was initially avoiding with this one. This PR has been open for long time though so wonder if we just drop it, unfortunately I don't have plenty of time to dedicate to open source contributions

@ddcprg
Copy link
Member Author

ddcprg commented Aug 2, 2022

@elonazoulay is a bit late for me right now, I'll ping you over the chat tomorrow

@xiangfu0
Copy link
Contributor

xiangfu0 commented Aug 3, 2022

+1 on separated the JSON and TIMESTAMP type support

@ddcprg ddcprg changed the title Add support for timestamp and json types to Pinot connector Add support for timestamp type to Pinot connector Aug 3, 2022
@ddcprg ddcprg force-pushed the issue_10199 branch 2 times, most recently from 97a5116 to 0424a1d Compare August 3, 2022 07:16
@ddcprg
Copy link
Member Author

ddcprg commented Aug 3, 2022

@xiangfu0 @elonazoulay @ebyhr @hashhar the scope of this PR is now reduced to simple timestamp types, please take a look when you have a min

Copy link
Contributor

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@etadelta222
Copy link

Just wanted to follow-up on this 👀

@jatin5251
Copy link

@ddcprg can you please resolve conflicts if possible then we can request for merging thanks

@xiangfu0 xiangfu0 requested a review from martint September 7, 2022 20:10
@xiangfu0
Copy link
Contributor

xiangfu0 commented Sep 7, 2022

@ddcprg can you please resolve conflicts if possible then we can request for merging thanks

+1

@xiangfu0
Copy link
Contributor

hi @ddcprg, could you please resolve the conflict?

@xiangfu0
Copy link
Contributor

Since I cannot edit @ddcprg 's branch to amend this PR. I took his branch and rebased the code to latest master and create a new PR: #14244

@ddcprg ddcprg closed this Sep 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

9 participants