Add support for timestamp type to Pinot connector#12145
ddcprg wants to merge 1 commit into trinodb:master from
Conversation
I'll add one more test column for multi-value timestamps - I need to check this is actually supported
I'll get back to this once apache/pinot#8624 is resolved
plugin/trino-pinot/src/main/java/io/trino/plugin/pinot/PinotColumnHandle.java
@elonazoulay can you please help with an initial review? I won't be able to review this thoroughly until ~9th May so expect some delays.
Should this be JsonType or VarcharType? How will JSON functions be handled here, e.g. json_extract?
I think JsonType is not part of the SPI https://github.com/trinodb/trino/tree/master/core/trino-spi/src/main/java/io/trino/spi/type and is only available to functions at query time; I'll double-check that in case I missed it. I was going to leave functions out of the scope of this PR, but I'll check how easy it would be to include them; otherwise I'll raise a separate issue for this. Probably the only 2 functions that could be easily mapped are json_extract_scalar and json_format, as each has an equivalent in Pinot. I'll get back to you on this.
I see, can you test it locally with PinotJsonQuickStart?
Here is a reference PR as well.
https://github.com/prestodb/presto/pull/17015/files#diff-379277f1690c545f2f182815aa2b984399e26318df7e2402be7149a2c5e035b4R135
Thanks @xiangfu0, I'll take a look at those. In the meantime it seems I've found an issue with multi-value timestamp columns; I've raised an issue in the Pinot project, apache/pinot#8624. I'll leave the test code for timestamp arrays commented out and refer to this issue in a comment for the time being. I'll try to raise a PR to fix it.
You can get JsonType via TypeManager. PostgreSqlClient has the example.
I think it would be better to revert this change and implement in #13428
@elonazoulay can you please help with PR review?
@ddcprg thanks for putting this together. @hashhar @elonazoulay @xiangfu0 Can we do another round of reviews on Monday May 9th? Thanks.
@hashhar @elonazoulay @xiangfu0 Can you review the PR and sign off if there are no blockers?
I'll take a look towards the end of this week.
plugin/trino-pinot/src/main/java/io/trino/plugin/pinot/conversion/PinotTimestamps.java
Avoid casts from Number. Byte, BigDecimal etc. are all subclasses of Number.
@hashhar it seems like the test fails because the aggregate function returns double instead of long; I need to dig further if we don't want to cast from Number.
@hashhar @xiangfu0 Pinot's MAX function returns double https://docs.pinot.apache.org/users/user-guide-query/supported-aggregations I think the options are either keeping Number or having 2 conditionals: one for Double and one for Long. I'd rather keep checking for the Number type; let me know your thoughts.
having else if (value instanceof Double || value instanceof Long) is still useful in my opinion.
I've now made this change
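The change discussed above can be sketched as follows - a minimal, self-contained illustration of explicit Long/Double branches instead of a blanket Number cast. Class and method names here are hypothetical, not the connector's actual code:

```java
public class TimestampValues
{
    // Convert a value returned by Pinot for a TIMESTAMP column into epoch millis.
    // Plain column reads come back as Long; aggregates such as MAX return Double.
    public static long toEpochMillis(Object value)
    {
        if (value instanceof Long) {
            return (Long) value;
        }
        if (value instanceof Double) {
            // truncates any fractional part the double may carry
            return ((Double) value).longValue();
        }
        // fail loudly instead of silently accepting other Number subclasses
        throw new IllegalArgumentException("Unexpected value type: " + value.getClass());
    }
}
```

The point of the explicit branches is that Byte, BigDecimal, etc. no longer slip through unnoticed.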
plugin/trino-pinot/src/test/java/io/trino/plugin/pinot/AbstractPinotIntegrationSmokeTest.java
Does this return correct results if both a null and 1970-01-01T00:00:00 exist?
The default null value for timestamps is epoch, so this should return 1970-01-01T00:00:00 https://docs.pinot.apache.org/configuration-reference/schema#dimensionfieldspec
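For context, a Pinot schema snippet for such a column might look like this (the field name and granularity are illustrative; when no explicit defaultNullValue is configured, Pinot substitutes epoch for nulls in TIMESTAMP columns):

```json
{
  "dateTimeFieldSpecs": [
    {
      "name": "updated_at",
      "dataType": "TIMESTAMP",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```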
Then I would argue it's unsafe to push-down aggregates in presence of nulls. Maybe we can instead use query-passthrough (by implementing a polymorphic table function as in #12325) for such cases to get Pinot semantics.
cc: @findepi Any opinions? i.e. DISTINCT on a column with null and epoch would return epoch since the null becomes epoch.
Does null get converted to epoch only when DISTINCT is applied?
If SELECT return null and epoch then SELECT DISTINCT should still do that, and SELECT count(DISTINCT ..) should return 2.
@hashhar worth adding correctness smoke tests for null vs 0 and null vs '' in BCT.
I'm going to double-check and get back to you. I think this is the same behaviour as other types like long and double.
yes, Pinot schema will assign the default null value. So this behavior is expected.
We shouldn't push down such queries then probably. While the results match semantics of Pinot they don't match Trino so anyone performing a JOIN or SELECT across two catalogs will see inconsistent results.
For preserving Pinot semantics we should implement something like https://trino.io/docs/current/connector/postgresql.html#query-varchar-table instead.
In Pinot the only way to make a distinction between null and non-null values is by using IS NULL and IS NOT NULL; other than that, Pinot will assign the default value if null. Push-down already happens for other types; should this be addressed in a separate ticket? I could take a look at this once we merge this PR. I'll summon @elonazoulay on this one.
I tried this for filter pushdown - the behavior is not consistent with Trino SQL: you can select ... where the column is null and the result will be non-null (i.e. the default value). Will take a look and see what can be done.
yes, the projection in Pinot will always return the default value if column is null
Why write microseconds to this TIMESTAMP_MILLIS?
@nizarhejazi @ddcprg can you check this? Also, can you make the CI pass?
@xiangfu0 I'm away at the moment. I'll pick this up again in 2 weeks' time.
This is a temporary conversion to micros to be able to invoke TIMESTAMP_MILLIS.writeLong(output, epochMicros); if there is a better way to write milliseconds to the output block, please let me know. Please read the javadocs in ShortTimestampType.
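The scaling step being described can be sketched in isolation - Trino's short timestamp encoding stores epoch microseconds in a single long, so a Pinot millisecond value has to be multiplied up before being handed to TIMESTAMP_MILLIS.writeLong. Class and method names here are illustrative, not the PR's actual code:

```java
public class TimestampEncoding
{
    // Trino's ShortTimestampType stores epoch micros in a single long,
    // regardless of the declared precision (see its javadocs).
    private static final long MICROSECONDS_PER_MILLISECOND = 1000;

    public static long toEpochMicros(long epochMillis)
    {
        // multiplyExact throws on overflow instead of silently wrapping
        return Math.multiplyExact(epochMillis, MICROSECONDS_PER_MILLISECOND);
    }
}
```

In the connector, the result would then be written with TIMESTAMP_MILLIS.writeLong(output, epochMicros), as the comment above describes.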
Can you also add a few tests for timestamp predicate and pushdown?
E.g. where timestamp_col > timestamp '2022-01-01 00:00:00.000'
@xiangfu0 thanks for suggesting this test! I found I had to add one more conversion
@ebyhr I can't reply to your comment about JSON type and using
I would defer to @hashhar (I haven't reviewed this PR at all)
Sorry for the delay here. I'll take a look over the weekend.
only for ShortTimestampType i.e. precision <= 6.
I'll make an additional note
Does Pinot only support timestamps with precision <= 6? If not we should probably add a verify here to make sure we don't silently do incorrect things with higher precision values.
Pinot timestamps are milliseconds, see https://github.com/apache/pinot/blob/master/pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java#L468
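Since Pinot only produces millisecond values, the verify suggested above only needs to reject precisions beyond the short-timestamp range. A hedged sketch (names are hypothetical; 6 is the maximum precision that fits Trino's single-long timestamp representation):

```java
public class TimestampPrecisionCheck
{
    // Precisions 0-6 fit Trino's short (single long, epoch micros) encoding;
    // higher precisions use the two-part LongTimestamp and would need
    // separate handling rather than silently losing digits.
    public static void verifyShortTimestampPrecision(int precision)
    {
        if (precision < 0 || precision > 6) {
            throw new IllegalArgumentException(
                    "Expected timestamp precision in [0, 6], got: " + precision);
        }
    }
}
```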
Hi @ddcprg! I have a PR - we've used JSON support internally here for some time, but I just pushed this and didn't realize you are doing something similar: #13428 - would it make sense to just implement the timestamp support in this PR? We have a really good use case for it but did not implement it. Let me know if you want to chat offline; I can help test this as well. Also, if you could take a look at #13428 whenever you have some time. Let me know what works for you.
@elonazoulay @ebyhr I'm happy to remove the JSON type from this PR; you've done a lot of refactoring in the other PR, which is what I was initially trying to avoid with this one. This PR has been open for a long time though, so I wonder if we should just drop it; unfortunately I don't have much time to dedicate to open source contributions.
@elonazoulay it's a bit late for me right now, I'll ping you over the chat tomorrow.
+1 on separating the JSON and TIMESTAMP type support.
@xiangfu0 @elonazoulay @ebyhr @hashhar the scope of this PR is now reduced to simple timestamp types, please take a look when you have a minute.
Just wanted to follow up on this 👀
@ddcprg can you please resolve conflicts if possible? Then we can request merging, thanks.
+1
hi @ddcprg, could you please resolve the conflict?
Description
Add support for the TIMESTAMP data type in the Pinot connector. Helps with fixing #10199.
Related issues, pull requests, and links
Documentation
( ) No documentation is needed.
(X) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(X) Release notes entries required with the following suggested text: