
Add support for PostgreSQL arrays#317

Merged
findepi merged 4 commits into trinodb:master from vincentpoon:jdbc_arrays
Apr 14, 2019

Conversation

@vincentpoon
Member

@vincentpoon vincentpoon commented Feb 25, 2019

This adds support for multidimensional ARRAY types in PostgreSQL.

@cla-bot cla-bot bot added the cla-signed label Feb 25, 2019
@vincentpoon
Member Author

Thanks @ebyhr , updated with your suggestions

@findepi
Member

findepi commented Feb 27, 2019

@ebyhr thanks for doing first review pass. Appreciated!

@vincentpoon thanks for your PR!

In my opinion, there are some design changes to be done. In particular, I don't like the fact that we construct a List<JdbcColumnHandle> and then remap it to another List<JdbcColumnHandle>. This is error-prone, as a future call site could obtain the wrong List<JdbcColumnHandle> without remapping it.

My suggestion regarding moving it forward:

  1. Extract a commit introducing Block-typed read and write functions (call it e.g. Add BlockReadFunction, BlockWriteFunction).
    This change touches multiple files but is otherwise rather straightforward. You can even create a separate PR, so we can merge this sooner and have fewer code changes here.

  2. Replace private static ResultSet BaseJdbcClient#getColumns(JdbcTableHandle, java.sql.DatabaseMetaData) with protected Map<String,JdbcTypeHandle> getColumnTypes(ConnectorSession session, JdbcTableHandle) (a map keyed by column name).
    The default implementation would still use java.sql.DatabaseMetaData#getColumns, of course.
    (Call this e.g. Allow overriding querying column metadata)

  3. Add the code to handle postgres arrays.

    • extend JdbcTypeHandle with additional fields (not sure what these should be: "dimensions"? "elementType"?)
    • override this new getColumnTypes method in PostgresClient
    • ... I don't yet know how this should look -- see below

I don't yet have an answer for how arrayReadFunction and arrayWriteFunction could look.
What abstractions are needed, and what assumptions are OK to make?
Given that we don't know a priori how a base-jdbc-based connector is going to handle in/out values, we probably don't know how it will map in/out array elements either.
Maybe, for starters, this should be a Postgres-specific feature?
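To make step 2 above concrete, here is a minimal sketch of the proposed shape. The stripped-down JdbcTypeHandle record and the example column are stand-ins of mine, not the real Presto classes:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for io.prestosql.plugin.jdbc.JdbcTypeHandle; the real class has more fields.
record JdbcTypeHandle(int jdbcType, int columnSize, int decimalDigits) {}

class BaseJdbcClientSketch {
    // Step 2 as proposed: subclasses override this instead of the private
    // DatabaseMetaData-based helper; the map is keyed by column name.
    protected Map<String, JdbcTypeHandle> getColumnTypes(/* ConnectorSession session, JdbcTableHandle table */) {
        Map<String, JdbcTypeHandle> types = new LinkedHashMap<>();
        // The default implementation would populate this from
        // java.sql.DatabaseMetaData#getColumns.
        types.put("example_column", new JdbcTypeHandle(java.sql.Types.INTEGER, 10, 0));
        return types;
    }
}
```

A subclass such as PostgresClient could then override getColumnTypes to attach array dimensions or element types up front, instead of remapping column handles after the fact.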

cc @electrum

@vincentpoon
Member Author

Thanks for looking at this @findepi. Agree that the new methods in BaseJdbcClient here aren't ideal - I think that was a result of me trying to push as much reusable code up into BaseJdbcClient as possible. This work actually came out of my work on the Phoenix connector (#76), as I figured I could get rid of a lot of array-specific stuff there if I made a more general solution. But as you suggest, perhaps it's best to leave a lot of this in the specific JDBC implementations.

The BlockReadFunction and BlockWriteFunction are the only things I really need, I think. I'll create a separate PR for that.

Let me try to rework this a little. AFAIK, the main issue with JDBC ARRAY is that there's no standard naming convention (though that's already true of other types, which is why we have e.g. WriteMapping#getDataType), and no standard for what ResultSet#getArray returns (as the javadoc clearly states).
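Because the javadoc leaves the element representation driver-specific, a connector has to normalize whatever comes back defensively. A sketch of that normalization (the helper is my own illustration, not code from this PR):

```java
// Normalize whatever a driver hands back from java.sql.Array#getArray()
// into an Object[]. Some drivers return primitive arrays (e.g. int[]),
// which need reflective boxing.
final class JdbcArrays {
    static Object[] toObjectArray(Object raw) {
        if (raw instanceof Object[]) {
            return (Object[]) raw;
        }
        int length = java.lang.reflect.Array.getLength(raw);
        Object[] boxed = new Object[length];
        for (int i = 0; i < length; i++) {
            boxed[i] = java.lang.reflect.Array.get(raw, i);
        }
        return boxed;
    }
}
```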

@vincentpoon
Member Author

@findepi I updated this, now that #454 is merged. There's less impact on base-jdbc now, and all the ARRAY-related functionality is in presto-postgresql.

I couldn't quite implement getColumnTypes() the way you suggested, since a recent change uses the ResultSet to determine whether a column is nullable. In other words, getColumns() and getJdbcTypeHandle() both need to use the ResultSet, which is why I pass it as a parameter.

Let me know what you think, thanks!

Member

@findepi findepi left a comment

@vincentpoon apologies, I didn't review the main part (the value handling) of the PR.
Here are some comments I have so far. Please ping me back if you have any doubts (or when the PR is updated).

good job, btw!

Member

Recently, there was a need to further customize the metadata loading.
See
#408 (comment)
#408 (comment)

In that PR we ended up with a bit of a hack, but I was thinking about changing the base JDBC client to make such customized metadata loading possible. For that, the subclass would need to be able to control not only the ResultSet → JdbcTypeHandle conversion but also how the ResultSet is initialized.

However, I'm also thinking that sometimes a connector would override io.prestosql.plugin.jdbc.BaseJdbcClient#getColumns(io.prestosql.spi.connector.ConnectorSession, io.prestosql.plugin.jdbc.JdbcTableHandle) directly (for example, if I wanted to add some caching in my connector). So I don't yet see clearly how such an extensible API would look.

Also, in the current approach, getArrayDimensions(ConnectorSession session, JdbcTableHandle tableHandle, String columnName) gets called for every array column separately, and round-trip times add up.

For these reasons, I would suggest that you do not introduce this method. Instead, I'd suggest that you override getColumns in postgres client directly. I am aware this will introduce code duplication.

Member

That's not correct. With legacy timestamp semantics (the default), prestoNative is UTC millis (a Java timestamp) which, interpreted in the session zone, yields the year/month/day/hour/minutes the value represents.
OTOH, java.sql.Timestamp(x) treats x as UTC millis (a Java timestamp) which, interpreted in the JVM zone, yields the year/month/day/hour/minutes the value represents.
To fix this you would need:

  • write different conversion code
  • pass ConnectorSession so that the code can condition on io.prestosql.spi.connector.ConnectorSession#isLegacyTimestamp

... and even then the code would not be generic. java.sql.Timestamp cannot represent certain values, and many JDBC drivers (but not all) support java.time.LocalDateTime, which we should use instead.

For these reasons, prestoNativeToJdbcObject cannot be done in a generic way, without connector-specific code. I would therefore suggest the following:

  1. move this conversion to PostgresClient (or some companion helper utility class -- if possible)
    • then make sure there are no unnecessary changes in StandardColumnMappings.java (i would expect none are needed)
  2. pass ConnectorSession here
  3. remove support for time and timestamp types from this PR. It's easy to get them wrong and it's better that we focus on more generic stuff first and add timestamp arrays later.
  3. remove support for time and timestamp types from this PR. It's easy to get them wrong and it's better that we focus on more generic stuff first and add timestamp arrays later.
    • to ensure we (and others) do not forget about timestamps and the necessary abstractions they require, I would keep the TIMESTAMP.equals(prestoType) condition like:
      if (TIMESTAMP.equals(prestoType)) {
          throw new PrestoException(.....); // TODO
      }
      
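The zone mismatch described above can be reproduced with plain java.time; the zone choices below are arbitrary examples, not anything mandated by the PR:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// The same UTC millis yield different wall-clock values depending on which
// zone interprets them. new java.sql.Timestamp(millis) implicitly uses the
// JVM zone, while legacy timestamp semantics call for the session zone.
class TimestampZonePitfall {
    static LocalDateTime wallClock(long utcMillis, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(utcMillis), zone);
    }

    public static void main(String[] args) {
        long utcMillis = 0L; // 1970-01-01T00:00:00Z
        System.out.println(wallClock(utcMillis, ZoneId.of("America/Los_Angeles"))); // 1969-12-31T16:00
        System.out.println(wallClock(utcMillis, ZoneId.of("Asia/Tokyo")));          // 1970-01-01T09:00
    }
}
```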

Member

It's safer to throw when we don't know what to do:

throw new PrestoException( .... "Unsupported type: " + prestoType);

Member

  • prestoType is what actually matters, so include that in the message (prestoNative.getClass() will always be Slice here)
  • I am not sure FUNCTION_IMPLEMENTATION_ERROR is the right error code... if you can't find anything more appropriate, use a generic error code instead

Member

nit: per the code style, we do not write else when it's not necessary, i.e. when the then block always returns.

I would remove the else and also add a blank line before the subsequent if (here & for all cases below)
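A minimal illustration of the style point (the method and type constants are an invented example, not code from this PR):

```java
// Preferred: return early; no else after a then-block that always returns,
// and a blank line before each subsequent if.
class StyleExample {
    static String describe(int jdbcType) {
        if (jdbcType == java.sql.Types.VARCHAR) {
            return "varchar";
        }

        if (jdbcType == java.sql.Types.INTEGER) {
            return "integer";
        }

        return "unknown";
    }
}
```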

@vincentpoon
Member Author

@findepi I've updated this with two separate commits. The first only adds arrayDimensions to base-jdbc. Everything else in the second commit is in presto-postgresql now.

Thanks for your helpful comments, it's looking a lot better now.

@findepi
Member

findepi commented Mar 16, 2019

@vincentpoon thanks! I'll try to look into this during the week.
One more comment I have now is about tests -- please look
into how we usually test type mappings in TestPostgreSqlTypeMapping.

I understand this is taking time; it's conceptually not a simple change (at least for me).

@guyco33 I remember you saying that you're using Postgres arrays and
force-mapping them to varchar (#186).
would you have time to see if this PR improves your situation in any way?
I guess it might not help with arrays of postgres-specific types, but
would be good to know for sure.

@findepi
Member

findepi commented Mar 16, 2019

One more comment I have now is about tests -- please look
into how we usually test type mappings in TestPostgreSqlTypeMapping.

@vincentpoon to be clear: I am not sure this will be applicable. This needs trying / more thinking.

@findepi findepi changed the title Add jdbc support for ARRAY type Add support for Postgres arrays Mar 18, 2019
@findepi findepi changed the title Add support for Postgres arrays Add support for PostgreSQL arrays Mar 18, 2019
@vincentpoon
Member Author

Thanks, I'll take a look at TestPostgreSqlTypeMapping

@vincentpoon
Member Author

vincentpoon commented Mar 20, 2019

@findepi I've pushed another commit with new TestPostgreSqlTypeMapping tests. They helped me catch a bug with certain char[] values - I now throw NOT_SUPPORTED explicitly in TypeUtils to avoid this kind of error.

Also, apparently PostgreSQL's JDBC driver doesn't yet support varbinary arrays, so I removed support for that.

Member

@findepi findepi left a comment

@vincentpoon good job!
This looks very good.
I had a bunch of comments, mostly editorial.

Sorry for keeping you waiting so long. I have been very busy lately.

Member

Is it possible? Do you have a test with a char[] created in Postgres?

Member Author

The test is in table prestoCreateAsSelect("test_array_parameterized_char_unicode")

@vincentpoon
Member Author

@findepi Thanks again for the review, I have pushed a new squashed revision. I believe I have addressed all the comments, and have replied to some specifically.

@findepi
Member

findepi commented Apr 12, 2019

LGTM. Thanks @vincentpoon for working on this.
Can you please rebase?

@vincentpoon
Member Author

@findepi rebased

Member

@findepi findepi left a comment

@vincentpoon apologies. Apparently I didn't publish my comments when I was writing "LGTM".
Publishing them now.

@vincentpoon
Member Author

Thanks @findepi , addressed your latest comments

@findepi findepi merged commit 842fce1 into trinodb:master Apr 14, 2019
@findepi
Member

findepi commented Apr 14, 2019

Merged, thanks!

@findepi
Member

findepi commented Jun 28, 2019

Note: as per #687, this needs to be explicitly enabled.

cc @hashbash (prestodb/presto#11422), @troels (prestodb/presto#8849)

