
Add support for PostgreSQL arrays#317

Merged
findepi merged 4 commits into trinodb:master from vincentpoon:jdbc_arrays
Apr 14, 2019

Conversation

@vincentpoon
Member

@vincentpoon vincentpoon commented Feb 25, 2019

This adds support for multidimensional ARRAY types in PostgreSQL.

@cla-bot cla-bot bot added the cla-signed label Feb 25, 2019
@vincentpoon
Member Author

Thanks @ebyhr , updated with your suggestions

@findepi
Member

findepi commented Feb 27, 2019

@ebyhr thanks for doing first review pass. Appreciated!

@vincentpoon thanks for your PR!

In my opinion, there are some design changes to be done. In particular, I don't like the fact that we construct a List<JdbcColumnHandle> and then remap it to another List<JdbcColumnHandle>. This is error-prone, as a future call site could obtain the wrong List<JdbcColumnHandle> without remapping it.

My suggestion regarding moving it forward:

  1. Extract a commit introducing Block-typed read and write functions (call it e.g. Add BlockReadFunction, BlockWriteFunction).
    This change touches multiple files but is otherwise rather straightforward. You can even create a separate PR, so we can merge this sooner and have fewer code changes here.

  2. Replace private static ResultSet BaseJdbcClient#getColumns(JdbcTableHandle, java.sql.DatabaseMetaData) with protected Map<String,JdbcTypeHandle> getColumnTypes(ConnectorSession session, JdbcTableHandle) (a map keyed by column name).
    The default implementation would still use java.sql.DatabaseMetaData#getColumns, of course.
    (Call this e.g. Allow overriding querying column metadata)

  3. Add the code to handle postgres arrays.

    • extend JdbcTypeHandle with additional fields (not sure what these should be: "dimensions"? "elementType"?)
    • override this new getColumnTypes method in PostgresClient
    • ... I don't yet know how this should look -- see below

I don't yet have an answer for how arrayReadFunction and arrayWriteFunction could look.
What abstractions are needed, and what assumptions are OK to make?
Given that we don't know a priori how a base-jdbc-based connector is going to handle in/out values, we probably don't know how it will map in/out array elements either.
Maybe, for starters, this should be a Postgres-specific feature?
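To make step 2 above concrete, here is a minimal sketch of the proposed shape. The stripped-down JdbcTypeHandle record and the example column are stand-ins of mine, not the real Presto classes:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for io.prestosql.plugin.jdbc.JdbcTypeHandle; the real class has more fields.
record JdbcTypeHandle(int jdbcType, int columnSize, int decimalDigits) {}

class BaseJdbcClientSketch {
    // Step 2 as proposed: subclasses override this instead of the private
    // DatabaseMetaData-based helper; the map is keyed by column name.
    protected Map<String, JdbcTypeHandle> getColumnTypes(/* ConnectorSession session, JdbcTableHandle table */) {
        Map<String, JdbcTypeHandle> types = new LinkedHashMap<>();
        // The default implementation would populate this from
        // java.sql.DatabaseMetaData#getColumns.
        types.put("example_column", new JdbcTypeHandle(java.sql.Types.INTEGER, 10, 0));
        return types;
    }
}
```

A subclass such as PostgresClient could then override getColumnTypes to attach array dimensions or element types up front, instead of remapping column handles after the fact.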

cc @electrum

@vincentpoon
Member Author

Thanks for looking at this @findepi. Agree that the new methods in BaseJdbcClient here aren't ideal - I think that was a result of me trying to push as much reusable code up into BaseJdbcClient as possible. This work actually came out of my work on the Phoenix connector (#76), as I figured I could get rid of a lot of array-specific stuff there if I made a more general solution. But as you suggest, perhaps it's best to leave a lot of this in the specific JDBC implementations.

The BlockReadFunction and BlockWriteFunction are the only things I really need, I think. I'll create a separate PR for that.

Let me try to rework this a little. AFAIK, the main issue with JDBC ARRAY is that there's no standard naming convention (though that's already true of other types, which is why we have e.g. WriteMapping#getDataType), and no standard for what ResultSet#getArray returns (as the javadoc clearly states).
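Because the javadoc leaves the element representation driver-specific, a connector has to normalize whatever comes back defensively. A sketch of that normalization (the helper is my own illustration, not code from this PR):

```java
// Normalize whatever a driver hands back from java.sql.Array#getArray()
// into an Object[]. Some drivers return primitive arrays (e.g. int[]),
// which need reflective boxing.
final class JdbcArrays {
    static Object[] toObjectArray(Object raw) {
        if (raw instanceof Object[]) {
            return (Object[]) raw;
        }
        int length = java.lang.reflect.Array.getLength(raw);
        Object[] boxed = new Object[length];
        for (int i = 0; i < length; i++) {
            boxed[i] = java.lang.reflect.Array.get(raw, i);
        }
        return boxed;
    }
}
```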

@vincentpoon
Member Author

@findepi I updated this, now that #454 is merged. There's less impact on base-jdbc now, and all the ARRAY-related functionality is in presto-postgresql.

I couldn't quite implement getColumnTypes() the way you suggested, since a recent change uses the ResultSet to determine whether a column is nullable. In other words, getColumns() and getJdbcTypeHandle() both need to use the ResultSet, which is why I pass it as a parameter.

Let me know what you think, thanks!

Member

@findepi findepi left a comment

@vincentpoon apologies, I didn't review the main part (the value handling) of the PR.
Here are some comments I have so far. Please ping me back if you have any doubts (or when the PR is updated).

good job, btw!

Member

Recently, there was a need to further customize the metadata loading.
See
#408 (comment)
#408 (comment)

In that PR we ended up with a bit of a hack, but I was thinking about changing the base JDBC client to make such customized metadata loading possible. For that, the subclass would need to be able to control not only the ResultSet → JdbcTypeHandle conversion but also how the ResultSet is initialized.

However, I'm also thinking that sometimes a connector would override io.prestosql.plugin.jdbc.BaseJdbcClient#getColumns(io.prestosql.spi.connector.ConnectorSession, io.prestosql.plugin.jdbc.JdbcTableHandle) directly (for example, if I wanted to add some caching in my connector). So I don't yet see clearly how such an extensible API would look.

Also, in the current approach, getArrayDimensions(ConnectorSession session, JdbcTableHandle tableHandle, String columnName) gets called for every array column separately, and round-trip times add up.

For these reasons, I would suggest that you do not introduce this method. Instead, I'd suggest that you override getColumns in postgres client directly. I am aware this will introduce code duplication.

Member

That's not correct. With legacy timestamp semantics (the default), prestoNative is UTC millis (a Java timestamp) which, interpreted in the session zone, yields the year/month/day/hour/minutes the value represents.
OTOH, java.sql.Timestamp(x) treats x as UTC millis (a Java timestamp) which, interpreted in the JVM zone, yields the year/month/day/hour/minutes the value represents.
To fix this you would need:

  • write different conversion code
  • pass ConnectorSession so that the code can condition on io.prestosql.spi.connector.ConnectorSession#isLegacyTimestamp

... and even then the code would not be generic. java.sql.Timestamp cannot represent certain values, and many JDBC drivers (but not all) support java.time.LocalDateTime, which we should use instead.

For these reasons, prestoNativeToJdbcObject cannot be done in a generic way, without connector-specific code. I would therefore suggest the following:

  1. move this conversion to PostgresClient (or some companion helper utility class -- if possible)
    • then make sure there are no unnecessary changes in StandardColumnMappings.java (i would expect none are needed)
  2. pass ConnectorSession here
  3. remove support for time and timestamp types from this PR. It's easy to get them wrong and it's better that we focus on more generic stuff first and add timestamp arrays later.
  3. remove support for time and timestamp types from this PR. It's easy to get them wrong and it's better that we focus on more generic stuff first and add timestamp arrays later.
    • to ensure we (and others) do not forget about timestamps and the necessary abstractions they require, I would keep the TIMESTAMP.equals(prestoType) condition like:
      if (TIMESTAMP.equals(prestoType)) {
          throw new PrestoException(.....); // TODO
      }
      
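The zone mismatch described above can be reproduced with plain java.time; the zone choices below are arbitrary examples, not anything mandated by the PR:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// The same UTC millis yield different wall-clock values depending on which
// zone interprets them. new java.sql.Timestamp(millis) implicitly uses the
// JVM zone, while legacy timestamp semantics call for the session zone.
class TimestampZonePitfall {
    static LocalDateTime wallClock(long utcMillis, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(utcMillis), zone);
    }

    public static void main(String[] args) {
        long utcMillis = 0L; // 1970-01-01T00:00:00Z
        System.out.println(wallClock(utcMillis, ZoneId.of("America/Los_Angeles"))); // 1969-12-31T16:00
        System.out.println(wallClock(utcMillis, ZoneId.of("Asia/Tokyo")));          // 1970-01-01T09:00
    }
}
```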

Member

It's safer to throw when we don't know what to do:

throw new PrestoException( .... "Unsupported type: " + prestoType);

Member

  • prestoType is what actually matters, so include that in the message (prestoNative.getClass() will always be Slice here)
  • I am not sure FUNCTION_IMPLEMENTATION_ERROR is the right error code... if you can't find anything more appropriate, use a generic error code instead

Member

nit: per the code style, we do not write else when it's not necessary, i.e. when the then block always returns.

I would remove the else and also add a blank line before the subsequent if (here & for all cases below)
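A minimal illustration of the style point (the method and type constants are an invented example, not code from this PR):

```java
// Preferred: return early; no else after a then-block that always returns,
// and a blank line before each subsequent if.
class StyleExample {
    static String describe(int jdbcType) {
        if (jdbcType == java.sql.Types.VARCHAR) {
            return "varchar";
        }

        if (jdbcType == java.sql.Types.INTEGER) {
            return "integer";
        }

        return "unknown";
    }
}
```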

@vincentpoon
Member Author

@findepi I've updated this with two separate commits. The first only adds arrayDimensions to base-jdbc. Everything else in the second commit is in presto-postgresql now.

Thanks for your helpful comments, it's looking a lot better now.

@findepi
Member

findepi commented Mar 16, 2019

@vincentpoon thanks! I'll try to look into this during the week.
One more comment I have now is about tests -- please look
into how we usually test type mappings in TestPostgreSqlTypeMapping.

I understand this is taking time; it's conceptually not a simple change (at least for me).

@guyco33 I remember you saying that you're using Postgres arrays and
force-mapping them to varchar (#186).
would you have time to see if this PR improves your situation in any way?
I guess it might not help with arrays of postgres-specific types, but
would be good to know for sure.

@findepi
Member

findepi commented Mar 16, 2019

One more comment I have now is about tests -- please look
into how we usually test type mappings in TestPostgreSqlTypeMapping.

@vincentpoon to be clear: I am not sure this will be applicable. This needs trying / more thinking.

@findepi findepi changed the title Add jdbc support for ARRAY type Add support for Postgres arrays Mar 18, 2019
@findepi findepi changed the title Add support for Postgres arrays Add support for PostgreSQL arrays Mar 18, 2019
@vincentpoon
Member Author

Thanks, I'll take a look at TestPostgreSqlTypeMapping

@vincentpoon
Member Author

vincentpoon commented Mar 20, 2019

@findepi I've pushed another commit with new TestPostgreSqlTypeMapping tests. They helped me catch a bug with certain char[] values - I now throw NOT_SUPPORTED explicitly in TypeUtils to avoid this kind of error.

Also, apparently PostgreSQL's JDBC driver doesn't yet support varbinary arrays, so I removed support for that.

Member

@findepi findepi left a comment

@vincentpoon good job!
This looks very good.
I had a bunch of comments, mostly editorial.

Sorry for keeping you waiting so long. I have been very busy lately.

Member

Is it possible? Do you have a test with a char[] created in Postgres?

Member Author

The test is in table prestoCreateAsSelect("test_array_parameterized_char_unicode")

@vincentpoon
Member Author

@findepi Thanks again for the review, I have pushed a new squashed revision. I believe I have addressed all the comments, and have replied to some specifically.

@findepi
Member

findepi commented Apr 12, 2019

LGTM. Thanks @vincentpoon for working on this.
Can you please rebase?

@vincentpoon
Member Author

@findepi rebased

Member

@findepi findepi left a comment

@vincentpoon apologies. Apparently I didn't publish my comments when I was writing "LGTM".
Publishing them now.

@vincentpoon
Member Author

Thanks @findepi , addressed your latest comments

@findepi findepi merged commit 842fce1 into trinodb:master Apr 14, 2019
@findepi
Member

findepi commented Apr 14, 2019

Merged, thanks!

@findepi
Member

findepi commented Jun 28, 2019

Note: as per #687, this needs to be explicitly enabled.

cc @hashbash (prestodb/presto#11422), @troels (prestodb/presto#8849)

