Skip to content

Fix pushdown filter for PostgreSQL enum type#408

Merged
findepi merged 1 commit intotrinodb:masterfrom
guyco33:fix_postgres_enum_type_pushdown
Mar 13, 2019
Merged

Fix pushdown filter for PostgreSQL enum type#408
findepi merged 1 commit intotrinodb:masterfrom
guyco33:fix_postgres_enum_type_pushdown

Conversation

@guyco33
Copy link
Member

@guyco33 guyco33 commented Mar 7, 2019

Filters on PostgreSQL enum type are failing since they are pushdown to connector (enum type is considered as a Jdbc VARCHAR type:
JdbcTypeHandle.jdbcType=java.sql.Types.VARCHAR ,JdbcTypeHandle.jdbcTypeName=<name of user-defined enum type>)

In order to fix it and keep the pushdown, a cast to the proper typeName is implemented by passing a proper bindingPlaceholder

@cla-bot cla-bot bot added the cla-signed label Mar 7, 2019
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can retrieve JdbcColumnHandle from 4th argument JdbcColumnHandle column. Why don't you reuse the exising argument?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it's just the function's signature. I am using the actual 4th argument in the apply in the return clause

Copy link
Member

@ebyhr ebyhr Mar 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my comment was not enough. I meant we can change the type to Optional<String> bindingPlaceholder or boolean and then change like below. This is just my idea, therefore let's wait for other member's review. I understand your implementation is opened for future changes.

bindingPlaceholder.map(bind -> bind + column.getJdbcTypeHandle().getJdbcTypeName()).orElse("?")

Copy link
Member Author

@guyco33 guyco33 Mar 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function<JdbcColumnHandle, String> is much generic and the connector can also pass function that results with something like this: "cast(? as typeName)"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the generics. I wanted to avoid optional-functional argument. Let's follow kokosing's suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • No point to pass Optional here. You can always pass function that always return "?"
  • Can you make it a field of QueryBuilder, this function should not changed between queries, right?

Copy link
Member Author

@guyco33 guyco33 Mar 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. Thanks @kokosing . done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private final

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requireNonNull

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra ( and ), please remove

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please extract (columnHandle) -> "?::" + columnHandle.getJdbcTypeHandle().getJdbcTypeName() as method, then replace it with this::yourNewMethodName

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please format this like:

            assertQuery(
                "SELECT column_name, data_type FROM information_schema.columns WHERE table_schema = 'tpch' AND table_name = 'test_enum'",
                "VALUES ('id','integer'),('col_s','varchar'),('col_e','varchar')");

@guyco33 guyco33 force-pushed the fix_postgres_enum_type_pushdown branch from 099d29b to 73c5d04 Compare March 9, 2019 08:26
@findepi findepi self-requested a review March 9, 2019 23:16
@findepi
Copy link
Member

findepi commented Mar 9, 2019

@guyco33 I think we should investigate alternatives within existing framework before committing to this approach.
According to jdbi/jdbi#420 (comment) you should be able to bind enum value using java.sql.PreparedStatement#setObject(int, java.lang.Object, int) with java.sql.Types.OTHER. There might be also other ways, like what we do with json in https://github.com/prestosql/presto/blob/371bc0b1a533d7e6a43705f6dcb2a3bb24e7fb85/presto-postgresql/src/main/java/io/prestosql/plugin/postgresql/PostgreSqlClient.java#L151-L154

@guyco33
Copy link
Member Author

guyco33 commented Mar 10, 2019

Thanks @findepi for your feedback!

How are we going to filter enum types in PostgreSqlClient#toPrestoType?
I came up with this:
typeHandle.getJdbcType() == Types.VARCHAR && !typeHandle.getJdbcTypeName().equals("varchar")

I've pushed a fixup to illustrate it

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @guyco33!
Looks better to me and lower risk of side effects.

Some comments. Then we need to look into how to pull "is enum" information more directly.
Can you please check what are the other types that show up as Types.VARCHAR in Postgres?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call it typedVarcharWriteFunction. Name is not perfect, but at least it signals some varchar-like Slice and UTF-8 decoding takes place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just typedStringWriteFunction("json"), there is no need to use jdbcTypeName here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was writing JSON as "json", not "jsonb".
Please use "json". If your change is intentional, please extract to a separate commit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you revert changes related to "bindingPlaceholder"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now add explanatory comment here:

// This can be e.g. an ENUM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you cleanup, I will see whether we can pull "is enum" information in some better way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call it typedVarcharColumnMapping

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guyco33 good job!

I looked into how to improve/replace if (typeHandle.getJdbcType() == Types.VARCHAR && !typeHandle.getJdbcTypeName().equals("varchar")), but neither information_schema.columns nor pg_attribute didn't seem to explicitly say "this is enum".
Let's leave it as it is.

Several minor comments in the test code. Otherwise LGTM.
Please squash everything into single commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove column col_s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

insert another data row

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename col_e to enum_column

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: upper case "AS ENUM" as in Postgres docs https://www.postgresql.org/docs/11/datatype-enum.html

@guyco33 guyco33 force-pushed the fix_postgres_enum_type_pushdown branch from 3d07187 to e545fba Compare March 12, 2019 13:30
@guyco33
Copy link
Member Author

guyco33 commented Mar 12, 2019

I looked into how to improve/replace if (typeHandle.getJdbcType() == Types.VARCHAR && !typeHandle.getJdbcTypeName().equals("varchar")), but neither information_schema.columns nor pg_attribute didn't seem to explicitly say "this is enum".

select * from pg_type where typcategory = 'E' and typname='enum_t'

@findepi findepi merged commit 3792816 into trinodb:master Mar 13, 2019
@findepi
Copy link
Member

findepi commented Mar 13, 2019

Merged, thanks!

@findepi findepi added this to the 306 milestone Mar 13, 2019
This was referenced Mar 13, 2019
@guyco33 guyco33 deleted the fix_postgres_enum_type_pushdown branch October 13, 2019 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants