Skip to content

Support PostgreSQL array created using array internal type name#659

Merged
findepi merged 1 commit intotrinodb:masterfrom
guyco33:fix_support_for_postgersql_arrays
Jun 4, 2019
Merged

Support PostgreSQL array created using array internal type name#659
findepi merged 1 commit intotrinodb:masterfrom
guyco33:fix_support_for_postgersql_arrays

Conversation

@guyco33
Copy link
Member

@guyco33 guyco33 commented Apr 23, 2019

Fixing failure when selecting Postgres Array column that was created with native type name (eg: _int4)

Postgres: create table t (a _int4);
Presto: select a from t;

io.prestosql.spi.PrestoException: No array dimensions for ARRAY type: JdbcTypeHandle{jdbcType=2003, jdbcTypeName=_int4, columnSize=10, decimalDigits=0}
	at io.prestosql.plugin.postgresql.PostgreSqlClient.lambda$null$0(PostgreSqlClient.java:248)
	at java.util.Optional.orElseThrow(Optional.java:290)
	at io.prestosql.plugin.postgresql.PostgreSqlClient.lambda$toPrestoType$1(PostgreSqlClient.java:248)

Explanation:
pg_attribute.attndims is 0 when creating columns with the native type names

create table t (a _int4); --> pg_attribute.attndims = 0 for column a
create table t (a int[]); --> pg_attribute.attndims = 1 for column a

Fix potential issue for #317

@cla-bot cla-bot bot added the cla-signed label Apr 23, 2019
@guyco33 guyco33 force-pushed the fix_support_for_postgersql_arrays branch from 22731d2 to 9d40b66 Compare April 25, 2019 09:06
@findepi
Copy link
Member

findepi commented Apr 25, 2019

@guyco33 i checked in Postgres 11.2 and confirmed your findings:

test=# create table t(a int4[], b _int4);
CREATE TABLE
test=# \d t
                  Table "public.t"
 Column |   Type    | Collation | Nullable | Default
--------+-----------+-----------+----------+---------
 a      | integer[] |           |          |
 b      | integer[] |           |          |
test=# SELECT attname,  attndims FROM pg_attribute att JOIN pg_class tbl ON tbl.oid = att.attrelid WHERE tbl.relname = 't';
 attname  | attndims
----------+----------
...
 a        |        1
 b        |        0

However

@findepi
Copy link
Member

findepi commented Apr 25, 2019

@guyco33 for now, let's get #668 in.

let's discuss whether we need this change or not.
i found an old thread (https://www.postgresql.org/message-id/8C5B026B51B6854CBE88121DBF097A8651DB95%40ehost010-33.exch010.intermedia.net)
and i am under impression that we don't and this is on Postgres side.

But maybe there is something newer that I should refer to?
Especially -- is there a bug/issue in Postgres tracking undesirable difference in behavior between eg int4[] and _int4?

@guyco33
Copy link
Member Author

guyco33 commented Apr 25, 2019

  • if i understand correctly and int4[], _int4 should be treated equivalent in Postgres, they should also have the same attndims.
    Is this a bug in Postgres? Is it reported?

@findepi Maybe attndims has other internal purpose. For example to be able to reproduce the original DDL statement.

@guyco33
Copy link
Member Author

guyco33 commented Apr 25, 2019

let's discuss whether we need this change or not.
i found an old thread (https://www.postgresql.org/message-id/8C5B026B51B6854CBE88121DBF097A8651DB95%40ehost010-33.exch010.intermedia.net)
and i am under impression that we don't and this is on Postgres side.

I think that our main motivation should be to map all Postgres array columns although they might be created with native names - I guess that there are plenty of automated tools that construct Postgres tables from their native names and we might loose them without this change (I encountered lots of Postgres sources with such tables)

But maybe there is something newer that I should refer to?
Especially -- is there a bug/issue in Postgres tracking undesirable difference in behavior between eg int4[] and _int4?

From https://www.postgresql.org/docs/11/catalog-pg-attribute.html we can learn that attndims is not enforced:
Presently, the number of dimensions of an array is not enforced, so any nonzero value effectively means “it's an array”.

@findepi
Copy link
Member

findepi commented Apr 25, 2019

@martint
Copy link
Member

martint commented Apr 25, 2019

@findepi
Copy link
Member

findepi commented Apr 26, 2019

And a follow up: https://www.postgresql.org/message-id/32678.1556228173%40sss.pgh.pa.us

In particular:

the Postgres type system doesn't distinguish
arrays of different numbers of dimensions, ie, int4[][] is not
really different from int4[]. So you can't attach any very strong
meaning to attndims = 2 vs attndims = 1.

I therefore propose that we go with #682

@findepi
Copy link
Member

findepi commented Apr 26, 2019

@guyco33 let's discuss under #682

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we agreed to the way forward in #682 (comment), I feel this is a good change -- let's treat arrays with undeclared dimensions as single-dimension arrays.

I have some review comments.
Also, please rebase to accommodate for my #687.

Please change the commit message to something like

Support PostgreSQL array created using array internal type name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be following be enough

Suggested change
"AND CASE WHEN att.attndims = 0 AND attyp.typcategory = 'A' THEN 1 ELSE att.attndims END > 0 ";
"AND attyp.typcategory = 'A' ";

(if not sufficient, then this should be AND (attyp.typcategory = 'A' OR att.attndims > 0), but I would strongly not have such an OR until I understand why one would be needed)

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that your suggestion is sufficient unless there are cases that attndims is NULL.
Anyway, selecting COALESCE(att.attndims,1) will cover this as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that work?

Suggested change
"SELECT att.attname, CASE WHEN att.attndims = 0 AND attyp.typcategory = 'A' THEN 1 ELSE att.attndims END AS attndims " +
"SELECT att.attname, max(att.attndims, 1) " +

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be greatest(att.attndims, 1)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase. You will need to remove testInternalArray.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use DataTypeTest with arrayDataType(integerDataType(), "_int4")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there away to use DataTypeTest with a predefined table?
_int4 can't be created by CTAS from presto.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see how TestPostgreSqlTypeMapping#postgresCreateAndInsert is used. When using this method, the table is created and populated directly in PostgreSQL.

Copy link
Member Author

@guyco33 guyco33 May 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks!

@guyco33 guyco33 changed the title Fix support for PostgreSQL arrays Support PostgreSQL array created using array internal type name May 22, 2019
@guyco33 guyco33 force-pushed the fix_support_for_postgersql_arrays branch from 9d40b66 to bd6c8c0 Compare May 22, 2019 07:56
@guyco33 guyco33 force-pushed the fix_support_for_postgersql_arrays branch from bd6c8c0 to c9e057a Compare May 24, 2019 06:34
@findepi findepi merged commit d8f0650 into trinodb:master Jun 4, 2019
@findepi
Copy link
Member

findepi commented Jun 4, 2019

Merged, thanks!

@findepi findepi mentioned this pull request Jun 4, 2019
6 tasks
@findepi findepi added this to the 314 milestone Jun 6, 2019
@guyco33 guyco33 deleted the fix_support_for_postgersql_arrays branch October 13, 2019 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants