Skip to content

[SPARK-34311][SQL] PostgresDialect can't treat arrays of some types#31419

Closed
sarutak wants to merge 4 commits intoapache:masterfrom
sarutak:postgres-array-types
Closed

[SPARK-34311][SQL] PostgresDialect can't treat arrays of some types#31419
sarutak wants to merge 4 commits intoapache:masterfrom
sarutak:postgres-array-types

Conversation

@sarutak
Copy link
Member

@sarutak sarutak commented Feb 1, 2021

What changes were proposed in this pull request?

This PR fixes the issue that PostgresDialect can't treat arrays of some types.
Though PostgreSQL supports wide range of types (https://www.postgresql.org/docs/13/datatype.html), the current PostgresDialect can't treat arrays of the following types.

  • xml
  • tsvector
  • tsquery
  • macaddr
  • macaddr8
  • txid_snapshot
  • pg_snapshot
  • point
  • line
  • lseg
  • box
  • path
  • polygon
  • circle
  • pg_lsn
  • bit varying
  • interval

NOTE: PostgreSQL doesn't implement arrays of serial types so this PR doesn't care about them.

Why are the changes needed?

To provide better support with PostgreSQL.

Does this PR introduce any user-facing change?

Yes. PostgresDialect can handle arrays of types shown above.

How was this patch tested?

New test.

@github-actions github-actions bot added the SQL label Feb 1, 2021
@sarutak
Copy link
Member Author

sarutak commented Feb 1, 2021

For reviews: Array of serial is not implemented in PostgreSQL so I didn't care about serial2, serial4 and serial8.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems OK if these new types all have a string representation that's meaningful, and it just helps read more data types.

@SparkQA
Copy link

SparkQA commented Feb 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39321/

@SparkQA
Copy link

SparkQA commented Feb 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39321/

@SparkQA
Copy link

SparkQA commented Feb 1, 2021

Test build #134735 has finished for PR 31419 at commit b0c3ba4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


conn.prepareStatement("CREATE TABLE st_with_array (c0 uuid, c1 inet, c2 cidr," +
"c3 json, c4 jsonb, c5 uuid[], c6 inet[], c7 cidr[], c8 json[], c9 jsonb[])")
"c3 json, c4 jsonb, c5 uuid[], c6 inet[], c7 cidr[], c8 json[], c9 jsonb[], c10 xml[], " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding tests for non-array cases, e.g., col xml and col tsvector?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found there are lots of types not tested so far. So I'll open another PR for non-array cases and focus on array cases in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean there are types that aren't supported, but for which we support arrays? did I misunderstand that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that some supported non-array types like numeric, varchar, date and etc are not tested.
So, we need to add tests for those types in addition to the types supported by this PR.

@maropu
Copy link
Member

maropu commented Feb 2, 2021

Does this PR supports all types in PostgreSQL? are there unsupported types?

@sarutak
Copy link
Member Author

sarutak commented Feb 2, 2021

Does this PR supports all types in PostgreSQL?

I think so except for serial types as I mentioned above.

Here is all types in PostgreSQL.
https://www.postgresql.org/docs/13/datatype.html

@maropu
Copy link
Member

maropu commented Feb 2, 2021

okay, could you copy the statement above into the PR description?

@sarutak
Copy link
Member Author

sarutak commented Feb 2, 2021

I noticed the previous change didn't cover macaddr8, pg_snapshot, bit varying and interval so I've updated the code and description.

@SparkQA
Copy link

SparkQA commented Feb 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39377/

@SparkQA
Copy link

SparkQA commented Feb 2, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39377/

@SparkQA
Copy link

SparkQA commented Feb 2, 2021

Test build #134788 has finished for PR 31419 at commit ac54aa2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Copy link
Member Author

sarutak commented Feb 2, 2021

retest this please.

@SparkQA
Copy link

SparkQA commented Feb 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39386/

@SparkQA
Copy link

SparkQA commented Feb 2, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39386/

@SparkQA
Copy link

SparkQA commented Feb 2, 2021

Test build #134798 has finished for PR 31419 at commit ac54aa2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39583/

@SparkQA
Copy link

SparkQA commented Feb 8, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39583/

@SparkQA
Copy link

SparkQA commented Feb 8, 2021

Test build #135000 has finished for PR 31419 at commit 010413e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AlterTableSetLocation(
  • case class AlterTableSetProperties(
  • case class AlterTableUnsetProperties(
  • implicit class MetadataColumnHelper(attr: Attribute)
  • class ResolveSessionCatalog(val catalogManager: CatalogManager)

@sarutak
Copy link
Member Author

sarutak commented Feb 9, 2021

@srowen @maropu Do you have any comment on the latest change?

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so I'm clear, you're saying things like tsvector are already supported, and now tested, and now tsvector[] is also supported? If not then I'm still wondering why the non-array types can't be supported the same way.

@sarutak
Copy link
Member Author

sarutak commented Feb 9, 2021

Just so I'm clear, you're saying things like tsvector are already supported, and now tested, and now tsvector[] is also supported?

Yes. tsvector is already supported but it's not tested so I opened #31456.
On the other hand, tsvector[] and other array types added in this PR are not supported before this PR.

For non-array types, PostgresDialect maps JDBC OTHER type to StringType.

So, tsvector is already supported.

For array types, PostgresDialect maps the JDBC type of elements to a Spark SQL type according to the type name.
If a corresponding Spark SQL type is defined, JDBC ARRAY is mapped to ArrayType.

.
But, if there is no corresponding mapping definition for an array type, it's mapped to null rather than ArrayType.
.

@sarutak
Copy link
Member Author

sarutak commented Feb 9, 2021

@srowen
Ah, you mean how about mapping to StringType by default for array elements?

@srowen
Copy link
Member

srowen commented Feb 9, 2021

No i just wondered if I should see tests for the non-array types here. It is OK if it's in another PR. This seems fine if you're ready to merge.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked the test can pass on my local env.

@maropu maropu closed this in f79305a Feb 10, 2021
@maropu
Copy link
Member

maropu commented Feb 10, 2021

Thanks! Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants