[SPARK-34180][SQL] Fix the regression brought by SPARK-33888 for PostgresDialect by sarutak · Pull Request #31262 · apache/spark

sarutak · 2021-01-20T10:49:15Z

What changes were proposed in this pull request?

This PR fixes the regression introduced by SPARK-33888 (#30902).
After that PR was merged, PostgresDialect#getCatalystType throws an exception for array types.

[info] - Type mapping for various types *** FAILED *** (551 milliseconds)
[info]   java.util.NoSuchElementException: key not found: scale
[info]   at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:106)
[info]   at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:104)
[info]   at org.apache.spark.sql.types.Metadata.get(Metadata.scala:111)
[info]   at org.apache.spark.sql.types.Metadata.getLong(Metadata.scala:51)
[info]   at org.apache.spark.sql.jdbc.PostgresDialect$.getCatalystType(PostgresDialect.scala:43)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:321)
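
The trace above ends in a lookup of the key scale on an empty map. As a minimal sketch of that failure mode, the following uses a plain Scala Map as a stand-in for the column metadata. ColumnMetadata, uncheckedScale and safeScale are hypothetical names; the real code path goes through org.apache.spark.sql.types.Metadata, which this sketch does not depend on.

```scala
import scala.util.Try

object MissingScaleSketch {
  // Stand-in for the per-column metadata handed to a dialect (assumption:
  // a plain Map instead of Spark's Metadata class).
  type ColumnMetadata = Map[String, Long]

  // Unchecked read: throws NoSuchElementException when "scale" is absent.
  def uncheckedScale(md: ColumnMetadata): Long = md("scale")

  // Defensive read: returns None instead of throwing.
  def safeScale(md: ColumnMetadata): Option[Long] = md.get("scale")

  def main(args: Array[String]): Unit = {
    val empty: ColumnMetadata = Map.empty
    // The unchecked read fails on empty metadata, the defensive one does not.
    println(Try(uncheckedScale(empty)).isFailure)
    println(safeScale(empty))
  }
}
```

Any caller that reads the key unconditionally depends on the producer always writing it, which is the contract the regression broke for array columns.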

Why are the changes needed?

To fix the regression bug.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I confirmed the test case SPARK-22291: Conversion error when transforming array types of uuid, inet and cidr to StingType in PostgreSQL in PostgresIntegrationSuite passed.

I also confirmed that all the v2.*IntegrationSuite tests pass, because this PR changes them.

SparkQA · 2021-01-20T11:43:33Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38852/

SparkQA · 2021-01-20T11:46:52Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38855/

SparkQA · 2021-01-20T12:11:18Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38852/

SparkQA · 2021-01-20T12:15:10Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38855/

SparkQA · 2021-01-20T13:03:33Z

Test build #134266 has finished for PR 31262 at commit 392482d.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

sarutak · 2021-01-20T15:22:45Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala

For reviewers: This is the main change of this PR.
fieldScale is now passed to getCatalystType directly rather than through metadata.
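
To illustrate the idea in this comment, here is a minimal sketch of a dialect hook that receives the scale as an ordinary argument instead of reading it from metadata. Everything in it is a simplified assumption: the CatalystType hierarchy, the parameter list, and the type names "_numeric" and "_text" are illustrative stand-ins, not the actual JdbcDialect API.

```scala
object DirectScaleSketch {
  // Tiny stand-in for Catalyst data types (assumption, not Spark's classes).
  sealed trait CatalystType
  case object StringType extends CatalystType
  final case class DecimalType(precision: Int, scale: Int) extends CatalystType
  final case class ArrayType(element: CatalystType) extends CatalystType

  // Hypothetical hook: scale arrives as a parameter, so there is no
  // metadata key that can be missing at lookup time.
  def getCatalystType(typeName: String, size: Int, scale: Int): Option[CatalystType] =
    typeName match {
      case "_numeric" => Some(ArrayType(DecimalType(size, scale)))
      case "_text"    => Some(ArrayType(StringType))
      case _          => None // let the caller fall back to a default mapping
    }
}
```

The trade-off is that every implementer of the hook has to adopt the new parameter list, which is why a signature change like this raises compatibility questions.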

srowen · 2021-01-20T15:52:59Z

...er-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala

Hm, is this correct? It doesn't look like it should be an int, but is there a reason it must be in Postgres?

As of #30902, JDBC TIME type seems to be mapped to Catalyst IntegerType.
https://github.com/apache/spark/pull/30902/files#diff-c3859e97335ead4b131263565c987d877bea0af3adbd6c5bf2d3716768d2e083R230

That PR (#30902) has two problems.

1. A regression that affects PostgresDialect.

2. jdbc.*IntegrationSuite doesn't reflect the change (the jdbc.Types.TIME to sql.types.IntegerType mapping).

Even though both problems come from the same PR (#30902), their root causes are different. So I'll focus on problem 1 in this PR and open another PR to fix jdbc.*IntegrationSuite.

SparkQA · 2021-01-20T16:18:06Z

Test build #134283 has finished for PR 31262 at commit b9cf73e.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

sarutak · 2021-01-20T16:53:03Z

project/MimaExcludes.scala

This change breaks API compatibility for JdbcDialect and its subclasses, but they are developer APIs.
So I'd like to discuss whether this is acceptable.

Please add the JIRA ID at the beginning of this block. BTW, why do we need to add this 2.0.x exclusion rule?

// Exclude rules for 2.0.x
lazy val v20excludes = {

Thanks, it's just a mistake. I'll fix it.

SparkQA · 2021-01-20T16:56:50Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38869/

SparkQA · 2021-01-20T17:32:36Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38869/

dongjoon-hyun · 2021-01-20T17:42:45Z

cc @saikocat , @cloud-fan , @gengliangwang

SparkQA · 2021-01-20T17:59:26Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38872/

SparkQA · 2021-01-20T18:25:59Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38872/

SparkQA · 2021-01-20T19:30:02Z

Test build #134286 has finished for PR 31262 at commit 642b6fb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

saikocat · 2021-01-21T02:43:00Z

Just for discussion: I think updating the API like this solves the chicken-and-egg problem of putting and getting values in the metadata builder in the pull request (31252). Though I am not sure why we don't do one of the following:

Create a case class (JdbcMetadata(sqlType, dataType, scale...)) so the API of JdbcDialect won't change so frequently when we want to add an additional field parameter, as in this PR.
Or just pass in the ResultSetMetaData, though this breaks abstraction and introduces dependencies.

Cheers,
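
The first suggestion above could look roughly like the sketch below. The field names follow the comment (sqlType, dataType, scale), but the field types, the default value, and the describeType hook are assumptions made for illustration; nothing like this is confirmed to exist in Spark.

```scala
object JdbcMetadataSketch {
  // Hypothetical bundle of per-column JDBC facts. New fields can be added
  // with defaults without touching the signature of methods that accept it.
  final case class JdbcMetadata(sqlType: Int, dataType: String, scale: Int = 0)

  // Hypothetical dialect hook taking the bundle as its only argument.
  def describeType(md: JdbcMetadata): String =
    if (md.sqlType == java.sql.Types.NUMERIC) s"${md.dataType} with scale ${md.scale}"
    else md.dataType
}
```

A caller would build the bundle once per column, for example JdbcMetadata(java.sql.Types.NUMERIC, "numeric", 3), and hand it to every dialect method that needs column details.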

SparkQA · 2021-01-21T03:25:32Z

Test build #134302 has finished for PR 31262 at commit 084fb2d.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

sarutak · 2021-01-21T04:15:48Z

@srowen @dongjoon-hyun @saikocat I found another problem related to #30902, so I've opened another PR (#31270) to revert it. Once #30902 is reverted, this regression should be resolved.

EDIT:
After discussion with @cloud-fan in #31270, I found it's not necessary to revert SPARK-33888 so I'll continue this PR.

SparkQA · 2021-01-21T04:49:10Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38889/

SparkQA · 2021-01-21T05:17:50Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38889/

cloud-fan · 2021-01-21T05:21:13Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala

A simple change is to always include the scale metadata even if it's 0. I think having extra metadata doesn't hurt, and we can update some tests if needed.

Actually, my first solution was what you describe, but I noticed that we would need to update some tests.
If that's reasonable, I'll do it.
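
As a rough sketch of the suggestion in this exchange, the code below contrasts metadata that records scale only for selected SQL types with metadata that always records it, even when it is 0. The Map-based ColumnMetadata type and both function names are hypothetical, and the selective variant is an assumed illustration rather than a copy of the actual JdbcUtils code.

```scala
object AlwaysIncludeScaleSketch {
  // Stand-in for column metadata (assumption: a plain Map, not Spark's Metadata).
  type ColumnMetadata = Map[String, Long]

  // Illustrative selective variant: scale is recorded only for some SQL types,
  // so consumers of other types (such as ARRAY) find no "scale" key.
  def selectiveMetadata(sqlType: Int, scale: Int): ColumnMetadata =
    sqlType match {
      case java.sql.Types.NUMERIC | java.sql.Types.DECIMAL => Map("scale" -> scale.toLong)
      case _ => Map.empty
    }

  // Illustrative uniform variant: scale is always recorded, even when it is 0.
  def uniformMetadata(scale: Int): ColumnMetadata =
    Map("scale" -> scale.toLong)
}
```

With the uniform variant the extra key is harmless for types that ignore it, at the cost of updating tests that compare metadata exactly.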

SparkQA · 2021-01-21T06:24:38Z

Test build #134303 has finished for PR 31262 at commit 9fd2ba6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-01-21T06:58:41Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38897/

SparkQA · 2021-01-21T07:33:08Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38897/

cloud-fan · 2021-01-21T07:42:41Z

Do we have a Jenkins command to trigger the JDBC integration tests?

SparkQA · 2021-01-21T07:56:39Z

Test build #134310 has finished for PR 31262 at commit 85ecfe0.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

sarutak · 2021-01-21T08:54:02Z

@cloud-fan AFAIK we have no way to run them on Jenkins and GA...
So I tested on my laptop with build/sbt -Phive -Phive-thriftserver -Pdocker-integration-tests "docker-integration-tests/test"

SparkQA · 2021-01-21T11:49:47Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38918/

SparkQA · 2021-01-21T12:18:28Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38918/

SparkQA · 2021-01-21T12:56:53Z

Test build #134331 has finished for PR 31262 at commit 4834112.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2021-01-22T04:02:42Z

Merged to master.

…gresDialect

Closes apache#31262 from sarutak/fix-postgres-dialect-regression.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

github-actions bot added the SQL label Jan 20, 2021

sarutak force-pushed the fix-postgres-dialect-regression branch from 392482d to b9cf73e Compare January 20, 2021 15:18

sarutak changed the title ~~[SPARK-34180][SQL] Fix the regression for PostgresDialect brought by SPARK-33888~~ [SPARK-34180][SQL] Fix the regression brought by SPARK-33888 for PostgresDialect Jan 20, 2021

srowen reviewed Jan 20, 2021

github-actions bot added the BUILD label Jan 20, 2021

cloud-fan reviewed Jan 21, 2021

Simple solution

85ecfe0

sarutak force-pushed the fix-postgres-dialect-regression branch from 9fd2ba6 to 85ecfe0 Compare January 21, 2021 06:06

Modify existing tests.

4834112

cloud-fan approved these changes Jan 21, 2021

gengliangwang approved these changes Jan 21, 2021

HyukjinKwon approved these changes Jan 22, 2021

HyukjinKwon closed this in 8429021 Jan 22, 2021

skestle mentioned this pull request Jan 29, 2021

[SPARK-33888][SQL][FOLLOWUP] Restored scale metadata for ARRAY type (Postgres) #31252

Closed
