Skip to content

[SPARK-34180][SQL] Fix the regression brought by SPARK-33888 for PostgresDialect #31262

Closed
sarutak wants to merge 2 commits intoapache:masterfrom
sarutak:fix-postgres-dialect-regression
Closed

[SPARK-34180][SQL] Fix the regression brought by SPARK-33888 for PostgresDialect #31262
sarutak wants to merge 2 commits intoapache:masterfrom
sarutak:fix-postgres-dialect-regression

Conversation

@sarutak
Copy link
Member

@sarutak sarutak commented Jan 20, 2021

What changes were proposed in this pull request?

This PR fixes the regression bug brought by SPARK-33888 (#30902).
After that PR merged, PostgresDIalect#getCatalystType throws Exception for array types.

[info] - Type mapping for various types *** FAILED *** (551 milliseconds)
[info]   java.util.NoSuchElementException: key not found: scale
[info]   at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:106)
[info]   at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:104)
[info]   at org.apache.spark.sql.types.Metadata.get(Metadata.scala:111)
[info]   at org.apache.spark.sql.types.Metadata.getLong(Metadata.scala:51)
[info]   at org.apache.spark.sql.jdbc.PostgresDialect$.getCatalystType(PostgresDialect.scala:43)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:321)

Why are the changes needed?

To fix the regression bug.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I confirmed the test case SPARK-22291: Conversion error when transforming array types of uuid, inet and cidr to StingType in PostgreSQL in PostgresIntegrationSuite passed.

I also confirmed whether all the v2.*IntegrationSuite pass because this PR changed them and they passed.

@github-actions github-actions bot added the SQL label Jan 20, 2021
@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38852/

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38855/

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38852/

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38855/

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Test build #134266 has finished for PR 31262 at commit 392482d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak sarutak force-pushed the fix-postgres-dialect-regression branch from 392482d to b9cf73e Compare January 20, 2021 15:18
Comment on lines 312 to 321
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: This is the main change of this PR.
fieldScale is passed to getCatalystType directly rather than done through metadata.

@sarutak sarutak changed the title [SPARK-34180][SQL] Fix the regression for PostgresDialect brought by SPARK-33888 [SPARK-34180][SQL] Fix the regression brought by SPARK-33888 for PostgresDialect Jan 20, 2021
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, is this correct? doesn't look like it should be an int but is there a reason it must be in postgres?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@sarutak sarutak Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That PR (#30902) has two problems.

  1. Regression which affects PostgresDialect.
  2. jdbc.*IntegrationSuite doesn't reflect the change (jdbc.Types.TIME to sql.types.IntegerType mapping).

Even though the problematic PR is the same (#30902), the root cause is different. So I'll focus on the problem-1 in this PR and open another PR to fix jdbc.*IntegrationSuite

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Test build #134283 has finished for PR 31262 at commit b9cf73e.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Comment on lines 1750 to 1760
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change breaks API compatibility for JdbcDialect and its subclasses but they are developer API.
So I'd like to discuss whether it's acceptable to do it or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add JIRA ID there at the beginning of this block. BTW, why do we need to add this 2.0.x exclusion rule?

  // Exclude rules for 2.0.x
  lazy val v20excludes = {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it's just a mistake. I'll fix it.

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38869/

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38869/

@dongjoon-hyun
Copy link
Member

cc @saikocat , @cloud-fan , @gengliangwang

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38872/

@github-actions github-actions bot added the BUILD label Jan 20, 2021
@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38872/

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Test build #134286 has finished for PR 31262 at commit 642b6fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@saikocat
Copy link
Contributor

saikocat commented Jan 21, 2021

Just for discussion, I think updating the API like solve the chicken and egg problems with putting and getting stuffs in the metadata builder in the pull request (31252). Though, I am not sure why don't we:

  1. Create a case class (JdbcMetadata(sqlType, dataType, scale...) so the API of JdbcDialect won't be changed so frequent in case we want to add in additional field parameter like in this PR.
  2. Or just pass in the ResultSetMetadata - this break abstraction & introduce dependencies though

Cheers,

@SparkQA
Copy link

SparkQA commented Jan 21, 2021

Test build #134302 has finished for PR 31262 at commit 084fb2d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Copy link
Member Author

sarutak commented Jan 21, 2021

@srowen @dongjoon-hyun @saikocat I found another problem related to #30902. So I've opened another PR (#31270) to revert it. If that PR is reverted, the regression should be recovered.

EDIT:
After discussion with @cloud-fan in #31270, I found it's not necessary to revert SPARK-33888 so I'll continue this PR.

@SparkQA
Copy link

SparkQA commented Jan 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38889/

@SparkQA
Copy link

SparkQA commented Jan 21, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38889/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple change is to always include the scale metadata even if it's 0. I think having extra metadata doesn't hurt, and we can update some tests if needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the first solution is like what you said but I noticed we need to update some tests.
If it's reasonable, I'll do it.

@sarutak sarutak force-pushed the fix-postgres-dialect-regression branch from 9fd2ba6 to 85ecfe0 Compare January 21, 2021 06:06
@SparkQA
Copy link

SparkQA commented Jan 21, 2021

Test build #134303 has finished for PR 31262 at commit 9fd2ba6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38897/

@SparkQA
Copy link

SparkQA commented Jan 21, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38897/

@cloud-fan
Copy link
Contributor

Do we have a jenkins command to trigger JDBC integration test?

@SparkQA
Copy link

SparkQA commented Jan 21, 2021

Test build #134310 has finished for PR 31262 at commit 85ecfe0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Copy link
Member Author

sarutak commented Jan 21, 2021

@cloud-fan AFAIK we have no way to run them on Jenkins and GA...
So I tested on my laptop with build/sbt -Phive -Phive-thriftserver -Pdocker-integration-tests "docker-integration-tests/test"

@SparkQA
Copy link

SparkQA commented Jan 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38918/

@SparkQA
Copy link

SparkQA commented Jan 21, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38918/

@SparkQA
Copy link

SparkQA commented Jan 21, 2021

Test build #134331 has finished for PR 31262 at commit 4834112.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

skestle pushed a commit to skestle/spark that referenced this pull request Feb 3, 2021
…gresDialect

### What changes were proposed in this pull request?

This PR fixes the regression bug brought by SPARK-33888 (apache#30902).
After that PR merged, `PostgresDIalect#getCatalystType` throws Exception for array types.
```
[info] - Type mapping for various types *** FAILED *** (551 milliseconds)
[info]   java.util.NoSuchElementException: key not found: scale
[info]   at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:106)
[info]   at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:104)
[info]   at org.apache.spark.sql.types.Metadata.get(Metadata.scala:111)
[info]   at org.apache.spark.sql.types.Metadata.getLong(Metadata.scala:51)
[info]   at org.apache.spark.sql.jdbc.PostgresDialect$.getCatalystType(PostgresDialect.scala:43)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:321)
```

### Why are the changes needed?

To fix the regression bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed the test case `SPARK-22291: Conversion error when transforming array types of uuid, inet and cidr to StingType in PostgreSQL` in `PostgresIntegrationSuite` passed.

I also confirmed whether all the `v2.*IntegrationSuite` pass because this PR changed them and they passed.

Closes apache#31262 from sarutak/fix-postgres-dialect-regression.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants