[SPARK-34180][SQL] Fix the regression brought by SPARK-33888 for PostgresDialect #31262
sarutak wants to merge 2 commits into apache:master
Kubernetes integration test starting
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test status success
Test build #134266 has finished for PR 31262 at commit
392482d to b9cf73e
For reviewers: this is the main change of this PR.
`fieldScale` is passed to `getCatalystType` directly rather than through metadata.
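To make the API change concrete, here is a toy Scala model in which the scale arrives as an explicit `fieldScale` parameter, so there is no metadata lookup that can fail. It is a sketch, not Spark's actual `JdbcDialect` signature: the trait `ToyDialect`, the object `ToyPostgresDialect` and the returned type-name strings are all hypothetical. Only the leading-underscore naming of Postgres array types (e.g. `_numeric` for `numeric[]`) reflects real Postgres behavior.

```scala
import java.sql.Types

// Hypothetical, simplified dialect hook; NOT Spark's JdbcDialect API.
// The scale is passed in directly instead of being read from metadata.
trait ToyDialect {
  def getCatalystType(sqlType: Int, typeName: String, size: Int, fieldScale: Int): Option[String]
}

object ToyPostgresDialect extends ToyDialect {
  override def getCatalystType(
      sqlType: Int,
      typeName: String,
      size: Int,
      fieldScale: Int): Option[String] =
    if (sqlType == Types.ARRAY && typeName.startsWith("_")) {
      // Postgres names array types after their element type with a leading
      // underscore, e.g. "_numeric" for numeric[].
      elementType(typeName.drop(1), size, fieldScale).map(t => s"ArrayType($t)")
    } else {
      None
    }

  // Toy element-type mapping, returned as strings for illustration only.
  private def elementType(name: String, precision: Int, scale: Int): Option[String] =
    name match {
      case "numeric" | "decimal" => Some(s"DecimalType($precision,$scale)")
      case "int4"                => Some("IntegerType")
      case "text" | "uuid"       => Some("StringType")
      case _                     => None
    }
}
```

Because `fieldScale` is a plain argument, a scale of 0 is just another value rather than a missing key. The trade-off, raised later in this thread, is that changing the method signature breaks compatibility for existing `JdbcDialect` subclasses.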
Hm, is this correct? It doesn't look like it should be an int, but is there a reason it must be one in Postgres?
As of #30902, the JDBC `TIME` type seems to be mapped to Catalyst `IntegerType`.
https://github.com/apache/spark/pull/30902/files#diff-c3859e97335ead4b131263565c987d877bea0af3adbd6c5bf2d3716768d2e083R230
That PR (#30902) has two problems:
1. A regression which affects `PostgresDialect`.
2. `jdbc.*IntegrationSuite` doesn't reflect the change (the `jdbc.Types.TIME` to `sql.types.IntegerType` mapping).

Even though both problems stem from the same PR (#30902), their root causes differ. So I'll focus on problem 1 in this PR and open another PR to fix `jdbc.*IntegrationSuite`.
Test build #134283 has finished for PR 31262 at commit
project/MimaExcludes.scala
This change breaks API compatibility for `JdbcDialect` and its subclasses, but they are Developer APIs.
So I'd like to discuss whether this is acceptable.
Please add the JIRA ID at the beginning of this block. BTW, why do we need to add this to the 2.0.x exclusion rules?
```scala
// Exclude rules for 2.0.x
lazy val v20excludes = {
```
Thanks, it's just a mistake. I'll fix it.
Kubernetes integration test starting
Kubernetes integration test status success
cc @saikocat, @cloud-fan, @gengliangwang
Kubernetes integration test starting
Kubernetes integration test status success
Test build #134286 has finished for PR 31262 at commit
Just for discussion: I think updating the API like this solves the chicken-and-egg problem of putting and getting things in the metadata builder in the pull request (#31252). Though, I am not sure why we don't:
Cheers,
Test build #134302 has finished for PR 31262 at commit
@srowen @dongjoon-hyun @saikocat I found another problem related to #30902, so I've opened another PR (#31270) to revert it. If #30902 is reverted, this regression should be resolved. EDIT:
Kubernetes integration test starting
Kubernetes integration test status success
A simple change is to always include the scale metadata even if it's 0. I think having extra metadata doesn't hurt, and we can update some tests if needed.
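The `key not found: scale` failure reported for this regression can be reproduced without Spark: the stack trace shows `Metadata.getLong` ending in an immutable `Map` lookup, and looking up a missing key in a Scala `Map` throws `NoSuchElementException`. The sketch below is a hypothetical stand-in that models metadata as a plain `Map[String, Long]` (it is not Spark's `Metadata` API) and contrasts a writer that omits a zero scale with one that always records it.

```scala
// Hypothetical stand-in for Spark's Metadata: a plain immutable Map.
// Not Spark's API; it only models the lookup that failed.
object ScaleMetadataSketch {
  type ToyMetadata = Map[String, Long]

  // Writer that mirrors the regression: "scale" is only recorded when non-zero.
  def buildSkippingZero(scale: Int): ToyMetadata =
    if (scale != 0) Map("scale" -> scale.toLong) else Map.empty

  // Writer following the suggested fix: "scale" is always recorded, even if 0.
  def buildAlways(scale: Int): ToyMetadata =
    Map("scale" -> scale.toLong)

  // Reader that, like Metadata.getLong, assumes the key is present.
  // Map.apply throws NoSuchElementException("key not found: scale") otherwise.
  def readScale(md: ToyMetadata): Long = md("scale")

  def main(args: Array[String]): Unit = {
    println(readScale(buildAlways(0)))       // 0
    println(readScale(buildSkippingZero(2))) // 2
    try readScale(buildSkippingZero(0))
    catch { case e: NoSuchElementException => println(e.getMessage) } // key not found: scale
  }
}
```

Always writing the key keeps every reader safe at the cost of one extra entry per column, which matches the point that extra metadata doesn't hurt, though tests that compare metadata exactly may need updating.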
Actually, my first solution was like what you said, but I noticed we needed to update some tests.
If that's reasonable, I'll do it.
9fd2ba6 to 85ecfe0
Test build #134303 has finished for PR 31262 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Do we have a Jenkins command to trigger the JDBC integration tests?
Test build #134310 has finished for PR 31262 at commit
@cloud-fan AFAIK we have no way to run them on Jenkins or GitHub Actions (GA)...
Kubernetes integration test starting
Kubernetes integration test status success
Test build #134331 has finished for PR 31262 at commit
Merged to master.
…gresDialect

### What changes were proposed in this pull request?
This PR fixes the regression bug brought by SPARK-33888 (apache#30902).
After that PR was merged, `PostgresDialect#getCatalystType` throws an exception for array types.
```
[info] - Type mapping for various types *** FAILED *** (551 milliseconds)
[info] java.util.NoSuchElementException: key not found: scale
[info] at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:106)
[info] at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:104)
[info] at org.apache.spark.sql.types.Metadata.get(Metadata.scala:111)
[info] at org.apache.spark.sql.types.Metadata.getLong(Metadata.scala:51)
[info] at org.apache.spark.sql.jdbc.PostgresDialect$.getCatalystType(PostgresDialect.scala:43)
[info] at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:321)
```

### Why are the changes needed?
To fix the regression bug.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
I confirmed that the test case `SPARK-22291: Conversion error when transforming array types of uuid, inet and cidr to StingType in PostgreSQL` in `PostgresIntegrationSuite` passed.
I also confirmed that all the `v2.*IntegrationSuite`s pass, since this PR changes them.

Closes apache#31262 from sarutak/fix-postgres-dialect-regression.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>