[SPARK-34333][SQL] Fix PostgresDialect to handle money types properly #31442
sarutak wants to merge 6 commits into apache:master
docs/sql-migration-guide.md (Outdated)
## Upgrading from Spark SQL 3.1 to 3.2
- In Spark 3.2, money type in PostgreSQL table is converted to `StringType` and money[] type is not supported due to the JDBC driver for PostgreSQL can't handle those types properly.
Yeah, DecimalType fixes the accuracy problem with money, but can't account for units. I suppose we could one day support it as a struct of DecimalType and StringType for currency, but string seems fine now.
Why not a string array for a money array?
> Why not a string array for a money array?
For the money type, Spark SQL calls PgResultSet.getDouble, which causes the following error.
[info] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (192.168.1.204 executor driver): org.postgresql.util.PSQLException: Bad value for type double : 1,000.00
[info] at org.postgresql.jdbc.PgResultSet.toDouble(PgResultSet.java:3104)
[info] at org.postgresql.jdbc.PgResultSet.getDouble(PgResultSet.java:2432)
[info] at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$5(JdbcUtils.scala:418)
So we can avoid this issue by mapping the money type to StringType, which makes Spark SQL call getString rather than getDouble.
For the money[] type, on the other hand, PostgreSQL's JDBC driver itself calls PgResultSet.toDouble internally.
[info] at org.postgresql.jdbc.PgResultSet.toDouble(PgResultSet.java:3104)
[info] at org.postgresql.jdbc.ArrayDecoding$5.parseValue(ArrayDecoding.java:235)
[info] at org.postgresql.jdbc.ArrayDecoding$AbstractObjectStringArrayDecoder.populateFromString(ArrayDecoding.java:122)
[info] at org.postgresql.jdbc.ArrayDecoding.readStringArray(ArrayDecoding.java:764)
We can control how Spark SQL gets the values from the array obtained by PgResultSet.getArray, but it's difficult to control how the JDBC driver decodes the elements of the array it returns.
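Both traces above end in PgResultSet.toDouble, and the error message shows the text `1,000.00`. The sketch below is not the driver's code; it only illustrates, under the assumption that such text ends up in a plain double parser, why a value with a grouping separator fails while a smaller value such as `100.00` does not. The class and method names are hypothetical.

```java
final class DoubleParseDemo {
    // Returns true when Double.parseDouble accepts the text, false otherwise.
    static boolean parsesAsDouble(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            // Grouping separators such as ',' are not valid in a Java double literal.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsDouble("100.00"));   // prints "true"
        System.out.println(parsesAsDouble("1,000.00")); // prints "false"
    }
}
```

This matches the observation in the PR description that values below one thousand happened to work with the old DoubleType mapping.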
In the migration guide, we should also mention the previous behavior.
|
cc: @cloud-fan |
|
LGTM except one comment about migration guide. |
|
Could you resolve the conflict? Looks fine otherwise. |
docs/sql-migration-guide.md (Outdated)
## Upgrading from Spark SQL 3.1 to 3.2
- In Spark 3.2, PostgreSQL JDBC dialect uses StringType for MONEY and MONEY[] is not supported due to the JDBC driver for PostgreSQL can't handle those types properly. Previously, DoubleType and ArrayType of DoubleType are used respectively.
nit: Previously, => In Spark 3.1 or earlier,
|
@maropu Thanks. I've updated. |
## Upgrading from Spark SQL 3.1 to 3.2
- In Spark 3.2, PostgreSQL JDBC dialect uses StringType for MONEY and MONEY[] is not supported due to the JDBC driver for PostgreSQL can't handle those types properly. In Spark 3.1 or earlier, DoubleType and ArrayType of DoubleType are used respectively.
|
If there is no objection, I'll merge soon. |
|
Merged to |
|
Thank you, @sarutak and all. |
What changes were proposed in this pull request?
This PR changes the type mapping for the `money` and `money[]` types for PostgreSQL.
Currently, Spark tries to convert those types to `DoubleType` and `ArrayType` of `double` respectively.
However, the JDBC driver does not seem to handle those types properly.
pgjdbc/pgjdbc#100
pgjdbc/pgjdbc#1405
Due to these issues, we can get errors like the following.
money type.
money[] type.
For the money type, a known workaround is to treat it as a string, so this PR does that.
For money[], however, there is no reasonable workaround, so this PR removes the support.
Why are the changes needed?
This is a bug.
Does this PR introduce any user-facing change?
Yes. Once this PR is merged, the money type is mapped to `StringType` rather than `DoubleType`, and money[] is no longer supported.
For the money type, a value less than one thousand (`$100.00` for instance) works without this change, so I also updated the migration guide because this is a behavior change for such small values.
On the other hand, money[] does not seem to work with any value, but it is mentioned in the migration guide just in case.
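Because money values arrive as strings under the new mapping, callers that need numbers have to parse them themselves. The following is a minimal sketch of one way to do that, assuming en_US-style text such as `$1,000.00` or `-$1,000.00`. The class and method names are hypothetical and not part of this PR, and PostgreSQL's actual money output depends on the server's `lc_monetary` setting, so other locales would need different handling.

```java
import java.math.BigDecimal;

final class MoneyStrings {
    // Assumption: '$' is the currency symbol and ',' is the grouping separator.
    // BigDecimal keeps the exact decimal value, unlike double.
    static BigDecimal parseUsMoney(String text) {
        String cleaned = text.trim().replace("$", "").replace(",", "");
        return new BigDecimal(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(parseUsMoney("$1,000.00"));  // prints "1000.00"
        System.out.println(parseUsMoney("-$1,000.00")); // prints "-1000.00"
    }
}
```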
How was this patch tested?
New test.