Skip to content
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/sql-migration-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,8 @@ license: |

## Upgrading from Spark SQL 3.1 to 3.2

- In Spark 3.2, money type in PostgreSQL table is converted to `StringType` and money[] type is not supported due to the JDBC driver for PostgreSQL can't handle those types properly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, DecimalType fixes the accuracy problem with money, but can't account for units. I suppose we could one day support it as a struct of DecimalType and StringType for currency, but string seems fine now.
Why not a string array for a money array?

@sarutak sarutak Feb 3, 2021

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not a string array for a money array?

For money type, Spark SQL calls PgResultSet.getDouble causing the error.

[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (192.168.1.204 executor driver): org.postgresql.util.PSQLException: Bad value for type double : 1,000.00
[info] 	at org.postgresql.jdbc.PgResultSet.toDouble(PgResultSet.java:3104)
[info] 	at org.postgresql.jdbc.PgResultSet.getDouble(PgResultSet.java:2432)
[info] 	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$5(JdbcUtils.scala:418)

So, we can avoid this issue by mapping money type to StringType to let Spark SQL call getString rather than getDouble.

For money[] type, on the other hand, the PostgreSQL's JDBC driver calls PgResultSet.toDouble internally.

[info] 	at org.postgresql.jdbc.PgResultSet.toDouble(PgResultSet.java:3104)
[info] 	at org.postgresql.jdbc.ArrayDecoding$5.parseValue(ArrayDecoding.java:235)
[info] 	at org.postgresql.jdbc.ArrayDecoding$AbstractObjectStringArrayDecoder.populateFromString(ArrayDecoding.java:122)
[info] 	at org.postgresql.jdbc.ArrayDecoding.readStringArray(ArrayDecoding.java:764)

We can control how Spark SQL gets the value from the array obtained by PgResultSet.getArray, but it's difficult to control how the JDBC driver handles the elements in the array which is to be returned.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the migration guide, we should also mention the previous behavior.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I've updated.


- In Spark 3.2, `spark.sql.adaptive.enabled` is enabled by default. To restore the behavior before Spark 3.2, you can set `spark.sql.adaptive.enabled` to `false`.

- In Spark 3.2, the following meta-characters are escaped in the `show()` action. In Spark 3.1 or earlier, the following metacharacters are output as it is.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,11 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
conn.prepareStatement("INSERT INTO char_array_types VALUES " +
"""('{"a", "bcd"}', '{"ef", "gh"}', '{"i", "j", "kl"}', '{"mnop"}', '{"q", "r"}')"""
).executeUpdate()

conn.prepareStatement("CREATE TABLE money_types (" +
"c0 money)").executeUpdate()
conn.prepareStatement("INSERT INTO money_types VALUES " +
"('$1,000.00')").executeUpdate()
}

test("Type mapping for various types") {
Expand Down Expand Up @@ -262,4 +267,12 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
assert(row(0).getSeq[String](3) === Seq("mnop"))
assert(row(0).getSeq[String](4) === Seq("q", "r"))
}

test("SPARK-34333: money type tests") {
val df = sqlContext.read.jdbc(jdbcUrl, "money_types", new Properties)
val row = df.collect()
assert(row.length === 1)
assert(row(0).length === 1)
assert(row(0).getString(0) === "$1,000.00")
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -35,8 +35,12 @@ private object PostgresDialect extends JdbcDialect {
Some(FloatType)
} else if (sqlType == Types.SMALLINT) {
Some(ShortType)
} else if (sqlType == Types.BIT && typeName.equals("bit") && size != 1) {
} else if (sqlType == Types.BIT && typeName == "bit" && size != 1) {
Some(BinaryType)
} else if (sqlType == Types.DOUBLE && typeName == "money") {
// money type seems to be broken but one workaround is to handle it as string.
// See SPARK-34333 and https://github.com/pgjdbc/pgjdbc/issues/100
Some(StringType)
} else if (sqlType == Types.OTHER) {
Some(StringType)
} else if (sqlType == Types.ARRAY) {
Expand All @@ -56,7 +60,7 @@ private object PostgresDialect extends JdbcDialect {
case "int4" => Some(IntegerType)
case "int8" | "oid" => Some(LongType)
case "float4" => Some(FloatType)
case "money" | "float8" => Some(DoubleType)
case "float8" => Some(DoubleType)
case "text" | "varchar" | "char" | "bpchar" | "cidr" | "inet" | "json" | "jsonb" | "uuid" =>
Some(StringType)
case "bytea" => Some(BinaryType)
Expand All @@ -66,6 +70,11 @@ private object PostgresDialect extends JdbcDialect {
case "numeric" | "decimal" =>
// SPARK-26538: handle numeric without explicit precision and scale.
Some(DecimalType. SYSTEM_DEFAULT)
case "money" =>
// money[] type seems to be broken and difficult to handle.
// So this method returns None for now.
// See SPARK-34333 and https://github.com/pgjdbc/pgjdbc/issues/1405
None
case _ => None
}

Expand Down