[SPARK-34379][SQL] Map JDBC RowID to StringType rather than LongType#31491
sarutak wants to merge 7 commits into apache:master
Kubernetes integration test starting

Kubernetes integration test status success

Test build #134939 has finished for PR 31491 at commit
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
Kubernetes integration test starting

Kubernetes integration test status success

Test build #134997 has finished for PR 31491 at commit
docs/sql-migration-guide.md
Outdated
## Upgrading from Spark SQL 3.1 to 3.2

- Since Spark 3.2, all the supported JDBC dialects use StringType for ROWID. Previously, Oracle dialect uses StringType and the other dialects use LongType.

nit: Previously, => In Spark 3.1 or earlier,

Could you add tests in the

@maropu I've added.

Kubernetes integration test starting

Kubernetes integration test status failure

}

test("SPARK-34379: Map JDBC RowID to StringType rather than LongType") {
  val mockRsmd = mock(classOf[java.sql.ResultSetMetaData])

h2 database cannot generate rowid-typed data?
It cannot.
H2 has _rowid_ as a hidden column but it's not compatible with JDBC ROWID.
_rowid_ is represented as long and H2 doesn't support getRowId.
https://github.com/h2database/h2database/blob/6290b79a2418189c5faa0e0506bf6503fc7630e6/h2/src/main/org/h2/jdbc/JdbcResultSet.java#L3292
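
Because H2 cannot produce a ROWID-typed column, the test quoted above relies on mocked metadata rather than a real table. The same idea can be sketched without Mockito by using a JDK dynamic proxy. This is only an illustration: `StubMetadata` and `singleRowIdColumn` are hypothetical names, the proxy is a swapped-in technique and not what the PR uses, and only three metadata methods are answered.

```scala
import java.lang.reflect.{InvocationHandler, Method, Proxy}
import java.sql.{ResultSetMetaData, Types}

object StubMetadata {
  // Hypothetical helper: builds a ResultSetMetaData that reports one ROWID column.
  // A JDK dynamic proxy stands in for the Mockito mock used in the PR's test.
  def singleRowIdColumn(label: String): ResultSetMetaData = {
    val handler = new InvocationHandler {
      override def invoke(proxy: AnyRef, method: Method, args: Array[AnyRef]): AnyRef =
        method.getName match {
          case "getColumnCount" => Int.box(1)
          case "getColumnType"  => Int.box(Types.ROWID)
          case "getColumnLabel" => label
          // Any other metadata call is outside the scope of this sketch.
          case other            => throw new UnsupportedOperationException(other)
        }
    }
    Proxy.newProxyInstance(
      classOf[ResultSetMetaData].getClassLoader,
      Array[Class[_]](classOf[ResultSetMetaData]),
      handler
    ).asInstanceOf[ResultSetMetaData]
  }
}
```

A schema-inference routine that only reads the column count, type, and label could then be exercised against `StubMetadata.singleRowIdColumn("row_id")` with no database involved.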

## Upgrading from Spark SQL 3.1 to 3.2

- Since Spark 3.2, all the supported JDBC dialects use StringType for ROWID. In Spark 3.1 or earlier, Oracle dialect uses StringType and the other dialects use LongType.

dialects is an internal word? If so, how about saying "Since Spark 3.2, `java.sql.ROWID` is mapped to `StringType` when reading data from other databases via JDBC"?

At least, I use "dialects" here as a general term, not to refer to specific implementations like PostgresDialect.
"dialect" and "dialects" have already been used in the migration guide.

It looks fine otherwise (note: I've checked that

Test build #135096 has finished for PR 31491 at commit

  case java.sql.Types.REF => StringType
  case java.sql.Types.REF_CURSOR => null
- case java.sql.Types.ROWID => LongType
+ case java.sql.Types.ROWID => StringType

So basically we can't assume the row ID is <= 8 bytes? If that's true, then I agree.

Not only can we not assume the length of the ROWID, but it is also not required to be represented as an integer.
JDBC RowId declares getBytes and toString to represent ROWID, so I think we can safely map ROWID to StringType.
https://docs.oracle.com/javase/8/docs/api/java/sql/RowId.html
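
The point can be illustrated with a small sketch built only on the JDK's `java.sql.RowId` interface. `RowIdDemo`, `OpaqueRowId`, `asString`, and `asLong` are hypothetical names introduced for this example, and the token format is an assumption chosen to look like an opaque, non-numeric identifier; real drivers may use any representation.

```scala
import java.nio.charset.StandardCharsets
import java.sql.RowId
import scala.util.Try

object RowIdDemo {
  // Hypothetical driver-side RowId whose content is an opaque, non-numeric token.
  final class OpaqueRowId(token: String) extends RowId {
    override def getBytes(): Array[Byte] = token.getBytes(StandardCharsets.UTF_8)
    override def toString: String = token
  }

  // Always available, because RowId declares toString.
  def asString(id: RowId): String = id.toString

  // Not always available: the text need not parse as a number.
  def asLong(id: RowId): Option[Long] = Try(id.toString.toLong).toOption
}
```

With a token such as `"AAAR3sAAEAAAACXAAA"`, `asString` returns the token unchanged, while `asLong` yields `None` and the byte form is longer than the 8 bytes a long can hold. That is the situation in which a `LongType` mapping has nothing sensible to return.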

Test build #135184 has started for PR 31491 at commit

@sarutak feel free to merge when ready

Thanks all. Merged to

What changes were proposed in this pull request?
This PR fixes an issue where `java.sql.RowId` is mapped to `LongType`; `StringType` is preferred instead.

In the current implementation, the JDBC RowID type is mapped to `LongType` except for `OracleDialect`, but there is no guarantee that a RowID can be converted to a long.
`java.sql.RowId` declares `toString` and the specification of `java.sql.RowId` says

So, we should prefer `StringType` to `LongType`.
Why are the changes needed?
This seems to be a potential bug.
Does this PR introduce any user-facing change?
Yes. RowID is mapped to StringType rather than LongType.
How was this patch tested?
A new test was added, and the existing test case `SPARK-32992: map Oracle's ROWID type to StringType` in `OracleIntegrationSuite` passes.