Skip to content

[SPARK-34379][SQL] Map JDBC RowID to StringType rather than LongType#31491

Closed
sarutak wants to merge 7 commits intoapache:masterfrom
sarutak:rowid-type
Closed

[SPARK-34379][SQL] Map JDBC RowID to StringType rather than LongType#31491
sarutak wants to merge 7 commits intoapache:masterfrom
sarutak:rowid-type

Conversation

@sarutak
Copy link
Member

@sarutak sarutak commented Feb 5, 2021

What changes were proposed in this pull request?

This PR fix an issue that java.sql.RowId is mapped to LongType and prefer StringType.

In the current implementation, JDBC RowID type is mapped to LongType except for OracleDialect, but there is no guarantee to be able to convert RowID to long.
java.sql.RowId declares toString and the specification of java.sql.RowId says

all methods on the RowId interface must be fully implemented if the JDBC driver supports the data type
(https://docs.oracle.com/javase/8/docs/api/java/sql/RowId.html)

So, we should prefer StringType to LongType.

Why are the changes needed?

This seems to be a potential bug.

Does this PR introduce any user-facing change?

Yes. RowID is mapped to StringType rather than LongType.

How was this patch tested?

New test and the existing test case SPARK-32992: map Oracle's ROWID type to StringType in OracleIntegrationSuite passes.

@github-actions github-actions bot added the SQL label Feb 5, 2021
@SparkQA
Copy link

SparkQA commented Feb 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39522/

@SparkQA
Copy link

SparkQA commented Feb 5, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39522/

@SparkQA
Copy link

SparkQA commented Feb 5, 2021

Test build #134939 has finished for PR 31491 at commit 222be6e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions github-actions bot added the DOCS label Feb 8, 2021
@SparkQA
Copy link

SparkQA commented Feb 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39580/

@SparkQA
Copy link

SparkQA commented Feb 8, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39580/

@SparkQA
Copy link

SparkQA commented Feb 8, 2021

Test build #134997 has finished for PR 31491 at commit 899706f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


## Upgrading from Spark SQL 3.1 to 3.2

- Since Spark 3.2, all the supported JDBC dialects use StringType for ROWID. Previously, Oracle dialect uses StringType and the other dialects use LongType.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Previously, => In Spark 3.1 or earlier,

@maropu
Copy link
Member

maropu commented Feb 10, 2021

Could you add tests in the core package?

@sarutak
Copy link
Member Author

sarutak commented Feb 10, 2021

@maropu I've added.

@SparkQA
Copy link

SparkQA commented Feb 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39678/

@SparkQA
Copy link

SparkQA commented Feb 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39678/

}

test("SPARK-34379: Map JDBC RowID to StringType rather than LongType") {
val mockRsmd = mock(classOf[java.sql.ResultSetMetaData])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h2 database cannot generate rowid-typed data?

Copy link
Member Author

@sarutak sarutak Feb 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It cannot.
H2 has _rowid_ as a hidden column but it's not compatible with JDBC ROWID.
_rowid_ is represented as long and H2 doesn't support getRowId.
https://github.com/h2database/h2database/blob/6290b79a2418189c5faa0e0506bf6503fc7630e6/h2/src/main/org/h2/jdbc/JdbcResultSet.java#L3292


## Upgrading from Spark SQL 3.1 to 3.2

- Since Spark 3.2, all the supported JDBC dialects use StringType for ROWID. In Spark 3.1 or earlier, Oracle dialect uses StringType and the other dialects use LongType.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dialects is an internal word? If so, how about saying "Since Spark 3.2, `java.sql.ROWID` is mapped to `StringType` when reading data from other databases via JDBC"?

Copy link
Member Author

@sarutak sarutak Feb 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least, I use dialects here as a general word, not represents specific implementations like PostgresDialect.
dialect and dialects have been used from before in the migration guide.

@maropu
Copy link
Member

maropu commented Feb 10, 2021

It looks fine otherwise (note: I've checked that OracleIntegrationSuite can pass)

@SparkQA
Copy link

SparkQA commented Feb 10, 2021

Test build #135096 has finished for PR 31491 at commit 315c7e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case java.sql.Types.REF => StringType
case java.sql.Types.REF_CURSOR => null
case java.sql.Types.ROWID => LongType
case java.sql.Types.ROWID => StringType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically we can't assume the row ID is <= 8 bytes? if that's true then I agree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only we can't assume the length of the ROWID but also it's not required to be represented as integer.
JDBC RowId declares getBytes and toString to represent ROWID so I think we can safely map ROWID to StringType.
https://docs.oracle.com/javase/8/docs/api/java/sql/RowId.html

@SparkQA
Copy link

SparkQA commented Feb 17, 2021

Test build #135184 has started for PR 31491 at commit 73f53d9.

@srowen
Copy link
Member

srowen commented Feb 20, 2021

@sarutak feel free to merge when ready

@asfgit asfgit closed this in 82b33a3 Feb 20, 2021
@sarutak
Copy link
Member Author

sarutak commented Feb 20, 2021

Thanks all. Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants