Skip to content

[SPARK-34180][SQL] Revert SPARK-33888#31270

Closed
sarutak wants to merge 1 commit intoapache:masterfrom
sarutak:revert-33888
Closed

[SPARK-34180][SQL] Revert SPARK-33888#31270
sarutak wants to merge 1 commit intoapache:masterfrom
sarutak:revert-33888

Conversation

@sarutak
Copy link
Member

@sarutak sarutak commented Jan 21, 2021

What changes were proposed in this pull request?

This PR reverts SPARK-33888 (#30902) .
That PR has the two serious problems.

  1. A regression which affects PostgresDialect causing PostgreSQLIntegrationSuite to fail.
  2. That PR changes the JDBC-Catalyst type mapping. sql.Types.TIME is mapped to sql.types.IntegerType so we'll get millisecond value. But the start time point is different between MsSqlServer and other RDBMSs. For MsSqlServer, it starts from 1900-01-01 00:00:00 while 1970-01-01 00:00:00 for other RDBMS.

More about problem-2, the following assertion passes before SPARK-33888.

// This is from jdbc.MsSqlServerIntegrationSuite
assert(row.getAs[Timestamp](5).equals(Timestamp.valueOf("1900-01-01 13:31:24.0")))

But after SPARK-33888, the following modified assertion fails because the millisecond value is considered starting from 1970-01-01 00:00:00.

assert(row.getAs[Integer](5).equals(Timestamp.valueOf("1900-01-01 13:31:24.0").getTime.toInt))

Why are the changes needed?

To fix the serious issues.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

This is just reverting a previous change.

…as TimestampType, it should be physical Int in millis"

This reverts commit 0b647fe.
@cloud-fan
Copy link
Contributor

What do you mean by "start time point"? The TIME type has no date part and should return seconds(with certain precision) of the hour:minute:second fields.

@sarutak
Copy link
Member Author

sarutak commented Jan 21, 2021

@cloud-fan

What do you mean by "start time point"? The TIME type has no date part and should return seconds(with certain precision) of the hour:minute:second fields.

Yes it has no date part. So in MsSqlServer, the date part is assumed 1900-01-01 00:00:00 while other RDBMS assumes it as 1970-01-01 00:00:00.
https://docs.microsoft.com/ja-jp/sql/t-sql/data-types/time-transact-sql?view=sql-server-ver15

@sarutak
Copy link
Member Author

sarutak commented Jan 21, 2021

@cloud-fan
Or, do you think the assert(row.getAs[Timestamp](5).equals(Timestamp.valueOf("1900-01-01 13:31:24.0"))) is wrong because it contains date part?

@SparkQA
Copy link

SparkQA commented Jan 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38890/

@cloud-fan
Copy link
Contributor

@sarutak can you take a look at https://github.com/apache/spark/pull/30902/files#r561601546 ? I think the test is wrong.

@sarutak
Copy link
Member Author

sarutak commented Jan 21, 2021

@cloud-fan O.K, I'll close this PR and open another PR to fix the wrong assertion. Thanks.

@sarutak sarutak closed this Jan 21, 2021
@SparkQA
Copy link

SparkQA commented Jan 21, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38890/

@sarutak sarutak deleted the revert-33888 branch June 4, 2021 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants