@dtenedor
Contributor

@dtenedor dtenedor commented Apr 4, 2023

What changes were proposed in this pull request?

This PR fixes a correctness bug for INSERT commands with timestamp literals. The bug manifests when:

  • An INSERT command includes a user-specified column list of fewer columns than the target table.
  • The provided values include timestamp literals.

The bug was that the long integer values stored in the rows to represent these timestamp literals were getting assigned back to UnresolvedInlineTable rows without the timestamp type. Then the analyzer inserted an implicit cast from LongType to TimestampType later, which incorrectly caused the value to change during execution.

This PR fixes the bug by propagating the timestamp type directly to the output table instead.
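The value change described above can be illustrated with a minimal sketch in plain Scala, with no Spark dependency. It rests on two assumptions that are not stated in this PR: that a timestamp is held internally as a long count of microseconds since the Unix epoch, and that an implicit cast from a long to a timestamp reads the long as seconds. The object and method names are hypothetical and exist only for illustration.

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.temporal.ChronoUnit
import java.util.concurrent.TimeUnit

// Hypothetical illustration only; this is not the analyzer code touched by the PR.
object TimestampCastSketch {
  // Assumption: a timestamp literal is stored as microseconds since the Unix epoch.
  def toMicros(ts: LocalDateTime): Long =
    ChronoUnit.MICROS.between(Instant.EPOCH, ts.toInstant(ZoneOffset.UTC))

  // Assumption: a long-to-timestamp cast treats its input as seconds,
  // so a value that is already in microseconds gets scaled a second time.
  // TimeUnit.SECONDS.toMicros saturates at Long.MaxValue on overflow.
  def castLongAsSeconds(value: Long): Long =
    TimeUnit.SECONDS.toMicros(value)

  def main(args: Array[String]): Unit = {
    val literal = LocalDateTime.of(2020, 12, 31, 0, 0, 0)
    val micros = toMicros(literal)          // what the row holds for the literal
    val recast = castLongAsSeconds(micros)  // what a second, implicit cast would produce
    println(s"stored micros: $micros")
    println(s"after recast:  $recast")
    println(s"value changed: ${recast != micros}")
  }
}
```

Under these assumptions, carrying the timestamp type along with the value avoids the second conversion entirely, which is the approach this PR describes.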

Why are the changes needed?

This PR fixes a correctness bug.

Does this PR introduce any user-facing change?

Yes, this PR fixes a correctness bug.

How was this patch tested?

This PR adds a new unit test suite.

@github-actions github-actions bot added the SQL label Apr 4, 2023
@dtenedor
Contributor Author

dtenedor commented Apr 4, 2023

Hi @gengliangwang here is the correctness bug fix 🙏

@dtenedor dtenedor requested a review from gengliangwang April 4, 2023 18:18
@dtenedor dtenedor requested a review from gengliangwang April 4, 2023 20:50
@dtenedor dtenedor requested a review from gengliangwang April 4, 2023 21:08
@dtenedor dtenedor requested a review from gengliangwang April 4, 2023 21:53
@dtenedor dtenedor changed the title [SPARK-43018][SQL] Fix bug for INSERT commands with timetstamp literals [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals Apr 5, 2023
Member

@gengliangwang gengliangwang left a comment

LGTM except for one minor comment.

@gengliangwang
Member

Thanks, merging to master/branch-3.4
cc @xinrong-meng

gengliangwang pushed a commit that referenced this pull request Apr 6, 2023
Closes #40652 from dtenedor/assign-correct-insert-types.

Authored-by: Daniel Tenedorio <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
(cherry picked from commit 9f0bf51)
Signed-off-by: Gengliang Wang <[email protected]>
* limitations under the License.
*/

package org.apache.spark.sql.catalyst.analysis
Member

Hi, @dtenedor and @gengliangwang . I have a question.

Although I understand that this test suite provides test coverage for org.apache.spark.sql.catalyst.analysis.ResolveDefaultColumns, that doesn't mean the suite belongs in the org.apache.spark.sql.catalyst.analysis package. This test suite lives in the sql module and is the only file in this directory:

$ tree sql/core/src/test/scala/org/apache/spark/sql/catalyst
sql/core/src/test/scala/org/apache/spark/sql/catalyst
└── analysis
    └── ResolveDefaultColumnsSuite.scala

Is this intentional?

Contributor Author

Hi @dongjoon-hyun, I don't think this is intentional. We could move ResolveDefaultColumnsSuite to the org.apache.spark.sql package. What do you think? If you'd like me to do this, I can prepare a PR.

assert(asLocalRelation(result) == localRelation)
}

test("SPARK-43018: INSERT timestamp values into a table with column DEFAULTs") {
Member

In particular, this test case covers more than catalyst/analysis.

snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023