@gatorsmile
Member

This PR is to backport #20621 to branch-2.3.


What changes were proposed in this pull request?

Before the patch, Spark could infer a partition value as `Date` even when that value cannot be cast to `Date`. This can happen when extra characters follow a valid date, as in `2018-02-15AAA`.

When this happens and the input format has metadata that defines the schema of the table, `null` is returned as the value of the partition column, because the `cast` operator used in `PartitioningAwareFileIndex.inferPartitioning` is unable to convert the value.

During partition inference, the PR now checks that a value can actually be cast to `Date` or `Timestamp` before inferring that data type for it.
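
The behavior described above can be illustrated with a small standalone sketch. This is not Spark's code: the function names are hypothetical, the regular expression stands in for a parser that accepts any string beginning with a valid date, and `strptime` stands in for the `cast` operator. The fallback to a string type when the check fails is also an assumption made for illustration.

```python
import re
from datetime import date, datetime
from typing import Optional

# Hypothetical stand-in for a lenient parser: it only looks at the
# leading yyyy-MM-dd characters and ignores anything that follows.
_DATE_PREFIX = re.compile(r"\d{4}-\d{2}-\d{2}")


def lenient_prefix_is_date(raw: str) -> bool:
    """Return True if the string *starts with* a valid date."""
    match = _DATE_PREFIX.match(raw)
    if match is None:
        return False
    try:
        datetime.strptime(match.group(0), "%Y-%m-%d")
        return True
    except ValueError:
        return False


def strict_cast_to_date(raw: str) -> Optional[date]:
    """Hypothetical stand-in for the cast: None unless the whole string is a date."""
    try:
        return datetime.strptime(raw, "%Y-%m-%d").date()
    except ValueError:
        return None


def infer_type_before(raw: str) -> str:
    """Inference without the check: a valid prefix is enough."""
    return "date" if lenient_prefix_is_date(raw) else "string"


def infer_type_after(raw: str) -> str:
    """Inference with the check: the value must also survive the cast."""
    if lenient_prefix_is_date(raw) and strict_cast_to_date(raw) is not None:
        return "date"
    return "string"  # assumed fallback, for illustration only


if __name__ == "__main__":
    raw = "2018-02-15AAA"
    print(infer_type_before(raw), strict_cast_to_date(raw))
    print(infer_type_after(raw))
```

In this sketch, `infer_type_before` labels `2018-02-15AAA` as a date even though the strict cast yields `None`, which mirrors the `null` partition value described above. `infer_type_after` only reports a date when the cast also succeeds.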

How was this patch tested?

Added a unit test.

…o Date

Author: Marco Gaido

Closes apache#20621 from mgaido91/SPARK-23436.
@SparkQA

SparkQA commented Mar 8, 2018

Test build #88063 has finished for PR 20764 at commit a69d8d1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Contributor

mgaido91 commented Mar 8, 2018

LGTM

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

@HyukjinKwon
Member

Merged to branch-2.3.

asfgit pushed a commit that referenced this pull request Mar 9, 2018
…an be casted to Date

Author: Marco Gaido

Closes #20764 from gatorsmile/backport23436.
@dongjoon-hyun
Member

dongjoon-hyun commented Mar 9, 2018

Thank you for merging, @HyukjinKwon.
Ping @mgaido91. You can close this. :) Oops, it's @gatorsmile's backporting PR.

@gatorsmile gatorsmile closed this Mar 9, 2018
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
…an be casted to Date

Author: Marco Gaido

Closes apache#20764 from gatorsmile/backport23436.