Skip to content

Conversation

@beliefer
Copy link
Contributor

@beliefer beliefer commented Feb 28, 2025

What changes were proposed in this pull request?

This PR proposes to use correct MySQL datetime functions when pushing down EXTRACT.

Why are the changes needed?

bug fix

Does this PR introduce any user-facing change?

Yes, query result is corrected, but this bug is not released yet.

How was this patch tested?

updated test

Was this patch authored or co-authored using generative AI tooling?

'No'.

@github-actions github-actions bot added the SQL label Feb 28, 2025
@beliefer beliefer force-pushed the SPARK-49488_followup branch 10 times, most recently from 3bca323 to 9a5d70c Compare March 2, 2025 07:52
@beliefer beliefer changed the title [WIP][SPARK-49488][SQL][FOLLOWUP] Use correct mysql datetime fields when pushing down EXTRACT [SPARK-49488][SQL][FOLLOWUP] Use correct mysql datetime fields when pushing down EXTRACT Mar 2, 2025
@beliefer beliefer changed the title [SPARK-49488][SQL][FOLLOWUP] Use correct mysql datetime fields when pushing down EXTRACT [SPARK-49488][SQL][FOLLOWUP] Use correct MySQL datetime functions when pushing down EXTRACT Mar 2, 2025
@beliefer
Copy link
Contributor Author

beliefer commented Mar 2, 2025

ping @cloud-fan

@beliefer beliefer force-pushed the SPARK-49488_followup branch from 9a5d70c to 7c9da46 Compare March 3, 2025 01:52
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, is this a new correctness issue due to the original implementation of SPARK-49488, @beliefer ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, may I ask why this is changed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this PR is adding a new test coverage for YEAROFWEEK, it seems that we don't need to touch this line, doesn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change follows the change from #50101

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the test coverage of YEAR, but it was not tested before.

@beliefer
Copy link
Contributor Author

beliefer commented Mar 4, 2025

So, is this a new correctness issue due to the original implementation of SPARK-49488, @beliefer ?

Yes. The original PR brings the bug that the behavior of Spark not match MySQL very well.

@beliefer beliefer requested a review from dongjoon-hyun March 4, 2025 09:51
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it supported in all mysql versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It is supported from MySQL 5.7 to MySQL 9.2(latest)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fragile, shall we reject such EXTRACT to be pushed down?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simply throw UnsupportedOperation to reject it. This is also how we use isSupportedFunction to reject predicate pushdown.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we come up with 52?

Copy link
Contributor Author

@beliefer beliefer Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the behavior of MySQL.
The range is 1-53 if the mode is 3.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment is outdated now.

@beliefer beliefer force-pushed the SPARK-49488_followup branch from 012a3cb to 7322837 Compare March 6, 2025 02:10
@cloud-fan
Copy link
Contributor

thanks, merging to master/4.0!

@cloud-fan cloud-fan closed this in 7b39d24 Mar 6, 2025
cloud-fan pushed a commit that referenced this pull request Mar 6, 2025
…n pushing down EXTRACT

### What changes were proposed in this pull request?
This PR proposes to use correct MySQL datetime functions when pushing down `EXTRACT`.

### Why are the changes needed?
bug fix

### Does this PR introduce _any_ user-facing change?
Yes, query result is corrected, but this bug is not released yet.

### How was this patch tested?
updated test

### Was this patch authored or co-authored using generative AI tooling?
'No'.

Closes #50112 from beliefer/SPARK-49488_followup.

Authored-by: beliefer <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 7b39d24)
Signed-off-by: Wenchen Fan <[email protected]>
@beliefer
Copy link
Contributor Author

beliefer commented Mar 6, 2025

@cloud-fan @dongjoon-hyun Thank you!

case "DAY_OF_WEEK" => s"(WEEKDAY($source) + 1)"
case _ => super.visitExtract(field, source)
case "DAY_OF_WEEK" => s"(WEEKDAY(${build(extract.source())}) + 1)"
// SECOND, MINUTE, HOUR, DAY, MONTH, QUARTER, YEAR are identical on MySQL and Spark for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract second returns integer on MySQL, while spark returns decimal(8,6), so it cannot be pushed down because of loosing decimal part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
…n pushing down EXTRACT

### What changes were proposed in this pull request?
This PR proposes to use correct MySQL datetime functions when pushing down `EXTRACT`.

### Why are the changes needed?
bug fix

### Does this PR introduce _any_ user-facing change?
Yes, query result is corrected, but this bug is not released yet.

### How was this patch tested?
updated test

### Was this patch authored or co-authored using generative AI tooling?
'No'.

Closes apache#50112 from beliefer/SPARK-49488_followup.

Authored-by: beliefer <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit c7d57d7)
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants