
Conversation

@MaxGekk (Member) commented Nov 1, 2019

What changes were proposed in this pull request?

In this PR, I propose to take the left bound (the from interval unit) into account and reset all unit values outside the specified range of interval units. Currently, IntervalUtils.fromDayTimeString() honors only the right bound, not the left one.

Note: the reset is not performed if spark.sql.dialect = PostgreSQL.

Why are the changes needed?

This fix makes fromDayTimeString() consistent in handling both bounds of the interval qualifier.

Does this PR introduce any user-facing change?

Yes, before:

spark-sql> SELECT interval '1 2:03:04' hour to minute;
interval 1 days 2 hours 3 minutes

After:

spark-sql> SELECT interval '1 2:03:04' hour to minute;
interval 2 hours 3 minutes

How was this patch tested?

  • Added new tests to IntervalUtilsSuite
  • Fixed expected results for literals.sql and interval.sql
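The proposed reset can be sketched as follows. This is a minimal Python illustration, not the actual Scala code in IntervalUtils; the `truncate` helper and unit names are hypothetical:

```python
# Day-time interval units, most to least significant.
UNITS = ["day", "hour", "minute", "second"]

def truncate(values, from_unit, to_unit):
    """Zero out every unit outside the [from_unit, to_unit] range."""
    lo, hi = UNITS.index(from_unit), UNITS.index(to_unit)
    return {u: (v if lo <= UNITS.index(u) <= hi else 0)
            for u, v in values.items()}

# '1 2:03:04' parses as day=1, hour=2, minute=3, second=4;
# with the qualifier HOUR TO MINUTE, day and second are reset.
print(truncate({"day": 1, "hour": 2, "minute": 3, "second": 4},
               "hour", "minute"))
# {'day': 0, 'hour': 2, 'minute': 3, 'second': 0}
```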

@SparkQA commented Nov 1, 2019

Test build #113085 has finished for PR 26358 at commit d7bc01c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 2, 2019

Test build #113125 has finished for PR 26358 at commit e860772.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk (Member, Author) commented Nov 3, 2019

jenkins, retest this, please

@SparkQA commented Nov 3, 2019

Test build #113158 has finished for PR 26358 at commit e860772.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk (Member, Author) commented Nov 3, 2019

@srowen @dongjoon-hyun @cloud-fan Please have a look at this.

@cloud-fan (Contributor)

> the reset is not performed if spark.sql.dialect = PostgreSQL.

Do you mean pgsql has a different behavior?

@MaxGekk (Member, Author) commented Nov 4, 2019

> Do you mean pgsql has a different behavior?

It ignores from but takes to into account. Not sure whether it is a bug or a feature.

maxim=# select interval '5 12:40:30.999999999' hour to minute;
    interval
-----------------
 5 days 12:40:00
(1 row)

@cloud-fan (Contributor)
Does the SQL spec define the behavior? I tried Oracle, and it fails if the from/to field doesn't match the actual value.

@SparkQA commented Nov 4, 2019

Test build #113197 has finished for PR 26358 at commit d40e3af.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class GangliaReporter extends ScheduledReporter
  • public static class Builder
  • case class AlterTableDropPartitionStatement(

@srowen (Member) commented Nov 4, 2019

Pardon the dumb question, but what is 'hour to minute' supposed to mean? Select just the hours and minutes out of the interval? The pgsql implementation doesn't seem to do that, whether by bug or by design. Just trying to understand what we think the semantics are, and whether they need to be different in Spark.

@MaxGekk (Member, Author) commented Nov 4, 2019

> what is the 'hour to minute' supposed to mean? select just the hours and minutes out of the interval?

Correct, it means to extract an interval with hour and minute fields from a string.

@MaxGekk (Member, Author) commented Nov 4, 2019

I haven't found a precise description of the acceptable input string, but it seems the format should be defined by the interval qualifier. For example, if it is HOUR TO MINUTE, the input string must be in the format hh:mi, not any day-time string as we currently assume:

private val dayTimePattern =
"^([+|-])?((\\d+) )?((\\d+):)?(\\d+):(\\d+)(\\.(\\d+))?$".r
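To illustrate the point, the pattern can be transliterated to Python (a sketch; the sign class is simplified to `[+-]`) and probed with strings that a HOUR TO MINUTE qualifier arguably should reject:

```python
import re

# Python transliteration of the Scala dayTimePattern quoted above:
# optional sign, optional days, optional hours, then minutes:seconds
# with an optional fraction. The pattern carries no knowledge of the
# interval qualifier, so it accepts any day-time shape.
day_time_pattern = re.compile(r"^([+-])?((\d+) )?((\d+):)?(\d+):(\d+)(\.(\d+))?$")

# Matches even though HOUR TO MINUTE only calls for hh:mi.
print(bool(day_time_pattern.match("1 2:03:04")))  # True
print(bool(day_time_pattern.match("2:03")))       # True
```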

@srowen (Member) commented Nov 4, 2019

Hm, none of those examples show 'extracting' a subset of the interval. It seems like a way to specify what the unit-less string means. If so, the pgsql example you have above ought to fail, as the string doesn't match the units.

It seems like the current behavior works like pgsql? If so, is that maybe an OK stance on the semantics in these 'incorrect' cases?

@MaxGekk (Member, Author) commented Nov 4, 2019

> if so, is that maybe an OK stance on the semantics in these 'incorrect' cases?

I am not sure that fully following other DBMSs in both bugs and features is the right decision. From the user's point of view, if I specify hour to minute, I would expect only hours and minutes in the result column, not seconds, days, or anything else.

@cloud-fan (Contributor)
Since this is only for interval literals, I think it doesn't hurt if we fail on an invalid string format. The Oracle behavior looks more reasonable to me; we can try more databases to see whether that's a common behavior.

@MaxGekk (Member, Author) commented Nov 8, 2019

For example, MySQL is strict as proposed in the PR:

mysql> SELECT DATE_SUB("1998-01-01 00:00:00", INTERVAL "1 1:1" HOUR_MINUTE);
+---------------------------------------------------------------+
| DATE_SUB("1998-01-01 00:00:00", INTERVAL "1 1:1" HOUR_MINUTE) |
+---------------------------------------------------------------+
| NULL                                                          |
+---------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> SELECT DATE_SUB("1998-01-01 00:00:00", INTERVAL "1:1" HOUR_MINUTE);
+-------------------------------------------------------------+
| DATE_SUB("1998-01-01 00:00:00", INTERVAL "1:1" HOUR_MINUTE) |
+-------------------------------------------------------------+
| 1997-12-31 22:59:00                                         |
+-------------------------------------------------------------+
1 row in set (0.00 sec)

@cloud-fan @srowen What can I do to move this PR forward, besides resolving the conflicts?

@srowen (Member) commented Nov 8, 2019

OK, so it would be reasonable to fail / return null, to match Oracle? There doesn't seem to be consistent behavior out there. I guess I prefer to fail this if it doesn't make sense semantically.

…y-time-string

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
#	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala
#	sql/core/src/test/resources/sql-tests/results/literals.sql.out
@SparkQA commented Nov 8, 2019

Test build #113472 has finished for PR 26358 at commit a78dcc3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class DateTimeConstants
  • public final class CalendarInterval implements Serializable, Comparable<CalendarInterval>
  • .doc(\"Comma-separated list of class names implementing \" +
  • sealed abstract class PluginContainer
  • class PluginMetricsSource(
  • case class PluginMessage(pluginName: String, message: AnyRef)
  • abstract class IntervalNumOperation(
  • case class MultiplyInterval(interval: Expression, num: Expression)
  • case class DivideInterval(interval: Expression, num: Expression)
  • case class AlterTableAddPartitionStatement(
  • case class AlterTableSerDePropertiesStatement(
  • case class ShowCurrentNamespaceStatement() extends ParsedStatement
  • case class ShowCurrentNamespace(catalogManager: CatalogManager) extends Command
  • case class LocalShuffleReaderExec(child: SparkPlan) extends UnaryExecNode
  • case class ShowCurrentNamespaceExec(
  • class ContinuousRecordEndpoint(buckets: Seq[Seq[UnsafeRow]], lock: Object)

@cloud-fan (Contributor)
So let's keep the existing behavior when the pgsql dialect is set, and fail for invalid bounds otherwise.
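The behavior agreed on here can be sketched as follows. This is an illustrative Python sketch under assumed names; `check_bounds` and the dialect strings are hypothetical, not Spark's actual API:

```python
UNITS = ["day", "hour", "minute", "second"]

def check_bounds(values, from_unit, to_unit, dialect="Spark"):
    """Sketch of the agreed rule: keep lenient parsing under the
    PostgreSQL dialect; otherwise reject non-zero values that fall
    outside the [from_unit, to_unit] range of the qualifier."""
    if dialect == "PostgreSQL":
        return values  # lenient, matching pgsql's observed behavior
    lo, hi = UNITS.index(from_unit), UNITS.index(to_unit)
    for unit, value in values.items():
        if value != 0 and not (lo <= UNITS.index(unit) <= hi):
            raise ValueError(
                f"interval field '{unit}' is outside the range "
                f"{from_unit} to {to_unit}")
    return values

parsed = {"day": 1, "hour": 2, "minute": 3, "second": 4}
print(check_bounds(parsed, "hour", "minute", dialect="PostgreSQL"))  # lenient
try:
    check_bounds(parsed, "hour", "minute")  # default dialect: fails
except ValueError as e:
    print(e)
```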

…y-time-string

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
#	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala
Review thread on the line: withSQLConf(SQLConf.DIALECT.key -> "Spark") {
@cloud-fan (Contributor)

can we have a test case to show the error behavior?

@MaxGekk (Member, Author)

Just to be clear, this PR only truncates results according to the specified bounds; in particular, it now takes the left bound into account. Could you explain what you mean by error behavior?

@cloud-fan (Contributor)

I think the conclusion is to fail for invalid bounds?

@MaxGekk (Member, Author)

Do you mean when a string doesn't match the specified bounds, like in literals.sql: https://github.com/apache/spark/pull/26358/files#diff-4f9e28af8e9fcb40a8a99b4e49f3b9b2R424?

@cloud-fan (Contributor)

yea, like what Oracle does.

@MaxGekk (Member, Author)

Here is the PR with strict parsing: #26473

@SparkQA commented Nov 11, 2019

Test build #113576 has finished for PR 26358 at commit 296de1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
