Skip to content

Conversation

@yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Dec 9, 2019

What changes were proposed in this pull request?

Currently, we parse interval from multi units strings or from date-time/year-month pattern strings, the former handles all whitespace, the latter not or even spaces.

Why are the changes needed?

behavior consistency

Does this PR introduce any user-facing change?

yes, interval in date-time/year-month like

select interval '\n-\t10\t 12:34:46.789\t' day to second
-- !query 126 schema
struct<INTERVAL '-10 days -12 hours -34 minutes -46.789 seconds':interval>
-- !query 126 output
-10 days -12 hours -34 minutes -46.789 seconds

is valid now.

How was this patch tested?

add ut.

@yaooqinn
Copy link
Member Author

yaooqinn commented Dec 9, 2019

cc @cloud-fan @maropu, thanks in advance.

@maropu
Copy link
Member

maropu commented Dec 9, 2019

also cc: @MaxGekk

@maropu
Copy link
Member

maropu commented Dec 9, 2019

Just in case, have you checked the changes of performance numbers in IntervalBenchmark? No change?

@MaxGekk
Copy link
Member

MaxGekk commented Dec 9, 2019

@yaooqinn Maybe I have missed something but where does the requirement of supporting \n, \r ( and anything behind of ) come from?

}

private val yearMonthPattern = "^([+|-])?(\\d+)-(\\d+)$".r
private val yearMonthPattern = "^([+|-])?(\\s+)?(\\d+)-(\\d+)$".r
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just \\s*?

@MaxGekk
Copy link
Member

MaxGekk commented Dec 9, 2019

Before patching the buggy method, can we wait till this #26473 be merged?

@yaooqinn
Copy link
Member Author

yaooqinn commented Dec 9, 2019

Just in case, have you checked the changes of performance numbers in IntervalBenchmark? No change?

AFAIK,IntervalBenchmark only test multi unit interval strings

@yaooqinn
Copy link
Member Author

yaooqinn commented Dec 9, 2019

@yaooqinn Maybe I have missed something but where does the requirement of supporting \n, \r ( and anything behind of ) come from?

We have support dealing with ISO whitespaces for almost all internal types

@SparkQA
Copy link

SparkQA commented Dec 9, 2019

Test build #115035 has finished for PR 26815 at commit 534667b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

is this still valid?

@yaooqinn
Copy link
Member Author

yaooqinn commented Mar 9, 2020

I will check this fix as ASAP, thanks for reminding me.


// whitespaces
check("\t +5 12:40\t ", DAY, MINUTE, "5 days 12 hours 40 minutes")
checkFail("+5\t 12:40", DAY, MINUTE, "must match day-time format")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for multi-unit syntax, do we support 1 day \t 2 hours?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. My opinion is that the interval string in multi-unit syntax, we parse each value and each unit separately and they can be produced by the providers separately too so the whitespaces can be in the middle of each part. In year-month and day-time syntax, the string should be considered as a whole part so the whitespaces only appear on both ends, not in the middle.

take java.sql.Timestamp for an example, the timestamp string is treated as a whole

scala> java.sql.Timestamp.valueOf("2011-11-11 11:11:11.11\t")
res8: java.sql.Timestamp = 2011-11-11 11:11:11.11

scala> java.sql.Timestamp.valueOf("2011-11-11 \t 11:11:11.11\t")
java.lang.NumberFormatException: For input string: "	 11"
  at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
  at java.lang.Integer.parseInt(Integer.java:569)
  at java.lang.Integer.parseInt(Integer.java:615)
  at java.sql.Timestamp.valueOf(Timestamp.java:243)
  ... 28 elided

Also, we do the same in Spark

spark-sql> select timestamp '2013-01-02 \t 00:00:00';
Error in query:
Cannot parse the TIMESTAMP value: 2013-01-02 	 00:00:00(line 1, pos 7)

== SQL ==
select timestamp '2013-01-02 \t 00:00:00'
-------^^^

spark-sql> select timestamp '2013-01-02 00:00:00 \t';
2013-01-02 00:00:00

org.apache.spark.sql.catalyst.parser.ParseException

requirement failed: Interval string must match day-time format of '^(?<sign>[+|-])?(?<day>\d+) (?<hour>\d{1,2}):(?<minute>\d{1,2}):(?<second>(\d{1,2})(\.(\d{1,9}))?)$':
- 10 12:34:46.789 , set spark.sql.legacy.fromDayTimeString.enabled to true to restore the behavior before Spark 3.0.(line 1, pos 16)
Copy link
Contributor

@cloud-fan cloud-fan Mar 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to this PR, but we should only ask users to set the legacy config if it works.

@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to this PR: we should put the test queries that have different behaviors if ansi mode is on to a different file, and only test that file with ansi mode on and off.

@SparkQA
Copy link

SparkQA commented Mar 10, 2020

Test build #119612 has finished for PR 26815 at commit 4556ad4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 3bd6ebf Mar 10, 2020
cloud-fan pushed a commit that referenced this pull request Mar 10, 2020
…andle whitespaces

### What changes were proposed in this pull request?

Currently, we parse interval from multi units strings or from date-time/year-month pattern strings, the former handles all whitespace, the latter not or even spaces.

### Why are the changes needed?

behavior consistency

### Does this PR introduce any user-facing change?
yes, interval in date-time/year-month like
```
select interval '\n-\t10\t 12:34:46.789\t' day to second
-- !query 126 schema
struct<INTERVAL '-10 days -12 hours -34 minutes -46.789 seconds':interval>
-- !query 126 output
-10 days -12 hours -34 minutes -46.789 seconds
```
is valid now.

### How was this patch tested?

add ut.

Closes #26815 from yaooqinn/SPARK-30189.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 3bd6ebf)
Signed-off-by: Wenchen Fan <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…andle whitespaces

### What changes were proposed in this pull request?

Currently, we parse interval from multi units strings or from date-time/year-month pattern strings, the former handles all whitespace, the latter not or even spaces.

### Why are the changes needed?

behavior consistency

### Does this PR introduce any user-facing change?
yes, interval in date-time/year-month like
```
select interval '\n-\t10\t 12:34:46.789\t' day to second
-- !query 126 schema
struct<INTERVAL '-10 days -12 hours -34 minutes -46.789 seconds':interval>
-- !query 126 output
-10 days -12 hours -34 minutes -46.789 seconds
```
is valid now.

### How was this patch tested?

add ut.

Closes apache#26815 from yaooqinn/SPARK-30189.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants