Skip to content
Closed
Original file line number Diff line number Diff line change
Expand Up @@ -409,9 +409,12 @@ object IntervalUtils {
}

private object ParseState extends Enumeration {
type ParseState = Value

val PREFIX,
BEGIN_VALUE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can rename it to TRIM_BEFORE_PARSE_VALUE

Copy link
Member Author

@yaooqinn yaooqinn Nov 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we may change PARSE_UNIT_VALUE to PARSE_VALUE too

How about renaming them all as,

    val PREFIX,
        BODY,
        SIGN,
        TRIM_BEFORE_VALUE,
        VALUE,
        VALUE_FRACTIONAL_PART,
        TRIM_BEFORE_UNIT,
        UNIT_BEGIN,
        UNIT_SUFFIX,
        UNIT_END = Value

Seems loud and clear enough for each state

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does BODY mean? others LGTM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interval (PREFIX, BODY)
BODY (SIGN, VALUE , UNIT)+

Seems not persuadable enough

Copy link
Member Author

@yaooqinn yaooqinn Nov 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or NEXT_VALUE_UNIT? NEXT_VALUE_UNIT_PAIR

PARSE_SIGN,
TRIM_VALUE,
Copy link
Contributor

@cloud-fan cloud-fan Nov 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make the name clearer? e.g. TRIM_BEFORE_PARSE_UNIT_VALUE

PARSE_UNIT_VALUE,
FRACTIONAL_PART,
BEGIN_UNIT_NAME,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In BEGIN_UNIT_NAME we also trim the spaces. Can we have a consistent way to do it? Now there are 2 ways:

  1. have an intermedia state to trim the spaces
  2. trim the spaces at the beginning of a state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can add a TRIM_UNIT state to unify these 2 ways

Expand Down Expand Up @@ -439,7 +442,7 @@ object IntervalUtils {
val s = input.trim.toLowerCase
// scalastyle:on
val bytes = s.getBytes
if (bytes.length == 0) {
if (bytes.isEmpty) {
return null
}
var state = PREFIX
Expand All @@ -452,6 +455,13 @@ object IntervalUtils {
var fractionScale: Int = 0
var fraction: Int = 0

def trimToNextState(b: Byte, next: ParseState): Unit = {
b match {
case ' ' => i += 1
case _ => state = next
}
}

while (i < bytes.length) {
val b = bytes(i)
state match {
Expand All @@ -464,11 +474,7 @@ object IntervalUtils {
}
}
state = BEGIN_VALUE
case BEGIN_VALUE =>
b match {
case ' ' => i += 1
case _ => state = PARSE_SIGN
}
case BEGIN_VALUE => trimToNextState(b, PARSE_SIGN)
case PARSE_SIGN =>
b match {
case '-' =>
Expand All @@ -486,7 +492,8 @@ object IntervalUtils {
// Sets the scale to an invalid value to track fraction presence
// in the BEGIN_UNIT_NAME state
fractionScale = -1
state = PARSE_UNIT_VALUE
state = TRIM_VALUE
case TRIM_VALUE => trimToNextState(b, PARSE_UNIT_VALUE)
case PARSE_UNIT_VALUE =>
b match {
case _ if '0' <= b && b <= '9' =>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,7 @@ class IntervalUtilsSuite extends SparkFunSuite {
"-1 MONTH 1 day -1 microseconds" -> new CalendarInterval(-1, 1, -1),
" 123 MONTHS 123 DAYS 123 Microsecond " -> new CalendarInterval(123, 123, 123),
"interval -1 day +3 Microseconds" -> new CalendarInterval(0, -1, 3),
"interval - 1 day + 3 Microseconds" -> new CalendarInterval(0, -1, 3),
" interval 8 years -11 months 123 weeks -1 day " +
"23 hours -22 minutes 1 second -123 millisecond 567 microseconds " ->
new CalendarInterval(85, 860, 81480877567L)).foreach { case (input, expected) =>
Expand Down
48 changes: 26 additions & 22 deletions sql/core/benchmarks/IntervalBenchmark-jdk11-results.txt
Original file line number Diff line number Diff line change
@@ -1,25 +1,29 @@
OpenJDK 64-Bit Server VM 11.0.2+9 on Mac OS X 10.15.1
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Java HotSpot(TM) 64-Bit Server VM 11.0.5+10-LTS on Mac OS X 10.14.6
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
cast strings to intervals: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
prepare string w/ interval 442 472 41 2.3 442.4 1.0X
prepare string w/o interval 420 423 6 2.4 419.6 1.1X
1 units w/ interval 350 359 9 2.9 349.8 1.3X
1 units w/o interval 316 317 1 3.2 316.4 1.4X
2 units w/ interval 457 459 2 2.2 457.0 1.0X
2 units w/o interval 432 435 3 2.3 432.2 1.0X
3 units w/ interval 610 613 3 1.6 609.8 0.7X
3 units w/o interval 581 583 2 1.7 580.5 0.8X
4 units w/ interval 720 724 4 1.4 720.4 0.6X
4 units w/o interval 699 704 8 1.4 699.4 0.6X
5 units w/ interval 850 850 0 1.2 849.9 0.5X
5 units w/o interval 829 832 5 1.2 828.7 0.5X
6 units w/ interval 927 932 4 1.1 927.1 0.5X
6 units w/o interval 891 892 1 1.1 890.5 0.5X
7 units w/ interval 1033 1040 8 1.0 1033.2 0.4X
7 units w/o interval 1020 1024 5 1.0 1020.2 0.4X
8 units w/ interval 1168 1169 2 0.9 1168.0 0.4X
8 units w/o interval 1155 1157 2 0.9 1154.5 0.4X
9 units w/ interval 1326 1328 3 0.8 1326.1 0.3X
9 units w/o interval 1372 1381 14 0.7 1372.5 0.3X
prepare string w/ interval 574 610 45 1.7 573.9 1.0X
prepare string w/o interval 518 538 27 1.9 517.7 1.1X
1 units w/ interval 425 439 16 2.4 425.3 1.3X
1 units w/o interval 385 393 10 2.6 385.2 1.5X
2 units w/ interval 553 561 11 1.8 553.1 1.0X
2 units w/o interval 531 543 11 1.9 531.0 1.1X
3 units w/ interval 1134 1159 32 0.9 1134.0 0.5X
3 units w/o interval 1121 1126 6 0.9 1121.3 0.5X
4 units w/ interval 1226 1250 21 0.8 1226.1 0.5X
4 units w/o interval 1227 1239 11 0.8 1227.1 0.5X
5 units w/ interval 1375 1447 93 0.7 1374.7 0.4X
5 units w/o interval 1335 1346 19 0.7 1335.1 0.4X
6 units w/ interval 1530 1556 24 0.7 1529.5 0.4X
6 units w/o interval 1481 1492 17 0.7 1480.7 0.4X
7 units w/ interval 1730 1745 14 0.6 1729.9 0.3X
7 units w/o interval 1788 1859 112 0.6 1788.1 0.3X
8 units w/ interval 1952 2087 117 0.5 1951.7 0.3X
8 units w/o interval 2083 2207 209 0.5 2082.5 0.3X
9 units w/ interval 2228 2291 60 0.4 2227.5 0.3X
9 units w/o interval 2130 2184 75 0.5 2130.1 0.3X
10 units w/ interval 2414 2502 81 0.4 2413.8 0.2X
10 units w/o interval 2463 2488 35 0.4 2463.1 0.2X
11 units w/ interval 2717 2755 42 0.4 2716.8 0.2X
11 units w/o interval 2578 2661 77 0.4 2577.7 0.2X

48 changes: 26 additions & 22 deletions sql/core/benchmarks/IntervalBenchmark-results.txt
Original file line number Diff line number Diff line change
@@ -1,25 +1,29 @@
Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.1
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.14.6
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
cast strings to intervals: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
prepare string w/ interval 422 437 16 2.4 421.8 1.0X
prepare string w/o interval 369 374 8 2.7 369.4 1.1X
1 units w/ interval 426 430 5 2.3 425.5 1.0X
1 units w/o interval 382 386 5 2.6 382.1 1.1X
2 units w/ interval 519 527 9 1.9 518.5 0.8X
2 units w/o interval 505 512 6 2.0 505.4 0.8X
3 units w/ interval 650 653 3 1.5 649.6 0.6X
3 units w/o interval 630 633 4 1.6 629.7 0.7X
4 units w/ interval 755 761 6 1.3 754.9 0.6X
4 units w/o interval 745 749 3 1.3 745.3 0.6X
5 units w/ interval 882 891 14 1.1 882.0 0.5X
5 units w/o interval 867 870 3 1.2 867.4 0.5X
6 units w/ interval 1008 1013 4 1.0 1008.2 0.4X
6 units w/o interval 990 995 5 1.0 990.4 0.4X
7 units w/ interval 1057 1063 6 0.9 1056.9 0.4X
7 units w/o interval 1042 1046 4 1.0 1042.3 0.4X
8 units w/ interval 1206 1208 2 0.8 1206.0 0.3X
8 units w/o interval 1194 1198 4 0.8 1194.1 0.4X
9 units w/ interval 1322 1324 3 0.8 1321.5 0.3X
9 units w/o interval 1314 1318 4 0.8 1313.6 0.3X
prepare string w/ interval 531 566 34 1.9 530.5 1.0X
prepare string w/o interval 466 479 21 2.1 466.5 1.1X
1 units w/ interval 475 521 63 2.1 475.0 1.1X
1 units w/o interval 440 457 25 2.3 440.1 1.2X
2 units w/ interval 614 621 11 1.6 613.7 0.9X
2 units w/o interval 596 605 8 1.7 596.5 0.9X
3 units w/ interval 1115 1120 4 0.9 1115.0 0.5X
3 units w/o interval 1100 1107 6 0.9 1100.2 0.5X
4 units w/ interval 1255 1263 9 0.8 1255.1 0.4X
4 units w/o interval 1254 1393 130 0.8 1253.8 0.4X
5 units w/ interval 1367 1373 5 0.7 1367.2 0.4X
5 units w/o interval 1366 1376 9 0.7 1366.2 0.4X
6 units w/ interval 1526 1530 6 0.7 1526.0 0.3X
6 units w/o interval 1504 1510 7 0.7 1504.0 0.4X
7 units w/ interval 1748 1778 27 0.6 1748.0 0.3X
7 units w/o interval 1740 1744 5 0.6 1740.0 0.3X
8 units w/ interval 2092 2107 14 0.5 2092.5 0.3X
8 units w/o interval 2094 2098 5 0.5 2094.4 0.3X
9 units w/ interval 1874 1880 5 0.5 1873.9 0.3X
9 units w/o interval 1867 1872 4 0.5 1867.3 0.3X
10 units w/ interval 2127 2134 13 0.5 2126.5 0.2X
10 units w/o interval 2045 2049 6 0.5 2045.0 0.3X
11 units w/ interval 2242 2254 13 0.4 2241.9 0.2X
11 units w/o interval 2221 2227 6 0.5 2221.1 0.2X

9 changes: 9 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/interval.sql
Original file line number Diff line number Diff line change
Expand Up @@ -41,3 +41,12 @@ select max(cast(v as interval)) from VALUES ('1 seconds'), ('4 seconds'), ('3 se

-- min
select min(cast(v as interval)) from VALUES ('1 seconds'), ('4 seconds'), ('3 seconds') t(v);

-- SPARK-29605: cast string to intervals
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be SPARK-29822 because this is newly added test coverage for this PR SPARK-29822.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I just rm it from here, thanks

select cast(v as interval) from values ('1 second') t(v);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: select cast('1 second' as interval)

select cast(v as interval) from values ('+1 second') t(v);
select cast(v as interval) from values ('-1 second') t(v);
select cast(v as interval) from values ('+ 1 second') t(v);
select cast(v as interval) from values ('- 1 second') t(v);
select cast(v as interval) from values ('- -1 second') t(v);
select cast(v as interval) from values ('- +1 second') t(v);
58 changes: 57 additions & 1 deletion sql/core/src/test/resources/sql-tests/results/interval.sql.out
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 22
-- Number of queries: 29


-- !query 0
Expand Down Expand Up @@ -178,3 +178,59 @@ select min(cast(v as interval)) from VALUES ('1 seconds'), ('4 seconds'), ('3 se
struct<min(CAST(v AS INTERVAL)):interval>
-- !query 21 output
1 seconds


-- !query 22
select cast(v as interval) from values ('1 second') t(v)
-- !query 22 schema
struct<v:interval>
-- !query 22 output
1 seconds


-- !query 23
select cast(v as interval) from values ('+1 second') t(v)
-- !query 23 schema
struct<v:interval>
-- !query 23 output
1 seconds


-- !query 24
select cast(v as interval) from values ('-1 second') t(v)
-- !query 24 schema
struct<v:interval>
-- !query 24 output
-1 seconds


-- !query 25
select cast(v as interval) from values ('+ 1 second') t(v)
-- !query 25 schema
struct<v:interval>
-- !query 25 output
1 seconds


-- !query 26
select cast(v as interval) from values ('- 1 second') t(v)
-- !query 26 schema
struct<v:interval>
-- !query 26 output
-1 seconds


-- !query 27
select cast(v as interval) from values ('- -1 second') t(v)
-- !query 27 schema
struct<v:interval>
-- !query 27 output
NULL


-- !query 28
select cast(v as interval) from values ('- +1 second') t(v)
-- !query 28 schema
struct<v:interval>
-- !query 28 output
NULL
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,8 @@ object IntervalBenchmark extends SqlBasedBenchmark {
override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
val N = 1000000
val timeUnits = Seq(
"13 months", "100 weeks", "9 days", "12 hours",
"13 months", " 1 months",
"100 weeks", "9 days", "12 hours", "- 3 hours",
"5 minutes", "45 seconds", "123 milliseconds", "567 microseconds")
val intervalToTest = ListBuffer[String]()

Expand Down