2 changes: 1 addition & 1 deletion docs/sql-migration-guide.md
Original file line number Diff line number Diff line change
@@ -113,7 +113,7 @@ license: |

### UDFs and Built-in Functions

- Since Spark 3.0, the `date_add` and `date_sub` functions only accepts int, smallint, tinyint as the 2nd argument, fractional and string types are not valid anymore, e.g. `date_add(cast('1964-05-23' as date), '12.34')` will cause `AnalysisException`. In Spark version 2.4 and earlier, if the 2nd argument is fractional or string value, it will be coerced to int value, and the result will be a date value of `1964-06-04`.
- Since Spark 3.0, the `date_add` and `date_sub` functions only accepts int, smallint, tinyint as the 2nd argument, fractional and string types are not valid anymore, e.g. `date_add(cast('1964-05-23' as date), '12.34')` will cause `AnalysisException`. Note that, string literals are still allowed, but Spark will throw Analysis Exception if the string is not a valid integer. In Spark version 2.4 and earlier, if the 2nd argument is fractional or string value, it will be coerced to int value, and the result will be a date value of `1964-06-04`.
Member

`accepts` -> `accept`. Also, "fractional and string types are not valid anymore" seems inconsistent with "string literals are still allowed".


- Since Spark 3.0, the function `percentile_approx` and its alias `approx_percentile` only accept integral value with range in `[1, 2147483647]` as its 3rd argument `accuracy`, fractional and string types are disallowed, e.g. `percentile_approx(10.0, 0.2, 1.8D)` will cause `AnalysisException`. In Spark version 2.4 and earlier, if `accuracy` is fractional or string value, it will be coerced to an int value, `percentile_approx(10.0, 0.2, 1.8D)` is operated as `percentile_approx(10.0, 0.2, 1)` which results in `10.0`.

@@ -23,6 +23,7 @@ import scala.annotation.tailrec
import scala.collection.mutable

import org.apache.spark.internal.Logging
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.aggregate._
import org.apache.spark.sql.catalyst.plans.logical._
@@ -63,6 +64,7 @@ object TypeCoercion {
ImplicitTypeCasts ::
DateTimeOperations ::
WindowFrameCoercion ::
StringLiteralCoercion ::
Nil

// See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types.
@@ -1043,6 +1045,34 @@ object TypeCoercion {
}
}
}

  /**
   * A special rule to support string literal as the second argument of date_add/date_sub functions,
   * to keep backward compatibility as a temporary workaround.
   * TODO: revisit the type coercion rules for string.
Member

@maropu maropu Mar 20, 2020

nit: how about filing a JIRA for this TODO and leaving the JIRA ID here?

Contributor Author

AFAIK we started the ANSI type coercion work (you have a design doc, right?). Did we create a JIRA ticket at that time?

Member

Well, I wrote the design doc, but I didn't file a JIRA at that time. Perhaps @gengliangwang might have done so?

   */
  object StringLiteralCoercion extends TypeCoercionRule {
Member

@dongjoon-hyun dongjoon-hyun Mar 20, 2020

This causes a behavior difference in arithmetic operations, too. Could you describe the following change in the PR description? The new behavior looks reasonable to me.

2.4.5 and 3.0.0-preview2

scala> sql("select (cast('2020-03-28' AS DATE) + '1')").show
org.apache.spark.sql.AnalysisException: cannot resolve '(CAST('2020-03-28' AS DATE) + CAST('1' AS DOUBLE))' due to data type mismatch: differing types in '(CAST('2020-03-28' AS DATE) + CAST('1' AS DOUBLE))' (date and double).; line 1 pos 8;

This PR.

scala> sql("select (cast('2020-03-28' AS DATE) + '1')").show
+-------------------------------------+
|date_add(CAST(2020-03-28 AS DATE), 1)|
+-------------------------------------+
|                           2020-03-29|
+-------------------------------------+

Member

Did you forget to write a function name in the second query above?

scala> sql("select (cast('2020-03-28' AS DATE) + '1')").show
                 ^^^^

Member

@dongjoon-hyun dongjoon-hyun Mar 22, 2020

No, both queries are the same. What I meant is that this is the behavior with this PR; this PR changes arithmetic expressions, too.

Member

Oh, I see.

    override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
      // Skip nodes whose children have not been resolved yet.
      case e if !e.childrenResolved => e
      case DateAdd(l, r) if r.dataType == StringType && r.foldable =>
        val days = try {
          AnsiCast(r, IntegerType).eval().asInstanceOf[Int]
        } catch {
          case e: NumberFormatException => throw new AnalysisException(
            "The second argument of 'date_add' function needs to be an integer.", cause = Some(e))
        }
        DateAdd(l, Literal(days))
      case DateSub(l, r) if r.dataType == StringType && r.foldable =>
        val days = try {
          AnsiCast(r, IntegerType).eval().asInstanceOf[Int]
        } catch {
          case e: NumberFormatException => throw new AnalysisException(
            "The second argument of 'date_sub' function needs to be an integer.", cause = Some(e))
        }
        DateSub(l, Literal(days))
    }
  }
}
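
For readers who want to try the idea outside Spark, the rule above can be approximated with a minimal, dependency-free sketch. Everything in it is a hypothetical stand-in rather than Spark API: `Expr`, `StringLiteral`, `IntLiteral`, `StringColumn`, `SketchAnalysisException`, and `StringLiteralCoercionSketch` are made up for illustration, and `String.toInt` is only assumed to behave like the `AnsiCast(r, IntegerType)` call for the simple inputs used in the tests of this PR.

```scala
import java.time.LocalDate

// Hypothetical stand-ins for Catalyst expressions; none of these are Spark classes.
sealed trait Expr
final case class StringLiteral(value: String) extends Expr // foldable string argument
final case class IntLiteral(value: Int) extends Expr
final case class StringColumn(name: String) extends Expr   // non-foldable string argument

// Stand-in for Spark's AnalysisException.
final case class SketchAnalysisException(message: String) extends Exception(message)

object StringLiteralCoercionSketch {
  // Rewrites only string literals into integer literals. Any other expression is
  // returned unchanged, so a string column still fails the later type check.
  def coerceDays(functionName: String, days: Expr): Expr = days match {
    case StringLiteral(s) =>
      try IntLiteral(s.toInt) // assumption: approximates the ANSI cast to int
      catch {
        case _: NumberFormatException =>
          throw SketchAnalysisException(
            s"The second argument of '$functionName' function needs to be an integer.")
      }
    case other => other
  }

  // Evaluates a date_add-like call after coercion; anything that is still not an
  // integer literal is rejected, mimicking the data type mismatch error.
  def dateAdd(start: LocalDate, days: Expr): LocalDate =
    coerceDays("date_add", days) match {
      case IntLiteral(n) => start.plusDays(n.toLong)
      case other =>
        throw SketchAnalysisException(
          s"argument 2 requires (int or smallint or tinyint) type, however, '$other' is of string type.")
    }
}
```

In this sketch, `dateAdd(LocalDate.parse("2011-11-11"), StringLiteral("1"))` succeeds, while `StringLiteral("1.2")` and `StringColumn("str")` both raise `SketchAnalysisException`, which mirrors the three cases exercised in `datetime.sql`.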

trait TypeCoercionRule extends Rule[LogicalPlan] with Logging {
8 changes: 8 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/datetime.sql
@@ -58,9 +58,12 @@ select date_add('2011-11-11', 1L);
select date_add('2011-11-11', 1.0);
select date_add('2011-11-11', 1E1);
select date_add('2011-11-11', '1');
select date_add('2011-11-11', '1.2');
select date_add(date'2011-11-11', 1);
select date_add(timestamp'2011-11-11', 1);
select date_sub(date'2011-11-11', 1);
select date_sub(date'2011-11-11', '1');
select date_sub(date'2011-11-11', '1.2');
select date_sub(timestamp'2011-11-11', 1);
select date_sub(null, 1);
select date_sub(date'2011-11-11', null);
@@ -72,6 +75,11 @@ select date '2001-10-01' - 7;
select date '2001-09-28' + null;
select date '2001-09-28' - null;

-- date add/sub with non-literal string column
create temp view v as select '1' str;
select date_add('2011-11-11', str) from v;
select date_sub('2011-11-11', str) from v;

-- subtract dates
select null - date '2019-10-06';
select date '2001-10-01' - date '2001-09-28';
55 changes: 53 additions & 2 deletions sql/core/src/test/resources/sql-tests/results/datetime.sql.out
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 77
-- Number of queries: 83


-- !query
@@ -266,10 +266,18 @@ cannot resolve 'date_add(CAST('2011-11-11' AS DATE), 10.0D)' due to data type mi
-- !query
select date_add('2011-11-11', '1')
-- !query schema
struct<date_add(CAST(2011-11-11 AS DATE), 1):date>
-- !query output
2011-11-12


-- !query
select date_add('2011-11-11', '1.2')
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
cannot resolve 'date_add(CAST('2011-11-11' AS DATE), '1')' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, ''1'' is of string type.; line 1 pos 7
The second argument of 'date_add' function needs to be an integer.;


-- !query
@@ -296,6 +304,23 @@ struct<date_sub(DATE '2011-11-11', 1):date>
2011-11-10


-- !query
select date_sub(date'2011-11-11', '1')
-- !query schema
struct<date_sub(DATE '2011-11-11', 1):date>
-- !query output
2011-11-10


-- !query
select date_sub(date'2011-11-11', '1.2')
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
The second argument of 'date_sub' function needs to be an integer.;


-- !query
select date_sub(timestamp'2011-11-11', 1)
-- !query schema
@@ -377,6 +402,32 @@ struct<date_sub(DATE '2001-09-28', CAST(NULL AS INT)):date>
NULL


-- !query
create temp view v as select '1' str
-- !query schema
struct<>
-- !query output



-- !query
select date_add('2011-11-11', str) from v
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
cannot resolve 'date_add(CAST('2011-11-11' AS DATE), v.`str`)' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'v.`str`' is of string type.; line 1 pos 7


-- !query
select date_sub('2011-11-11', str) from v
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
cannot resolve 'date_sub(CAST('2011-11-11' AS DATE), v.`str`)' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, 'v.`str`' is of string type.; line 1 pos 7


-- !query
select null - date '2019-10-06'
-- !query schema