Closed
Changes from 47 commits
52 commits
c48a70d
Prepare for session local timezone support.
ueshin Dec 6, 2016
1d21fec
Make Cast TimeZoneAwareExpression.
ueshin Dec 6, 2016
0763c8f
Fix DateTimeUtilsSuite to follow changes.
ueshin Dec 6, 2016
449d93d
Make some datetime expressions TimeZoneAwareExpression.
ueshin Dec 8, 2016
b59d902
Fix compiler error in sql/core.
ueshin Dec 8, 2016
3ddfae4
Add constructors without zoneId to TimeZoneAwareExpressions for Funct…
ueshin Dec 9, 2016
f58f00d
Add DateTimeUtils.threadLocalLocalTimeZone to partition-related Cast.
ueshin Dec 13, 2016
8f2040b
Fix timezone for Hive timestamp string.
ueshin Dec 13, 2016
63c103c
Use defaultTimeZone instead of threadLocalLocalTimeZone.
ueshin Dec 13, 2016
7066850
Add TimeZone to DateFormats.
ueshin Dec 13, 2016
1aaca29
Make `CurrentBatchTimestamp` `TimeZoneAwareExpression`.
ueshin Dec 14, 2016
e5bb246
Add tests for date functions with session local timezone.
ueshin Dec 14, 2016
32cc391
Remove unused import and small cleanup.
ueshin Dec 16, 2016
f434378
Fix tests.
ueshin Dec 16, 2016
16fd1e4
Rename `zoneId` to `timeZoneId`.
ueshin Dec 19, 2016
009c17b
Use lazy val to avoid to keep creating a new timezone object (or doin…
ueshin Dec 19, 2016
a2936ed
Modify ComputeCurrentTime to hold the same date.
ueshin Dec 19, 2016
c5ca73e
Add comments.
ueshin Dec 19, 2016
b860379
Fix `Cast.needTimeZone()` to handle complex types.
ueshin Dec 19, 2016
6746265
Fix `Dataset.showString()` to use session local timezone.
ueshin Dec 19, 2016
4b6900c
Merge branch 'master' into issues/SPARK-18350
ueshin Dec 20, 2016
4f9cc40
Modify to analyze `ResolveTimeZone` only once.
ueshin Dec 24, 2016
2ca2413
Use session local timezone for Hive string.
ueshin Dec 24, 2016
c232854
Merge branch 'master' into issues/SPARK-18350
ueshin Dec 26, 2016
5b6dd4f
Merge branch 'master' into issues/SPARK-18350
ueshin Jan 5, 2017
1ca5808
Use `addReferenceMinorObj` to avoid adding member variables.
ueshin Jan 10, 2017
702dd81
Use Option[String] for timeZoneId.
ueshin Jan 10, 2017
33a3425
Update a comment.
ueshin Jan 10, 2017
5cc93e3
Fix overloaded constructors.
ueshin Jan 11, 2017
5521165
Fix session local timezone for timezone sensitive tests.
ueshin Jan 11, 2017
bd8275e
Remove `timeZoneResolved` and use `timeZoneId.isEmpty` instead in `Re…
ueshin Jan 11, 2017
183945c
Merge branch 'master' into issues/SPARK-18350
ueshin Jan 14, 2017
22a3b6e
Remove unused parameter.
ueshin Jan 16, 2017
30d51fa
Merge branch 'master' into issues/SPARK-18350
ueshin Jan 16, 2017
043ab52
Use Cast directly instead of dsl.
ueshin Jan 16, 2017
3ba5830
Merge branch 'master' into issues/SPARK-18350
ueshin Jan 22, 2017
9ab31f0
Revert unnecessary changes.
ueshin Jan 22, 2017
b954947
Use `@` binding to simplify pattern match.
ueshin Jan 22, 2017
dbb2604
Inline a `lazy val`.
ueshin Jan 22, 2017
186cd3e
Add some TODO comments for follow-up prs.
ueshin Jan 22, 2017
6631a69
Add a config document.
ueshin Jan 22, 2017
3610465
Use an overload version of `checkAnswer`.
ueshin Jan 22, 2017
c12e596
Fix CastSuite and add some comments to describe the tests.
ueshin Jan 22, 2017
8a04e80
Use None instead of null.
ueshin Jan 22, 2017
efe3aff
Add some comments to describe the tests.
ueshin Jan 22, 2017
cdbb266
Make TimeAdd/TimeSub/MonthsBetween TimeZoneAwareExpression.
ueshin Jan 22, 2017
328399a
Add comments to explain tests.
ueshin Jan 23, 2017
7352612
Modify a test.
ueshin Jan 23, 2017
b99cf79
Refine tests.
ueshin Jan 25, 2017
a85377f
Remove unnecessary new lines.
ueshin Jan 26, 2017
f0c911b
Add newDateFormat to DateTimeUtils and use it.
ueshin Jan 26, 2017
6fa1d6a
Parameterize some tests.
ueshin Jan 26, 2017
Original file line number Diff line number Diff line change
@@ -17,6 +17,8 @@

package org.apache.spark.sql.catalyst

import java.util.TimeZone

import org.apache.spark.sql.catalyst.analysis._

/**
@@ -36,6 +38,8 @@ trait CatalystConf {

def warehousePath: String

def sessionLocalTimeZone: String

/** If true, cartesian products between relations will be allowed for all
* join types(inner, (left|right|full) outer).
* If false, cartesian products will require explicit CROSS JOIN syntax.
@@ -68,5 +72,6 @@ case class SimpleCatalystConf(
runSQLonFile: Boolean = true,
crossJoinEnabled: Boolean = false,
cboEnabled: Boolean = false,
warehousePath: String = "/user/hive/warehouse")
warehousePath: String = "/user/hive/warehouse",
sessionLocalTimeZone: String = TimeZone.getDefault().getID)
extends CatalystConf
Original file line number Diff line number Diff line change
@@ -147,6 +147,8 @@ class Analyzer(
HandleNullInputsForUDF),
Batch("FixNullability", Once,
FixNullability),
Batch("ResolveTimeZone", Once,
ResolveTimeZone),
Batch("Cleanup", fixedPoint,
CleanupAliases)
)
@@ -215,7 +217,7 @@ class Analyzer(
case ne: NamedExpression => ne
case e if !e.resolved => u
case g: Generator => MultiAlias(g, Nil)
case c @ Cast(ne: NamedExpression, _) => Alias(c, ne.name)()
case c @ Cast(ne: NamedExpression, _, _) => Alias(c, ne.name)()
Contributor

If we add a Cast.unapply that returns only the first two arguments, we can avoid many of the changes to Cast pattern matches. Not sure if it is worth it, though.
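
The extractor idea in this comment can be sketched as follows. This is a minimal, self-contained illustration and not Catalyst code: `Expr`, `Attr`, `Lit`, `CastNode`, and `aliasName` are hypothetical stand-ins, and the node is written as a plain class so that the hand-written `unapply` is the only extractor in play.

```scala
object CastUnapplySketch {
  // Hypothetical, simplified stand-ins for Catalyst's expression hierarchy.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Any) extends Expr

  // A plain class (not a case class), so the only extractor is the one below.
  final class CastNode(
      val child: Expr,
      val dataType: String,
      val timeZoneId: Option[String]) extends Expr

  object CastNode {
    def apply(child: Expr, dataType: String, timeZoneId: Option[String] = None): CastNode =
      new CastNode(child, dataType, timeZoneId)

    // Expose only the first two fields; timeZoneId is deliberately left out.
    def unapply(c: CastNode): Option[(Expr, String)] = Some((c.child, c.dataType))
  }

  // A two-argument pattern keeps compiling after the third field is added.
  def aliasName(e: Expr): Option[String] = e match {
    case CastNode(Attr(name), _) => Some(name)
    case _ => None
  }
}
```

The trade-off in this sketch is that a pattern which does need the time zone has to bind the node (for example with `c @ CastNode(_, _)`) and read `c.timeZoneId` explicitly.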

case e: ExtractValue => Alias(e, toPrettySQL(e))()
case e if optGenAliasFunc.isDefined =>
Alias(child, optGenAliasFunc.get.apply(e))()
@@ -2304,6 +2306,18 @@ class Analyzer(
}
}
}

/**
* Replace [[TimeZoneAwareExpression]] without [[TimeZone]] by its copy with session local
* time zone.
*/
object ResolveTimeZone extends Rule[LogicalPlan] {

override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
case e: TimeZoneAwareExpression if e.timeZoneId.isEmpty =>
e.withTimeZone(conf.sessionLocalTimeZone)
}
}
}

/**
Original file line number Diff line number Diff line change
@@ -26,10 +26,10 @@ import org.apache.spark.sql.catalyst.{FunctionIdentifier, InternalRow, TableIden
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, Cast, Literal}
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.util.quoteIdentifier
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.types.StructType



/**
* A function defined in the catalog.
*
@@ -114,7 +114,9 @@ case class CatalogTablePartition(
*/
def toRow(partitionSchema: StructType): InternalRow = {
InternalRow.fromSeq(partitionSchema.map { field =>
Cast(Literal(spec(field.name)), field.dataType).eval()
// TODO: use correct timezone for partition values.
Cast(Literal(spec(field.name)), field.dataType,
Option(DateTimeUtils.defaultTimeZone().getID)).eval()
Contributor

Can we make this a default value? For example:

trait TimeZoneAwareExpression extends Expression {
  def timeZoneId: Option[String]

  @transient lazy val timeZone: TimeZone =
    TimeZone.getTimeZone(timeZoneId.getOrElse(DateTimeUtils.defaultTimeZone().getID))
}

Contributor

By doing this, we can reduce code changes to the tests.

Contributor

But we don't want callers to forget to specify it.

Member Author

I agree with @rxin.
We should explicitly specify the timezone id or use session local timezone resolved by the ResolveTimeZone rule.
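
The two options named here (an explicit id, or the session zone filled in by a resolution rule) can be sketched roughly as below. Everything in the block is a hypothetical, simplified stand-in: `TzCast` and `resolve` are not Catalyst classes, and the fail-fast behavior of `timeZone` is an assumption made for this sketch to contrast with the default-value proposal earlier in the thread.

```scala
object ResolveTimeZoneSketch {
  // Hypothetical stand-in for an expression that needs a time zone.
  final case class TzCast(column: String, timeZoneId: Option[String] = None) {
    def withTimeZone(id: String): TzCast = copy(timeZoneId = Option(id))

    // Assumption for this sketch: fail fast when no zone has been set,
    // rather than silently falling back to the JVM default.
    def timeZone: java.util.TimeZone =
      java.util.TimeZone.getTimeZone(
        timeZoneId.getOrElse(sys.error(s"time zone not resolved for $column")))
  }

  // Fill in the session zone only where the caller did not specify one.
  def resolve(exprs: Seq[TzCast], sessionLocalTimeZone: String): Seq[TzCast] =
    exprs.map {
      case e if e.timeZoneId.isEmpty => e.withTimeZone(sessionLocalTimeZone)
      case e => e
    }
}
```

With this shape, an expression built with an explicit zone keeps it, and one built without a zone picks up the session setting only when the rule runs.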

})
}
}
Original file line number Diff line number Diff line change
@@ -131,15 +131,23 @@ object Cast {
private def resolvableNullability(from: Boolean, to: Boolean) = !from || to
}

/** Cast the child expression to the target data type. */
/**
* Cast the child expression to the target data type.
*
* When cast from/to timezone related types, we need timeZoneId, which will be resolved with
* session local timezone by an analyzer [[ResolveTimeZone]].
*/
@ExpressionDescription(
usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data type `type`.",
extended = """
Examples:
> SELECT _FUNC_('10' as int);
10
""")
case class Cast(child: Expression, dataType: DataType) extends UnaryExpression with NullIntolerant {
case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String] = None)
extends UnaryExpression with TimeZoneAwareExpression with NullIntolerant {

def this(child: Expression, dataType: DataType) = this(child, dataType, None)

override def toString: String = s"cast($child as ${dataType.simpleString})"

@@ -154,6 +162,9 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w

override def nullable: Boolean = Cast.forceNullable(child.dataType, dataType) || child.nullable

override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
copy(timeZoneId = Option(timeZoneId))

// [[func]] assumes the input is no longer null because eval already does the null check.
@inline private[this] def buildCast[T](a: Any, func: T => Any): Any = func(a.asInstanceOf[T])

@@ -162,7 +173,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
case BinaryType => buildCast[Array[Byte]](_, UTF8String.fromBytes)
case DateType => buildCast[Int](_, d => UTF8String.fromString(DateTimeUtils.dateToString(d)))
case TimestampType => buildCast[Long](_,
t => UTF8String.fromString(DateTimeUtils.timestampToString(t)))
t => UTF8String.fromString(DateTimeUtils.timestampToString(t, timeZone)))
case _ => buildCast[Any](_, o => UTF8String.fromString(o.toString))
}

@@ -207,7 +218,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
// TimestampConverter
private[this] def castToTimestamp(from: DataType): Any => Any = from match {
case StringType =>
buildCast[UTF8String](_, utfs => DateTimeUtils.stringToTimestamp(utfs).orNull)
buildCast[UTF8String](_, utfs => DateTimeUtils.stringToTimestamp(utfs, timeZone).orNull)
case BooleanType =>
buildCast[Boolean](_, b => if (b) 1L else 0)
case LongType =>
@@ -219,7 +230,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
case ByteType =>
buildCast[Byte](_, b => longToTimestamp(b.toLong))
case DateType =>
buildCast[Int](_, d => DateTimeUtils.daysToMillis(d) * 1000)
buildCast[Int](_, d => DateTimeUtils.daysToMillis(d, timeZone) * 1000)
// TimestampWritable.decimalToTimestamp
case DecimalType() =>
buildCast[Decimal](_, d => decimalToTimestamp(d))
@@ -254,7 +265,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
case TimestampType =>
// throw valid precision more than seconds, according to Hive.
// Timestamp.nanos is in 0 to 999,999,999, no more than a second.
buildCast[Long](_, t => DateTimeUtils.millisToDays(t / 1000L))
buildCast[Long](_, t => DateTimeUtils.millisToDays(t / 1000L, timeZone))
}

// IntervalConverter
@@ -531,8 +542,9 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString(
org.apache.spark.sql.catalyst.util.DateTimeUtils.dateToString($c));"""
case TimestampType =>
val tz = ctx.addReferenceMinorObj(timeZone)
(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString(
org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c));"""
org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));"""
case _ =>
(c, evPrim, evNull) => s"$evPrim = UTF8String.fromString(String.valueOf($c));"
}
@@ -558,8 +570,9 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
}
"""
case TimestampType =>
val tz = ctx.addReferenceMinorObj(timeZone)
(c, evPrim, evNull) =>
s"$evPrim = org.apache.spark.sql.catalyst.util.DateTimeUtils.millisToDays($c / 1000L);";
s"$evPrim = org.apache.spark.sql.catalyst.util.DateTimeUtils.millisToDays($c / 1000L, $tz);"
case _ =>
(c, evPrim, evNull) => s"$evNull = true;"
}
@@ -637,11 +650,12 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
from: DataType,
ctx: CodegenContext): CastFunction = from match {
case StringType =>
val tz = ctx.addReferenceMinorObj(timeZone)
val longOpt = ctx.freshName("longOpt")
(c, evPrim, evNull) =>
s"""
scala.Option<Long> $longOpt =
org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToTimestamp($c);
org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToTimestamp($c, $tz);
if ($longOpt.isDefined()) {
$evPrim = ((Long) $longOpt.get()).longValue();
} else {
@@ -653,8 +667,9 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
case _: IntegralType =>
(c, evPrim, evNull) => s"$evPrim = ${longToTimeStampCode(c)};"
case DateType =>
val tz = ctx.addReferenceMinorObj(timeZone)
(c, evPrim, evNull) =>
s"$evPrim = org.apache.spark.sql.catalyst.util.DateTimeUtils.daysToMillis($c) * 1000;"
s"$evPrim = org.apache.spark.sql.catalyst.util.DateTimeUtils.daysToMillis($c, $tz) * 1000;"
case DecimalType() =>
(c, evPrim, evNull) => s"$evPrim = ${decimalToTimestampCode(c)};"
case DoubleType =>
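
Why the date and timestamp casts in this diff take a time zone can be illustrated with a small sketch: the same instant can fall on different calendar days depending on the zone. The helper below is hypothetical and only approximates the idea using `java.util.TimeZone`; it is not Spark's `DateTimeUtils` implementation.

```scala
import java.util.TimeZone

object TimeZoneDaysSketch {
  private val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Hypothetical helper: number of whole local days since 1970-01-01 for an
  // instant given in epoch milliseconds, as seen in the given time zone.
  def millisToDays(millisUtc: Long, tz: TimeZone): Int = {
    // getOffset(long) returns the zone's offset from UTC at that instant, in ms.
    val localMillis = millisUtc + tz.getOffset(millisUtc)
    // floorDiv rounds toward negative infinity, so pre-epoch instants work too.
    Math.floorDiv(localMillis, MillisPerDay).toInt
  }
}
```

For the instant 1970-01-01T20:00:00Z (72,000,000 ms), this sketch yields day 0 in UTC and day 1 in Asia/Tokyo, which is the kind of difference a session-local time zone setting makes visible.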