52 commits
c48a70d
Prepare for session local timezone support.
ueshin Dec 6, 2016
1d21fec
Make Cast TimeZoneAwareExpression.
ueshin Dec 6, 2016
0763c8f
Fix DateTimeUtilsSuite to follow changes.
ueshin Dec 6, 2016
449d93d
Make some datetime expressions TimeZoneAwareExpression.
ueshin Dec 8, 2016
b59d902
Fix compiler error in sql/core.
ueshin Dec 8, 2016
3ddfae4
Add constructors without zoneId to TimeZoneAwareExpressions for Funct…
ueshin Dec 9, 2016
f58f00d
Add DateTimeUtils.threadLocalLocalTimeZone to partition-related Cast.
ueshin Dec 13, 2016
8f2040b
Fix timezone for Hive timestamp string.
ueshin Dec 13, 2016
63c103c
Use defaultTimeZone instead of threadLocalLocalTimeZone.
ueshin Dec 13, 2016
7066850
Add TimeZone to DateFormats.
ueshin Dec 13, 2016
1aaca29
Make `CurrentBatchTimestamp` `TimeZoneAwareExpression`.
ueshin Dec 14, 2016
e5bb246
Add tests for date functions with session local timezone.
ueshin Dec 14, 2016
32cc391
Remove unused import and small cleanup.
ueshin Dec 16, 2016
f434378
Fix tests.
ueshin Dec 16, 2016
16fd1e4
Rename `zoneId` to `timeZoneId`.
ueshin Dec 19, 2016
009c17b
Use lazy val to avoid to keep creating a new timezone object (or doin…
ueshin Dec 19, 2016
a2936ed
Modify ComputeCurrentTime to hold the same date.
ueshin Dec 19, 2016
c5ca73e
Add comments.
ueshin Dec 19, 2016
b860379
Fix `Cast.needTimeZone()` to handle complex types.
ueshin Dec 19, 2016
6746265
Fix `Dataset.showString()` to use session local timezone.
ueshin Dec 19, 2016
4b6900c
Merge branch 'master' into issues/SPARK-18350
ueshin Dec 20, 2016
4f9cc40
Modify to analyze `ResolveTimeZone` only once.
ueshin Dec 24, 2016
2ca2413
Use session local timezone for Hive string.
ueshin Dec 24, 2016
c232854
Merge branch 'master' into issues/SPARK-18350
ueshin Dec 26, 2016
5b6dd4f
Merge branch 'master' into issues/SPARK-18350
ueshin Jan 5, 2017
1ca5808
Use `addReferenceMinorObj` to avoid adding member variables.
ueshin Jan 10, 2017
702dd81
Use Option[String] for timeZoneId.
ueshin Jan 10, 2017
33a3425
Update a comment.
ueshin Jan 10, 2017
5cc93e3
Fix overloaded constructors.
ueshin Jan 11, 2017
5521165
Fix session local timezone for timezone sensitive tests.
ueshin Jan 11, 2017
bd8275e
Remove `timeZoneResolved` and use `timeZoneId.isEmpty` instead in `Re…
ueshin Jan 11, 2017
183945c
Merge branch 'master' into issues/SPARK-18350
ueshin Jan 14, 2017
22a3b6e
Remove unused parameter.
ueshin Jan 16, 2017
30d51fa
Merge branch 'master' into issues/SPARK-18350
ueshin Jan 16, 2017
043ab52
Use Cast directly instead of dsl.
ueshin Jan 16, 2017
3ba5830
Merge branch 'master' into issues/SPARK-18350
ueshin Jan 22, 2017
9ab31f0
Revert unnecessary changes.
ueshin Jan 22, 2017
b954947
Use `@` binding to simplify pattern match.
ueshin Jan 22, 2017
dbb2604
Inline a `lazy val`.
ueshin Jan 22, 2017
186cd3e
Add some TODO comments for follow-up prs.
ueshin Jan 22, 2017
6631a69
Add a config document.
ueshin Jan 22, 2017
3610465
Use an overload version of `checkAnswer`.
ueshin Jan 22, 2017
c12e596
Fix CastSuite and add some comments to describe the tests.
ueshin Jan 22, 2017
8a04e80
Use None instead of null.
ueshin Jan 22, 2017
efe3aff
Add some comments to describe the tests.
ueshin Jan 22, 2017
cdbb266
Make TimeAdd/TimeSub/MonthsBetween TimeZoneAwareExpression.
ueshin Jan 22, 2017
328399a
Add comments to explain tests.
ueshin Jan 23, 2017
7352612
Modify a test.
ueshin Jan 23, 2017
b99cf79
Refine tests.
ueshin Jan 25, 2017
a85377f
Remove unnecessary new lines.
ueshin Jan 26, 2017
f0c911b
Add newDateFormat to DateTimeUtils and use it.
ueshin Jan 26, 2017
6fa1d6a
Parameterize some tests.
ueshin Jan 26, 2017
@@ -17,6 +17,8 @@

package org.apache.spark.sql.catalyst

import java.util.TimeZone

import org.apache.spark.sql.catalyst.analysis._

/**
@@ -36,6 +38,8 @@ trait CatalystConf {

def warehousePath: String

def sessionLocalTimeZone: String

/** If true, cartesian products between relations will be allowed for all
* join types(inner, (left|right|full) outer).
* If false, cartesian products will require explicit CROSS JOIN syntax.
@@ -62,5 +66,6 @@ case class SimpleCatalystConf(
maxCaseBranchesForCodegen: Int = 20,
runSQLonFile: Boolean = true,
crossJoinEnabled: Boolean = false,
warehousePath: String = "/user/hive/warehouse")
warehousePath: String = "/user/hive/warehouse",
sessionLocalTimeZone: String = TimeZone.getDefault().getID)
extends CatalystConf
@@ -104,6 +104,7 @@ class Analyzer(
ResolveAggregateFunctions ::
TimeWindowing ::
ResolveInlineTables ::
ResolveTimeZone ::
Contributor:

This seems overkill. We only need to run the rule once, right?

Member Author:

@hvanhovell Thank you for your suggestion.
I had overridden `resolved` in `TimeZoneAwareExpression`, which required adding `ResolveTimeZone` to the Resolution batch. But whether an expression has a timezone or not doesn't affect resolution, so we don't need to worry about it, and the rule now only needs to run once.

TypeCoercion.typeCoercionRules ++
extendedResolutionRules : _*),
Batch("Nondeterministic", Once,
@@ -180,7 +181,7 @@ class Analyzer(
case ne: NamedExpression => ne
case e if !e.resolved => u
case g: Generator => MultiAlias(g, Nil)
case c @ Cast(ne: NamedExpression, _) => Alias(c, ne.name)()
case c @ Cast(ne: NamedExpression, _, _) => Alias(c, ne.name)()
Contributor:

if we add a Cast.unapply that returns only the first two arguments, we can reduce a lot of the cast match changes. Not sure if it is worth it though.
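A sketch of this suggestion (hypothetical, with simplified stand-in types; per the later discussion it was not adopted): a separate extractor that matches a `Cast` on only its first two fields, so call sites that don't care about the time zone wouldn't need to change.

```scala
// Hypothetical sketch of the reviewer's suggestion (not adopted in the PR).
// Expression/DataType here are simplified stand-ins for the Catalyst types.
case class Expression(name: String)
case class DataType(name: String)
case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String] = None)

// Extractor that deliberately ignores timeZoneId.
object CastNoTz {
  def unapply(c: Cast): Option[(Expression, DataType)] = Some((c.child, c.dataType))
}

val c = Cast(Expression("col"), DataType("timestamp"), Some("UTC"))
val matched = c match {
  case CastNoTz(child, dt) => s"${child.name} -> ${dt.name}"
  case _ => "no match"
}
assert(matched == "col -> timestamp")
```

The downside, raised below in the thread, is that a second extractor lets a pattern match silently miss the time-zone field.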

case e: ExtractValue => Alias(e, toPrettySQL(e))()
case e if optGenAliasFunc.isDefined =>
Alias(child, optGenAliasFunc.get.apply(e))()
@@ -2211,6 +2212,18 @@ class Analyzer(
}
}
}

/**
* Replace [[TimeZoneAwareExpression]] without [[TimeZone]] by its copy with session local
* time zone.
*/
object ResolveTimeZone extends Rule[LogicalPlan] {

override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
case e: TimeZoneAwareExpression if !e.timeZoneResolved =>
e.withTimeZone(conf.sessionLocalTimeZone)
}
}
}
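The rule above follows the usual Catalyst pattern of rewriting expressions in a plan. A stripped-down illustration of the same idea, using hypothetical toy types rather than Catalyst's: fill in a missing time zone id from the session configuration, leaving already-resolved expressions alone.

```scala
// Toy sketch of the ResolveTimeZone idea (types are illustrative only).
sealed trait Expr { def children: Seq[Expr] }
case class Lit(v: Any) extends Expr { def children = Nil }
case class CastExpr(child: Expr, timeZoneId: Option[String]) extends Expr {
  def children = Seq(child)
}

// Walk the tree; any CastExpr without a time zone gets the session one.
def resolveTimeZone(e: Expr, sessionTz: String): Expr = e match {
  case CastExpr(child, None) => CastExpr(resolveTimeZone(child, sessionTz), Some(sessionTz))
  case CastExpr(child, tz)   => CastExpr(resolveTimeZone(child, sessionTz), tz)
  case other                 => other
}

val plan = CastExpr(Lit("2016-12-13"), None)
assert(resolveTimeZone(plan, "UTC") == CastExpr(Lit("2016-12-13"), Some("UTC")))
```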

/**
@@ -438,7 +438,7 @@ object TypeCoercion {
// Skip nodes whose children have not been resolved yet.
case e if !e.childrenResolved => e

case Cast(e @ StringType(), t: IntegralType) =>
case Cast(e @ StringType(), t: IntegralType, _) =>
Cast(Cast(e, DecimalType.forType(LongType)), t)
}
}
@@ -24,6 +24,7 @@ import org.apache.spark.sql.catalyst.{FunctionIdentifier, InternalRow, TableIden
import org.apache.spark.sql.catalyst.expressions.{Attribute, Cast, Literal}
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
import org.apache.spark.sql.catalyst.util.quoteIdentifier
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.types.{StructField, StructType}


@@ -111,7 +112,8 @@ case class CatalogTablePartition(
*/
def toRow(partitionSchema: StructType): InternalRow = {
InternalRow.fromSeq(partitionSchema.map { field =>
Cast(Literal(spec(field.name)), field.dataType).eval()
Cast(Literal(spec(field.name)), field.dataType,
DateTimeUtils.defaultTimeZone().getID).eval()
Contributor:

could this change the behavior on how we interpret partition values when timezone settings change?

Member Author:

Currently the behavior doesn't change with the timezone setting, i.e. the system timezone is always used.

This is a part I was unsure about: should partition values be interpreted with the session timezone setting or with the system timezone?

Member Author:

Hmm, now I think we should use the timezone setting for partition values, because the values are also part of the data, so they should be affected by the setting.

})
}
}
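The behavioral question in this thread can be seen with a small standalone example: the same partition-value string denotes different instants depending on which time zone is used to parse it. This sketch uses plain `java.text` as a stand-in for the `Cast` machinery:

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

// Parse a timestamp string as an epoch-millis instant in a given zone.
def parseMillis(s: String, tz: String): Long = {
  val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
  df.setTimeZone(TimeZone.getTimeZone(tz))
  df.parse(s).getTime
}

val utc = parseMillis("2016-12-13 00:00:00", "UTC")
val pst = parseMillis("2016-12-13 00:00:00", "America/Los_Angeles")
// Los Angeles is UTC-8 in December, so its local midnight is 8 hours later.
assert(pst - utc == 8L * 3600 * 1000)
```

Whichever zone the partition `Cast` uses (system default vs. session setting) therefore changes which instant a partition value maps to.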
@@ -110,6 +110,14 @@ object Cast {
case (_: FractionalType, _: IntegralType) => true // NaN, infinity
case _ => false
}

def needTimeZone(from: DataType, to: DataType): Boolean = (from, to) match {
Contributor:

I think it's important to document this...

Member Author:

I see, I'll add a document.

case (StringType, TimestampType) => true
case (TimestampType, StringType) => true
case (DateType, TimestampType) => true
case (TimestampType, DateType) => true
case _ => false
}
}
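The predicate can be illustrated standalone (simplified stand-in types; note the commit list shows the real version was later fixed to handle complex types as well): only string/timestamp and date/timestamp casts depend on a time zone.

```scala
// Minimal sketch of the needTimeZone predicate (illustrative types only;
// the real Catalyst version also recurses into complex types).
sealed trait DataType
case object StringType extends DataType
case object TimestampType extends DataType
case object DateType extends DataType
case object IntegerType extends DataType

def needTimeZone(from: DataType, to: DataType): Boolean = (from, to) match {
  case (StringType, TimestampType) => true
  case (TimestampType, StringType) => true
  case (DateType, TimestampType) => true
  case (TimestampType, DateType) => true
  case _ => false
}

assert(needTimeZone(StringType, TimestampType))
assert(needTimeZone(TimestampType, DateType))
assert(!needTimeZone(StringType, IntegerType)) // plain numeric cast: no zone needed
```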

/** Cast the child expression to the target data type. */
@@ -120,7 +128,10 @@ object Cast {
> SELECT _FUNC_('10' as int);
10
""")
case class Cast(child: Expression, dataType: DataType) extends UnaryExpression with NullIntolerant {
case class Cast(child: Expression, dataType: DataType, zoneId: String = null)
Contributor:

not 100% sure whether this is a good idea, but should we consider adding a Cast.unapply that does not match on zoneId?

Contributor:

Also we should add classdoc to explain what zoneId is. I'd probably call it timeZoneId.

Contributor:

Maybe an extra unapply is a bad idea, since then we could miss a pattern match.

Member Author:

I agree that an extra unapply is a bad idea. I'll leave it as it is for now.

extends UnaryExpression with TimeZoneAwareExpression with NullIntolerant {

def this(child: Expression, dataType: DataType) = this(child, dataType, null)

override def toString: String = s"cast($child as ${dataType.simpleString})"

@@ -135,6 +146,14 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w

override def nullable: Boolean = Cast.forceNullable(child.dataType, dataType) || child.nullable

override def timeZoneResolved: Boolean =
Contributor:

This is a mental note. A timezone is resolved when:

  • we don't need one, or
  • it has been resolved.

Member Author:

Yes, that's right.

(!(childrenResolved && Cast.needTimeZone(child.dataType, dataType))) || super.timeZoneResolved

override lazy val resolved: Boolean =
childrenResolved && checkInputDataTypes().isSuccess && timeZoneResolved

override def withTimeZone(zoneId: String): TimeZoneAwareExpression = copy(zoneId = zoneId)
Contributor:

This is just a copy ctor, isn't it? Maybe no need to add this? Not a big deal though.

copy(zoneId = zoneId)

Member Author:

Yes, this is a copy ctor, but the analyzer ResolveTimeZone can't call the copy ctor because it doesn't know the actual expression class.


// [[func]] assumes the input is no longer null because eval already does the null check.
@inline private[this] def buildCast[T](a: Any, func: T => Any): Any = func(a.asInstanceOf[T])

@@ -143,7 +162,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
case BinaryType => buildCast[Array[Byte]](_, UTF8String.fromBytes)
case DateType => buildCast[Int](_, d => UTF8String.fromString(DateTimeUtils.dateToString(d)))
case TimestampType => buildCast[Long](_,
t => UTF8String.fromString(DateTimeUtils.timestampToString(t)))
t => UTF8String.fromString(DateTimeUtils.timestampToString(t, timeZone)))
case _ => buildCast[Any](_, o => UTF8String.fromString(o.toString))
}
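Why `timestampToString` needs the zone: formatting the same instant in two zones yields different strings. A self-contained sketch using plain `java.text` (the PR's `DateTimeUtils` does something analogous internally):

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// Format an epoch-millis instant as a timestamp string in a given zone.
def fmt(millis: Long, tz: String): String = {
  val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
  df.setTimeZone(TimeZone.getTimeZone(tz))
  df.format(new Date(millis))
}

// The same instant (the epoch) renders differently per zone.
assert(fmt(0L, "UTC") == "1970-01-01 00:00:00")
assert(fmt(0L, "Asia/Tokyo") == "1970-01-01 09:00:00") // UTC+9, no DST
```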

@@ -188,7 +207,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
// TimestampConverter
private[this] def castToTimestamp(from: DataType): Any => Any = from match {
case StringType =>
buildCast[UTF8String](_, utfs => DateTimeUtils.stringToTimestamp(utfs).orNull)
buildCast[UTF8String](_, utfs => DateTimeUtils.stringToTimestamp(utfs, timeZone).orNull)
case BooleanType =>
buildCast[Boolean](_, b => if (b) 1L else 0)
case LongType =>
@@ -200,7 +219,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
case ByteType =>
buildCast[Byte](_, b => longToTimestamp(b.toLong))
case DateType =>
buildCast[Int](_, d => DateTimeUtils.daysToMillis(d) * 1000)
buildCast[Int](_, d => DateTimeUtils.daysToMillis(d, timeZone) * 1000)
// TimestampWritable.decimalToTimestamp
case DecimalType() =>
buildCast[Decimal](_, d => decimalToTimestamp(d))
@@ -235,7 +254,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
case TimestampType =>
// throw valid precision more than seconds, according to Hive.
// Timestamp.nanos is in 0 to 999,999,999, no more than a second.
buildCast[Long](_, t => DateTimeUtils.millisToDays(t / 1000L))
buildCast[Long](_, t => DateTimeUtils.millisToDays(t / 1000L, timeZone))
}

// IntervalConverter
Expand Down Expand Up @@ -512,8 +531,9 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString(
org.apache.spark.sql.catalyst.util.DateTimeUtils.dateToString($c));"""
case TimestampType =>
val tz = ctx.addReferenceObj("timeZone", timeZone)
Contributor:

we should use addReferenceMinorObj to avoid adding member variables.

Member Author:

I see. I'll modify to use it.

(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString(
org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c));"""
org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));"""
case _ =>
(c, evPrim, evNull) => s"$evPrim = UTF8String.fromString(String.valueOf($c));"
}
@@ -539,8 +559,9 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
}
"""
case TimestampType =>
val tz = ctx.addReferenceObj("timeZone", timeZone)
(c, evPrim, evNull) =>
s"$evPrim = org.apache.spark.sql.catalyst.util.DateTimeUtils.millisToDays($c / 1000L);";
s"$evPrim = org.apache.spark.sql.catalyst.util.DateTimeUtils.millisToDays($c / 1000L, $tz);"
case _ =>
(c, evPrim, evNull) => s"$evNull = true;"
}
@@ -618,11 +639,12 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
from: DataType,
ctx: CodegenContext): CastFunction = from match {
case StringType =>
val tz = ctx.addReferenceObj("timeZone", timeZone)
val longOpt = ctx.freshName("longOpt")
(c, evPrim, evNull) =>
s"""
scala.Option<Long> $longOpt =
org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToTimestamp($c);
org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToTimestamp($c, $tz);
if ($longOpt.isDefined()) {
$evPrim = ((Long) $longOpt.get()).longValue();
} else {
Expand All @@ -634,8 +656,9 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
case _: IntegralType =>
(c, evPrim, evNull) => s"$evPrim = ${longToTimeStampCode(c)};"
case DateType =>
val tz = ctx.addReferenceObj("timeZone", timeZone)
(c, evPrim, evNull) =>
s"$evPrim = org.apache.spark.sql.catalyst.util.DateTimeUtils.daysToMillis($c) * 1000;"
s"$evPrim = org.apache.spark.sql.catalyst.util.DateTimeUtils.daysToMillis($c, $tz) * 1000;"
case DecimalType() =>
(c, evPrim, evNull) => s"$evPrim = ${decimalToTimestampCode(c)};"
case DoubleType =>