@@ -486,14 +486,16 @@ class Analyzer(
     case Pivot(groupByExprs, pivotColumn, pivotValues, aggregates, child) =>
       val singleAgg = aggregates.size == 1
       def outputName(value: Literal, aggregate: Expression): String = {
+        val utf8Value = Cast(value, StringType, Some(conf.sessionLocalTimeZone)).eval(EmptyRow)

Member Author: It seems we can cast into StringType in all the ways -
Member Author: BTW, is this a correct way to handle the timezone, @ueshin?

Member: Yes, it looks good.

Member Author: Thank you for your confirmation.
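
(As an aside, a minimal sketch of what this timezone-aware cast evaluates to, assuming a spark-shell with the catalyst classes on the classpath; the "GMT" timezone id here is only illustrative:)

import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.types.StringType

// Literal(...) stores the java.sql.Timestamp as an internal microsecond value;
// casting to StringType with an explicit timezone renders it in that zone.
val value = Literal(java.sql.Timestamp.valueOf("2012-12-31 16:00:10.011"))
val utf8Value = Cast(value, StringType, Some("GMT")).eval()

// eval() returns a UTF8String, or null for a null literal, hence the Option guard.
val stringValue = Option(utf8Value).map(_.toString).getOrElse("null")
// e.g. "2013-01-01 00:00:10.011" when the JVM default timezone is
// America/Los_Angeles, matching the test further down.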

+        val stringValue: String = Option(utf8Value).map(_.toString).getOrElse("null")
         if (singleAgg) {
-          value.toString
+          stringValue
         } else {
           val suffix = aggregate match {
             case n: NamedExpression => n.name
             case _ => toPrettySQL(aggregate)
           }
-          value + "_" + suffix
+          stringValue + "_" + suffix
         }
       }
       if (aggregates.forall(a => PivotFirst.supportsDataType(a.dataType))) {
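
(For illustration: with a single aggregate the pivot column is named by the cast value alone; with several aggregates the suffix branch above appends the aggregate's name. A hypothetical spark-shell check, assuming spark.implicits._ and org.apache.spark.sql.functions._ are in scope:)

val df = Seq(("a", 1)).toDF("k", "v")
// Two aggregates trigger the suffix branch; the column names come out roughly as
// "a_count(v)" and "a_sum(v)" (value, underscore, pretty-printed aggregate).
df.groupBy("k").pivot("k").agg(count($"v"), sum($"v")).columns.foreach(println)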
@@ -23,7 +23,7 @@ import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.sql.types._
 
-class DataFramePivotSuite extends QueryTest with SharedSQLContext{
+class DataFramePivotSuite extends QueryTest with SharedSQLContext {
   import testImplicits._
 
   test("pivot courses") {
@@ -230,4 +230,20 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext{
         .groupBy($"a").pivot("a").agg(min($"b")),
       Row(null, Seq(null, 7), null) :: Row(1, null, Seq(1, 7)) :: Nil)
   }
+
+  test("pivot with timestamp and count should not print internal representation") {
+    val ts = "2012-12-31 16:00:10.011"
+    val tsWithZone = "2013-01-01 00:00:10.011"
+
+    withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> "GMT") {
+      val df = Seq(java.sql.Timestamp.valueOf(ts)).toDF("a").groupBy("a").pivot("a").count()
+      val expected = StructType(
+        StructField("a", TimestampType) ::
+        StructField(tsWithZone, LongType) :: Nil)

Contributor: Is it expected? Users will see different values now.

Member Author: Yeah, I was confused by it too because the original values are apparently rendered differently. However, it seems intended.

scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

scala> val timestamp = java.sql.Timestamp.valueOf("2012-12-31 16:00:10.011")
timestamp: java.sql.Timestamp = 2012-12-31 16:00:10.011

scala> Seq(timestamp).toDF("a").show()
+--------------------+
|                   a|
+--------------------+
|2012-12-30 23:00:...|
+--------------------+

The internal values stay as they are; it seems only the human-readable format changes according to the given timezone.

I guess this is as described in #16308
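
(A quick sketch of that claim, assuming a spark-shell with a local spark session and implicits in scope: the collected java.sql.Timestamp carries the same instant under any session timezone; only the show() formatting shifts.)

val timestamp = java.sql.Timestamp.valueOf("2012-12-31 16:00:10.011")
val df = Seq(timestamp).toDF("a")

spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
// collect() maps the internal microsecond value back to a java.sql.Timestamp;
// the epoch millis are unchanged even though show() renders a shifted string.
assert(df.collect().head.getTimestamp(0).getTime == timestamp.getTime)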


Contributor: The column name changes with the timezone, but what about the value? Can you also check the result?

Member Author: Ah, sure.

scala> val timestamp = java.sql.Timestamp.valueOf("2012-12-31 16:00:10.011")
timestamp: java.sql.Timestamp = 2012-12-31 16:00:10.011

scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

scala> Seq(timestamp).toDF("a").groupBy("a").pivot("a").count().show(false)
+-----------------------+-----------------------+
|a                      |2012-12-30 23:00:10.011|
+-----------------------+-----------------------+
|2012-12-30 23:00:10.011|1                      |
+-----------------------+-----------------------+

Member Author: With the default timezone ...

scala> val timestamp = java.sql.Timestamp.valueOf("2012-12-31 16:00:10.011")
timestamp: java.sql.Timestamp = 2012-12-31 16:00:10.011

scala> Seq(timestamp).toDF("a").groupBy("a").pivot("a").count().show(false)
+-----------------------+-----------------------+
|a                      |2012-12-31 16:00:10.011|
+-----------------------+-----------------------+
|2012-12-31 16:00:10.011|1                      |
+-----------------------+-----------------------+

Member Author: A few more tests with string cast ...

scala> val timestamp = java.sql.Timestamp.valueOf("2012-12-31 16:00:10.011")
timestamp: java.sql.Timestamp = 2012-12-31 16:00:10.011

scala> Seq(timestamp).toDF("a").groupBy("a").pivot("a").count().selectExpr("cast(a as string)", "`2012-12-31 16:00:10.011`").show(false)
+-----------------------+-----------------------+
|a                      |2012-12-31 16:00:10.011|
+-----------------------+-----------------------+
|2012-12-31 16:00:10.011|1                      |
+-----------------------+-----------------------+

scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

scala> val timestamp = java.sql.Timestamp.valueOf("2012-12-31 16:00:10.011")
timestamp: java.sql.Timestamp = 2012-12-31 16:00:10.011

scala> Seq(timestamp).toDF("a").groupBy("a").pivot("a").count().selectExpr("cast(a as string)", "`2012-12-30 23:00:10.011`").show(false)
+-----------------------+-----------------------+
|a                      |2012-12-30 23:00:10.011|
+-----------------------+-----------------------+
|2012-12-30 23:00:10.011|1                      |
+-----------------------+-----------------------+

+      assert(df.schema == expected)

Contributor: Can we add a checkAnswer to make sure the value is also tsWithZone?

Member Author: Sure.

+      // String representation of timestamp with timezone should take the time difference
+      // into account.
+      checkAnswer(df.select($"a".cast(StringType)), Row(tsWithZone))
+    }
+  }
 }