[SPARK-20018][SQL] Pivot with timestamp and count should not print internal representation #17348
Changes from all commits: 3c619df, 93f05f3, 4e4cfa7, 803a094
sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala

```diff
@@ -23,7 +23,7 @@ import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.sql.types._

-class DataFramePivotSuite extends QueryTest with SharedSQLContext{
+class DataFramePivotSuite extends QueryTest with SharedSQLContext {
   import testImplicits._

   test("pivot courses") {

@@ -230,4 +230,20 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext{
       .groupBy($"a").pivot("a").agg(min($"b")),
       Row(null, Seq(null, 7), null) :: Row(1, null, Seq(1, 7)) :: Nil)
   }

+  test("pivot with timestamp and count should not print internal representation") {
+    val ts = "2012-12-31 16:00:10.011"
+    val tsWithZone = "2013-01-01 00:00:10.011"
+
+    withSQLConf(SQLConf.SESSION_LOCAL_TIMEZONE.key -> "GMT") {
+      val df = Seq(java.sql.Timestamp.valueOf(ts)).toDF("a").groupBy("a").pivot("a").count()
+      val expected = StructType(
+        StructField("a", TimestampType) ::
+        StructField(tsWithZone, LongType) :: Nil)
```
Contributor: Is this expected? Users will see different values now.
Member (Author): Yeah, I was confused by it too, because the original values are apparently rendered differently. However, it seems intended.

```scala
scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

scala> val timestamp = java.sql.Timestamp.valueOf("2012-12-31 16:00:10.011")
timestamp: java.sql.Timestamp = 2012-12-31 16:00:10.011

scala> Seq(timestamp).toDF("a").show()
+--------------------+
|                   a|
+--------------------+
|2012-12-30 23:00:...|
+--------------------+
```

The internal values stay as they are; it seems only the human-readable format changes according to the given timezone. I guess this is as described in #16308.
Contributor: The column name changes with the timezone, but what about the value? Can you also check the result?
Member (Author): Ah, sure.

```scala
scala> val timestamp = java.sql.Timestamp.valueOf("2012-12-31 16:00:10.011")
timestamp: java.sql.Timestamp = 2012-12-31 16:00:10.011

scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

scala> Seq(timestamp).toDF("a").groupBy("a").pivot("a").count().show(false)
+-----------------------+-----------------------+
|a                      |2012-12-30 23:00:10.011|
+-----------------------+-----------------------+
|2012-12-30 23:00:10.011|1                      |
+-----------------------+-----------------------+
```
Member (Author): With the default timezone ...

```scala
scala> val timestamp = java.sql.Timestamp.valueOf("2012-12-31 16:00:10.011")
timestamp: java.sql.Timestamp = 2012-12-31 16:00:10.011

scala> Seq(timestamp).toDF("a").groupBy("a").pivot("a").count().show(false)
+-----------------------+-----------------------+
|a                      |2012-12-31 16:00:10.011|
+-----------------------+-----------------------+
|2012-12-31 16:00:10.011|1                      |
+-----------------------+-----------------------+
```
Member (Author): A few more tests with a string cast ...

```scala
scala> val timestamp = java.sql.Timestamp.valueOf("2012-12-31 16:00:10.011")
timestamp: java.sql.Timestamp = 2012-12-31 16:00:10.011

scala> Seq(timestamp).toDF("a").groupBy("a").pivot("a").count().selectExpr("cast(a as string)", "`2012-12-31 16:00:10.011`").show(false)
+-----------------------+-----------------------+
|a                      |2012-12-31 16:00:10.011|
+-----------------------+-----------------------+
|2012-12-31 16:00:10.011|1                      |
+-----------------------+-----------------------+

scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

scala> val timestamp = java.sql.Timestamp.valueOf("2012-12-31 16:00:10.011")
timestamp: java.sql.Timestamp = 2012-12-31 16:00:10.011

scala> Seq(timestamp).toDF("a").groupBy("a").pivot("a").count().selectExpr("cast(a as string)", "`2012-12-30 23:00:10.011`").show(false)
+-----------------------+-----------------------+
|a                      |2012-12-30 23:00:10.011|
+-----------------------+-----------------------+
|2012-12-30 23:00:10.011|1                      |
+-----------------------+-----------------------+
```
```diff
+      assert(df.schema == expected)
```
Contributor: Can we add a comment?
Member (Author): Sure.
```diff
+
+      // String representation of timestamp with timezone should take the time difference
+      // into account.
+      checkAnswer(df.select($"a".cast(StringType)), Row(tsWithZone))
+    }
+  }
+}
```
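For context on what "internal representation" means here: Spark SQL stores `TimestampType` values internally as a Long of microseconds since the epoch, and the bug was that a pivoted timestamp column could surface that raw Long as the column name. The sketch below is an illustrative Java approximation of the intended behaviour, not Spark's actual code; the `columnName` helper is made up for this example.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class PivotColumnName {
    // Convert Spark's internal microseconds-since-epoch value into a
    // session-timezone-aware string, instead of printing the raw Long.
    static String columnName(long micros, ZoneId sessionTz) {
        Instant instant = Instant.ofEpochSecond(
            Math.floorDiv(micros, 1_000_000L),
            Math.floorMod(micros, 1_000_000L) * 1_000L);
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")
            .withZone(sessionTz)
            .format(instant);
    }

    public static void main(String[] args) {
        long micros = 1356969610011000L; // 2012-12-31 16:00:10.011 GMT

        // Before the fix: the internal representation leaked into the name.
        System.out.println(micros);

        // After the fix: rendered through the session timezone.
        System.out.println(columnName(micros, ZoneId.of("GMT")));
        // 2012-12-31 16:00:10.011
        System.out.println(columnName(micros, ZoneId.of("America/Los_Angeles")));
        // 2012-12-31 08:00:10.011
    }
}
```

This also explains why the generated column name shifts with `spark.sql.session.timeZone`: the name is derived from the timezone-aware string cast of a fixed epoch value.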
It seems we can cast into StringType in all the ways - see spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala, line 41 in e9e2c61.
BTW, is this a correct way for handling timezone, @ueshin?
Yes, it looks good.
Thank you for your confirmation.