[SPARK-20639][SQL] Add single argument support for to_timestamp in SQL with documentation improvement #17901
@@ -1752,15 +1752,15 @@ setMethod("toRadians",
 #' to_date
 #'
-#' Converts the column into a DateType. You may optionally specify a format
+#' Converts the column into a date column. You may optionally specify a format
 #' according to the rules in:
 #' \url{http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html}.
 #' If the string cannot be parsed according to the specified format (or default),
 #' the value of the column will be null.
-#' The default format is 'yyyy-MM-dd'.
+#' By default, it follows casting rules to a date if the format is omitted.
 #'
 #' @param x Column to parse.
-#' @param format string to use to parse x Column to DateType. (optional)
+#' @param format string to use to parse x column to a date column. (optional)
 #'
 #' @rdname to_date
 #' @name to_date
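The behavior the roxygen text above describes (parse with an optional format; the value becomes null when the string cannot be parsed) can be sketched in plain Python. This is only an illustration, not Spark's implementation: `strptime` stands in for Java's `SimpleDateFormat`, `None` stands in for SQL null, and the helper name is invented.

```python
from datetime import datetime

# Hypothetical sketch of to_date's parsing semantics. The Java pattern
# 'yyyy-MM-dd' corresponds roughly to '%Y-%m-%d' in strptime terms.
def to_date_sketch(s, fmt="%Y-%m-%d"):
    try:
        return datetime.strptime(s, fmt).date()
    except (ValueError, TypeError):
        return None  # the column value becomes null on a parse failure

print(to_date_sketch("2016-12-31"))   # parses to a date
print(to_date_sketch("not-a-date"))   # unparseable input yields None (null)
```

The same null-on-failure rule applies whether the format is the default or user-supplied.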
@@ -1827,15 +1827,15 @@ setMethod("to_json", signature(x = "Column"),
 #' to_timestamp
 #'
-#' Converts the column into a TimestampType. You may optionally specify a format
+#' Converts the column into a timestamp column. You may optionally specify a format
 #' according to the rules in:
 #' \url{http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html}.
 #' If the string cannot be parsed according to the specified format (or default),
 #' the value of the column will be null.
 #' The default format is 'yyyy-MM-dd HH:mm:ss'.
 #'
 #' @param x Column to parse.
-#' @param format string to use to parse x Column to DateType. (optional)
+#' @param format string to use to parse x column to a timestamp column. (optional)
 #'
 #' @rdname to_timestamp
 #' @name to_timestamp
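The default pattern mentioned above, 'yyyy-MM-dd HH:mm:ss', maps roughly onto `strptime`'s '%Y-%m-%d %H:%M:%S'. A minimal stdlib sketch of that default (the helper name is invented; `strptime` is only a stand-in for `SimpleDateFormat`):

```python
from datetime import datetime

# Java 'yyyy-MM-dd HH:mm:ss' is roughly '%Y-%m-%d %H:%M:%S' in strptime terms.
DEFAULT_TS_FMT = "%Y-%m-%d %H:%M:%S"

def to_timestamp_sketch(s, fmt=DEFAULT_TS_FMT):
    try:
        return datetime.strptime(s, fmt)
    except ValueError:
        return None  # null for unparseable input

# The sample value comes from the doctest in this PR's Python diff.
print(to_timestamp_sketch("1997-02-28 10:30:00"))
```

Note that under this fixed default, a date-only string such as '2016-12-31' would not parse, which is what the review discussion below is about.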
@@ -144,12 +144,6 @@ def _():
     'measured in radians.',
 }

-_functions_2_2 = {
-    'to_date': 'Converts a string date into a DateType using the (optionally) specified format.',
-    'to_timestamp': 'Converts a string timestamp into a timestamp type using the ' +
-                    '(optionally) specified format.',
-}
-
 # math functions that take two arguments as input
 _binary_mathfunctions = {
     'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' +
@@ -987,9 +981,10 @@ def months_between(date1, date2):
 def to_date(col, format=None):
     """Converts a :class:`Column` of :class:`pyspark.sql.types.StringType` or
     :class:`pyspark.sql.types.TimestampType` into :class:`pyspark.sql.types.DateType`
-    using the optionally specified format. Default format is 'yyyy-MM-dd'.
-    Specify formats according to
+    using the optionally specified format. Specify formats according to
     `SimpleDateFormats <http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html>`_.
+    By default, it follows casting rules to :class:`pyspark.sql.types.DateType` if the format
+    is omitted.

     >>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
     >>> df.select(to_date(df.t).alias('date')).collect()

Review comment (Member): ditto, not sure if it's clear to a Python user.
@@ -1148,13 +1148,6 @@ case class ToUTCTimestamp(left: Expression, right: Expression)
 /**
  * Returns the date part of a timestamp or string.
  */
-@ExpressionDescription(
-  usage = "_FUNC_(expr) - Extracts the date part of the date or timestamp expression `expr`.",
-  extended = """
-    Examples:
-      > SELECT _FUNC_('2009-07-30 04:17:52');
-       2009-07-30
-  """)
 case class ToDate(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {

   // Implicit casting of spark will accept string in both date and timestamp format, as
@@ -1175,15 +1168,19 @@ case class ToDate(child: Expression) extends UnaryExpression with ImplicitCastIn
 /**
  * Parses a column to a date based on the given format.
  */
+// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(date_str, fmt) - Parses the `left` expression with the `fmt` expression. Returns null with invalid input.",
+  usage = """
+    _FUNC_(date_str[, fmt]) - Parses the `date_str` expression with the `fmt` expression to
+      a date. Returns null with invalid input. By default, it follows casting rules to a date if
+      the `fmt` is omitted.
+  """,
   extended = """
     Examples:
+      > SELECT _FUNC_('2009-07-30 04:17:52');
+       2009-07-30
       > SELECT _FUNC_('2016-12-31', 'yyyy-MM-dd');
        2016-12-31
   """)
+// scalastyle:on line.size.limit
 case class ParseToDate(left: Expression, format: Option[Expression], child: Expression)
   extends RuntimeReplaceable {
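ParseToDate above is a RuntimeReplaceable expression: conceptually it rewrites itself to a cast when `fmt` is omitted, or to a format-based parse otherwise. A rough sketch of that dispatch in plain Python (the function name is invented, `strptime` stands in for `SimpleDateFormat`, and the no-format branch is only an assumed approximation of the cast rule, not Spark's actual DateTimeUtils logic):

```python
from datetime import datetime

def parse_to_date(s, fmt=None):
    if fmt is None:
        # Assumed sketch of the cast-to-date fallback: take the leading
        # 'yyyy-MM-dd' portion of the string, as a date or timestamp string
        # both begin with it.
        try:
            return datetime.strptime(s[:10], "%Y-%m-%d").date()
        except ValueError:
            return None
    # With an explicit format, parse strictly against it.
    try:
        return datetime.strptime(s, fmt).date()
    except ValueError:
        return None

# Mirrors the two SQL examples in the diff above.
print(parse_to_date("2009-07-30 04:17:52"))        # single-argument form
print(parse_to_date("2016-12-31", "%Y-%m-%d"))     # explicit-format form
```

Either branch returns `None` (null) on invalid input, matching the documented behavior.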
@@ -1212,22 +1209,27 @@ case class ParseToDate(left: Expression, format: Option[Expression], child: Expr
 /**
  * Parses a column to a timestamp based on the supplied format.
  */
+// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(timestamp, fmt) - Parses the `left` expression with the `format` expression to a timestamp. Returns null with invalid input.",
+  usage = """
+    _FUNC_(timestamp[, fmt]) - Parses the `timestamp` expression with the `format` expression to
+      a timestamp. Returns null with invalid input. Default `fmt` is 'yyyy-MM-dd HH:mm:ss'.
+  """,
Review thread on the line `new ParseToTimestamp(s.expr, Literal("yyyy-MM-dd HH:mm:ss"))`:

Do you have any suggestion that I could try?

Postgres has a single-argument to_timestamp function, but that is used to convert a Unix epoch to a timestamp.
Also, it seems we documented them here and there:

Line 1835 in f21897f:

    #' The default format is 'yyyy-MM-dd HH:mm:ss'.

spark/python/pyspark/sql/functions.py, Line 1014 in 63d90e7:

    using the optionally specified format. Default format is 'yyyy-MM-dd HH:mm:ss'. Specify

If the suggestion can simply be done with a SimpleDateFormat format, I am willing to do this, but at a quick look it seems it cannot. Do you mind if we try this later in a separate PR?
The problem is the default values we choose. I am not sure whether we should simply choose ISO as the default value.
I do understand your concern, but I am not introducing the default value. It is already there in the corresponding APIs.
Shall we follow to_date and use the casting rules if the format is not specified?
Sure, let me give it a shot.
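The trade-off being discussed can be sketched outside Spark. In this plain-Python illustration (the names and the cast-like format list are assumptions for demonstration, not Spark's actual code), a date-only string fails under a fixed 'yyyy-MM-dd HH:mm:ss' default but succeeds under cast-style rules that also accept a date-only form:

```python
from datetime import datetime

FIXED_DEFAULT = "%Y-%m-%d %H:%M:%S"              # 'yyyy-MM-dd HH:mm:ss'
CAST_LIKE_FORMATS = [FIXED_DEFAULT, "%Y-%m-%d"]  # assumed cast rule: also try date-only

def parse_fixed(s):
    # Strategy 1: always use the fixed default format.
    try:
        return datetime.strptime(s, FIXED_DEFAULT)
    except ValueError:
        return None

def parse_cast_like(s):
    # Strategy 2: fall through a list of accepted shapes, like a cast would.
    for fmt in CAST_LIKE_FORMATS:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            pass
    return None

print(parse_fixed("2016-12-31"))      # None: date-only input fails the fixed default
print(parse_cast_like("2016-12-31"))  # parses under the cast-like fallback
```

This is why following to_date's casting-rule behavior makes the single-argument form more forgiving than a hard-coded default pattern.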
It looks like the default format is ... (spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala, Lines 430 to 435 in 2269155), which looks to be used in the casting rule to a date type.
Is there more info we could provide for R users, who might not know where to look for these "casting rules to a date"?
Actually, I am not sure if I should write out all the contents above ... the formats above actually look a bit informal to me for use in documentation (does anyone know if I understood this correctly?). Do you have any good idea for a better description, maybe? Let me leave another comment while addressing the comments if I come up with a better idea.
Ah, let me give it a shot by adding an example: `cast(df$x, "date")`.