[SPARK-26902][SQL] Support java.time.Instant as an external type of TimestampType #23811
Conversation
Test build #102420 has finished for PR 23811 at commit
```scala
  .booleanConf
  .createWithDefault(true)

val TIMESTAMP_EXTERNAL_TYPE = buildConf("spark.sql.catalyst.timestampType")
```
We can support reading from both types at the same time right?
I don't know if it's worth changing what it is written to; not worth a flag IMHO.
> We can support reading from both types at the same time right?
On the Spark side, we can read both.
> I don't know if it's worth changing what it is written to; not worth a flag IMHO.

Timestamps can be loaded from a datasource, cast from other types, etc. If a user wants to import (collect) non-legacy timestamps (I mean `java.time.Instant`), how can she/he do that without the flag?
Import is fine; we could potentially read both types to TimestampType. Can we just be opinionated about the right way to write it back out, and keep the current behavior? It may be 'legacy', but I'm not sure it's worth the behavior change. You may have more context on why that's important, though.

As with many things, I just don't know how realistically people will understand the issue, find the flag, set it, and maintain it across deployments.
> it may be 'legacy' but not sure it's worth the behavior change.
The SQL config `spark.sql.catalyst.timestampType` has the default value `Timestamp`, which preserves the current behavior. When a user wants to get `java.time.Instant` values from Spark, she/he can change the config to point to the Java 8 timestamp class.
Just curious: is it ok to convert
Actually, we are not exposing
I still have a preference for keeping it simple and returning one type, being opinionated. That would probably argue for the newer type. However, I can imagine that could break a lot of code, even though this is only a major version upgrade. Right? Or would most user code see the same methods exposed on Instant and Timestamp and not care much? It's a case where I do understand having a flag. I'd even be OK with defaulting to Instant with this as a safety valve, to push people to better timestamp implementations. The Java 8 class has been out for years.
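On the "would most user code see the same methods" question: the two classes share almost no API surface, so a switch is visible to any code that calls methods on the collected values. A minimal plain-JDK sketch (no Spark involved) of the method-level differences:

```java
import java.sql.Timestamp;
import java.time.Instant;

public class TimestampVsInstant {
    public static void main(String[] args) {
        // Legacy class: mutable, extends java.util.Date, millisecond-oriented API.
        Timestamp legacy = Timestamp.from(Instant.parse("2019-02-18T10:30:00Z"));

        // java.time class: immutable, nanosecond precision, epoch-based API.
        Instant modern = legacy.toInstant();

        // The same information is reachable, but through differently named
        // methods (getTime() vs toEpochMilli(), getNanos() vs getNano()),
        // so user code that touches the values directly needs changes.
        System.out.println(legacy.getTime() == modern.toEpochMilli()); // true
        System.out.println(legacy.getNanos() == modern.getNano());     // true
    }
}
```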
This is what I propose in this PR - return

If we stick on ... but if we are brave enough and make

I would prefer this way too, but I am just afraid of breaking users' apps even in a major version - 3.0. @cloud-fan Could you look at the PR as well, please?
The |
BTW, do we have end-to-end tests for this feature? I'd like to see
Not yet. I just wanted to be sure this will be accepted in general before investing time in this.
As long as it's protected by a config, I think we are fine. We can have more discussion about whether or not to make it the default in a followup PR.
Ok. I'll continue and support
Added such tests - the UDF test contains collect as well.
Test build #102797 has finished for PR 23811 at commit
Can you rebase your branch? We need to make changes in
otherwise LGTM
Test build #102816 has finished for PR 23811 at commit
thanks, merging to master!
@cloud-fan Thanks. I will do similar changes for
@MaxGekk FYI: I'm proposing the same refactoring for
@HeartSaVioR can you add it in #23908? Thanks!
Ah, OK, it was a missing spot. Not sure from my side. I'll add it. Thanks!

Probably I missed it because the code is not covered by any of my tests. Just for the future, when I add a new type for
It doesn't look like an exact match of ScalaReflectionSuite for JavaTypeInference (better to have one, I guess).
## What changes were proposed in this pull request?

In the PR, I propose to extend `Literal.apply` to support constructing literals of `TimestampType` and `DateType` from `java.time.Instant` and `java.time.LocalDate`. The Java classes have already been supported as external types for `TimestampType` and `DateType` by the PRs #23811 and #23913.

## How was this patch tested?

Added new tests to `LiteralExpressionSuite`.

Closes #24161 from MaxGekk/literal-instant-localdate.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
How useful is this change? Wouldn't it break a lot of user code that uses the Timestamp type when they upgrade to 3.0? It seems like we wouldn't ever be able to remove the config flag.
Please take a look at the motivation points in the PR description.

No, it will not break, because Spark still returns

The flag has been removed already, and replaced by the config defined in `sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala`, lines 2033 to 2039 (at commit ed44926).
But if it is off by default, it means almost nobody will be using the new type, right? My point is that it's not great to have a feature that's almost never on and mostly just dead code. Is there a plan to transition over to the new type? Wouldn't that plan involve breaking a lot of user code?
From my point of view, this is a debatable statement. Java 8 is 6 years old already. I would guess that a significant amount of modern apps, including Spark apps, is written on top of the Java 8 time API. I do think users will look for how to parallelize Java 8 time-related values to Spark. Maybe we should highlight more clearly in the Spark SQL docs how to do that by using the flag.

Actually the

Spark 3.0 is going to introduce the Java 8 time classes so far. In this release, we could keep the old returned types by default. I would switch on the Java 8 time API by default in the next release - Spark 3.1 or 3.2. On the other hand, a major release is a good time for switching, since the Java 8 time API is mature enough.

I don't have statistics on hand. Switching on the Java 8 time API by default will definitely break: 1. UDFs, and 2. apps that collect results from Spark. In the first case, we could try to detect the input types of UDFs, and maybe avoid failures by passing legacy types. But in the case of collecting datasets from Spark, it depends on the users' apps. In any case, the code can be easily fixed by converting, for instance
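To illustrate the "easily fixed by converting" point: the JDK has shipped direct bridges between the two classes since Java 8, so an affected call site needs only a one-line change. A minimal sketch (plain JDK, nothing Spark-specific):

```java
import java.sql.Timestamp;
import java.time.Instant;

public class LegacyBridge {
    public static void main(String[] args) {
        // Suppose collecting a dataset starts returning Instant instead of Timestamp.
        Instant collected = Instant.parse("2019-02-18T10:30:00.123456Z");

        // One call restores the legacy type for code that still needs it...
        Timestamp legacy = Timestamp.from(collected);

        // ...and the round trip is lossless: Timestamp stores nanoseconds too.
        System.out.println(legacy.toInstant().equals(collected)); // true
    }
}
```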
|
As a somewhat heavy user of Datasets, I'm actually +1 on letting users get nicer implementations of Timestamp out of Spark. The flag is not the most elegant way to expose this, but in some cases I think it's our only choice (i.e. when Spark is returning

Apologies if I misread the PR, but couldn't we seamlessly support both types in many cases without the flag (i.e. when reflection can tell us which type the user is expecting)?

That said, I'm -1 on making this the default anytime soon. It sounds like it would break a lot of programs.
For
What changes were proposed in this pull request?

In the PR, I propose to add a new Catalyst type converter for `TimestampType`. It should be able to convert `java.time.Instant` to/from `TimestampType`.

Main motivations for the changes:

- `java.sql.Timestamp` (hybrid calendar - Julian + Gregorian).

By default, Spark converts values of `TimestampType` to `java.sql.Timestamp` instances, but the SQL config `spark.sql.catalyst.timestampType` can change the behavior. It accepts two values: `Timestamp` (default) and `Instant`. If the latter one is set, Spark returns `java.time.Instant` instances for timestamp values.

How was this patch tested?

Added new tests to `CatalystTypeConvertersSuite` to check conversion of `TimestampType` to/from `java.time.Instant`.
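Since Catalyst stores `TimestampType` values as microseconds since the epoch, the heart of such a converter is micros-to-`Instant` arithmetic. The helpers below are a hypothetical plain-JDK sketch of that arithmetic, not Spark's actual converter code:

```java
import java.time.Instant;

public class InstantMicros {
    // Catalyst stores TimestampType as microseconds since 1970-01-01T00:00:00Z.
    // Hypothetical helper names; sub-microsecond nanos are truncated on the way in.
    static long instantToMicros(Instant instant) {
        // Instant decomposes into epoch seconds plus a non-negative nano-of-second,
        // so this is correct for pre-epoch (negative) instants as well.
        return Math.addExact(
            Math.multiplyExact(instant.getEpochSecond(), 1_000_000L),
            instant.getNano() / 1_000);
    }

    static Instant microsToInstant(long micros) {
        // floorDiv/floorMod keep negative micros (pre-epoch timestamps) correct.
        long secs = Math.floorDiv(micros, 1_000_000L);
        long nanos = Math.floorMod(micros, 1_000_000L) * 1_000L;
        return Instant.ofEpochSecond(secs, nanos);
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2019-02-18T10:30:00.000123Z");
        long micros = instantToMicros(ts);
        // Round trip is exact for microsecond-precision values.
        System.out.println(microsToInstant(micros).equals(ts)); // true
    }
}
```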