[SPARK-27199][SQL][FOLLOWUP] Fix bug in codegen templates in UnixTime and FromUnixTime #24352
Conversation
|
Also cc the reviewers of the original PR @HyukjinKwon @srowen @kiszk |
|
We have to be really careful when touching code that has a wide impact. If you are not an expert in that area, please be more careful when you merge it. Thanks! |
|
@gatorsmile would you like to more actively review these changes? Please do if you're concerned, or suggest reviewers who you want to look at them. I certainly didn't catch this, because it wasn't covered by tests; isn't that really the issue? |
|
In general, I have to say the test coverage is not good. After we reduce the total test time, I plan to suggest porting more end-to-end tests from other open source SQL engines. When reviewing changes, we need to encourage the community to add more tests, and reviewers also need to spend more time checking that all the code paths are covered by tests; otherwise, the code is easily broken by future changes. I am not sure how the other committers review code. For me, code review is very time consuming. We normally download the code, play with the changes, and run them in our local environment. I will try to allocate more time to code review in the future. |
|
Yep, +1 to more tests, but as you say, it's a question of effort. I think that if we implement the standard you're suggesting, we'd merge very little, and that has its own costs. We're already not able to review even most of the open PRs. On the meta-issue here: I don't think anybody disagrees with "be careful", but it's always a judgment call: how likely is this to cause a problem, and of what size, compared to the benefit of not doing it? Too eager and we'll introduce bugs; too conservative and we'll miss important fixes and changes. We can all only make our best-effort guess. You are an important voice for being conservative; I'm only saying it's not as simple as others being careless. Anyway, yeah, let's get this fix in of course. It's great that there are downstream tests making additional checks, at least, after the fact. |
|
Just a data point FYI: I caught this bug through visual scanning of the original PR. There were 3 occurrences of similar-looking code but one stood out being different from the other two. Upon further inspection of the other two, it was obvious that they had a bug that was fixed in the first one. Similar-looking but different code could be a trait worth checking in future code reviews. |
|
Another idea that just popped into mind is: perhaps we can enhance |
|
Test build #104528 has finished for PR 24352 at commit
|
|
thanks, merging to master! |
|
Late for reviewing this, good catch! |
HyukjinKwon
left a comment
Late LGTM.
What changes were proposed in this pull request?
SPARK-27199 introduced the use of ZoneId instead of TimeZone in a few date/time expressions.
There were 3 occurrences of ctx.addReferenceObj("zoneId", zoneId) in that PR, which had a bug: while the java.time.ZoneId base type is public, the actual concrete implementation classes are not public, so using the 2-arg version of CodegenContext.addReferenceObj would incorrectly generate code that references non-public types (java.time.ZoneRegion, to be specific). The 3-arg version should be used, with the class name of the referenced object explicitly specified as the public base type.
One of these occurrences was caught in testing in the main PR of SPARK-27199 (#24141), for DateFormatClass. But the other 2 occurrences slipped through because there were no test cases that covered them.
Example of this bug in the current Apache Spark master, in a Spark Shell:
This PR fixes the codegen issues and adds the corresponding unit tests.
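To make the visibility issue concrete, here is a minimal standalone Java sketch. It is not Spark code: the castExpr helper is a hypothetical stand-in that only imitates how a codegen helper might pick the class name for a cast, falling back to the runtime class when no name is given (the "2-arg" behavior) and using the supplied name otherwise (the "3-arg" behavior). The class and method names are assumptions made for illustration.

```java
import java.lang.reflect.Modifier;
import java.time.ZoneId;

// Illustration only; none of these names come from Spark.
final class ZoneIdVisibilityDemo {

    // Hypothetical stand-in for a codegen helper: builds the cast expression
    // that generated Java code would use to read a referenced object.
    // A null className falls back to the runtime class of obj.
    static String castExpr(int idx, Object obj, String className) {
        String cls = (className != null) ? className : obj.getClass().getName();
        return "((" + cls + ") references[" + idx + "])";
    }

    // True if the class itself is declared public.
    static boolean isPublicClass(Class<?> c) {
        return Modifier.isPublic(c.getModifiers());
    }

    public static void main(String[] args) {
        ZoneId zoneId = ZoneId.of("America/Los_Angeles");

        // The runtime class of a region-based zone is package-private.
        System.out.println(zoneId.getClass().getName());      // java.time.ZoneRegion
        System.out.println(isPublicClass(zoneId.getClass())); // false
        System.out.println(isPublicClass(ZoneId.class));      // true

        // "2-arg" style: the cast names a non-public type, so generated code
        // outside java.time would fail to compile.
        System.out.println(castExpr(0, zoneId, null));
        // ((java.time.ZoneRegion) references[0])

        // "3-arg" style: the cast names the public base type.
        System.out.println(castExpr(0, zoneId, ZoneId.class.getName()));
        // ((java.time.ZoneId) references[0])
    }
}
```

The sketch only shows why the inferred class name is a problem. In the actual fix, the class name passed to CodegenContext.addReferenceObj is the public base type, as described above.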
How was this patch tested?
Enhanced tests in DateExpressionsSuite for to_unix_timestamp and from_unixtime.