Skip to content

Conversation

@rednaxelafx
Copy link
Contributor

@rednaxelafx rednaxelafx commented Apr 11, 2019

What changes were proposed in this pull request?

SPARK-27199 introduced the use of ZoneId instead of TimeZone in a few date/time expressions.
There were 3 occurrences of ctx.addReferenceObj("zoneId", zoneId) in that PR, which had a bug because while the java.time.ZoneId base type is public, the actual concrete implementation classes are not public, so using the 2-arg version of CodegenContext.addReferenceObj would incorrectly generate code that reference non-public types (java.time.ZoneRegion, to be specific). The 3-arg version should be used, with the class name of the referenced object explicitly specified to the public base type.

One of such occurrences was caught in testing in the main PR of SPARK-27199 (#24141), for DateFormatClass. But the other 2 occurrences slipped through because there were no test cases that covered them.

Example of this bug in the current Apache Spark master, in a Spark Shell:

scala> Seq(("2016-04-08", "yyyy-MM-dd")).toDF("s", "f").repartition(1).selectExpr("to_unix_timestamp(s, f)").show
...
java.lang.IllegalAccessError: tried to access class java.time.ZoneRegion from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1

This PR fixes the codegen issues and adds the corresponding unit tests.

How was this patch tested?

Enhanced tests in DateExpressionsSuite for to_unix_timestamp and from_unixtime.

@rednaxelafx
Copy link
Contributor Author

cc @cloud-fan @MaxGekk

@gatorsmile
Copy link
Member

Also cc the reviewers of the original PR @HyukjinKwon @srowen @kiszk

@gatorsmile
Copy link
Member

gatorsmile commented Apr 11, 2019

We have to be really careful when touching the code that has a wide impact. If you are not the expert in that area, please be more careful when you merge it. Thanks!

@srowen
Copy link
Member

srowen commented Apr 11, 2019

@gatorsmile would you like to more actively review these changes? Please do if you're concerned, or suggest reviewers who you want to look at them. I certainly didn't catch this, because it wasn't covered by tests; isn't that really the issue?

@gatorsmile
Copy link
Member

In general, I have to say the test coverage is not good. After we reduce the total test time, I plan to suggest porting more end-to-end tests from the other open source SQL engine. When reviewing the changes, we need to encourage the community to add more tests and the reviewers also need to spend more time to check all the code paths are covered by the tests; otherwise, it is easy to be broken by the future code changes.

I am not sure how the other committers are reviewing the code. For me, it is very time consuming in the code review. We normally downloaded the code, play with the changes and run them in our local environment. I will try to allocate more time in the code review in the future.

@srowen
Copy link
Member

srowen commented Apr 11, 2019

Yep, +1 to more tests, but as you say, it's a question of effort. I think that if we implement the standard you're suggesting, we'd merge very little, and that has its own costs. We're already not able to review even most of the open PRs.

On the meta-issue here: I don't think anybody disagrees with "be careful" but it's always a judgment call: how likely is this to cause a problem of what size? compared to the benefit of not doing it? too eager and we'll introduce bugs, too conservative and we'll miss important fixes and changes. We can all only make our best-effort guess. You are an important voice for being conservative; I'm only saying it's not as simple as others being uncareful.

Anyway yeah let's get this fix in of course. It's great that there are downstream tests making additional checks, at least, after the fact.

@rednaxelafx
Copy link
Contributor Author

Just a data point FYI: I caught this bug through visual scanning of the original PR. There were 3 occurrences of similar-looking code but one stood out being different from the other two. Upon further inspection of the other two, it was obvious that they had a bug that was fixed in the first one.

Similar-looking but different code could be a trait worth checking in future code reviews.

@rednaxelafx
Copy link
Contributor Author

Another idea that just popped into mind is: perhaps we can enhance checkEvaluation to automatically test two versions of the input expression, one that runs the input expression verbatim, the other substitutes Literal with NonFoldableLiteral to force runtime evaluation. It's not enough to test all combinations/variations, but would force exercise a lot more paths in codegen then the current test infrastructure -- we test the foldable paths in UTs much more than the non-foldable paths, because of the common pattern of using Literal as inputs.

@SparkQA
Copy link

SparkQA commented Apr 12, 2019

Test build #104528 has finished for PR 24352 at commit 8005c0e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in bbbe54a Apr 12, 2019
@kiszk
Copy link
Member

kiszk commented Apr 14, 2019

Late for reviewing this, good catch!
Interesting discussion on how we improve test coverage in codegen.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Late LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants