[SPARK-44064][CORE][SQL] Add a new `apply` function to `NonFateSharingCache` #41654

LuciferYang · 2023-06-19T06:03:13Z

What changes were proposed in this pull request?

This PR adds a new apply function to NonFateSharingCache and changes CodeGenerator to use the new NonFateSharingCache#apply, so that the catalyst module can be tested with Maven.

Why are the changes needed?

SPARK-43300 introduced NonFateSharingCache in the core module, but it is only used by CodeGenerator, which is in the catalyst module.

There are two apply functions in NonFateSharingCache, and their input parameter types are com.google.common.cache.Cache and com.google.common.cache.LoadingCache.

When we test with Maven, the catalyst module may use the shaded spark-core jar. In that jar, the Guava classes are relocated from com.google.common to org.sparkproject.guava, so the input parameter of NonFateSharingCache#apply becomes org.sparkproject.guava.cache.Cache/LoadingCache. The catalyst module itself has not been shaded yet at that point, so CodeGenerator still calls NonFateSharingCache#apply with the type com.google.common.cache.Cache. This mismatch of input types causes the Maven test run to abort as follows:

ProductAggSuite:
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$
  at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.variable(javaCode.scala:64)
  at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.isNullVariable(javaCode.scala:77)
  at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.$anonfun$create$1(GenerateSafeProjection.scala:156)
  at scala.collection.immutable.List.map(List.scala:293)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:153)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1369)

As a workaround, this PR adds a new apply function so that non-core modules no longer pass Guava Cache types directly as input parameters to NonFateSharingCache#apply. This avoids the Maven test failures in non-core modules caused by using the shaded core module.
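The idea behind the workaround can be sketched with a small stand-in. This is not the code from the PR: `SketchCache`, its `LoadingCache` trait, and the `loadingFunc` / `maximumSize` parameter names are hypothetical, and a bounded `java.util.LinkedHashMap` replaces Guava so the example has no third-party dependency. The point it illustrates is that the factory signature mentions only standard-library types, so a caller compiled against it never names the backing cache class and is unaffected if that class is relocated by shading.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical stand-in for a core-side cache factory (not Spark's actual class).
object SketchCache {
  // Callers only ever see this trait; the backing store stays an implementation detail.
  trait LoadingCache[K, V] {
    def get(key: K): V
    def size: Int
  }

  // Function-based factory: the signature uses only a Scala function and a Long,
  // so no third-party cache type appears in the caller's compiled code.
  // maximumSize <= 0 means "unbounded" in this sketch (an assumption).
  def apply[K, V](loadingFunc: K => V, maximumSize: Long = 0L): LoadingCache[K, V] = {
    require(loadingFunc != null, "loadingFunc must not be null")

    // accessOrder = true makes iteration order least-recently-used first,
    // so removeEldestEntry evicts the LRU entry once the bound is exceeded.
    val store = new JLinkedHashMap[K, V](16, 0.75f, true) {
      override protected def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
        maximumSize > 0L && size() > maximumSize
    }

    new LoadingCache[K, V] {
      // Load the value on a miss, then serve it from the store afterwards.
      override def get(key: K): V = store.synchronized {
        if (store.containsKey(key)) {
          store.get(key)
        } else {
          val value = loadingFunc(key)
          store.put(key, value)
          value
        }
      }

      override def size: Int = store.synchronized(store.size())
    }
  }
}
```

A caller in another module would then write something like `SketchCache[String, Int](s => s.length, maximumSize = 2L)` and never import a cache class from the underlying library.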

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass GitHub Actions
Manual test

 build/mvn clean install -DskipTests -pl sql/catalyst -am
 build/mvn test -pl sql/catalyst

Before

ProductAggSuite:
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$
  at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.variable(javaCode.scala:64)
  at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.isNullVariable(javaCode.scala:77)
  at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.$anonfun$create$1(GenerateSafeProjection.scala:156)
  at scala.collection.immutable.List.map(List.scala:293)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:153)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1369)

After

Run completed in 4 minutes, 22 seconds.
Total number of tests run: 7088
Suites: completed 295, aborted 0
Tests: succeeded 7088, failed 0, canceled 0, ignored 5, pending 0
All tests passed.

LuciferYang · 2023-06-20T04:15:16Z

friendly ping @HyukjinKwon @dongjoon-hyun , I want to use this one instead of #41622

HyukjinKwon

I am good with this but let's wait for the author's feedback.

LuciferYang · 2023-06-20T06:59:50Z

Thanks @HyukjinKwon for your review, friendly ping @JoshRosen @liuzqt

ryan-johnson-databricks · 2023-06-20T20:43:48Z

> friendly ping @HyukjinKwon @dongjoon-hyun , I want to use this one instead of #41622

Could you explain why you favor the more complex approach that changes prod code to address a test-only issue?

All else being equal, the other PR should merge because it's a lot simpler.

HyukjinKwon · 2023-06-20T23:53:46Z

I don't mind either this or #41622. I think the main reason is to keep NonFateSharingCache.scala in the Spark core module instead of moving it to the SQL module, because it looks more general-purpose. So TBH I am fine with merging either one. It's not an API in the end.

LuciferYang · 2023-06-21T10:49:48Z

+1, Agree with what @HyukjinKwon said

liuzqt · 2023-06-21T22:20:05Z

Hi @LuciferYang , thanks for the fix! I'm fine with either option.

LuciferYang · 2023-06-25T02:35:17Z

Thanks @liuzqt

LuciferYang · 2023-06-25T02:39:25Z

> Hi @LuciferYang , thanks for the fix! I'm fine with either option.

I rebased the code so that GA tests this one again. @HyukjinKwon, it seems the author approves of this fix. I am planning to merge this one today; do you think that's OK?

dongjoon-hyun

+1, LGTM. I also agree with @LuciferYang and @HyukjinKwon, and support this PR.

LuciferYang · 2023-06-26T02:16:20Z

Thanks @dongjoon-hyun @HyukjinKwon @liuzqt @ryan-johnson-databricks

LuciferYang added 3 commits June 19, 2023 13:09

add a new apply function

dbaf24d

fix compile

7d3a4f5

add more comments

61a37e0

github-actions bot added CORE SQL labels Jun 19, 2023

LuciferYang mentioned this pull request Jun 19, 2023

[SPARK-44064][SQL] Move NonFateSharingCache from core module to catalyst module #41622

Closed

HyukjinKwon approved these changes Jun 20, 2023

Merge branch 'apache:master' into SPARK-44064-2

efcafea

for ga

cadc263

dongjoon-hyun approved these changes Jun 25, 2023

dongjoon-hyun closed this in 703e819 Jun 25, 2023
