Skip to content

Conversation

@LuciferYang
Copy link
Contributor

What changes were proposed in this pull request?

This pr add a new apply function to NonFateSharingCache and change CodeGenerator to use the new NonFateSharingCache#apply to make catalyst module can test using maven.

Why are the changes needed?

SPARK-43300 introduced NonFateSharingCache to core module and it only used by CodeGenerator which is in catalyst module.

There are two apply funcitons in NonFateSharingCache and the input parameter of NonFateSharingCache#apply is com.google.common.cache.Cache/LoadingCache.

The catalyst module may use shaded spark-core jar when we do maven testing and in the shaded spark-core jar, the Guava related classes will be relocated from com.google.common to org.sparkproject.guava, so the input parameter of NonFateSharingCache#apply will change to org.sparkproject.guava.cache.Cache/LoadingCache, but the catalyst module has not been shaded yet when do maven testing, so CodeGenerator will still use type com.google.common.cache.Cache to call the NonFateSharingCache#apply function, then this will result in a mismatch of input types when do maven testing and maven test will aborted as follows:

ProductAggSuite:
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$
  at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.variable(javaCode.scala:64)
  at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.isNullVariable(javaCode.scala:77)
  at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.$anonfun$create$1(GenerateSafeProjection.scala:156)
  at scala.collection.immutable.List.map(List.scala:293)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:153)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1369) 

So this pr add a new apply function to break non-core modules directly using Guava Cache related types as input parameter to invoke NonFateSharingCache#apply function for workaround, this way can avoid non-core modules Maven test failures caused by using shaded core module.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Pass GitHub Actions
  • Manual test
 build/mvn clean install -DskipTests -pl sql/catalyst -am
 build/mvn test -pl sql/catalyst 

Before

ProductAggSuite:
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$
  at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.variable(javaCode.scala:64)
  at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.isNullVariable(javaCode.scala:77)
  at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.$anonfun$create$1(GenerateSafeProjection.scala:156)
  at scala.collection.immutable.List.map(List.scala:293)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:153)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1369) 

After

Run completed in 4 minutes, 22 seconds.
Total number of tests run: 7088
Suites: completed 295, aborted 0
Tests: succeeded 7088, failed 0, canceled 0, ignored 5, pending 0
All tests passed.

@LuciferYang
Copy link
Contributor Author

friendly ping @HyukjinKwon @dongjoon-hyun , I want to use this one instead of #41622

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am good with this but let's wait for the author's feedback.

@LuciferYang
Copy link
Contributor Author

Thanks @HyukjinKwon for your review, friendly ping @JoshRosen @liuzqt

@ryan-johnson-databricks
Copy link
Contributor

friendly ping @HyukjinKwon @dongjoon-hyun , I want to use this one instead of #41622

Could you explain why you favor the more complex approach that changes prod code to address a test-only issue?

All else being equal, the other PR should merge because it's a lot simpler.

@HyukjinKwon
Copy link
Member

I do't mind either this or #41622. I think the main reason is to keep NonFateSharingCache.scala in Spark core module instead of moving it to SQL module because this looks more for a general purpose. So TBH I am fine which one to merge. It's not an API in the end.

@LuciferYang
Copy link
Contributor Author

+1, Agree with what @HyukjinKwon said

@liuzqt
Copy link
Contributor

liuzqt commented Jun 21, 2023

Hi @LuciferYang , thanks for the fix! I'm fine with either option.

@LuciferYang
Copy link
Contributor Author

Thanks @liuzqt

@LuciferYang
Copy link
Contributor Author

Hi @LuciferYang , thanks for the fix! I'm fine with either option.

I rebase the code to make GA test this one again. @HyukjinKwon seems the author approves of this fix. I am planning to merge this one today, do you think it's ok?

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. I also agree with @LuciferYang and @HyukjinKwon , and support this PR .

@LuciferYang
Copy link
Contributor Author

Thanks @dongjoon-hyun @HyukjinKwon @liuzqt @ryan-johnson-databricks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants