Skip to content

Conversation

@LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Jun 16, 2023

What changes were proposed in this pull request?

This pr move NonFateSharingCache from core module to catalyst to make catalyst module can test using maven.

Why are the changes needed?

SPARK-43300 introduced NonFateSharingCache to core module and it only used by CodeGenerator which is in catalyst module.

There are two apply funcitons in NonFateSharingCache and the input parameter of NonFateSharingCache#apply is com.google.common.cache.Cache/LoadingCache.

The catalyst module may use shaded spark-core jar when we do maven testing and in the shaded spark-core jar, the Guava related classes will be relocated from com.google.common to org.sparkproject.guava, so the input parameter of NonFateSharingCache#apply will change to org.sparkproject.guava.cache.Cache/LoadingCache, but the catalyst module has not been shaded yet when do maven testing, so CodeGenerator will still use type com.google.common.cache.Cache to call the NonFateSharingCache#apply function, then this will result in a mismatch of input types when do maven testing and maven test will aborted as follows:

ProductAggSuite:
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$
  at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.variable(javaCode.scala:64)
  at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.isNullVariable(javaCode.scala:77)
  at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.$anonfun$create$1(GenerateSafeProjection.scala:156)
  at scala.collection.immutable.List.map(List.scala:293)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:153)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1369) 

So this pr move NonFateSharingCache from core module to catalyst for workaround, this is safe now because NonFateSharingCache is only used by catalyst now.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Pass GitHub Actions
  • Manual test
 build/mvn clean install -DskipTests -pl sql/catalyst -am
 build/mvn test -pl sql/catalyst 

Before

ProductAggSuite:
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$
  at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.variable(javaCode.scala:64)
  at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.isNullVariable(javaCode.scala:77)
  at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.$anonfun$create$1(GenerateSafeProjection.scala:156)
  at scala.collection.immutable.List.map(List.scala:293)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:153)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1369) 

After

Run completed in 4 minutes, 29 seconds.
Total number of tests run: 7091
Suites: completed 296, aborted 0
Tests: succeeded 7091, failed 0, canceled 0, ignored 5, pending 0
All tests passed.

@github-actions github-actions bot added the SQL label Jun 16, 2023
@LuciferYang
Copy link
Contributor Author

this is safe now because NonFateSharingCache is only used by catalyst now.

In the future, if NonFateSharingCache is used by more other modules, other ways need to be found to make maven test pass

@LuciferYang LuciferYang marked this pull request as ready for review June 16, 2023 07:19
*/

package org.apache.spark.util
package org.apache.spark.sql.util
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the code content, NonFateSharingCache fits org.apache.spark.util instead of org.apache.spark.sql.util because it's more general. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree but seems like it causes some problem @LuciferYang pointed at here #40982 (comment) if we move this to other modules (and catalyst module only has sql module). It doesn't look so beautiful but I think this is a minimized fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, got it. I missed the context. Thanks!

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this approach but would better need to check https://github.com/apache/spark/pull/41622/files#r1233374951 and the original authors' response.

@pan3793
Copy link
Member

pan3793 commented Jun 19, 2023

This is actually the limitation of Maven, AFAIK, Gradle does well in this case, not sure about SBT.
@LuciferYang could you please add some comments on the class? I remember there was a discussion about moving the building system to single SBT, we may want to re-evaluate it once the migration occurs.

@LuciferYang LuciferYang marked this pull request as draft June 19, 2023 06:05
@LuciferYang
Copy link
Contributor Author

@HyukjinKwon @dongjoon-hyun @pan3793

After more thought, I give a new one #41654

In #41654, I added a new apply function for NonFateSharingCache, the new apply function no longer use any Guava related types as input parameters, which can avoid the impact of using the shaded core module during Maven testing, and there is no need to move NonFateSharingCache from core to catalyst

Does this approach look better?

@pan3793
Copy link
Member

pan3793 commented Jun 19, 2023

thanks, @LuciferYang, the new approach looks better

@LuciferYang
Copy link
Contributor Author

close this one due to #41654 merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants