[SPARK-42169] [SQL] Implement code generation for to_csv function (StructsToCsv)#39719
[SPARK-42169] [SQL] Implement code generation for to_csv function (StructsToCsv)#39719NarekDW wants to merge 7 commits intoapache:masterfrom
Conversation
MaxGekk
left a comment
There was a problem hiding this comment.
Does CSVBenchmark show any performance improvements?
|
@MaxGekk This is from current branch: On my local machine - |
|
@NarekDW Could you regenerate benchmark results using GitHub actions, see
in your PR. |
@MaxGekk sorry for the late response. I've added benchmark results from GitHub actions. Execution links for master branch: Execution links for current branch: P.S. Scala 2.12 was used for all executions. |
sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala
Outdated
Show resolved
Hide resolved
|
@jaceklaskowski thank you for the review. @MaxGekk just a reminder. |
There was a problem hiding this comment.
Could you clarify this test title, please. I don't see anything related to 64KB in the test.
Doesn't the test just duplicates existing one: to_csv - struct?
There was a problem hiding this comment.
Sure, checkEvaluation will execute this test in both modes without codegen and with codegen modes. In case of codegen mode, if the generated java code (method) will be larger than 64 kb in size it won't be able to compile it, and the test will fail, as Java has a 64kb limit on the size of methods.
It doesn't duplicate to_csv - struct test case, because the purpose of this test - is to generate big StructType(with 5000 literals in current case) and test it on StructsToCsv expression to be sure that generated java code doesn't exceed the limit of 64 kb in size.
sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala
Outdated
Show resolved
Hide resolved
|
@NarekDW Could you rebase this PR on the recent master, please. |
|
@MaxGekk sure, I've rebased, but there were conflicts in benchmarks. It will take some time to regenerate them. |
MaxGekk
left a comment
There was a problem hiding this comment.
LGTM, waiting for benchmark results.
|
@MaxGekk sorry for delay, benchmark results are updated. |
|
+1, LGTM. Merging to master. |


What changes were proposed in this pull request?
This PR enhances
StructsToCsvclass withdoGenCodefunction instead of extending it fromCodegenFallbacktrait (performance improvement).Why are the changes needed?
It will improve performance.
Does this PR introduce any user-facing change?
No
How was this patch tested?
an additional test case were added to
org.apache.spark.sql.CsvFunctionsSuiteclass.