[SPARK-53525][CONNECT][FOLLOWUP] Spark Connect ArrowBatch Result Chunking - Scala Client #52496
Conversation
vicennial left a comment:
cc @hvanhovell
```scala
throw new IllegalStateException(
  s"Expected arrow batch to start at row offset $numRecords in results, " +
  s"but received arrow batch starting at offset $expectedStartOffset.")

  s"Expected chunk index ${arrowBatchChunksToAssemble.size} of the " +
  s"arrow batch but got ${arrowBatch.getChunkIndex}.")
}
```
Since these are user facing exceptions, should we be using a structured error state/code here?
I didn't introduce new error classes, and the `IllegalStateException`s were already in SparkResults. Yeah, switching to a structured error class can be a follow-up.
```scala
.addAllTags(tags.get.toSeq.asJava)

// Add request option to allow result chunking.
val chunkingOptionsBuilder = proto.ResultChunkingOptions
```
Do we need to set this if the chunking is disabled?
You're right, it is not needed. I've updated the logic so we only set the chunking option when `configuration.allowArrowBatchChunking` is enabled.
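The fix amounts to attaching the option only when the client configuration enables chunking. A minimal sketch of that control flow, using hypothetical case-class stand-ins for the generated proto messages (the real client builds a `proto.ResultChunkingOptions` onto the `ExecutePlanRequest` builder):

```scala
// Hypothetical stand-ins for the generated proto messages, for illustration only.
final case class ResultChunkingOptions(allowArrowBatchChunking: Boolean)
final case class ExecutePlanRequest(chunkingOptions: Option[ResultChunkingOptions])

object RequestSketch {
  // Attach the chunking option only when the configuration enables it,
  // so a client with chunking disabled sends no option at all.
  def buildRequest(allowArrowBatchChunking: Boolean): ExecutePlanRequest = {
    val options =
      if (allowArrowBatchChunking) {
        Some(ResultChunkingOptions(allowArrowBatchChunking = true))
      } else {
        None
      }
    ExecutePlanRequest(options)
  }
}
```

With proto builders the same effect is achieved by simply not calling the setter, so the field stays unset on the wire for clients that have chunking disabled.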
```scala
private[this] var arrowSchema: pojo.Schema = _
private[this] var nextResultIndex: Int = 0
private val resultMap = mutable.Map.empty[Int, (Long, Seq[ArrowMessage])]
private val arrowBatchChunksToAssemble = mutable.Buffer.empty[ByteString]
```
Should this be local to `processResponses`? AFAICT it should not return unless we have a complete arrow batch.
Good point, it's now updated to be local to processResponses.
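For illustration, a self-contained sketch of why the chunk buffer can live inside the processing method: chunks arrive in order, and the method only produces a value once the final chunk lands, so no state needs to outlive the call. The `Chunk` shape and `assembleBatch` helper are hypothetical simplifications; the real client accumulates `ByteString` payloads from `ExecutePlanResponse` messages.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of one chunk of a chunked arrow batch.
final case class Chunk(chunkIndex: Int, numChunks: Int, data: Array[Byte])

object ChunkSketch {
  // Reassemble one complete arrow batch from an in-order stream of chunks.
  // The buffer is local to the call: the method cannot return a batch
  // until every chunk has arrived.
  def assembleBatch(chunks: Iterator[Chunk]): Array[Byte] = {
    val pending = mutable.Buffer.empty[Array[Byte]]
    while (chunks.hasNext) {
      val chunk = chunks.next()
      if (chunk.chunkIndex != pending.size) {
        throw new IllegalStateException(
          s"Expected chunk index ${pending.size} but got ${chunk.chunkIndex}.")
      }
      pending += chunk.data
      if (pending.size == chunk.numChunks) {
        return pending.flatten.toArray
      }
    }
    throw new IllegalStateException("Stream ended before the arrow batch was complete.")
  }
}
```

An out-of-order chunk index trips the `IllegalStateException`, mirroring the chunk-index check quoted from the diff above.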
…-scala-client

# Conflicts:
#	sql/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/ClientE2ETestSuite.scala
Merging to master/4.1. Thanks!
…king - Scala Client

### What changes were proposed in this pull request?

In the previous PR #52271 of Spark Connect ArrowBatch Result Chunking, both server-side and PySpark client changes were implemented. In this PR, the corresponding Scala client changes are implemented, so large Arrow rows are now supported on the Scala client as well.

To reproduce the existing issue we are solving here, run this code on the Spark Connect Scala client:

```
val res = spark.sql("select repeat('a', 1024*1024*300)").collect()
println(res(0).getString(0).length)
```

It fails with a `RESOURCE_EXHAUSTED` error with message `gRPC message exceeds maximum size 134217728: 314573320`, because the server is trying to send an ExecutePlanResponse of ~300MB to the client. With the improvement introduced by the PR, the above code runs successfully and prints the expected result.

### Why are the changes needed?

It improves Spark Connect stability when returning large rows.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52496 from xi-db/arrow-batch-chuking-scala-client.

Authored-by: Xi Lyu <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
(cherry picked from commit daa83fc)
Signed-off-by: Herman van Hovell <[email protected]>
@hvanhovell @xi-db, unfortunately, the daily Maven test starts to fail after this patch. After a closer look, I think this should be a test-only issue related to the Maven classpath and won't cause problems on real deployments. For reference, there was a similar issue #41622, but I'm afraid the solution is not applicable to this PR. Also cc @LuciferYang @dongjoon-hyun
@xi-db Do you have time to fix this problem?
Thank you for pinging me, @pan3793 and @LuciferYang.
To @xi-db and @hvanhovell: I agree with @pan3793's analysis that it will be only classpath issues due to the difference between Maven and SBT. However, I hope this patch didn't hide any other regression for the last 2 days. Inevitably, let me revert this follow-up commit. To @pan3793 and @LuciferYang, I re-triggered the daily Maven test.
Is there any progress on this? If it's difficult to fix, can we ignore these two cases for now?
Hi @LuciferYang, I'm now looking into it.
@xi-db, this can be reproduced by:
Hi @pan3793 @LuciferYang, I opened a PR to fix the issue: #52941. I reproduced the issue with the commands you shared, and now the Maven tests succeed with the fix. PTAL, thanks!
Thank you for the follow-up, @xi-db and all.
…ct testing

### What changes were proposed in this pull request?

In PR #52496, tests were implemented using `io.grpc.ClientInterceptor` to verify gRPC messages. However, it failed the Maven tests ([comment](#52496 (comment))) because the related gRPC classes are missing in the testing SparkConnectService in Maven tests. In this PR, gRPC classes for testing purposes are added as artifacts, like other existing classes from `scalatest` and `spark-catalyst`, to also allow `io.grpc` classes in tests.

### Why are the changes needed?

To fix the broken daily Maven tests ([comment](#52496 (comment))).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Maven tests with the following commands passed.

```
$ build/mvn -Phive clean install -DskipTests
$ build/mvn -Phive -pl sql/connect/client/jvm test -Dtest=none -DwildcardSuites=org.apache.spark.sql.connect.ClientE2ETestSuite
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52941 from xi-db/arrow-batch-chunking-scala-client-fix-maven.

Authored-by: Xi Lyu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
