@xi-db xi-db commented Oct 1, 2025

What changes were proposed in this pull request?

In the previous PR #52271 for Spark Connect ArrowBatch Result Chunking, both the server-side and PySpark client changes were implemented.

In this PR, the corresponding Scala client changes are implemented, so large Arrow rows are now supported on the Scala client as well.

To reproduce the existing issue we are solving here, run this code on the Spark Connect Scala client:

```
val res = spark.sql("select repeat('a', 1024*1024*300)").collect()
println(res(0).getString(0).length)
```

It fails with a `RESOURCE_EXHAUSTED` error with the message `gRPC message exceeds maximum size 134217728: 314573320`, because the server is trying to send an ExecutePlanResponse of ~300MB to the client.

With the improvement introduced by the PR, the above code runs successfully and prints the expected result.
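
For intuition, here is a minimal Scala sketch of the chunking idea (illustrative only, not the actual server implementation; `chunk` and `maxChunkSize` are hypothetical names): a payload larger than the gRPC limit is split into pieces that each fit into a response message of their own.

```
import com.google.protobuf.ByteString

// Hypothetical helper: split a large Arrow batch payload into chunks, each
// small enough to fit in a single gRPC message (limit above: 134217728 bytes).
val maxChunkSize: Int = 64 * 1024 * 1024

def chunk(payload: ByteString, chunkSize: Int = maxChunkSize): Seq[ByteString] =
  (0 until payload.size by chunkSize).map { start =>
    payload.substring(start, math.min(start + chunkSize, payload.size))
  }
```

The client then concatenates the received chunks back into the original Arrow batch before handing it to the Arrow reader.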

Why are the changes needed?

It improves Spark Connect stability when returning large rows.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@xi-db xi-db force-pushed the arrow-batch-chuking-scala-client branch from ea236b1 to 0648956 on October 1, 2025 09:52

@vicennial vicennial left a comment

Comment on lines 169 to 172

```
  throw new IllegalStateException(
-   s"Expected arrow batch to start at row offset $numRecords in results, " +
-   s"but received arrow batch starting at offset $expectedStartOffset.")
+   s"Expected chunk index ${arrowBatchChunksToAssemble.size} of the " +
+   s"arrow batch but got ${arrowBatch.getChunkIndex}.")
}
```

Since these are user-facing exceptions, should we be using a structured error state/code here?

I didn't introduce new error classes, and the `IllegalStateException` exceptions were already in SparkResults. Yeah, switching to a structured error class can be a follow-up.
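
For reference, such a follow-up might look like the sketch below, assuming Spark's existing `SparkException.internalError` helper; `expectedIndex` and `actualIndex` are hypothetical stand-ins for the values tracked during batch assembly:

```
import org.apache.spark.SparkException

// Sketch only: a structured internal error in place of IllegalStateException.
// expectedIndex and actualIndex are hypothetical stand-ins.
def failOnChunkMismatch(expectedIndex: Int, actualIndex: Int): Nothing =
  throw SparkException.internalError(
    s"Expected chunk index $expectedIndex of the arrow batch " +
      s"but got $actualIndex.")
```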

```
      .addAllTags(tags.get.toSeq.asJava)

// Add request option to allow result chunking.
val chunkingOptionsBuilder = proto.ResultChunkingOptions
```

Do we need to set this if chunking is disabled?

You're right, it's not needed. I've updated the logic so we only set the chunking option when `configuration.allowArrowBatchChunking` is enabled.
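
A minimal sketch of the updated logic (the builder and setter names other than `proto.ResultChunkingOptions`, which appears in the snippet above, are assumptions):

```
// Sketch: attach the chunking option only when the client config enables it.
// requestBuilder and the setter names are assumed for illustration.
if (configuration.allowArrowBatchChunking) {
  val chunkingOptions = proto.ResultChunkingOptions
    .newBuilder()
    .setAllowArrowBatchChunking(true)
    .build()
  requestBuilder.setResultChunkingOptions(chunkingOptions)
}
```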

```
private[this] var arrowSchema: pojo.Schema = _
private[this] var nextResultIndex: Int = 0
private val resultMap = mutable.Map.empty[Int, (Long, Seq[ArrowMessage])]
private val arrowBatchChunksToAssemble = mutable.Buffer.empty[ByteString]
```

Should this be local to `processResponses`? AFAICT it should not return unless we have a complete arrow batch.

Good point, it's now updated to be local to `processResponses`.
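
Conceptually, the resulting shape is something like this sketch (simplified, with a hypothetical signature): the buffer lives inside the method, so a partially assembled batch can never leak across calls, and chunk indices are validated as they arrive.

```
import scala.collection.mutable
import com.google.protobuf.ByteString

// Sketch: assemble chunked Arrow batch data with a method-local buffer.
// chunks yields (chunkIndex, data) pairs; the signature is hypothetical.
def assembleArrowBatch(chunks: Iterator[(Int, ByteString)]): ByteString = {
  val chunksToAssemble = mutable.Buffer.empty[ByteString] // local, not a field
  chunks.foreach { case (chunkIndex, data) =>
    if (chunkIndex != chunksToAssemble.size) {
      throw new IllegalStateException(
        s"Expected chunk index ${chunksToAssemble.size} of the arrow batch " +
          s"but got $chunkIndex.")
    }
    chunksToAssemble += data
  }
  chunksToAssemble.foldLeft(ByteString.EMPTY)(_.concat(_))
}
```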

@hvanhovell

Merging to master/4.1. Thanks!

asf-gitbox-commits pushed a commit that referenced this pull request Nov 3, 2025
…king - Scala Client

Closes #52496 from xi-db/arrow-batch-chuking-scala-client.

Authored-by: Xi Lyu <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
(cherry picked from commit daa83fc)
Signed-off-by: Herman van Hovell <[email protected]>

pan3793 commented Nov 5, 2025

@hvanhovell @xi-db, unfortunately, the daily Maven test started to fail after this patch:

```
ClientE2ETestSuite:
- throw SparkException with null filename in stack trace elements *** FAILED ***
  null was not instance of org.apache.spark.SparkException (ClientE2ETestSuite.scala:81)
  ...
- throw SparkException with large cause exception *** FAILED ***
  null was not instance of org.apache.spark.SparkException (ClientE2ETestSuite.scala:134)
  ...
```

After a closer look, I think this is a test-only issue related to the Maven classpath and won't cause problems in real deployments:

```
org.apache.spark.SparkException: org.apache.spark.SparkClassNotFoundException: [INTERNAL_ERROR] Failed to load class: io.grpc.ClientInterceptor. Make sure the artifact where the class is defined is installed by calling session.addArtifact. SQLSTATE: XX000
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.unpackScalaUDF(SparkConnectPlanner.scala:2086)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.org$apache$spark$sql$connect$planner$SparkConnectPlanner$$unpackUdf(SparkConnectPlanner.scala:2064)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformScalaFunction(SparkConnectPlanner.scala:2130)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformScalaUDF(SparkConnectPlanner.scala:2108)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformCommonInlineUserDefinedFunction(SparkConnectPlanner.scala:2036)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.doTransformExpression(SparkConnectPlanner.scala:1917)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.$anonfun$transformExpression$1(SparkConnectPlanner.scala:1837)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:107)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformExpression(SparkConnectPlanner.scala:1837)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformExpression(SparkConnectPlanner.scala:1816)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.$anonfun$transformWithColumns$1(SparkConnectPlanner.scala:1303)
	at scala.collection.immutable.List.map(List.scala:236)
	at scala.collection.immutable.List.map(List.scala:79)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformWithColumns(SparkConnectPlanner.scala:1292)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.$anonfun$transformRelation$1(SparkConnectPlanner.scala:194)
	at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$usePlanCache$4(SessionHolder.scala:589)
	at scala.Option.getOrElse(Option.scala:201)
	at org.apache.spark.sql.connect.service.SessionHolder.usePlanCache(SessionHolder.scala:588)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:146)
	at org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.handlePlan(SparkConnectPlanExecution.scala:73)
	at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1(ExecuteThreadRunner.scala:225)
	at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1$adapted(ExecuteThreadRunner.scala:197)
	at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$2(SessionHolder.scala:396)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
	at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$1(SessionHolder.scala:396)
	at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
	at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
	at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:185)
	at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:102)
	at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
	at org.apache.spark.sql.connect.service.SessionHolder.withSession(SessionHolder.scala:395)
	at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.executeInternal(ExecuteThreadRunner.scala:197)
	at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.org$apache$spark$sql$connect$execution$ExecuteThreadRunner$$execute(ExecuteThreadRunner.scala:126)
	at org.apache.spark.sql.connect.execution.ExecuteThreadRunner$ExecutionThread.run(ExecuteThreadRunner.scala:334)
```

For reference, there was a similar issue #41622, but I'm afraid that solution is not applicable to this PR.

also cc @LuciferYang @dongjoon-hyun

LuciferYang commented Nov 6, 2025

> @hvanhovell @xi-db, unfortunately, the daily Maven test started to fail after this patch […]

@xi-db Do you have time to fix this problem?
also cc @hvanhovell and @HyukjinKwon

dongjoon-hyun commented Nov 6, 2025

Thank you for pinging me, @pan3793 and @LuciferYang.

@dongjoon-hyun

To @xi-db and @hvanhovell: I agree with @pan3793's analysis that this is only a classpath issue due to the difference between Maven and SBT. However, I hope this patch didn't hide any other regressions over the last 2 days. Inevitably, let me revert this follow-up commit from branch-4.1 only, to recover the CIs of the release branch.

To @pan3793 and @LuciferYang, I re-triggered the Maven (branch-4.1) CIs to make sure. For the master branch, let's give it more time and wait a little longer.

@LuciferYang

Is there any progress on this? If it's difficult to fix, can we ignore these two cases for now?

xi-db commented Nov 7, 2025

Hi @LuciferYang, I'm now looking into the SparkClassNotFoundException issue in the Maven tests. Do you know a way I can reproduce the error in ClientE2ETestSuite? It passes in the sbt tests.

pan3793 commented Nov 7, 2025

@xi-db, this can be reproduced by:

```
$ build/mvn -Phive clean install -DskipTests
$ build/mvn -Phive -pl sql/connect/client/jvm test -Dtest=none -DwildcardSuites=org.apache.spark.sql.connect.ClientE2ETestSuite
```

xi-db commented Nov 7, 2025

Hi @pan3793 @LuciferYang, I opened a PR to fix the issue: #52941. I reproduced the issue with the commands you shared, and the Maven tests now succeed with the fix. PTAL, thanks!

@dongjoon-hyun

Thank you for the follow-up, @xi-db and all.

dongjoon-hyun pushed a commit that referenced this pull request Nov 7, 2025
…ct testing

### What changes were proposed in this pull request?

In PR #52496, tests were implemented using `io.grpc.ClientInterceptor` to verify gRPC messages. However, they failed the Maven tests ([comment](#52496 (comment))) because the related gRPC classes are missing from the testing SparkConnectService in Maven tests.

In this PR, gRPC classes needed for testing are added as artifacts, like the existing classes from `scalatest` and `spark-catalyst`, so that `io.grpc` classes are also available in tests.
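
For context, an interceptor of the kind these tests rely on might look like the hedged sketch below (not the exact test code; the class name is made up). It forwards every call unchanged while giving the test a hook to inspect outbound messages:

```
import io.grpc.{CallOptions, Channel, ClientCall, ClientInterceptor, ForwardingClientCall, MethodDescriptor}

// Sketch: a pass-through gRPC client interceptor with an inspection hook.
class InspectingInterceptor extends ClientInterceptor {
  override def interceptCall[ReqT, RespT](
      method: MethodDescriptor[ReqT, RespT],
      callOptions: CallOptions,
      next: Channel): ClientCall[ReqT, RespT] =
    new ForwardingClientCall.SimpleForwardingClientCall[ReqT, RespT](
      next.newCall(method, callOptions)) {
      override def sendMessage(message: ReqT): Unit = {
        // Record or assert on the message here before forwarding it.
        super.sendMessage(message)
      }
    }
}
```

Loading such a class on the server side is exactly what fails when `io.grpc` is missing from the test artifacts, hence the change above.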

### Why are the changes needed?

To fix the broken daily Maven tests ([comment](#52496 (comment))).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Maven tests with the following commands passed.

```
$ build/mvn -Phive clean install -DskipTests
$ build/mvn -Phive -pl sql/connect/client/jvm test -Dtest=none -DwildcardSuites=org.apache.spark.sql.connect.ClientE2ETestSuite
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52941 from xi-db/arrow-batch-chunking-scala-client-fix-maven.

Authored-by: Xi Lyu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun

For the record, #52941 is merged to master for Apache Spark 4.2.0.

I asked @xi-db to make a backporting PR to branch-4.1 for Apache Spark 4.1.0: a single PR containing both the revert of this PR (#52496) and today's follow-up PR (#52941).

xi-db added a commit to xi-db/spark that referenced this pull request Nov 8, 2025
…king - Scala Client

Closes apache#52496 from xi-db/arrow-batch-chuking-scala-client.

Authored-by: Xi Lyu <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
(cherry picked from commit daa83fc)
xi-db added a commit to xi-db/spark that referenced this pull request Nov 8, 2025
…ct testing

Closes apache#52941 from xi-db/arrow-batch-chunking-scala-client-fix-maven.

Authored-by: Xi Lyu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 1f7bbeb)
dongjoon-hyun pushed a commit that referenced this pull request Nov 8, 2025
… Chunking - Scala Client

(This PR is a backporting PR containing #52496 and the test fix #52941.)

Closes #52953 from xi-db/[email protected].

Authored-by: Xi Lyu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
…ct testing

Closes apache#52941 from xi-db/arrow-batch-chunking-scala-client-fix-maven.

Authored-by: Xi Lyu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…king - Scala Client

Closes apache#52496 from xi-db/arrow-batch-chuking-scala-client.

Authored-by: Xi Lyu <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…ct testing

Closes apache#52941 from xi-db/arrow-batch-chunking-scala-client-fix-maven.

Authored-by: Xi Lyu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>