@shujingyang-db
Contributor

What changes were proposed in this pull request?

As titled.

Why are the changes needed?

This allows users to use the direct passthrough partitioning API in Spark Connect mode.

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

@shujingyang-db shujingyang-db marked this pull request as ready for review September 4, 2025 22:55
@shujingyang-db shujingyang-db marked this pull request as draft September 4, 2025 23:01
@HyukjinKwon HyukjinKwon changed the title [SPARK-53402] Support Direct Passthrough Partitioning Dataset API in Spark Connect [SPARK-53402][CONNECT] Support Direct Passthrough Partitioning Dataset API in Spark Connect Sep 4, 2025
Member

@HyukjinKwon HyukjinKwon left a comment

@shujingyang-db
Contributor Author

Realized there are some issues; I'm fixing them.

* @group typedrel
* @since 4.1.0
*/
def repartitionById(numPartitions: Int, partitionIdExpr: Column): Dataset[T] = {
Member

Why not implement it in sql/api?

@shujingyang-db shujingyang-db changed the title [SPARK-53402][CONNECT] Support Direct Passthrough Partitioning Dataset API in Spark Connect [SPARK-53402][CONNECT] Support Direct Passthrough Partitioning Dataset API in Spark Connect in Scala Sep 8, 2025
@shujingyang-db shujingyang-db marked this pull request as ready for review September 8, 2025 05:19
Contributor

@zhengruifeng zhengruifeng left a comment

LGTM, pending the CI

@pan3793
Member

pan3793 commented Sep 8, 2025

You may need to regenerate the Python code when touching connect proto files. Otherwise looks fine.

@zhengruifeng
Contributor

merged to master

dongjoon-hyun added a commit to apache/spark-connect-swift that referenced this pull request Oct 1, 2025
…th `4.1.0-preview2`

### What changes were proposed in this pull request?

This PR aims to update Spark Connect-generated Swift source code with Apache Spark `4.1.0-preview2`.

### Why are the changes needed?

There are many changes from Apache Spark 4.1.0.

- apache/spark#52342
- apache/spark#52256
- apache/spark#52271
- apache/spark#52242
- apache/spark#51473
- apache/spark#51653
- apache/spark#52072
- apache/spark#51561
- apache/spark#51563
- apache/spark#51489
- apache/spark#51507
- apache/spark#51462
- apache/spark#51464
- apache/spark#51442

This update picks up the latest bug fixes and the new messages needed to develop new features for `4.1.0-preview2`.

```
$ git clone -b v4.1.0-preview2 https://github.com/apache/spark.git
$ cd spark/sql/connect/common/src/main/protobuf/
$ protoc --swift_out=. spark/connect/*.proto
$ protoc --grpc-swift_out=. spark/connect/*.proto

# Remove empty gRPC files
$ cd spark/connect

$ grep 'This file contained no services' *
catalog.grpc.swift:// This file contained no services.
commands.grpc.swift:// This file contained no services.
common.grpc.swift:// This file contained no services.
example_plugins.grpc.swift:// This file contained no services.
expressions.grpc.swift:// This file contained no services.
ml_common.grpc.swift:// This file contained no services.
ml.grpc.swift:// This file contained no services.
pipelines.grpc.swift:// This file contained no services.
relations.grpc.swift:// This file contained no services.
types.grpc.swift:// This file contained no services.

$ rm catalog.grpc.swift commands.grpc.swift common.grpc.swift example_plugins.grpc.swift expressions.grpc.swift ml_common.grpc.swift ml.grpc.swift pipelines.grpc.swift relations.grpc.swift types.grpc.swift
```
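
The manual grep-then-rm step above could also be scripted. The following is only a sketch, not part of the original commit: `remove_empty_grpc` is a hypothetical helper name, and it assumes the marker comment reads exactly as in the grep output above and that only `*.grpc.swift` files should ever be deleted.

```shell
# Hypothetical helper (not from the original commit): delete generated
# *.grpc.swift files that protoc marked as containing no services.
# Takes the directory to clean as its first argument (default: current directory).
remove_empty_grpc() {
  dir="${1:-.}"
  for f in "$dir"/*.grpc.swift; do
    # If the glob matched nothing, the literal pattern is left in $f; skip it.
    [ -e "$f" ] || continue
    # Assumed marker text, copied from the grep output shown above.
    if grep -q 'This file contained no services' "$f"; then
      rm -- "$f"
    fi
  done
}
```

After the two `protoc` invocations, it might be called as `remove_empty_grpc spark/connect`. Unlike the `grep ... *` shown above, the loop only inspects `*.grpc.swift` files, so the non-gRPC generated sources are never candidates for removal.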

### Does this PR introduce _any_ user-facing change?

Pass the CIs.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #250 from dongjoon-hyun/SPARK-53777.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…t API in Spark Connect in Scala

Closes apache#52242 from shujingyang-db/direct-shuffle-partition-id-connect.

Authored-by: Shujing Yang <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>