
Conversation

@imarkowitz imarkowitz commented Sep 5, 2025

What changes were proposed in this pull request?

Adds breaking-change metadata to error messages.

Each breaking change includes a migration message explaining how the user should update their code. It can also include a Spark config value that can be used to mitigate the breaking change.

The migration message is appended to the error message. In Scala, we also include the breaking-change info in the structured error message when the STANDARD error format is used.

We also include breaking-change info in PySpark errors.

Why are the changes needed?

By tagging breaking changes with metadata and a Spark config flag, we can build tools that automatically retry Spark jobs with the breaking change disabled.
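As an illustration of the kind of retry tooling this enables, here is a minimal sketch. The helper names and the dict shape are assumptions for illustration only (loosely based on the `migrationMessage` and `mitigationSparkConfig` fields discussed later in this thread), not an API added by this PR:

```python
# Hypothetical sketch of an automatic-retry wrapper built on top of
# breaking-change metadata. The dict keys used here are assumptions,
# not a confirmed schema.

def retry_with_mitigation(run_job, get_breaking_change_info):
    """Run a job; on failure, retry once with the mitigation config applied."""
    try:
        return run_job({})
    except Exception as e:  # in practice, a SparkThrowable-like error
        info = get_breaking_change_info(e)
        if info and info.get("mitigationSparkConfig"):
            conf = info["mitigationSparkConfig"]
            # Retry with the breaking change disabled via the suggested config.
            return run_job({conf["key"]: conf["value"]})
        raise
```

The key point is that the mitigation config is machine-readable, so no human has to parse the error text to find the right flag.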

Does this PR introduce any user-facing change?

This PR only adds a framework for creating breaking-change errors; it does not define any breaking-change errors yet. It adds new methods, for example `getBreakingChangeInfo` on `SparkThrowable`. For existing errors, this method returns `None`.

How was this patch tested?

Tests are added in `SparkThrowableSuite`, `test_connect_errors_conversion.py`, `test_errors.py`, and `FetchErrorDetailsHandlerSuite`.

Was this patch authored or co-authored using generative AI tooling?

No

@imarkowitz imarkowitz force-pushed the ian/breaking-changes branch 2 times, most recently from dc8434d to 7ff851c Compare September 9, 2025 17:54
@imarkowitz imarkowitz changed the title [WIP] [SPARK-53507]Add breaking change info to errors [SPARK-53507]Add breaking change info to errors Sep 9, 2025
Contributor:

nit: shall we add a `\n` between the main error message and the breaking change message?

Contributor Author:

I think it makes sense to use a space.

Contributor:

But this may look weird, as the migration message itself can span multiple lines.

Contributor Author:

I based this code on the existing logic for joining the subclass message:

`errorInfo.messageTemplate + " " + errorSubInfo.messageTemplate`

That logic uses a space so I think it makes sense to match that for consistency.

In the common case where the message is a single line, I think a newline is more confusing than a space.
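The joining behavior discussed in this thread can be sketched in a few lines (Python used for brevity; the actual logic is Scala, as quoted above — migration-message lines joined with newlines, then appended to the main message with a single space):

```python
# Illustrative sketch of the message-joining behavior discussed in this
# thread; not the actual Spark implementation.

def join_messages(main_template: str, migration_lines: list) -> str:
    """Append the migration message to the main error message with a space."""
    # The migration message is stored as a list of lines in the JSON file,
    # so it is joined with newlines first.
    migration = "\n".join(migration_lines)
    return main_template + " " + migration
```

With a single-line migration message the result reads as one sentence after another, which is the common case the space was chosen for.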

Contributor:

let's exclude these code style only changes from the PR

Contributor Author:

Done. These were auto-generated by running `./dev/scalafmt` -- is there a better workflow for formatting these changes?

Comment on lines 186 to 187
private case class ErrorSubInfo(message: Seq[String],
breakingChangeInfo: Option[BreakingChangeInfo] = None) {
Contributor:

Suggested change
private case class ErrorSubInfo(message: Seq[String],
breakingChangeInfo: Option[BreakingChangeInfo] = None) {
private case class ErrorSubInfo(
message: Seq[String],
breakingChangeInfo: Option[BreakingChangeInfo] = None) {

Contributor Author:

Done

* mitigated manually.
*/
case class BreakingChangeInfo(
migrationMessage: Seq[String],
Contributor:

nit: 4 spaces indentation

Contributor Author:

Done

* @param key The spark config key.
* @param value The spark config value that mitigates the breaking change.
*/
case class MitigationSparkConfig(key: String, value: String)
Contributor:

Suggested change
case class MitigationSparkConfig(key: String, value: String)
case class MitigationConfig(key: String, value: String)

Contributor Author:

Done

g.writeStringField("errorClass", errorClass)
if (format == STANDARD) {
g.writeStringField("messageTemplate", errorReader.getMessageTemplate(errorClass))
errorReader.getBreakingChangeInfo(errorClass).foreach{ breakingChangeInfo =>
Contributor:

Suggested change
errorReader.getBreakingChangeInfo(errorClass).foreach{ breakingChangeInfo =>
errorReader.getBreakingChangeInfo(errorClass).foreach { breakingChangeInfo =>

Contributor Author:

Done

g.writeStringField("migrationMessage",
breakingChangeInfo.migrationMessage.mkString("\n"))
breakingChangeInfo.mitigationSparkConfig.foreach{ mitigationSparkConfig =>
g.writeObjectFieldStart("mitigationSparkConfig")
Contributor:

Just for my education: can this write a JSON array? Can the JSON writer recognize duplicated object field names automatically?

Contributor Author:

This is not an array; it's just an `Option`.

def getGrpcStatusCode(self) -> grpc.StatusCode:
return self._grpc_status_code

def getBreakingChangeInfo(self) -> Optional[Dict[str, Any]]:
Contributor:

shall we add a `BreakingChangeInfo` class in Python as well?

Contributor Author:

We have a proto class defined in `sql/connect/common/src/main/protobuf/spark/connect/base.proto`, but I didn't want to introduce that as a dependency here.
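A minimal sketch of what consuming the plain-dict representation might look like on the Python side. The dict keys here are assumptions inferred from the fields discussed in this thread (`migrationMessage`, `mitigationSparkConfig`), not a confirmed schema:

```python
# Hypothetical consumer of the plain-dict breaking-change info returned by
# getBreakingChangeInfo(); the keys are assumptions, not a confirmed schema.
from typing import Any, Dict, Optional


def format_breaking_change(info: Optional[Dict[str, Any]]) -> str:
    """Render a breaking-change dict into a human-readable hint."""
    if info is None:
        # Existing errors carry no breaking-change info.
        return ""
    lines = list(info.get("migrationMessage", []))
    conf = info.get("mitigationSparkConfig")
    if conf:
        lines.append(f"Mitigation: set {conf['key']}={conf['value']}")
    return "\n".join(lines)
```

Returning a plain dict keeps client code like this decoupled from the generated proto classes, which matches the dependency concern raised above.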

message BreakingChangeInfo {
// A message explaining how the user can migrate their job to work
// with the breaking change.
repeated string migration_message = 1;
Contributor:

Shall we concatenate the string on the server side? From the client's point of view, it's a bit weird to see the message as a list of strings.

Contributor Author:

I think it's fine either way. The migration message is also returned as part of the error message, so I think that's the main way end users would interact with it. Rather than doing a conversion here, I thought I would just return the value in the same format it's defined in.

Contributor:

It's a string list in the JSON because we don't want super-long lines in the JSON file. It's not really about semantics, and I think we should not inherit this trick in the protobuf message.

@cloud-fan (Contributor) left a comment:

LGTM, cc @HyukjinKwon

@HyukjinKwon HyukjinKwon changed the title [SPARK-53507]Add breaking change info to errors [SPARK-53507][CONNECT] Add breaking change info to errors Sep 17, 2025
@imarkowitz (Contributor Author):

Test failures in `org.apache.spark.sql.kafka010.KafkaMicroBatchV1SourceWithConsumerSuite` and `org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite` look unrelated.

@HyukjinKwon (Member):

Merged to master.

* If false, the spark job should be retried by setting the
* mitigationConfig.
*/
case class BreakingChangeInfo(
Contributor:

I just realized that we expose it as a public API via `SparkThrowable.getBreakingChangeInfo`. We shouldn't expose a case class as a public API, as it has a wide API surface, including the companion object.

We should follow `SparkThrowable` and define it in Java.

dongjoon-hyun added a commit to apache/spark-connect-swift that referenced this pull request Oct 1, 2025
…th `4.1.0-preview2`

### What changes were proposed in this pull request?

This PR aims to update Spark Connect-generated Swift source code with Apache Spark `4.1.0-preview2`.

### Why are the changes needed?

There are many changes from Apache Spark 4.1.0.

- apache/spark#52342
- apache/spark#52256
- apache/spark#52271
- apache/spark#52242
- apache/spark#51473
- apache/spark#51653
- apache/spark#52072
- apache/spark#51561
- apache/spark#51563
- apache/spark#51489
- apache/spark#51507
- apache/spark#51462
- apache/spark#51464
- apache/spark#51442

To use the latest bug fixes and new messages to develop for new features of `4.1.0-preview2`.

```
$ git clone -b v4.1.0-preview2 https://github.com/apache/spark.git
$ cd spark/sql/connect/common/src/main/protobuf/
$ protoc --swift_out=. spark/connect/*.proto
$ protoc --grpc-swift_out=. spark/connect/*.proto

// Remove empty GRPC files
$ cd spark/connect

$ grep 'This file contained no services' *
catalog.grpc.swift:// This file contained no services.
commands.grpc.swift:// This file contained no services.
common.grpc.swift:// This file contained no services.
example_plugins.grpc.swift:// This file contained no services.
expressions.grpc.swift:// This file contained no services.
ml_common.grpc.swift:// This file contained no services.
ml.grpc.swift:// This file contained no services.
pipelines.grpc.swift:// This file contained no services.
relations.grpc.swift:// This file contained no services.
types.grpc.swift:// This file contained no services.

$ rm catalog.grpc.swift commands.grpc.swift common.grpc.swift example_plugins.grpc.swift expressions.grpc.swift ml_common.grpc.swift ml.grpc.swift pipelines.grpc.swift relations.grpc.swift types.grpc.swift
```

### Does this PR introduce _any_ user-facing change?

Pass the CIs.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #250 from dongjoon-hyun/SPARK-53777.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
cloud-fan added a commit that referenced this pull request Oct 9, 2025
### What changes were proposed in this pull request?

Don't use a Scala case class for `BreakingChangeInfo` and `MitigationConfig`.

### Why are the changes needed?

Per comment: #52256 (comment)

> [cloud-fan](https://github.com/cloud-fan) [5 days ago](#52256 (comment))
> I just realize that we expose it as a public API via SparkThrowable.getBreakingChangeInfo. We shouldn't expose a case class as public API as it has a wide API surface, including the companion object.
> We should follow SparkThrowable and define it in Java.

### Does this PR introduce _any_ user-facing change?

No -- interface is almost the same

### How was this patch tested?

Updated unit tests

### Was this patch authored or co-authored using generative AI tooling?

Used GitHub Copilot, mainly for the `equals` and `hashCode` functions

Closes #52484 from imarkowitz/ian/breaking-changes-case-class.

Lead-authored-by: imarkowitz <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025

Closes apache#52256 from imarkowitz/ian/breaking-changes.

Authored-by: imarkowitz <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025

Closes apache#52484 from imarkowitz/ian/breaking-changes-case-class.

Lead-authored-by: imarkowitz <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
