@sryza sryza commented Jul 15, 2025

What changes were proposed in this pull request?

Adds a new `spark-pipelines` command that launches an execution of a pipeline that neither reads nor writes any data, but catches many kinds of errors that would surface if the pipeline actually ran, e.g.:

  • Syntax errors – e.g. invalid Python or SQL code
  • Analysis errors – e.g. selecting from a table that doesn't exist or selecting a column that doesn't exist
  • Graph validation errors – e.g. cyclic dependencies
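The CLI wiring shown later in this review can be sketched with `argparse`. This is a minimal sketch based on the diff fragments quoted below; `make_parser` and the exact subcommand layout are illustrative, not the actual `spark-pipelines` module structure:

```python
import argparse

def make_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of the spark-pipelines CLI surface.
    parser = argparse.ArgumentParser(prog="spark-pipelines")
    subparsers = parser.add_subparsers(dest="command", required=True)

    run_parser = subparsers.add_parser("run", help="Run the pipeline.")
    run_parser.add_argument("--spec", help="Path to the pipeline spec.")

    # "dry-run" subcommand: validates the graph without reading/writing data.
    dry_run_parser = subparsers.add_parser(
        "dry-run",
        help="Launch a run that just validates the graph and checks for errors.",
    )
    dry_run_parser.add_argument("--spec", help="Path to the pipeline spec.")
    return parser

args = make_parser().parse_args(["dry-run", "--spec", "pipeline.yml"])
print(args.command, args.spec)
```

With this wiring, `spark-pipelines dry-run --spec pipeline.yml` would dispatch to the validation-only path while `spark-pipelines run` performs a real execution.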

Why are the changes needed?

Leverage the declarative nature of Declarative Pipelines to make pipeline development easier.

Does this PR introduce any user-facing change?

Adds behavior; doesn't change existing behavior.

How was this patch tested?

  • Added unit tests
  • Executed dry-run on the CLI, for both success and error scenarios

Was this patch authored or co-authored using generative AI tooling?

@sryza sryza changed the title [SDP] dry run [SPARK-52511][SDP] dry run Jul 15, 2025
@sryza sryza marked this pull request as ready for review July 15, 2025 15:25
@sryza sryza requested review from cloud-fan and gengliangwang July 15, 2025 15:25
```python
run_parser.add_argument("--spec", help="Path to the pipeline spec.")

# "dry-run" subcommand
run_parser = subparsers.add_parser(
```
@sryza shall we have an end-to-end test for the dry run mode? We should check that it can detect failures without side effects.

Sure thing – just added in test_spark_connect.py

@gengliangwang gengliangwang changed the title [SPARK-52511][SDP] dry run [SPARK-52511][SDP] Support dry-run mode in spark-pipelines command Jul 16, 2025
@sryza sryza requested a review from gengliangwang July 17, 2025 03:06
@gengliangwang
"dry-run",
help="Launch a run that just validates the graph and checks for errors.",
)
run_parser.add_argument("--spec", help="Path to the pipeline spec.")
It seems to be added mistakenly. Please remove this duplication because we already have this at line 258, @sryza. 😄
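For context on why such a duplication matters: if the duplicated `add_argument` call lands on the same parser object, `argparse` rejects the second registration of the option string and raises `argparse.ArgumentError` at CLI startup. A minimal sketch (standalone example, not the actual spark-pipelines code):

```python
import argparse

parser = argparse.ArgumentParser(prog="spark-pipelines")
subparsers = parser.add_subparsers(dest="command")
dry_run_parser = subparsers.add_parser("dry-run")
dry_run_parser.add_argument("--spec", help="Path to the pipeline spec.")

# Registering the same option string again on the same parser fails fast.
try:
    dry_run_parser.add_argument("--spec", help="Path to the pipeline spec.")
except argparse.ArgumentError as e:
    duplicate_detected = True
    print(f"argparse rejected the duplicate: {e}")
```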

Thanks for catching – just fixed

@dongjoon-hyun

cc @peter-toth

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM. Thank you, @sryza.

@sryza sryza closed this in 93748cc Jul 18, 2025

sryza commented Jul 18, 2025

Merged to master

dongjoon-hyun added a commit to apache/spark-connect-swift that referenced this pull request Oct 1, 2025
…th `4.1.0-preview2`

### What changes were proposed in this pull request?

This PR aims to update Spark Connect-generated Swift source code with Apache Spark `4.1.0-preview2`.

### Why are the changes needed?

There are many protocol changes in Apache Spark 4.1.0:

- apache/spark#52342
- apache/spark#52256
- apache/spark#52271
- apache/spark#52242
- apache/spark#51473
- apache/spark#51653
- apache/spark#52072
- apache/spark#51561
- apache/spark#51563
- apache/spark#51489
- apache/spark#51507
- apache/spark#51462
- apache/spark#51464
- apache/spark#51442

This picks up the latest bug fixes and new messages needed to develop features against `4.1.0-preview2`.

```
$ git clone -b v4.1.0-preview2 https://github.com/apache/spark.git
$ cd spark/sql/connect/common/src/main/protobuf/
$ protoc --swift_out=. spark/connect/*.proto
$ protoc --grpc-swift_out=. spark/connect/*.proto

// Remove empty GRPC files
$ cd spark/connect

$ grep 'This file contained no services' *
catalog.grpc.swift:// This file contained no services.
commands.grpc.swift:// This file contained no services.
common.grpc.swift:// This file contained no services.
example_plugins.grpc.swift:// This file contained no services.
expressions.grpc.swift:// This file contained no services.
ml_common.grpc.swift:// This file contained no services.
ml.grpc.swift:// This file contained no services.
pipelines.grpc.swift:// This file contained no services.
relations.grpc.swift:// This file contained no services.
types.grpc.swift:// This file contained no services.

$ rm catalog.grpc.swift commands.grpc.swift common.grpc.swift example_plugins.grpc.swift expressions.grpc.swift ml_common.grpc.swift ml.grpc.swift pipelines.grpc.swift relations.grpc.swift types.grpc.swift
```

### Does this PR introduce _any_ user-facing change?

Pass the CIs.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #250 from dongjoon-hyun/SPARK-53777.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>