[SPARK-52511][SDP] Support dry-run mode in spark-pipelines command #51489
python/pyspark/pipelines/cli.py (outdated diff):
    run_parser.add_argument("--spec", help="Path to the pipeline spec.")

    # "dry-run" subcommand
    run_parser = subparsers.add_parser(
@sryza shall we have an end-to-end test for the dry run mode? We should check that it can detect failures without side effects.
Sure thing – just added, in test_spark_connect.py
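As a rough illustration of what such an end-to-end check could look like (this is not the actual test in `test_spark_connect.py`; it assumes `spark-pipelines` is on the PATH and that a failed dry run exits with a non-zero status):

```python
import os
import subprocess
import tempfile
import unittest


class DryRunEndToEndTest(unittest.TestCase):
    """Illustrative sketch only: assumes the `spark-pipelines` CLI is installed
    and that a dry run of an invalid spec exits non-zero."""

    def test_dry_run_detects_errors_without_side_effects(self):
        with tempfile.TemporaryDirectory() as project_dir:
            # A deliberately invalid spec, so validation must fail.
            spec_path = os.path.join(project_dir, "pipeline.yml")
            with open(spec_path, "w") as f:
                f.write("this is not a valid pipeline spec\n")

            result = subprocess.run(
                ["spark-pipelines", "dry-run", "--spec", spec_path],
                capture_output=True,
                text=True,
            )

            # The failure is surfaced to the caller...
            self.assertNotEqual(result.returncode, 0)
            # ...and nothing was written next to the spec (no side effects).
            self.assertEqual(os.listdir(project_dir), ["pipeline.yml"])


if __name__ == "__main__":
    unittest.main()
```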
@sryza seems there is a linter failure: https://github.com/sryza/spark/actions/runs/16335212360/job/46145897514
python/pyspark/pipelines/cli.py (outdated diff):
| "dry-run", | ||
| help="Launch a run that just validates the graph and checks for errors.", | ||
| ) | ||
| run_parser.add_argument("--spec", help="Path to the pipeline spec.") |
It seems to have been added by mistake. Please remove this duplication; we already have this at line 258, @sryza. 😄
Thanks for catching – just fixed
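For context, here is a simplified sketch of how the subcommand registration might look once the duplicated `--spec` argument is removed. The real parser in `python/pyspark/pipelines/cli.py` defines more subcommands and options than shown here, and the `run` help string below is illustrative:

```python
import argparse


def construct_parser() -> argparse.ArgumentParser:
    # Simplified sketch of the spark-pipelines CLI parser.
    parser = argparse.ArgumentParser(prog="spark-pipelines")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # "run" subcommand
    run_parser = subparsers.add_parser("run", help="Run the pipeline.")
    run_parser.add_argument("--spec", help="Path to the pipeline spec.")

    # "dry-run" subcommand: each subparser declares --spec exactly once
    dry_run_parser = subparsers.add_parser(
        "dry-run",
        help="Launch a run that just validates the graph and checks for errors.",
    )
    dry_run_parser.add_argument("--spec", help="Path to the pipeline spec.")

    return parser
```

With this wiring, `spark-pipelines dry-run --spec pipeline.yml` parses to `command == "dry-run"` plus the spec path, and the handler can launch a validation-only run.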
cc @peter-toth
+1, LGTM. Thank you, @sryza.
Merged to master
This pull request was later referenced by the following commit:

…th `4.1.0-preview2`

### What changes were proposed in this pull request?
This PR aims to update Spark Connect-generated Swift source code with Apache Spark `4.1.0-preview2`.

### Why are the changes needed?
There are many changes from Apache Spark 4.1.0.
- apache/spark#52342
- apache/spark#52256
- apache/spark#52271
- apache/spark#52242
- apache/spark#51473
- apache/spark#51653
- apache/spark#52072
- apache/spark#51561
- apache/spark#51563
- apache/spark#51489
- apache/spark#51507
- apache/spark#51462
- apache/spark#51464
- apache/spark#51442

To use the latest bug fixes and new messages to develop for new features of `4.1.0-preview2`.

```
$ git clone -b v4.1.0-preview2 https://github.com/apache/spark.git
$ cd spark/sql/connect/common/src/main/protobuf/
$ protoc --swift_out=. spark/connect/*.proto
$ protoc --grpc-swift_out=. spark/connect/*.proto

// Remove empty GRPC files
$ cd spark/connect
$ grep 'This file contained no services' *
catalog.grpc.swift:// This file contained no services.
commands.grpc.swift:// This file contained no services.
common.grpc.swift:// This file contained no services.
example_plugins.grpc.swift:// This file contained no services.
expressions.grpc.swift:// This file contained no services.
ml_common.grpc.swift:// This file contained no services.
ml.grpc.swift:// This file contained no services.
pipelines.grpc.swift:// This file contained no services.
relations.grpc.swift:// This file contained no services.
types.grpc.swift:// This file contained no services.
$ rm catalog.grpc.swift commands.grpc.swift common.grpc.swift example_plugins.grpc.swift expressions.grpc.swift ml_common.grpc.swift ml.grpc.swift pipelines.grpc.swift relations.grpc.swift types.grpc.swift
```

### Does this PR introduce _any_ user-facing change?
Pass the CIs.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #250 from dongjoon-hyun/SPARK-53777.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Adds a new `dry-run` command to the `spark-pipelines` CLI that launches an execution of a pipeline that doesn't write or read any data, but catches many kinds of errors that would be caught if the pipeline were to actually run (e.g. errors surfaced while validating the dataflow graph).
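To make the intended behavior concrete, here is a minimal, self-contained sketch of the idea (not the actual implementation; all names below are illustrative): both modes build and validate the dataflow graph, but a dry run stops before executing any flows.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PipelineGraph:
    """Toy stand-in for a pipeline's dataflow graph: dataset name -> upstream dependencies."""

    datasets: Dict[str, List[str]] = field(default_factory=dict)

    def validate(self) -> None:
        # Catch errors that can be detected without reading or writing any data,
        # such as a flow that reads from a dataset that was never defined.
        for name, deps in self.datasets.items():
            for dep in deps:
                if dep not in self.datasets:
                    raise ValueError(f"Flow '{name}' reads from undefined dataset '{dep}'")


def execute_flows(graph: PipelineGraph) -> None:
    """Placeholder for real execution; a dry run never reaches this point."""


def run_pipeline(graph: PipelineGraph, dry_run: bool = False) -> None:
    graph.validate()  # both modes register and validate the graph
    if dry_run:
        return  # dry run: stop before touching any data
    execute_flows(graph)  # normal run: actually materialize the datasets


if __name__ == "__main__":
    broken = PipelineGraph({"silver": ["bronze"]})  # "bronze" is never defined
    try:
        run_pipeline(broken, dry_run=True)
    except ValueError as err:
        print(f"dry run caught: {err}")  # no data was read or written
```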
Why are the changes needed?
Leverages the declarative nature of Declarative Pipelines to make pipeline development easier.
Does this PR introduce any user-facing change?
Adds behavior; doesn't change existing behavior.
How was this patch tested?
Ran `dry-run` on the CLI, for both success and error scenarios, plus the end-to-end test in `test_spark_connect.py` mentioned above.

Was this patch authored or co-authored using generative AI tooling?