@HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented May 14, 2024

What changes were proposed in this pull request?

This PR adds the spark.checkpoint.dir configuration so that users can set the checkpoint directory when they submit their application.

Why are the changes needed?

This separates configuration from application logic, so the same application can run with a different checkpoint directory.
In addition, this would be useful for Spark Connect with #46570.

Does this PR introduce any user-facing change?

Yes, it adds a new user-facing configuration.

How was this patch tested?

A unit test was added.

Was this patch authored or co-authored using generative AI tooling?

No.

@HyukjinKwon HyukjinKwon marked this pull request as draft May 14, 2024 05:09
@github-actions github-actions bot added the CORE label May 14, 2024
@HyukjinKwon HyukjinKwon changed the title from "[DO-NOT-MERGE][SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir" to "[SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir" May 15, 2024
@HyukjinKwon HyukjinKwon marked this pull request as ready for review May 15, 2024 05:07
Member

@dongjoon-hyun dongjoon-hyun left a comment

I'm wondering about a corner case. What happens when users set different values for this configuration and SparkContext.setCheckpointDir? This can happen during migration as a human mistake.

  • I guess SparkContext.setCheckpointDir will override this configuration. If so, please document the precedence in the config documentation.
  • Also, I'm wondering if we want to show a proper warning or even to raise exceptions because this is a critical mistake.

@HyukjinKwon
Member Author

> Also, I'm wondering if we want to show a proper warning or even to raise exceptions because this is a critical mistake.

This one, I think it's fine. We already have similar configurations such as spark.log.level vs setLogLevel.
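
The precedence discussed in this exchange can be modeled with a small sketch. It is plain Python, not Spark code: ToyContext and its methods are hypothetical stand-ins, and it assumes the behavior guessed in the review, namely that an explicit SparkContext.setCheckpointDir call overrides spark.checkpoint.dir. The paths are made up for illustration.

```python
from typing import Mapping, Optional

CONF_KEY = "spark.checkpoint.dir"  # configuration name added by this PR


class ToyContext:
    """Hypothetical stand-in for SparkContext; models only checkpoint-dir resolution."""

    def __init__(self, conf: Mapping[str, str]) -> None:
        # The submit-time configuration supplies the initial value, if any.
        self._checkpoint_dir: Optional[str] = conf.get(CONF_KEY)

    def set_checkpoint_dir(self, path: str) -> None:
        # Assumed precedence: an explicit call replaces the configured value.
        self._checkpoint_dir = path

    def get_checkpoint_dir(self) -> Optional[str]:
        return self._checkpoint_dir


# Same application code, different checkpoint directories chosen at submit time.
staging = ToyContext({CONF_KEY: "/tmp/ckpt-staging"})
prod = ToyContext({CONF_KEY: "/tmp/ckpt-prod"})

# The corner case from the review: the configuration and the API call disagree.
prod.set_checkpoint_dir("/tmp/ckpt-override")
```

Under that assumption, the configuration acts only as a default, which is similar in spirit to the spark.log.level vs setLogLevel pair mentioned above.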

@github-actions github-actions bot added the DOCS label May 15, 2024
Contributor

@mridulm mridulm left a comment

Just a minor comment on wording. The code change itself looks good to me.

@HyukjinKwon
Member Author

Merged to master.

6 participants