c21 (Contributor) commented Jan 25, 2022

What changes were proposed in this pull request?

FileCommitProtocol is the class that commits Spark job output (staging file & directory renaming, etc.). During Spark 3.2 development, we added new functions to this class to allow more flexible output file naming. We didn't delete the existing file naming functions (newTaskTempFile(ext) & newTaskTempFileAbsPath(ext)), because we were aware that many downstream projects and codebases had already written their own custom implementations of FileCommitProtocol. Deleting the existing functions would be a breaking change for them when upgrading Spark, and we would like to avoid this unpleasant surprise for anyone if possible. But we also need to clean up legacy code as we evolve our codebase.

So as the next step, I would like to propose:

Spark 3.3 (now): Add the @deprecated annotation to the legacy functions in FileCommitProtocol - newTaskTempFile(ext) & newTaskTempFileAbsPath(ext).
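
To illustrate the idea, here is a minimal sketch of deprecating a legacy naming function while keeping it usable by existing subclasses. `SketchCommitProtocol`, `NameSpec`, and `LegacyOnlyProtocol` are hypothetical, simplified stand-ins and not the actual Spark classes or signatures; the deprecation message and the delegation logic are assumptions made for illustration.

```scala
// Hypothetical stand-in for a file name spec; not the Spark class.
final case class NameSpec(prefix: String, suffix: String)

// Simplified model of a commit protocol with a legacy and a newer naming hook.
abstract class SketchCommitProtocol {
  // Legacy hook: still declared, so existing subclasses keep compiling,
  // but callers now get a deprecation warning from the Scala compiler.
  @deprecated("use newTaskTempFile(dir, spec) instead", "3.3.0")
  def newTaskTempFile(dir: Option[String], ext: String): String

  // Newer hook: by default it falls back to the legacy function when only
  // a suffix is requested, and rejects prefixes it cannot express.
  def newTaskTempFile(dir: Option[String], spec: NameSpec): String =
    if (spec.prefix.isEmpty) newTaskTempFile(dir, spec.suffix)
    else throw new UnsupportedOperationException(
      "file name prefix is not supported by this sketch")
}

// A downstream implementation that only knows about the legacy hook.
class LegacyOnlyProtocol extends SketchCommitProtocol {
  override def newTaskTempFile(dir: Option[String], ext: String): String =
    dir.getOrElse("staging") + "/part-00000" + ext
}
```

In this sketch, a downstream implementation such as `LegacyOnlyProtocol` continues to work unchanged, while the warning nudges implementers toward the newer function before the legacy one is eventually removed.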

Why are the changes needed?

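To clean up the codebase: the annotation signals that the legacy file naming functions are superseded by the ones added in Spark 3.2.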

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests.

github-actions bot added the CORE label Jan 25, 2022
c21 (Contributor Author) commented Jan 25, 2022

cc @cloud-fan and @dongjoon-hyun.
I also started a discussion thread on the dev mailing list, "[DISCUSS] Deprecate legacy file naming functions in FileCommitProtocol", as requested in #33012 (review).

gengliangwang (Member) left a comment

+1

HyukjinKwon (Member) commented

Merged to master.

c21 (Contributor Author) commented Jan 26, 2022

Thank you all!

c21 deleted the file-naming branch January 26, 2022 00:59
senthh pushed a commit to senthh/spark-1 that referenced this pull request Feb 3, 2022
…n FileCommitProtocol

Closes apache#35311 from c21/file-naming.

Authored-by: Cheng Su
Signed-off-by: Hyukjin Kwon