[FLINK-22972][datastream] Remove StreamOperator#dispose in favour of close and finish #16351
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community.
Automated Checks
Last check on commit c134bf9 (Thu Sep 23 17:55:59 UTC 2021)
✅ no warnings
Mention the bot in a comment to re-run the automated checks.
Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process.
Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:
@flinkbot run azure
gaoyunhaii
left a comment
Hi @dawidwys, many thanks for opening the PR! The PR looks very good in general, and I only left a few comments.
Besides the inline comments:
- It seems we might also need to update the document task_lifecycle.md.
- There also seem to be some operators, namely PythonTimestampsAndWatermarksOperator, CollectSinkOperator, AbstractMapBundleOperator, StreamSortOperator, RowTimeMiniBatchAssginerOperator, TestBoundedMultipleInputOperator and TimestampITCase.CustomOperator, for which some of the logic in close() should be migrated to finish(); some of them also emit records in close().
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java
     */
    @Override
-   public void dispose() throws Exception {
+   public void close() throws Exception {
Although it is definitely not introduced by this PR, I have a bit of a concern here: we rely on the subclass to call super.close(), otherwise the state handler won't get cleaned up, which might in turn cause a resource leak (such as the memory occupied by RocksDB). Currently it seems we have operators such as GenericWriteAheadSink, TemporalProcessTimeJoinOperator and WatermarkAssignerOperator that indeed do not call super.close(). Perhaps we could introduce a final method closeAndCleanupState that gets called by the framework, and call close() from that method?
That's definitely a good observation. I am a bit hesitant to introduce the change, as it implies changing the API of the StreamOperator yet again. Underneath, the framework calls StreamOperator#close. If we wanted to introduce closeAndCleanupState the way you described, so that it calls close(), we'd need to do it in the StreamOperator. If we wanted to do it the other way around, making AbstractStreamOperator#close final and having it call an abstract closeAndCleanupState or similar, we'd need to change all operators and most probably all users' operators, as it's virtually impossible to implement an operator without extending one of the AbstractStreamOperatorV2.
How about we create a JIRA ticket for that and try to fix it once we work on making the operator API "more public"? There is a desire to expose e.g. the MailboxProcessor and similar features in a better thought-through manner.
Many thanks for the explanation! It makes a lot of sense to me, and I also agree with the plan.
For now, perhaps we could first add the missing super.close() calls to those three operators, since some of them do use state?
Ah, right. Missed that bit 🤦 Yes, will update those three operators.
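The closeAndCleanupState idea floated in this thread can be sketched outside Flink. Everything in the block below is hypothetical: `SketchBaseOperator`, `SketchStateHandler`, `ForgetfulOperator` and the unchecked `close()` signature are illustrative stand-ins, not Flink classes, and the thread defers the real change to a follow-up ticket. The sketch only shows why a final framework-facing method would make state cleanup independent of subclasses remembering to call super.close().

```java
/** Hypothetical stand-in for operator state that must be released. */
class SketchStateHandler {
    boolean disposed = false;

    void dispose() {
        disposed = true;
    }
}

/** Hypothetical base class; not Flink's AbstractStreamOperator. */
abstract class SketchBaseOperator {
    final SketchStateHandler stateHandler = new SketchStateHandler();

    /** Entry point a framework would call; final, so subclasses cannot bypass it. */
    public final void closeAndCleanupState() {
        try {
            close();
        } finally {
            // Runs even if close() throws or the subclass never calls super.close().
            stateHandler.dispose();
        }
    }

    /** Subclass hook for releasing operator-specific resources. */
    protected void close() {}
}

/** An operator that overrides close() without calling super.close(). */
class ForgetfulOperator extends SketchBaseOperator {
    boolean resourceReleased = false;

    @Override
    protected void close() {
        resourceReleased = true;
    }
}

class TemplateMethodSketch {
    public static void main(String[] args) {
        ForgetfulOperator op = new ForgetfulOperator();
        op.closeAndCleanupState();
        System.out.println(op.resourceReleased + " " + op.stateHandler.disposed);
    }
}
```

As noted in the reply above, the cost of this shape is an API change for every existing operator, which is why the thread settles on adding the missing super.close() calls for now.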
     */
    @Override
-   public void dispose() throws Exception {
+   public void close() throws Exception {
There might be a similar concern here as with AbstractStreamOperator.
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/tasks/StreamTaskTest.java
...-state-processing-api/src/main/java/org/apache/flink/state/api/output/BoundedStreamTask.java
.../org/apache/flink/table/runtime/operators/multipleinput/MultipleInputStreamOperatorBase.java
…close and finish
This commit cleans up the StreamOperator API with regard to the termination phase and introduces a clean finish() method for flushing all records without releasing resources. The StreamOperator#close method is supposed to flush all records, but currently it also closes all resources, including connections to external systems. We need separate methods for flushing and closing resources because we might need the connections when performing the final checkpoint, once all records are flushed. Moreover, the logic for closing resources is duplicated in the StreamOperator#dispose method.
Many thanks @dawidwys for the updates! The PR LGTM now.
What is the purpose of the change
This PR cleans up the StreamOperator API with regard to the termination phase and introduces a clean finish() method for flushing all records without releasing resources.
The StreamOperator#close method is supposed to flush all records, but currently it also closes all resources, including connections to external systems. We need separate methods for flushing and closing resources because we might need the connections when performing the final checkpoint, once all records are flushed. Moreover, the logic for closing resources is duplicated in the StreamOperator#dispose method.
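The split between flushing and releasing resources can be illustrated with a small, self-contained sketch. It does not use Flink's classes: `SketchOperator`, `FakeConnection`, `BufferingSink` and `LifecycleSketch` are hypothetical stand-ins, and the call order in `run` (finish, then a final checkpoint, then close) is an assumption drawn from the description above rather than a copy of StreamTask.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical operator lifecycle; not Flink's StreamOperator interface. */
interface SketchOperator {
    void processElement(String value);

    void finish();          // flush pending records, keep resources open

    void finalCheckpoint(); // may still need the external connection

    void close();           // release resources
}

/** Hypothetical external connection that records what was written to it. */
class FakeConnection {
    final List<String> written = new ArrayList<>();
    boolean open = true;

    void write(String value) {
        if (!open) {
            throw new IllegalStateException("connection already closed");
        }
        written.add(value);
    }
}

/** Buffers records, flushes them in finish(), releases the connection in close(). */
class BufferingSink implements SketchOperator {
    private final FakeConnection connection;
    private final List<String> buffer = new ArrayList<>();

    BufferingSink(FakeConnection connection) {
        this.connection = connection;
    }

    @Override
    public void processElement(String value) {
        buffer.add(value);
    }

    @Override
    public void finish() {
        for (String value : buffer) {
            connection.write(value);
        }
        buffer.clear();
    }

    @Override
    public void finalCheckpoint() {
        // Would throw if close() had already released the connection.
        connection.write("commit");
    }

    @Override
    public void close() {
        connection.open = false;
    }
}

class LifecycleSketch {
    /** Assumed order: process all input, finish(), final checkpoint, then close(). */
    static FakeConnection run(List<String> input) {
        FakeConnection connection = new FakeConnection();
        SketchOperator sink = new BufferingSink(connection);
        for (String value : input) {
            sink.processElement(value);
        }
        sink.finish();
        sink.finalCheckpoint();
        sink.close();
        return connection;
    }

    public static void main(String[] args) {
        FakeConnection connection = run(Arrays.asList("a", "b"));
        System.out.println(connection.written + " open=" + connection.open);
    }
}
```

If flushing and releasing were a single step, as with the old close(), the `finalCheckpoint` call in this sketch would hit a closed connection and throw.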
Verifying this change
All existing tests pass.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)
Documentation