HDDS-5111. DataNode should not always report full information in heartbeat #2182
|
@avijayanhwx @symious @adoroszlai please take a look, thanks! |
|
@GlenGeng @bshashikant can you help review this PR? Thanks. |
|
Thanks @JacksonYao287 for the patch. I am not sure whether we should do this. Any incremental change on the datanode with respect to pipelines or containers is initiated by either pipeline actions or container actions. A report ideally means a full status report from the datanode to SCM. If reports are sent incrementally, there can be cases where SCM marks container replicas as missing and triggers re-replication. I think this needs more thought and more discussion before we arrive at a conclusion. cc @nandakumar131, @mukul1987 |
|
@bshashikant, thanks for the review!
I think a report may not always mean a full report. In my test environment there are about a million containers. If SCM always receives a full container report, I think it will be a heavy burden on the network, SCM, and the datanode. For now, there are two problems. 2. The current logic of "scmcontex#addreport" is a little confusing, so I refactored it and split it into two functions, which makes the logic clearer. After the refactoring, we can send an IncrementalPipelineReport instead of a full PipelineReport. |
|
@JacksonYao287, thanks for the explanation. The code here suggests that it still tries to send the entire report if it is smaller than the configured max limit, and otherwise sends the incremental report. This will be sent in each HeartbeatEndpointTask. Is my understanding correct? |
|
@avijayanhwx, can you have a look at this? It is related to the report handling changes made for Recon in https://issues.apache.org/jira/browse/HDDS-4404. |
|
Thanks @bshashikant!
For now, the code above is the only place where ... By the way, I think we should not limit the total size of a heartbeat report. What we should do is make sure that the datanode always sends what it should send, which is controlled by the parameters in the configuration file. @avijayanhwx PTAL |
|
Thanks for working on this @JacksonYao287. I understand the approach being implemented here, and the solution looks fine to me. In your setup, is the OOM caused by multiple reports being bottlenecked at the event queue layer? cc @smengcl for review as well. |
|
Thanks @avijayanhwx for the review!
Yes. When there is a large number of containers (in my cluster there are about 1 million containers, the heartbeat interval is set to 2s, and the container size is set to 128m) and the datanode always sends a full report, the single-threaded report handler may not keep up with the load. More and more container reports then wait to be handled, and this leads to the SCM OOM. @smengcl can you help review this PR? |
smengcl
left a comment
Thanks @JacksonYao287. The overall logic makes sense to me from reading it.
This patch seeks to stop the datanode from sending the same full report to the same endpoint over and over again. This was indeed a problem before this patch.
Overall LGTM. Minor nits inline.
By the way, have you done any external or manual testing on this?
...ervice/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/StateContext.java
…ozone/container/common/statemachine/StateContext.java add Code Comments Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
|
Thanks very much for the review, @smengcl!
I have applied this patch to my own code, and it has been running in my test cluster for several weeks. So far it seems to work well. |
…ozone/container/common/statemachine/StateContext.java Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
…ozone/container/common/statemachine/StateContext.java Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
add a new full report type: CRL_STATUS_REPORT_PROTO
|
updated ! |
avijayanhwx
left a comment
LGTM +1
|
Thanks for the review @bshashikant @avijayanhwx @smengcl ! |
…ozone/container/common/statemachine/StateContext.java check before AtomicReference#get Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
|
Thanks @JacksonYao287 for the contribution! |
What changes were proposed in this pull request?
While investigating an SCM OOM, I found that the datanode always reports full information about containers, pipelines, and the node.
By default, the ContainerReportPublisher thread runs periodically in the datanode (HDDS_CONTAINER_REPORT_INTERVAL, default 60s), and the HeartbeatEndpointTask, which also runs periodically (hdds.heartbeat.interval), should only report the information in incrementalReportQueue.
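The idea described above can be illustrated with a minimal sketch. This is not the actual StateContext code from the patch: the class `HeartbeatReportBuffer` and its method names are hypothetical, reports are modelled as plain strings, and only a single endpoint is considered. It assumes the intended behavior is that a full report is handed to a heartbeat at most once after the publisher produces it, while incremental reports are drained on every heartbeat.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not the Ozone StateContext API.
class HeartbeatReportBuffer {
  // Latest full report that has not been sent yet; null once a heartbeat took it.
  private final AtomicReference<String> pendingFullReport = new AtomicReference<>();
  // Incremental reports waiting for the next heartbeat.
  private final Queue<String> incrementalReportQueue = new ConcurrentLinkedQueue<>();

  // Called by the periodic publisher; replaces any full report that was not sent yet.
  void publishFullReport(String report) {
    pendingFullReport.set(report);
  }

  // Called whenever a container or pipeline changes state.
  void addIncrementalReport(String report) {
    incrementalReportQueue.add(report);
  }

  // Called on each heartbeat: returns the full report only if a new one is pending,
  // followed by every queued incremental report.
  List<String> drainForHeartbeat() {
    List<String> toSend = new ArrayList<>();
    String full = pendingFullReport.getAndSet(null); // take it so it is not re-sent
    if (full != null) {
      toSend.add(full);
    }
    String incremental;
    while ((incremental = incrementalReportQueue.poll()) != null) {
      toSend.add(incremental);
    }
    return toSend;
  }

  public static void main(String[] args) {
    HeartbeatReportBuffer buffer = new HeartbeatReportBuffer();
    buffer.publishFullReport("full-1");
    buffer.addIncrementalReport("icr-1");
    System.out.println(buffer.drainForHeartbeat()); // prints [full-1, icr-1]
    System.out.println(buffer.drainForHeartbeat()); // prints []
  }
}
```

With this shape, heartbeats that fire between two publisher runs carry only incremental reports, so the receiver does not have to process the same full report repeatedly.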
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-5111
How was this patch tested?
unit test