
Conversation

@JacksonYao287
Contributor

What changes were proposed in this pull request?

While investigating an SCM OOM, I found that the datanode always reports full information about containers, pipelines, and nodes.
By default, the ContainerReportPublisher thread runs periodically in the datanode (HDDS_CONTAINER_REPORT_INTERVAL, default 60s), while the HeartbeatEndpointTask, which runs every hdds.heartbeat.interval, should only report the information in the incremental report queue.
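
For illustration, here is a minimal, self-contained sketch of the intended split (all class and method names here are hypothetical, not the actual patch): the full report is refreshed on its own schedule and consumed at most once, while every heartbeat drains the incremental queue.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicReference;

    // Hypothetical sketch, not the actual patch: the full report is refreshed
    // on its own schedule, while incremental reports are queued so none are
    // lost between heartbeats.
    public class ReportScheduleSketch {
      private final AtomicReference<String> fullReport = new AtomicReference<>();
      private final ConcurrentLinkedQueue<String> incrementalQueue =
          new ConcurrentLinkedQueue<>();

      // Called by the publisher thread every hdds.container.report.interval.
      void refreshFullReport(String report) {
        fullReport.set(report);
      }

      // Called when a container/pipeline action completes (the ICR path).
      void addIncrementalReport(String report) {
        incrementalQueue.add(report);
      }

      // Called by each HeartbeatEndpointTask run: the full report is consumed
      // at most once per publish, so heartbeats in between carry only the
      // queued increments.
      List<String> collectReportsForHeartbeat() {
        List<String> reports = new ArrayList<>();
        String full = fullReport.getAndSet(null);
        if (full != null) {
          reports.add(full);
        }
        String incremental;
        while ((incremental = incrementalQueue.poll()) != null) {
          reports.add(incremental);
        }
        return reports;
      }
    }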

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5111

How was this patch tested?

unit test

@JacksonYao287 JacksonYao287 force-pushed the HDDS-5111 branch 2 times, most recently from ef466a2 to 11a2abe Compare April 25, 2021 03:32
@JacksonYao287 JacksonYao287 reopened this Apr 25, 2021
@JacksonYao287 JacksonYao287 force-pushed the HDDS-5111 branch 4 times, most recently from c88378d to f381601 Compare April 26, 2021 10:00
@JacksonYao287 JacksonYao287 reopened this Apr 26, 2021
@JacksonYao287 JacksonYao287 force-pushed the HDDS-5111 branch 2 times, most recently from 26180fd to b43325f Compare April 26, 2021 12:54
@adoroszlai adoroszlai changed the title HDDS-5111.DataNode should not always report full information in heartbeat HDDS-5111. DataNode should not always report full information in heartbeat Apr 27, 2021
@JacksonYao287 JacksonYao287 force-pushed the HDDS-5111 branch 3 times, most recently from 013ae29 to 3203e45 Compare May 6, 2021 11:55
@JacksonYao287 JacksonYao287 reopened this May 7, 2021
@JacksonYao287
Contributor Author

JacksonYao287 commented May 7, 2021

@avijayanhwx @symious @adoroszlai please take a look, thanks!

@JacksonYao287
Contributor Author

@GlenGeng @bshashikant can you help review this PR? Thanks!

@bshashikant
Contributor

Thanks @JacksonYao287 for the patch. I am not very sure whether we should do this. Any incremental change on the datanode with respect to pipelines/containers is initiated by either pipeline or container actions. A report ideally means a full status report from the datanode to SCM. If the reports are sent incrementally, there can be cases where SCM marks container replicas missing and triggers re-replication.

I think it needs some more thought and discussion to arrive at a conclusion on this. cc @nandakumar131, @mukul1987

@bshashikant bshashikant reopened this May 19, 2021
@JacksonYao287
Contributor Author

JacksonYao287 commented May 19, 2021

@bshashikant, thanks for the review!
For now, there are two kinds of report: IncrementalReport, which is put into the incremental message queue, and FullReport, which is refreshed periodically.
For IncrementalReport: when a container/pipeline takes a new action, it wants SCM to know the action is done as soon as possible, so the report is sent through sendICR and a new heartbeat is triggered proactively. SCM then learns of the action, updates the corresponding container/pipeline state as fast as possible, and takes the next step (e.g., replying to OM).
For FullReport: these are published periodically, controlled by hdds.node.report.interval, hdds.container.report.interval, and hdds.pipeline.report.interval. If these intervals are longer than the heartbeat interval, the full reports should not appear in every heartbeat.
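
As a hedged illustration (the values below are made up for the example, not defaults), these intervals are ordinary time-duration settings that can be tuned independently of the heartbeat:

    import java.util.concurrent.TimeUnit;
    import org.apache.hadoop.hdds.conf.OzoneConfiguration;

    public class ReportIntervalExample {
      public static void main(String[] args) {
        OzoneConfiguration conf = new OzoneConfiguration();
        // Illustrative values only: full reports are published far less often
        // than heartbeats, so most heartbeats should carry no full report.
        conf.setTimeDuration("hdds.heartbeat.interval", 30, TimeUnit.SECONDS);
        conf.setTimeDuration("hdds.node.report.interval", 60, TimeUnit.SECONDS);
        conf.setTimeDuration("hdds.container.report.interval", 60, TimeUnit.SECONDS);
        conf.setTimeDuration("hdds.pipeline.report.interval", 60, TimeUnit.SECONDS);
      }
    }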

A report ideally means a full status report from the datanode to SCM

So I think a report may not always mean a full report. In my test environment there are about a million containers; if SCM always receives the full container report, I think it is a heavy burden on the network, SCM, and the datanode.

For now, there are two problems:

1. According to the current implementation, the heartbeat always carries the full report, even when only an incremental report needs to be sent; thus hdds.node.report.interval, hdds.container.report.interval, and hdds.pipeline.report.interval take no effect. I think this is a bug, and this patch fixes it.

2. The current logic of StateContext#addReport is a little confusing, so I refactored it, splitting it into two functions so that the logic is now clear.

 private void sendPipelineReport() {
    if (context != null) {
      // TODO: Send IncrementalPipelineReport instead of full PipelineReport
      context.addReport(context.getParent().getContainer().getPipelineReport());
      context.getParent().triggerHeartbeat();
    }
  }

After refactoring, we can now send an IncrementalPipelineReport instead of the full PipelineReport.
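
A sketch of what that call site could look like after the split (addIncrementalReport and getIncrementalPipelineReport are illustrative names implied by the description above, not necessarily the actual API):

     private void sendPipelineReport() {
        if (context != null) {
          // Hypothetical: queue only the changed pipelines as an incremental
          // report, instead of refreshing the full PipelineReport each time.
          context.addIncrementalReport(
              context.getParent().getContainer().getIncrementalPipelineReport());
          context.getParent().triggerHeartbeat();
        }
      }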

@bshashikant
Contributor

@JacksonYao287, thanks for the explanation.
  public List<GeneratedMessage> getReports(InetSocketAddress endpoint,
      int maxLimit) {
    if (maxLimit < 0) {
      throw new IllegalArgumentException("Illegal maxLimit value: " + maxLimit);
    }
    List<GeneratedMessage> reports = getNonIncrementalReports(endpoint);
    if (maxLimit <= reports.size()) {
      return reports.subList(0, maxLimit);
    } else {
      reports.addAll(getIncrementalReports(endpoint, maxLimit - reports.size()));
      return reports;
    }
  }

The code here suggests it still tries to send the entire report if it is below the configured max limit, and otherwise sends the incremental report. This will be sent in each HeartbeatEndpointTask. Is my understanding correct?

@bshashikant
Contributor

bshashikant commented May 20, 2021

@avijayanhwx, can you have a look at this? This is related to the report-handling changes done for Recon, as well as those done for https://issues.apache.org/jira/browse/HDDS-4404.

@JacksonYao287
Contributor Author

Thanks @bshashikant!

The code here suggests it still tries to send the entire report if it is below the configured max limit, and otherwise sends the incremental report. This will be sent in each HeartbeatEndpointTask. Is my understanding correct?

  public List<GeneratedMessage> getAllAvailableReports(
      InetSocketAddress endpoint) {
    return getReports(endpoint, Integer.MAX_VALUE);
  }

For now, the code above is the only place where StateContext#getReports is called, so maxLimit is always Integer.MAX_VALUE. In practice, maxLimit has no effect, and the full report is sent in every HeartbeatEndpointTask.

By the way, I think we should not limit the total size of a heartbeat report. What we should do is make sure the datanode always sends exactly what it should send, which is controlled by the parameters in the configuration file.

@avijayanhwx PTAL

@avijayanhwx
Contributor

Thanks for working on this @JacksonYao287. I understand the approach being implemented here, and the solution looks fine to me. In your setup, is the OOM caused by multiple reports being bottlenecked at the event queue layer?

cc @smengcl for review as well.

@JacksonYao287
Contributor Author

JacksonYao287 commented Jun 2, 2021

Thanks @avijayanhwx for the review!

In your setup, is the OOM caused due to multiple reports being bottlenecked at the event queue layer?

Yes. When there is a large number of containers (in my cluster, about 1 million containers; the heartbeat interval is set to 2s and the container size to 128MB), and the datanode always sends the full report, the single-threaded report handler cannot keep up. More and more container reports pile up waiting to be handled, and this leads to the SCM OOM.
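
For a rough sense of scale (the per-entry size below is an assumption for illustration, not a measured figure): at roughly 100 bytes per container entry, a full report for 1,000,000 containers is on the order of 100 MB, regenerated for every 2s heartbeat. A single-threaded handler cannot drain reports at that rate, so the backlog keeps growing until the heap is exhausted.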

@smengcl can you help review this PR?

Contributor

@smengcl smengcl left a comment


Thanks @JacksonYao287 . The overall logic makes sense to me from reading it.
This patch seeks to eliminate repeatedly sending the same full report to the same endpoint, which was indeed a problem before this patch.

Overall lgtm. Minor nits inline.

btw have you done any external / manual testing on this?

…ozone/container/common/statemachine/StateContext.java


add Code Comments

Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
@JacksonYao287
Contributor Author

Thanks @smengcl very much for the review!

btw have you done any external / manual testing on this?

I have applied this patch to my own code, and it has been running in my test cluster for several weeks. So far it seems to work well.

Jackson Yao and others added 2 commits June 3, 2021 16:10
…ozone/container/common/statemachine/StateContext.java

Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
…ozone/container/common/statemachine/StateContext.java

Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
Jackson Yao added 4 commits June 3, 2021 16:49
@JacksonYao287
Contributor Author

Updated!
@avijayanhwx @smengcl PTAL, thanks

Contributor

@avijayanhwx avijayanhwx left a comment


LGTM +1

@JacksonYao287
Contributor Author

Thanks for the review @bshashikant @avijayanhwx @smengcl !
Can we merge this PR now?

Jackson Yao and others added 2 commits June 5, 2021 18:21
…ozone/container/common/statemachine/StateContext.java


check before AtomicReference#get

Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
@smengcl smengcl merged commit 39954ad into apache:master Jun 7, 2021
@smengcl
Contributor

smengcl commented Jun 7, 2021

Thanks @JacksonYao287 for the contribution!

@JacksonYao287 JacksonYao287 deleted the HDDS-5111 branch June 8, 2021 01:42