
Conversation


@siddhantsangwan (Contributor) commented May 20, 2025

What changes were proposed in this pull request?

This pull request implements a part of the design proposed in HDDS-12929. It only covers detecting a full volume, getting the latest storage report, adding the container action, and then immediately triggering (or throttling) a heartbeat.
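For orientation, here is a rough sketch of that flow, pieced together from the code excerpts discussed below. The method name and the volume accessors (getAvailable, getMinFreeSpace) are illustrative assumptions; only the three report/heartbeat calls are taken verbatim from the patch.

// Illustrative sketch only: method name and volume accessors are assumptions,
// the report/heartbeat calls mirror the excerpt reviewed further down.
private void maybeReportFullVolume(HddsVolume volume) {
  // 1. Detect a full volume: available space has dropped to the min free limit.
  if (volume.getAvailable() - volume.getMinFreeSpace() > 0) {
    return;
  }
  // 2. Throttle so that at most one immediate heartbeat goes out per interval
  //    (see the fullVolumeLastHeartbeatTriggerMs discussion below).
  // 3. Refresh the latest storage report and trigger the heartbeat right away.
  nodeReport = context.getParent().getContainer().getNodeReport();
  context.refreshFullReport(nodeReport);
  context.getParent().triggerHeartbeat();
}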

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13045

How was this patch tested?

Modified existing unit tests. Also did some manual testing using the Ozone docker compose cluster.

a. Simulated a close-to-full volume with a capacity of 2 GB, available space of 150 MB, and min free space of 100 MB. Datanode log:

2025-05-20 09:47:05,899 [main] INFO volume.HddsVolume: HddsVolume: { id=DS-64dd669c-71fe-492f-903c-4fc7dbe4440a dir=/data/hdds/hdds type=DISK capacity=2147268899 used=1990197248 available=157071651 minFree=104857600 committed=0 }

b. Wrote 100 MB of data using freon, with the expectation that an immediate heartbeat would be triggered as soon as the available space dropped to 100 MB. The datanode log shows that this happened at 09:50:52:

2025-05-20 09:50:52,028 [f8714dd7-31fc-4c63-9703-6fdb1a59b5c4-ChunkWriter-7-0] INFO impl.HddsDispatcher: Triggering heartbeat for full volume /data/hdds/hdds, with node report storageReport {
   storageUuid: "DS-bd34474b-8fd4-49be-be78-72e708b543c0"
   storageLocation: "/data/hdds/hdds"
   capacity: 2147268899
   scmUsed: 2042626048
   remaining: 104642851
   storageType: DISK
   failed: false
   committed: 0
   freeSpaceToSpare: 104857600
 }
 metadataStorageReport {
   storageLocation: "/data/metadata/ratis"
   storageType: DISK
   capacity: 2147268899
   scmUsed: 1990197248
   remaining: 157071651
   failed: false
 }

c. In the SCM, the last storage report BEFORE the write operation was received at 09:50:09:

2025-05-20 09:50:09,399 [IPC Server handler 12 on default port 9861] INFO server.SCMDatanodeHeartbeatDispatcher: Dispatching Node Report storageReport {
storageUuid: "DS-27210be2-ee53-4035-a3a3-63ec8a162456"
   storageLocation: "/data/hdds/hdds"
   capacity: 2147268899
   scmUsed: 1990197248
   remaining: 157071651
   storageType: DISK
   failed: false
   committed: 0
   freeSpaceToSpare: 104857600
 }
 metadataStorageReport {
   storageLocation: "/data/metadata/ratis"
   storageType: DISK
   capacity: 2147268899
   scmUsed: 1990197248
   remaining: 157071651
   failed: false
 }

So, the next storage report should be received a minute later, at 09:51:09, unless one is triggered immediately due to a full volume. The SCM log shows that the immediately triggered report was received at 09:50:52, corresponding to the DN log:

2025-05-20 09:50:52,033 [IPC Server handler 4 on default port 9861] INFO server.SCMDatanodeHeartbeatDispatcher: Dispatching Node Report storageReport {
   storageUuid: "DS-bd34474b-8fd4-49be-be78-72e708b543c0"
   storageLocation: "/data/hdds/hdds"
   capacity: 2147268899
   scmUsed: 2042626048
   remaining: 104642851
   storageType: DISK
   failed: false
   committed: 0
   freeSpaceToSpare: 104857600
 }
 metadataStorageReport {
   storageLocation: "/data/metadata/ratis"
   storageType: DISK
   capacity: 2147268899
   scmUsed: 1990197248
   remaining: 157071651
   failed: false
 }

The next storage report was received at the expected time of 09:51:09, showing that throttling also worked.
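As a quick sanity check of the numbers above, assuming the trigger condition is remaining <= minFree (freeSpaceToSpare), as described in step b:

09:50:09 report: remaining 157,071,651 B >  minFree 104,857,600 B -> volume not full, no immediate heartbeat
09:50:52 report: remaining 104,642,851 B <= minFree 104,857,600 B -> volume full, immediate heartbeat triggered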

Green CI in my fork: https://github.com/siddhantsangwan/ozone/actions/runs/15135787944/job/42547140475

@siddhantsangwan marked this pull request as ready for review on May 21, 2025 05:02
nodeReport = context.getParent().getContainer().getNodeReport();
context.refreshFullReport(nodeReport);
context.getParent().triggerHeartbeat();
LOG.info("Triggering heartbeat for full volume {}, with node report: {}.", volume, nodeReport);
@siddhantsangwan (Contributor, Author):

This is on the write path, so we must be extra careful about performance. An info log will reduce performance, but I wonder if it's okay in this case because this won't happen often. What do others think?

@siddhantsangwan (Contributor, Author):

Moreover, the future plan is to fail the write anyway if the size exceeds the min free and reserved space boundary.

@peterxcli (Member) left a comment:

Thanks @siddhantsangwan for this improvement!

this.slowOpThresholdNs = getSlowOpThresholdMs(conf) * 1000000;
fullVolumeLastHeartbeatTriggerMs = new AtomicLong(-1);
long heartbeatInterval =
config.getTimeDuration("hdds.heartbeat.interval", 30000, TimeUnit.MILLISECONDS);
@ChenSammi (Contributor) commented May 27, 2025:

Can we call HddsServerUtil#getScmHeartbeatInterval instead?

And there is HDDS_NODE_REPORT_INTERVAL for the node report. Shall we use the node report property instead of the heartbeat property?

@siddhantsangwan (Contributor, Author):

HDDS_NODE_REPORT_INTERVAL is 1 minute; might that be too long?

@ChenSammi (Contributor) commented May 30, 2025:

1m or 3s doesn't matter, because you always send out the first heartbeat immediately. This 1m is used to control the throttling, right?

@siddhantsangwan (Contributor, Author):

Yes, it's for throttling.
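For illustration, the throttle being discussed could look roughly like this; the key string and default are assumptions based on this thread (HDDS_NODE_REPORT_INTERVAL, 1 minute), the field name mirrors the excerpt above, and neither is necessarily the final code:

// Illustrative only: key string and default are assumptions based on this thread.
long nodeReportIntervalMs =
    config.getTimeDuration("hdds.node.report.interval", 60000, TimeUnit.MILLISECONDS);
long now = System.currentTimeMillis();
long last = fullVolumeLastHeartbeatTriggerMs.get();
// Send at most one immediate heartbeat per interval; later full-volume hits
// are left to the regular periodic node report.
if (now - last >= nodeReportIntervalMs
    && fullVolumeLastHeartbeatTriggerMs.compareAndSet(last, now)) {
  // trigger the immediate heartbeat here
}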

try {
handleFullVolume(container.getContainerData().getVolume());
} catch (StorageContainerException e) {
ContainerUtils.logAndReturnError(LOG, e, msg);
Contributor:

Are we going to return here?

@siddhantsangwan (Contributor, Author):

Good catch, but I'm not sure. There was an exception in getting the node report, but does that mean we should fail the write? Maybe we should still let the write continue here; otherwise, an intermittent or non-severe exception could keep failing writes. What do you think?

Contributor:

It's OK not to return here, but instead of calling ContainerUtils.logAndReturnError, you can probably just log the failure message.
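A minimal sketch of that suggestion (log and continue, instead of building an error response); the exact message is of course up to the author:

try {
  handleFullVolume(container.getContainerData().getVolume());
} catch (StorageContainerException e) {
  // Log and continue: an intermittent failure to build the node report
  // should not fail the client's write.
  LOG.warn("Failed to handle full volume while handling request: {}", msg, e);
}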

@siddhantsangwan (Contributor, Author):

To test whether the logging is proper, I added a new test that throws an exception. Here's what the logs look like:

2025-05-30 16:01:08,027 [main] WARN  impl.HddsDispatcher (HddsDispatcher.java:dispatchRequest(354)) - Failed to handle full volume while handling request: cmdType: WriteChunk
containerID: 1
datanodeUuid: "c6842f19-cbc5-47ca-bce0-f5bc859ef807"
writeChunk {
  blockID {
    containerID: 1
    localID: 1
    blockCommitSequenceId: 0
  }
  chunkData {
    chunkName: "36b4d6b58215a7da96e3bf71a602e3ea_stream_1_chunk_1"
    offset: 0
    len: 36
    checksumData {
      type: NONE
      bytesPerChecksum: 0
    }
  }
  data: "b0bc4858-a308-417d-b363-0631e07b97ec"
}

org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Failed to create node report when handling full volume /var/folders/jp/39hcmgjx4yb_kry3ydxb3c7r0000gn/T/junit-110499014917526916. Volume Report: { id=DS-db481691-4055-404b-8790-f375e6d41215 dir=/var/folders/jp/39hcmgjx4yb_kry3ydxb3c7r0000gn/T/junit-110499014917526916/hdds type=DISK capacity=499 used=390 available=109 minFree=100 committed=50 }
	at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.handleFullVolume(HddsDispatcher.java:481)
	at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:352)
	at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$1(HddsDispatcher.java:199)
	at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
	at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:198)
	at org.apache.hadoop.ozone.container.common.impl.TestHddsDispatcher.testExceptionHandlingWhenVolumeFull(TestHddsDispatcher.java:430)
...

*/
private void handleFullVolume(HddsVolume volume) throws StorageContainerException {
long current = System.currentTimeMillis();
long last = fullVolumeLastHeartbeatTriggerMs.get();
Contributor:

Consider the case where different volumes get full: for example, /data1 gets full at P0 and /data2 gets full at P1, with (P1 - P0) < interval. Do we expect two immediate container reports, or one report?

@siddhantsangwan (Contributor, Author):

Currently we will only send one report. I think this is fine because the report contains info about all the volumes. However, there's a discussion going on here: #8460 (comment).

Contributor:

I don't have a good answer for this after thinking about it for a while. Ideally, if we want to send an immediate heartbeat when one volume is full, we should respect each volume and send a heartbeat for each volume as it becomes full; but considering the complexity needed to achieve that, I doubt it's worth doing.

Besides the heartbeat sent here, regular node reports with storage info are sent every 60s. If we only send one report regardless of which volume, then we probably only need to send the first one and let the regular periodic node reports handle the rest.

@siddhantsangwan (Contributor, Author):

Ok, let's stick to the current implementation then. I'll change the interval to the node report interval instead of the heartbeat interval.

Contributor:

I think the purpose of sending the full-volume report is to avoid pipeline and container creation. Now the node report is throttled, and hence closing containers is implicitly throttled too. The initial purpose was to close containers immediately, to avoid new block allocation during the heartbeat interval (i.e. 30 seconds).

This may be similar to sending a DN heartbeat; the only advantage here is that the first failure within 1 minute is reported immediately, while all later failures are throttled.

For the node report, there is a configuration discovered at the SCM to avoid new container allocation, "hdds.datanode.storage.utilization.critical.threshold". We need to recheck the overall target of the problem to solve, optimize the configuration, and fix the inconsistency.

cc: @ChenSammi

@siddhantsangwan (Contributor, Author):

> For the node report, there is a configuration discovered at the SCM to avoid new container allocation, "hdds.datanode.storage.utilization.critical.threshold". We need to recheck the overall target of the problem to solve, optimize the configuration, and fix the inconsistency.

As discussed, this is dead code in Ozone and is not used anywhere.

@siddhantsangwan (Contributor, Author):

Thanks for the reviews! I've addressed comments in the latest commit.

@sumitagrawl (Contributor) left a comment:

LGTM

@peterxcli (Member) left a comment:

Thanks @siddhantsangwan for updating the patch, LGTM!

@sumitagrawl (Contributor) left a comment:

LGTM

@siddhantsangwan (Contributor, Author) commented Jun 9, 2025:

> I think the purpose of sending the full-volume report is to avoid pipeline and container creation. Now the node report is throttled, and hence closing containers is implicitly throttled too. The initial purpose was to close containers immediately, to avoid new block allocation during the heartbeat interval (i.e. 30 seconds).
> This may be similar to sending a DN heartbeat; the only advantage here is that the first failure within 1 minute is reported immediately, while all later failures are throttled.

Based on this comment, we decided to trigger a heartbeat immediately when:

  1. The container is (close to) full (the container-full check already exists).
  2. The volume is full EXCLUDING committed space (reserved - available - min free <= 0); see the sketch below. This is because when a volume is full INCLUDING committed space (reserved - available - committed - min free <= 0), open containers can still accept writes, so the current behaviour of sending a close-container action when the volume is full including committed space is a bug.
  3. The container is unhealthy (this is existing behaviour).

We decided not to send volume reports in the immediate heartbeat and to instead rely on regular node reports for that. This allows us to make the throttling per container.
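For illustration only, the two checks in point 2 might look roughly like this, assuming available already has reserved space subtracted and using hypothetical accessor names:

// Hypothetical accessor names; illustrative of the two conditions above, not the actual patch.
long free = volume.getAvailable() - volume.getMinFreeSpace();

// Full EXCLUDING committed space: this should trigger the immediate heartbeat.
boolean fullExcludingCommitted = free <= 0;

// Full INCLUDING committed space: open containers can still write into their
// committed reservation, so this alone should not trigger a close-container action.
boolean fullIncludingCommitted = free - volume.getCommittedBytes() <= 0;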

Closing this PR; opened a new PR instead: #8590
