HDDS-12929. Datanode Should Immediately Trigger Container Close when Volume Full #8460
---
title: Full Volume Handling
summary: Immediately trigger Datanode heartbeat on detecting full volume
date: 2025-05-12
jira: HDDS-12929
status: Design
author: Siddhant Sangwan, Sumit Agrawal
---

<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
## Summary
On detecting a full Datanode volume during write, immediately trigger a heartbeat containing the latest storage report.
## Problem
When a Datanode volume is close to full, the SCM may not be immediately aware, because storage reports are only sent to it every thirty seconds. This can lead to the SCM allocating multiple blocks to containers on a full DN volume, causing performance issues when those writes fail. This proposal partly solves that problem.

In the future (https://issues.apache.org/jira/browse/HDDS-12151) we plan to fail a write if it would exceed the min free space boundary of a volume. To prevent this from happening often, SCM needs to stop allocating blocks to containers on such volumes in the first place.
## Non Goals
This document describes the complete solution at a high level; however, HDDS-12929 will only add the initial Datanode-side code for triggering a heartbeat on detecting a full volume, plus the throttling logic.

Failing the write if it exceeds the min free space boundary is not discussed here.
## Proposed Solution

### What does the Datanode do currently?

In HddsDispatcher, on detecting that the volume being written to is close to full, we add a CloseContainerAction for that container. This is sent to the SCM in the next heartbeat and makes the SCM close that container. This reaction time is OK for a container that is close to full, but not if the volume is close to full.
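For illustration, here is a minimal sketch of that existing flow. The class and method names are placeholders, not the actual HddsDispatcher API; the point is that the action is only queued and rides along with the next scheduled heartbeat, so the reaction time is bounded by the heartbeat interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative stand-in for the action that asks SCM to close a container.
class CloseContainerAction {
  final long containerId;
  final String reason; // e.g. "CONTAINER_FULL" or "VOLUME_FULL"

  CloseContainerAction(long containerId, String reason) {
    this.containerId = containerId;
    this.reason = reason;
  }
}

// Sketch of the current behavior: the action is only queued here and is
// drained by the regular heartbeat task, so SCM learns about it up to one
// heartbeat interval (~30s) later.
class DispatcherSketch {
  private final Queue<CloseContainerAction> pendingActions =
      new ConcurrentLinkedQueue<>();

  /** Called on the write path after data lands on a volume. */
  void afterWrite(long containerId, long containerUsedBytes, long containerMaxBytes,
                  long volumeAvailableBytes, long volumeMinFreeBytes) {
    boolean containerNearFull = containerUsedBytes >= 0.9 * containerMaxBytes;
    boolean volumeNearFull = volumeAvailableBytes <= volumeMinFreeBytes;
    if (containerNearFull || volumeNearFull) {
      // Queued only; nothing is sent to SCM right now.
      pendingActions.add(new CloseContainerAction(containerId,
          volumeNearFull ? "VOLUME_FULL" : "CONTAINER_FULL"));
    }
  }

  /** Drained by the heartbeat task on its normal schedule. */
  List<CloseContainerAction> drainForHeartbeat() {
    List<CloseContainerAction> actions = new ArrayList<>();
    CloseContainerAction action;
    while ((action = pendingActions.poll()) != null) {
      actions.add(action);
    }
    return actions;
  }
}
```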
### Proposal
This is the proposal, explained via a diagram.



Throttling is required so the Datanode doesn't cause a heartbeat storm on detecting that some volumes are full in multiple write calls.
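A minimal sketch of the Datanode-side trigger with throttling, assuming a hypothetical cooldown window and a callback that wakes the heartbeat thread (the names here are illustrative, not the final API):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the proposed trigger: on detecting a full volume, ask the
// heartbeat machinery to run immediately (carrying the latest storage
// report), but at most once per cooldown window to avoid a heartbeat storm
// when many concurrent writes see the same full volume.
class FullVolumeHeartbeatTrigger {
  private final long cooldownMillis;
  private final Runnable triggerHeartbeatNow; // e.g. wakes the heartbeat thread
  private final AtomicLong lastTriggerMillis = new AtomicLong();

  FullVolumeHeartbeatTrigger(long cooldownMillis, Runnable triggerHeartbeatNow) {
    this.cooldownMillis = cooldownMillis;
    this.triggerHeartbeatNow = triggerHeartbeatNow;
  }

  /** Called from the write path when a volume is detected to be full. */
  void onFullVolumeDetected() {
    long now = System.currentTimeMillis();
    long last = lastTriggerMillis.get();
    // Throttle: only one immediate heartbeat per cooldown window, even if
    // many writes observe the full volume at roughly the same time.
    if (now - last >= cooldownMillis
        && lastTriggerMillis.compareAndSet(last, now)) {
      triggerHeartbeatNow.run();
    }
  }
}
```

With, say, a 60-second cooldown, even many concurrent writes that all observe the full volume result in at most one extra heartbeat per minute.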
## Benefits
1. SCM will not include a Datanode in a new pipeline if all the volumes on it are full. The logic to do this already exists; we just update the volume stats in the SCM faster.
2. Close-to-full volumes won't cause frequent write failures.
## Alternatives
Instead of including the list of containers present on the full volume in the Storage Report, we could add the volume ID to the Container Replica proto. In the SCM, this would mean doing a linear scan through all the Container Replica objects present in the system to figure out which containers are on the full volume, which is slow. Alternatively, we could build and maintain a map for this, which is more complex than the proposed solution.
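To make that cost concrete, here is a hypothetical sketch of the rejected alternative on the SCM side (the types and fields are placeholders, not the real ContainerReplica class): finding the affected containers requires touching every replica in the cluster for each full-volume event.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Placeholder stand-in for a container replica record in SCM; the real
// ContainerReplica class and its fields differ.
class ReplicaSketch {
  final long containerId;
  final String datanodeUuid;
  final String volumeId; // the field this alternative would add to the proto

  ReplicaSketch(long containerId, String datanodeUuid, String volumeId) {
    this.containerId = containerId;
    this.datanodeUuid = datanodeUuid;
    this.volumeId = volumeId;
  }
}

class FullVolumeLookupSketch {
  /**
   * The rejected alternative: O(total replicas in the cluster) per
   * full-volume event, because every replica must be inspected.
   */
  static List<Long> containersOnFullVolume(Collection<ReplicaSketch> allReplicas,
                                           String datanodeUuid, String fullVolumeId) {
    List<Long> result = new ArrayList<>();
    for (ReplicaSketch replica : allReplicas) {
      if (replica.datanodeUuid.equals(datanodeUuid)
          && replica.volumeId.equals(fullVolumeId)) {
        result.add(replica.containerId);
      }
    }
    return result;
  }
}
```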
## Implementation Plan
1. HDDS-13045: Initial code for including the node report, triggering the heartbeat, and throttling.
2. HDDS-12151: Fail a write call if it exceeds the min free space boundary.
3. Future Jira: Handle the full volume report on the SCM side - close containers.
4. HDDS-12658: Try not to select full pipelines when allocating a block in SCM.

SCM will check the container size before allocating a block for a container. Currently the container size is reported in the full container report, or when the container state changes from open to another state, so SCM is essentially allocating blocks blindly. Along with these 30s storage reports, I think we should consider reporting the open containers too, to help SCM better understand the open container state and avoid over-allocating blocks for one container. @siddhantsangwan, what do you think?

So you mean include a full container report for all containers in the DN, not just the ones on the full volume? We can use the method StateContext#getFullContainerReportDiscardPendingICR.

An open containers report, not a full container report. Only an open container will grow in size; other containers' sizes will only shrink if there is any change. A timely open container size update will help SCM allocate blocks more precisely.

@siddhantsangwan When a container is 90% full, we add an ICR with a closeContainer action to be sent in the next HB. With the mechanism already present, this can be sent immediately instead of waiting for the next HB; I think the ICR is already added when this is sent, which can be verified.

@ChenSammi Since the DN already decides to stop block allocation when a container is 90% full and sends the ICR, the SCM receiving an open container list may not provide any further benefit, as the action has already been taken by the DN. We need to see whether sending OpenContainer information provides any additional benefit. Sending open containers on every HB can be tracked in a separate JIRA and implemented based on the benefits.