
Conversation

@rich7420
Contributor

What changes were proposed in this pull request?

This PR adds a configurable multiplier (ozone.scm.container.space.requirement.multiplier, default 2.0) that increases the space check from 5GB to 10GB during container allocation. This provides a buffer so that multiple allocations don't fail when there's barely enough space.
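The check described above can be sketched as follows. This is a minimal illustration, not the actual Ozone code: the class and method names (SpaceCheckSketch, hasEnoughSpace) are made up for this example.

```java
// Minimal sketch of a multiplied space check; names are illustrative,
// not the actual Ozone API.
public class SpaceCheckSketch {

  static boolean hasEnoughSpace(long freeBytes, long containerSizeBytes, double multiplier) {
    long required = (long) (containerSizeBytes * multiplier);
    return freeBytes >= required;
  }

  public static void main(String[] args) {
    long gb = 1024L * 1024 * 1024;
    // With the default multiplier of 2.0 the check requires 10GB,
    // so a datanode with only 6GB free is rejected up front.
    System.out.println(hasEnoughSpace(6 * gb, 5 * gb, 2.0));  // false
    System.out.println(hasEnoughSpace(12 * gb, 5 * gb, 2.0)); // true
  }
}
```

The point of the buffer is that a node with just over one container's worth of free space no longer passes the check, so fewer concurrent allocations land on a nearly full node.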

What is the link to the Apache JIRA

HDDS-13922

How was this patch tested?

All tests passed in:
https://github.com/rich7420/ozone/actions/runs/19655213061

Member

@peterxcli peterxcli left a comment


Thanks @rich7420 for this patch! Left some comments.

cat target/${{ inputs.script }}/summary.md >> $GITHUB_STEP_SUMMARY
fi
hadoop-ozone/dev-support/checks/_summary.sh target/${{ inputs.script }}/summary.txt
hadoop-ozone/dev-support/checks/_summary.sh target/${{ inputs.script }}/summary.txt || true
Member

Suggested change
hadoop-ozone/dev-support/checks/_summary.sh target/${{ inputs.script }}/summary.txt || true
hadoop-ozone/dev-support/checks/_summary.sh target/${{ inputs.script }}/summary.txt

DEVELOCITY_ACCESS_KEY: ${{ secrets.DEVELOCITY_ACCESS_KEY }}
- name: Summary of failures
run: hadoop-ozone/dev-support/checks/_summary.sh target/unit/summary.txt
run: hadoop-ozone/dev-support/checks/_summary.sh target/unit/summary.txt || true
Member

Suggested change
run: hadoop-ozone/dev-support/checks/_summary.sh target/unit/summary.txt || true
run: hadoop-ozone/dev-support/checks/_summary.sh target/unit/summary.txt

continue-on-error: true
- name: Summary of failures
run: hadoop-ozone/dev-support/checks/_summary.sh target/${{ github.job }}/summary.txt
run: hadoop-ozone/dev-support/checks/_summary.sh target/${{ github.job }}/summary.txt || true
Member

Suggested change
run: hadoop-ozone/dev-support/checks/_summary.sh target/${{ github.job }}/summary.txt || true
run: hadoop-ozone/dev-support/checks/_summary.sh target/${{ github.job }}/summary.txt

"ozone.scm.container.size";
public static final String OZONE_SCM_CONTAINER_SIZE_DEFAULT = "5GB";

public static final String OZONE_SCM_CONTAINER_SPACE_REQUIREMENT_MULTIPLIER =
Member

I’m thinking this patch only needs to add a config to set the required data size (dataSizeRequired), which we can then read from ReplicationManagerUtil here: https://github.com/peterxcli/ozone/blob/7868a862ff616154e4bd9ab52dada929e15042ec/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManagerUtil.java#L93-L94.
@siddhantsangwan does this approach look correct?

Contributor Author

Thanks @peterxcli! I've simplified the approach: removed the new config entirely and directly used HddsServerUtil.requiredReplicationSpace(), which already provides the 2x multiplier.
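For reference, a hedged sketch of what a 2x replication-space helper amounts to. The real HddsServerUtil.requiredReplicationSpace() may differ in signature and details; this stand-in is purely illustrative.

```java
public class RequiredSpaceSketch {

  // Illustrative stand-in for HddsServerUtil.requiredReplicationSpace():
  // reserve room for the container being written plus one extra container's
  // worth of space as a buffer (the 2x factor mentioned above).
  static long requiredReplicationSpace(long maxContainerSizeBytes) {
    return 2 * maxContainerSizeBytes;
  }

  public static void main(String[] args) {
    long fiveGb = 5L * 1024 * 1024 * 1024;
    // 5GB container size -> 10GB required free space.
    System.out.println(requiredReplicationSpace(fiveGb));
  }
}
```

Reusing an existing helper keeps the buffer in one place instead of introducing a second, overlapping configuration knob.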

</property>
<property>
<name>ozone.scm.container.space.requirement.multiplier</name>
<value>2.0</value>


It would solve the problem with two concurrent allocations, which would both pass. But with increased concurrency we could expect the same failure.

Contributor

I don't think we even need to add one more config here; instead, this value can be 5x or 10x of the container size, since the degree of concurrency is unpredictable here. Even if concurrency is very high on the same DN at the same time, the client always has a mechanism to retry on another pipeline.

@yandrey321

I'd rather suggest graceful handling of the out-of-space event from the DN during container creation, and if it fails, using one of the already opened containers.

The other approach would be serializing container creation if the DN has less than 5-10% free space.

@peterxcli
Member

peterxcli commented Nov 26, 2025

Correct me if I'm wrong.
IMO, the best way to resolve this from the root cause might be a similar approach to: https://issues.apache.org/jira/browse/HDDS-12810 (#8360)

But I haven't looked into the related code, so I'm not sure whether we can do the same thing at the SCM side.
cc @ChenSammi @siddhantsangwan @sumitagrawl

@ashishkumar50
Contributor

Correct me if I'm wrong. IMO, the best way to resolve this from the root cause might be a similar approach to: https://issues.apache.org/jira/browse/HDDS-12810 (#8360)

But I haven't looked into the related code, so I'm not sure whether we can do the same thing at the SCM side.

On the DN we have committed bytes, so it is a simple and easy way to handle this scenario. But in SCM we don't have committed bytes to control it the same way the DN does, and it would be too complex to add committed bytes to SCM.

@rich7420
Contributor Author

I'd rather suggest graceful handling of the out-of-space event from the DN during container creation, and if it fails, using one of the already opened containers.

The other approach would be serializing container creation if the DN has less than 5-10% free space.

You're right that 2x only provides a buffer for limited concurrency. However, implementing the more robust solutions you suggested (graceful out-of-space handling, serialized allocation) would require significant architectural changes. I think the current approach using the existing requiredReplicationSpace() method (2x) provides an immediate improvement for common cases while keeping the code simple. We can consider more comprehensive solutions in future work if this proves insufficient. Thanks for the suggestion, @yandrey321!

@ChenSammi
Contributor

ChenSammi commented Dec 1, 2025

@rich7420, thanks for working on this task.

IIUC, SCM doesn't track the new containers allocated on a specific DN, so in the case of concurrent container allocation requests, the space check in SCM will pass even though there is not enough space on the DN to serve all the new containers. (DN-side container allocation space reservation is now serialized, see https://issues.apache.org/jira/browse/HDDS-12810, so the DN will fail the requests when there is not enough space.) Increasing the available-space requirement above 5GB in SCM therefore cannot guarantee to solve the problem; since concurrency is not a fixed quantity, it's hard to decide which number is enough in SCM.
It's not recommended to introduce code complexity when it cannot guarantee to solve the problem.
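The race described above can be sketched as follows. The numbers and names are illustrative, not the Ozone code: every concurrent request checks against the same free-space snapshot, so all of them pass even though the datanode cannot serve them all.

```java
public class ConcurrentCheckSketch {

  public static void main(String[] args) {
    long gb = 1024L * 1024 * 1024;
    long freeOnDn = 12 * gb;   // free-space snapshot SCM checks against
    long required = 10 * gb;   // e.g. 5GB container x 2.0 multiplier

    // Three concurrent allocation requests all see the same snapshot,
    // so each one passes the SCM-side check independently.
    int passed = 0;
    for (int i = 0; i < 3; i++) {
      if (freeOnDn >= required) {
        passed++;
      }
    }

    long demanded = passed * 5 * gb; // 15GB of containers on a 12GB node
    System.out.println("checks passed: " + passed);
    System.out.println("overcommitted: " + (demanded > freeOnDn));
  }
}
```

No fixed multiplier removes this race; it only changes how many concurrent requests fit under the threshold, which is the point being made here.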

Contributor

@siddhantsangwan siddhantsangwan left a comment


I think this will not completely fix the problem, but it can improve the situation. If everyone agrees to go ahead with this change, we can make it configurable and have at least 5x, that is 25 GB, as the default. This will need to be checked in both places: when creating a pipeline and when creating a new container.

@siddhantsangwan
Contributor

Do we have any alternative ways to solve this problem?

@siddhantsangwan
Contributor

I'd rather suggest graceful handling of the out-of-space event from the DN during container creation, and if it fails, using one of the already opened containers.

graceful handling of Out of space

I'm not sure what the client retry behaviour is. We should have some retry mechanism to pick a different pipeline, but write performance is already degraded at that point, so we need to make better decisions at the SCM level about which pipeline is given to a client in the first place.

I think we need to test the correctness and performance of CapacityPipelineChoosePolicy and make it the default to reduce the probability of a full pipeline being chosen. CapacityPipelineChoosePolicy in addition to this fix should improve the situation.

Comment on lines 250 to 252
hadoop-ozone/dev-support/checks/_summary.sh target/${{ inputs.script }}/summary.txt
if [[ -f "hadoop-ozone/dev-support/checks/_summary.sh" ]]; then
hadoop-ozone/dev-support/checks/_summary.sh target/${{ inputs.script }}/summary.txt
fi
Contributor

These look unrelated. Why are CI workflow files being changed?

Contributor Author

Thanks for asking @adoroszlai! This PR changed this part because I got an error when the _summary.sh file was missing.

Contributor Author

I can remove it.

Contributor

I got an error when the _summary.sh file was missing.

_summary.sh does exist in the source repo. Please check for the root cause instead of masking it.

summary.md is checked because it's generated by some CI checks, but not all.

Copilot AI review requested due to automatic review settings December 6, 2025 02:36
@rich7420
Contributor Author

rich7420 commented Dec 6, 2025

Thanks for all the valuable feedback.
I agree that while this is a mitigation rather than a strict fix for the race condition, increasing the buffer makes it much more robust for practical use cases.
Updates in the latest commit:
I changed the default ozone.scm.container.space.requirement.multiplier from 2.0 to 5.0 as suggested, to provide a safer buffer for concurrent allocations, and I reverted the unrelated changes in .github/workflows to keep the PR clean.
Please take a look. Thanks!

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a configurable multiplier for space requirements during container allocation to prevent concurrent allocation failures. The implementation adds ozone.scm.container.space.requirement.multiplier (default 5.0) to increase the space check threshold, though there's a discrepancy with the PR description which states the default is 2.0.

Key Changes:

  • Added configurable space requirement multiplier with default value of 5.0
  • Updated container allocation logic to use multiplied space requirement (25GB with default 5GB container size)
  • Modified pipeline providers (Ratis and EC) to use the new space calculation

Reviewed changes

Copilot reviewed 8 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file

  • hadoop-hdds/common/src/main/resources/ozone-default.xml: Adds configuration property for space requirement multiplier with default value 5.0
  • hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ScmConfigKeys.java: Defines the new configuration key and default constant (5.0)
  • hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/HddsServerUtil.java: Adds overloaded method to calculate required space using the configurable multiplier
  • hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerManagerImpl.java: Updates container allocation to use the new space requirement calculation
  • hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/RatisPipelineProvider.java: Updates Ratis pipeline creation to use the multiplied space requirement
  • hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/ECPipelineProvider.java: Updates EC pipeline creation to use the multiplied space requirement
  • hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestContainerManagerImpl.java: Adds test to verify the space requirement calculation is used correctly
  • hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/pipeline/TestECPipelineProvider.java: Updates test to use the new space requirement calculation
  • hadoop-ozone/dist/src/main/k8s/examples/*/config-configmap.yaml: Adds container size configuration (1GB) to K8s deployment examples



public static final String OZONE_SCM_CONTAINER_SPACE_REQUIREMENT_MULTIPLIER =
"ozone.scm.container.space.requirement.multiplier";
public static final double OZONE_SCM_CONTAINER_SPACE_REQUIREMENT_MULTIPLIER_DEFAULT = 5.0;
Copilot AI Dec 6, 2025

The PR description states the default multiplier is 2.0, but the implementation sets it to 5.0. This inconsistency should be resolved. Either update the PR description to reflect the actual default of 5.0, or change the default value in the code to 2.0 to match the description.

Suggested change
public static final double OZONE_SCM_CONTAINER_SPACE_REQUIREMENT_MULTIPLIER_DEFAULT = 5.0;
public static final double OZONE_SCM_CONTAINER_SPACE_REQUIREMENT_MULTIPLIER_DEFAULT = 2.0;

Comment on lines +1042 to +1052
<description>
Multiplier for container space requirement when checking if a datanode
has enough space for container allocation. The required space is calculated
as container size multiplied by this value. This prevents concurrent clients
from all passing the space check when there's only enough space for one
container. For example, with default container size of 5GB and multiplier
of 5.0, the system will require 25GB of available space before allocating
a new container. This ensures that if only 6GB is remaining, the check will
fail, preventing multiple clients from attempting to create containers
concurrently when there's only space for one.
</description>
Copilot AI Dec 6, 2025

The example in the documentation is incorrect. With a default container size of 5GB and a multiplier of 5.0, the required space would be 25GB (5GB * 5.0), not 10GB as mentioned in the PR description. The documentation correctly states 25GB, but this contradicts the PR description which mentions increasing from 5GB to 10GB (which would require a multiplier of 2.0).

</property>
<property>
<name>ozone.scm.container.space.requirement.multiplier</name>
<value>5.0</value>
Copilot AI Dec 6, 2025

The PR description states the default multiplier is 2.0, but the implementation sets it to 5.0. This inconsistency should be resolved. Either update the PR description to reflect the actual default of 5.0, or change the default value in the code to 2.0 to match the description.

@adoroszlai adoroszlai marked this pull request as draft December 6, 2025 06:55
@adoroszlai
Contributor

Changed to draft due to failures in acceptance tests: https://github.com/rich7420/ozone/actions/runs/19981844525

CI runs with limited space and limited number of nodes. We should either set multiplier=1 or reduce container size.

@siddhantsangwan
Contributor

I had a separate discussion with @ChenSammi and @ashishkumar50. It was decided it's better to pause this for now, test CapacityPipelineChoosePolicy and then decide if we should go ahead with this. The reason is that we already have a lot of configurations in this area and it's complicated. Adding another configuration or buffer becomes hard to explain to ozone users.

I'm working on testing and exploring CapacityPipelineChoosePolicy, will create a jira soon. Based on testing we can decide whether to go ahead with this pull request (or a better approach). @rich7420 what do you think?

@rich7420
Contributor Author

Sounds good to me, thanks for driving this. Let's close this PR and proceed with testing CapacityPipelineChoosePolicy. I'm happy to help review the new JIRA/patch or adjust this PR based on the results. Thanks @siddhantsangwan!

@rich7420 rich7420 closed this Dec 19, 2025