Skip to content

Conversation

@symious
Copy link
Contributor

@symious symious commented Dec 7, 2021

What changes were proposed in this pull request?

Issue happens when we try to update the config of "hdds.container.balancer.utilization.threshold", but the value from the SCM log said it's always the "0.1", which is the default value.

After checking the code, the construction of ContainerBalancerConfiguration doesn't comply with the pattern of "ConfigGroup", this ticket is to fix the issue to update the configuration from config file.

Since the class needs to hold OzoneConfiguration for configuration check, the method of "initialize" is created for this requirement.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6070

How was this patch tested?

unit test

@symious
Copy link
Contributor Author

symious commented Dec 7, 2021

@siddhantsangwan Could you help to review this PR?

@siddhantsangwan
Copy link
Contributor

@symious Thanks for finding this bug. Looking into it.

@symious
Copy link
Contributor Author

symious commented Dec 10, 2021

@siddhantsangwan Thank you for the review.
@JacksonYao287 Could you also help to check the PR?

long size = (long) ozoneConfiguration.getStorageSize(
ScmConfigKeys.OZONE_SCM_CONTAINER_SIZE,
ScmConfigKeys.OZONE_SCM_CONTAINER_SIZE_DEFAULT, StorageUnit.GB) +
ScmConfigKeys.OZONE_SCM_CONTAINER_SIZE_DEFAULT, StorageUnit.BYTES) +
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddhantsangwan Another issue is the StorageUnit here.
The value from Unit.GB is "5 + 1GB", need to changed to Unit.BYTES.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was just going to raise a jira for this particular bug when I saw that you've pushed an update. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems the other configs, like "move.timeout" and "balancing.interval", also have the same issue.

@siddhantsangwan
Copy link
Contributor

siddhantsangwan commented Dec 10, 2021

I have one initial observation that could be a problem. The pattern of

this.config = ozoneConfiguration.getObject(ContainerBalancerConfiguration.class);
config.initialize(ozoneConfiguration);

results in overriding the values of size.entering.target.max, size.leaving.source.max, and balancing.iteration.interval.
For example, if we set the value of size entering target using ozone configuration, the following config.initialize invocation would overwrite that value.

@symious
Copy link
Contributor Author

symious commented Dec 10, 2021

@siddhantsangwan Thanks for the review.

Yes, and in fact the other initialize method seems a little odd. I added the method because the original ContainerBalancerConfiguration is constructed with OzoneConfiguration, which is to override those configurations on purpose.

I think the root problem is that some values of ContainerBalancerConfiguration are dependent on other configurations, like "size.entering.target.max" is depending on "OZONE_SCM_CONTAINER_SIZE_DEFAULT", but the current ConfigGroup annotation doesn't support this requirement.

Copy link
Contributor

@JacksonYao287 JacksonYao287 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@symious thanks for the work , the changes looks good !
i left a comment , please take a look

Comment on lines 123 to 125
this.config = ozoneConfiguration.
getObject(ContainerBalancerConfiguration.class);
config.initialize(ozoneConfiguration);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be we can add a static build function to ContainerBalancerConfiguration class to build and initialize a ContainerBalancerConfiguration instance, for example:

public static ContainerBalancerConfiguration buildFrom(ozoneConfiguration oc) {
.....
}

@siddhantsangwan
Copy link
Contributor

siddhantsangwan commented Dec 13, 2021

I think the root problem is that some values of ContainerBalancerConfiguration are dependent on other configurations, like "size.entering.target.max" is depending on "OZONE_SCM_CONTAINER_SIZE_DEFAULT", but the current ConfigGroup annotation doesn't support this requirement.

Yeah. I can think of the following solutions:

  1. Add a static method that accepts OzoneConfiguration and returns a ContainerBalancerConfiguration by calling a private constructor. This constructor can initialize our special configs if we pass the OzoneConfiguration from the static method. We will also need to check if these values have already been set in ozone-site.xml to avoid overriding them.

may be we can add a static build function to ContainerBalancerConfiguration class to build and initialize a ContainerBalancerConfiguration instance

This seems similar to what @JacksonYao287 is suggesting.

  1. Maybe we can initialize these special values in their respective getters instead? Again, we'll first need to ensure these configs haven't already been set.
  2. We could also simplify it and just put default values in the @Config annotations. For example, 70 minutes for iteration interval, 26GB for max size entering target etc. What do you all think? cc @lokeshj1703

@symious
Copy link
Contributor Author

symious commented Dec 15, 2021

@siddhantsangwan @JacksonYao287 Thanks for the reply.

I think Option 1 is the original implementation. If we initialize ContainerBalancerConfiguration based on OzoneConfiguration, the @ConfigGroup pattern won't work, that is the content of ozone-site.xml won't be projected to ContainerBalancerConfiguration. AFAIK, we need to use ozoneConfiguration.getObject(ContainerBalancerConfiguration.class) to load the value from ozone-site.xml.

I think Option 3 might be a good choice. We can let ContainerBalancerConfiguration only handle the configs from ozone-site.xml, the validation and check of configs can be handled when starting container balancer service..

@lokeshj1703
Copy link
Contributor

Thanks @symious for working on this! I would like to add one more option here. ozoneConfiguration.getObject(ContainerBalancerConfiguration.class) We can try overriding this function to allow using class constructor which takes OzoneConfiguration as input.

@siddhantsangwan
Copy link
Contributor

I think Option 3 might be a good choice.

I agree, it's the simplest. The default values could be:

  • size.entering.target.max = 26GB
  • size.leaving.source.max = 26GB
  • balancing.iteration.interval = 70 minutes

Some very related changes are being made in #2892

@symious
Copy link
Contributor Author

symious commented Dec 16, 2021

@lokeshj1703 Thanks for the review.

We can try overriding this function to allow using class constructor which takes OzoneConfiguration as input.

The current implementation is inherited from ConfigurationSource as follows

default <T> T getObject(Class<T> configurationClass) {
    T configObject;
    try {
      configObject = configurationClass.newInstance();
    } catch (InstantiationException | IllegalAccessException e) {
      throw new ConfigurationException(
          "Configuration class can't be created: " + configurationClass, e);
    }

If we are going to override this method, it may look like "first try configurationClass.getDeclaredConstructor(OzoneConfiguration.class).newInstance(ozoneConfiguration), if not exists, then try configurationClass.newInstance()". Personally I think the implementation would look a little odd.

@symious
Copy link
Contributor Author

symious commented Dec 16, 2021

@siddhantsangwan Sure, will update the default value in the next commit.

@symious
Copy link
Contributor Author

symious commented Dec 20, 2021

@siddhantsangwan Update the commit. Could you help to check?

Added validateConfiguration in ContainerBalancer, it can return boolean to indicate an incorrect configuration, but many changes in test cases need to be made, for now, I leave the method to be void.

Copy link
Contributor

@siddhantsangwan siddhantsangwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look great @symious . I have a few minor comments.

long size = (long) ozoneConfiguration.getStorageSize(
ScmConfigKeys.OZONE_SCM_CONTAINER_SIZE,
ScmConfigKeys.OZONE_SCM_CONTAINER_SIZE_DEFAULT, StorageUnit.BYTES) +
OzoneConsts.GB;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to add another GB here. The configs just need to be greater than size.

ScmConfigKeys.OZONE_SCM_CONTAINER_SIZE_DEFAULT, StorageUnit.BYTES) +
OzoneConsts.GB;

if (conf.getMaxSizeEnteringTarget() < size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the check if another GB isn't added to size (as commented above)

Suggested change
if (conf.getMaxSizeEnteringTarget() < size) {
if (conf.getMaxSizeEnteringTarget() <= size) {

LOG.info("MaxSizeEnteringTarget should be larger than " +
"ozone.scm.container.size");
}
if (conf.getMaxSizeLeavingSource() < size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Suggested change
if (conf.getMaxSizeLeavingSource() < size) {
if (conf.getMaxSizeLeavingSource() <= size) {

// balancing interval should be greater than DUFactory refresh period
DUFactory.Conf duConf = ozoneConfiguration.getObject(DUFactory.Conf.class);
long balancingInterval = duConf.getRefreshPeriod().toMillis() +
Duration.ofMinutes(10).toMillis();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, don't need to add extra 10 minutes since the config only needs to be greater than DU refresh period.

Comment on lines 780 to 781
LOG.info("balancing.iteration.interval should be at lease 10 minutes " +
"larger than hdds.datanode.du.refresh.period.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG.info("balancing.iteration.interval should be at lease 10 minutes " +
"larger than hdds.datanode.du.refresh.period.");
LOG.info("balancing.iteration.interval should be " +
"larger than hdds.datanode.du.refresh.period.");

" by default.")
private String excludeNodes = "";

private DUFactory.Conf duConf;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this isn't getting initialised now, we can remove this and its usages in the setBalancingInterval method.

@siddhantsangwan
Copy link
Contributor

@siddhantsangwan Update the commit. Could you help to check?

Added validateConfiguration in ContainerBalancer, it can return boolean to indicate an incorrect configuration, but many changes in test cases need to be made, for now, I leave the method to be void.

Yes, having it return a boolean makes sense. And sure, we can add this change in a follow-up PR.

@symious
Copy link
Contributor Author

symious commented Dec 20, 2021

@siddhantsangwan Updated the patch, please have a check.

Yes, having it return a boolean makes sense. And sure, we can add this change in a follow-up PR.

Sure, will create a new ticket for this change.

Copy link
Contributor

@siddhantsangwan siddhantsangwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update looks good to me.

@ferhui
Copy link
Contributor

ferhui commented Dec 23, 2021

@lokeshj1703 Can you please take another look?
@symious This PR has conflicts...

Copy link
Contributor

@JacksonYao287 JacksonYao287 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @symious for updating this patch , i have two comments , please take a look

Comment on lines 605 to 606
Assert.assertEquals(cbConf.getThreshold(), 0.01d, DELTA);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove DELTA and use assertTrue(Doublu.compare(a,b) == 0)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed DELTA and use a similar comparison as


, please have a check.

Comment on lines +764 to +779
if (conf.getMaxSizeEnteringTarget() <= size) {
LOG.info("MaxSizeEnteringTarget should be larger than " +
"ozone.scm.container.size");
}
if (conf.getMaxSizeLeavingSource() <= size) {
LOG.info("MaxSizeLeavingSource should be larger than " +
"ozone.scm.container.size");
}

// balancing interval should be greater than DUFactory refresh period
DUFactory.Conf duConf = ozoneConfiguration.getObject(DUFactory.Conf.class);
long balancingInterval = duConf.getRefreshPeriod().toMillis();
if (conf.getBalancingInterval().toMillis() <= balancingInterval) {
LOG.info("balancing.iteration.interval should be larger than " +
"hdds.datanode.du.refresh.period.");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when validating , if we find it is illegal(for example, conf.getMaxSizeEnteringTarget() <= size), should we throw an Exception and just return without starting balancer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was intended to return "false" for these situations, but it will break many test cases.
The changes will be brought up in a new ticket.

@symious
Copy link
Contributor Author

symious commented Dec 25, 2021

@symious This PR has conflicts...

@ferhui Thanks for the review. Resolved the conflicts.

@ferhui
Copy link
Contributor

ferhui commented Dec 27, 2021

Will Merge tomorrow if no other comments

@ferhui ferhui merged commit b570d0a into apache:master Dec 28, 2021
@ferhui
Copy link
Contributor

ferhui commented Dec 28, 2021

@symious Thanks for your contribution. @siddhantsangwan @lokeshj1703 @JacksonYao287 Thanks for your review! Merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants