Skip to content

Conversation

@aswinshakil
Copy link
Member

@aswinshakil aswinshakil commented Jul 27, 2023

What changes were proposed in this pull request?

ECUnderReplicationHandler is initialized before initializing ReplicationManagerMetrics, So whenever any ReplicationManagerMetrics is used, it throws NPE. The behavior is same for all these other classes and will throw NPEs,
ECOverReplicationHandler, ECMisReplicationHandler, RatisUnderReplicationHandler, RatisMisReplicationHandler

  ECUnderReplicationHandler(final PlacementPolicy containerPlacement,
      final ConfigurationSource conf, ReplicationManager replicationManager) {
    this.containerPlacement = containerPlacement;
    this.currentContainerSize = (long) conf
        .getStorageSize(ScmConfigKeys.OZONE_SCM_CONTAINER_SIZE,
            ScmConfigKeys.OZONE_SCM_CONTAINER_SIZE_DEFAULT, StorageUnit.BYTES);
    this.replicationManager = replicationManager;
    this.metrics = replicationManager.getMetrics(); --> still null because replication manager metrics is not initialized yet.
 }

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9090

How was this patch tested?

Manually tested by adding a line and verified it threw NPE.

  ECUnderReplicationHandler(final PlacementPolicy containerPlacement,
      final ConfigurationSource conf, ReplicationManager replicationManager) {
    this.containerPlacement = containerPlacement;
    this.currentContainerSize = (long) conf
        .getStorageSize(ScmConfigKeys.OZONE_SCM_CONTAINER_SIZE,
            ScmConfigKeys.OZONE_SCM_CONTAINER_SIZE_DEFAULT, StorageUnit.BYTES);
    this.replicationManager = replicationManager;
    this.metrics = replicationManager.getMetrics();
    this.metrics.getEcPartialReconstructionNoneOverloadedTotal(); ---> Added line manually just to check NPE
}
java.lang.NullPointerException
	at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.<init>(ECUnderReplicationHandler.java:78)
	at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.<init>(ReplicationManager.java:247)
	at org.apache.hadoop.hdds.scm.container.replication.TestReplicationManager$1.<init>(TestReplicationManager.java:177)
	at 
Screen Shot 2023-07-27 at 10 58 57 AM

Even after metrics is initialized, the metrics inside other classes are null

@umamaheswararao umamaheswararao merged commit 31d43d1 into apache:master Jul 27, 2023
Comment on lines -288 to 291
running = true;
metrics = ReplicationManagerMetrics.create(this);
if (rmConf.isLegacyEnabled()) {
legacyReplicationManager.setMetrics(metrics);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aswinshakil for the fix. Please note that ReplicationManager can be stopped/restarted by admin from CLI.
stop() unregisters the metrics source, and start() used to (re)create it. But now it isn't re-created, so it won't work after restart. One way to fix this is to add a method in ReplicationManagerMetrics to register it with the metrics system (without creating a new instance), and call this new method from start().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out @adoroszlai . I wasn't aware of that, feel free to assign the JIRA to me if created, or I can create one.

Copy link
Contributor

@adoroszlai adoroszlai Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aswinshakil, created HDDS-9235.

jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Sep 20, 2023
…gerMetrics because of the order of initialization (apache#5126)

(cherry picked from commit 31d43d1)
Change-Id: I965351495cdbc096425fba6f1f1e2260f53416b3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants