Skip to content

Conversation

@jojochuang
Copy link
Contributor

@jojochuang jojochuang commented Jan 18, 2021

What changes were proposed in this pull request?

Make RDBMetrics#unRegister() synchronized. Accessing the instance object inside RDMBetrics#create() is synchronized, and therefore RDBMetrics#unRegister() should be made synchronized to protect the instance object.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4722

How was this patch tested?

Manually tested.

Change-Id: Ieca93325d5c001c033e1d2d18b273f2c694da037
Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jojochuang Synchronizing unRegister may fix the race condition, but I'm not sure using a single shared instance is OK in the first place. Imagine the following steps:

  1. RDBStore is created with new RDBMetrics instance
  2. Another RDBStore is created, reuses RDBMetrics instance
  3. One of the RDBStores is closed, shared RDBMetrics is unregistered
  4. Other RDBStore is still used, but can it report events to the metrics system?

Shouldn't each RDBStore register its RDBMetrics instance with a unique name?

@jojochuang
Copy link
Contributor Author

Great question.... IMO we want a single RDBMetric instance. With the current architecture we will see thousands of containers per DataNode. Having metrics for each and every container... that seems too excessive to me. Plus, the metrics are kept only during the lifetime of a container. I feel like we want a singleton RDBMetric for the duration of the process.

@vivekratnavel thoughts? you wrote the original code.

@adoroszlai
Copy link
Contributor

I feel like we want a singleton RDBMetric for the duration of the process.

I also think this was the intention of introducing the shared instance. In this case I think it should be reference counted, and the last one should clean up.

@vivekratnavel
Copy link
Contributor

Thanks @jojochuang and @adoroszlai for the patch and review!

@avijayanhwx will also have a good context on this issue. If I remember correctly, we discussed some of the potential issues while making the RDBMetrics a shared instance, but I think we overlooked the example scenario that @adoroszlai has stated in his comment above. Please correct me if I am wrong.

@adoroszlai You are right about our intention for the shared Metrics instance. Before sharing the instance, we were getting the wrong metrics since each store was keeping its own count. I think it's time to think about all the scenarios thoroughly and redesign RDBMetrics if needed.

@adoroszlai
Copy link
Contributor

Let's merge this to make synchronized access consistent. Then we can improve the shared instance with reference counting.

@adoroszlai adoroszlai merged commit a795ab0 into apache:master Feb 25, 2021
errose28 added a commit to errose28/ozone that referenced this pull request Mar 2, 2021
…ing-upgrade

* upstream/master: (29 commits)
  HDDS-4741. Modularize upgrade test (apache#1928)
  HDDS-4864. Add acceptance tests to certify Ozone with boto3 python client. (apache#1976)
  HDDS-4791. StateContext.getReports may return list with size larger t… (apache#1892)
  HDDS-4867. Ozone admin datanode list should report dead and stale nodes (apache#1966)
  HDDS-4858. Useless Maven cache cleanup (apache#1956)
  HDDS-4769. Simplify insert operation of ContainerAttribute (apache#1865)
  HDDS-4847. Fix typo in name of IdentityService (apache#1941)
  HDDS-4869. Bump jackson version number (apache#1963)
  HDDS-4871. Fix intellij runConfigurations for datanode (apache#1968)
  HDDS-4870. Bump jetty version (apache#1964)
  HDDS-4722. Creating RDBStore fails due to RDBMetrics instance race (apache#1820)
  HDDS-4138. Improve crc efficiency by using Java.util.zip.CRC when available (apache#1950)
  HDDS-4816. Add UsageInfoSubcommand to get Datanode usage information. (apache#1919)
  HDDS-4754. Make scm heartbeat rpc retry interval configurable (apache#1942)
  HDDS-4832. Show Datanode OperationalState in Recon (apache#1937)
  HDDS-4653. Support TDE for MPU Keys on Encrypted Buckets (apache#1766)
  HDDS-4853. libexec/entrypoint.sh might copy from wrong path (apache#1951)
  HDDS-4857. Format ReplicationType.java which indentation are confusion (apache#1952)
  HDDS-4850. Intermittent failure in ozonesecure due to unable to allocate block (apache#1948)
  HDDS-4808. Add Genesis benchmark for various CRC implementations (apache#1910)
  ...

Conflicts:
	hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/client/ScmClient.java
	hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocol.java
	hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConsts.java
	hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java
	hadoop-hdds/interface-admin/src/main/proto/ScmAdminProtocol.proto
	hadoop-hdds/interface-client/src/main/proto/hdds.proto
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocolServerSideTranslatorPB.java
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java
	hadoop-hdds/tools/src/main/java/org/apache/hadoop/hdds/scm/cli/ContainerOperationClient.java
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
errose28 added a commit to errose28/ozone that referenced this pull request Mar 11, 2021
…ing-upgrade-merge-candidate

* upstream/master: (29 commits)
  HDDS-4741. Modularize upgrade test (apache#1928)
  HDDS-4864. Add acceptance tests to certify Ozone with boto3 python client. (apache#1976)
  HDDS-4791. StateContext.getReports may return list with size larger t… (apache#1892)
  HDDS-4867. Ozone admin datanode list should report dead and stale nodes (apache#1966)
  HDDS-4858. Useless Maven cache cleanup (apache#1956)
  HDDS-4769. Simplify insert operation of ContainerAttribute (apache#1865)
  HDDS-4847. Fix typo in name of IdentityService (apache#1941)
  HDDS-4869. Bump jackson version number (apache#1963)
  HDDS-4871. Fix intellij runConfigurations for datanode (apache#1968)
  HDDS-4870. Bump jetty version (apache#1964)
  HDDS-4722. Creating RDBStore fails due to RDBMetrics instance race (apache#1820)
  HDDS-4138. Improve crc efficiency by using Java.util.zip.CRC when available (apache#1950)
  HDDS-4816. Add UsageInfoSubcommand to get Datanode usage information. (apache#1919)
  HDDS-4754. Make scm heartbeat rpc retry interval configurable (apache#1942)
  HDDS-4832. Show Datanode OperationalState in Recon (apache#1937)
  HDDS-4653. Support TDE for MPU Keys on Encrypted Buckets (apache#1766)
  HDDS-4853. libexec/entrypoint.sh might copy from wrong path (apache#1951)
  HDDS-4857. Format ReplicationType.java which indentation are confusion (apache#1952)
  HDDS-4850. Intermittent failure in ozonesecure due to unable to allocate block (apache#1948)
  HDDS-4808. Add Genesis benchmark for various CRC implementations (apache#1910)
  ...

Conflicts:
	hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/client/ScmClient.java
	hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocol.java
	hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConsts.java
	hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java
	hadoop-hdds/interface-admin/src/main/proto/ScmAdminProtocol.proto
	hadoop-hdds/interface-client/src/main/proto/hdds.proto
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocolServerSideTranslatorPB.java
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java
	hadoop-hdds/tools/src/main/java/org/apache/hadoop/hdds/scm/cli/ContainerOperationClient.java
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
	hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconNodeManager.java
errose28 added a commit to errose28/ozone that referenced this pull request Mar 16, 2021
* HDDS-3698-nonrolling-upgrade: (29 commits)
  HDDS-4741. Modularize upgrade test (apache#1928)
  HDDS-4864. Add acceptance tests to certify Ozone with boto3 python client. (apache#1976)
  HDDS-4791. StateContext.getReports may return list with size larger t… (apache#1892)
  HDDS-4867. Ozone admin datanode list should report dead and stale nodes (apache#1966)
  HDDS-4858. Useless Maven cache cleanup (apache#1956)
  HDDS-4769. Simplify insert operation of ContainerAttribute (apache#1865)
  HDDS-4847. Fix typo in name of IdentityService (apache#1941)
  HDDS-4869. Bump jackson version number (apache#1963)
  HDDS-4871. Fix intellij runConfigurations for datanode (apache#1968)
  HDDS-4870. Bump jetty version (apache#1964)
  HDDS-4722. Creating RDBStore fails due to RDBMetrics instance race (apache#1820)
  HDDS-4138. Improve crc efficiency by using Java.util.zip.CRC when available (apache#1950)
  HDDS-4816. Add UsageInfoSubcommand to get Datanode usage information. (apache#1919)
  HDDS-4754. Make scm heartbeat rpc retry interval configurable (apache#1942)
  HDDS-4832. Show Datanode OperationalState in Recon (apache#1937)
  HDDS-4653. Support TDE for MPU Keys on Encrypted Buckets (apache#1766)
  HDDS-4853. libexec/entrypoint.sh might copy from wrong path (apache#1951)
  HDDS-4857. Format ReplicationType.java which indentation are confusion (apache#1952)
  HDDS-4850. Intermittent failure in ozonesecure due to unable to allocate block (apache#1948)
  HDDS-4808. Add Genesis benchmark for various CRC implementations (apache#1910)
  ...
jojochuang added a commit to jojochuang/ozone that referenced this pull request Jun 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants