Skip to content

Conversation

@adoroszlai
Copy link
Contributor

What changes were proposed in this pull request?

  1. ContainerManagerImpl lock is intended only for write operations (see comment on lock). ContainerStateManagerImpl has its own read/write lock. Earlier ConcurrentModificationException (HDDS-6314) happened because ContainerStateManagerImpl returns an unmodifiable view of its internal sets. These are not thread-safe, only protect against changing the set. Replaced these with ImmutableSet.copyOf.
  2. Replace stream with iteration.
  3. Remove ineligible items from the list being sorted before sorting it.
  4. Only filter items if request is not for all items (as indicated by the smallest possible ContainerID: 0.
  5. Increment metric for getContainers(LifeCycleState), was missing previously.
  6. Move metrics calls out of locked block in deleteContainer.

Further improvement is possible, as some of the container sets are already sorted (to be done separately in HDDS-6315).

https://issues.apache.org/jira/browse/HDDS-8913

How was this patch tested?

CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/5356645847

return haManager;
}

private static List<ContainerID> filterSortAndLimit(
Copy link
Contributor

@sodonnel sodonnel Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be made more efficient. The current algorithm takes the set, forms a new list of all elements. Then filters, sorted and returns the range.

Using an adaption of the priorityQueue technique here - https://www.baeldung.com/java-array-top-elements

We could avoid the copy from set to list, and the potentially large sort.

for (contaienrID cid : set) {
    if (id > startID)
        maxHeap.add(number);

        if (maxHeap.size() > count) {
            maxHeap.poll();
        }
    }
});
return new ArrayList<>(maxHeap);
// Might need reversed, I am not sure

I think this would be more memory efficient on large container sets, and should perform better than the full sort

Copy link
Contributor Author

@adoroszlai adoroszlai Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, I'll update the patch with it.

FYI there is a bug in that solution due to PriorityQueue#iterator():

The iterator does not return the elements in any particular order.

so new ArrayList<>(maxHeap) randomizes the result.

It seems to work for (small?) Integers, but not for any random Comparable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how many people have not noticed that issue, and just implemented the linked solution as is!

Copy link
Contributor

@sodonnel sodonnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - Nice work making the topN generic for future usage!

@adoroszlai adoroszlai merged commit 9a9fcef into apache:master Jun 27, 2023
@adoroszlai adoroszlai deleted the HDDS-8913 branch June 27, 2023 18:16
@adoroszlai
Copy link
Contributor Author

Thanks @siddhantsangwan, @sodonnel for the review.

errose28 added a commit to errose28/ozone that referenced this pull request Jun 30, 2023
* master:
  HDDS-8555. [Snapshot] When snapshot feature is disabled, block OM startup if there are still snapshots in the system (apache#4994)
  HDDS-8782. Improve Volume Scanner Health checks. (apache#4867)
  HDDS-8447. Datanodes should not process container deletes for failed volumes. (apache#4901)
  HDDS-5869. Added support for stream on S3Gateway write path (apache#4970)
  HDDS-8859. [Snapshot] Return failure message to client for a failed snapshot diff jobs (apache#4993)
  HDDS-8939. [Snapshot] isBlockLocationSame check should be skipped if object is not OmKeyInfo. (apache#4991)
  HDDS-8923. Expose XceiverClient cache stats as metrics (apache#4979)
  HDDS-8913. ContainerManagerImpl: reduce processing while locked (apache#4967)
  HDDS-8935. [Snapshot] Fallback to full diff if getDetlaFiles from compaction DAG fails (apache#4986)
  HDDS-8911. Update Hadoop to 3.3.6 (apache#4985)
  HDDS-8931. Allow EC PipelineChoosingPolicy to be defined separately from Ratis (apache#4983)
  HDDS-8895. Support dynamic change of ozone.readonly.administrators in SCM (apache#4977)
  HDDS-6814. Make OM service ID optional for `ozone s3` commands if only one is defined in config (apache#4953)
  HDDS-8925. BaseFreonGenerator may not complete if last attempts fail (apache#4975)
  HDDS-7100. Container scanner incorrectly marks containers unhealthy when DN is shutdown (apache#4951)
  HDDS-8919. Allow EC pipelines to be created and then added to PipelineManager in two steps (apache#4968)
  HDDS-8901. Enable mTLS for InterSCMGrpcProtocol. (apache#4964)

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/Container.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/ContainerTestUtils.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants