
Conversation

@sodonnel
Contributor

What changes were proposed in this pull request?

As it stands, the method:

 PipelineManager.createPipeline(ReplicationConfig replicationConfig,
      List<DatanodeDetails> excludedNodes, List<DatanodeDetails> favoredNodes)

is used to create a new pipeline and add it to the PipelineManager. The creation and the add are both performed under a single lock, so the somewhat expensive node-selection step in the placement policy also runs while the lock is held.

For Ratis pipelines, this could be important, as pipelines are created and limited based on the number of existing pipelines on the nodes.

EC pipelines do not have such a limitation, and they also need to be created much more frequently due to their short lived nature.

This PR changes the pipeline manager to have a buildPipeline method, which calls the placement policy and creates the new pipeline object without holding any locks.

Then there is an addPipeline method to add the created pipeline to the PipelineManager under the lock.

This will allow the WritableECPipelineProvider to build pipelines without blocking other threads, and then add them to the manager under the lock, reducing the amount of work performed while locked. That WritableECPipelineProvider change would need to be made in another PR.
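The two-step pattern described above can be sketched as follows. This is a minimal, hypothetical illustration of the idea, not the actual PipelineManagerImpl code: the real methods take a ReplicationConfig and node lists, while here String stands in for the pipeline type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of splitting createPipeline into an unlocked build step and a
// locked add step. Types and names are simplified stand-ins.
class PipelineManagerSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> pipelines = new ArrayList<>();

  // Step 1: the expensive placement-policy node selection runs here,
  // WITHOUT holding the lock, so other threads are not blocked.
  String buildPipeline(String replicationConfig) {
    // (placement policy would choose datanodes here)
    return "pipeline-for-" + replicationConfig;
  }

  // Step 2: only the cheap registration happens under the write lock.
  void addPipeline(String pipeline) {
    lock.writeLock().lock();
    try {
      pipelines.add(pipeline);
    } finally {
      lock.writeLock().unlock();
    }
  }

  int pipelineCount() {
    lock.readLock().lock();
    try {
      return pipelines.size();
    } finally {
      lock.readLock().unlock();
    }
  }
}
```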

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8919

How was this patch tested?

Modified one test to call the new methods.

@sodonnel
Contributor Author

There are two things which concern me about this change:

  1. It is fine for an EC pipeline to be built and never added to the manager, but for Ratis it is not: building a Ratis pipeline actually sends messages to the DNs to establish the Ratis ring. Therefore it might make sense to throw an error if a call comes into buildPipeline with a Ratis replicationConfig.
  2. I was worried about a race condition where we create a pipeline and then some of the nodes go stale or enter a maintenance state. Both of those events trigger the closing of the pipelines on the affected nodes, via the StaleNodeHandler and the StartDatanodeAdminHandler. In both cases, these handlers get the pipeline list for the datanode from NodeManager, which stores it in a concurrent hash map. So even before this change, there is a chance that a pipeline is created right as the node state changes, and the StaleNodeHandler or StartDatanodeAdminHandler will miss the pipeline because it has not yet made it into the NodeManager map. For stale nodes, the pipeline would get caught when the node goes dead. For decommission / maintenance, I believe it would be a problem (this problem already exists in theory and is no worse due to this change), as the DatanodeAdminMonitorImpl would keep waiting until the pipelines get closed. It would make sense for DatanodeAdminMonitorImpl to trigger another close if it finds pipelines still open.
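The guard suggested in point 1 could look roughly like the following sketch. The enum and exception message mirror the "Replication type must be EC" error visible in the test failure later in this thread, but the actual HDDS ReplicationConfig types are simplified here.

```java
// Hypothetical sketch of an EC-only check at the top of buildPipeline,
// rejecting Ratis configs since building a Ratis pipeline has side
// effects on the datanodes. Types are illustrative stand-ins.
enum ReplicationType { RATIS, EC }

class EcGuard {
  static void checkEc(ReplicationType type) {
    if (type != ReplicationType.EC) {
      throw new IllegalArgumentException("Replication type must be EC");
    }
  }
}
```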

@guohao-rosicky
Contributor

Thanks @sodonnel for working on this. This will help the performance of EC container block allocation.

Contributor

@adoroszlai left a comment


Thanks @sodonnel for working on this.

Therefore it might make sense to throw an error if a call comes into buildPipeline with a Ratis replicationConfig.

Agreed. In fact, it may be good to include EC in the method names.

Comment on lines 283 to 284
stateManager.addPipeline(pipeline.getProtobufMessage(
ClientVersion.CURRENT_VERSION));
Contributor


pipeline.getProtobufMessage can now be moved outside the locked section.
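That suggestion can be sketched as follows; a minimal, hypothetical illustration of hoisting the serialization out of the locked section. The class and field names here are invented, and `pipeline.getBytes()` stands in for `pipeline.getProtobufMessage(ClientVersion.CURRENT_VERSION)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: serialize the pipeline before taking the lock, so the
// lock guards just the cheap state update, not the protobuf conversion.
class AddOutsideLockSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final List<byte[]> store = new ArrayList<>();

  void addPipeline(String pipeline) {
    // Stand-in for pipeline.getProtobufMessage(...), done unlocked.
    byte[] proto = pipeline.getBytes();
    lock.lock();
    try {
      store.add(proto);  // only this runs under the lock
    } finally {
      lock.unlock();
    }
  }

  int size() {
    lock.lock();
    try {
      return store.size();
    } finally {
      lock.unlock();
    }
  }
}
```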

LOG.debug("Failed to create pipeline with replicationConfig {}.",
replicationConfig, ex);
LOG.debug("Failed to add pipeline with replicationConfig {}.",
pipeline.getReplicationConfig(), ex);
Contributor


Nit: we may want to log the pipeline itself. Previously this was called with a fresh pipeline, but now pipeline can be "anything".

@adoroszlai
Contributor

@sodonnel Thanks for updating the PR. It seems there is a test for PipelineManagerImpl failing due to the new replication type check:

Tests run: 22, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 13.469 s <<< FAILURE! - in org.apache.hadoop.hdds.scm.pipeline.TestPipelineManagerImpl
[ERROR] org.apache.hadoop.hdds.scm.pipeline.TestPipelineManagerImpl.testCreatePipeline  Time elapsed: 0.431 s  <<< ERROR!
java.lang.IllegalArgumentException: Replication type must be EC
	at org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.buildECPipeline(PipelineManagerImpl.java:214)
	at org.apache.hadoop.hdds.scm.pipeline.TestPipelineManagerImpl.testCreatePipeline(TestPipelineManagerImpl.java:199)

@adoroszlai adoroszlai changed the title HDDS-8919. Allow a pipeline to be created and then added to PipelineManager in two steps HDDS-8919. Allow EC pipelines to be created and then added to PipelineManager in two steps Jun 26, 2023
@adoroszlai adoroszlai merged commit 0f15552 into apache:master Jun 26, 2023
errose28 added a commit to errose28/ozone that referenced this pull request Jun 30, 2023
* master:
  HDDS-8555. [Snapshot] When snapshot feature is disabled, block OM startup if there are still snapshots in the system (apache#4994)
  HDDS-8782. Improve Volume Scanner Health checks. (apache#4867)
  HDDS-8447. Datanodes should not process container deletes for failed volumes. (apache#4901)
  HDDS-5869. Added support for stream on S3Gateway write path (apache#4970)
  HDDS-8859. [Snapshot] Return failure message to client for a failed snapshot diff jobs (apache#4993)
  HDDS-8939. [Snapshot] isBlockLocationSame check should be skipped if object is not OmKeyInfo. (apache#4991)
  HDDS-8923. Expose XceiverClient cache stats as metrics (apache#4979)
  HDDS-8913. ContainerManagerImpl: reduce processing while locked (apache#4967)
  HDDS-8935. [Snapshot] Fallback to full diff if getDetlaFiles from compaction DAG fails (apache#4986)
  HDDS-8911. Update Hadoop to 3.3.6 (apache#4985)
  HDDS-8931. Allow EC PipelineChoosingPolicy to be defined separately from Ratis (apache#4983)
  HDDS-8895. Support dynamic change of ozone.readonly.administrators in SCM (apache#4977)
  HDDS-6814. Make OM service ID optional for `ozone s3` commands if only one is defined in config (apache#4953)
  HDDS-8925. BaseFreonGenerator may not complete if last attempts fail (apache#4975)
  HDDS-7100. Container scanner incorrectly marks containers unhealthy when DN is shutdown (apache#4951)
  HDDS-8919. Allow EC pipelines to be created and then added to PipelineManager in two steps (apache#4968)
  HDDS-8901. Enable mTLS for InterSCMGrpcProtocol. (apache#4964)

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/Container.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/ContainerTestUtils.java
vtutrinov pushed a commit to Cyrill/ozone that referenced this pull request Jul 3, 2023