@sodonnel sodonnel commented Oct 2, 2020

What changes were proposed in this pull request?

If you call pipelineManager.finalizeAndDestroyPipeline() with onTimeout=false, the finalizePipeline call fires a closeContainer event for every container on the pipeline. These events are handled asynchronously.

However, immediately after that, the destroyPipeline(...) call is made. This will remove the pipeline details from the various maps / stores.

The closeContainer events are then processed, and each one attempts to remove its container from the pipeline. However, because the pipeline has already been destroyed, this throws an exception and the close container events are never sent to the DNs:

2020-10-01 15:44:18,838 [EventQueue-CloseContainerForCloseContainerEventHandler] INFO container.CloseContainerEventHandler: Close container Event triggered for container : #2
2020-10-01 15:44:18,842 [EventQueue-CloseContainerForCloseContainerEventHandler] ERROR container.CloseContainerEventHandler: Failed to close the container #2.
org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: PipelineID=59e5ae16-f1fe-45ff-9044-dd237b0e91c6 not found
	at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.removeContainerFromPipeline(PipelineStateMap.java:372)
	at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.removeContainerFromPipeline(PipelineStateManager.java:111)
	at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.removeContainerFromPipeline(SCMPipelineManager.java:413)
	at org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerState(SCMContainerManager.java:352)
	at org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerState(SCMContainerManager.java:331)
	at org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:66)
	at org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:45)
	at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor

The simple solution is to catch the exception and ignore it.
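
As a rough illustration of the catch-and-ignore approach, here is a minimal, self-contained sketch. It is not the code from this patch: `CloseContainerSketch`, `PipelineMap`, `handleCloseContainer`, and the local `PipelineNotFoundException` are hypothetical stand-ins for the SCM classes, and where exactly the real change places the try/catch is not shown here. The sketch destroys the pipeline first and then handles the close event, mirroring the ordering described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CloseContainerSketch {

  /** Hypothetical stand-in for the SCM's PipelineNotFoundException. */
  static class PipelineNotFoundException extends Exception {
    PipelineNotFoundException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical, heavily simplified pipeline-to-containers map. */
  static class PipelineMap {
    private final Map<String, Set<Long>> containersByPipeline = new HashMap<>();

    void addContainer(String pipelineId, long containerId) {
      containersByPipeline
          .computeIfAbsent(pipelineId, k -> new HashSet<>())
          .add(containerId);
    }

    void destroyPipeline(String pipelineId) {
      containersByPipeline.remove(pipelineId);
    }

    void removeContainerFromPipeline(String pipelineId, long containerId)
        throws PipelineNotFoundException {
      Set<Long> containers = containersByPipeline.get(pipelineId);
      if (containers == null) {
        throw new PipelineNotFoundException(
            "PipelineID=" + pipelineId + " not found");
      }
      containers.remove(containerId);
    }
  }

  /**
   * Handles a close-container event and returns the commands that would be
   * sent to the datanodes (modelled here as plain strings).
   */
  static List<String> handleCloseContainer(
      PipelineMap pipelines, String pipelineId, long containerId) {
    List<String> sent = new ArrayList<>();
    try {
      pipelines.removeContainerFromPipeline(pipelineId, containerId);
    } catch (PipelineNotFoundException e) {
      // The pipeline is already gone, so there is nothing to detach the
      // container from. Ignore the error and still issue the close command.
    }
    sent.add("close #" + containerId);
    return sent;
  }

  public static void main(String[] args) {
    PipelineMap pipelines = new PipelineMap();
    pipelines.addContainer("p1", 2L);
    // The pipeline is destroyed before the close event is handled.
    pipelines.destroyPipeline("p1");
    System.out.println(handleCloseContainer(pipelines, "p1", 2L)); // prints [close #2]
  }
}
```

Without the try/catch, the exception would propagate out of the handler and the close command would never be issued, which is the failure shown in the log above.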

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4304

How was this patch tested?

Validated manually in a docker environment.

@sodonnel sodonnel changed the title from "Add try catch block to handle pipeline which does not exist" to "HDDS-4304. Close Container event can fail if pipeline is removed first" Oct 2, 2020
@nandakumar131 nandakumar131 merged commit 5719615 into apache:master Oct 5, 2020
errose28 pushed a commit to errose28/ozone that referenced this pull request Oct 6, 2020