Skip to content

Conversation

@nandakumar131
Copy link
Contributor

What changes were proposed in this pull request?

Whenever we close a pipeline, we have an option to give some grace time for the container to get closed, the grace time is configured using ozone.scm.pipeline.destroy.timeout.
We wait for the timeout to happen before we go ahead and delete the pipeline. This will give enough time for the datanodes to close the container gracefully.
It will prevent the containers from moving to the QUASI_CLOSED state.

This functionality is broken and we don't wait for the timeout to happen before we delete the Pipeline. This creates a lot of QUASI_CLOSED containers in the cluster when a node goes stale or when a datanode is getting decommissioned.

This has to be fixed and we should wait for the configured amount of time before we delete the Pipeline, this will give datanodes enough time to CLOSE the containers on that pipeline.

What is the link to the Apache JIRA

HDDS-9479

How was this patch tested?

Unit test modified to test the behaviour.

"ozone.scm.pipeline.scrub.interval";
public static final String OZONE_SCM_PIPELINE_SCRUB_INTERVAL_DEFAULT =
"5m";
"60s";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can have unintended consequence. We do not know how long a pipeline stays in ALLOCATED state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a separate property for handling pipelines in ALLOCATED state.
Since the scrubber thread is handling both ALLOCATED and CLOSED pipelines, it should have a lower interval.

this.creationTimestamp = Instant.now();
this.suggestedLeaderId = suggestedLeaderId;
this.replicaIndexes = new HashMap<>();
this.stateEnterTime = Instant.now();
Copy link
Contributor

@kerneltime kerneltime Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used to compare against the configured clock in SCMContainerManager

private void initializeSystemManagers(OzoneConfiguration conf,
      SCMConfigurator configurator) throws IOException {
    // Use SystemClock when data is persisted
    // and used again after system restarts.
    systemClock = Clock.system(ZoneOffset.UTC);

Would it be possible to get the creation timestamp from the same clock here? Currently it should be fine but it would be more resistant to bugs due to refactoring in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, let me create a separate Jira for handling this.

Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nandakumar131 Thanks for working over this, have minor comment

return creationTimestamp;
}

public Instant getStateEnterTime() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, need set closedState time when close is triggered for pipeline close, else this time is same as creation time as initialized in constructor.
This is to scrub from close time with delay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sumitagrawl the PipelineState inside Pipeline class is immutable.
If we want to change the state of a pipeline, we create a new Pipeline object with new state which will also update the stateEnterTime with current time.

Since PipelineState is immutable, stateEnterTime is also designed to be immutable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nandakumar131
We can add comment in Pipeline build() to notify the stateEnterTime will be latest when enter the state and no need update the time while building this object. Just to avoid later on changes if context is missed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is a little strange that creationTime and stateEnterTime are both basically hardcoded to "now". Feels like we should only need one of them, as creation time isn't really "correct". If the pipeline changes state from open to closed, the creation time is going to change too, due to its immutable nature. I can see how this could be confusing to someone in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sodonnel, I completely understand the confusion here. We should refactor the code for better understandability.
Currently we don't have any functional issues as we update the creation time in the builder while we create the Pipeline object from existing pipeline reference.
If you look at Pipeline$Builder(Pipeline pipeline), it assigns the creation time from the given pipeline reference.

With the current code, it's easy to introduce bugs in future. We should re-write the Pipeline class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The creationTimestamp in the Pipeline constructor is a fallback value. We update the creation time in the Pipeline$Builder#build. If the creation time is not set in the builder, we use the Instant#now value set in constructor.

Copy link
Contributor

@sodonnel sodonnel Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed that the builder sets creation time. I guess this is OK as the constructor is private, so the only way to create the object is via the builder.

I noticed that the setter is public:

  public void setCreationTimestamp(Instant creationTimestamp) {
    this.creationTimestamp = creationTimestamp;
  }

We could make that private as the inner Builder class can call the private methods. This is not really in scope for this PR though.


void openPipeline(PipelineID pipelineId) throws IOException;

@Deprecated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why leave this deprecated on around? This is not an external facing API, so nothing should be using it except tests. It would probably be better to just remove it now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, we can remove this method completely. I just didn't want to have large changes in a single PR if possible, it will become hard for the reviewers.

We will have to change another 15+ classes to remove this method, I will create a follow-up Jira to make this change.

closePipeline(p, false);
LOG.info("Closed the stale pipeline: {}", p.getId());
final PipelineID id = p.getId();
LOG.info("Closing the stale pipeline: {}", id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I am not sure we need this log message. We have a message below for "closed" and then exception handler has a message for "failed to close".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@sodonnel sodonnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nandakumar131 nandakumar131 merged commit 2cbb0e6 into apache:master Nov 29, 2023
@nandakumar131 nandakumar131 deleted the HDDS-9479 branch November 29, 2023 10:46
@nandakumar131 nandakumar131 restored the HDDS-9479 branch November 29, 2023 10:46
@nandakumar131
Copy link
Contributor Author

Thanks @sumitagrawl @sodonnel for the review.

@nandakumar131 nandakumar131 deleted the HDDS-9479 branch November 29, 2023 10:49
@sodonnel
Copy link
Contributor

sodonnel commented Nov 29, 2023

I think this PR might be cause a new test failure:

Error:  Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 61.73 s <<< FAILURE! -- in org.apache.hadoop.hdds.scm.pipeline.TestPipelineClose
Error:  org.apache.hadoop.hdds.scm.pipeline.TestPipelineClose.testPipelineCloseWithPipelineAction -- Time elapsed: 27.06 s <<< ERROR!
java.util.concurrent.TimeoutException: 
Timed out waiting for condition. Thread diagnostics:
Timestamp: 2023-11-29 01:39:38,610

Also failed locally for me locally on the first try:

java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)



	at org.apache.ozone.test.GenericTestUtils.waitFor(GenericTestUtils.java:231)
	at org.apache.hadoop.hdds.scm.pipeline.TestPipelineClose.testPipelineCloseWithPipelineAction(TestPipelineClose.java:193)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
	at org.junit.jupiter.engine.extension.TimeoutInvocation.proceed(TimeoutInvocation.java:46)
	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
	at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:66)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
	at java.util.ArrayList.forEach(ArrayList.java:1259)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
	at java.util.ArrayList.forEach(ArrayList.java:1259)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
	at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
	at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:54)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:52)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:114)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:86)
	at org.junit.platform.launcher.core.DefaultLauncherSession$DelegatingLauncher.execute(DefaultLauncherSession.java:86)
	at org.junit.platform.launcher.core.SessionPerRequestLauncher.execute(SessionPerRequestLauncher.java:53)
	at com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:57)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
	at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:232)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:55)

@nandakumar131
Copy link
Contributor Author

nandakumar131 commented Nov 29, 2023

Thanks for the ping @sodonnel.

The test only waits for 5 seconds before it timeout and fails.
The ozone.scm.pipeline.destroy.timeout is set to 5 seconds in the test. So, SCM will send close pipeline command to Datanode only after 5 seconds and the datanode is not getting enough time to remove the pipeline.

It's good to increase the max wait time in test to at least 10 seconds.

Created HDDS-9801 to fix the same.

jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Feb 1, 2024
…be closed. (apache#5604)

(cherry picked from commit 2cbb0e6)
Change-Id: Ie5d95e5cb910b9a98702ae11fcee88ade4fe9cc9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants