Skip to content

Conversation

@sumitagrawl
Copy link
Contributor

@sumitagrawl sumitagrawl commented Apr 17, 2023

What changes were proposed in this pull request?

registerCallback in Lease is not safe if done after lease expiry in parallel, so this callback will never be registered. Removed this method exposer.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8418

How was this patch tested?

UT case and IT for checking any impact

@adoroszlai adoroszlai marked this pull request as draft April 17, 2023 13:08
@adoroszlai
Copy link
Contributor

@sumitagrawl there are multiple test failures, please check

@sumitagrawl sumitagrawl marked this pull request as ready for review April 21, 2023 08:15
Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sumitagrawl thanks for working on this. It's not clear what the problem was and how the patch is supposed to fix it.

I suspect intermittent failure in TestCloseContainerEventHandler could be the result of the problem being fixed:

[INFO] Running org.apache.hadoop.hdds.scm.container.TestCloseContainerEventHandler
Error:  Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.357 s <<< FAILURE! - in org.apache.hadoop.hdds.scm.container.TestCloseContainerEventHandler
Error:  org.apache.hadoop.hdds.scm.container.TestCloseContainerEventHandler.testCloseContainerWithDelayByLeaseManager  Time elapsed: 4.528 s  <<< FAILURE!
Wanted but not invoked:
eventPublisher.fireEvent(
    TypedEvent{payloadType=CommandForDatanode, name='Datanode_Command'},
    <Capturing argument>
);
-> at org.apache.hadoop.hdds.scm.container.TestCloseContainerEventHandler.testCloseContainerWithDelayByLeaseManager(TestCloseContainerEventHandler.java:151)
Actually, there were zero interactions with this mock.

Synchronization in Lease is very mixed after this patch:

  • write to expired is synchronized (invalidate()), but read is not (hasExpired()), except when called from other synchronized method
  • callbacks is a synchronizedList, item addition is also synchronized on Lease (registerCallback()), but getCallbacks() returns the list as is, and iteration happens without either lock
  • leaseTimeout is atomic, write is also synchronized (renew()), but read is not (getLeaseLifeTime())

Several methods called from LeaseManager are also protected by synchronization in that class.

Instead of adding a few synchronized keywords here and there, it would be great to make it consistent.

@szetszwo
Copy link
Contributor

The bug is the acquire methods returning a Lease and then registerCallBack later. It seems the current change does not prevent that. We should pass the callbacks to acquire at the same time in order to fix the bug.

@adoroszlai adoroszlai marked this pull request as draft May 17, 2023 16:48
@sumitagrawl sumitagrawl marked this pull request as ready for review June 27, 2023 02:20
}

private synchronized void handleTimeout(EventPublisher publisher,
private synchronized Void handleTimeout(EventPublisher publisher,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get this part, what is the impact of changing void-> Void and returning null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Callback as getting registered has return type is Callback with "Void" Object, so this gives error if return type mismatch,

Syntax for acquire: public synchronized Lease<T> acquire( T resource, Callable<Void> callback)

Below is error if we keep void:
Bad return type in lambda expression: void cannot be converted to Void

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it , looks like autoboxing won't happen for void type. Thanks for explaining

Copy link
Contributor

@sadanand48 sadanand48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sadanand48 sadanand48 merged commit 8276aad into apache:master Sep 22, 2023
@sadanand48
Copy link
Contributor

Thanks @sumitagrawl for the fix, @adoroszlai @szetszwo for reviewing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants