Skip to content

Conversation

@sumitagrawl
Copy link
Contributor

@sumitagrawl sumitagrawl commented Jan 5, 2025

What changes were proposed in this pull request?

Granular locking for OBS for existing flow.

locking is added for external request at entry point. This provides execution of request at leader and existing flow simultaneouly without impacting for cache.

  • refer obs-locking.md for locking added for obs request (HDDS-11898. design doc leader side execution)
  • refer leader-execution.md for Step-by-step integration of existing request (interoperability)

Parent Jira:
https://issues.apache.org/jira/browse/HDDS-11897

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11900

How was this patch tested?

  • UT cases added for lock, and requests
  • Existing test cases for impact

@kerneltime kerneltime added the om-pre-ratis-execution PRs related to https://issues.apache.org/jira/browse/HDDS-11897 label Jan 7, 2025
Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the prototype @sumitagrawl. Just did a high level review more focused on code organization than correctness. Can we split this PR down further?

  • One PR for the general locking framework + gatekeeper
    • Separating this will make it easier to follow changes when looking at commit history.
  • One PR for key related requests (create, allocate block, commit, delete, etc)
  • One PR for bucket requests (create, delete, update)
  • One PR for volume requests (create, delete, update)
  • One PR for S3 MPU requests

The current patch is already too large and once the required testing is added for each request it will only get larger.

import org.slf4j.LoggerFactory;

/**
* key locking.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a lot more detail to how this class is used

*/
public class KeyLock {
private static final Logger LOG = LoggerFactory.getLogger(KeyLock.class);
private static final long LOCK_TIMEOUT_DEFAULT = 10 * 60 * 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make stripe sizes and timeouts configurable. As suggested above though, this would happen in a layer above this class.

}
}

public List<Lock> readLock(List<String> keyList) throws OMException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need read lock? In the new flow RocksDB should handle consistent views without a cache. Is this just for compatibility with the old flow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readlock is added for bucket when key operation is in progress. Some key type need read lock and some need write lock, as as Generic part, the interface is provided.

}

public void start() {
ThreadFactory threadFactory = new ThreadFactoryBuilder().setDaemon(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can extend the existing BackgroundService class to keep all background service management consistent

/**
* key locking.
*/
public class KeyLock {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this name is confusing. It seems to imply that it for locking "keys" as they are defined in the Ozone namespace. It is actually a generic class that supports locking based on any string, regardless of its significance to the system.

I would move this to one of the common modules outside of OM, have it take any configurations as parameters, and throw general typed exceptions that upper layers can chain on to specific service type exceptions like OMException. This class seems to fit in the same family of utils as SimpleStripe.

Comment on lines +126 to +131
public OmLockOpr.OmLockInfo lock(OzoneManager ozoneManager, OmLockOpr lockOpr) throws IOException {
return null;
}

public void unlock(OmLockOpr lockOpr, OmLockOpr.OmLockInfo lockInfo) {
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of defining these stubs in the parent class? I don't see why anyone would call them from outside (public) or using the generic OMClientRequest type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lock key needs to be obtained from Request Object structure, and its different for different request.

So its not possible to get this directly as AOP. Two way of implementation:

  1. Switch case of each request type, and retrieve the lock information based on this
    -- This is tried out but that class itself looks complex and un-manageble
  2. Each request handle define how it can get and call respective lock
  3. Some operation do not need any lock, so provided with default implementation on no lock

executorService.shutdown();
}

public OmLockInfo volumeReadLock(String volumeName) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to greatly reduce the API surface required here. My understanding is that only one lock used at write time for a volume, bucket, or key is required, but the design doc unfortunately does not provide more info.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add to the doc,

We have different lock group for volume, bucket and key, as they can conflict with each other.

As API surface,
For volume handling, we have, volume read and write lock
For bucket handling, we have bucket read and write lock
For OBS key handling, its obsLock

So had these set of API, will update design doc further.


}

public OmLockOpr.OmLockInfo lock(OzoneManager ozoneManager, OmLockOpr lockOpr) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we touching FSO/Legacy requests in this OBS patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@errose28 We have, OMDirectoryCreateRequest for OBS and OMDirectoryCreateRequestWithFSO for FSO.
So above class is OBS only (not FSO)
Legacy many places is treated as OBS only, so from this code perspective, it can not distinguish if LEGACY or OBS, so precautionary added the check for OBS.

if (!s3Auth) {
OzoneManagerRatisUtils.checkLeaderStatus(ozoneManager);
}
if (isRatisEnabled()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary with the current 2.0 changes to remove non-ratis OM from master.

}
}

private static void performUnlock(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design doc mentioned some sort of "GateKeeper" entity, which I was expecting to see in this PR. Currently it looks like locking/unlocking is strewn about multiple places. Can we consolidate coordination to one new OMRequestGateKeeper class?

long curTime = Time.monotonicNowNanos() - MONITOR_LOCK_THRESHOLD_NS;
for (OmLockInfo lockInfo : lockMonitorMap.values()) {
if ((curTime - lockInfo.getLockTakenTime()) > 0) {
LOG.warn("Lock {} is crossing threshold {}: ", lockInfo, MONITOR_LOCK_THRESHOLD_NS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will completely flood the logs in a deadlock situation, regardless of the interval the check runs. We need some in-memory tracker for which locks have already had warnings issued.

@sumitagrawl
Copy link
Contributor Author

@errose28 First split is created: #7936 and covered all comments related to that,

@sumitagrawl sumitagrawl changed the title HDDS-11900. obs granular lock for existing request HDDS-11900. obs granular lock for existing request [To be splitted] Mar 27, 2025
@adoroszlai
Copy link
Contributor

Being split.

@adoroszlai adoroszlai closed this Mar 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

om-pre-ratis-execution PRs related to https://issues.apache.org/jira/browse/HDDS-11897

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants