Conversation

@sodonnel (Contributor)

What changes were proposed in this pull request?

This change introduces the ability to re-create / overwrite a key in Ozone using an optimistic locking technique.

Say there is a desire to replace a key with some additional data added somewhere in the key, or perhaps change its replication type from Ratis to EC. To do this, you can read the current key data, write a new key with the same name, and then on commitKey, the new key version will be visible.

However, there is a possibility that some other client deletes the original key, or re-writes it at the same time, resulting in potential lost updates.

To replace a key in this way, the proposal is to use the existing objectID and updateID on the key to ensure the key has not changed since it was read. The flow would be:

  1. Get the keyInfo for the current key.
  2. Call the new bucket.overWriteKey() method, passing the details of the existing key.
  3. This call will adjust the keyArgs to pass two new fields - overwriteObjectID and overwriteUpdateID - which are taken from the objectID and updateID of the existing key.
  4. When OM receives the open key request, it checks that an existing key is present having the passed keyname, objectID and updateID. If not, an error is returned. Otherwise the key is added to the openKeyTable, storing the overwrite IDs.
  5. The data is written to the key as usual.
  6. On key commit, the values stored in the openKey table for the overwrite IDs are checked against the current key. If the current key is absent, or its IDs have changed, the commit will fail and an error is returned. Otherwise the key is committed as usual.

This technique is similar to optimistic locking used in relational databases, to avoid holding a lock on an object for a long period of time.

Notably there are no additional locks needed on OM and no additional calls or rocksDB reads required to implement this - passing and storing the IDs in the openKey table is all that is required. The overwriteIDs don't need to be stored in the keyTable.
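A rough client-side sketch of the flow (the exact overWriteKey() signature and the buildReplacementData() helper shown here are illustrative assumptions, not the final API):

```java
// Rough sketch only: overWriteKey() is the new method added by this PR,
// but its exact signature and the buildReplacementData() helper are
// illustrative assumptions.
OzoneKeyDetails existing = bucket.getKey("key1");      // step 1: read the current keyInfo
byte[] newData = buildReplacementData(existing);       // hypothetical helper
try (OzoneOutputStream out =
         bucket.overWriteKey("key1", newData.length, existing)) {
  out.write(newData);                                  // step 5: data is written as usual
}                                                      // close commits the key; OM validates
                                                       // the stored IDs (steps 4 and 6)
```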

This change only adds the feature for Object Store buckets for now.

Additionally, there is a question over what to do about metadata and ACLs: should they be copied from the existing key, or passed from the client?

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10527

How was this patch tested?

New integration and unit tests added.

@adoroszlai (Contributor) left a comment


Thanks @sodonnel for the patch.

Comment on lines 1426 to 1428:

```java
// TODO - if we are effectively cloning a key, probably the ACLs should
// be copied over server side. I am not too sure how this works.
.setAcls(getAclList())
```

Good point. I think we should omit metadata and ACLs, if possible, and let OM copy them.

@adoroszlai adoroszlai changed the title HDDS-10527. KeyOverwrite with optimisitic locking HDDS-10527. KeyOverwrite with optimistic locking Mar 20, 2024
…ID is unchanged, and metadata is copied over to the open key
@kerneltime (Contributor)

Why not just introduce an object generation field and expose this API (in the future, for applications themselves)?
Reference: https://cloud.google.com/storage/docs/json_api/v1/objects/update

Optional query parameters:

| Parameter name | Value | Description |
| --- | --- | --- |
| generation | long | If present, selects a specific revision of this object (as opposed to the latest version, the default). |
| ifGenerationMatch | long | Makes the operation conditional on whether the object's current generation matches the given value. Setting to 0 makes the operation succeed only if there are no live versions of the object. |
| ifGenerationNotMatch | long | Makes the operation conditional on whether the object's current generation does not match the given value. If no live object exists, the precondition fails. Setting to 0 makes the operation succeed only if there is a live version of the object. |
| ifMetagenerationMatch | long | Makes the operation conditional on whether the object's current metageneration matches the given value. |
| ifMetagenerationNotMatch | long | Makes the operation conditional on whether the object's current metageneration does not match the given value. |
| overrideUnlockedRetention | boolean | Applicable for objects that have an unlocked retention configuration: required to be set to true if the operation includes a retention property that changes the mode to Locked, reduces the retainUntilTime, or removes the retention configuration from the object. |

@sodonnel (Contributor Author)

> Why not just introduce an object generation field and expose this API (in the future, for applications themselves)? Reference: https://cloud.google.com/storage/docs/json_api/v1/objects/update

I think it would be possible to extend to this in the future, but it needs more thought about which operations it impacts. Implementing this sort of generic approach probably needs to cover key delete and create (overwrite) at least, and perhaps move / rename too. We would also need to think through what it would mean for FSO buckets (perhaps nothing different, but I have not given it deep thought).

The workings of the proposed solution are hidden behind the new overwriteKey API on the bucket object, and that is the only external API I have changed. If the generation concept were added later, overwriteKey could be changed to use it and further enhance this sort of feature.

@sodonnel (Contributor Author)

Created https://issues.apache.org/jira/browse/HDDS-10558 for the idea suggested by @kerneltime, as it is something worth exploring, but it probably needs a bit more design work than just this PR.

@kerneltime (Contributor)

> Created https://issues.apache.org/jira/browse/HDDS-10558 for the idea suggested by @kerneltime, as it is something worth exploring, but it probably needs a bit more design work than just this PR.

It would not make sense to have generation and also use objectID. They are essentially the same capability, one using just the objectID and the other a more developer-friendly generation ID. I would recommend introducing a generation ID here instead of using objectID; it would not make sense to expose object ID to developers.

@sodonnel (Contributor Author)

ObjectID is already there, as is updateID. They have not been introduced here, and they are already persisted and managed in OM. This PR only publicly exposes the new method on the bucket, with no intention of exposing it further.

If we are going to expose GenerationID through the APIs in the way the Google Cloud docs indicate, then we need to decide if that is something we want to start with. E.g. Google Cloud supports it, but AWS does not. It's more test surface, more code to write and expose via all the interfaces, and most importantly more to support going forward. We would also have to worry about forward / backward compatibility if we are adding another ID to be persisted in OM. Old keys will not have a generationID, though perhaps it can be derived from the object / update ID. How does it tie in with the object version? If it is a new ID to be stored, it will add a small storage / memory overhead on OM. I don't believe we should implement something like that without giving it due consideration.

What is in this PR could easily be changed to use a generationID if one were introduced later and this feature sees some adoption, as the use of object / updateID is contained and hidden from users of the API, which is why I would suggest exploring GenerationID in the other Jira I raised. I am trying to make the smallest useful change possible to allow atomic key overwrite, without closing any paths for future improvements. Therefore I would like to move any design around GenerationID into the other Jira and move ahead with this one in its current direction.

@sodonnel (Contributor Author)

The generationID in google cloud appears to be like the version in AWS and in Ozone to some extent (I believe versioning is not fully implemented in Ozone).

From the docs:

> There is no guarantee that generation numbers increase for successive versions, only that each new version has a unique generation number.
> There is no relationship between the generation numbers of unrelated objects, even if the objects are in the same bucket.

From that, it is not clear if each generationID is unique across the bucket, or if two different keys can have the same generationID.

In Ozone, if you create a key with objectID=5 and then create a new key of the same name, the objectID, I believe, remains the same, but the update_id is incremented.

If you deleted the key (without versioning enabled) and recreated it, the object_ID will change.

UpdateID comes from the Ratis Transaction ID (if Ratis is enabled), so it is probably unique on its own without the objectID, but I am not sure that can be trusted without Ratis enabled. This is also based on comments in WithObjectID.java.

Therefore I think that to guarantee an object has not changed, we need both object and update ID.

For now, I don't think we should expose any of this via the S3 or Hadoop-compatible filesystems. We could also tag the new bucket.overwriteKey() as experimental for now, giving us scope to remove or change it later if we decide it is not useful. However, I still think it could easily be adapted to a new approach behind the public interface. Basically, I think the whole Ozone version story - the version ID along with object and update ID - needs to be fully worked out before we expose anything more widely than is done in this PR.

@kerneltime (Contributor)

kerneltime commented Mar 21, 2024

> The generationID in google cloud appears to be like the version in AWS and in Ozone to some extent (I believe versioning is not fully implemented in Ozone).

Versioning != generation ID. Generation is just a monotonically increasing number without preserving the older generations.

> From the docs:
> There is no guarantee that generation numbers increase for successive versions, only that each new version has a unique generation number.
> There is no relationship between the generation numbers of unrelated objects, even if the objects are in the same bucket.
>
> From that, it is not clear if each generationID is unique across the bucket, or if two different keys can have the same generationID.

Each object has its own generation field; it is not based on the bucket.


@devmadhuu (Contributor) left a comment


Thanks @sodonnel for working on this quick patch. Overall LGTM, +1. However, since the overwrite key write flow is the same as a normal first-time key create, I think it would be good to add some extra information to the audit logs for the overwrite flow to differentiate them.

@sodonnel (Contributor Author)

@devmadhuu Thanks for the review. I added one more commit that adds details to the audits.

@kerneltime In google cloud, they use the generationID to access versions of an object like:

`gs://bucket/object#generation_ID`

So there is surely some overlap with the Ozone version and how AWS does things. Objects that are not versioned also have a generationID, so it's not tied to versioning, but it is used in versioning.

@errose28 (Contributor)

> In Ozone, if you create a key with objectID=5 and then create a new key of the same name, the objectID, I believe, remains the same, but the update_id is incremented.

Object ID is derived from the Ratis transaction index of the create request. See the method doing the calculation that is called from all create requests. Note the extra space created around each object ID to account for directories that might need to be created with a file create request. Creating a new key with the same name will have a different object ID.

Update ID is also set on key create, but additionally changed when metadata like native ACLs are changed. It is just the Ratis transaction index with no additional modifications. See this method and everywhere it is used for which operations modify the update ID. I'm not sure what you mean by "update ID is incremented": the update ID of the new key will be arbitrarily larger than the old key's, since it is a Ratis transaction index, but its value is not related to the update ID of the previous version of the key.
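As a rough illustration of the object ID spacing mentioned above (the shift width and helper name are illustrative assumptions, not the actual Ozone code):

```java
// Illustrative only: derive an object ID from a Ratis transaction index
// while reserving a block of IDs for any parent directories the same
// create request may need to materialize. The shift width and helper name
// are assumptions, not the actual Ozone implementation.
static long objectIdFromTxIndex(long txIndex) {
  final int RESERVED_ID_BITS = 8; // room for implicit parent directories
  return txIndex << RESERVED_ID_BITS;
}
```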

@sodonnel (Contributor Author)

@errose28 I think that for HSync (the HBase-related changes), when a key is appended or synced, the update ID should be increased while the objectID stays the same. As I first read your comment, I thought maybe I could cut this down to just checking the objectID, but I think the append / hsync case changes the picture slightly, requiring the update ID to be checked too if we want to ensure there are no changes.

By "update ID is incremented" I just meant it was changed. Not that it is increased by 1. I was aware that both IDs are derived from the Ratis Transaction ID.

@errose28 (Contributor)

> As I first read your comment, I thought maybe I could cut this down to just checking the objectID

Actually, I'm thinking we can just check the update ID to determine if the key changed. The case where object ID alone would be useful is if only a key's native ACLs were updated and we wanted to disregard that change. However, I think we want this API to preserve the native ACLs of the original key. This makes it consistent with Ranger ACLs, which will remain the same as long as the key is rewritten to the same location.

> but I think the append / hsync case changes the picture slightly, requiring the update ID to be checked too if we want to ensure there are no changes.

I don't think a key with an open lease being hsync'ed should be eligible for overwrite. I think this overwrite case is already blocked by existing hsync changes on the feature branch but we should double check to make sure we don't have issues later. I believe append will require an OM side update every time it is called, so yes that should increment the update ID.

@sodonnel (Contributor Author)

sodonnel commented Mar 21, 2024

> I don't think a key with an open lease being hsync'ed should be eligible for overwrite

A key being hsync'ed is "open but visible" in Ozone and hence should have a lease which blocks other writers.

I think the "hsync / hbase" work also allows for a key to be appended - ie the key is closed and committed. Then a writer reopens it and appends some new data and commits / closes it again.

While the first scenario should be blocked, I need to ensure we do the right thing if an append happens to a closed key that is currently being overwritten, and whether that is even possible! I will ask around about that.

I think you could be right that update_id is all we need, as it will change on key append, hsync, and object delete-and-recreate. It would be much nicer to only need one field.
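A rough sketch of what the commit-time check would reduce to if only update_id (the generation) is checked - keyTable, dbKeyName, expectedGeneration and the result code here are illustrative assumptions:

```java
// Illustrative sketch of the commit-time validation with a single field.
// keyTable, dbKeyName, expectedGeneration and the result code used here
// are assumptions, not the exact implementation.
OmKeyInfo current = keyTable.get(dbKeyName);
if (current == null || current.getUpdateID() != expectedGeneration) {
  // The key was deleted or modified since the client read it: fail the commit.
  throw new OMException("Key changed since it was read, cannot overwrite",
      OMException.ResultCodes.KEY_NOT_FOUND);
}
// Otherwise continue with the normal commit path.
```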

@errose28 (Contributor) left a comment


Thanks for working on this @sodonnel. I left some minor comments and have a few ideas on the design I think we can look into:

  • Can we just use updateID to check for changes as discussed above?
  • Can we avoid persisting the overwrite ID to the open key proto on key create and have the client hold it in memory, supply it on commit, and check it then?
  • I think we want native ACLs of the original key preserved when using this API. We should add tests for this.
  • Since the PR description states FSO is out of scope for now, let's have the code explicitly enforce this.
    • Currently overwrite will fail if the key is renamed or its immediate parent is changed. It will not fail if a directory farther up the file's parent tree is moved or renamed.

Comment on lines 245 to 248:

```java
// Optimistic locking validation has passed. Now set the overwrite fields to null so they are
// not persisted in the key table.
omKeyInfo.setOverwriteUpdateID(null);
omKeyInfo.setOverwriteObjectID(null);
```
Contributor:

Do we even need these persisted to the open key table? The client could just hold the update ID in memory from the original read, and pass it as the overwrite update ID on commit for the OM to use only during the in-memory portion of the commit. If the client dies before finishing the overwrite it will lose its client ID and not be able to access the open key to resume writing anyways.

It seems like we can move all new logic to the commit phase without modifying key create requests or the protos that are written to the DB.

Contributor Author:

Adding the overwrite IDs to the commit phase is much more difficult, as the commit is performed by various sub-classes on the client - e.g. Ratis vs EC vs any new write class. I did look into adding them only to the commit, but concluded it was much easier to persist them to the open key table.

Therefore I prefer to keep it the way it is now.

Contributor Author:

You could also argue: why persist any of the key metadata on key open (created date, ACLs, etc.)? I think persisting the overwriteID in the open key table keeps with the existing pattern.

Contributor:

> You could also argue: why persist any of the key metadata on key open (created date, ACLs, etc.)?

After looking this over a bit I think these are actually bugs that we need to fix:

  • Create time:
    This one is more of a debugging/auditing inconvenience. We set create time in the create phase before the object is visible, and then set modification time in the commit phase when the key is visible. Ideally for a key that has just been created, I would think ctime and mtime would be expected to be the same. However since both create and commit show up in the audit log I guess you could argue that the current implementation is correct as well.

  • ACLs
    This one looks more concerning. I haven't tested this yet, but it looks like the ACLs at the time of create are what are committed to the final key, without checking if the key being replaced had ACL updates in the meantime. For example:

  • key1 exists with acl1

  • key1' is created at the same path as key1

  • ACLs for key1 are updated to acl2 by another user/admin.

  • key1' is committed with acl1 that was read at create time.

Now the ACLs have gone back in time without the admin or user intending to make this change.

Looking at it from this angle, the existing approach looks like it should be fixed to write metadata like ACLs and create time at the time of commit. Once these bugs are fixed, persisting the overwrite update ID at the time of create does not make much sense in context either.

Contributor Author:

Any change to the key, including ACLs, should change the key's updateID, as that is what we are relying on to test whether the key has changed. That would cause the optimistic commit to fail.

From the client code perspective, I think it's easier to set the ACLs etc. at the time of key open, as otherwise they would need to be passed down to various sub-classes. That also adds different testing paths for each type of key (Ratis, EC). That is why I am persisting the overwriteUpdateID in the openKey part: it should be passed as part of open key so we can check the key has not changed between when it was read and when the key write starts. It costs basically nothing to persist it in the open key table, and it saves some complexity on commit.

If you think there are bugs around the existing ACLs / created time / modification time handling, then please raise Jiras for them. We cannot change that as part of this PR, as it would not make sense in this context; a different PR should take care of it.

On the created time / modification time front, I think it could be argued either way. What is the creation time of a file in Linux? You open the file, write a series of bytes over several minutes, and close the file. The ctime is probably the time the file was opened; the mtime is the time the file was last changed - they can easily be different. With Ozone it's not as clear-cut, as the key traditionally has not been visible until it is committed, so the difference between ctime and mtime is really the time it took to write the bytes. After HBase allows uncommitted keys to be visible, the behavior is more like Linux, and hence is quite possibly correct as it stands.

@adoroszlai adoroszlai marked this pull request as draft April 22, 2024 11:38
@adoroszlai (Contributor)

Converted to draft while #6482 is outstanding.

S O'Donnell added 5 commits April 30, 2024 14:00
 Conflicts:
	hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/OzoneBucket.java
	hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/OzoneKey.java
	hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/OzoneKeyDetails.java
	hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java
	hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OmKeyArgs.java
	hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OmKeyInfo.java
	hadoop-ozone/interface-client/src/main/proto/OmClientProtocol.proto
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/RequestAuditor.java
	hadoop-ozone/s3gateway/src/test/java/org/apache/hadoop/ozone/client/OzoneBucketStub.java
@sodonnel sodonnel marked this pull request as ready for review April 30, 2024 19:02
@sodonnel (Contributor Author)

@errose28 @kerneltime I have updated this PR to match what we discussed in the design doc PR.

The only things in the design doc that are not included here are:

  1. Using the version framework in the client to detect an old server. As this is going to go on a branch, we will do that in a follow-up PR.
  2. Failing an atomic write at a block boundary has not been considered. That will be investigated later, and is somewhat of a nice-to-have at this stage.

I also plan to send this PR to a branch - it will not be committed to master. I need to create the branch and then see if we can re-point the PR at the branch.

@adoroszlai adoroszlai changed the base branch from master to HDDS-10656-atomic-key-overwrite May 3, 2024 07:34
@sodonnel (Contributor Author)

sodonnel commented May 3, 2024

@errose28 @kerneltime, @adoroszlai has created the branch HDDS-10656-atomic-key-overwrite and pointed this PR at that branch.

Please let us know if you are happy with the changes here, or if you have further comments so we can progress toward committing this onto the branch.

@sodonnel (Contributor Author)

sodonnel commented May 8, 2024

@errose28 @kerneltime @adoroszlai have you got any further comments on this PR?

@adoroszlai (Contributor) left a comment


Thanks @sodonnel for updating the patch.

@errose28 (Contributor) left a comment


Thanks for updating this @sodonnel.

For the unit and integration tests, I think we need one static checker in a test util class that takes the old and new KeyInfo objects from a rewrite operation and checks that metadata is either altered or unaltered accordingly. This lets the code document and enforce exactly which fields are and are not supposed to be modified by a rewrite. Currently, metadata checks are done ad hoc in the unit and integration tests, and I think some fields like mtime and key owner are not tested.
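For example, such a checker might look something like this (JUnit 5 assumed; the field expectations below are illustrative, not the final list):

```java
import static org.junit.jupiter.api.Assertions.*;

// Illustrative sketch of the shared checker; the field expectations below
// are assumptions to show the shape, not the agreed-upon list.
static void assertRewriteMetadata(OmKeyInfo before, OmKeyInfo after) {
  assertEquals(before.getKeyName(), after.getKeyName());
  assertEquals(before.getAcls(), after.getAcls());             // native ACLs preserved
  assertEquals(before.getCreationTime(), after.getCreationTime());
  assertNotEquals(before.getUpdateID(), after.getUpdateID());  // generation must advance
  assertTrue(after.getModificationTime() >= before.getModificationTime());
}
```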

Also, there are still some lingering references to "overwrite" in the change, but I think "rewrite" is the terminology we are using now. For clarity, it would be good to do a find/replace to make those match.

```java
public static final String BUCKET_LAYOUT = "bucketLayout";
public static final String TENANT = "tenant";
public static final String USER_PREFIX = "userPrefix";
public static final String OVERWRITE_GENERATION = "overwriteGeneration";
```
Contributor:

Should we just call this generation? We may use it for something other than overwrites in the future.

@ivandika3 (Contributor) commented May 9, 2024:

+1 on this, and similarly for OmKeyArgs.overwriteGeneration (edit: the OmKeyArgs.overwriteGeneration change can be ignored). Although I haven't taken a deep look, I think the "generation" concept can be reused in the future. For example, we could ensure follower reads adhere to "read-your-own-writes" consistency by passing the generation that was just written to the follower, to ensure the value has been updated there (instead of a linearizable read, which has higher latency since it requires the follower to contact the leader as well).

Contributor Author:

This constant is only used in the audit log. I think it makes sense to call it overwrite (or probably rewrite) generation, to make it distinct from the key's current generation. Calling the rewriteGeneration just "generation" could be confusing if we have a current generation called "generation" and then this atomic rewrite parameter also called "generation".

Contributor:

This was meant as a more general comment on the PR, which probably wasn't clear. I think there are some places where rewriteGeneration makes sense, like the audit log here and the KeyArgs that are passed in by the client - basically the places where the client is instructing the server to use the generation for rewrite purposes. However, when the value is stored or returned, like in OmKeyInfo and the KeyInfo proto, I think we should just call it generation. At that point the value could technically be used for anything, not just a rewrite.

```java
}

@Test
public void testAtomicRewrite() throws Exception {
```
@errose28 (Contributor) commented May 9, 2024:

Optional suggestion:

The large multi-scenario test methods used in the three request test classes here would be clearer if each scenario were split into its own test case, like testRewriteWhenKeyDeleted, testRewriteWhenKeyUpdated, testRewritePreservesMetadata, etc. These smaller units make it easier to see what is actually being tested and what scenarios may have been missed. See the sketch below.

That said, there's technically nothing wrong with the way it is implemented here, so you can disregard this if you think there would be code duplication in test setup or other factors that make this suboptimal.
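For illustration, the split could look like this (scenario names from the suggestion above; bodies and any shared setup helpers are assumed):

```java
// Sketch only: scenario names come from the suggestion above; bodies and
// any shared setup helpers are assumptions.
@Test
public void testRewriteWhenKeyDeleted() throws Exception {
  // delete the key after reading it, then expect the rewrite commit to fail
}

@Test
public void testRewriteWhenKeyUpdated() throws Exception {
  // modify the key after reading it, then expect the rewrite commit to fail
}

@Test
public void testRewritePreservesMetadata() throws Exception {
  // rewrite successfully, then assert ACLs, owner and ctime are unchanged
}
```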

@sodonnel (Contributor Author)

sodonnel commented May 9, 2024

@errose28 I believe I have addressed your further comments. Please have a look and let me know what you think.

@errose28 (Contributor)

errose28 commented May 9, 2024

Looking good overall. I think we just need to determine when to call it rewriteGeneration and when to just use generation, from this thread. I still think we need one method used by all tests that defines metadata expectations before and after an overwrite, as described in this comment, but if you want to do a follow-up PR for that I'm OK.

@kerneltime (Contributor)

> Looking good overall. I think we just need to determine when to call it rewriteGeneration and when to just use generation, from this thread. I still think we need one method used by all tests that defines metadata expectations before and after an overwrite, as described in this comment, but if you want to do a follow-up PR for that I'm OK.

An object should have a generation; when either metadata or data mutates, it should change (ideally increment). We can add sub-cases dataGeneration and metadataGeneration. Qualifying it as rewriteGeneration seems superfluous.

@sodonnel (Contributor Author)

@kerneltime @errose28 Can you please clarify exactly what you want it to be called in this PR, keeping in mind we don't have metaGeneration at the moment? Is it to be "generation", "dataGeneration", or perhaps "expectedGeneration", which makes sense when making a call that should only succeed when the given expected generation is present?

It seems strange to me to have simply "generation", as ideally that would be the current generation of a key, which is currently "updateID", since we are reusing it. If we ever added metaGeneration, I doubt we would call it metaUpdateID, though perhaps we would for consistency.

@sodonnel (Contributor Author)

> Looking good overall. I think we just need to determine when to call it rewriteGeneration and when to just use generation, from this thread. I still think we need one method used by all tests that defines metadata expectations before and after an overwrite, as described in this comment, but if you want to do a follow-up PR for that I'm OK.

I have created https://issues.apache.org/jira/browse/HDDS-10843 for the test improvements.

@sodonnel (Contributor Author)

@kerneltime @errose28 Following up on the naming of the "expectedGeneration" field in the protobuf: can you review my thoughts above so we can agree on what this should be called? This is the only outstanding item in this PR.

@sodonnel (Contributor Author)

I have raised https://issues.apache.org/jira/browse/HDDS-10857 to decide on the naming of the passed generation in the API. Please comment on the Jira so we can settle it.

As @adoroszlai has given a +1 and @errose28 voiced approval pending this last decision, I will commit this PR onto the branch tomorrow. All the follow-up items are under the epic and will be taken care of before merging the branch to master.

@sodonnel sodonnel changed the title HDDS-10527. KeyOverwrite with optimistic locking HDDS-10527. Rewrite key atomically May 15, 2024
@sodonnel sodonnel merged commit 51466d7 into apache:HDDS-10656-atomic-key-overwrite May 15, 2024