Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions lib/storage/metadata/mongoclient/MongoClientInterface.ts
Original file line number Diff line number Diff line change
Expand Up @@ -139,6 +139,7 @@ export type ObjectMDOperationParams = {
needOplogUpdate: boolean,
originOp: string,
doesNotNeedOpogUpdate?: boolean,
preventConcurrentCompletion?: boolean,
conditions: any,
};

Expand Down Expand Up @@ -1200,6 +1201,9 @@ class MongoClientInterface {
* @param {string} params.vFormat - object key format.
* @param {boolean} params.needOplogUpdate - If true, the object is directly put into the collection
* with updating the operation log.
* @param {boolean} params.preventConcurrentCompletion - If true, the object is only created if
* completeInProgress is not already set to true. Fix this concurrent completeMultipartUpload issue :
* https://scality.atlassian.net/jira/software/c/projects/OS/boards/268?selectedIssue=CLDSRV-687
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just ticket number of enough

Suggested change
* https://scality.atlassian.net/jira/software/c/projects/OS/boards/268?selectedIssue=CLDSRV-687
* CLDSRV-687

* @param {Object} log - The logger to use.
* @param {Function} cb - The callback function to call when the operation is complete. It is called with an error
* if there is an issue with the operation.
Expand All @@ -1220,6 +1224,9 @@ class MongoClientInterface {
}
const key = formatMasterKey(objName, params.vFormat);
const putFilter = { _id: key };
if (params?.preventConcurrentCompletion) {
putFilter['value.completeInProgress'] = { $ne: true };
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach does not work for Metadata (since metadata does not go through MongoClientInterface), and introduce some business-specific logic deep down in the "metadata layer" (completeInProgress is not related to metadata management, but really only to MPU objects) : thus creating more coupling and breaking the abstraction...
This may also relate to a larger issue that we have : we are currently not able (in the metadata interface/api) to distinguish between updates to an existing objet [like what is done in this specific case, we just want to try and set the completeInProgress flag] and an insertion. Here both are mostly undistinguishable, and thus we always upsert.
→ I wonder if having the distrinction would allow to handle this differently, i.e. on update we must not upsert and keep/increase the value of some other field.... (it would be a much larger endeavor, but useful debt cleanup needed in the long run anyway)

Alternatively, if we don't think this approach solves the issue or if we need that fix more urgently, this needs more work:

  • We may use putObjectWithCond function instead to set the completeInProgress flag. It allows performing such conditional operations, while keeping the condition in upper layer (cloudserver). It works only on non-versioned buckets -which is fine for shadow bucket-, and should be supported by metadata as well ; not sure however if we actually use it (i.e. is it stable enough, and/or if the upsert behavior is appropriate), so some testing may be needed.
  • If that does not work and we need to introduce an extra flag in putObject, we will need to design the new flag so that it has a generic semantic which allows to fix the issue, but makes sense generically and can be implemented in metadata and other cases of putObject (versionnned...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is also a (kind of generic) replayId flag, which is set on cloudserver side based on the uploadId, and implemented in Metadata but not in MongoClientInterface : may be worth reviewing this, to understand where it is set (in mpu shadow bucket, or in the actual bucket) and if it would solve our problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Metadata uses the replayId to deduplicate the incoming requests, it's used in this specific scenario to avoid duplicating versions and ending up with duplicate sproxyd keys in particular. It's also used for complete/abort conflicts, where we consider that the abort always wins (and the user has to re-create a new MPU in such case for the next PUT of the same object).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and Cloudserver passes the upload ID of the MPU as the unique property of the MPU

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jonathan-gramain ! Can you explain what is the expected semantics of the "backend", i.e. metadata/mongoClientInterface, regarding this field? And point us to the implementation in metadata?

cc @williamlardier maybe this also relates to the other issues we found/fixed on MPU ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@francoisferrand I don't think it is related to my recent PRs in the sense that we really had a wrong logic in some cases, but for this one (the current PR), it is definitely related yes. I saw we had logic in the metadata backend that we don't with MongoDB.

As we definitely reproduced the issue with a mongodb backend, that is, end up with two distinct versions and the same data location, I was just wondering if @jonathan-gramain knows if the metadata backend would be also affected. As you mention to avoid duplicating versions and ending up with duplicate sproxyd keys in particular I would say this is handled.

If that is the case, it means two things:

  • We need to fix it for mongodb backend as well
  • We should list all these specific logic we have in metadata but not MongoDB, like this one, that allows to better support edge cases? I feel like this added logic in metadata is hiding the API behavior in the other backends case...

return collection.updateOne(putFilter, {
$set: {
_id: key,
Expand Down
Loading