HDDS-8859. [Snapshot] Return failure message to client for a failed snapshot diff jobs #4993
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/SnapshotDiffJob.java
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/snapshot/SnapshotDiffResponse.java
prashantpogde left a comment
Thanks @hemantk-12. Other than a minor comment, the changes look good to me.
prashantpogde left a comment
Looks good to me. Thank you for making these changes @hemantk-12
  bucketName, fromSnapshotName, toSnapshotName, new ArrayList<>(),
  null),
- CANCELLED, 0L, JobCancelResult.CANCELLATION_SUCCESS);
+ CANCELLED, 0L, null);
This switch case is for "job status == CANCELLED". Should the 'null' be 'CANCEL_ALREADY_CANCELLED_JOB'?
I think it should be nothing, similar to In_progress, Done and the others. I'll update it.
Thanks for the catch.
sure!
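The thread above settles that, for an ordinary diff request, a CANCELLED status carries no cancel result, just like IN_PROGRESS and DONE, while this PR's goal is to hand the failure message back to the client for FAILED jobs. A minimal sketch of that status-to-response mapping, assuming hypothetical types (`JobStatus`, `CancelResult`, `DiffJob`, `DiffResponse`, `toResponse`) that only loosely mirror names seen in this review and are not the Ozone API; the diff report payload is omitted:

```java
import java.util.Objects;

// Hypothetical sketch: every name here is illustrative, not Ozone's API.
class SnapDiffResponseSketch {

  enum JobStatus { QUEUED, IN_PROGRESS, DONE, REJECTED, FAILED, CANCELLED }

  // Assumed cancel outcomes; values echo names mentioned in the review.
  // Only a cancel request would populate this field.
  enum CancelResult { CANCELLATION_SUCCESS, CANCEL_ALREADY_CANCELLED_JOB }

  static final class DiffJob {
    final JobStatus status;
    final String reason; // failure message, set only for FAILED jobs

    DiffJob(JobStatus status, String reason) {
      this.status = status;
      this.reason = reason;
    }
  }

  static final class DiffResponse {
    final JobStatus status;
    final String reason;
    final CancelResult cancelResult;

    DiffResponse(JobStatus status, String reason, CancelResult cancelResult) {
      this.status = status;
      this.reason = reason;
      this.cancelResult = cancelResult;
    }
  }

  // For a plain diff request no status carries a cancel result (CANCELLED
  // included); only FAILED forwards its stored reason to the client.
  static DiffResponse toResponse(DiffJob job) {
    Objects.requireNonNull(job, "job");
    switch (job.status) {
      case FAILED:
        return new DiffResponse(JobStatus.FAILED, job.reason, null);
      case QUEUED:
      case IN_PROGRESS:
      case DONE:
      case REJECTED:
      case CANCELLED:
        return new DiffResponse(job.status, null, null);
      default:
        throw new IllegalStateException("Unknown status: " + job.status);
    }
  }
}
```

Keeping the reason on the job record, rather than only logging it server-side, is what lets a later poll of the same job report why it failed.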
.equals(CANCELLED)) {
String jobKey = generateSnapDiffJobKey.apply(fromSnapInfo, toSnapInfo);
SnapshotDiffJob diffJob = snapDiffJobTable.get(jobKey);
if (diffJob == null || diffJob.getStatus() == CANCELLED) {
Should job statuses DONE, REJECTED and FAILED also return false?
DONE, REJECTED and FAILED jobs should not reach this path. Each job runs on a single thread, and once it is marked DONE, REJECTED or FAILED, it never enters this flow again. You can create a new job after a REJECTED or FAILED job is cleaned up, but that would be considered a different job altogether.
I see, thanks Hemant!
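The cancel check discussed in this thread can be sketched as follows: a missing or already-CANCELLED job returns false, and an IN_PROGRESS job is moved to CANCELLED. This is a minimal sketch assuming an in-memory `HashMap` in place of `snapDiffJobTable`; the class and method names are hypothetical. Because the author states that DONE, REJECTED and FAILED jobs never reach this path, the sketch treats them as an invalid state and throws, which is a defensive choice of this example rather than confirmed Ozone behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the map stands in for snapDiffJobTable.
class CancelGuardSketch {

  enum JobStatus { QUEUED, IN_PROGRESS, DONE, REJECTED, FAILED, CANCELLED }

  private final Map<String, JobStatus> jobTable = new HashMap<>();

  synchronized void put(String jobKey, JobStatus status) {
    jobTable.put(jobKey, status);
  }

  synchronized JobStatus get(String jobKey) {
    return jobTable.get(jobKey);
  }

  // Returns true only when an IN_PROGRESS job is moved to CANCELLED.
  synchronized boolean cancel(String jobKey) {
    JobStatus status = jobTable.get(jobKey);
    // Same shape as the reviewed check: nothing to cancel.
    if (status == null || status == JobStatus.CANCELLED) {
      return false;
    }
    // Assumed invariant from the thread: other states never get here.
    if (status != JobStatus.IN_PROGRESS) {
      throw new IllegalStateException(
          "Job " + jobKey + " cannot be cancelled in state " + status);
    }
    jobTable.put(jobKey, JobStatus.CANCELLED);
    return true;
  }
}
```

Making `cancel` `synchronized` keeps the read of the current status and the write of CANCELLED atomic with respect to other callers of the same object.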
private synchronized void updateJobStatusToFailed(String jobKey,
    String reason) {
  SnapshotDiffJob snapshotDiffJob = snapDiffJobTable.get(jobKey);
  if (snapshotDiffJob.getStatus() != IN_PROGRESS) {
Could we cancel a QUEUED job?
No, that should not happen. If it does, the job is in an invalid state.
The flow is Queued -> In_Progress/Rejected -> Done/Failed/Cancelled.
- Once a job is queued, we check whether the executor can take more jobs. If yes, the state changes to In_Progress; otherwise, it changes to Rejected.
- An In_Progress job is marked Done if the report is generated successfully; on failure, it is marked Failed. A job is marked Cancelled when someone executes a cancel request, but only if the job is In_Progress; otherwise, the cancel request fails.
I see, thanks Hemant!
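The lifecycle described above (Queued -> In_Progress/Rejected -> Done/Failed/Cancelled) can be written down as an explicit transition table. The following is a minimal sketch under that assumption: `DiffJobLifecycleSketch`, `admit`, `markDone`, `markFailed` and `cancel` are hypothetical names, and `markFailed` only echoes the idea of `updateJobStatusToFailed(jobKey, reason)` from the hunk above (fail only from IN_PROGRESS, keep the reason for the client); it is not the Ozone implementation.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the snapshot diff job lifecycle from the thread.
class DiffJobLifecycleSketch {

  enum JobStatus { QUEUED, IN_PROGRESS, DONE, REJECTED, FAILED, CANCELLED }

  private static final Map<JobStatus, Set<JobStatus>> ALLOWED =
      new EnumMap<>(JobStatus.class);

  static {
    ALLOWED.put(JobStatus.QUEUED,
        EnumSet.of(JobStatus.IN_PROGRESS, JobStatus.REJECTED));
    ALLOWED.put(JobStatus.IN_PROGRESS,
        EnumSet.of(JobStatus.DONE, JobStatus.FAILED, JobStatus.CANCELLED));
    // Terminal states: a new job must be created instead of reusing these.
    for (JobStatus t : EnumSet.of(JobStatus.DONE, JobStatus.REJECTED,
        JobStatus.FAILED, JobStatus.CANCELLED)) {
      ALLOWED.put(t, EnumSet.noneOf(JobStatus.class));
    }
  }

  private JobStatus status = JobStatus.QUEUED;
  private String reason; // failure message to return to the client

  synchronized JobStatus status() { return status; }

  synchronized String reason() { return reason; }

  // Executor admission: IN_PROGRESS if it has capacity, else REJECTED.
  synchronized void admit(boolean executorHasCapacity) {
    transition(executorHasCapacity ? JobStatus.IN_PROGRESS : JobStatus.REJECTED);
  }

  synchronized void markDone() { transition(JobStatus.DONE); }

  // Only an IN_PROGRESS job may fail; the reason is stored for the client.
  synchronized void markFailed(String failureReason) {
    transition(JobStatus.FAILED);
    this.reason = failureReason;
  }

  // The cancel request fails unless the job is IN_PROGRESS.
  synchronized boolean cancel() {
    if (status != JobStatus.IN_PROGRESS) {
      return false;
    }
    transition(JobStatus.CANCELLED);
    return true;
  }

  private void transition(JobStatus next) {
    if (!ALLOWED.get(status).contains(next)) {
      throw new IllegalStateException("Invalid transition " + status + " -> " + next);
    }
    status = next;
  }
}
```

With the table in one place, an attempt such as cancelling a QUEUED job or failing a REJECTED one is rejected up front instead of silently corrupting the job record.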
What changes were proposed in this pull request?
This patch contains the following changes.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8859
How was this patch tested?
Updated the existing unit and integration tests for now.