HDDS-7726. EC: Enhance datanode reconstruction log message #4155
    - LOG.warn(
    -     "Failed to complete the reconstruction task for the container: "
    -         + reconstructionCommandInfo.getContainerID(), e);
    + LOG.warn("Failed {}", reconstructionCommandInfo, e);
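The replacement call passes the command-info object to a single {} placeholder and the exception as a trailing argument. Assuming LOG is an SLF4J logger, the stack trace is not lost: a trailing Throwable that no placeholder consumes is treated as the exception to log. The following is a simplified imitation of that convention, written for illustration and not taken from SLF4J; the class name PlaceholderFormatSketch is hypothetical.

```java
import java.util.Arrays;

/**
 * Simplified imitation (NOT SLF4J's implementation) of parameterized logging:
 * each "{}" consumes one argument via toString(), and a trailing Throwable
 * that no placeholder consumed is detached as the exception to log.
 */
public class PlaceholderFormatSketch {

  /** Rendered message plus the detached exception, if any. */
  public static final class Formatted {
    public final String message;
    public final Throwable throwable;

    Formatted(String message, Throwable throwable) {
      this.message = message;
      this.throwable = throwable;
    }
  }

  public static Formatted format(String pattern, Object... args) {
    int placeholders = countPlaceholders(pattern);
    Throwable throwable = null;
    Object[] params = args;
    // More arguments than placeholders, and the last one is a Throwable:
    // keep it aside as the exception instead of formatting it.
    if (args.length > placeholders && args[args.length - 1] instanceof Throwable) {
      throwable = (Throwable) args[args.length - 1];
      params = Arrays.copyOf(args, args.length - 1);
    }
    StringBuilder sb = new StringBuilder();
    int from = 0;
    int argIndex = 0;
    int at;
    while ((at = pattern.indexOf("{}", from)) >= 0 && argIndex < params.length) {
      // Copy the literal text, then the argument's toString() value.
      sb.append(pattern, from, at).append(params[argIndex++]);
      from = at + 2;
    }
    sb.append(pattern.substring(from));
    return new Formatted(sb.toString(), throwable);
  }

  static int countPlaceholders(String pattern) {
    int count = 0;
    for (int i = pattern.indexOf("{}"); i >= 0; i = pattern.indexOf("{}", i + 2)) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    Object info = new Object() {
      @Override
      public String toString() {
        return "containerID=7";
      }
    };
    Formatted f = format("Failed {}", info, new IllegalStateException("boom"));
    System.out.println(f.message + " / " + f.throwable);
  }
}
```

This is also why the new message is only as informative as the command-info object's toString().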
Might be worth adding the elapsed time to the failure message too, in case it is failing after some long delay or timeout?
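A minimal sketch of the elapsed-time suggestion, assuming the task records a start timestamp before running and reports the difference in both outcomes. ElapsedTimeMessageSketch, runAndDescribe and the message wording are hypothetical; the clock is injected as a LongSupplier only so the timing can be tested deterministically.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: include the elapsed time in both the success and the
 * failure message, so a failure after a long delay stands out as a likely timeout.
 */
public class ElapsedTimeMessageSketch {

  // Source of monotonic nanoseconds, e.g. System::nanoTime.
  private final LongSupplier nanoClock;

  public ElapsedTimeMessageSketch(LongSupplier nanoClock) {
    this.nanoClock = nanoClock;
  }

  /** Runs the task and returns a one-line summary that includes the elapsed time. */
  public String runAndDescribe(String taskDescription, Runnable task) {
    long start = nanoClock.getAsLong();
    try {
      task.run();
      return "Completed " + taskDescription + " in " + elapsedMs(start) + " ms";
    } catch (RuntimeException e) {
      return "Failed " + taskDescription + " after " + elapsedMs(start) + " ms: " + e;
    }
  }

  private long elapsedMs(long startNanos) {
    return TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong() - startNanos);
  }

  public static void main(String[] args) {
    ElapsedTimeMessageSketch sketch = new ElapsedTimeMessageSketch(System::nanoTime);
    System.out.println(sketch.runAndDescribe("container 7", () -> {
      throw new IllegalStateException("timed out");
    }));
  }
}
```

With System::nanoTime as the clock, a failure reported after many seconds points toward a timeout rather than an immediate error.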
This change looks basically good. I just had one suggestion about adding the elapsed time to the failure message too. I also wonder about logging the DN UUID. Logging DatanodeDetails.toString() is really too verbose and makes the logs quite long, so it's a good idea to avoid it. In #4153 I went with logging "DN host / IP" to make it less verbose. If someone is debugging an issue, that might be easier to work with than a UUID, as it saves another lookup to find the host you want to look at, but it could be argued either way. I am fine with leaving it as it is.
In normal usage, the hostname is better. The UUID might be better for integration tests with a mini cluster, where all nodes are on the same host. I guess I'll go with your solution, and maybe we can add a test-specific check later.
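To make the trade-off concrete, here is a hypothetical stand-in for a datanode descriptor (not Ozone's DatanodeDetails) offering both compact labels: host/IP for everyday debugging, and the UUID for mini-cluster tests where every node shares one host. The class name, method names and the exact host/IP format are assumptions.

```java
import java.util.UUID;

/**
 * Hypothetical datanode descriptor (NOT Ozone's DatanodeDetails) exposing two
 * compact labels instead of a verbose toString().
 */
public class DatanodeLabelSketch {

  private final UUID uuid;
  private final String hostName;
  private final String ipAddress;

  public DatanodeLabelSketch(UUID uuid, String hostName, String ipAddress) {
    this.uuid = uuid;
    this.hostName = hostName;
    this.ipAddress = ipAddress;
  }

  /** Short label that points straight at the machine to inspect. */
  public String hostAndIp() {
    return hostName + "/" + ipAddress;
  }

  /** Label that stays unique even when all nodes run on the same host. */
  public String uuidLabel() {
    return uuid.toString();
  }

  public static void main(String[] args) {
    DatanodeLabelSketch dn = new DatanodeLabelSketch(
        UUID.fromString("123e4567-e89b-12d3-a456-426614174000"),
        "dn1.example.com", "10.0.0.5");
    System.out.println(dn.hostAndIp() + " vs " + dn.uuidLabel());
  }
}
```

In a mini cluster, two nodes produce the same host/IP label but different UUID labels, which is the case the test-specific check mentioned above would need to handle.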
sodonnel left a comment
LGTM
Thanks @sodonnel for reviewing and merging this.
What changes were proposed in this pull request?
Include more information in messages from ECReconstructionCoordinatorTask. Source/target datanodes are identified by UUID. To facilitate the change, move some logic from ECReconstructionCoordinatorTask to ECReconstructionCommandInfo.

https://issues.apache.org/jira/browse/HDDS-7726
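A rough sketch of the refactoring idea, under assumed field names: the command-info object builds its own one-line summary, so the coordinator task can log it with a single placeholder. ReconstructionInfoSketch, its fields, and the output format are hypothetical and do not reproduce the actual ECReconstructionCommandInfo; replica indexes are kept in TreeMaps so the summary is stable from run to run.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/**
 * Hypothetical command-info object (NOT Ozone's ECReconstructionCommandInfo)
 * that owns the logic for describing itself in log messages.
 */
public class ReconstructionInfoSketch {

  private final long containerID;
  // Replica index -> datanode identifier (e.g. a UUID string), sorted by index.
  private final Map<Integer, String> sources;
  private final Map<Integer, String> targets;

  public ReconstructionInfoSketch(long containerID,
      Map<Integer, String> sources, Map<Integer, String> targets) {
    this.containerID = containerID;
    this.sources = new TreeMap<>(sources);
    this.targets = new TreeMap<>(targets);
  }

  @Override
  public String toString() {
    return new StringJoiner(", ", "reconstruction of container " + containerID + " [", "]")
        .add("sources=" + sources)
        .add("targets=" + targets)
        .toString();
  }

  public static void main(String[] args) {
    ReconstructionInfoSketch info = new ReconstructionInfoSketch(
        7L, Map.of(1, "dn-a", 2, "dn-b", 3, "dn-c"), Map.of(4, "dn-d"));
    // Comparable in spirit to LOG.warn("Failed {}", info, e).
    System.out.println("Failed " + info);
  }
}
```

Keeping the summary in the command-info class means every message that logs the object shows the same container, source and target details without repeating formatting code in the task.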
How was this patch tested?
Regular CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/3861802806