@xichen01
Contributor

What changes were proposed in this pull request?

Add a performance audit log for slow requests on the Datanode. An entry is written when the request processing time exceeds a threshold (500 ms by default).

Example:

2023-1-1 17:01:59,134 | INFO  | DNAudit | [bTxId:e82fc88abef1427faec04e7473a33f8e, bSpanId:efa32ade0866e39d] user=null | ip=null | op=GET_BLOCK {blockData=conID: 674595 locID: 109611023645035393 bcsId: 1506141102} performance={opLatencyMs=1724, preOpLatencyMS=0} | ret=null |
2023-1-1 17:01:59,134 | INFO  | DNAudit | [bTxId:, bSpanId:] user=null | ip=null | op=GET_BLOCK {blockData=conID: 674595 locID: 109611023645332310 bcsId: 1506237580} performance={opLatencyMs=1953, preOpLatencyMS=0} | ret=null |
2023-1-1 17:02:56,288 | INFO  | DNAudit | [bTxId:099d6fc1228e468aac59ed0cce13ce66, bSpanId:efa3ea780866e83b] user=null | ip=null | op=GET_BLOCK {blockData=conID: 674595 locID: 109611023645245975 bcsId: 1506209514} performance={opLatencyMs=882, preOpLatencyMS=0} | ret=null |
2023-1-1 17:12:15,903 | INFO  | DNAudit | [bTxId:, bSpanId:] user=null | ip=null | op=READ_CHUNK {blockData=conID: 661896 locID: 109611023348686704 bcsId: 1400692708, blockDataSize=749} performance={opLatencyMs=502, preOpLatencyMS=0} | ret=null |
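
For illustration only, the threshold check described above could be sketched as follows. This is not the code merged in this PR: the class `SlowOpAuditSketch`, its constructor, and the simplified message layout are assumptions made for the example, and the comparison is assumed to be strict ("exceeds"). Only the 500 ms default and the `opLatencyMs` field name come from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; names and message layout are assumptions, not the PR's code.
final class SlowOpAuditSketch {
  // Default taken from the PR description (500 ms).
  static final long DEFAULT_THRESHOLD_MS = 500;

  private final long thresholdMs;

  SlowOpAuditSketch(long thresholdMs) {
    this.thresholdMs = thresholdMs;
  }

  // An operation counts as slow only when its latency strictly exceeds the threshold.
  boolean isOperationSlow(long opLatencyMs) {
    return opLatencyMs > thresholdMs;
  }

  // Returns a simplified performance audit line for slow operations, or null for fast ones.
  String performanceAudit(String op, Map<String, String> params, long opLatencyMs) {
    if (!isOperationSlow(opLatencyMs)) {
      return null;
    }
    return "op=" + op + " " + params + " performance={opLatencyMs=" + opLatencyMs + "}";
  }

  public static void main(String[] args) {
    SlowOpAuditSketch audit = new SlowOpAuditSketch(DEFAULT_THRESHOLD_MS);
    Map<String, String> params = new LinkedHashMap<>();
    params.put("blockData", "conID: 674595");
    // Slow request: a line is produced.
    System.out.println(audit.performanceAudit("GET_BLOCK", params, 1724));
    // Fast request: nothing to log, so the sketch returns null.
    System.out.println(audit.performanceAudit("GET_BLOCK", params, 20));
  }
}
```

With the default threshold, the 502 ms `READ_CHUNK` request in the example above would be logged, while a request that takes exactly 500 ms would not.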

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9719

How was this patch tested?

Manual test

@adoroszlai
Contributor

@xichen01 thanks for working on this. This PR contains portions of the new code from #5644, so one of them will need to be updated when the other is merged. Hence one should be a draft (probably this one, as the other one is older).

@adoroszlai adoroszlai marked this pull request as draft November 23, 2023 16:03
# Conflicts:
#	hadoop-hdds/framework/src/main/java/org/apache/hadoop/ozone/audit/AuditMessage.java
@xichen01 xichen01 marked this pull request as ready for review December 5, 2023 12:27
@jojochuang jojochuang requested a review from duongkame December 11, 2023 16:45
new Exception(responseProto.getMessage()));
}
perf.appendOpLatencyMs(oPLatencyMS);
performanceAudit(action, params, perf);
Contributor

Thanks for the patch @xichen01. One comment here: I think the performance audit should be logged only if the opLatency isExceedThreshold(). Let me know if that makes sense. Otherwise, LGTM.

Contributor Author

Yes, I have added it.


}

private boolean isExceedThreshold(long opLatencyMs) {
Contributor

nit: Can we change the function name to isOperationSlow() to make it more readable?

Contributor Author

Done.

Contributor

@adoroszlai adoroszlai left a comment

Thanks @xichen01 for the patch.

@adoroszlai
Contributor

Thanks @xichen01 for updating the patch.

So if I understand it correctly, the idea is to:

  • log an extra message for slow operations with PERFORMANCE marker
  • allow admin to configure audit log to keep PERFORMANCE messages while discarding regular READ messages

Is that correct?

@adoroszlai adoroszlai dismissed their stale review January 24, 2024 06:20

patch updated

@xichen01
Contributor Author

> Thanks @xichen01 for updating the patch.
>
> So if I understand it correctly, the idea is to:
>
>   • log an extra message for slow operations with PERFORMANCE marker
>   • allow admin to configure audit log to keep PERFORMANCE messages while discarding regular READ messages
>
> Is that correct?

Yes, right.
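
As a rough illustration of the idea confirmed above, the marker-based filtering can be modeled in plain Java. In a real deployment this would normally be done through the audit logger's configuration (for example a Log4j 2 `MarkerFilter`) and not in application code. Everything below, including `MarkerFilterSketch`, its `Marker` enum, and `keepOnly`, is hypothetical and only mirrors the READ and PERFORMANCE markers mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of marker-based audit filtering; not Ozone or Log4j code.
final class MarkerFilterSketch {
  // Assumed marker set for the example.
  enum Marker { READ, WRITE, PERFORMANCE }

  // A single audit entry tagged with a marker.
  static final class Entry {
    final Marker marker;
    final String message;

    Entry(Marker marker, String message) {
      this.marker = marker;
      this.message = message;
    }
  }

  // Keeps only the entries whose marker is in the allowed set; all others are discarded.
  static List<Entry> keepOnly(List<Entry> entries, Set<Marker> allowed) {
    List<Entry> kept = new ArrayList<>();
    for (Entry e : entries) {
      if (allowed.contains(e.marker)) {
        kept.add(e);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Entry> entries = Arrays.asList(
        new Entry(Marker.READ, "op=GET_BLOCK"),
        new Entry(Marker.PERFORMANCE, "op=GET_BLOCK performance={opLatencyMs=1724}"));
    // The admin keeps PERFORMANCE messages and drops regular READ messages.
    for (Entry e : keepOnly(entries, EnumSet.of(Marker.PERFORMANCE))) {
      System.out.println(e.marker + " | " + e.message);
    }
  }
}
```

The point of the separate marker is that slow reads remain visible even when the high-volume READ audit entries are switched off.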

@adoroszlai adoroszlai requested a review from kerneltime January 29, 2024 21:31
@adoroszlai adoroszlai merged commit 361ad06 into apache:master Feb 1, 2024
@adoroszlai
Contributor

Thanks @xichen01 for the patch, @SaketaChalamchala for the review.

@adoroszlai
Contributor

I created HDDS-10270 as a follow-up, for a possible improvement in creating AuditMessage instances.
