Skip to content

Conversation

@jojochuang
Copy link
Contributor

@jojochuang jojochuang commented Jun 7, 2024

What changes were proposed in this pull request?

HDDS-10941. CSMMetrics to record number of started transactions

(1)
Add a label RaftGroupId so that Prometheus can identify different CSM instances by pipeline id. Previously, only one Container State Machine metrics instance is exported, which is confusing.

Add a few metrics for ContainerStateMachine operations.

  • opsQueueingDelay -- the queueing latency of ContainerOp thread pool.
  • pendingApplyTransactions -- count the number of transactions between startTransaction and applyTransaction
  • untilApplyTransactionNs -- measures the latency between startTransaction and applyTransaction
  • startTransactionCompleteNs -- measures the time spent in startTransaction callback method.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10941

How was this patch tested?

Existing unit tests passed. https://github.com/jojochuang/ozone/actions/runs/9409277999/job/25918878051
Deployed a version of this PR on a real cluster and observed the metrics were exported properly.

@jojochuang jojochuang requested a review from duongkame June 7, 2024 04:32
@jojochuang jojochuang changed the title HDDS-10941. CSMMetrics to record number of started transactions HDDS-10941. Add a few interesting ContainerStateMachine metrics in CSMMetrics Jun 7, 2024
@jojochuang
Copy link
Contributor Author

Uploading Screenshot 2024-06-06 at 9.38.09 PM.png…

@adoroszlai adoroszlai requested a review from xichen01 June 7, 2024 05:29
Copy link
Contributor

@xichen01 xichen01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jojochuang Thanks for the patch, you can refer to some comments

Copy link
Contributor

@duongkame duongkame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jojochuang for the patch, LGTM.

Comment on lines 897 to 898
ContainerCommandResponseProto proto = dispatchCommand(request, context);
return proto;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ContainerCommandResponseProto proto = dispatchCommand(request, context);
return proto;
return dispatchCommand(request, context);
;

@duongkame
Copy link
Contributor

duongkame commented Jun 12, 2024

As discussed offline, we'd also be interested in queueing delay and execution latency of ContainerStateMachine.write (the submitted task in writeStateMachineData). As Ratis has to wait for the future returned by ContainerStateMachine.write to complete to mark transaction committed, those latencies will determine how many transactions per second we can put to datanodes for a given block/container.

…ansaction.

Change-Id: I013ec37c2b18e7e164585a27e403bf52341fe6a7
Change-Id: Iad002201c447e3ca653b5fb86421068313206eb4
Change-Id: I6ac744943bd9c4dd1e2bae9810605eefd5486a3b
(cherry picked from commit 582dbe590aa2c77bfd08dfeff08fd33f90e5193e)
Change-Id: I1881a54696f46ea7dec2db33e1eb2a0c62d03ad9
Change-Id: I4d81fb4ae063f5af830b5c327f6a1a391275f405
Change-Id: I7f9add68a4c26d521c94f98c1dd57090984b7750
@jojochuang
Copy link
Contributor Author

Rebased the PR.
TODO: incorporate Duong's suggestion.

Change-Id: Ic0d440bd69cd4b754f18b10c7b48b0b1c70c100f
@jojochuang jojochuang requested review from duongkame and xichen01 June 21, 2024 17:11
@jojochuang
Copy link
Contributor Author

Is this good to go? @xichen01 @duongkame

Copy link
Contributor

@xichen01 xichen01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM+1

@jojochuang jojochuang merged commit a9a42eb into apache:master Jul 8, 2024
@jojochuang
Copy link
Contributor Author

Merged. Thanks @xichen01 @duongkame

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants