- remove SLA gauge metrics, they can be inferred from the 'infinite' bucket in the latency histograms
- methods to handle metric events (onAck, onRoundtrip, onOffsetCommit); will also soon be used to extract e2e into its own package
- add _total and _seconds suffixes to metrics for best practices
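A minimal sketch of what the event-handler methods named above (onAck, onRoundtrip, onOffsetCommit) could look like. The `e2eMetrics` type, its fields, the `demo` helper and the handler signatures are hypothetical stand-ins, not kminion's actual code; plain integers and slices replace the real metric collectors so the example runs without dependencies. It also assumes that a late roundtrip still has its latency recorded but is not counted as a matching message.

```go
package main

import (
	"fmt"
	"time"
)

// e2eMetrics is a hypothetical, dependency-free stand-in for the
// counters and histograms registered in the diff.
type e2eMetrics struct {
	messagesAckedTotal      int
	messagesReceivedTotal   int
	messagesCommittedTotal  int
	ackLatencySeconds       []float64
	roundtripLatencySeconds []float64
	commitLatencySeconds    []float64
}

// onAck records that the broker acknowledged a produced message.
func (m *e2eMetrics) onAck(latency time.Duration) {
	m.messagesAckedTotal++
	m.ackLatencySeconds = append(m.ackLatencySeconds, latency.Seconds())
}

// onRoundtrip records a consumed test message. Assumption: the latency
// is always observed, but only messages within the SLA count as matching.
func (m *e2eMetrics) onRoundtrip(latency, sla time.Duration) {
	m.roundtripLatencySeconds = append(m.roundtripLatencySeconds, latency.Seconds())
	if latency <= sla {
		m.messagesReceivedTotal++
	}
}

// onOffsetCommit records a successful offset commit.
func (m *e2eMetrics) onOffsetCommit(latency time.Duration) {
	m.messagesCommittedTotal++
	m.commitLatencySeconds = append(m.commitLatencySeconds, latency.Seconds())
}

// demo feeds a few example events through the handlers.
func demo() *e2eMetrics {
	m := &e2eMetrics{}
	sla := time.Second
	m.onAck(50 * time.Millisecond)
	m.onRoundtrip(200*time.Millisecond, sla) // within the SLA: counted
	m.onRoundtrip(2*time.Second, sla)        // late: latency only
	m.onOffsetCommit(10 * time.Millisecond)
	return m
}

func main() {
	m := demo()
	fmt.Println("acked:", m.messagesAckedTotal)
	fmt.Println("received:", m.messagesReceivedTotal)
	fmt.Println("committed:", m.messagesCommittedTotal)
	fmt.Println("roundtrip observations:", len(m.roundtripLatencySeconds))
}
```

Routing every event through one method per event type keeps counter and histogram updates in a single place, which fits the stated goal of later extracting e2e into its own package.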
- // Users can construct stuff like "message commits failed" themselves from those
- service.endToEndMessagesProduced = makeCounter("messages_produced", "Number of messages that kminion's end-to-end test has tried to send to kafka")
- service.endToEndMessagesAcked = makeCounter("messages_acked", "Number of messages kafka acknowledged as produced")
- service.endToEndMessagesReceived = makeCounter("messages_received", "Number of *matching* messages kminion received. Every roundtrip message has a minionID (randomly generated on startup) and a timestamp. Kminion only considers a message a match if it it arrives within the configured roundtrip SLA (and it matches the minionID)")
- service.endToEndMessagesCommitted = makeCounter("messages_committed", "Number of *matching* messages kminion successfully commited as read/processed. See 'messages_received' for what 'matching' means. Kminion will commit late/mismatching messages to kafka as well, but those won't be counted in this metric.")
-
- // High-level SLA reporting
- // Simple gauges that report if stuff is within the configured SLAs
- // Naturally those will potentially not trigger if, for example, only a single message is lost in-between scrap intervals.
- gaugeHelp := "Will be either 0 (false) or 1 (true), depending on the durations (SLAs) configured in kminion's config"
- service.endToEndWithinAckSla = makeGauge("is_within_ack_sla", "Reports whether messages can be produced. A message is only considered 'produced' when the broker has sent an ack within the configured timeout. "+gaugeHelp)
- service.endToEndWithinRoundtripSla = makeGauge("is_within_roundtrip_sla", "Reports whether or not kminion receives the test messages it produces within the configured timeout. "+gaugeHelp)
- service.endToEndWithinCommitSla = makeGauge("is_within_commit_sla", "Reports whether or not kminion can successfully commit offsets for the messages it receives/processes within the configured timeout. "+gaugeHelp)
+ // Users can construct alerts like "can't produce messages" themselves from those
+ service.endToEndMessagesProduced = makeCounter("messages_produced_total", "Number of messages that kminion's end-to-end test has tried to send to kafka")
+ service.endToEndMessagesAcked = makeCounter("messages_acked_total", "Number of messages kafka acknowledged as produced")
+ service.endToEndMessagesReceived = makeCounter("messages_received_total", "Number of *matching* messages kminion received. Every roundtrip message has a minionID (randomly generated on startup) and a timestamp. Kminion only considers a message a match if it it arrives within the configured roundtrip SLA (and it matches the minionID)")
+ service.endToEndMessagesCommitted = makeCounter("messages_committed_total", "Number of *matching* messages kminion successfully commited as read/processed. See 'messages_received' for what 'matching' means. Kminion will commit late/mismatching messages to kafka as well, but those won't be counted in this metric.")

  // Latency Histograms
  // More detailed info about how long stuff took
- // Since histograms also have an 'infinite' bucket, they can be used to detect small hickups that won't trigger the SLA gauges
- service.endToEndProduceLatency = makeHistogram("produce_latency", cfg.EndToEnd.Producer.AckSla, "Time until we received an ack for a produced message")
- service.endToEndRoundtripLatency = makeHistogram("roundtrip_latency", cfg.EndToEnd.Consumer.RoundtripSla, "Time it took between sending (producing) and receiving (consuming) a message")
- service.endToEndCommitLatency = makeHistogram("commit_latency", cfg.EndToEnd.Consumer.CommitSla, "Time kafka took to respond to kminion's offset commit")
+ // Since histograms also have an 'infinite' bucket, they can be used to detect small hickups "lost" messages
+ service.endToEndAckLatency = makeHistogram("produce_latency_seconds", cfg.EndToEnd.Producer.AckSla, "Time until we received an ack for a produced message")
+ service.endToEndRoundtripLatency = makeHistogram("roundtrip_latency_seconds", cfg.EndToEnd.Consumer.RoundtripSla, "Time it took between sending (producing) and receiving (consuming) a message")
+ service.endToEndCommitLatency = makeHistogram("commit_latency_seconds", cfg.EndToEnd.Consumer.CommitSla, "Time kafka took to respond to kminion's offset commit")
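To illustrate why the removed SLA gauges can be inferred from the histograms, here is a rough sketch of cumulative buckets in the style of a Prometheus histogram. It assumes the configured SLA is the largest finite bucket bound, which the diff does not confirm. The `histogram` type, the bucket layout (SLA/4, SLA/2, SLA, +Inf) and the helper names are hypothetical and use only the Go standard library.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// histogram is a hypothetical, simplified model of a bucketed histogram.
type histogram struct {
	upperBounds []float64 // sorted ascending; the last bound is +Inf
	counts      []uint64  // observations per bucket (not cumulative)
}

// newHistogram builds buckets up to the SLA plus the +Inf bucket.
// Assumption: the SLA is the largest finite bucket bound.
func newHistogram(slaSeconds float64) *histogram {
	bounds := []float64{slaSeconds / 4, slaSeconds / 2, slaSeconds, math.Inf(1)}
	return &histogram{upperBounds: bounds, counts: make([]uint64, len(bounds))}
}

// observe puts a value into the first bucket whose upper bound is >= value.
// NaN inputs are not handled in this sketch.
func (h *histogram) observe(seconds float64) {
	i := sort.SearchFloat64s(h.upperBounds, seconds)
	h.counts[i]++
}

// cumulative returns the running bucket totals, the way a scrape exposes them.
func (h *histogram) cumulative() []uint64 {
	out := make([]uint64, len(h.counts))
	var sum uint64
	for i, c := range h.counts {
		sum += c
		out[i] = sum
	}
	return out
}

// overSLA returns the number of observations slower than the SLA:
// the +Inf bucket total minus the total of the bucket bounded by the SLA.
func (h *histogram) overSLA() uint64 {
	cum := h.cumulative()
	return cum[len(cum)-1] - cum[len(cum)-2]
}

// demoHistogram records four example latencies against a 1 second SLA.
func demoHistogram() *histogram {
	h := newHistogram(1.0)
	for _, v := range []float64{0.1, 0.3, 0.9, 2.5} {
		h.observe(v)
	}
	return h
}

func main() {
	h := demoHistogram()
	fmt.Println("cumulative buckets:", h.cumulative())
	fmt.Println("observations over SLA:", h.overSLA())
}
```

Under these assumptions, any non-zero difference between the +Inf bucket and the SLA bucket indicates a violation, including a single slow message that a 0/1 gauge could miss between scrapes.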