-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[agent] Add metrics to show connections status between agent and collectors #2657
Conversation
Below is my local test:
|
@WalkerWang731 lint failed:
|
Codecov Report
@@ Coverage Diff @@
## master #2657 +/- ##
==========================================
+ Coverage 95.73% 95.76% +0.02%
==========================================
Files 216 217 +1
Lines 9599 9623 +24
==========================================
+ Hits 9190 9215 +25
Misses 336 336
+ Partials 73 72 -1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall approach seems reasonable.
logger.Info("Checking connection to collector") | ||
for { | ||
s := cc.GetState() | ||
if s != connectivity.Ready { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: put s == Ready
success case first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For compatibility and unnecessary changes in the future. Actually, I concern the gRPC official changes connectivity status constant in the future version, although seems less possible.
ExpireTTL time.Duration | ||
} | ||
|
||
// ConnectMetricsReporter is a decorator also it not actual use currently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't accurate, it is not used as a decorator, and does not need to be called "reporter" since it's confusing in the current package where "reporter" refers to span reporter (or "exporter" in OTEL terminology).
metric := r.params.MetricsFactory.Gauge(metrics.Options{ | ||
Name: "connected_collector_status", | ||
Help: "Connection status that jaeger-agent to jaeger-collector, 1 is connected, 0 is disconnected", | ||
Tags: map[string]string{"target": target}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The target
here is the address of the collector, correct? Is it a good idea to use that as label, given that it could contain an ephemeral port number and prone to high cardinality? I'd say the target could be reported as an expvar variable, but the metric can be just the status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ephemeral port number
Care to expand on that? I don't remember seeing servers with ephemeral port numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? For example, at Uber most services run on ephemeral port numbers, to make it easy to stack and relocate them across available compute capacity. The discovery service reports host:port pairs for all instances of a given service. The custom build of the agent that Uber runs would use that discovery mechanism to locate Jaeger collectors. So the target
string could be quite dynamic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, for this early version, I use such as oldTarget
variable, the function incoming variable like newTarget
, function will change oldTarget
metric to 0, change newTarget
metric to 1. In this case, we can watch which one agent connects to which collector. But eventually, I thought these are too complicated. We only need to observe the connection status is ok, after all. Which one agent connects to which collector should be added in collector metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For connected is 0 disconnected is 1
, because when have many connection metrics, we can filter sum(jaeger_agent_connection_status_connected_collector_status{}) by (instance) > bool 0
for quick view status.
I fixed the target, no need to pass the target parameter, in the next commit. target is static when the connectMetrics{}
initialize
For the current PR, I thought that simplify the design first, and add new features if required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, at Uber most services run on ephemeral port numbers, to make it easy to stack and relocate them across available compute capacity
Didn't know that, but it does make sense for high density VMs/bare metal servers.
} | ||
|
||
// CollectorConnected used for change metric as agent connected. | ||
func (r *ConnectMetricsReporter) CollectorConnected(target string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (r *ConnectMetricsReporter) CollectorConnected(target string) { | |
func (r *ConnectMetricsReporter) OnConnected(target string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or perhaps have just a single func OnConnectionStatusChange and pass a parameter.
} | ||
|
||
// CollectorAborted used for change metric as agent disconnected. | ||
func (r *ConnectMetricsReporter) CollectorAborted(target string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (r *ConnectMetricsReporter) CollectorAborted(target string) { | |
func (r *ConnectMetricsReporter) OnDisconnected(target string) { |
Logger *zap.Logger // required | ||
MetricsFactory metrics.Factory // required | ||
ExpireFrequency time.Duration | ||
ExpireTTL time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like any of these fields are needed, let's simplify to only what's necessary. The gauge can be initialized in the constructor once (especially if target
is not used as a label per my comment below).
} | ||
|
||
// WrapWithConnectMetrics creates ConnectMetricsReporter. | ||
func WrapWithConnectMetrics(params ConnectMetricsReporterParams) *ConnectMetricsReporter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to comment above, this is not "wrapping" anything, just a standalone util class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I see, I also feel that there is no meaning, just maintain the same structure, let me adjust it to simplify.
Hi all, I had committed code, about this commit I modified the following:
|
@@ -149,7 +155,17 @@ func TestBuilderWithCollectors(t *testing.T) { | |||
cfg.Notifier = test.notifier | |||
cfg.Discoverer = test.discoverer | |||
|
|||
conn, err := cfg.CreateConnection(zap.NewNop()) | |||
mb := metricstest.NewFactory(time.Hour) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mf
is a more intuitive var name here
MetricsFactory: mb, | ||
} | ||
cm := reporter.WrapWithConnectMetrics(params, test.target) | ||
tr := &connectMetricsTest{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this struct serves no purpose, you can access the factory directly
} | ||
|
||
// WrapWithConnectMetrics creates ConnectMetrics. | ||
func WrapWithConnectMetrics(params ConnectMetricsParams, target string) *ConnectMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func WrapWithConnectMetrics(params ConnectMetricsParams, target string) *ConnectMetrics { | |
func NewConnectMetrics(params ConnectMetricsParams, target string) *ConnectMetrics { |
nothing is being wrapped here
} | ||
if params.ExpireTTL == 0 { | ||
params.ExpireTTL = defaultExpireTTL | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is the point of ExpireFrequency/ExpireTTL?
// OnConnectionStatusChange used for pass the status parameter when connection is changed | ||
// 0 is disconnected, 1 is connected | ||
// For quick view status via use `sum(jaeger_agent_connection_status_connected_collector_status{}) by (instance) > bool 0` | ||
func (r *ConnectMetrics) OnConnectionStatusChange(status int64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes the API more explicit:
func (r *ConnectMetrics) OnConnectionStatusChange(status int64) { | |
func (r *ConnectMetrics) OnConnectionStatusChange(connected bool) { |
if params.ExpireTTL == 0 { | ||
params.ExpireTTL = defaultExpireTTL | ||
} | ||
params.Target = target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my objection was not about how target was passed, but to using it as a label in the metrics - it can cause high cardinality due to ephemeral ports.
// 0 is disconnected, 1 is connected | ||
// For quick view status via use `sum(jaeger_agent_connection_status_connected_collector_status{}) by (instance) > bool 0` | ||
func (r *ConnectMetrics) OnConnectionStatusChange(status int64) { | ||
metric := r.params.MetricsFactory.Gauge(metrics.Options{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole file uses just two metrics, but one initialized via reflection another explicitly here. Please pick just one way to simplify the code.
"go.uber.org/zap" | ||
) | ||
|
||
// connectMetrics is real metric, but not support to directly change, need via ConnectMetrics for changed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove this comment, it doesn't add much to the code
} | ||
|
||
cm := new(connectMetrics) | ||
cmp.MetricsFactory = cmp.MetricsFactory.Namespace(metrics.NSOptions{Name: "connection_status"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the extra namespacing is unnecessary here, metric names are already descriptive.
} | ||
|
||
// NewConnectMetrics creates ConnectMetricsParams. | ||
func NewConnectMetrics(cmp ConnectMetricsParams) *ConnectMetricsParams { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the cmp
struct is not needed, the only argument this function needs it the metrics factory, which can be passed directly.
the returned struct should not be called ...Params
, it's very confusing. Call it ConnectMetrics
. It can be the same struct as connectMetrics
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok let me adjust it
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: luhualin <[email protected]> Co-authored-by: luhualin <[email protected]> Signed-off-by: WalkerWang731 <[email protected]>
* Fix flaky tbuffered server test Signed-off-by: Pavel Kositsyn <[email protected]> * Apply suggestions from code review - more readable comments Co-authored-by: Yuri Shkuro <[email protected]> Signed-off-by: Pavel Kositsyn <[email protected]> Co-authored-by: Yuri Shkuro <[email protected]> Signed-off-by: WalkerWang731 <[email protected]>
* Add github action for jaeger integration tests Signed-off-by: Ashmita Bohara <[email protected]> * Create separate workflow for each integration test Signed-off-by: Ashmita Bohara <[email protected]> * Feedbacks changes Signed-off-by: Ashmita Bohara <[email protected]> Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: Ashmita Bohara <[email protected]> Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: Ashmita Bohara <[email protected]> Signed-off-by: WalkerWang731 <[email protected]>
* Add github action for jaeger all-in-one image Signed-off-by: Ashmita Bohara <[email protected]> * Feedbacks changes Signed-off-by: Ashmita Bohara <[email protected]> * Feedbacks changes Signed-off-by: Ashmita Bohara <[email protected]> * Feedbacks changes Signed-off-by: Ashmita Bohara <[email protected]> * Feedbacks changes Signed-off-by: Ashmita Bohara <[email protected]> * Feedbacks changes Signed-off-by: Ashmita Bohara <[email protected]> * Make steps self-explantory Signed-off-by: Ashmita Bohara <[email protected]> * Fix git tags issue Signed-off-by: Ashmita Bohara <[email protected]> * Fix ES integration test Signed-off-by: Ashmita Bohara <[email protected]> Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]> Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]> Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: Ashmita Bohara <[email protected]> Signed-off-by: WalkerWang731 <[email protected]>
* Fix docker login issue with all-in-one build Signed-off-by: Ashmita Bohara <[email protected]> * Fix docker login issue with all-in-one build Signed-off-by: Ashmita Bohara <[email protected]> Signed-off-by: WalkerWang731 <[email protected]>
Hi Sorry for disturbing
btw: about the Thanks! |
@yurishkuro @jpkrohling @Ashmita152 |
Hi @WalkerWang731 Sorry for the trouble. You're right, we changed that build script recently and that break your build. We missed an edge case where the PR is created from fork's master branch like yours this one. I have raised a fix: #2675 which should solve your build problem. |
Hi @Ashmita152 If you are not responsible, but you know who can help me, can help me thanks~~ |
The |
Hi @Ashmita152 really thanks for your response. But now, I encounter a new issue that doesn't pass Above for your reference. Same previous, if you are not responsible, but you know who can help me, can help me Thank you~ |
Must be another flaky test - #2679. I restarted the job. |
ok, thanks, currently, looks all normal. Thank you! |
@jpkrohling Hi, Happy new year, may I know, how's this PR progress so far? |
I see that there are a few comments from @yurishkuro that weren't addressed yet. |
Ah sorry, I thought it was pending for some reason. |
Signed-off-by: WalkerWang731 <[email protected]>
} | ||
|
||
// NewConnectMetrics will be initialize ConnectMetrics | ||
func (r *ConnectMetrics) NewConnectMetrics() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unusual construct. I would make it simply a constructor function
func NewConnectMetrics(logger *zap.Logger, metrics metrics.Factory) *ConnectMetrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry for responding too late..
I saw that your PR #2728 and thanks for your help clean-up.
Looked forward to learning more from you
r := expvar.Get("gRPCTarget") | ||
if r == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r := expvar.Get("gRPCTarget") | |
if r == nil { | |
if r := expvar.Get("gRPCTarget"); r == nil { |
logger.Info("Checking connection to collector") | ||
var egt *expvar.String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not move this into ConnectMetrics struct as well?
Signed-off-by: Yuri Shkuro <[email protected]>
minor clean-up in #2728 |
* [refactor] Minor clean-up of #2657 Signed-off-by: Yuri Shkuro <[email protected]> * add test Signed-off-by: Yuri Shkuro <[email protected]>
…ectors (jaegertracing#2657) * add metrics that show agent connection collector status Signed-off-by: WalkerWang731 <[email protected]> * update comment Signed-off-by: WalkerWang731 <[email protected]> * exec make fmt Signed-off-by: WalkerWang731 <[email protected]> * simplify function and add testing relevant code in the builder_test.go Signed-off-by: WalkerWang731 <[email protected]> * add comment in connect_metrics.go Signed-off-by: WalkerWang731 <[email protected]> * simplify code and changed use expvar to show target Signed-off-by: WalkerWang731 <[email protected]> * simplify code and changed use expvar to show target Signed-off-by: WalkerWang731 <[email protected]> * exec make fmt Signed-off-by: WalkerWang731 <[email protected]> * Fix collector panic due to sarama sdk returning nil error (jaegertracing#2654) Signed-off-by: luhualin <[email protected]> Co-authored-by: luhualin <[email protected]> Signed-off-by: WalkerWang731 <[email protected]> * Fix flaky tbuffered server test (jaegertracing#2635) * Fix flaky tbuffered server test Signed-off-by: Pavel Kositsyn <[email protected]> * Apply suggestions from code review - more readable comments Co-authored-by: Yuri Shkuro <[email protected]> Signed-off-by: Pavel Kositsyn <[email protected]> Co-authored-by: Yuri Shkuro <[email protected]> Signed-off-by: WalkerWang731 <[email protected]> * Add github actions for integration tests (jaegertracing#2649) * Add github action for jaeger integration tests Signed-off-by: Ashmita Bohara <[email protected]> * Create separate workflow for each integration test Signed-off-by: Ashmita Bohara <[email protected]> * Feedbacks changes Signed-off-by: Ashmita Bohara <[email protected]> Signed-off-by: WalkerWang731 <[email protected]> * Clean-up GH action names (jaegertracing#2661) Signed-off-by: WalkerWang731 <[email protected]> * Fix for failures in badger integration tests (jaegertracing#2660) Signed-off-by: Ashmita Bohara <[email protected]> Signed-off-by: WalkerWang731 <[email protected]> * Add protogen validation test (jaegertracing#2662) Signed-off-by: Ashmita Bohara <[email protected]> Signed-off-by: WalkerWang731 <[email protected]> * Add github action for jaeger all-in-one image (jaegertracing#2663) * Add github action for jaeger all-in-one image Signed-off-by: Ashmita Bohara <[email protected]> * Feedbacks changes Signed-off-by: Ashmita Bohara <[email protected]> * Feedbacks changes Signed-off-by: Ashmita Bohara <[email protected]> * Feedbacks changes Signed-off-by: Ashmita Bohara <[email protected]> * Feedbacks changes Signed-off-by: Ashmita Bohara <[email protected]> * Feedbacks changes Signed-off-by: Ashmita Bohara <[email protected]> * Make steps self-explantory Signed-off-by: Ashmita Bohara <[email protected]> * Fix git tags issue Signed-off-by: Ashmita Bohara <[email protected]> * Fix ES integration test Signed-off-by: Ashmita Bohara <[email protected]> Signed-off-by: WalkerWang731 <[email protected]> * Update comment that looks confusing during builds Signed-off-by: Yuri Shkuro <[email protected]> Signed-off-by: WalkerWang731 <[email protected]> * Use GitHub actions based build badges Signed-off-by: Yuri Shkuro <[email protected]> Signed-off-by: WalkerWang731 <[email protected]> * Fix and minor improvements to all-in-one github action (jaegertracing#2667) Signed-off-by: Ashmita Bohara <[email protected]> Signed-off-by: WalkerWang731 <[email protected]> * Fix docker login issue with all-in-one build (jaegertracing#2668) * Fix docker login issue with all-in-one build Signed-off-by: Ashmita Bohara <[email protected]> * Fix docker login issue with all-in-one build Signed-off-by: Ashmita Bohara <[email protected]> Signed-off-by: WalkerWang731 <[email protected]> * Fix issue with all-in-one build (jaegertracing#2669) Signed-off-by: Ashmita Bohara <[email protected]> Signed-off-by: WalkerWang731 <[email protected]> * Update cmd/agent/app/reporter/connect_metrics.go accept suggestions Co-authored-by: Yuri Shkuro <[email protected]> Signed-off-by: WalkerWang731 <[email protected]> * Update cmd/agent/app/reporter/connect_metrics.go accept suggestions Co-authored-by: Yuri Shkuro <[email protected]> Signed-off-by: WalkerWang731 <[email protected]> * simplify the code that remove ConnectMetricsParams{} and integrate ConnectMetrics{} Signed-off-by: WalkerWang731 <[email protected]> * simplify the code that remove ConnectMetricsParams{} and integrate ConnectMetrics{} Signed-off-by: WalkerWang731 <[email protected]> * merage from the lastest master branch and exec make fmt Signed-off-by: walker.wangxy <[email protected]> * add comment on ConnectMetrics Signed-off-by: WalkerWang731 <[email protected]> * clear up redundant codes Signed-off-by: WalkerWang731 <[email protected]> Co-authored-by: WalkerWang731 <[email protected]> Co-authored-by: Betula-L <[email protected]> Co-authored-by: luhualin <[email protected]> Co-authored-by: Pavel Kositsyn <[email protected]> Co-authored-by: Yuri Shkuro <[email protected]> Co-authored-by: Ashmita <[email protected]> Co-authored-by: Yuri Shkuro <[email protected]> Co-authored-by: walker.wangxy <[email protected]>
* [refactor] Minor clean-up of jaegertracing#2657 Signed-off-by: Yuri Shkuro <[email protected]> * add test Signed-off-by: Yuri Shkuro <[email protected]>
Which problem is this PR solving?
Short description of the changes
cmd/agent/app/reporter/connect_metrics.go
, used for add new metrics that connections status between agent with collectioncmd/agent/app/reporter/connect_metrics_test.go
, used for unit test to check func incmd/agent/app/reporter/connect_metrics.go
cmd/agent/app/reporter/grpc/builder.go
funcCreateConnection()
, used for support change metrics as gRPC is changedcmd/agent/app/reporter/grpc/collector_proxy.go
funcNewCollectorProxy()
, used forCreateConnection()
newly added parameter requirementscmd/agent/app/reporter/grpc/builder_test.go
, used for unit test to check func incmd/agent/app/reporter/grpc/builder.go
and newly added parameter requirementsChangelog
Add metrics on the agent that can show connections status between the agent with collections