Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[agent] Add metrics to show connections status between agent and collectors #2657

Merged
merged 34 commits into from
Jan 13, 2021
Merged

Conversation

WalkerWang731
Copy link
Contributor

Which problem is this PR solving?

Short description of the changes

  • Add cmd/agent/app/reporter/connect_metrics.go, used for add new metrics that connections status between agent with collection
  • Add cmd/agent/app/reporter/connect_metrics_test.go, used for unit test to check func in cmd/agent/app/reporter/connect_metrics.go
  • Adjust cmd/agent/app/reporter/grpc/builder.go func CreateConnection(), used for support change metrics as gRPC is changed
  • Adjust cmd/agent/app/reporter/grpc/collector_proxy.go func NewCollectorProxy(), used for CreateConnection() newly added parameter requirements
  • Adjust cmd/agent/app/reporter/grpc/builder_test.go, used for unit test to check func in cmd/agent/app/reporter/grpc/builder.go and newly added parameter requirements

Changelog

Add metrics on the agent that can show connections status between the agent with collections

@WalkerWang731
Copy link
Contributor Author

Below is my local test:

(cmd/agent/app)$ go test -v
=== RUN   TestAgentStartError
--- PASS: TestAgentStartError (0.00s)
=== RUN   TestAgentSamplingEndpoint
--- PASS: TestAgentSamplingEndpoint (0.51s)
=== RUN   TestAgentMetricsEndpoint
--- PASS: TestAgentMetricsEndpoint (0.51s)
=== RUN   TestStartStopRace
    agent_test.go:175: stopping agent
    logger.go:130: 2020-11-26T22:34:19.699+0800	INFO	shutting down agent's HTTP server	{"addr": "[::]:5778"}
    logger.go:130: 2020-11-26T22:34:19.699+0800	INFO	Starting jaeger-agent HTTP server	{"http-port": 5778}
    logger.go:130: 2020-11-26T22:34:19.700+0800	INFO	agent's http server exiting
--- PASS: TestStartStopRace (0.00s)
=== RUN   TestBuilderFromConfig
--- PASS: TestBuilderFromConfig (0.00s)
=== RUN   TestBuilderWithExtraReporter
--- PASS: TestBuilderWithExtraReporter (0.00s)
=== RUN   TestBuilderWithProcessorErrors
--- PASS: TestBuilderWithProcessorErrors (0.00s)
=== RUN   TestMultipleCollectorProxies
[{} {}]
--- PASS: TestMultipleCollectorProxies (0.00s)
=== RUN   TestCreateCollectorProxy
--- PASS: TestCreateCollectorProxy (0.21s)
=== RUN   TestCreateCollectorProxy_UnknownReporter
--- PASS: TestCreateCollectorProxy_UnknownReporter (0.00s)
=== RUN   TestPublishOpts
--- PASS: TestPublishOpts (0.00s)
=== RUN   TestBindFlags
--- PASS: TestBindFlags (0.00s)
PASS
ok  	github.com/jaegertracing/jaeger/cmd/agent/app	1.663s
(cmd/agent/app/reporter)$ go test -v
=== RUN   TestClientMetricsReporter_Zipkin
--- PASS: TestClientMetricsReporter_Zipkin (0.00s)
=== RUN   TestClientMetricsReporter_Jaeger
=== RUN   TestClientMetricsReporter_Jaeger/iter0
=== RUN   TestClientMetricsReporter_Jaeger/iter1
=== RUN   TestClientMetricsReporter_Jaeger/iter2
=== RUN   TestClientMetricsReporter_Jaeger/iter3
=== RUN   TestClientMetricsReporter_Jaeger/iter4
=== RUN   TestClientMetricsReporter_Jaeger/iter5
=== RUN   TestClientMetricsReporter_Jaeger/iter6
--- PASS: TestClientMetricsReporter_Jaeger (0.00s)
    --- PASS: TestClientMetricsReporter_Jaeger/iter0 (0.00s)
    --- PASS: TestClientMetricsReporter_Jaeger/iter1 (0.00s)
    --- PASS: TestClientMetricsReporter_Jaeger/iter2 (0.00s)
    --- PASS: TestClientMetricsReporter_Jaeger/iter3 (0.00s)
    --- PASS: TestClientMetricsReporter_Jaeger/iter4 (0.00s)
    --- PASS: TestClientMetricsReporter_Jaeger/iter5 (0.00s)
    --- PASS: TestClientMetricsReporter_Jaeger/iter6 (0.00s)
=== RUN   TestClientMetricsReporter_ClientUUID
=== RUN   TestClientMetricsReporter_ClientUUID/iter0
=== RUN   TestClientMetricsReporter_ClientUUID/iter1
=== RUN   TestClientMetricsReporter_ClientUUID/iter2
=== RUN   TestClientMetricsReporter_ClientUUID/iter3
=== RUN   TestClientMetricsReporter_ClientUUID/iter4
=== RUN   TestClientMetricsReporter_ClientUUID/iter5
--- PASS: TestClientMetricsReporter_ClientUUID (0.00s)
    --- PASS: TestClientMetricsReporter_ClientUUID/iter0 (0.00s)
    --- PASS: TestClientMetricsReporter_ClientUUID/iter1 (0.00s)
    --- PASS: TestClientMetricsReporter_ClientUUID/iter2 (0.00s)
    --- PASS: TestClientMetricsReporter_ClientUUID/iter3 (0.00s)
    --- PASS: TestClientMetricsReporter_ClientUUID/iter4 (0.00s)
    --- PASS: TestClientMetricsReporter_ClientUUID/iter5 (0.00s)
=== RUN   TestClientMetricsReporter_Expire
=== RUN   TestClientMetricsReporter_Expire/detect_new_client
    client_metrics_test.go:261: gauge=1 detected on iteration 1
=== RUN   TestClientMetricsReporter_Expire/detect_stale_client
    client_metrics_test.go:278: gauge=0 detected on iteration 0
--- PASS: TestClientMetricsReporter_Expire (0.05s)
    --- PASS: TestClientMetricsReporter_Expire/detect_new_client (0.00s)
    --- PASS: TestClientMetricsReporter_Expire/detect_stale_client (0.05s)
=== RUN   TestConnectMetrics
--- PASS: TestConnectMetrics (0.00s)
=== RUN   TestBindFlags_NoJaegerTags
--- PASS: TestBindFlags_NoJaegerTags (0.00s)
=== RUN   TestBindFlags
--- PASS: TestBindFlags (0.00s)
=== RUN   TestBindFlagsAllInOne
--- PASS: TestBindFlagsAllInOne (0.00s)
=== RUN   TestMetricsReporter
--- PASS: TestMetricsReporter (0.00s)
=== RUN   TestMultiReporter
--- PASS: TestMultiReporter (0.00s)
=== RUN   TestMultiReporterErrors
--- PASS: TestMultiReporterErrors (0.00s)
PASS
ok  	github.com/jaegertracing/jaeger/cmd/agent/app/reporter	0.461s
(cmd/agent/app/reporter/grpc)$ go test -v
=== RUN   TestBuilderFromConfig
--- PASS: TestBuilderFromConfig (0.00s)
=== RUN   TestBuilderWithCollectors
=== RUN   TestBuilderWithCollectors/with_roundrobin_schema
=== RUN   TestBuilderWithCollectors/with_single_host
=== RUN   TestBuilderWithCollectors/with_custom_resolver_and_fixed_discoverer
=== RUN   TestBuilderWithCollectors/without_collectorPorts_and_resolver
=== RUN   TestBuilderWithCollectors/with_collector_connection_status_ready
=== RUN   TestBuilderWithCollectors/with_collector_connection_status_failure
--- PASS: TestBuilderWithCollectors (0.00s)
    --- PASS: TestBuilderWithCollectors/with_roundrobin_schema (0.00s)
    --- PASS: TestBuilderWithCollectors/with_single_host (0.00s)
    --- PASS: TestBuilderWithCollectors/with_custom_resolver_and_fixed_discoverer (0.00s)
    --- PASS: TestBuilderWithCollectors/without_collectorPorts_and_resolver (0.00s)
    --- PASS: TestBuilderWithCollectors/with_collector_connection_status_ready (0.00s)
    --- PASS: TestBuilderWithCollectors/with_collector_connection_status_failure (0.00s)
=== RUN   TestProxyBuilder
=== RUN   TestProxyBuilder/should_pass_with_insecure_grpc_connection
=== RUN   TestProxyBuilder/should_fail_with_secure_grpc_connection_and_a_CA_file_which_does_not_exist
=== RUN   TestProxyBuilder/should_pass_with_secure_grpc_connection_and_valid_TLS_Client_settings
--- PASS: TestProxyBuilder (0.00s)
    --- PASS: TestProxyBuilder/should_pass_with_insecure_grpc_connection (0.00s)
    --- PASS: TestProxyBuilder/should_fail_with_secure_grpc_connection_and_a_CA_file_which_does_not_exist (0.00s)
    --- PASS: TestProxyBuilder/should_pass_with_secure_grpc_connection_and_valid_TLS_Client_settings (0.00s)
=== RUN   TestProxyClientTLS
=== RUN   TestProxyClientTLS/should_pass_with_insecure_grpc_connection
=== RUN   TestProxyClientTLS/should_fail_with_TLS_client_to_non-TLS_server
=== RUN   TestProxyClientTLS/should_fail_with_TLS_client_to_untrusted_TLS_server
=== RUN   TestProxyClientTLS/should_fail_with_TLS_client_to_trusted_TLS_server_with_incorrect_hostname
=== RUN   TestProxyClientTLS/should_pass_with_TLS_client_to_trusted_TLS_server_with_correct_hostname
=== RUN   TestProxyClientTLS/should_fail_with_TLS_client_without_cert_to_trusted_TLS_server_requiring_cert
=== RUN   TestProxyClientTLS/should_fail_with_TLS_client_without_cert_to_trusted_TLS_server_requiring_cert_from_a_different_CA
=== RUN   TestProxyClientTLS/should_pass_with_TLS_client_with_cert_to_trusted_TLS_server_requiring_cert
--- PASS: TestProxyClientTLS (0.17s)
    --- PASS: TestProxyClientTLS/should_pass_with_insecure_grpc_connection (0.00s)
    --- PASS: TestProxyClientTLS/should_fail_with_TLS_client_to_non-TLS_server (0.14s)
    --- PASS: TestProxyClientTLS/should_fail_with_TLS_client_to_untrusted_TLS_server (0.00s)
    --- PASS: TestProxyClientTLS/should_fail_with_TLS_client_to_trusted_TLS_server_with_incorrect_hostname (0.00s)
    --- PASS: TestProxyClientTLS/should_pass_with_TLS_client_to_trusted_TLS_server_with_correct_hostname (0.00s)
    --- PASS: TestProxyClientTLS/should_fail_with_TLS_client_without_cert_to_trusted_TLS_server_requiring_cert (0.00s)
    --- PASS: TestProxyClientTLS/should_fail_with_TLS_client_without_cert_to_trusted_TLS_server_requiring_cert_from_a_different_CA (0.01s)
    --- PASS: TestProxyClientTLS/should_pass_with_TLS_client_with_cert_to_trusted_TLS_server_requiring_cert (0.01s)
=== RUN   TestMultipleCollectors
--- PASS: TestMultipleCollectors (0.00s)
=== RUN   TestBindFlags
--- PASS: TestBindFlags (0.00s)
=== RUN   TestReporter_EmitZipkinBatch
--- PASS: TestReporter_EmitZipkinBatch (0.00s)
=== RUN   TestReporter_EmitBatch
--- PASS: TestReporter_EmitBatch (0.00s)
=== RUN   TestReporter_SendFailure
--- PASS: TestReporter_SendFailure (0.00s)
=== RUN   TestReporter_AddProcessTags_EmptyTags
--- PASS: TestReporter_AddProcessTags_EmptyTags (0.00s)
=== RUN   TestReporter_AddProcessTags_ZipkinBatch
--- PASS: TestReporter_AddProcessTags_ZipkinBatch (0.00s)
=== RUN   TestReporter_AddProcessTags_JaegerBatch
--- PASS: TestReporter_AddProcessTags_JaegerBatch (0.00s)
=== RUN   TestReporter_MakeModelKeyValue
--- PASS: TestReporter_MakeModelKeyValue (0.00s)
PASS
ok  	github.com/jaegertracing/jaeger/cmd/agent/app/reporter/grpc	0.571s

@objectiser
Copy link
Contributor

@WalkerWang731 lint failed:

Go fmt, license check, or import ordering failures, run 'make fmt'

@codecov
Copy link

codecov bot commented Nov 26, 2020

Codecov Report

Merging #2657 (d00801e) into master (2f8ffa9) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2657      +/-   ##
==========================================
+ Coverage   95.73%   95.76%   +0.02%     
==========================================
  Files         216      217       +1     
  Lines        9599     9623      +24     
==========================================
+ Hits         9190     9215      +25     
  Misses        336      336              
+ Partials       73       72       -1     
Impacted Files Coverage Δ
cmd/agent/app/reporter/connect_metrics.go 100.00% <100.00%> (ø)
cmd/agent/app/reporter/grpc/builder.go 96.42% <100.00%> (+1.42%) ⬆️
cmd/agent/app/reporter/grpc/collector_proxy.go 100.00% <100.00%> (ø)
plugin/storage/integration/integration.go 77.34% <0.00%> (-0.56%) ⬇️
cmd/query/app/static_handler.go 96.52% <0.00%> (+1.73%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dc091c5...d00801e. Read the comment docs.

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall approach seems reasonable.

logger.Info("Checking connection to collector")
for {
s := cc.GetState()
if s != connectivity.Ready {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: put s == Ready success case first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For compatibility and unnecessary changes in the future. Actually, I concern the gRPC official changes connectivity status constant in the future version, although seems less possible.

cmd/agent/app/reporter/grpc/builder_test.go Show resolved Hide resolved
ExpireTTL time.Duration
}

// ConnectMetricsReporter is a decorator also it not actual use currently.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't accurate, it is not used as a decorator, and does not need to be called "reporter" since it's confusing in the current package where "reporter" refers to span reporter (or "exporter" in OTEL terminology).

metric := r.params.MetricsFactory.Gauge(metrics.Options{
Name: "connected_collector_status",
Help: "Connection status that jaeger-agent to jaeger-collector, 1 is connected, 0 is disconnected",
Tags: map[string]string{"target": target},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The target here is the address of the collector, correct? Is it a good idea to use that as label, given that it could contain an ephemeral port number and prone to high cardinality? I'd say the target could be reported as an expvar variable, but the metric can be just the status.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ephemeral port number

Care to expand on that? I don't remember seeing servers with ephemeral port numbers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? For example, at Uber most services run on ephemeral port numbers, to make it easy to stack and relocate them across available compute capacity. The discovery service reports host:port pairs for all instances of a given service. The custom build of the agent that Uber runs would use that discovery mechanism to locate Jaeger collectors. So the target string could be quite dynamic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, for this early version, I use such as oldTarget variable, the function incoming variable like newTarget, function will change oldTarget metric to 0, change newTarget metric to 1. In this case, we can watch which one agent connects to which collector. But eventually, I thought these are too complicated. We only need to observe the connection status is ok, after all. Which one agent connects to which collector should be added in collector metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For connected is 0 disconnected is 1 , because when have many connection metrics, we can filter sum(jaeger_agent_connection_status_connected_collector_status{}) by (instance) > bool 0 for quick view status.

I fixed the target, no need to pass the target parameter, in the next commit. target is static when the connectMetrics{} initialize

For the current PR, I thought that simplify the design first, and add new features if required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, at Uber most services run on ephemeral port numbers, to make it easy to stack and relocate them across available compute capacity

Didn't know that, but it does make sense for high density VMs/bare metal servers.

}

// CollectorConnected used for change metric as agent connected.
func (r *ConnectMetricsReporter) CollectorConnected(target string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (r *ConnectMetricsReporter) CollectorConnected(target string) {
func (r *ConnectMetricsReporter) OnConnected(target string) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps have just a single func OnConnectionStatusChange and pass a parameter.

}

// CollectorAborted used for change metric as agent disconnected.
func (r *ConnectMetricsReporter) CollectorAborted(target string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (r *ConnectMetricsReporter) CollectorAborted(target string) {
func (r *ConnectMetricsReporter) OnDisconnected(target string) {

Logger *zap.Logger // required
MetricsFactory metrics.Factory // required
ExpireFrequency time.Duration
ExpireTTL time.Duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like any of these fields are needed, let's simplify to only what's necessary. The gauge can be initialized in the constructor once (especially if target is not used as a label per my comment below).

}

// WrapWithConnectMetrics creates ConnectMetricsReporter.
func WrapWithConnectMetrics(params ConnectMetricsReporterParams) *ConnectMetricsReporter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to comment above, this is not "wrapping" anything, just a standalone util class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I see, I also feel that there is no meaning, just maintain the same structure, let me adjust it to simplify.

@WalkerWang731
Copy link
Contributor Author

@yurishkuro @jpkrohling

Hi all,

I had committed code, about this commit I modified the following:

  • changed cmd/agent/app/reporter/connect_metrics.go relevant func that simplify useless parameters, and target is changed static when the connectMetrics{} initialize.
  • changed cmd/agent/app/reporter/connect_metrics.go that add a new metric in connectMetrics.ConnectedCollectorReconnect used for can reflect current connection stability and demo the overall structure
  • changed cmd/agent/app/reporter/grpc/builder.go that s == connectivity.Ready
  • changed cmd/agent/app/reporter/grpc/builder.go that add new variable b.ConnectMetrics used for unit test outside call
  • changed cmd/agent/app/reporter/grpc/builder_test.go that add unit test in TestBuilderWithCollectors() as builder unit test.
  • changed cmd/agent/app/reporter/connect_metrics_test.go that add test new metric

@@ -149,7 +155,17 @@ func TestBuilderWithCollectors(t *testing.T) {
cfg.Notifier = test.notifier
cfg.Discoverer = test.discoverer

conn, err := cfg.CreateConnection(zap.NewNop())
mb := metricstest.NewFactory(time.Hour)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mf is a more intuitive var name here

MetricsFactory: mb,
}
cm := reporter.WrapWithConnectMetrics(params, test.target)
tr := &connectMetricsTest{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this struct serves no purpose, you can access the factory directly

}

// WrapWithConnectMetrics creates ConnectMetrics.
func WrapWithConnectMetrics(params ConnectMetricsParams, target string) *ConnectMetrics {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func WrapWithConnectMetrics(params ConnectMetricsParams, target string) *ConnectMetrics {
func NewConnectMetrics(params ConnectMetricsParams, target string) *ConnectMetrics {

nothing is being wrapped here

}
if params.ExpireTTL == 0 {
params.ExpireTTL = defaultExpireTTL
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is the point of ExpireFrequency/ExpireTTL?

// OnConnectionStatusChange used for pass the status parameter when connection is changed
// 0 is disconnected, 1 is connected
// For quick view status via use `sum(jaeger_agent_connection_status_connected_collector_status{}) by (instance) > bool 0`
func (r *ConnectMetrics) OnConnectionStatusChange(status int64) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes the API more explicit:

Suggested change
func (r *ConnectMetrics) OnConnectionStatusChange(status int64) {
func (r *ConnectMetrics) OnConnectionStatusChange(connected bool) {

if params.ExpireTTL == 0 {
params.ExpireTTL = defaultExpireTTL
}
params.Target = target
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my objection was not about how target was passed, but to using it as a label in the metrics - it can cause high cardinality due to ephemeral ports.

// 0 is disconnected, 1 is connected
// For quick view status via use `sum(jaeger_agent_connection_status_connected_collector_status{}) by (instance) > bool 0`
func (r *ConnectMetrics) OnConnectionStatusChange(status int64) {
metric := r.params.MetricsFactory.Gauge(metrics.Options{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole file uses just two metrics, but one initialized via reflection another explicitly here. Please pick just one way to simplify the code.

cmd/agent/app/reporter/connect_metrics.go Outdated Show resolved Hide resolved
"go.uber.org/zap"
)

// connectMetrics is real metric, but not support to directly change, need via ConnectMetrics for changed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this comment, it doesn't add much to the code

cmd/agent/app/reporter/connect_metrics.go Outdated Show resolved Hide resolved
}

cm := new(connectMetrics)
cmp.MetricsFactory = cmp.MetricsFactory.Namespace(metrics.NSOptions{Name: "connection_status"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the extra namespacing is unnecessary here, metric names are already descriptive.

}

// NewConnectMetrics creates ConnectMetricsParams.
func NewConnectMetrics(cmp ConnectMetricsParams) *ConnectMetricsParams {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cmp struct is not needed, the only argument this function needs it the metrics factory, which can be passed directly.

the returned struct should not be called ...Params, it's very confusing. Call it ConnectMetrics. It can be the same struct as connectMetrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok let me adjust it

WalkerWang731 and others added 19 commits November 30, 2020 17:10
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: luhualin <[email protected]>

Co-authored-by: luhualin <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
* Fix flaky tbuffered server test

Signed-off-by: Pavel Kositsyn <[email protected]>

* Apply suggestions from code review - more readable comments

Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: Pavel Kositsyn <[email protected]>

Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
* Add github action for jaeger integration tests

Signed-off-by: Ashmita Bohara <[email protected]>

* Create separate workflow for each integration test

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
* Add github action for jaeger all-in-one image

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Make steps self-explantory

Signed-off-by: Ashmita Bohara <[email protected]>

* Fix git tags issue

Signed-off-by: Ashmita Bohara <[email protected]>

* Fix ES integration test

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
* Fix docker login issue with all-in-one build

Signed-off-by: Ashmita Bohara <[email protected]>

* Fix docker login issue with all-in-one build

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>
@WalkerWang731
Copy link
Contributor Author

@jpkrohling @yurishkuro

Hi Sorry for disturbing
I had committed code.
According to @yurishkuro suggestion, about this commit I modified the following:

  • changed cmd/agent/app/reporter/connect_metrics.go that integrated ConnectMetrics() and removed ConnectMetricsParams()
  • changed cmd/agent/app/reporter/grpc/builder.go that achieve target by expvar and adjusted code of dependency relevant
  • changed cmd/agent/app/reporter/grpc/builder_test.go and cmd/agent/app/reporter/connect_metrics_test.go ensure unit test is passed

btw: about the Unit Tests / unit-tests (pull_request) and Build all-in-one / all-in-one (pull_request), I saw doesn't pass. I don't know the reason. if need me to modify, let me know.

Thanks!

@WalkerWang731
Copy link
Contributor Author

@yurishkuro @jpkrohling @Ashmita152
Hi all, who can take a look that why does build docker-images always failed recently?
I saw scripts/travis/build-all-in-one-image.sh was modified recently. I don't know whether related.

@Ashmita152
Copy link
Contributor

Hi @WalkerWang731 Sorry for the trouble. You're right, we changed that build script recently and that break your build.

We missed an edge case where the PR is created from fork's master branch like yours this one. I have raised a fix: #2675 which should solve your build problem.

@WalkerWang731
Copy link
Contributor Author

Hi @Ashmita152
Thanks for the response. But now I encounter a new issue that doesn't pass CIT Elasticsearch / elasticsearch 5.x (pull_request), I don't know who can help me take a look, and I also don't know whether related to recently modified.

If you are not responsible, but you know who can help me, can help me @

thanks~~

@Ashmita152
Copy link
Contributor

Hi @WalkerWang731

The CIT Elasticsearch / elasticsearch 5.x job issue has been resolved in master. Thank you.

@WalkerWang731
Copy link
Contributor Author

Hi @Ashmita152 really thanks for your response.

But now, I encounter a new issue that doesn't pass Unit Tests / unit-tests (pull_request) Failing after 7m — unit-tests. haha.
Different from before, this case will sometimes fail sometimes succeed.
I encountered it in a previous test, but then passed as Merge branch 'master' into master.

Above for your reference.

Same previous, if you are not responsible, but you know who can help me, can help me @

Thank you~

@yurishkuro
Copy link
Member

Must be another flaky test - #2679. I restarted the job.

@WalkerWang731
Copy link
Contributor Author

Must be another flaky test - #2679. I restarted the job.

ok, thanks, currently, looks all normal.
For coding, I had modified as required, if any questions let me know.

Thank you!

@WalkerWang731
Copy link
Contributor Author

@jpkrohling Hi, Happy new year, may I know, how's this PR progress so far?

@jpkrohling
Copy link
Contributor

I see that there are a few comments from @yurishkuro that weren't addressed yet.

@WalkerWang731
Copy link
Contributor Author

I see that there are a few comments from @yurishkuro that weren't addressed yet.

Ah sorry, I thought it was pending for some reason.
Thanks for your reminder, I will fix it as soon as possible in few days.

}

// NewConnectMetrics will be initialize ConnectMetrics
func (r *ConnectMetrics) NewConnectMetrics() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unusual construct. I would make it simply a constructor function

func NewConnectMetrics(logger *zap.Logger, metrics metrics.Factory) *ConnectMetrics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry for responding too late..

I saw that your PR #2728 and thanks for your help clean-up.

Looked forward to learning more from you

Comment on lines +119 to +120
r := expvar.Get("gRPCTarget")
if r == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
r := expvar.Get("gRPCTarget")
if r == nil {
if r := expvar.Get("gRPCTarget"); r == nil {

logger.Info("Checking connection to collector")
var egt *expvar.String
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not move this into ConnectMetrics struct as well?

@yurishkuro yurishkuro merged commit b1ecb5b into jaegertracing:master Jan 13, 2021
@yurishkuro yurishkuro changed the title Add metrics on the agent that can show connections status between agent with collections [agent] Add metrics to show connections status between agent and collectors Jan 13, 2021
yurishkuro added a commit to yurishkuro/jaeger that referenced this pull request Jan 13, 2021
@yurishkuro
Copy link
Member

minor clean-up in #2728

yurishkuro added a commit that referenced this pull request Jan 13, 2021
* [refactor] Minor clean-up of #2657

Signed-off-by: Yuri Shkuro <[email protected]>

* add test

Signed-off-by: Yuri Shkuro <[email protected]>
bhiravabhatla pushed a commit to bhiravabhatla/jaeger that referenced this pull request Jan 25, 2021
…ectors (jaegertracing#2657)

* add metrics that show agent connection collector status

Signed-off-by: WalkerWang731 <[email protected]>

* update comment

Signed-off-by: WalkerWang731 <[email protected]>

* exec make fmt

Signed-off-by: WalkerWang731 <[email protected]>

* simplify function and add testing relevant code in the builder_test.go

Signed-off-by: WalkerWang731 <[email protected]>

* add comment in connect_metrics.go

Signed-off-by: WalkerWang731 <[email protected]>

* simplify code and changed use expvar to show target

Signed-off-by: WalkerWang731 <[email protected]>

* simplify code and changed use expvar to show target

Signed-off-by: WalkerWang731 <[email protected]>

* exec make fmt

Signed-off-by: WalkerWang731 <[email protected]>

* Fix collector panic due to sarama sdk returning nil error (jaegertracing#2654)

Signed-off-by: luhualin <[email protected]>

Co-authored-by: luhualin <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Fix flaky tbuffered server test (jaegertracing#2635)

* Fix flaky tbuffered server test

Signed-off-by: Pavel Kositsyn <[email protected]>

* Apply suggestions from code review - more readable comments

Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: Pavel Kositsyn <[email protected]>

Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Add github actions for integration tests (jaegertracing#2649)

* Add github action for jaeger integration tests

Signed-off-by: Ashmita Bohara <[email protected]>

* Create separate workflow for each integration test

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Clean-up GH action names (jaegertracing#2661)

Signed-off-by: WalkerWang731 <[email protected]>

* Fix for failures in badger integration tests (jaegertracing#2660)

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Add protogen validation test (jaegertracing#2662)

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Add github action for jaeger all-in-one image (jaegertracing#2663)

* Add github action for jaeger all-in-one image

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Feedbacks changes

Signed-off-by: Ashmita Bohara <[email protected]>

* Make steps self-explantory

Signed-off-by: Ashmita Bohara <[email protected]>

* Fix git tags issue

Signed-off-by: Ashmita Bohara <[email protected]>

* Fix ES integration test

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Update comment that looks confusing during builds

Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Use GitHub actions based build badges

Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Fix and minor improvements to all-in-one github action (jaegertracing#2667)

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Fix docker login issue with all-in-one build (jaegertracing#2668)

* Fix docker login issue with all-in-one build

Signed-off-by: Ashmita Bohara <[email protected]>

* Fix docker login issue with all-in-one build

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Fix issue with all-in-one build (jaegertracing#2669)

Signed-off-by: Ashmita Bohara <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Update cmd/agent/app/reporter/connect_metrics.go

accept suggestions

Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* Update cmd/agent/app/reporter/connect_metrics.go

accept suggestions

Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: WalkerWang731 <[email protected]>

* simplify the code that remove ConnectMetricsParams{} and integrate ConnectMetrics{}

Signed-off-by: WalkerWang731 <[email protected]>

* simplify the code that remove ConnectMetricsParams{} and integrate ConnectMetrics{}

Signed-off-by: WalkerWang731 <[email protected]>

* merage from the lastest master branch and exec make fmt

Signed-off-by: walker.wangxy <[email protected]>

* add comment on ConnectMetrics

Signed-off-by: WalkerWang731 <[email protected]>

* clear up redundant codes

Signed-off-by: WalkerWang731 <[email protected]>

Co-authored-by: WalkerWang731 <[email protected]>
Co-authored-by: Betula-L <[email protected]>
Co-authored-by: luhualin <[email protected]>
Co-authored-by: Pavel Kositsyn <[email protected]>
Co-authored-by: Yuri Shkuro <[email protected]>
Co-authored-by: Ashmita <[email protected]>
Co-authored-by: Yuri Shkuro <[email protected]>
Co-authored-by: walker.wangxy <[email protected]>
bhiravabhatla pushed a commit to bhiravabhatla/jaeger that referenced this pull request Jan 25, 2021
* [refactor] Minor clean-up of jaegertracing#2657

Signed-off-by: Yuri Shkuro <[email protected]>

* add test

Signed-off-by: Yuri Shkuro <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add metrics on the agent that can show connections status between agent with collection
7 participants