Add metric for debug traces #796

Merged: 7 commits merged into master from metric_for_root_traces on May 25, 2018

Conversation

black-adder (Contributor):

Currently, we support metrics for the total number of debug spans, but another useful metric to have is the number of debug traces, attributed to the service that initiates the debug trace. Since debug traces aren't downsampled (this isn't technically true here in OSS...), having metrics to observe which service is creating more debug traces than necessary is useful.

Signed-off-by: Won Jun Jang [email protected]

ghost assigned black-adder on May 1, 2018
ghost added the review label on May 1, 2018
	factory: factory.Namespace("debug-spans."+category, nil),
	lock:    &sync.Mutex{},
	counts:  make(map[string]metrics.Counter),
black-adder (Contributor, Author) commented on this hunk:

I refactored this slightly. Rather than having debugSpans and debugTraces as separate metrics, they should be part of the spans and traces metrics but with a debug tag. This way it's easier to aggregate. I guess technically this is a breaking change?

While not in scope for this PR, we could tag-ify the metrics further by splitting out the service as a tag.
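
A minimal sketch of the tagging scheme described above, assuming a jaeger-lib-style metrics factory; the Counter/Factory interfaces and the traceCounters name are simplified stand-ins, not the actual PR code:

```go
package spanmetrics

// Counter and Factory are simplified stand-ins for the jaeger-lib
// metrics interfaces used by the collector.
type Counter interface{ Inc(int64) }

type Factory interface {
	Counter(name string, tags map[string]string) Counter
}

// traceCounters keeps both variants under a single "traces" metric
// name and distinguishes them with a debug tag, instead of maintaining
// a separate debugTraces metric.
type traceCounters struct {
	debug    Counter
	nonDebug Counter
}

func newTraceCounters(f Factory) traceCounters {
	return traceCounters{
		debug:    f.Counter("traces", map[string]string{"debug": "true"}),
		nonDebug: f.Counter("traces", map[string]string{"debug": "false"}),
	}
}

func (c traceCounters) count(isDebug bool) {
	if isDebug {
		c.debug.Inc(1)
		return
	}
	c.nonDebug.Inc(1)
}
```

Dashboards can then sum across both tag values for total traces, or filter on debug=true, which is what makes the tagged form easier to aggregate.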

coveralls commented on May 1, 2018:

Coverage remained the same at 100.0% when pulling fef43b8 on metric_for_root_traces into 70dd38f on master.

yurishkuro (Member):

Change looks good, but I am concerned about how this would overlap with #776.

black-adder (Contributor, Author) commented on May 4, 2018:

he who lands first, wins

	} else {
		counter = getOrCreateCounter(m.counts, func() metrics.Counter {
			return m.factory.Counter(serviceName, map[string]string{"debug": "false"})
		})
	}
	m.lock.Unlock()
	if counter != nil {
yurishkuro (Member) commented on this hunk:

We should have a fallback "all other services" name for when we are over the limit.
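
A sketch of what such a fallback bucket could look like, assuming a cap on the number of per-service counters; maxServices, otherServices, and the simplified Counter/Factory types are hypothetical, not taken from the PR:

```go
package spanmetrics

import "sync"

// Simplified stand-ins for the jaeger-lib metrics interfaces.
type Counter interface{ Inc(int64) }

type Factory interface {
	Counter(name string, tags map[string]string) Counter
}

const (
	maxServices   = 500              // hypothetical cap on distinct service names
	otherServices = "other-services" // fallback bucket once the cap is reached
)

type countsBySvc struct {
	factory Factory
	lock    sync.Mutex
	counts  map[string]Counter
}

func newCountsBySvc(f Factory) *countsBySvc {
	return &countsBySvc{factory: f, counts: make(map[string]Counter)}
}

// countByService folds any service seen after the cap is reached into
// a single "other-services" counter, so the map cannot grow without bound.
func (m *countsBySvc) countByService(serviceName, debugStr string) {
	m.lock.Lock()
	counter, ok := m.counts[serviceName]
	if !ok {
		if len(m.counts) >= maxServices {
			serviceName = otherServices
		}
		if counter, ok = m.counts[serviceName]; !ok {
			counter = m.factory.Counter(serviceName, map[string]string{"debug": debugStr})
			m.counts[serviceName] = counter
		}
	}
	m.lock.Unlock()
	counter.Inc(1)
}
```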

black-adder (Contributor, Author):

I'll address the changes and keep this PR open; I'll coordinate with @jpkrohling so that these two changes can land at the same time.

black-adder (Contributor, Author):

Or it might be easier to just do the changes in that PR.

yurishkuro (Member):

Do you mean combine them, or merge this and re-apply it there?

black-adder (Contributor, Author):

I'll redo this change on top of his branch and we can land it there.

black-adder added 5 commits (each Signed-off-by: Won Jun Jang <[email protected]>)
ghost removed the review label on May 25, 2018
black-adder reopened this on May 25, 2018
ghost added the review label on May 25, 2018
	} else {
		counter = getOrCreateCounter(m.counts, func() metrics.Counter {
			return m.factory.Counter("", map[string]string{"service": serviceName, "debug": "false"})
		})
yurishkuro (Member) commented on this hunk:

This is highly inefficient. The overall function is "count"; one would expect it to be quick and allocation-free, but you are creating all these lambdas, which will require memory allocations.
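
One way to avoid the per-call closure is to do the map lookup directly under the lock and only build the tag map and call the factory on a miss. A sketch under that assumption (simplified stand-in types; the helper signature loosely mirrors the one visible in a later hunk):

```go
package spanmetrics

import "sync"

// Simplified stand-ins for the jaeger-lib metrics interfaces.
type Counter interface{ Inc(int64) }

type Factory interface {
	Counter(name string, tags map[string]string) Counter
}

type countsBySvc struct {
	factory Factory
	lock    sync.Mutex
}

// countByService keeps the hot path to a plain map lookup under the
// lock; the tag map and the factory call, which are the allocating
// parts, only happen the first time a service name is seen.
func (m *countsBySvc) countByService(serviceName, debugStr string, counts map[string]Counter) {
	m.lock.Lock()
	counter, ok := counts[serviceName]
	if !ok {
		counter = m.factory.Counter(serviceName, map[string]string{"debug": debugStr})
		counts[serviceName] = counter
	}
	m.lock.Unlock()
	counter.Inc(1)
}
```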

black-adder (Contributor, Author):

Went from 1067 ns/op, 231 B/op, 11 allocs/op to 947 ns/op, 165 B/op, 9 allocs/op.

yurishkuro (Member):

Maybe, if you benchmark the unhappy path; but try it with the same service name in a hot loop.
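
A sketch of the benchmark being suggested: drive the same service name through the counting path in a hot loop so every iteration is a map hit. The noopCounter, countsBySvc, and BenchmarkSameService names here are simplified stand-ins rather than the PR's actual benchmark:

```go
package spanmetrics

import (
	"sync"
	"testing"
)

type Counter interface{ Inc(int64) }

// noopCounter stands in for a real metrics backend.
type noopCounter struct{}

func (noopCounter) Inc(int64) {}

type countsBySvc struct {
	lock   sync.Mutex
	counts map[string]Counter
}

func (m *countsBySvc) count(serviceName string) {
	m.lock.Lock()
	c, ok := m.counts[serviceName]
	if !ok {
		c = noopCounter{}
		m.counts[serviceName] = c
	}
	m.lock.Unlock()
	c.Inc(1)
}

// BenchmarkSameService measures the happy path: the same service name
// on every iteration, so the counter is always already cached and the
// loop should stay close to zero allocations per op.
func BenchmarkSameService(b *testing.B) {
	m := &countsBySvc{counts: map[string]Counter{}}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		m.count("api-gateway")
	}
}
```

Running this as a test file with `go test -bench SameService -benchmem` produces ns/op and allocs/op figures comparable to the numbers quoted earlier in the thread.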

Signed-off-by: Won Jun Jang <[email protected]>
ghost removed the review label on May 25, 2018
black-adder reopened this on May 25, 2018
ghost added the review label on May 25, 2018
}
}

func (m *countsBySvc) x(serviceName, debugStr string, counts map[string]metrics.Counter) {
yurishkuro (Member) commented on this hunk:

Why do you need an extra function with such a great name? Just set two local vars above and continue in the same function.
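
A sketch of the suggested shape: choose the per-debug counter map and the tag value as two local variables and continue inline, instead of delegating to a separately named helper. The debugCounts/otherCounts field names and the simplified types are illustrative, not the PR's final code:

```go
package spanmetrics

import "sync"

// Simplified stand-ins for the jaeger-lib metrics interfaces.
type Counter interface{ Inc(int64) }

type Factory interface {
	Counter(name string, tags map[string]string) Counter
}

type countsBySvc struct {
	factory     Factory
	lock        sync.Mutex
	debugCounts map[string]Counter // counters tagged debug=true
	otherCounts map[string]Counter // counters tagged debug=false
}

func newCountsBySvc(f Factory) *countsBySvc {
	return &countsBySvc{
		factory:     f,
		debugCounts: make(map[string]Counter),
		otherCounts: make(map[string]Counter),
	}
}

// countByService picks the map and the tag value up front as two local
// variables, then continues inline with the lookup and increment.
func (m *countsBySvc) countByService(serviceName string, isDebug bool) {
	counts, debugStr := m.otherCounts, "false"
	if isDebug {
		counts, debugStr = m.debugCounts, "true"
	}
	m.lock.Lock()
	counter, ok := counts[serviceName]
	if !ok {
		counter = m.factory.Counter(serviceName, map[string]string{"debug": debugStr})
		counts[serviceName] = counter
	}
	m.lock.Unlock()
	counter.Inc(1)
}
```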

black-adder (Contributor, Author):

duuude

Signed-off-by: Won Jun Jang <[email protected]>
black-adder (Contributor, Author):

https://coveralls.io/jobs/36863391 <- 100% coverage. I'll investigate why overalls is being a pain, but landing first.

black-adder merged commit 037875b into master on May 25, 2018
ghost removed the review label on May 25, 2018
black-adder deleted the metric_for_root_traces branch on May 25, 2018 at 16:21
mabn pushed a commit to mabn/jaeger that referenced this pull request May 28, 2018
* master: (38 commits)
  Preparing release 1.5.0 (jaegertracing#847)
  Add bounds to memory storage (jaegertracing#845)
  Add metric for debug traces (jaegertracing#796)
  Change metrics naming scheme (jaegertracing#776)
  Bump gocql version (jaegertracing#829)
  Remove ParentSpanID from domain model (jaegertracing#831)
  Make gas run quiet (jaegertracing#838)
  Revert "Make gas run quite"
  Revert "Install gas from install-ci"
  Install gas from install-ci
  Make gas run quite
  Add 'gas' for security problems scanning (jaegertracing#830)
  Add ability to adjust static sampling probabilities per operation (jaegertracing#827)
  Support log-level flag on agent (jaegertracing#828)
  Remove unused function (jaegertracing#822)
  Add healthcheck to standalone (jaegertracing#784)
  Do not use KeyValue fields directly and use KeyValues as decorator only (jaegertracing#810)
  Add ContaAzul to the adopters list (jaegertracing#806)
  Add ISSUE_TEMPLATE and PULL_REQUEST_TEMPLATE (jaegertracing#805)
  Upgrade to  go 1.10 (jaegertracing#792)
  ...

# Conflicts:
#	cmd/agent/app/builder.go
#	cmd/collector/main.go
#	cmd/query/main.go
#	cmd/standalone/main.go