promhttp: count hard request failures in roundtripper metrics #943

EdSchouten · 2021-12-02T10:31:52Z

Right now InstrumentRoundTripper{Counter,Duration} don't increase any
metrics in case of hard request failures (e.g., TCP, TLS failures). This
makes these functions unattractive to use, as those failures are
typically among the most interesting ones to alert on.

This change extends these functions to count such requests properly. To
distinguish them from requests that did yield an actual response, we
leave the "code" label empty. This seems the correct thing to do,
because hard request failures don't yield a code from the remote side.

Right now InstrumentRoundTripper{Counter,Duration} don't increase any metrics in case of hard request failures (e.g., TCP, TLS failures). This makes these functions unattractive to use, as those failures are typically among the most interesting ones to alert on. This change extends these functions to count such requests properly. To distinguish them from requests that did yield an actual response, we leave the "code" label empty. This seems the correct thing to do, because hard request failures don't yield a code from the remote side. Signed-off-by: Ed Schouten <[email protected]>

bwplotka

Thanks for this contribution!

Hm, we somehow never needed that as such error would usually surface in other means.
Think about the abstraction of round tripper. We have res, err := r.RoundTrip(...). If we have no error we have kind of successful roundtrip from point of view of this method, thus we increment that request was made with such code.

If error was returned, technically no request was made. So if our function observes results, there was no result too:

InstrumentRoundTripperCounter is a middleware that wraps the provided
// http.RoundTripper to observe the request result with the provided CounterVec.

If no result was returned error will be returned. Then users handle errors, usually by incrementing some metric due. If we increment our metric here we risk the anti-pattern of handling error multiple times, especially for existing users of our library.

It really depends on the semantics the user have on this. I am afraid this change might surprise many. Is there away to provide optionality for this feature? 🤔

cc @kakkoyun @beorn7

bwplotka · 2022-01-20T12:11:19Z

prometheus/promhttp/instrument_client.go

@@ -64,6 +84,8 @@ func InstrumentRoundTripperCounter(counter *prometheus.CounterVec, next http.Rou
 		resp, err := next.RoundTrip(r)
 		if err == nil {
 			counter.With(labels(code, method, r.Method, resp.StatusCode)).Inc()
+		} else {


We could just return resp, err here and avoid using else (opinionated nit).

bwplotka · 2022-01-20T12:11:29Z

prometheus/promhttp/instrument_client.go

@@ -93,6 +115,8 @@ func InstrumentRoundTripperDuration(obs prometheus.ObserverVec, next http.RoundT
 		resp, err := next.RoundTrip(r)
 		if err == nil {
 			obs.With(labels(code, method, r.Method, resp.StatusCode)).Observe(time.Since(start).Seconds())
+		} else {


We could just return resp, err here and avoid using else (opinionated nit).

bwplotka · 2022-01-20T12:12:34Z

prometheus/promhttp/instrument_client.go

+// for the case where a HTTP request failed without getting a server
+// response. The "code" label is left empty to distinguish from cases
+// where a server response was obtained.
+func labelsForError(code, method bool, reqMethod string) prometheus.Labels {


Why not reusing label and injecting custom code (e.g -1) that would be a sentinel for us to put empty label ""? Less deduplicate functionality.

"code" suppose to be a valid HTTP code after the latest #962 change. I totally agree that we need to make it more sentinel.

kakkoyun

I understand the need behind this. And I think it would be nice to have a facility to track underlying transport layer errors. However, there are several things we need to account here:

First of all, as of know RoundTripper description and API suggests it only meant to used for HTTP, so having errors from underlying layers could be confusing. As we already discussed we assume metrics would have HTTP status code and method labels, not having them or overloading their usage would be confusing.
As Bartek suggested this is changing existing behaviour and this could confuse people. Even changing and implicit behaviour could be problematic for the dependants of the library.

Those being said, I think if we steer the direction of the design we can achieve the same goal.
My suggestion would be to provide wrapper instrumented transport layers in client_golang. Such as instrumentedTCPTransport. So that we can have transport specific metrics and more over users can opt-in to usages of those.

What do you think?

kakkoyun · 2022-01-21T05:52:04Z

prometheus/promhttp/instrument_client.go

+// for the case where a HTTP request failed without getting a server
+// response. The "code" label is left empty to distinguish from cases
+// where a server response was obtained.
+func labelsForError(code, method bool, reqMethod string) prometheus.Labels {


"code" suppose to be a valid HTTP code after the latest #962 change. I totally agree that we need to make it more sentinel.

kakkoyun · 2022-03-16T08:57:32Z

@EdSchouten Do you want to continue working on this?

stale · 2022-06-04T02:46:16Z

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

EdSchouten force-pushed the eschouten/20211130-client-errors branch from 5ba3bfd to ca86423 Compare December 2, 2021 10:32

bwplotka reviewed Jan 20, 2022

View reviewed changes

kakkoyun requested changes Jan 21, 2022

View reviewed changes

stale bot added the stale label Jun 4, 2022

stale bot closed this Jun 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

promhttp: count hard request failures in roundtripper metrics #943

promhttp: count hard request failures in roundtripper metrics #943

EdSchouten commented Dec 2, 2021

bwplotka left a comment

bwplotka Jan 20, 2022

bwplotka Jan 20, 2022

bwplotka Jan 20, 2022

kakkoyun Jan 21, 2022

kakkoyun left a comment

kakkoyun Jan 21, 2022

kakkoyun commented Mar 16, 2022

stale bot commented Jun 4, 2022

promhttp: count hard request failures in roundtripper metrics #943

promhttp: count hard request failures in roundtripper metrics #943

Conversation

EdSchouten commented Dec 2, 2021

bwplotka left a comment

Choose a reason for hiding this comment

bwplotka Jan 20, 2022

Choose a reason for hiding this comment

bwplotka Jan 20, 2022

Choose a reason for hiding this comment

bwplotka Jan 20, 2022

Choose a reason for hiding this comment

kakkoyun Jan 21, 2022

Choose a reason for hiding this comment

kakkoyun left a comment

Choose a reason for hiding this comment

kakkoyun Jan 21, 2022

Choose a reason for hiding this comment

kakkoyun commented Mar 16, 2022

stale bot commented Jun 4, 2022