Hey guys, I'm curious about your opinion on this topic, so I decided to open a ticket - but for now just for discussion.
Sorry, this will be a bit longer...
I'm a Software Architect with a mainly Java background. I have been using the Prometheus + Grafana combo for app observability for many years now.
There are mature metrics libraries in Java and other languages (like Go, Python) which support Timer and Meter implementations that developers can use very easily (the interface is intuitive).
Just for reference: a Timer measures "how long something takes" while a Meter measures "event occurrences / time unit", like requests/sec. The signature of these is very intuitive - easy for devs to build in and use.
You can find a better / more detailed description in the docs of this popular Java lib:
Based on my research it looks like these "client side calculated" metric types are well supported in all languages (Java, Go, Python, Node.js, etc.) except C++...
Unfortunately they are not supported by this library, and I miss them very much: when you have hundreds of "coming and going" developers in a company, it matters a lot how complicated it is for them to create and maintain unified observability (= can be scraped and dashboarded with Prom + Grafana) in all the services they maintain. So the problem is big!
My main question here is: shouldn't they be supported? Was this ever considered? Was it excluded intentionally? Or was it just never really in focus and nobody implemented it...
But what is this problem exactly?
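The above mentioned metric types are not part of the core Prometheus metrics concept because (I assume at least this is why) they can be built from the existing minimalistic metric types + functions in Prometheus, e.g.
Timer:
You can use a Histogram where the buckets represent the "how long" part. (This guy summarized it nicely: https://povilasv.me/prometheus-tracking-request-duration/)
Meter:
You can just create a counter which counts the event occurrences, then apply the rate() function on it.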
And you are done. Right?
Well, not quite...
Both approaches have serious cons. Just to mention a few:
For the Meter situation (so rate(<some counter>)): if the Prometheus scrape interval is once per minute or even less often, then clearly Prometheus has no clue what happened, and how, during that minute; it only sees the counter value at each scrape. Increasing the scraping frequency helps but comes with other cons: more disk is needed, more network is used, etc.
For Timers the problem is even bigger... It's very difficult to pre-define good "how long" intervals (so basically: bucket boundaries) that fit every processing time you want to monitor. Therefore it becomes the developer's responsibility to choose the buckets appropriately. At the same time we should keep the number of buckets minimal, as each bucket is a counter, so if you make it too granular (and now imagine hundreds of developers doing it differently) it's a huge storage + performance impact on the Prometheus side.
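To make the "each bucket is a counter" point concrete, here is a minimal sketch in plain C++ (standard library only). `BucketHistogram` and its methods are names I made up for illustration, not the API of this library, and I assume the bounds are passed in ascending order. It only shows how the bucket choice is fixed up front and how the number of exposed series grows with it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Prometheus-style histogram (illustration only).
class BucketHistogram {
 public:
  // upper_bounds: bucket upper limits in seconds, ascending.
  // One extra slot is kept for the implicit +Inf bucket.
  explicit BucketHistogram(std::vector<double> upper_bounds)
      : bounds_(std::move(upper_bounds)), counts_(bounds_.size() + 1, 0) {
    assert(std::is_sorted(bounds_.begin(), bounds_.end()));
  }

  // Record one duration: bump the first bucket whose bound is >= the value.
  void Observe(double seconds) {
    sum_ += seconds;
    ++count_;
    std::size_t i = 0;
    while (i < bounds_.size() && seconds > bounds_[i]) ++i;
    ++counts_[i];  // i == bounds_.size() means the +Inf bucket
  }

  // Buckets are exposed cumulatively ("le" semantics), one counter each.
  std::vector<std::uint64_t> CumulativeCounts() const {
    std::vector<std::uint64_t> out;
    out.reserve(counts_.size());
    std::uint64_t running = 0;
    for (std::uint64_t c : counts_) {
      running += c;
      out.push_back(running);
    }
    return out;
  }

  // Series per label combination: every bucket (incl. +Inf) plus _sum and _count.
  std::size_t SeriesPerLabelSet() const { return counts_.size() + 2; }

 private:
  std::vector<double> bounds_;
  std::vector<std::uint64_t> counts_;
  double sum_ = 0.0;
  std::uint64_t count_ = 0;
};
```

With bounds {0.1, 0.5, 1.0} that is already 6 series per label combination, and a 2 second request only tells you "slower than 1s". Finer buckets give better answers but multiply the series count for every label combination.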
For these reasons it is no coincidence that the above mentioned Timer and Meter types are supported by metric libraries:
- they do the statistics (distribution, etc.) calculations on the client side, often using so-called "Reservoirs" (like TimeSlidingWindowReservoir, 1m, 5m, etc.) in the background
- for Timer based situations, devs do not need to figure out the distribution buckets upfront
- Prometheus just receives the numerical values, calculated on the client side, which drastically reduces the storage and performance difficulties
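Roughly what I mean by client side calculation, as a minimal sketch in plain C++ (standard library only). `SlidingWindowReservoir` and its methods are hypothetical names, not taken from this library or from the Java lib. I assume timestamps arrive in non-decreasing order, the caller passes the current time explicitly, and there is no thread safety here; a real implementation would need locking or a lock-free structure.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical sliding time window reservoir (illustration only).
class SlidingWindowReservoir {
 public:
  using Clock = std::chrono::steady_clock;
  using TimePoint = Clock::time_point;

  explicit SlidingWindowReservoir(std::chrono::seconds window) : window_(window) {}

  // Record one event; for a Timer the value is the measured duration in seconds.
  void Update(double value, TimePoint now) {
    Trim(now);
    samples_.emplace_back(now, value);
  }

  // Meter view: events per second, averaged over the window length.
  double RatePerSecond(TimePoint now) {
    Trim(now);
    return static_cast<double>(samples_.size()) /
           static_cast<double>(window_.count());
  }

  // Timer view: nearest-rank quantile over the window, q in (0, 1].
  // Returns 0 when the window is empty.
  double Quantile(double q, TimePoint now) {
    assert(q > 0.0 && q <= 1.0);
    Trim(now);
    if (samples_.empty()) return 0.0;
    std::vector<double> values;
    values.reserve(samples_.size());
    for (const auto& s : samples_) values.push_back(s.second);
    std::sort(values.begin(), values.end());
    std::size_t rank = static_cast<std::size_t>(std::ceil(q * values.size()));
    if (rank < 1) rank = 1;
    if (rank > values.size()) rank = values.size();
    return values[rank - 1];
  }

 private:
  // Drop samples that are older than the window.
  void Trim(TimePoint now) {
    while (!samples_.empty() && now - samples_.front().first > window_) {
      samples_.pop_front();
    }
  }

  std::chrono::seconds window_;
  std::deque<std::pair<TimePoint, double>> samples_;
};
```

On each scrape the process would then expose only a handful of plain numbers (for example the rate and the 0.5 / 0.99 quantiles) instead of one counter per bucket, and nobody has to guess bucket boundaries in advance. The trade-off is that quantiles computed per process cannot be aggregated across instances afterwards the way histogram buckets can.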
So there are very obvious benefits to implementing such things.
And I also have lots of painful practical experience with observability when these are not supported and we need to go with Histograms all the time.
I found a cpp port of the Java metric lib I mentioned above:
https://github.com/ultradns/cppmetrics
but it is an abandoned project as far as I can see - maybe it should be checked.
Conclusion
This is clearly missing. For now I'm interested in the reason.
I would be happy to contribute here with resources (also devs from companies I'm working for) to get this done... unless this topic was intentionally excluded from this library!