-
Notifications
You must be signed in to change notification settings - Fork 107
cleanup carbon metrics for out-of-order vs duplicate data points, cleaner names in sync with prom metrics #1288
Conversation
I'm not very happy with the new mdata/errors package. Going a little further we could move the aggmetric errors into it and remove the conflicting 'errors'/'mdadaerrors' package name. The other solution I had was having different error types/values for chunk and reorder_buffer which would imply duplicating the switch case in discardedMetricsInc. Avoiding duplication could be done using a common interface. |
Test with:
|
yeah, the dependencies in mdata are a bit intricate. i suspect the right solution is moving Let's think about a refactor later though. For now this will work |
discardedNewValueForTimestamp.Inc() | ||
default: | ||
reason = "unknown" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from #1201 :
Every point we receive should either be persisted or we should increment a counter to say we discarded
thus in the unknown case, we should also increment a carbon counter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we create a new catch-all sort of counter? tank.discarded.unknown
perhaps?
(note that this case does not happen atm)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
I want a clear paraphrasing of all the conversations in all those previous tickets. Seems there were a couple of loose ends (e.g. new_value_for_timestamp or duplicate_ts .. Florian suggested the former and that sounds good to me,) So here's my TLDR objective:
Florian, per 4), can you rename |
it should also be noted that metrics can be incorrectly classified as "too old" when "new value for ts" would have been more incorrect, e.g. for a ts older than the ROB's retention, or when not using ROB, for any ts older than the last one. |
tank.add_to_closed_chunk should also be renamed. also some of the metrics in NewDefaultHandler (invalid, unknown, ..) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, but needs a bit more work, see comments.
yes. |
…ple-out-of-order'
…tank.discarded.new-value-for-timestamp'
….received-too-late'
…carded.invalid/unknown'
}, | ||
{ | ||
"refId": "F", | ||
"target": "alias(sumSeries(perSecond(metrictank.stats.$environment.$instance.tank.add_to_closed_chunk.counter32)), 'add-to-saved')" | ||
"target": "alias(sumSeries(perSecond(metrictank.stats.$environment.$instance.tank.discarded.received-too-late.counter32)), 'add-to-saved')" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh.. i love a clean dashboard json diff like this done.
was it a lot of work? I suspect you either did a manual json edit or had to do lots of git add -p
or git checkout -p
if you exported out of grafana.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This time I did manual json edit. Last time I did an export of grafana and managed to get a decent diff.
@fkaleo : this looks solid! nice work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
New Carbon metric 'tank.discarded.new-value-for-timestamp'.
Prometheus metric 'discarded_samples_total' has a new reason 'new-value-for-timestamp'.
Test dashboard 'Fakemetrics - discarded samples' plots both.
In order to be consistent with the Prometheus metrics, renamed carbon metrics:
Fixes #1201, fixes #1202, fixes #1203