Track number of discarded samples while committing in TSDB #7213
pracucci wants to merge 2 commits into prometheus:master
Conversation
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Hmm, now that we have isolation, should we not be writing these to the WAL at all?
codesome left a comment
I can see this being useful. This is not to be confused with the metric being added for out-of-order samples in #6679, which counts out-of-order samples in both AddFast() and Commit(); an out-of-order sample in AddFast() does not mean it was discarded silently.
if !ok {
    total--
    discarded++
We can get rid of the total variable and use len(a.samples) - discarded when adding to the metric; that looks cleaner.
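The suggestion above can be sketched as follows. This is a minimal, hypothetical stand-in for the Commit() loop (the sample type and the appendOK callback are illustrative, not the actual TSDB code): only the discarded counter is maintained, and the committed count is derived as len(samples) - discarded.

```go
package main

import "fmt"

// sample is a hypothetical stand-in for the appender's buffered samples.
type sample struct {
	t int64
	v float64
}

// commit mimics the Commit() loop: try to append each sample and count the
// failures. There is no separate total that gets decremented on failure;
// the committed count is simply len(samples) - discarded.
func commit(samples []sample, appendOK func(sample) bool) (committed, discarded int) {
	for _, s := range samples {
		if ok := appendOK(s); !ok {
			discarded++
		}
	}
	return len(samples) - discarded, discarded
}

func main() {
	// The third sample is out of order within the batch.
	samples := []sample{{1, 1}, {2, 2}, {1, 3}}
	lastT := int64(-1)
	committed, discarded := commit(samples, func(s sample) bool {
		if s.t <= lastT {
			return false // simulate memSeries.append() rejecting out-of-order
		}
		lastT = s.t
		return true
	})
	fmt.Println(committed, discarded) // 2 1
}
```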
I had the same thought. But do we want things to be logged in the WAL even before we start to append to memory?

If users can't see them due to isolation, does it matter?

If we log after appending to memory, and the log fails, then users can still see those samples unless we remove them from memory. In that case, a restart would make those samples disappear. Another way to do this would be to identify the potential out of order in the

That'd only reduce the chances of it; it could still in theory happen.

Any suggestions on skipping the WAL for those samples? (Assuming we want to log before putting them in memory.)

I don't see a way to make it work.

Sounds like a metric, then.
Today, while supporting a customer, we realised there's a condition under which out-of-order samples may be written to the WAL (and thus sent to the remote write) but then silently discarded by TSDB.
We observed this issue with clashing series (the container name label was missing) scraped from cAdvisor. We noticed cAdvisor exposes metrics with explicit timestamps but, due to how it works, the timestamp may differ between series exposed during a single scrape.
In the case of clashing series, samples for the same series but with different timestamps may be appended during a single scrape. When this happens, such out-of-order samples are silently ignored during headAppender.Commit() but are written to the WAL anyway and thus sent to the remote write:
prometheus/tsdb/head.go
Lines 1059 to 1061 in 532f7bb
In this PR, I'm suggesting to add a metric to keep track of such silently discarded samples, so that when we see out-of-order samples received by the remote write target (e.g. Cortex) we have a way to check whether we hit this case.
Internal details
memSeries.append() is called within headAppender.Commit(). If memSeries.append() fails because of an out-of-order sample, it returns false and the sample is discarded, but there's no metric tracking it and Prometheus just silently moves on.

This issue is triggered by the fact that c.maxTime is updated in memSeries.append() (again, called during headAppender.Commit()), so storage.ErrOutOfOrderSample is not returned by memSeries.appendable() if the out-of-order occurs within a single scrape/transaction.

However, records are logged to the WAL within headAppender.Commit() but before memSeries.append(), and this may lead to out-of-order samples being written to the WAL (and thus sent to the remote write) but then silently discarded while committing to TSDB.
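The ordering problem described above can be illustrated with a minimal sketch. This is not the actual TSDB code: commitBatch, sample, and the simulated WAL/in-memory slices are hypothetical stand-ins that only model the sequence "log the whole batch to the WAL first, then append to memory one sample at a time, rejecting out-of-order". An in-batch out-of-order sample is therefore already in the WAL (and shipped to remote write) by the time the append rejects it.

```go
package main

import "fmt"

type sample struct {
	t int64
	v float64
}

// commitBatch is a simplified, hypothetical model of headAppender.Commit():
// the WAL record for the full batch is logged before any in-memory append.
func commitBatch(batch []sample) (walLen, memLen, discarded int) {
	// Step 1: the whole batch goes to the (simulated) WAL up front.
	wal := append([]sample(nil), batch...)

	// Step 2: append to the (simulated) in-memory series; an in-batch
	// out-of-order sample is rejected here, even though it is already
	// in the WAL and thus already sent to remote write.
	var mem []sample
	for _, s := range batch {
		if len(mem) > 0 && s.t <= mem[len(mem)-1].t {
			discarded++ // silently dropped today; this PR adds a metric here
			continue
		}
		mem = append(mem, s)
	}
	return len(wal), len(mem), discarded
}

func main() {
	// t=15 is out of order within the batch: logged to the WAL, not committed.
	walLen, memLen, discarded := commitBatch([]sample{{10, 1}, {20, 2}, {15, 3}})
	fmt.Println(walLen, memLen, discarded) // 3 2 1
}
```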