Conversation
this is an old optimization (?) that has been with us for a long time: #74 2029113

here's how it caused data loss at read time:
- when only 1 chunk of data had been filled: the "update" of the field is a no-op because len(chunks) == 1, so oldPos goes back to 0 (not sure if intentional or a bug), so reading the first chunk worked.
- once you have more than 1 chunk: the update of oldPos works and we start hitting cassandra. depending on how long the chunk takes to get saved to cassandra, we will miss data at read time. also, our chunk cache does not cache absence of data, hitting cassandra harder during this period.
- once the chunk is saved to cassandra, the problem disappears.
- once the circular buffer recycles the first time (effectively removing the first chunk), this optimization no longer applies, but at that point we still hit cassandra just as before.

This problem is now solved. However, removing that code enables another avenue for data loss at read time:
- when a read node starts without data backfill
- when a read node starts with data backfill, but the backfill doesn't have old data for the particular metric, IOW the data only covers 1 chunk's worth
- when a read node starts with data backfill, but since backfilling starts at arbitrary positions, the first chunk misses some data at the beginning

In all these cases, the first chunk is a partial chunk, whereas a full version of the chunk is most likely already in cassandra. To make sure this is not a problem, if the first chunk we used was partial, we set oldest to the first timestamp, so that the rest can be retrieved from cassandra. Typically, this will cause the "same" chunk (but a full version) to be retrieved from cassandra, which is then cached and seamlessly merged via Fix().

fix #78
fix #988
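A rough sketch of that behaviour (type and field names here are paraphrased, not the exact metrictank code, and "first timestamp" is read as the timestamp of the first point actually held in memory):

```go
package main

import "fmt"

// Illustrative stand-ins for the types involved; names paraphrase the PR,
// they are not the real metrictank definitions.
type Point struct {
	Val float64
	Ts  uint32
}

type Chunk struct {
	T0    uint32 // start timestamp of the chunk's span
	First bool   // true if this is the first chunk the ring buffer ever filled
}

// oldestForResult picks the "Oldest" timestamp reported to the caller.
// If the oldest in-memory chunk is the (possibly partial) first chunk, report
// the timestamp of the first point we actually have, so the caller knows it
// may still need to query cassandra for the full version of that chunk;
// Fix() then merges the full and partial copies seamlessly.
func oldestForResult(oldest Chunk, points []Point) uint32 {
	if oldest.First && len(points) > 0 {
		return points[0].Ts
	}
	return oldest.T0
}

func main() {
	c := Chunk{T0: 600, First: true}
	pts := []Point{{Ts: 660}, {Ts: 720}} // partial chunk: anything before 660 is missing
	fmt.Println(oldestForResult(c, pts)) // 660 -> caller also queries cassandra for 600..660
}
```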
Changes from 2d12b85 to 0a59537
@@ -8,5 +8,5 @@ import (
 type Result struct {
 	Points []schema.Point
 	Iters  []chunk.Iter
-	Oldest uint32
+	Oldest uint32 // timestamp of oldest point we have, to know when and when not we may need to query slower storage
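For context, a hypothetical illustration of what that field is used for (the real decision lives elsewhere in metrictank's read path):

```go
// needSlowerStorage reports whether the requested range starts before the
// oldest point we hold in memory, i.e. whether cassandra must also be queried.
func needSlowerStorage(oldest, from uint32) bool {
	return from < oldest
}
```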
"to know when and when not we may need" is a bit hard to parse. Maybe "to know whether we need"?
+1
@@ -342,8 +331,13 @@ func (a *AggMetric) Get(from, to uint32) (Result, error) {
		}
	}

	if oldestChunk.First {
It seems unnecessary that each chunk needs to have this First bool. Can you use the t0 and LastTs to be able to tell whether firstTs falls in this range? Something like a ContainsTimestamp function?
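A hypothetical sketch of that suggestion (the T0/LastTs field names follow the comment, not necessarily the real chunk.Chunk definition):

```go
// Derive "does this chunk cover the given timestamp" from the chunk's own
// time range instead of storing a separate First flag on it.
type Chunk struct {
	T0     uint32 // start of the chunk's time span
	LastTs uint32 // timestamp of the last point added so far
}

func (c *Chunk) ContainsTimestamp(ts uint32) bool {
	return ts >= c.T0 && ts <= c.LastTs
}
```

The read path could then check oldestChunk.ContainsTimestamp(firstTs) instead of a First flag.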
that's a good idea. I had been thinking for a while about what the simplest/lowest-overhead way to do this would be (ideally introducing only 1 new attribute instead of 2).
But note that the First field should have no overhead: it's a bool, so it fits in space that was previously padding (as we already had a bool on chunk.Chunk), so I still like it.
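The padding argument can be illustrated with simplified stand-in structs (these are not the real chunk.Chunk; they only demonstrate the layout point):

```go
package main

import (
	"fmt"
	"unsafe"
)

// Two uint32s (8 bytes) plus one bool (1 byte), padded to 12 bytes on amd64
// to keep the struct's 4-byte alignment.
type chunkOneBool struct {
	T0     uint32
	LastTs uint32
	Saved  bool
}

// A second bool lands in the formerly-padded bytes, so the size stays the same.
type chunkTwoBools struct {
	T0     uint32
	LastTs uint32
	Saved  bool
	First  bool
}

func main() {
	fmt.Println(unsafe.Sizeof(chunkOneBool{}))  // 12
	fmt.Println(unsafe.Sizeof(chunkTwoBools{})) // still 12
}
```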
It's less about the space and more about the branching of chunk types, when from a Chunk's perspective it is just a Chunk and has no concept of First-ness. Like, if Chunk was just a slice, it would be weird for the First member to live with the slice, right?
funny, I also thought about this. But what I realized is that chunk.Chunk is already not a pure chunk; rather, it is a "chunk as embedded in an AggMetric in metrictank". What I mean is it already has a bunch of fields for other uses within metrictank besides mere chunkness, so one more doesn't change anything IMHO.
That said, at some point I want to refactor this to clarify it.
If it doesn't bother you, that's ok. But I'll point out that the other elements of the Chunk are actually properties relevant to a Chunk's internals. First will be the first/only property that is actually about state outside of the Chunk, and that's what makes it stand out as awkward to me.
makes sense. You're looking at it from a "field describes which state" perspective; I was looking at it from a "field is used for what/by what" perspective.
Using the former perspective generally leads to cleaner code, I think. And you're right, it looks a bit more awkward to me too now. But it's the most efficient solution, so until we refactor I think it's OK.
Minor comments, but nothing objectionable enough to block.
thx for reviewing Sean!