
read from first chunks #994

Merged
merged 2 commits into master from read-from-first-chunk on Aug 14, 2018

Conversation

@Dieterbe (Contributor) commented Aug 14, 2018

this is an old optimization (?) that has been with us
for a long time: #74
2029113

here's how it caused data loss at read time:

  • when only 1 chunk of data has been filled:
    the "update" of the field is a no-op
    because len(chunks) == 1, so oldPos goes back to 0
    (not sure if that is intentional or a bug), so reading
    the first chunk works.
  • once there is more than 1 chunk, the update of oldPos works
    and we start hitting cassandra. depending on how long the
    chunk takes to get saved to cassandra, we miss data at
    read time. also, our chunk cache does not cache absence of
    data, so cassandra gets hit even harder during this period.
  • once the chunk is saved to cassandra, the problem disappears.
  • once the circular buffer recycles for the first time (effectively
    removing the first chunk), this optimization no longer applies,
    but at that point we still hit cassandra just as before.

This problem is now solved. However, removing that code
enables another avenue for data loss at read time:

  • a read node starts without data backfill
  • a read node starts with data backfill, but the backfill
    doesn't have old data for the particular metric, i.e.
    the data only covers 1 chunk's worth
  • a read node starts with data backfill, but since backfilling starts
    at arbitrary positions, the first chunk misses some data at the
    beginning

In all these cases, the first chunk is a partial chunk, whereas
a full version of the chunk is most likely already in cassandra.

To make sure this is not a problem: if the first chunk we used was
partial, we set oldest to its first timestamp, so that the rest
can be retrieved from cassandra.
Typically, this causes the "same" chunk (but a full version)
to be retrieved from cassandra; it is then cached and seamlessly
merged via Fix().

fix #78
fix #988
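
For illustration, here is a minimal, self-contained sketch of the fallback described above. The Result.Oldest field matches the diff further down in this PR, but the Point type, the getFromMemory function, and its parameters are simplified assumptions for illustration, not the actual metrictank code.

```go
package main

import "fmt"

// Point stands in for schema.Point; simplified for this sketch.
type Point struct {
	Val float64
	Ts  uint32
}

// Result mirrors the Result type touched in this PR: Oldest tells the
// caller whether it may need to query slower storage (cassandra).
type Result struct {
	Points []Point
	Oldest uint32
}

// getFromMemory sketches the decision this PR adds to the read path:
// if the first in-memory chunk is partial, report its first timestamp
// as Oldest, so the caller fetches anything older from cassandra
// (typically a full version of the same chunk, merged via Fix()).
func getFromMemory(from uint32, points []Point, firstChunkPartial bool) Result {
	oldest := from
	if firstChunkPartial && len(points) > 0 {
		oldest = points[0].Ts
	}
	return Result{Points: points, Oldest: oldest}
}

func main() {
	points := []Point{{Val: 1, Ts: 130}, {Val: 2, Ts: 140}}
	res := getFromMemory(100, points, true)
	if res.Oldest > 100 {
		// the caller would now query cassandra for [100, res.Oldest)
		// and merge the result with the in-memory points
		fmt.Printf("need slower storage for [100, %d)\n", res.Oldest)
	}
}
```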

@Dieterbe requested review from shanson7 and woodsaj on August 14, 2018 07:56
@Dieterbe force-pushed the read-from-first-chunk branch from 2d12b85 to 0a59537 on August 14, 2018 08:17
@@ -8,5 +8,5 @@ import (
 type Result struct {
 	Points []schema.Point
 	Iters  []chunk.Iter
-	Oldest uint32
+	Oldest uint32 // timestamp of oldest point we have, to know when and when not we may need to query slower storage

Collaborator:

> to know when and when not we may need

is a bit hard to parse. Maybe

> to know whether we need

Contributor Author:

+1

@@ -342,8 +331,13 @@ func (a *AggMetric) Get(from, to uint32) (Result, error) {
 		}
 	}
 
+	if oldestChunk.First {

Collaborator:

It seems unnecessary that each chunk needs to have this First bool. Can you use the t0 and LastTs to be able to tell whether firstTs falls in this range? Something like a ContainsTimestamp function?
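
For illustration, a minimal sketch of the kind of helper being suggested here; the chunkSpan type and its T0/LastTs fields are assumptions based on this comment, not the actual metrictank chunk.Chunk API.

```go
package main

import "fmt"

// Sketch of the suggested range check; chunkSpan and its fields are
// assumptions based on this comment, not the real chunk.Chunk.
type chunkSpan struct {
	T0     uint32 // start of the chunk's time window
	LastTs uint32 // timestamp of the last point written to the chunk
}

// ContainsTimestamp reports whether ts falls inside the chunk's known
// range, which could replace a dedicated First flag when deciding
// whether the oldest in-memory chunk covers a given timestamp.
func (c chunkSpan) ContainsTimestamp(ts uint32) bool {
	return ts >= c.T0 && ts <= c.LastTs
}

func main() {
	c := chunkSpan{T0: 120, LastTs: 170}
	fmt.Println(c.ContainsTimestamp(130), c.ContainsTimestamp(110)) // true false
}
```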

Contributor Author:

that's a good idea. I had been thinking for a while about what the simplest / lowest-overhead way to do this would be (ideally introducing only 1 new attribute instead of 2).

But note that the First field should have no overhead: it's a bool, so it fits in space that was previously padding (as we already had a bool on chunk.Chunk), so I still like it.
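
For context on the padding point, a small self-contained example: adding a second bool next to an existing one lands in bytes the compiler was already using as padding, so the struct size does not change. The field layouts below are made up for illustration and are not the real chunk.Chunk.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Hypothetical layouts, not the real chunk.Chunk: they only illustrate
// how a second bool can fit into existing padding.
type oneBool struct {
	T0     uint32
	LastTs uint32
	Series []byte
	Closed bool
	// on 64-bit platforms the struct is padded out to a multiple of 8,
	// leaving unused bytes after Closed
}

type twoBools struct {
	T0     uint32
	LastTs uint32
	Series []byte
	Closed bool
	First  bool // occupies one of the former padding bytes
}

func main() {
	// both print the same size on typical 64-bit platforms
	fmt.Println(unsafe.Sizeof(oneBool{}), unsafe.Sizeof(twoBools{}))
}
```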

Collaborator:

It's less about the space and more about the branching of chunk types: from a Chunk's perspective it is just a Chunk and has no concept of Firstness. Like, if Chunk were just a slice, it would be weird for the First member to live with the slice, right?

Contributor Author:

funny, I also thought about this. But what I realized is that chunk.Chunk is already not a pure chunk; rather, it is a "chunk as embedded in an AggMetric in metrictank". What I mean is that it already has a bunch of fields for other uses within metrictank besides mere chunkness, so one more doesn't change anything IMHO.

that said, at some point I want to refactor this to clarify it.

Collaborator:

If it doesn't bother you, that's ok. But I'll point out that the other elements of the Chunk are actually properties relevant to a Chunk's internals. First will be the first/only property that is actually about state outside of the Chunk, and that's what makes it stand out as awkward to me.

Contributor Author:

makes sense. You're looking at it from a "field describes which state" perspective; I was looking at it from a "field is used for what / by what" perspective.
Using the former perspective generally leads to cleaner code, I think. And you're right, it looks a bit more awkward to me too now. But it's the most efficient solution, so until we refactor I think it's OK.

@shanson7 (Collaborator) left a comment:

Minor comments, but nothing objectionable enough to block.

@Dieterbe (Contributor Author):

thx for reviewing Sean!

@Dieterbe merged commit b6711ab into master on Aug 14, 2018
@Dieterbe deleted the read-from-first-chunk branch on September 18, 2018 09:09
@Dieterbe added this to the 0.10.0 milestone on Dec 14, 2018