This repository was archived by the owner on Aug 23, 2023. It is now read-only.

calculate TTL relative to now when inserting into cassandra #1448

Merged
merged 5 commits on Aug 30, 2019

Conversation

@replay (Contributor) commented Aug 29, 2019

Calculate the TTL relative to now before inserting into Cassandra. If the resulting TTL is <= 0, we simply skip the insert.
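
Roughly, the idea is this (a minimal sketch with hypothetical names, not the exact diff):

// t0: chunk start, chunkSpan: its span, ttl: configured TTL (all in seconds).
// Returns the TTL to use for the insert and whether the insert is still worthwhile.
func relativeTTL(t0, chunkSpan, ttl uint32, now int64) (int64, bool) {
	// the chunk is needed until just past its last datapoint (t0 + chunkSpan) plus ttl
	remaining := int64(t0) + int64(chunkSpan) + int64(ttl) - now
	if remaining <= 0 {
		return 0, false // already expired, skip the insert
	}
	return remaining, true // bind this to the `USING TTL ?` placeholder of the insert query
}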

@replay (Contributor, Author) commented Aug 29, 2019

I have tested the latest version like this:

  • Spun up a Metrictank instance with an empty Cassandra cluster and this schema:
    1s:6h:2min:2,1min:2d:6h:1
  • Fed it with fakemetrics like this:
    fakemetrics backfill --offset 72h --speedup 10000 --kafka-mdm-addr kafka:9092 --mpo 1
    note that this offset goes further back in time than the longest retention in the storage schema (2d)
  • Selected the data from Cassandra:
cqlsh:metrictank> select key, ts, TTL(data) from metric_32;

 key                                           | ts         | ttl(data)
-----------------------------------------------+------------+-----------
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1567080000 |    157484
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1567058400 |    135884
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1567036800 |    114284
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1567015200 |     92684
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1566993600 |     71084
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1566972000 |     49484
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1566950400 |     27884
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1566928800 |      6284
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1567080000 |    157484
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1567058400 |    135884
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1567036800 |    114284
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1567015200 |     92684
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1566993600 |     71084
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1566972000 |     49484
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1566950400 |     27884
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1566928800 |      6284
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1567080000 |    157484
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1567058400 |    135884
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1567036800 |    114284
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1567015200 |     92684
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1566993600 |     71084
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1566972000 |     49484
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1566950400 |     27884
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1566928800 |      6284
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1567080000 |    157484
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1567058400 |    135884
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1567036800 |    114284
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1567015200 |     92684
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1566993600 |     71084
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1566972000 |     49484
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1566950400 |     27884
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1566928800 |      6284

(32 rows)

There are two things to note in this Cassandra output:

  1. The total number of chunks is 32. If all the data fed by fakemetrics had been stored, there would be 72h (offset) / 6h (chunkspan) * 4 (number of aggregates) = 48 chunks. But because the generated TTL was <= 0 for 16 of them, their inserts were skipped (see the quick check after this list).
  2. The TTLs shown in the last column make sense, considering that they should:
  • All be in the range 0 - 48 * 3600 (172800)
  • Each aggregate (min/max/cnt/sum) should have exactly one chunk with each unique TTL value.
    For example:
    1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 has one with 157484
    1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 also has one with 157484
  • The differences between consecutive TTLs of each aggregate should be one chunkspan, 6 * 3600 = 21600:
    157484 - 135884 = 21600
    135884 - 114284 = 21600
    114284 - 92684 = 21600
    ...
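
For reference, here is a back-of-envelope check of these counts (plain Go, using only the numbers from this test):

package main

import (
	"fmt"
	"time"
)

func main() {
	offset := 72 * time.Hour    // how far back fakemetrics backfilled
	chunkSpan := 6 * time.Hour  // chunkspan of the 1min rollup archive
	retention := 48 * time.Hour // its 2d retention
	aggs := 4                   // min/max/cnt/sum

	generated := int(offset/chunkSpan) * aggs // 48 chunks produced
	kept := int(retention/chunkSpan) * aggs   // 32 chunks whose TTL is still > 0
	fmt.Println(generated, kept, generated-kept) // prints: 48 32 16
}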

@replay replay requested a review from woodsaj August 29, 2019 22:28
@replay replay changed the title from "[WIP] calculate timestamp when inserting into cassandra" to "calculate timestamp when inserting into cassandra" Aug 29, 2019
@replay replay changed the title from "calculate timestamp when inserting into cassandra" to "calculate TTL relative to now when inserting into cassandra" Aug 29, 2019
// - the timestamp of the last datapoint plus ttl is the time until which we want to keep this chunk
// - then we subtract the current timestamp to get the difference relative to now
// - the result is the TTL in seconds relative to now
relativeTtl := int64(t0+mdata.MaxChunkSpan()+ttl) - time.Now().Unix()

Member commented:

My concern with using MaxChunkSpan is that it makes the effective minimum TTL of any data equal to MaxChunkSpan. E.g. if you want raw data stored for 1h, but also have rollups being stored for longer and those rollups use a chunkSpan of 6h, then the raw data will be stored for a lot longer than intended.

This is not a concern for any of our use cases, but it might cause problems for other users and perhaps for end-to-end tests.
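
To make that concrete (hypothetical numbers): with a raw ttl of 1h, a raw chunkspan of 10min and a rollup chunkSpan (= MaxChunkSpan) of 6h, a raw chunk that has just closed (now ≈ t0 + 10min) gets

relativeTtl = t0 + 6h + 1h - now ≈ 6h50min

instead of the intended ~1h, because MaxChunkSpan dominates the calculation.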

@replay (Contributor, Author) commented Aug 30, 2019

That's a valid concern, but I'm not sure how best to address it. There are basically two possibilities:

  1. We read the chunk's data byte slice to determine the chunk span. This should only require reading the first 2 bytes; it can be done similarly to https://github.com/grafana/metrictank/blob/master/mdata/chunk/itergen.go#L76
  2. We add a span property to the chunk write request's payload, but this would require updating all locations that generate chunk write requests.

I'm leaning towards 1)
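
Roughly, 1) could look like this (a sketch only; the helper name is hypothetical, and it assumes the layout used by mdata/chunk, where the payload starts with a format byte followed by a span code that indexes chunk.ChunkSpans):

import (
	"fmt"

	"github.com/grafana/metrictank/mdata/chunk"
)

// extractChunkSpan reads the span from the first two bytes of a serialized chunk.
func extractChunkSpan(data []byte) (uint32, error) {
	if len(data) < 2 {
		return 0, fmt.Errorf("chunk too short: %d bytes", len(data))
	}
	if chunk.Format(data[0]) != chunk.FormatStandardGoTszWithSpan {
		return 0, fmt.Errorf("chunk format %d does not encode a span", data[0])
	}
	code := chunk.SpanCode(data[1])
	if int(code) >= len(chunk.ChunkSpans) {
		return 0, fmt.Errorf("invalid span code %d", code)
	}
	return chunk.ChunkSpans[code], nil
}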

@replay (Contributor, Author) commented Aug 30, 2019

I've given 1) a try

@replay (Contributor, Author) commented Aug 30, 2019

I ran another test with the latest version. I started MT with an empty Cassandra instance and this schema:

1s:1h:10min:1,1min:48h:6h:1

Then I fed it with 72h of data for one metric. It created two tables, metric_1 and metric_32, for the two different TTLs. All the TTLs in these two tables look like what we want; in each table they are nicely aligned to the chunkspan:

cqlsh:metrictank> select key, ts, TTL(data) from metric_1;

 key                                    | ts         | ttl(data)
----------------------------------------+------------+-----------
 1.d0a8110e69b0d874610aa08ab6740dfa_647 | 1567179600 |      3815
 1.d0a8110e69b0d874610aa08ab6740dfa_647 | 1567179000 |      3215
 1.d0a8110e69b0d874610aa08ab6740dfa_647 | 1567178400 |      2615
 1.d0a8110e69b0d874610aa08ab6740dfa_647 | 1567177800 |      2015
 1.d0a8110e69b0d874610aa08ab6740dfa_647 | 1567177200 |      1415
 1.d0a8110e69b0d874610aa08ab6740dfa_647 | 1567176600 |       815
 1.d0a8110e69b0d874610aa08ab6740dfa_647 | 1567176000 |       215

(7 rows)
cqlsh:metrictank> select key, ts, TTL(data) from metric_32;

 key                                           | ts         | ttl(data)
-----------------------------------------------+------------+-----------
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1567144800 |    159128
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1567123200 |    137528
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1567101600 |    115928
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1567080000 |     94328
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1567058400 |     72728
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1567036800 |     51128
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1567015200 |     29528
 1.d0a8110e69b0d874610aa08ab6740dfa_sum_60_647 | 1566993600 |      7928
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1567144800 |    159128
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1567123200 |    137528
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1567101600 |    115928
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1567080000 |     94328
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1567058400 |     72728
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1567036800 |     51128
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1567015200 |     29528
 1.d0a8110e69b0d874610aa08ab6740dfa_min_60_647 | 1566993600 |      7928
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1567144800 |    159128
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1567123200 |    137528
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1567101600 |    115928
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1567080000 |     94328
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1567058400 |     72728
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1567036800 |     51128
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1567015200 |     29528
 1.d0a8110e69b0d874610aa08ab6740dfa_cnt_60_647 | 1566993600 |      7928
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1567144800 |    159128
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1567123200 |    137528
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1567101600 |    115928
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1567080000 |     94328
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1567058400 |     72728
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1567036800 |     51128
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1567015200 |     29528
 1.d0a8110e69b0d874610aa08ab6740dfa_max_60_647 | 1566993600 |      7928

(32 rows)
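
Spot-checking the numbers above against the schema:

metric_1:  7 rows; TTL spacing 3815 - 3215 = 600 (the 10min raw chunkspan); all TTLs <= 1h + 10min = 4200
metric_32: TTL spacing 159128 - 137528 = 21600 (the 6h rollup chunkspan); all TTLs <= 48h + 6h = 194400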

@woodsaj (Member) left a comment

LGTM, other than maybe renaming SpanOfChunk to ExtractChunkSpan.
