When ingesting data from the future we use a TTL that's higher than the defined TTL #1560
Comments
I am not sure that we should make this change. E.g. let's say we have a retention policy of 10s:7day,1m:30d and a user requests data for the last 7 days. Even if the data for this series was ingested with a timestamp that was 2 days into the future, the query engine still expects the 10s resolution data to be available up to now-7d.
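To illustrate the constraint, here is a simplified sketch (not metrictank's actual archive-selection code, names are made up): the query engine picks the finest rollup whose retention still covers the requested range, and it can only reason this way if TTLs are relative to the datapoint timestamps.

```go
package query

// Archive describes one retention rule, e.g. 10s:7d or 1m:30d.
type Archive struct {
	Interval uint32 // seconds per datapoint
	TTL      uint32 // retention in seconds
}

// pickArchive returns the finest archive whose retention is expected to
// still cover the requested "from" timestamp. Archives are assumed to be
// ordered from finest to coarsest resolution, and TTLs are assumed to be
// relative to the datapoint timestamps.
func pickArchive(archives []Archive, from, now uint32) Archive {
	for _, a := range archives {
		if from >= now-a.TTL {
			return a
		}
	}
	// nothing covers the full range: fall back to the coarsest archive
	return archives[len(archives)-1]
}
```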
The issue is with Cassandra and TWCS (which is the recommended compaction strategy for this type of data). Any datapoints written at the same time to the same table get compacted together. The SSTables can only get cleaned up when all of their data is tombstoned (either by delete or TTL). Writing future data with longer TTLs will prevent deletion from disk, possibly for far longer than the intended TTL. As a concrete example, say we are writing 10TB/day of data with an intended TTL of 7 days. If even a tiny fraction of that data is written with a timestamp, for example, 1 year in the future, then the inflated TTL means those SSTables will linger for a year and use ~52 times the storage (365 days / 7 days ≈ 52). Because metrictank sets the window length for TWCS and the TTL range per table, I think it needs to play by those rules.
Changing the TTL to be anything other than relative to the datapoint timestamps is not an option, due to the requirement of the query engine to know which rollups have data for the requested query time ranges. To protect against the issue of datapoints with future timestamps breaking TWCS, metrictank should have a config option to simply reject datapoints that have a timestamp too far beyond NOW(). TWCS only works when datapoints are written with timestamps that are close to NOW(). If users need to support workloads that do something different, then a different compaction strategy needs to be used.
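A minimal sketch of what such a rejection check could boil down to at ingest time (the maxFutureOffset knob and function name are hypothetical, not existing metrictank config or code):

```go
package ingest

import "time"

// maxFutureOffset is a hypothetical config option: how far beyond NOW()
// a datapoint timestamp may be before it gets rejected outright.
var maxFutureOffset = 1 * time.Hour

// acceptTimestamp reports whether a datapoint timestamp is close enough
// to the current time to be safe for a TWCS-compacted table.
func acceptTimestamp(ts uint32, now time.Time) bool {
	return int64(ts) <= now.Add(maxFutureOffset).Unix()
}
```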
Except this is how it worked for a very long time. This change actively endangered a large production setup, and possibly others.
Is the limit based on the retention of the destination or some statically configured number? If a table is a 10 year table, it would be less sensitive to data 30 days in the future than a 1 day retention table.
Metrictank creates the table based on a schema skeleton. Metrictank is what passes the window size, with the assumption that the compaction strategy is TWCS. Basically, MT doesn't really support other compaction strategies for Cassandra. At the end of the day, all I want is protection from this issue. I'm ok with an option of rejecting data that would be too far in the future for the given retention policy.
One problem with a future-tolerance-limit which is defined per storage schema is that it will effectively be applied per table. If two schemas have an aggregation with the same TTL, then the deletion of the SSTables may be delayed by the larger future-tolerance-limit of the two schemas (e.g. if one schema allows 1 day of future data and the other 7 days, and both have a rollup stored in the same 30d table, then SSTable deletion in that table can be delayed by up to 7 days for both). This is confusing to users, because the ability to define the future tolerance per schema looks as if it were freely configurable, while actually it isn't.
Rejecting future data makes me sad. I would rather not suddenly cut out a very valid use case (forecasting) to address a concern about suboptimal compaction. What if series that send future data go into separate tables? This raises two questions:
- How does the query engine know, for any given series, which table to query? There are two solutions:
- How well does Cassandra handle the increase in the number of tables? I imagine this is probably not a big deal.
The idea of using a distinct table for future data, accepting that compaction on that one will be inefficient, is interesting. It appears to get very complicated though, especially since we'll probably have to be able to deal with both cases. The 2nd suggestion sounds good to me, to just allow the user to assign specific patterns to separate tables via the storage-schemas config.
I agree, but really this was only recently supported in the first place (#1448), and it caused non-deterministic TTL behavior. A solution that still partially supports future data and still allows controllable retention is at least a good mid-step. As an admin of a large mixed-use cluster, it's very difficult to control who is going to insert forecasted data into the system, and when.
I do want to address the compaction problems, of course! All I am saying is, I don't want it to be at the cost of forecasting. Perhaps we can merge something like @replay's proposed solution now as a stop-gap. But:
For 2, it may be enough to just have a mode that doesn't enforce the rule yet, but increments a metric whenever we would have dropped the data if enforcement had been active. Based on that we can see what the impact would be before we enforce it (@replay's idea). I think I like it.
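A rough sketch of such a dry-run mode (the flag, tolerance value and counter are hypothetical and only for illustration; in metrictank the counter would be a stats metric):

```go
package ingest

import (
	"sync/atomic"
	"time"
)

// Hypothetical knobs: enforcement can stay off while we measure the impact.
var (
	enforceFutureTolerance = false
	futureTolerance        = 1 * time.Hour
	// count of datapoints that exceeded the tolerance and would be dropped
	// once enforcement is switched on.
	droppableFutureDataPoints uint64
)

// admit reports whether a datapoint should be kept. In dry-run mode
// (enforcement off) every datapoint is kept, but the counter still records
// how many would have been dropped.
func admit(ts uint32, now time.Time) bool {
	if int64(ts) > now.Add(futureTolerance).Unix() {
		atomic.AddUint64(&droppableFutureDataPoints, 1)
		if enforceFutureTolerance {
			return false
		}
	}
	return true
}
```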
I like the idea of users being able to specify a table name for specific retention policies. However, there is no direct relationship between storage-schemas and the backend store where data is persisted; i.e. the "table name" that data is stored in is only specific to the Cassandra store plugin. The Bigtable store plugin does have the concept of a table_name, but it means something very different. So we would need to handle this in a more generic way, e.g. with a 'storeConfig' field in the storage-schemas config that can be passed to the backend store plugins directly, e.g. something like:
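The example that followed is not preserved here; a hypothetical sketch of what such an entry in storage-schemas.conf could look like (only pattern and retentions are existing fields, the storeConfig key and its value are made up):

```ini
[forecasted]
pattern = ^forecast\..*
retentions = 1m:30d
# hypothetical field, opaque to metrictank core and passed to the store plugin as-is
storeConfig = table=metric_forecast_30d
```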
There is a lot of complexity involved in implementing this though. IMHO forecasted data is such a niche use case that we really don't need to invest any additional effort. With the proposed future-tolerance-ratio, e.g. if you want to allow datapoints that have timestamps 14 days into the future, and the future-tolerance-ratio is set to 10, then the retention policy for the forecasted data needs to be:
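The concrete retention string was not preserved; assuming the future-tolerance-ratio is interpreted as a percentage of the raw TTL, allowing 14 days of future data at a ratio of 10 requires a raw TTL of at least 140 days, e.g. something like:

```ini
[forecasted]
pattern = ^forecast\..*
# 10% of 140d = 14d of allowed future tolerance
retentions = 1h:140d
```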
The numbers on our cloud platform confirm this. I have looked at the timestamp deltas for all of GrafanaCloud Graphite, and most instances have deltas of up to 30/60s, a couple are around 8/10 minutes, and some large customers in us-east send 1h~3h ahead, to varying degrees of consistency. So possibly bugs or poorly configured clocks on their side.
When ingesting data from the future, the effective TTL with which the data gets inserted into Cassandra is `ttl + (timestamp - now)`. This is due to the changes in #1448. This means that for data from the future, expiration based on the defined `storage-schemas.conf` is broken. E.g. if a datapoint has a timestamp 5 days in the future and gets inserted into a table with a 30d TTL, then it effectively gets a TTL of 35d. This will delay the removal of the corresponding SSTable by an additional 5 days.

We should limit the calculated `relativeTTL` (https://github.com/grafana/metrictank/blob/master/store/cassandra/cassandra.go#L368), which was introduced in #1448, to the defined table TTL (`if relativeTTL > ttl { relativeTTL = ttl }`). Thx for reporting this issue @shanson7
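A minimal sketch of the proposed clamp (variable names follow the issue text, not the actual code in cassandra.go):

```go
package cassandra

// computeRelativeTTL returns the TTL to use when writing a chunk to
// Cassandra. It is relative to the datapoint timestamp (as introduced in
// #1448), but clamped to the table's configured TTL so that datapoints
// from the future cannot keep an SSTable alive beyond that TTL.
func computeRelativeTTL(ts, now int64, ttl uint32) uint32 {
	relativeTTL := int64(ttl) - (now - ts)
	if relativeTTL <= 0 {
		// the data is already past its expiry; nothing useful to write
		return 0
	}
	if relativeTTL > int64(ttl) {
		// future timestamp: cap at the table TTL
		relativeTTL = int64(ttl)
	}
	return uint32(relativeTTL)
}
```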