Make TSDB idle-flush more dynamic by compacting only time range that cannot receive any samples. #3832

@pstibrany

Under some conditions, a tenant pushes data to Cortex such that all samples are rejected at the ingester level for being out of bounds. Normally such push requests get a 4xx error, but once in a while, when force-compaction of the TSDB is ongoing (because the TSDB is idle), those push requests get a 503 error instead (as implemented by #3422).

To avoid returning 503 during force-compaction, one possibility would be to use reject_old_samples_max_age to compute when the TSDB is idle. This is a limit in the distributor, which rejects old samples without even sending them to the ingester. The ingester could use the reject_old_samples_max_age value to determine the time range to compact when flushing an idle TSDB.

For example, if reject_old_samples_max_age is set to 1h and the block range is 2h, then idle-flush could compact the head up to time "floor((now - 1h) / 2h) * 2h". With time ranges [4-6) [6-8) [8-10), at time 6:30 we cannot yet idle-flush block [4-6), since samples can still be sent to it (the bound evaluates to 4:00). However, at time 7:10 no samples can be appended to block [4-6) anymore because of the 1h reject_old_samples_max_age (the bound evaluates to 6:00), so it can be flushed.

All this applies only when the TSDB is flushed because it is idle. Under normal conditions, when samples are appended all the time, regular compaction handles this automatically. Also, a flush requested by the user should flush ALL samples, regardless of reject_old_samples_max_age.

The default value for reject_old_samples_max_age is 14 days. To activate the described behavior, reject_old_samples_max_age needs to be much smaller (e.g. 1h, which works better with the blocks engine). It may be a good idea to put this behind a feature flag, or to use a value separate from reject_old_samples_max_age in the first place, without the option of a per-tenant override.
