Streaming collection of hypertable invalidations for continuous aggregates

## Background

When using WAL-based invalidation collection, we are collecting invalidations by reading the hypertable changes from the WAL and computing buckets for all continuous aggregates associated with a hypertable. These are currently collected in an in-memory RANGES structure, and when hitting the limit `cagg_processing_high_work_mem` we will start flushing the existing ranges until we reach `cagg_processing_low_work_mem`. This ensures that we cannot exceed the allocated working memory for the processor. 

The hypertable changes are read on refresh of a continuous aggregate, on an explicit call of `process_hypertable_invalidations`, or on execution of the policy added using `add_process_hypertable_invalidations_policy`.

## Problem

Since `max_slot_wal_keep_size` is set to 5 GiB in TimescaleDB cloud, it means that we can run out of WAL and have it truncated under our feet. Since we cannot read these records, it is impossible to decide what is invalid and we have to assume the worst and invalidate the entire range, which would trigger a "full refresh" (the entire refresh window of all continuous aggregate policies will be refreshed next time).

## Solution outline

To deal with this, we can continuously read the WAL and collect all new changes to in-memory RANGES structure and flush and advance the WAL at regular intervals. Because we are reading the entries as they are written, we cannot miss any, but we need to make sure that we only advance the slot once we have persisted the state to the materialization table.

Add new GUC `cagg_processing_high_wal_keep_size` with a default of (say) `0.8 * max_slot_wal_keep_size`. The default value is arbitrarily picked but it should be selected so that we cannot exceed `max_slot_wal_keep_size` while flushing the RANGES structure.

1. Read a row from the WAL
2. Extract partition hypertable and partition column value from row
3. For each continuous aggregate associated with hypertable
    a. Compute bucket range for partition column value
    b. Add range to RANGES
    c. If `cagg_processing_high_work_mem` or `cagg_processing_high_wal_keep_size` is exceeded, flush RANGES to materialization range and advance slot to last read position.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Streaming collection of hypertable invalidations for continuous aggregates #8570

Background

Problem

Solution outline

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Streaming collection of hypertable invalidations for continuous aggregates #8570

Description

Background

Problem

Solution outline

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions