@@ -73,6 +73,43 @@ Set up CrateDB as a long-term observability backend for OpenTelemetry.
7373
7474::::
7575
76+ ## Tools
77+
78+ ### Automatic retention and expiration
79+
80+ When operating a system that stores and processes large amounts of data,
81+ it is crucial to manage data flows and life cycles well, which includes
82+ handling data expiry, size reduction, and archival.
83+
84+ Ideally, these tasks are automated rather than performed manually.
85+ CrateDB provides integrations and standalone applications for
86+ automatic data retention.
87+
88+ :::{rubric} Apache Airflow
89+ :::
90+
91+ {ref}`Build a hot/cold storage data retention policy <airflow-data-retention-hot-cold>`
92+ describes how to manage aging data by leveraging CrateDB cluster
93+ features to mix nodes with different hardware setups, i.e. hot
94+ nodes using the latest generation of NVMe drives for responding
95+ to analytics queries quickly, and cold nodes that have access to
96+ cheap mass storage for retaining historic data.
97+
98+ :::{rubric} CrateDB Toolkit
99+ :::
100+
101+ [CrateDB Toolkit Retention and Expiration] is a data retention and
102+ expiration policy management system for CrateDB, providing multiple
103+ retention strategies.
104+
105+ :::{note}
106+ The system derives its concepts from [InfluxDB data retention] and
107+ from the {ref}`Airflow-based data retention tasks for CrateDB <airflow-data-retention-policy>`,
108+ but aims to be usable as a standalone system in different software environments.
109+ Effectively, it is a Python library and CLI around a policy management
110+ table defined in [retention-policy-ddl.sql].
111+ :::
112+
76113## Related sections
77114
78115{ref}`metrics-store` includes information about how to
@@ -88,16 +125,12 @@ illustrates how to reduce table storage size by 80%,
88125by using arrays for time-based bucketing, a historical table having
89126a dedicated layout, and querying using the UNNEST table function.
90127
91- {ref}`Build a hot/cold storage data retention policy <airflow-data-retention-hot-cold>`
92- describes how to manage aging data by leveraging CrateDB cluster
93- features to mix nodes with different hardware setups, i.e. hot
94- nodes using the latest generation of NVMe drives for responding
95- to analytics queries quickly, and cold nodes that have access to
96- cheap mass storage for retaining historic data.
97-
98128{ref}`weather-data-storage` provides information about how to
99129use CrateDB for mass storage of synoptic weather observations,
100130allowing you to query them efficiently.
101131
102132
133+ [CrateDB Toolkit Retention and Expiration]: https://cratedb-toolkit.readthedocs.io/retention.html
134+ [InfluxDB data retention]: https://docs.influxdata.com/influxdb/v1/guides/downsample_and_retain/
103135[Optimizing storage efficiency for historic time series data]: https://community.cratedb.com/t/optimizing-storage-for-historic-time-series-data/762
136+ [retention-policy-ddl.sql]: https://github.com/crate/cratedb-toolkit/blob/main/cratedb_toolkit/retention/setup/schema.sql