Commit e10ec2e

Long-term store: Add "Tools" section, bundling data retention utilities
- Airflow-based data retention
- CTK-based data retention
1 parent df149bb commit e10ec2e

1 file changed (+40 -7 lines changed)

docs/solution/longterm/index.md

Lines changed: 40 additions & 7 deletions
@@ -73,6 +73,43 @@ Set up CrateDB as a long-term observability backend for OpenTelemetry.

 ::::

+## Tools
+
+### Automatic retention and expiration
+
+When operating a system storing and processing large amounts of data,
+it is crucial to manage data flows and life-cycles well, which includes
+handling concerns of data expiry, size reduction, and archival.
+
+Optimally, corresponding tasks are automated rather than manually
+performed. CrateDB provides relevant integrations and standalone
+applications for automatic data retention purposes.
+
+:::{rubric} Apache Airflow
+:::
+
+{ref}`Build a hot/cold storage data retention policy <airflow-data-retention-hot-cold>`
+describes how to manage aging data by leveraging CrateDB cluster
+features to mix nodes with different hardware setups, i.e. hot
+nodes using the latest generation of NVMe drives for responding
+to analytics queries quickly, and cold nodes that have access to
+cheap mass storage for retaining historic data.
+
+:::{rubric} CrateDB Toolkit
+:::
+
+[CrateDB Toolkit Retention and Expiration] is a data retention and
+expiration policy management system for CrateDB, providing multiple
+retention strategies.
+
+:::{note}
+The system derives its concepts from [InfluxDB data retention] ideas and
+from the {ref}`Airflow-based data retention tasks for CrateDB <airflow-data-retention-policy>`,
+but aims to be usable as a standalone system in different software environments.
+Effectively, it is a Python library and CLI around a policy management
+table defined per [retention-policy-ddl.sql].
+:::
+
 ## Related sections

 {ref}`metrics-store` includes information about how to
@@ -88,16 +125,12 @@ illustrates how to reduce table storage size by 80%,
 by using arrays for time-based bucketing, a historical table having
 a dedicated layout, and querying using the UNNEST table function.

-{ref}`Build a hot/cold storage data retention policy <airflow-data-retention-hot-cold>`
-describes how to manage aging data by leveraging CrateDB cluster
-features to mix nodes with different hardware setups, i.e. hot
-nodes using the latest generation of NVMe drives for responding
-to analytics queries quickly, and cold nodes that have access to
-cheap mass storage for retaining historic data.
-
 {ref}`weather-data-storage` provides information about how to
 use CrateDB for mass storage of synoptic weather observations,
 allowing you to query them efficiently.


+[CrateDB Toolkit Retention and Expiration]: https://cratedb-toolkit.readthedocs.io/retention.html
+[InfluxDB data retention]: https://docs.influxdata.com/influxdb/v1/guides/downsample_and_retain/
 [Optimizing storage efficiency for historic time series data]: https://community.cratedb.com/t/optimizing-storage-for-historic-time-series-data/762
+[retention-policy-ddl.sql]: https://github.com/crate/cratedb-toolkit/blob/main/cratedb_toolkit/retention/setup/schema.sql
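
For readers unfamiliar with the idea behind the new "Automatic retention and expiration" section, the following is a minimal sketch of a delete-style retention step. It is not the CrateDB Toolkit API and not the Airflow DAG referenced above. The helper names `expired_partitions` and `delete_statement`, the table name `metrics`, and the partition column `day` are hypothetical and chosen for illustration only. The sketch assumes a table partitioned by a date column and only builds the statements; executing them against a cluster is left out.

```python
from datetime import date, timedelta


def expired_partitions(partitions, retention_days, today):
    """Return the partition values that are older than the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    return [p for p in partitions if p < cutoff]


def delete_statement(table, partition_column):
    """Build a parameterized DELETE for a single partition value.

    Table and column names (hypothetical here) are emitted as quoted
    identifiers; the partition value is passed separately as a `?` parameter.
    """
    return f'DELETE FROM "{table}" WHERE "{partition_column}" = ?'


# Hypothetical partition values of a table partitioned by day.
partitions = [date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1)]
expired = expired_partitions(partitions, retention_days=30, today=date(2024, 3, 15))

# One (statement, parameters) pair per expired partition.
statements = [(delete_statement("metrics", "day"), (p,)) for p in expired]
```

Keeping the cutoff computation separate from statement generation makes the policy logic testable without a database connection. Other strategies, such as relocating aging partitions to cold nodes, would replace only the statement builder.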
