diff --git a/api/_template.md b/api/_template.md index efc55431ffbc..58f71a94f3b1 100644 --- a/api/_template.md +++ b/api/_template.md @@ -11,7 +11,7 @@ Use a highlight for any important information. Choose `note`, `important`, or `w --> -## Required Arguments +## Required arguments |Name|Type|Description| |-|-|-| @@ -19,7 +19,7 @@ Use a highlight for any important information. Choose `note`, `important`, or `w -### Optional Arguments +### Optional arguments |Name|Type|Description| |-|-|-| @@ -35,7 +35,7 @@ Use a highlight for any important information. Choose `note`, `important`, or `w -## Sample Usage +## Sample usage ``` sql diff --git a/api/approx_count_distincts.md b/api/approx_count_distincts.md new file mode 100644 index 000000000000..d924b25bee06 --- /dev/null +++ b/api/approx_count_distincts.md @@ -0,0 +1,20 @@ +# Approximate count distincts +This section includes functions related to approximating distinct counts. +Approximate count distincts are used to find the number of unique values, or +cardinality, in a large dataset. For more information about approximate count +distinct functions, see the +[hyperfunctions documentation][hyperfunctions-approx-count-distincts]. + +Some hyperfunctions are included in the default TimescaleDB product. For +additional hyperfunctions, you need to install the +[Timescale Toolkit][install-toolkit] PostgreSQL extension. + +|Hyperfunction family|Types|API Calls|Included by default|Toolkit required| +|-|-|-|-|-| +|Approximate count distincts|Hyperloglog|[`hyperloglog`](hyperfunctions/approx_count_distincts/hyperloglog/)|❌|✅| +|||[`rollup`](hyperfunctions/approx_count_distincts/rollup-hyperloglog/)|❌|✅| +|||[`distinct_count`](hyperfunctions/approx_count_distincts/distinct_count/)|❌|✅| +|||[`stderror`](hyperfunctions/approx_count_distincts/stderror/)|❌|✅| + +[hyperfunctions-approx-count-distincts]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/approx-count-distincts/ +[install-toolkit]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/install-toolkit diff --git a/api/approx_percentile.md b/api/approx_percentile.md index 7000bec0100a..f1ce5265a103 100644 --- a/api/approx_percentile.md +++ b/api/approx_percentile.md @@ -9,6 +9,9 @@ approx_percentile( ) RETURNS DOUBLE PRECISION ``` +For more information about percentile approximation functions, see the +[hyperfunctions documentation][hyperfunctions-percentile-approx]. + ## Required arguments |Name|Type|Description| @@ -34,3 +37,6 @@ approx_percentile ------------------- 0.999 ``` + + +[hyperfunctions-percentile-approx]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/ diff --git a/api/approx_percentile_rank.md b/api/approx_percentile_rank.md index 255ed7aaa1ca..ab891c0d5ff8 100644 --- a/api/approx_percentile_rank.md +++ b/api/approx_percentile_rank.md @@ -1,6 +1,5 @@ -## approx_percentile_rank() Toolkit - -Estimate what percentile a given value would be located at in a UddSketch. +# approx_percentile_rank() Toolkit +Estimate what percentile a given value would be located at in a `UddSketch`. ```SQL approx_percentile_rank( @@ -9,20 +8,25 @@ approx_percentile_rank( ) RETURNS UddSketch ``` -### Required arguments +* For more information about percentile approximation algorithms, see + [advanced aggregation methods][advanced-agg]. +* For more information about percentile approximation functions, see the + [hyperfunctions documentation][hyperfunctions-percentile-approx]. 
+
+## Required arguments
 
 |Name|Type|Description|
 |---|---|---|
 |`value`|`DOUBLE PRECISION`|The value to estimate the percentile of|
-|`sketch`|`UddSketch`|The sketch to compute the percentile on.
+|`sketch`|`UddSketch`|The sketch to compute the percentile on|
 
-### Returns
+## Returns
 
 |Column|Type|Description|
 |---|---|---|
 |`approx_percentile_rank`|`DOUBLE PRECISION`|The estimated percentile associated with the provided value|
 
-### Sample usage
+## Sample usage
 
 ```SQL
 SELECT
@@ -34,3 +38,7 @@ FROM generate_series(0, 100) data;
 ----------------------------
        0.9851485148514851
 ```
+
+
+[hyperfunctions-percentile-approx]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/
+[advanced-agg]: /timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/advanced-agg/
diff --git a/api/average.md b/api/average.md
index 54e22b09bc45..e31e657fa5b3 100644
--- a/api/average.md
+++ b/api/average.md
@@ -1,4 +1,4 @@
-## average() Toolkit
+# average() Toolkit
 
 ```SQL
 average(
@@ -8,16 +8,21 @@ average(
 
 A function to compute a time weighted average from a `TimeWeightSummary`.
 
+* For more information about time-weighted average functions, see the
+  [hyperfunctions documentation][hyperfunctions-time-weight-average].
+* For more information about statistical aggregate functions, see the
+  [hyperfunctions documentation][hyperfunctions-stats-agg].
+
 ### Required arguments
 
 |Name|Type|Description|
-|---|---|---|
+|-|-|-|
 |`tws`|`TimeWeightSummary`|The input TimeWeightSummary from a [`time_weight`](/hyperfunctions/time-weighted-averages/time_weight/) call|
 
 ### Returns
 
 |Column|Type|Description|
-|---|---|---|
+|-|-|-|
 |`average`|`DOUBLE PRECISION`|The time weighted average computed from the `TimeWeightSummary`|
 
 ### Sample usage
@@ -35,3 +40,6 @@ FROM (
 ) t
 ```
 
+
+[hyperfunctions-time-weight-average]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/time-weighted-averages/
+[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/
diff --git a/api/corr.md b/api/corr.md
new file mode 100644
index 000000000000..c7466bde4042
--- /dev/null
+++ b/api/corr.md
@@ -0,0 +1,48 @@
+# corr()
+The correlation coefficient of the least squares fit line of the adjusted
+counter values. Because an adjusted counter value can never decrease, the slope
+of the fit line must be non-negative, so this value is also always non-negative
+and in the range from 0.0 to 1.0. It measures how well the least squares fit
+line matches the available data, where a value of 1.0 represents the strongest
+correlation between time and the counter increasing.
+
+```sql
+corr(
+    summary CounterSummary
+) RETURNS DOUBLE PRECISION
+```
+
+For more information about counter aggregation functions, see the
+[hyperfunctions documentation][hyperfunctions-counter-agg]. 
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|summary|CounterSummary|The input CounterSummary from a counter_agg call|
+
+## Returns
+
+|Name|Type|Description|
+|-|-|-|
+|corr|DOUBLE PRECISION|The correlation coefficient computed from the least squares fit of the adjusted counter values input to the CounterSummary|
+
+## Sample usage
+
+```sql
+SELECT
+    id,
+    bucket,
+    corr(summary)
+FROM (
+    SELECT
+        id,
+        time_bucket('15 min'::interval, ts) AS bucket,
+        counter_agg(ts, val) AS summary
+    FROM foo
+    GROUP BY id, time_bucket('15 min'::interval, ts)
+) t
+```
+
+
+[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
diff --git a/api/counter_agg_point.md b/api/counter_agg_point.md
new file mode 100644
index 000000000000..2dd7008ee2e4
--- /dev/null
+++ b/api/counter_agg_point.md
@@ -0,0 +1,65 @@
+# counter_agg()
+An aggregate that produces a CounterSummary from timestamps and associated
+values.
+
+For more information about counter aggregation functions, see the
+[hyperfunctions documentation][hyperfunctions-counter-agg].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|ts|TIMESTAMPTZ|The time at each point|
+|value|DOUBLE PRECISION|The value at each point to use for the counter aggregate|
+
+The `value` argument is currently only accepted as a DOUBLE PRECISION number,
+because it is the most common type for counters, even though other numeric
+types, such as BIGINT, might sometimes be more intuitive. If you store a value
+as a different numeric type you can cast to DOUBLE PRECISION on input to the
+function.
+
+
+Note that both `ts` and `value` can be NULL, but the aggregate is not evaluated
+on NULL values. This means that if the aggregate receives a NULL value, it
+returns NULL; it does not return an error.
+
+
+### Optional arguments
+
+|Name|Type|Description|
+|-|-|-|
+|bounds|TSTZRANGE|A range of timestamptz|
+
+The `bounds` argument represents the largest and smallest possible times that
+could be input to this aggregate. Calling with NULL, or leaving out the
+argument, results in an unbounded `CounterSummary`. Bounds are required for
+extrapolation, but not for other accessor functions.
+
+## Returns
+
+|Column|Type|Description|
+|-|-|-|
+|counter_agg|CounterSummary|A CounterSummary object that can be passed to accessor functions or other objects in the counter aggregate API|
+
+
+
+## Sample usage
+This example produces a CounterSummary from timestamps and associated values.
+
+``` sql
+WITH t as (
+    SELECT
+        time_bucket('1 day'::interval, ts) as dt,
+        counter_agg(ts, val) AS cs -- get a CounterSummary
+    FROM foo
+    WHERE id = 'bar'
+    GROUP BY time_bucket('1 day'::interval, ts)
+)
+SELECT
+    dt,
+    irate_right(cs) -- extract instantaneous rate from the CounterSummary
+FROM t;
+```
+
+
+[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
diff --git a/api/counter_aggs.md b/api/counter_aggs.md
new file mode 100644
index 000000000000..7c2bc496449b
--- /dev/null
+++ b/api/counter_aggs.md
@@ -0,0 +1,33 @@
+# Counter aggregation
+This section contains functions related to counter aggregation. Counter
+aggregation functions are used to continue accumulating data while ignoring any
+interruptions or resets. For more information about counter aggregation
+functions, see the [hyperfunctions documentation][hyperfunctions-counter-agg].
+
+Some hyperfunctions are included in the default TimescaleDB product. 
For
+additional hyperfunctions, you need to install the
+[Timescale Toolkit][install-toolkit] PostgreSQL extension.
+
+|Hyperfunction family|Types|API Calls|Included by default|Toolkit required|
+|-|-|-|-|-|
+|Counter aggregation|Counter aggregates|[`counter_agg`](/hyperfunctions/counter_aggs/counter_agg_point/)|❌|✅|
+|||[`rollup`](/hyperfunctions/counter_aggs/rollup-counter/)|❌|✅|
+|||[`corr`](/hyperfunctions/counter_aggs/corr/)|❌|✅|
+|||[`counter_zero_time`](/hyperfunctions/counter_aggs/counter_zero_time/)|❌|✅|
+|||[`delta`](/hyperfunctions/counter_aggs/delta/)|❌|✅|
+|||[`extrapolated_delta`](/hyperfunctions/counter_aggs/extrapolated_delta/)|❌|✅|
+|||[`extrapolated_rate`](/hyperfunctions/counter_aggs/extrapolated_rate/)|❌|✅|
+|||[`idelta`](/hyperfunctions/counter_aggs/idelta/)|❌|✅|
+|||[`intercept`](/hyperfunctions/counter_aggs/intercept/)|❌|✅|
+|||[`irate`](/hyperfunctions/counter_aggs/irate/)|❌|✅|
+|||[`num_changes`](/hyperfunctions/counter_aggs/num_changes/)|❌|✅|
+|||[`num_elements`](/hyperfunctions/counter_aggs/num_elements/)|❌|✅|
+|||[`num_resets`](/hyperfunctions/counter_aggs/num_resets/)|❌|✅|
+|||[`rate`](/hyperfunctions/counter_aggs/rate/)|❌|✅|
+|||[`slope`](/hyperfunctions/counter_aggs/slope/)|❌|✅|
+|||[`time_delta`](/hyperfunctions/counter_aggs/time_delta/)|❌|✅|
+|||[`with_bounds`](/hyperfunctions/counter_aggs/with_bounds/)|❌|✅|
+
+
+[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
+[install-toolkit]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/install-toolkit
diff --git a/api/counter_zero_time.md b/api/counter_zero_time.md
new file mode 100644
index 000000000000..70297bfb1528
--- /dev/null
+++ b/api/counter_zero_time.md
@@ -0,0 +1,43 @@
+# counter_zero_time()
+The time at which the counter value is predicted to have been zero based on the
+least squares fit line computed from the points in the CounterSummary.
+
+```sql
+counter_zero_time(
+    summary CounterSummary
+) RETURNS TIMESTAMPTZ
+```
+
+For more information about counter aggregation functions, see the
+[hyperfunctions documentation][hyperfunctions-counter-agg].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|summary|CounterSummary|The input CounterSummary from a counter_agg call|
+
+## Returns
+
+|Name|Type|Description|
+|-|-|-|
+|counter_zero_time|TIMESTAMPTZ|The time at which the counter value is predicted to have been zero based on the least squares fit of the points input to the CounterSummary|
+
+## Sample usage
+
+```sql
+SELECT
+    id,
+    bucket,
+    counter_zero_time(summary)
+FROM (
+    SELECT
+        id,
+        time_bucket('15 min'::interval, ts) AS bucket,
+        counter_agg(ts, val) AS summary
+    FROM foo
+    GROUP BY id, time_bucket('15 min'::interval, ts)
+) t
+```
+
+[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
diff --git a/api/delta.md b/api/delta.md
new file mode 100644
index 000000000000..f426d50ac13c
--- /dev/null
+++ b/api/delta.md
@@ -0,0 +1,42 @@
+# delta()
+The change in the counter over the time period. This is the raw or simple delta
+computed by accounting for resets and subtracting the first seen value from the
+last.
+
+```sql
+delta(
+    summary CounterSummary
+) RETURNS DOUBLE PRECISION
+```
+
+For more information about counter aggregation functions, see the
+[hyperfunctions documentation][hyperfunctions-counter-agg]. 
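+
+As an illustration of reset handling, with hypothetical values: for the
+observed counter readings `10, 20, 5, 15`, the drop from `20` to `5` is treated
+as a reset, so the adjusted values are `10, 20, 25, 35` and the delta is
+`35 - 10 = 25`.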
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|summary|CounterSummary|The input CounterSummary from a counter_agg call|
+
+## Returns
+
+|Name|Type|Description|
+|-|-|-|
+|delta|DOUBLE PRECISION|The delta computed from the CounterSummary|
+
+## Sample usage
+
+```sql
+SELECT
+    id,
+    delta(summary)
+FROM (
+    SELECT
+        id,
+        counter_agg(ts, val) AS summary
+    FROM foo
+    GROUP BY id
+) t
+```
+
+[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
diff --git a/api/distinct_count.md b/api/distinct_count.md
new file mode 100644
index 000000000000..1b8c914c5337
--- /dev/null
+++ b/api/distinct_count.md
@@ -0,0 +1,34 @@
+# distinct_count() ToolkitExperimental
+The `distinct_count` function gets the number of distinct values from a
+hyperloglog.
+
+For more information about approximate count distinct functions, see the
+[hyperfunctions documentation][hyperfunctions-approx-count-distincts].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|hyperloglog|Hyperloglog|The hyperloglog to extract the count from.|
+
+## Returns
+
+|Column|Type|Description|
+|-|-|-|
+|distinct_count|BIGINT|The number of distinct elements counted by the hyperloglog.|
+
+## Sample usage
+This example retrieves the number of distinct values from a hyperloglog built
+over the input data:
+
+``` sql
+SELECT toolkit.distinct_count(toolkit.hyperloglog(64, data))
+FROM generate_series(1, 100) data
+
+ distinct_count
+----------------
+            114
+```
+
+
+[hyperfunctions-approx-count-distincts]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/approx-count-distincts/
diff --git a/api/error.md b/api/error.md
index efb9b7989945..f3e7bea213be 100644
--- a/api/error.md
+++ b/api/error.md
@@ -1,31 +1,34 @@
-## error() Toolkit
+# error() Toolkit
 
 ```SQL
 error(sketch UddSketch) RETURNS DOUBLE PRECISION
 ```
 
 This returns the maximum relative error that a percentile estimate will have
-(relative to the correct value). This means the actual value will fall in the
-range defined by `approx_percentile(sketch) +/- approx_percentile(sketch)*error(sketch)`.
+relative to the correct value. This means the actual value falls in the range
+defined by `approx_percentile(sketch) +/- approx_percentile(sketch)*error(sketch)`.
 
 This function can only be used on estimators produced by
-[`percentile_agg()`](/hyperfunctions/percentile-approximation/percentile_agg/) or
-[`uddsketch()`](/hyperfunctions/percentile-approximation/percentile-aggregation-methods/uddsketch/)
-calls.
+[`percentile_agg()`][percentile-agg] or [`uddsketch()`][uddsketch] calls.
 
-### Required arguments
+* For more information about percentile approximation algorithms, see
+  [advanced aggregation methods][advanced-agg].
+* For more information about percentile approximation functions, see the
+  [hyperfunctions documentation][hyperfunctions-percentile-approx]. 
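+
+Following the range formula above, as an illustrative sketch, you can combine
+`error` with `approx_percentile` to compute the bounds the true value falls in,
+here for the median of generated data:
+
+```SQL
+SELECT
+    approx_percentile(0.5, percentile_agg(data)) * (1 - error(percentile_agg(data))) AS low,
+    approx_percentile(0.5, percentile_agg(data)) * (1 + error(percentile_agg(data))) AS high
+FROM generate_series(0, 100) data;
+```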
+
+## Required arguments
 
 |Name|Type|Description|
-|---|---|---|
+|-|-|-|
 |`sketch`|`UddSketch`|The sketch to determine the error of, usually from a [`percentile_agg()`](/hyperfunctions/percentile-approximation/aggregation-methods/percentile_agg/) call|
 
-### Returns
+## Returns
 
 |Column|Type|Description|
-|---|---|---|
+|-|-|-|
 |`error`|`DOUBLE PRECISION`|The maximum relative error of any percentile estimate|
 
-### Sample usage
+## Sample usage
 
 ```SQL
 SELECT error(percentile_agg(data))
@@ -36,3 +39,9 @@ FROM generate_series(0, 100) data;
 -------
  0.001
 ```
+
+
+[hyperfunctions-percentile-approx]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/
+[uddsketch]: /hyperfunctions/percentile-approximation/percentile-aggregation-methods/uddsketch/
+[percentile-agg]: /hyperfunctions/percentile-approximation/percentile_agg/
+[advanced-agg]: /timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/advanced-agg/
diff --git a/api/extrapolated_delta.md b/api/extrapolated_delta.md
new file mode 100644
index 000000000000..e775e91f26b0
--- /dev/null
+++ b/api/extrapolated_delta.md
@@ -0,0 +1,59 @@
+# extrapolated_delta()
+The change in the counter during the time period specified by the bounds in the
+CounterSummary. To calculate the extrapolated delta, any counter resets are
+accounted for and the observed values are extrapolated to the bounds using the
+method specified, then the values are subtracted to compute the delta.
+
+The bounds must be specified for the `extrapolated_delta` function to work. The
+bounds can be provided in the `counter_agg` call, or set afterwards by using
+the `with_bounds` utility function.
+
+```sql
+extrapolated_delta(
+    summary CounterSummary,
+    method TEXT
+) RETURNS DOUBLE PRECISION
+```
+
+For more information about counter aggregation functions, see the
+[hyperfunctions documentation][hyperfunctions-counter-agg].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|summary|CounterSummary|The input CounterSummary from a counter_agg call|
+|method|TEXT|The extrapolation method to use. Not case-sensitive.|
+
+Currently, the only allowed value of `method` is `prometheus`, because extrapolation is only implemented following the Prometheus extrapolation protocol.
+
+## Returns
+
+|Name|Type|Description|
+|-|-|-|
+|extrapolated_delta|DOUBLE PRECISION|The delta computed from the CounterSummary|
+
+## Sample usage
+
+```sql
+SELECT
+    id,
+    bucket,
+    extrapolated_delta(
+        with_bounds(
+            summary,
+            time_bucket_range('15 min'::interval, bucket)
+        )
+    )
+FROM (
+    SELECT
+        id,
+        time_bucket('15 min'::interval, ts) AS bucket,
+        counter_agg(ts, val) AS summary
+    FROM foo
+    GROUP BY id, time_bucket('15 min'::interval, ts)
+) t
+```
+
+
+[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
diff --git a/api/extrapolated_rate.md b/api/extrapolated_rate.md
new file mode 100644
index 000000000000..485650a3c255
--- /dev/null
+++ b/api/extrapolated_rate.md
@@ -0,0 +1,59 @@
+# extrapolated_rate()
+The rate of change in the counter computed over the time period specified by the
+bounds in the CounterSummary, extrapolating to the edges. It is an
+`extrapolated_delta` divided by the duration in seconds.
+
+The bounds must be specified for the `extrapolated_rate` function to work. The
+bounds can be provided in the `counter_agg` call, or by using the `with_bounds`
+utility function. 
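+
+As a sketch, using the optional `bounds` argument documented for `counter_agg`
+(the range here is hypothetical), you can set the bounds at aggregation time
+instead of calling `with_bounds` afterwards:
+
+```sql
+SELECT counter_agg(ts, val, '[2020-01-01 00:00:00+00, 2020-01-02 00:00:00+00)'::tstzrange)
+FROM foo;
+```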
+
+```sql
+extrapolated_rate(
+    summary CounterSummary,
+    method TEXT
+) RETURNS DOUBLE PRECISION
+```
+
+For more information about counter aggregation functions, see the
+[hyperfunctions documentation][hyperfunctions-counter-agg].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|summary|CounterSummary|The input CounterSummary from a counter_agg call|
+|method|TEXT|The extrapolation method to use. Not case-sensitive.|
+
+Currently, the only allowed value of `method` is `prometheus`, because
+extrapolation is only implemented following the Prometheus extrapolation
+protocol.
+
+## Returns
+
+|Name|Type|Description|
+|-|-|-|
+|extrapolated_rate|DOUBLE PRECISION|The per-second rate of change of the counter computed from the CounterSummary extrapolated to the bounds specified there.|
+
+## Sample usage
+
+```sql
+SELECT
+    id,
+    bucket,
+    extrapolated_rate(
+        with_bounds(
+            summary,
+            time_bucket_range('15 min'::interval, bucket)
+        )
+    )
+FROM (
+    SELECT
+        id,
+        time_bucket('15 min'::interval, ts) AS bucket,
+        counter_agg(ts, val) AS summary
+    FROM foo
+    GROUP BY id, time_bucket('15 min'::interval, ts)
+) t
+```
+
+[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
diff --git a/api/gapfilling-interpolation.md b/api/gapfilling-interpolation.md
index b5feb5bbe293..6d71ea922824 100644
--- a/api/gapfilling-interpolation.md
+++ b/api/gapfilling-interpolation.md
@@ -1,2 +1,20 @@
 # Gapfilling and interpolation
-The functions related to gapfilling and interpolation.
+This section contains functions related to gapfilling and interpolation. You can
+use a gapfilling function to create additional rows of data in any gaps,
+ensuring that the returned rows are in chronological order and contiguous. For
+more information about gapfilling and interpolation functions, see the
+[hyperfunctions documentation][hyperfunctions-gapfilling].
+
+Some hyperfunctions are included in the default TimescaleDB product. For
+additional hyperfunctions, you need to install the
+[Timescale Toolkit][install-toolkit] PostgreSQL extension.
+
+|Hyperfunction family|Types|API Calls|Included by default|Toolkit required|
+|-|-|-|-|-|
+|Gapfilling and interpolation|Time bucket gapfill|[`time_bucket_gapfill`](/hyperfunctions/gapfilling-interpolation/time_bucket_gapfill/)|✅|❌|
+||Last observation carried forward|[`locf`](/hyperfunctions/gapfilling-interpolation/locf/)|✅|❌|
+|||[`interpolate`](/hyperfunctions/gapfilling-interpolation/interpolate/)|✅|❌|
+
+
+[hyperfunctions-gapfilling]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/gapfilling-interpolation/
+[install-toolkit]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/install-toolkit
diff --git a/api/hyperfunctions.md b/api/hyperfunctions.md
index 0f7b5e7aba34..e7867903baaa 100644
--- a/api/hyperfunctions.md
+++ b/api/hyperfunctions.md
@@ -1,11 +1,15 @@
 # Hyperfunctions
+Timescale hyperfunctions are a specialized set of functions that allow you to
+analyze time-series data. You can use hyperfunctions to analyze anything you
+have stored as time-series data, including IoT devices, IT systems, marketing
+analytics, user behavior, financial metrics, and cryptocurrency.
 
-TimescaleDB hyperfunctions are a series of SQL functions within TimescaleDB that
-make it easier to manipulate and analyze time-series data in PostgreSQL with
-fewer lines of code. 
You can use hyperfunctions to easily aggregate data into
-consistent buckets of time, calculate percentile approximations of data, compute
-time-weighted averages, downsample and smooth data, and perform faster COUNT DISTINCT
-queries using approximations.
+Some hyperfunctions are included in the default TimescaleDB product. For
+additional hyperfunctions, you need to install the
+[Timescale Toolkit][install-toolkit] PostgreSQL extension.
 
-Hyperfunctions are “easy” to use: you call a hyperfunction using the same SQL
-syntax you know and love.
\ No newline at end of file
+For more information, see the [hyperfunctions documentation][hyperfunctions-howto].
+
+
+[hyperfunctions-howto]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/
+[install-toolkit]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/install-toolkit/
diff --git a/api/hyperloglog.md b/api/hyperloglog.md
new file mode 100644
index 000000000000..8cab212c08f1
--- /dev/null
+++ b/api/hyperloglog.md
@@ -0,0 +1,43 @@
+# hyperloglog() ToolkitExperimental
+The `hyperloglog` function constructs and returns a hyperloglog with at least
+the specified number of buckets over the given values.
+
+For more information about approximate count distinct functions, see the
+[hyperfunctions documentation][hyperfunctions-approx-count-distincts].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|buckets|integer|Number of buckets in the digest. Rounded up to the next power of 2, must be between 16 and 2^18.|
+|value|AnyElement|Column to count distinct elements. The type must have an extended, 64-bit, hash function.|
+
+Increasing the `buckets` argument usually provides more accuracy at the expense
+of more storage.
+
+## Returns
+
+|Column|Type|Description|
+|-|-|-|
+|hyperloglog|hyperloglog|A hyperloglog object which can be passed to other hyperloglog APIs.|
+
+
+
+## Sample usage
+This example assumes you have a table called `samples` that contains a column
+called `weights` that holds DOUBLE PRECISION values. This command returns a
+hyperloglog over that column:
+
+``` sql
+SELECT toolkit.hyperloglog(64, weights) FROM samples;
+```
+
+Alternatively, you can build a view from the aggregate that you can pass to
+other `hyperloglog` functions:
+
+``` sql
+CREATE VIEW digest AS SELECT toolkit.hyperloglog(64, weights) FROM samples;
+```
+
+
+[hyperfunctions-approx-count-distincts]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/approx-count-distincts/
diff --git a/api/idelta.md b/api/idelta.md
new file mode 100644
index 000000000000..47fca947c97a
--- /dev/null
+++ b/api/idelta.md
@@ -0,0 +1,91 @@
+# idelta_left() and idelta_right()
+The instantaneous change in the counter at the left (earlier) and right (later)
+side of the time range.
+
+For more information about counter aggregation functions, see the
+[hyperfunctions documentation][hyperfunctions-counter-agg].
+
+## idelta_left()
+The instantaneous change in the counter at the left (earlier) side of the time
+range. Essentially, the first value subtracted from the second value seen in the
+time range (handling resets appropriately). This can be especially useful for
+fast moving counters. 
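+
+As an illustration, with hypothetical values: if the first two points in the
+time range have adjusted values `5` and `9`, then `idelta_left` is `9 - 5 = 4`.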
+ +```sql +idelta_left( + summary CounterSummary +) RETURNS DOUBLE PRECISION +``` + +### Required arguments + +|Name|Type|Description| +|-|-|-| +|summary|CounterSummary|The input CounterSummary from a counter_agg call| + +### Returns + +|Name|Type|Description| +|-|-|-| +|idelta_left|DOUBLE PRECISION|The instantaneous delta computed from the left (earlier) side of the CounterSummary| + +### Sample usage + +```sql +SELECT + id, + bucket, + idelta_left(summary) +FROM ( + SELECT + id, + time_bucket('15 min'::interval, ts) AS bucket, + counter_agg(ts, val) AS summary + FROM foo + GROUP BY id, time_bucket('15 min'::interval, ts) +) t +``` + +## idelta_right() +The instantaneous change in the counter at the right (later) side of the time +range. Essentially, the penultimate value subtracted from the last value seen in +the time range (handling resets appropriately). This can be especially useful +for fast moving counters. + +```sql +idelta_right( + summary CounterSummary +) RETURNS DOUBLE PRECISION +``` + +### Required arguments + +|Name|Type|Description| +|-|-|-| +|summary|CounterSummary|The input CounterSummary from a counter_agg call| + +### Returns + +|Name|Type|Description| +|-|-|-| +|idelta_right|DOUBLE PRECISION|The instantaneous delta computed from the right (later) side of the CounterSummary| + +### Sample usage + +```sql +SELECT + id, + bucket, + idelta_right(summary) +FROM ( + SELECT + id, + time_bucket('15 min'::interval, ts) AS bucket, + counter_agg(ts, val) AS summary + FROM foo + GROUP BY id, time_bucket('15 min'::interval, ts) +) t +``` + + +[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/ diff --git a/api/intercept.md b/api/intercept.md new file mode 100644 index 000000000000..41ed3a8daa5f --- /dev/null +++ b/api/intercept.md @@ -0,0 +1,46 @@ +# intercept() +The intercept of the least squares fit line computed from the adjusted counter +values and times input in the CounterSummary. This corresponds to the projected +value at the PostgreSQL epoch (2000-01-01 00:00:00+00). This is useful for +drawing the best fit line on a graph, using the slope and the intercept. + +```sql +intercept( + summary CounterSummary +) RETURNS DOUBLE PRECISION +``` + +For more information about counter aggregation functions, see the +[hyperfunctions documentation][hyperfunctions-counter-agg]. + +## Required arguments + +|Name|Type|Description| +|-|-|-| +|summary|CounterSummary|The input CounterSummary from a counter_agg call| + +## Returns + +|Name|Type|Description| +|-|-|-| +|intercept|DOUBLE PRECISION|The intercept of the least squares fit line computed from the points input to the CounterSummary| + +## Sample usage + +```sql +SELECT + id, + bucket, + intercept(summary) +FROM ( + SELECT + id, + time_bucket('15 min'::interval, ts) AS bucket, + counter_agg(ts, val) AS summary + FROM foo + GROUP BY id, time_bucket('15 min'::interval, ts) +) t +``` + + +[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/ diff --git a/api/interpolate.md b/api/interpolate.md index 2d47d9771728..10b8d129eafe 100644 --- a/api/interpolate.md +++ b/api/interpolate.md @@ -1,40 +1,43 @@ -## interpolate() Community - -The `interpolate` function does linear interpolation for missing values. -It can only be used in an aggregation query with [time_bucket_gapfill](/hyperfunctions/gapfilling-interpolation/time_bucket_gapfill/). 
+# interpolate() Community
+The `interpolate` function does linear interpolation for missing values. It can
+only be used in an aggregation query with
+[time_bucket_gapfill](/hyperfunctions/gapfilling-interpolation/time_bucket_gapfill/).
 The `interpolate` function call cannot be nested inside other function calls.
 
-### Required Arguments
+For more information about gapfilling and interpolation functions, see the
+[hyperfunctions documentation][hyperfunctions-gapfilling].
+
+## Required arguments
 
 |Name|Type|Description|
-|---|---|---|
-| `value` | ANY VALUES | The value to interpolate (int2/int4/int8/float4/float8) |
+|-|-|-|
+|`value`|ANY VALUES|The value to interpolate (int2/int4/int8/float4/float8)|
 
-### Optional Arguments
+## Optional arguments
 
 |Name|Type|Description|
-|---|---|---|
-| `prev` | EXPRESSION | The lookup expression for values before the gapfill time range (record) |
-| `next` | EXPRESSION | The lookup expression for values after the gapfill time range (record) |
-
-Because the interpolation function relies on having values before and after
-each bucketed period to compute the interpolated value, it might not have
-enough data to calculate the interpolation for the first and last time bucket
-if those buckets do not otherwise contain valid values.
-For example, the interpolation would require looking before this first
-time bucket period, yet the query's outer time predicate WHERE time > ...
-normally restricts the function to only evaluate values within this time range.
-Thus, the `prev` and `next` expression tell the function how to look for
-values outside of the range specified by the time predicate.
-These expressions will only be evaluated when no suitable value is returned by the outer query
-(i.e., the first and/or last bucket in the queried time range is empty).
-The returned record for `prev` and `next` needs to be a time, value tuple.
-The datatype of time needs to be the same as the time datatype in the `time_bucket_gapfill` call.
-The datatype of value needs to be the same as the `value` datatype of the `interpolate` call.
+|-|-|-|
+|`prev`|EXPRESSION|The lookup expression for values before the gapfill time range (record)|
+|`next`|EXPRESSION|The lookup expression for values after the gapfill time range (record)|
 
-### Sample Usage
+Because the `interpolate` function relies on having values before and after
+each time bucket to compute the interpolated value, it might not have enough
+data to calculate the interpolation for the first and last time bucket if those
+buckets do not contain valid values. For example, the interpolation requires
+looking before the first time bucket period, but the query's outer time
+predicate `WHERE time > ...` restricts the function to only evaluate values
+within this time range. You can use the `prev` and `next` expressions to tell
+the function how to look for values outside of the range specified by the time
+predicate. These expressions are only evaluated when no suitable value is
+returned by the outer query, such as when the first or last bucket in the
+queried time range is empty. The returned record for `prev` and `next` needs to
+be a (time, value) tuple. The data type of `time` needs to be the same as the
+time data type in the `time_bucket_gapfill` call. The data type of `value`
+needs to be the same as the `value` data type of the `interpolate` call. 
-Get the temperature every day for each device over the last week interpolating for missing readings:
+## Sample usage
+Get the temperature every day for each device over the last week, interpolating
+for missing readings:
 
 ```sql
 SELECT
   time_bucket_gapfill('1 day', time, now() - INTERVAL '1 week', now()) AS day,
@@ -58,7 +61,9 @@ ORDER BY day;
 (7 row)
 ```
 
-Get the average temperature every day for each device over the last 7 days interpolating for missing readings with lookup queries for values before and after the gapfill time range:
+Get the average temperature every day for each device over the last seven days,
+interpolating for missing readings, with lookup queries for values before and
+after the gapfill time range:
 
 ```sql
 SELECT
   time_bucket_gapfill('1 day', time, now() - INTERVAL '1 week', now()) AS day,
@@ -84,3 +89,6 @@ ORDER BY day;
 2019-01-16 01:00:00+01 |         1 |   9.0 |  9.0
 (7 row)
 ```
+
+
+[hyperfunctions-gapfilling]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/gapfilling-interpolation/
diff --git a/api/irate.md b/api/irate.md
new file mode 100644
index 000000000000..b05bd93a2c61
--- /dev/null
+++ b/api/irate.md
@@ -0,0 +1,91 @@
+# irate_left() and irate_right()
+The instantaneous rate of change of the counter at the left (earlier) and right
+(later) side of the time range.
+
+For more information about counter aggregation functions, see the
+[hyperfunctions documentation][hyperfunctions-counter-agg].
+
+## irate_left()
+The instantaneous rate of change of the counter at the left (earlier) side of
+the time range. Essentially, the `idelta_left` divided by the duration between the
+first and second observed points in the CounterSummary. This can be especially
+useful for fast moving counters.
+
+```sql
+irate_left(
+    summary CounterSummary
+) RETURNS DOUBLE PRECISION
+```
+
+### Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|summary|CounterSummary|The input CounterSummary from a counter_agg call|
+
+### Returns
+
+|Name|Type|Description|
+|-|-|-|
+|irate_left|DOUBLE PRECISION|The instantaneous rate computed from the left (earlier) side of the CounterSummary|
+
+### Sample usage
+
+```sql
+SELECT
+    id,
+    bucket,
+    irate_left(summary)
+FROM (
+    SELECT
+        id,
+        time_bucket('15 min'::interval, ts) AS bucket,
+        counter_agg(ts, val) AS summary
+    FROM foo
+    GROUP BY id, time_bucket('15 min'::interval, ts)
+) t
+```
+
+## irate_right()
+The instantaneous rate of change of the counter at the right (later) side of the
+time range. Essentially, the `idelta_right` divided by the duration between the
+last two observed points in the CounterSummary. This can be especially
+useful for fast moving counters. 
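+
+As an illustration, with hypothetical values: if the last two points in the
+time range are 30 seconds apart with adjusted values `100` and `112`, then
+`irate_right` is `(112 - 100) / 30 = 0.4` per second.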
+ +```sql +irate_right( + summary CounterSummary +) RETURNS DOUBLE PRECISION +``` + +### Required arguments + +|Name|Type|Description| +|-|-|-| +|summary|CounterSummary|The input CounterSummary from a counter_agg call| + +### Returns + +|Name|Type|Description| +|-|-|-| +|irate_right|DOUBLE PRECISION|The instantaneous rate computed from the right (later) side of the CounterSummary| + +### Sample usage + +```sql +SELECT + id, + bucket, + irate_right(summary) +FROM ( + SELECT + id, + time_bucket('15 min'::interval, ts) AS bucket, + counter_agg(ts, val) AS summary + FROM foo + GROUP BY id, time_bucket('15 min'::interval, ts) +) t +``` + + +[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/ diff --git a/api/locf.md b/api/locf.md index 1226ca720f78..3295a983fb16 100644 --- a/api/locf.md +++ b/api/locf.md @@ -1,36 +1,39 @@ -## locf() Community - -The `locf` function (last observation carried forward) allows you to carry the last seen value in an aggregation group forward. -It can only be used in an aggregation query with [time_bucket_gapfill](/hyperfunctions/gapfilling-interpolation/time_bucket_gapfill/). +# locf() Community +The `locf` (last observation carried forward) function allows you to carry the +last seen value in an aggregation group forward. It can only be used in an +aggregation query with +[time_bucket_gapfill](/hyperfunctions/gapfilling-interpolation/time_bucket_gapfill/). The `locf` function call cannot be nested inside other function calls. -### Required Arguments +For more information about gapfilling and interpolation functions, see the +[hyperfunctions documentation][hyperfunctions-gapfilling]. + +## Required arguments |Name|Type|Description| -|---|---|---| -| `value` | ANY ELEMENT | The value to carry forward | +|-|-|-| +|`value`|ANY ELEMENT|The value to carry forward| -### Optional Arguments +## Optional arguments |Name|Type|Description| -|---|---|---| -| `prev` | EXPRESSION | The lookup expression for values before gapfill start | -| `treat_null_as_missing` | BOOLEAN | Ignore NULL values in locf and only carry non-NULL values forward | - -Because the locf function relies on having values before each bucketed period -to carry forward, it might not have enough data to fill in a value for the first -bucket if it does not contain a value. -For example, the function would need to look before this first -time bucket period, yet the query's outer time predicate WHERE time > ... -normally restricts the function to only evaluate values within this time range. -Thus, the `prev` expression tell the function how to look for -values outside of the range specified by the time predicate. -The `prev` expression will only be evaluated when no previous value is returned -by the outer query (i.e., the first bucket in the queried time range is empty). +|-|-|-| +|`prev`|EXPRESSION|The lookup expression for values before gapfill start| +|`treat_null_as_missing`|BOOLEAN|Ignore NULL values in locf and only carry non-NULL values forward| -### Sample Usage +Because the `locf` function relies on having values before each time bucket to +carry forward, it might not have enough data to fill in a value for the first +bucket if it does not contain a value. For example, the function needs to look +before the first time bucket, but the query's outer time predicate `WHERE +time > ...` restricts the function to only evaluate values within this time +range. 
You can use the `prev` expression to tell the function how to look for
+values outside of the range specified by the time predicate. The `prev`
+expression is only evaluated when no previous value is returned by the outer
+query, for example, when the first bucket in the queried time range is empty.
 
-### Sample Usage
+## Sample usage
+Get the average temperature every day for each device over the last seven days,
+carrying forward the last value for missing readings:
 
-Get the average temperature every day for each device over the last 7 days carrying forward the last value for missing readings:
 ```sql
 SELECT
   time_bucket_gapfill('1 day', time, now() - INTERVAL '1 week', now()) AS day,
@@ -54,7 +57,8 @@ ORDER BY day;
 (7 row)
 ```
 
-Get the average temperature every day for each device over the last 7 days carrying forward the last value for missing readings with out-of-bounds lookup
+Get the average temperature every day for each device over the last seven days,
+carrying forward the last value for missing readings with out-of-bounds lookup:
 
 ```sql
 SELECT
   time_bucket_gapfill('1 day', time, now() - INTERVAL '1 week', now()) AS day,
@@ -79,4 +83,6 @@ ORDER BY day;
 2019-01-15 01:00:00+01 |         1 |   8.0 |  8.0
 2019-01-16 01:00:00+01 |         1 |   9.0 |  9.0
(7 row)
-```
\ No newline at end of file
+```
+
+[hyperfunctions-gapfilling]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/gapfilling-interpolation/
diff --git a/api/max_val.md b/api/max_val.md
index 193988609783..5c30670853f2 100644
--- a/api/max_val.md
+++ b/api/max_val.md
@@ -1,26 +1,32 @@
-## max_val() Toolkit
+# max_val() Toolkit
 
 ```SQL
 max_val(digest TDigest) RETURNS DOUBLE PRECISION
 ```
 
-Get the maximum value from a t-digest (does not work with `percentile_agg` or `uddsketch` based estimators).
-This is provided in order to save space
-when both a maximum and a percentile estimate are required as part of continuous aggregates.
-You can simply compute a single percentile estimator and do not need to specify a separate
-`max` aggregate, just extract the `max_val` from the percentile estimator.
+Get the maximum value from a `tdigest`. This does not work with `percentile_agg`
+or `uddsketch` based estimators. This is provided in order to save space when
+both a maximum and a percentile estimate are required as part of continuous
+aggregates. You can compute a single percentile estimator and extract the
+`max_val` from it, without needing to specify a separate `max` aggregate.
 
-### Required Arguments
+* For more information about percentile approximation algorithms, see
+  [advanced aggregation methods][advanced-agg].
+* For more information about percentile approximation functions, see the
+  [hyperfunctions documentation][hyperfunctions-percentile-approx].
+
+### Required arguments
 
 |Name|Type|Description|
-|---|---|---|
-| `digest` | `TDigest` | The digest to extract the max value from. |
+|-|-|-|
+|`digest`|`TDigest`|The digest to extract the max value from|
 
 ### Returns
 
 |Column|Type|Description|
-|---|---|---|
-| `max_val` | `DOUBLE PRECISION` | The maximum value entered into the t-digest. 
|
+|-|-|-|
+|`max_val`|`DOUBLE PRECISION`|The maximum value entered into the t-digest.|
 
-### Sample Usage
+### Sample usage
 
 ```SQL
 SELECT max_val(tdigest(100, data))
@@ -31,4 +37,8 @@ FROM generate_series(1, 100) data;
 max_val
---------
     100
-```
\ No newline at end of file
+```
+
+
+[advanced-agg]: /timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/advanced-agg/
+[hyperfunctions-percentile-approx]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/
diff --git a/api/mean.md b/api/mean.md
index 791470a5531b..6737b56c77f3 100644
--- a/api/mean.md
+++ b/api/mean.md
@@ -1,4 +1,4 @@
-## mean() Toolkit
+# mean() Toolkit
 
 ```SQL
 mean(sketch UddSketch) RETURNS DOUBLE PRECISION
@@ -7,23 +7,29 @@ mean(sketch UddSketch) RETURNS DOUBLE PRECISION
 mean(digest tdigest) RETURNS DOUBLE PRECISION
 ```
 
-Get the exact average of all the values in the percentile estimate. (Percentiles
-returned are estimates, the average is exact). This is provided in order to save space
-when both a mean and a percentile estimate are required as part of continuous aggregates.
-You can simply compute a single percentile estimator and do not need to specify a separate
-`avg` aggregate, just extract the mean from the percentile estimator.
+Get the exact average of all the values in the percentile estimate. Percentiles
+returned are estimates; the average is exact. This is provided in order to save
+space when both a mean and a percentile estimate are required as part of
+continuous aggregates. You can compute a single percentile estimator and
+extract the mean from it, without needing to specify a separate `avg`
+aggregate.
+
+* For more information about percentile approximation algorithms, see
+  [advanced aggregation methods][advanced-agg].
+* For more information about percentile approximation functions, see the
+  [hyperfunctions documentation][hyperfunctions-percentile-approx].
 
 ### Required arguments
 
 |Name|Type|Description|
-|---|---|---|
-| `sketch` / `digest` | `UddSketch`/`tdigest` | The sketch to extract the mean value from, usually from a [`percentile_agg()`](/hyperfunctions/percentile-approximation/percentile_agg/) call. |
+|-|-|-|
+|`sketch`/`digest`|`UddSketch`/`tdigest`|The sketch to extract the mean value from, usually from a `percentile_agg()` call|
 
 ### Returns
 
 |Column|Type|Description|
-|---|---|---|
-| `mean` | `DOUBLE PRECISION` | The average of the values in the percentile estimate. |
+|-|-|-|
+|`mean`|`DOUBLE PRECISION`|The average of the values in the percentile estimate.|
 
 ### Sample usage
 
@@ -36,3 +42,7 @@ FROM generate_series(0, 100) data;
 ------
    50
 ```
+
+
+[advanced-agg]: /timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/advanced-agg/
+[hyperfunctions-percentile-approx]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/
diff --git a/api/min_val.md b/api/min_val.md
index dca20f6f4d8d..d98122a6f081 100644
--- a/api/min_val.md
+++ b/api/min_val.md
@@ -1,27 +1,32 @@
-## min_val() Toolkit
+# min_val() Toolkit
 
 ```SQL
 min_val(digest TDigest) RETURNS DOUBLE PRECISION
 ```
 
-Get the minimum value from a t-digest (does not work with `percentile_agg` or `uddsketch` based estimators).
-This is provided in order to save space
-when both a minimum and a percentile estimate are required as part of continuous aggregates.
-You can simply compute a single percentile estimator and do not need to specify a separate
-`min` aggregate, just extract the `min_val` from the percentile estimator. 
+Get the minimum value from a `tdigest`. This does not work with `percentile_agg`
+or `uddsketch` based estimators. This saves space when you require both a
+minimum and a percentile estimate as part of a continuous aggregate. You can
+compute a single percentile estimator and extract the `min_val` from it,
+without needing to specify a separate `min` aggregate.
 
-### Required Arguments
+* For more information about percentile approximation algorithms, see
+  [advanced aggregation methods][advanced-agg].
+* For more information about percentile approximation functions, see the
+  [hyperfunctions documentation][hyperfunctions-percentile-approx].
+
+## Required arguments
 
 |Name|Type|Description|
-|---|---|---|
-| `digest` | `TDigest` | The digest to extract the min value from. |
+|-|-|-|
+|`digest`|`TDigest`|The digest to extract the min value from|
 
-### Returns
+## Returns
 
 |Column|Type|Description|
 |---|---|---|
-| `min_val` | `DOUBLE PRECISION` | The minimum value entered into the t-digest. |
+|`min_val`|`DOUBLE PRECISION`|The minimum value entered into the t-digest|
 
-### Sample Usages
+## Sample usage
 
 ```SQL
 SELECT min_val(tdigest(100, data))
@@ -32,4 +37,8 @@ FROM generate_series(1, 100) data;
 min_val
-----------
      1
-```
\ No newline at end of file
+```
+
+
+[hyperfunctions-percentile-approx]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/
+[advanced-agg]: /timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/advanced-agg/
diff --git a/api/num_changes.md b/api/num_changes.md
new file mode 100644
index 000000000000..ba86e6d93409
--- /dev/null
+++ b/api/num_changes.md
@@ -0,0 +1,47 @@
+# num_changes()
+The number of times the value changed within the period over which the
+CounterSummary is calculated. This is determined by evaluating consecutive
+points. Any change in the value is counted, including counter resets where the
+counter is reset to zero. This can result in the same adjusted counter value for
+consecutive points, but it is still treated as a change.
+
+```sql
+num_changes(
+    summary CounterSummary
+) RETURNS BIGINT
+```
+
+For more information about counter aggregation functions, see the
+[hyperfunctions documentation][hyperfunctions-counter-agg].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|summary|CounterSummary|The input CounterSummary from a counter_agg call|
+
+## Returns
+
+|Name|Type|Description|
+|-|-|-|
+|num_changes|BIGINT|The number of times the value changed|
+
+## Sample usage
+
+```sql
+SELECT
+    id,
+    bucket,
+    num_changes(summary)
+FROM (
+    SELECT
+        id,
+        time_bucket('15 min'::interval, ts) AS bucket,
+        counter_agg(ts, val) AS summary
+    FROM foo
+    GROUP BY id, time_bucket('15 min'::interval, ts)
+) t
+```
+
+
+[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
diff --git a/api/num_elements.md b/api/num_elements.md
new file mode 100644
index 000000000000..299baa0b21f0
--- /dev/null
+++ b/api/num_elements.md
@@ -0,0 +1,45 @@
+# num_elements()
+The total number of points seen while calculating the CounterSummary. Only
+points with distinct times are counted, as duplicate times are usually discarded
+in these calculations.
+
+```sql
+num_elements(
+    summary CounterSummary
+) RETURNS BIGINT
+```
+
+For more information about counter aggregation functions, see the
+[hyperfunctions documentation][hyperfunctions-counter-agg]. 
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|summary|CounterSummary|The input CounterSummary from a counter_agg call|
+
+## Returns
+
+|Name|Type|Description|
+|-|-|-|
+|num_elements|BIGINT|The number of points seen during the counter_agg call|
+
+## Sample usage
+
+```sql
+SELECT
+    id,
+    bucket,
+    num_elements(summary)
+FROM (
+    SELECT
+        id,
+        time_bucket('15 min'::interval, ts) AS bucket,
+        counter_agg(ts, val) AS summary
+    FROM foo
+    GROUP BY id, time_bucket('15 min'::interval, ts)
+) t
+```
+
+
+[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
diff --git a/api/num_resets.md b/api/num_resets.md
new file mode 100644
index 000000000000..9ad8e79594c7
--- /dev/null
+++ b/api/num_resets.md
@@ -0,0 +1,44 @@
+# num_resets()
+The total number of times the counter is reset while calculating the
+CounterSummary.
+
+```sql
+num_resets(
+    summary CounterSummary
+) RETURNS BIGINT
+```
+
+For more information about counter aggregation functions, see the
+[hyperfunctions documentation][hyperfunctions-counter-agg].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|summary|CounterSummary|The input CounterSummary from a counter_agg call|
+
+## Returns
+
+|Name|Type|Description|
+|-|-|-|
+|num_resets|BIGINT|The number of resets detected during the counter_agg call|
+
+## Sample usage
+
+```sql
+SELECT
+    id,
+    bucket,
+    num_resets(summary)
+FROM (
+    SELECT
+        id,
+        time_bucket('15 min'::interval, ts) AS bucket,
+        counter_agg(ts, val) AS summary
+    FROM foo
+    GROUP BY id, time_bucket('15 min'::interval, ts)
+) t
+```
+
+
+[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
diff --git a/api/num_vals.md b/api/num_vals.md
index 3daff51440b3..74436666dd04 100644
--- a/api/num_vals.md
+++ b/api/num_vals.md
@@ -1,4 +1,4 @@
-## num_vals() Toolkit
+# num_vals() Toolkit
 
 ```SQL
 num_vals(sketch UddSketch) RETURNS DOUBLE PRECISION
@@ -7,24 +7,30 @@ num_vals(sketch UddSketch) RETURNS DOUBLE PRECISION
 num_vals(digest tdigest) RETURNS DOUBLE PRECISION
 ```
 
-Get the number of values contained in a percentile estimate.
-This is provided in order to save space when both a count and a percentile estimate are required as part of continuous aggregates.
-You can simply compute a single percentile estimator and do not need to specify a separate
-`count` aggregate, just extract the `num_vals` from the percentile estimator.
+Get the number of values contained in a percentile estimate. This saves space
+when you need both a count and a percentile estimate as part of a continuous
+aggregate. You can compute a single percentile estimator and extract the
+`num_vals` from it; you do not need to specify a separate `count` aggregate.
 
-### Required arguments
+* For more information about statistical aggregate functions, see the
+  [hyperfunctions documentation][hyperfunctions-stats-agg].
+* For more information about percentile approximation functions, see the
+  [hyperfunctions documentation][hyperfunctions-percentile-approx].
+
+## Required arguments
 
 |Name|Type|Description|
 |---|---|---|
-|`sketch` / `digest` |`UddSketch` or `tdigest` |The percentile estimator to extract the number of values from, usually from a [`percentile_agg()`](/hyperfunctions/percentile-approximation/aggregation-methods/percentile_agg/) call. 
| +|`sketch`/`digest`|`UddSketch` or `tdigest`|The percentile estimator to extract the number of values from, usually from a [`percentile_agg()`](/hyperfunctions/percentile-approximation/aggregation-methods/percentile_agg/) call| -### Returns +## Returns |Column|Type|Description| |---|---|---| |`num_vals`|`DOUBLE PRECISION`|The number of values in the percentile estimate| -### Sample usage +## Sample usage ```SQL SELECT num_vals(percentile_agg(data)) @@ -35,3 +41,7 @@ FROM generate_series(0, 100) data; ----------- 101 ``` + + +[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/ +[hyperfunctions-percentile-approx]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/ diff --git a/api/page-index/page-index.js b/api/page-index/page-index.js index a1538140b95f..747793c9d5d8 100644 --- a/api/page-index/page-index.js +++ b/api/page-index/page-index.js @@ -279,6 +279,44 @@ module.exports = [ title: 'time_bucket_ng', href: 'time_bucket_ng', }, + { + title: 'Approximate count distincts', + type: 'directory', + href: 'approx_count_distincts', + children: [ + { + title: 'hyperloglog', + href: 'hyperloglog', + }, + { + title: 'rollup', + href: 'rollup-hyperloglog', + }, + { + title: 'distinct_count', + href: 'distinct_count', + }, + { + title: 'stderror', + href: 'stderror', + }, + ], + }, + { + title: 'Statistical aggregates', + type: 'directory', + href: 'stats_aggs', + children: [ + { + title: 'average', + href: 'average', + }, + { + title: 'num_vals', + href: 'num_vals', + } + ], + }, { title: 'Gapfilling and interpolation', type: 'directory', @@ -356,6 +394,81 @@ module.exports = [ }, ], }, + { + title: 'Counter aggregation', + type: 'directory', + href: 'counter_aggs', + children: [ + { + title: 'counter_agg (point form)', + href: 'counter_agg_point', + }, + { + title: 'rollup', + href: 'rollup-counter', + }, + { + title: 'corr', + href: 'corr', + }, + { + title: 'counter_zero_time', + href: 'counter_zero_time', + }, + { + title: 'delta', + href: 'delta', + }, + { + title: 'extrapolated_delta', + href: 'extrapolated_delta', + }, + { + title: 'extrapolated_rate', + href: 'extrapolated_rate', + }, + { + title: 'idelta', + href: 'idelta', + }, + { + title: 'intercept', + href: 'intercept', + }, + { + title: 'irate', + href: 'irate', + }, + { + title: 'num_changes', + href: 'num_changes', + }, + { + title: 'num_elements', + href: 'num_elements', + }, + { + title: 'num_resets', + href: 'num_resets', + }, + { + title: 'rate', + href: 'rate', + }, + { + title: 'slope', + href: 'slope', + }, + { + title: 'time_delta', + href: 'time_delta', + }, + { + title: 'with_bounds', + href: 'with_bounds', + }, + ], + }, { title: 'Time weighted averages', type: 'directory', diff --git a/api/percentile-aggregation-methods.md b/api/percentile-aggregation-methods.md index 7658fb3b5ce1..d08157267a2f 100644 --- a/api/percentile-aggregation-methods.md +++ b/api/percentile-aggregation-methods.md @@ -1,71 +1,7 @@ # Advanced percentile aggregation Toolkit -While the simple [`Percentile_agg()`](/hyperfunctions/percentile-approximation/percentile_agg) -interface will be sufficient for many users, we do provide more specific APIs for -advanced users who want more control of how their percentile approximation is -computed and how much space the intermediate representation uses. 
We currently -provide implementations of the following percentile approximation algorithms: +Timescale uses approximation algorithms to calculate a percentile without +requiring all of the data. This also makes them more compatible with continuous +aggregates. By default, Timescale Toolkit uses `uddsketch`, but you can also +choose to use `tdigest`. For more information about the different algorithms, see the [hyperfunction documentation][hyperfunction-advanced-agg]. -- [T-Digest][tdigest] – -This algorithm buckets data more aggressively toward the center of the quantile range, -giving it greater accuracy near the tails (i.e. 0.001 or 0.995). -- [UddSketch][uddsketch] – T -his algorithm uses exponentially sized buckets to guarantee the approximation -falls within a known error range, relative to the true discrete percentile. - -The `UddSketch` algorithm underlies the `percentile_agg()` interface, it offers -tunability for the size and maximum error target of the sketch, while `percentile_agg` -uses preset defaults. - -### Choosing the right algorithm for your use case -There are different tradeoffs that each algorithm makes, and different use cases -where each will shine. The doc pages above each link to the research papers fully -detailing the algorithms if you want all the details. However, at a higher level, -here are some of the differences to consider when choosing an algorithm: - -1. First off, it's interesting to note that the formal definition for a percentile -is actually imprecise, and there are different methods for determining what the -true percentile actually is. In Postgres, given a target percentile 'p', -[`percentile_disc`](https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE) -will return the smallest element of a set such that 'p' percent of the set is -less than that element, while [`percentile_cont`](https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE) -will return an interpolated value between the two nearest matches for 'p'. The -difference here isn't usually that interesting in practice, but if it matters to -your use case, then keep in mind that TDigest will approximate the continuous -percentile while UddSketch provides an estimate of the discrete value. -1. It's also important to consider the types of percentiles you're most interested -in. In particular, TDigest is optimized to trade off more accurate estimates at -the extremes with weaker estimates near the median. If your work flow involves -estimating 99th percentiles, this is probably a good trade off. However if you're -more concerned about getting highly accurate median estimates, UddSketch is -probably a better fit. -1. UddSketch has a stable bucketing function, so it will always return the same -quantile estimate for the same underlying data, regardless of how it is ordered -or re-aggregated. TDigest, on the other hand, builds up incremental buckets based -on the average of nearby points, which will result in (usually subtle) differences -in estimates based on the same data, unless the order and batching of the -aggregation is strictly controlled (which can be difficult to do in Postgres). -Therefore, if having stable estimates is important to you, UddSketch will likely -be required. -1. 
Trying to calculate precise error bars for TDigest can be difficult, especially -when merging multiple sub-digests into a larger one (this can come about either -through summary aggregation or just parallelization of the normal point aggregate). -If being able to tightly characterize your error is important, UddSketch will -likely be the desired algorithm. -1. That being said, the fact that UddSketch uses exponential bucketing to provide -a guaranteed relative error can cause some wildly varying absolute errors if the -data set covers a large range. For instance if the data is evenly distributed -over the range [1,100], estimates at the high end of the percentile range would -have about 100 times the absolute error of those at the low end of the range. -This gets much more extreme if the data range is [0,100]. If having a stable -absolute error is important to your use case, consider TDigest. -1. While both implementation will likely get smaller and/or faster with future -optimizations, in general UddSketch will end up with a smaller memory footprint -than TDigest, and a correspondingly smaller disk footprint for any continuous -aggregates. This is one of the main reasons that the default `percentile_agg` -uses UddSketch, and is a pretty good reason to prefer that algorithm if your use -case doesn't clearly benefit from TDigest. Regardless of the algorithm, the best -way to improve the accuracy of your percentile estimates is to increase the -number of buckets, and UddSketch gives you more leeway to do so. - -[tdigest]: /hyperfunctions/percentile-approximation/percentile-aggregation-methods/tdigest/ -[uddsketch]: /hyperfunctions/percentile-approximation/percentile-aggregation-methods/uddsketch/ +[hyperfunction-advanced-agg]: /timescaledb/:currentVersion:/how-to-guides/hyperfunctions/advanced-agg/ diff --git a/api/percentile-approximation.md b/api/percentile-approximation.md index 7f361836e4de..23ad76f8661d 100644 --- a/api/percentile-approximation.md +++ b/api/percentile-approximation.md @@ -1,55 +1,27 @@ # Percentile approximation Toolkit -Examining time-series data through percentiles is useful for understanding the -distribution of your time-series data. Specifically, they can help eliminate the -inherent impact that outliers have on calculations such as average. For instance -the 50% percentile (median) of the data can be a more useful measure than -average when there are outliers that would dramatically impact the average, but -have a much smaller impact on the median. The median or 50th percentile means -that in an ordered list of your data half of the data will have a greater value -and half a smaller value. Likewise, the 10th percentile would mean that 10% fall -below and 90% above the value returned. - -Often the 95th or 99th percentile can be very useful in identifying normalized -trends in networking and monitoring applications. For instance, when a user reports -that your website is taking 30 second to load, it's helpful to quickly identify -that 99% of requests occur in 200ms or less, which means that this specific -report is an outlier and likely caused by extraordinary conditions. - -By using percentiles, outliers have less of an impact on the calculations because -their magnitude doesn't affect their percentile, only their order in the set. -Therefore, the skew that is introduced to calculations like `AVG()` by infrequent -very large or very small values is reduced or eliminated. 
- -We provide percentile approximation functions because exact percentiles are not -parallelizable, cannot be used with continuous aggregates and would be very -inefficient when used with multi-node TimescaleDB. Our percentile approximation -algorithm provide good estimates of percentiles while integrating much more fully -with all these other TimescaleDB features. - -## Using percentile approximation in TimescaleDB - - -In order to use functions in the TimescaleDB Toolkit, ensure that -the [extension is installed](/timescaledb/latest/how-to-guides/install-timescaledb-toolkit/) and available within your database. - - -Percentiles in TimescaleDB are calculated in two steps. First, we -must create a percentile estimator which can be created using either -[`percentile_agg()`][percentile_agg], -or one of the [advanced aggregation methods][advanced_agg_methods] `uddsketch()` or `tdigest()`. Estimators can be combined or re-aggregated using the [rollupfunction][rollup]. - -Once the estimator is created, the desired values can be obtained by using the aggregate result as -input to the following functions: [](#percentile-accessors) - - * [`approx_percentile()`](/hyperfunctions/percentile-approximation/approx_percentile) - * [`approx_percentile_rank()`](/hyperfunctions/percentile-approximation/approx_percentile_rank) - * [`mean()`](/hyperfunctions/percentile-approximation/mean) - * [`error()`](/hyperfunctions/percentile-approximation/error) - * [`num_vals()`](/hyperfunctions/percentile-approximation/num_vals) - -Additionally, the output of the aggregation methods can be stored as part of a -continuous aggregate for re-aggregation using the above value functions. - -[percentile_agg]: /hyperfunctions/percentile-approximation/percentile_agg/ -[advanced_agg_methods]: /hyperfunctions/percentile-approximation/percentile-aggregation-methods/ -[rollup]: /hyperfunctions/percentile-approximation/rollup-percentile +This section contains functions related to percentile approximation. +Approximation algorithms are used to calculate a percentile without requiring +all of the data. For more information about percentile approximation functions, +see the [hyperfunctions documentation][hyperfunctions-percentile-approx]. + +Some hyperfunctions are included in the default TimescaleDB product. For +additional hyperfunctions, you need to install the +[Timescale Toolkit][install-toolkit] PostgreSQL extension. 
+
+|Hyperfunction family|Types|API Calls|Included by default|Toolkit required|
+|-|-|-|-|-|
+|Percentile approximation|Approximate percentile|[`percentile_agg`](/hyperfunctions/percentile-approximation/percentile_agg/)|❌|✅|
+|||[`approx_percentile`](/hyperfunctions/percentile-approximation/approx_percentile/)|❌|✅|
+|||[`approx_percentile_rank`](/hyperfunctions/percentile-approximation/approx_percentile_rank/)|❌|✅|
+|||[`rollup`](/hyperfunctions/percentile-approximation/rollup-percentile/)|❌|✅|
+|||[`max_val`](/hyperfunctions/percentile-approximation/max_val/)|✅|❌|
+|||[`mean`](/hyperfunctions/percentile-approximation/mean/)|✅|❌|
+|||[`error`](/hyperfunctions/percentile-approximation/error/)|✅|❌|
+|||[`min_val`](/hyperfunctions/percentile-approximation/min_val/)|✅|❌|
+|||[`num_vals`](/hyperfunctions/percentile-approximation/num_vals/)|✅|❌|
+||Advanced aggregation methods|[`uddsketch`](/hyperfunctions/percentile-approximation/percentile-aggregation-methods/uddsketch/)|❌|✅|
+|||[`tdigest`](/hyperfunctions/percentile-approximation/percentile-aggregation-methods/tdigest/)|❌|✅|
+
+
+[hyperfunctions-percentile-approx]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/
+[install-toolkit]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/install-toolkit
diff --git a/api/percentile_agg.md b/api/percentile_agg.md
index f293781162b6..2018793003bc 100644
--- a/api/percentile_agg.md
+++ b/api/percentile_agg.md
@@ -1,4 +1,4 @@
-## percentile_agg() Toolkit
+# percentile_agg() Toolkit

 ```sql
 percentile_agg(
@@ -6,34 +6,39 @@ percentile_agg(
 ) RETURNS UddSketch
 ```

-This is the default percentile aggregation function. It uses the [UddSketch
-algorithm](/hyperfunctions/percentile-approximation/percentile-aggregation-methods/uddsketch/)
-with 200 buckets and an initial maximum error of 0.001. This is appropriate for
-most common use cases of percentile approximation. For more advanced use of
-percentile approximation algorithms,
-see [advanced usage](/hyperfunctions/percentile-approximation/percentile-aggregation-methods/).
-This creates a `Uddsketch` percentile estimator, it is usually used with the [approx_percentile()](/hyperfunctions/percentile-approximation/approx_percentile/) accessor
-function to extract an approximate percentile, however it is in a form that can
-be re-aggregated using the [rollup](/hyperfunctions/percentile-approximation/rollup-percentile/) function and/or any of the [accessor functions](/hyperfunctions/percentile-approximation/#accessor-functions).
+This creates a `UddSketch` percentile estimator. It is usually used with the
+[approx_percentile][approx_percentile] accessor function to extract an
+approximate percentile. However, it is in a form that can be re-aggregated using
+the [rollup][rollup] function or any of the percentile approximation accessor
+functions.

-### Required arguments
+This is the default percentile aggregation function. It uses the `UddSketch`
+algorithm with 200 buckets and an initial maximum error of 0.001. This is
+appropriate for most common use cases of percentile approximation.
+
+* For more information about percentile approximation algorithms, see
+  [advanced aggregation methods][advanced-agg].
+* For more information about percentile approximation functions, see the
+  [hyperfunctions documentation][hyperfunctions-percentile-approx].
+
+## Required arguments

 |Name|Type|Description|
-|---|---|---|
+|-|-|-|
 |`value`|`DOUBLE PRECISION`|Column to aggregate|

-### Returns
+## Returns

 |Column|Type|Description|
-|---|---|---|
+|-|-|-|
 |`percentile_agg`|`UddSketch`|A UddSketch percentile estimator object which may be passed to other percentile approximation APIs|

 The `percentile_agg` function uses the UddSketch algorithm, so it returns the
-UddSketch data structure for use in further calls.
-
-### Sample usage
-Get the approximate first percentile using the `percentile_agg()` plus the [`approx_percentile`](/hyperfunctions/percentile-approximation/approx_percentile/) accessor function.
+`UddSketch` data structure for use in further calls.

+## Sample usage
+Get the approximate first percentile using `percentile_agg()` together with the
+[`approx_percentile`][approx_percentile] accessor function:

 ```SQL
 SELECT
    approx_percentile(0.01, percentile_agg(data))
@@ -45,9 +50,8 @@ approx_percentile
 0.999
 ```

-The `percentile_agg` function is often used to create continuous aggregates, after which you can use
-multiple accessors
-for [retrospective analysis](https://github.com/timescale/timescale-analytics/blob/main/docs/two-step_aggregation.md#retrospective-analysis-over-downsampled-data).
+The `percentile_agg` function can be used to create continuous aggregates,
+after which you can use multiple accessors for retrospective analysis:

 ```SQL
 CREATE MATERIALIZED VIEW foo_hourly
@@ -58,4 +62,9 @@ AS SELECT
    FROM foo
    GROUP BY 1;
 ```
----
+
+
+[approx_percentile]: /hyperfunctions/percentile-approximation/approx_percentile/
+[rollup]: /hyperfunctions/percentile-approximation/rollup-percentile/
+[hyperfunctions-percentile-approx]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/
+[advanced-agg]: /timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/advanced-agg/
diff --git a/api/rate.md b/api/rate.md
new file mode 100644
index 000000000000..2ae9efd7b8c0
--- /dev/null
+++ b/api/rate.md
@@ -0,0 +1,45 @@
+# rate()
+The rate of change of the counter over the observed time period. This is the raw
+or simple rate, equivalent to `delta(summary)` / `time_delta(summary)`. After
+accounting for resets, the last value is subtracted from the first value and
+divided by the duration between the last observed time and the first observed
+time.
+
+```sql
+rate(
+    summary CounterSummary
+) RETURNS DOUBLE PRECISION
+```
+
+For more information about counter aggregation functions, see the
+[hyperfunctions documentation][hyperfunctions-counter-agg].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|summary|CounterSummary|The input CounterSummary from a counter_agg call|
+
+## Returns
+
+|Name|Type|Description|
+|-|-|-|
+|rate|DOUBLE PRECISION|The per-second observed rate computed from the CounterSummary|
+
+## Sample usage
+
+```sql
+SELECT
+    id,
+    rate(summary)
+FROM (
+    SELECT
+        id,
+        counter_agg(ts, val) AS summary
+    FROM foo
+    GROUP BY id
+) t
+```
+
+
+[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
diff --git a/api/rollup-counter.md b/api/rollup-counter.md
new file mode 100644
index 000000000000..dd21aa74b347
--- /dev/null
+++ b/api/rollup-counter.md
@@ -0,0 +1,51 @@
+# rollup(CounterSummary) Toolkit
+
+```SQL
+rollup(
+    cs CounterSummary
+) RETURNS CounterSummary
+```
+
+An aggregate to compute a combined `CounterSummary` from a series of
+non-overlapping `CounterSummaries`. Overlapping `CounterSummaries` cause
+errors.
+
+For more information about counter aggregation functions, see the
+[hyperfunctions documentation][hyperfunctions-counter-agg].
+
+## Required arguments
+
+|Name| Type |Description|
+|-|-|-|
+|`cs`|CounterSummary|The input CounterSummary from a previous `counter_agg` (point form) call, often from a continuous aggregate|
+
+## Returns
+
+|Column|Type|Description|
+|-|-|-|
+|`counter_agg`|CounterSummary|A CounterSummary object that can be passed to accessor functions or other objects in the counter aggregate API|
+
+
+## Sample usage
+
+```SQL
+WITH t as (
+    SELECT
+        date_trunc('day', ts) as dt,
+        counter_agg(ts, val) AS counter_summary -- get a counter summary for each day
+    FROM foo
+    WHERE id = 'bar'
+    GROUP BY date_trunc('day', ts)
+), q as (
+    SELECT rollup(counter_summary) AS full_cs -- do a second level of aggregation to get the full CounterSummary
+    FROM t
+)
+SELECT
+    dt,
+    delta(counter_summary), -- extract the delta from the CounterSummary
+    delta(counter_summary) / (SELECT delta(full_cs) FROM q LIMIT 1) as normalized -- get the fraction of the delta that happened each day compared to the full change of the counter
+FROM t;
+```
+
+
+[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
diff --git a/api/rollup-hyperloglog.md b/api/rollup-hyperloglog.md
new file mode 100644
index 000000000000..107ba3777812
--- /dev/null
+++ b/api/rollup-hyperloglog.md
@@ -0,0 +1,42 @@
+# rollup() Toolkit
+
+```SQL
+rollup(
+    log hyperloglog
+) RETURNS Hyperloglog
+```
+
+Returns a hyperloglog by aggregating over the union of the input hyperloglogs.
+
+For more information about approximate count distinct functions, see the
+[hyperfunctions documentation][hyperfunctions-approx-count-distincts].
+
+## Required arguments
+
+|Name| Type |Description|
+|-|-|-|
+|`log`|`Hyperloglog`|Column of Hyperloglogs to be united.|
+
+## Returns
+
+|Column|Type|Description|
+|-|-|-|
+|`hyperloglog`|`Hyperloglog`|A hyperloglog containing the count of the union of the input hyperloglogs.|
+
+
+## Sample usage
+
+```SQL
+SELECT toolkit.distinct_count(toolkit.rollup(logs))
+FROM (
+    (SELECT toolkit.hyperloglog(32, v::text) logs FROM generate_series(1, 100) v)
+    UNION ALL
+    (SELECT toolkit.hyperloglog(32, v::text) FROM generate_series(50, 150) v)
+) hll;
+ distinct_count
+----------------
+            152
+```
+
+
+[hyperfunctions-approx-count-distincts]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/approx-count-distincts/
diff --git a/api/rollup-percentile.md b/api/rollup-percentile.md
index f94cd17a5d29..b30dc80949de 100644
--- a/api/rollup-percentile.md
+++ b/api/rollup-percentile.md
@@ -1,4 +1,4 @@
-## rollup() Toolkit
+# rollup() Toolkit

 ```SQL
 rollup(
@@ -12,29 +12,42 @@ rollup(
 ```

 This combines multiple outputs from the
-[`percentile_agg()` function][percentile_agg] (or either
-[`uddsketch()` or `tdigest()`][advanced_agg_methods]). This is especially
-useful for re-aggregation in a continuous aggregate. For example, bucketing by a larger [`time_bucket()`][time_bucket], or re-grouping on other dimensions
-included in an aggregation.
+[`percentile_agg()`][percentile_agg] function, or from either
+[`uddsketch()` or `tdigest()`][advanced-agg]. This is especially useful for
+re-aggregation in a continuous aggregate. For example, bucketing by a larger
+[`time_bucket()`][time_bucket], or re-grouping on other dimensions included in
+an aggregation.

-### Required arguments
+* For more information about percentile approximation algorithms, see
+  [advanced aggregation methods][advanced-agg].
+* For more information about percentile approximation functions, see the
+  [hyperfunctions documentation][hyperfunctions-percentile-approx].
+
+## Required arguments

 |Name|Type|Description|
-|---|---|---|
-|`sketch` / `digest` |`UddSketch` or `tdigest` |The already constructed data structure from a previous `percentile_agg`, `uddsketch`, or `tdigest` call|
+|-|-|-|
+|`sketch`/`digest`|`UddSketch` or `tdigest`|The already constructed data structure from a previous `percentile_agg`, `uddsketch`, or `tdigest` call|

-### Returns
+## Returns

 |Column|Type|Description|
 |---|---|---|
 |`rollup`|`UddSketch` / `tdigest`|A UddSketch or tdigest object which may be passed to further APIs|

-Because the `percentile_agg()`](/hyperfunctions/percentile-approximation/aggregation-methods/percentile_agg/) function uses the [UddSketch algorithm](/hyperfunctions/percentile-approximation/percentile-aggregation-methods/uddsketch), `rollup` returns the UddSketch data structure for use in further calls.
+Because the [`percentile_agg()`][percentile_agg] function uses the [UddSketch
+algorithm][advanced-agg], `rollup` returns the `UddSketch` data structure for
+use in further calls.

-When using the `percentile_agg` or `UddSketch` aggregates, the `rollup` function will not introduce additional error (compared to calculating the estimator directly), however, using `rollup` with `tdigest` may introduce additional error compared to calculating the estimator directly on the underlying data.
+When you use the `percentile_agg` or `UddSketch` aggregates, the `rollup`
+function does not introduce additional errors compared to calculating the
+estimator directly. However, using `rollup` with `tdigest` can introduce
+additional errors compared to calculating the estimator directly on the
+underlying data.

-### Sample usage
-Here, we re-aggregate an hourly continuous aggregate into daily buckets, the usage with `uddsketch` & `tdigest` is analogous:
+## Sample usage
+Re-aggregate an hourly continuous aggregate into daily buckets. The usage with
+`uddsketch` and `tdigest` is exactly the same:

 ```SQL
 CREATE MATERIALIZED VIEW foo_hourly
 WITH (timescaledb.continuous)
@@ -53,5 +66,6 @@ GROUP BY 1;
 ```

 [percentile_agg]: /hyperfunctions/percentile-approximation/percentile_agg/
-[advanced_agg_methods]: /hyperfunctions/percentile-approximation/percentile-aggregation-methods/
+[advanced-agg]: /timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/advanced-agg/
 [time_bucket]: /hyperfunctions/time_bucket/
+[hyperfunctions-percentile-approx]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/
diff --git a/api/rollup-timeweight.md b/api/rollup-timeweight.md
index 35e7939dab04..d6f8e0f32b94 100644
--- a/api/rollup-timeweight.md
+++ b/api/rollup-timeweight.md
@@ -1,4 +1,4 @@
-## rollup(TimeWeightSummary) Toolkit
+# rollup(TimeWeightSummary) Toolkit

 ```SQL
 rollup(
@@ -7,25 +7,26 @@ rollup(
 ```

 An aggregate to compute a combined `TimeWeightSummary` from a series of
-non-overlapping `TimeWeightSummaries`. Overlapping `TimeWeightSummaries` will
-cause errors.
-See [Notes on Parallelism and Ordering](/hyperfunctions/time-weighted-averages/time_weight/##advanced-usage-notes)
-for more information.
+non-overlapping `TimeWeightSummaries`. Overlapping `TimeWeightSummaries` cause
+errors.

-### Required arguments
+For more information about time-weighted average functions, see the
+[hyperfunctions documentation][hyperfunctions-time-weight-average].
+ +## Required arguments |Name| Type |Description| |---|---|---| -|`tws`|`TimeWeightSummary`|The input TimeWeightSummary from a previous [`time_weight`](/hyperfunctions/time-weighted-averages/time_weight/) call, often from a continuous aggregate| +|`tws`|`TimeWeightSummary`|The input TimeWeightSummary from a previous `time_weight` call, often from a continuous aggregate| -### Returns +## Returns |Column|Type|Description| |---|---|---| |`time_weight`|`TimeWeightSummary`|A TimeWeightSummary object that can be passed to other functions within the time weighting API| -### Sample usage +## Sample usage ```SQL WITH t as ( @@ -45,3 +46,6 @@ SELECT average(tw) / (SELECT average(full_tw) FROM q LIMIT 1) as normalized -- get the normalized average FROM t; ``` + + +[hyperfunctions-time-weight-average]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/time-weighted-averages/ diff --git a/api/slope.md b/api/slope.md new file mode 100644 index 000000000000..257905fb72b6 --- /dev/null +++ b/api/slope.md @@ -0,0 +1,48 @@ +# slope() +The slope of the least squares fit line computed from the adjusted counter +values and times input in the CounterSummary. Because the times are input as +seconds, the slope provides a per-second rate of change estimate based on the +least squares fit, which is often similar to the result of the rate calculation, +but can more accurately reflect the usual behavior if there are infrequent, +large changes in a counter. + +```sql +slope( + summary CounterSummary +) RETURNS DOUBLE PRECISION +``` + +For more information about counter aggregation functions, see the +[hyperfunctions documentation][hyperfunctions-counter-agg]. + +## Required arguments + +|Name|Type|Description| +|-|-|-| +|summary|CounterSummary|The input CounterSummary from a counter_agg call| + +## Returns + +|Name|Type|Description| +|-|-|-| +|slope|DOUBLE PRECISION|The per second rate of change computed by taking the slope of the least squares fit of the points input in the CounterSummary| + +## Sample usage + +```sql +SELECT + id, + bucket, + slope(summary) +FROM ( + SELECT + id, + time_bucket('15 min'::interval, ts) AS bucket, + counter_agg(ts, val) AS summary + FROM foo + GROUP BY id, time_bucket('15 min'::interval, ts) +) t +``` + + +[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/ diff --git a/api/stats_aggs.md b/api/stats_aggs.md new file mode 100644 index 000000000000..88a4a7433446 --- /dev/null +++ b/api/stats_aggs.md @@ -0,0 +1,39 @@ +# Statistical aggregates +This section includes functions related to statistical aggregates. + +Some hyperfunctions are included in the default TimescaleDB product. For +additional hyperfunctions, you need to install the +[Timescale Toolkit][install-toolkit] PostgreSQL extension. 
+
+|Hyperfunction family|Types|API Calls|Included by default|Toolkit required|
+|-|-|-|-|-|
+|Statistical aggregates|Statistical functions|[`average`](/hyperfunctions/stats_aggs/average/)|✅|❌|
+|||[`num_vals`](/hyperfunctions/stats_aggs/num_vals/)|✅|❌|
+
+Additionally, this table includes some other common statistical aggregate
+functions:
+
+|Function|Description|Argument type|Return type|
+|-|-|-|-|
+|`corr`|Finds the correlation coefficient|DOUBLE PRECISION|DOUBLE PRECISION|
+|`covar_pop`|Finds the population covariance|DOUBLE PRECISION|DOUBLE PRECISION|
+|`covar_samp`|Finds the sample covariance|DOUBLE PRECISION|DOUBLE PRECISION|
+|`regr_avgx`|Finds the average of the independent variable, sum(X)/N|DOUBLE PRECISION|DOUBLE PRECISION|
+|`regr_avgy`|Finds the average of the dependent variable, sum(Y)/N|DOUBLE PRECISION|DOUBLE PRECISION|
+|`regr_count`|Finds the number of rows in which both inputs are non-null|DOUBLE PRECISION|BIGINT|
+|`regr_intercept`|Finds the y-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs|DOUBLE PRECISION|DOUBLE PRECISION|
+|`regr_r2`|Finds the square of the correlation coefficient|DOUBLE PRECISION|DOUBLE PRECISION|
+|`regr_slope`|Finds the slope of the least-squares-fit linear equation determined by the (X, Y) pairs|DOUBLE PRECISION|DOUBLE PRECISION|
+|`regr_sxx`|Finds the sum of squares of the independent variable, sum(X^2) - sum(X)^2/N|DOUBLE PRECISION|DOUBLE PRECISION|
+|`regr_sxy`|Finds the sum of products of independent times dependent variables, sum(X*Y) - sum(X) * sum(Y)/N|DOUBLE PRECISION|DOUBLE PRECISION|
+|`regr_syy`|Finds the sum of squares of the dependent variable, sum(Y^2) - sum(Y)^2/N|DOUBLE PRECISION|DOUBLE PRECISION|
+|`stddev_pop`|Finds the population standard deviation of the input values|NUMERIC_TYPE|DOUBLE PRECISION or NUMERIC_TYPE|
+|`stddev_samp`|Finds the sample standard deviation of the input values|NUMERIC_TYPE|DOUBLE PRECISION or NUMERIC_TYPE|
+|`var_pop`|Finds the population variance of the input values (square of the population standard deviation)|NUMERIC_TYPE|DOUBLE PRECISION or NUMERIC_TYPE|
+|`var_samp`|Finds the sample variance of the input values (square of the sample standard deviation)|NUMERIC_TYPE|DOUBLE PRECISION or NUMERIC_TYPE|
+
+For more information about statistical aggregate functions, see the
+[hyperfunctions documentation][hyperfunctions-stats-agg].
+
+[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/
+[install-toolkit]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/install-toolkit
diff --git a/api/stderror.md b/api/stderror.md
new file mode 100644
index 000000000000..496f62330152
--- /dev/null
+++ b/api/stderror.md
@@ -0,0 +1,53 @@
+# stderror() ToolkitExperimental
+The `stderror` function returns an estimate of the relative standard error of the hyperloglog, based on the hyperloglog error formula. Approximate results are:
+
+|precision|registers|error|bytes|
+|-|-|-|-|
+|4|16|0.2600|12|
+|5|32|0.1838|24|
+|6|64|0.1300|48|
+|7|128|0.0919|96|
+|8|256|0.0650|192|
+|9|512|0.0460|384|
+|10|1024|0.0325|768|
+|11|2048|0.0230|1536|
+|12|4096|0.0163|3072|
+|13|8192|0.0115|6144|
+|14|16384|0.0081|12288|
+|15|32768|0.0057|24576|
+|16|65536|0.0041|49152|
+|17|131072|0.0029|98304|
+|18|262144|0.0020|196608|
+
+For more information about approximate count distinct functions, see the
+[hyperfunctions documentation][hyperfunctions-approx-count-distincts].
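+
+The error column in this table closely follows the standard hyperloglog error
+bound of approximately `1.04/sqrt(registers)`. As a rough sanity check (this
+formula is an assumption taken from the published hyperloglog bound, not
+something this page defines), you can reproduce a table row directly:
+
+``` sql
+-- Approximate relative error for a hyperloglog with 64 registers:
+-- 1.04 / sqrt(64) = 0.13, matching the row for precision 6 above.
+SELECT round((1.04 / sqrt(64))::numeric, 4) AS approx_error;
+```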
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|hyperloglog|Hyperloglog|The hyperloglog to extract the standard error from.|
+
+## Returns
+
+|Column|Type|Description|
+|-|-|-|
+|stderror|DOUBLE PRECISION|The estimated relative standard error of the hyperloglog.|
+
+
+
+## Sample usage
+This example retrieves the standard error from a hyperloglog called `hyperloglog`:
+
+``` sql
+SELECT toolkit.stderror(toolkit.hyperloglog(64, data))
+FROM generate_series(1, 100) data
+
+ stderror
+----------
+     0.13
+
+```
+
+
+[hyperfunctions-approx-count-distincts]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/approx-count-distincts/
diff --git a/api/tdigest.md b/api/tdigest.md
index bdfd5b674cb8..b7c228961874 100644
--- a/api/tdigest.md
+++ b/api/tdigest.md
@@ -1,24 +1,4 @@
-## tdigest() Toolkit
-TimescaleDB Toolkit provides an implementation of the t-digest data structure
-for quantile approximations. A t-digest is a space efficient aggregation which
-provides increased resolution at the edges of the distribution. This allows for
-more accurate estimates of extreme quantiles than traditional methods.
-
-Timescale's t-digest is implemented as an aggregate function in PostgreSQL. They
-do not support moving-aggregate mode, and are not ordered-set aggregates. Presently
-they are restricted to float values, but the goal is to make them polymorphic.
-They are partializable and are good candidates for continuous aggregation.
-
-One additional thing to note about TDigests is that they are somewhat dependant
-on the order of inputs. The percentile approximations should be nearly equal for
-the same underlying data, especially at the extremes of the quantile range where
-the TDigest is inherently more accurate, they are unlikely to be identical if
-built in a different order. While this should have little effect on the accuracy
-of the estimates, it is worth noting that repeating the creation of the TDigest
-might have subtle differences if the call is being parallelized by Postgres.
-
-## tdigest() usage
-
+# tdigest() Toolkit
 ```SQL
 tdigest(
     buckets INTEGER,
@@ -26,31 +6,61 @@ tdigest(
 ) RETURNS TDigest
 ```

-This will construct and return a TDigest with the specified number of buckets over the given values.
+This constructs and returns a `tdigest` with the specified number of buckets
+over the given values.
+
+TimescaleDB provides an implementation of the `tdigest` data structure
+for quantile approximations. A `tdigest` is a space-efficient aggregation which
+provides increased resolution at the edges of the distribution. This allows for
+more accurate estimates of extreme quantiles than traditional methods.
+
+Timescale's `tdigest` is implemented as an aggregate function in PostgreSQL. It
+does not support moving-aggregate mode, and is not an ordered-set aggregate. It
+is currently restricted to float values. It is partializable, which makes it a
+good candidate for continuous aggregation.
+
+The `tdigest` function is somewhat dependent on the order of inputs. The
+percentile approximations should be nearly equal for the same underlying data,
+especially at the extremes of the quantile range where the `tdigest` is
+inherently more accurate. However, they are unlikely to be identical if built in
+a different order. While this should have little effect on the accuracy of the
+estimates, it is worth noting that repeated creation of the `tdigest` might
+produce subtly different results if the call is parallelized by PostgreSQL.
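+
+Because a `tdigest` is a two-step aggregate, it is usually paired with an
+accessor such as `approx_percentile` to extract a value. This is a minimal
+sketch, assuming the Toolkit extension is installed and using a generated
+series in place of a real table:
+
+```SQL
+-- Build a tdigest with 100 buckets over the series, then extract the
+-- approximate 99th percentile from it with the approx_percentile accessor.
+SELECT approx_percentile(0.99, tdigest(100, data))
+FROM generate_series(1, 1000) data;
+```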
+
+* For more information about percentile approximation functions, see the
+  [hyperfunctions documentation][hyperfunctions-percentile-approx].
+* For some more technical details and usage examples of this algorithm,
+  see the [developer documentation][gh-tdigest].
+
+
+### Required arguments
-
 |Name| Type |Description|
-|---|---|---|
-| `buckets` | `INTEGER` | Number of buckets in the digest. Increasing this will provide more accurate quantile estimates, but will require more memory.|
-| `value` | `DOUBLE PRECISION` | Column to aggregate.
+|-|-|-|
+|`buckets`|`INTEGER`|Number of buckets in the digest. Increasing this provides more accurate quantile estimates, but requires more memory.|
+|`value`|`DOUBLE PRECISION`|Column to aggregate|

 ### Returns

 |Column|Type|Description|
-|---|---|---|
-| `tdigest` | `TDigest` | A t-digest object which may be passed to other t-digest APIs. |
+|-|-|-|
+|`tdigest`|`TDigest`|A `tdigest` object which can be passed to other `tdigest` APIs|

 ### Sample usage
-For this example, assume we have a table 'samples' with a column 'weights' holding `DOUBLE PRECISION` values. The following will simply return a digest over that column -
+This example uses a table called `samples`, with a column called `weights`, that
+holds `DOUBLE PRECISION` values. This query returns a digest over that column:
 ```SQL
 SELECT tdigest(100, data) FROM samples;
 ```

-It may be more useful to build a view from the aggregate that can later be passed to other tdigest functions. -
+This example builds a view from the aggregate that can be passed to other
+tdigest functions:
 ```SQL
 CREATE VIEW digest AS SELECT tdigest(100, data) FROM samples;
 ```
+
+
+[hyperfunctions-percentile-approx]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/
+[gh-tdigest]: https://github.com/timescale/timescaledb-toolkit/blob/main/docs/tdigest.md
diff --git a/api/time-weighted-averages.md b/api/time-weighted-averages.md
index a357a38e184c..55cb8abb5180 100644
--- a/api/time-weighted-averages.md
+++ b/api/time-weighted-averages.md
@@ -1,24 +1,19 @@
 # Time-weighted average functions Toolkit
-Time weighted averages are commonly used in cases where a time series is not
-evenly sampled, so a traditional average will give misleading results. Consider
-a voltage sensor that sends readings once every 5 minutes or whenever the value
-changes by more than 1 V from the previous reading. If the results are generally
-stable, but with some quick moving transients, a simple average over all of the
-points will tend to over-weight the transients instead of the stable readings.
-A time weighted average weights each value by the duration over which it occurred
-based on the points around it and produces correct results for unevenly spaced series.
+This section contains functions related to time-weighted averages. Time weighted
+averages are commonly used in cases where a time series is not evenly sampled,
+so a traditional average gives misleading results. For more information
+about time-weighted average functions, see the
+[hyperfunctions documentation][hyperfunctions-time-weight-average].

-TimescaleDB Toolkit's time weighted average is implemented as an aggregate which
-weights each value either using a last observation carried forward (LOCF)
-approach or a linear interpolation approach.
+Some hyperfunctions are included in the default TimescaleDB product. For
+additional hyperfunctions, you need to install the
+[Timescale Toolkit][install-toolkit] PostgreSQL extension.
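+
+The general pattern is two-step aggregation: `time_weight` builds a
+`TimeWeightSummary`, and an accessor such as `average` extracts a value from it.
+This is a minimal sketch, assuming the Toolkit extension is installed and a
+hypothetical table `foo` with a timestamp column `ts` and a value column `val`:
+
+```SQL
+-- Compute a time-weighted average, using last observation carried forward
+-- (LOCF) to weight each value until the next one is observed.
+SELECT average(time_weight('LOCF', ts, val))
+FROM foo;
+```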
- -In order to use functions in the TimescaleDB Toolkit, ensure that -the [extension is installed](/timescaledb/latest/how-to-guides/install-timescaledb-toolkit/) and available within your database. - +|Hyperfunction family|Types|API Calls|Included by default|Toolkit required| +|-|-|-|-|-| +|Time-weighted averages|Time-weighted averages|[`time_weight`](/hyperfunctions/time-weighted-averages/time_weight/)|❌|✅| +|||[`rollup`](/hyperfunctions/time-weighted-averages/rollup-timeweight/)|❌|✅| +|||[`average`](/hyperfunctions/time-weighted-averages/average/)|❌|✅| -As with other Toolkit functions that support two-step aggregations, the -[`time_weight`](/hyperfunctions/time-weighted-averages/time_weight/) function produces a summary output (`TimeWeightSummary`) which -is intended to be consumed by either the [`average`](/hyperfunctions/time-weighted-averages/average/) or [`rollup`](/hyperfunctions/time-weighted-averages/rollup-timeweight/) function -Additionally, the output of [`time_weight`](/hyperfunctions/time-weighted-averages/time_weight/)can be stored in a Continuous -Aggregate and re-aggregated or analyzed later. +[hyperfunctions-time-weight-average]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/time-weighted-averages/ +[install-toolkit]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/install-toolkit diff --git a/api/time_bucket_gapfill.md b/api/time_bucket_gapfill.md index cf5233a82854..ad453bfef569 100644 --- a/api/time_bucket_gapfill.md +++ b/api/time_bucket_gapfill.md @@ -1,68 +1,73 @@ -## time_bucket_gapfill() Community - -The `time_bucket_gapfill` function works similar to `time_bucket` but also activates gap -filling for the interval between `start` and `finish`. It can only be used with an aggregation -query. Values outside of `start` and `finish` will pass through but no gap filling will be -done outside of the specified range. - -Starting with version 1.3.0, `start` and `finish` are optional arguments and will -be inferred from the WHERE clause if not supplied as arguments. - - - We recommend using a WHERE clause whenever possible (instead of just -`start` and `finish` arguments), as start and finish arguments will not filter -input rows. Thus without a WHERE clause, this will lead TimescaleDB's planner -to select all data and not perform constraint exclusion to exclude chunks from -further processing, which would be less performant. +# time_bucket_gapfill() Community +The `time_bucket_gapfill` function works similar to `time_bucket` but also +activates gap filling for the interval between `start` and `finish`. It can only +be used with an aggregation query. Values outside of `start` and `finish` pass +through but no gap filling is done outside of the specified range. + + +The `time_bucket_gapfill` function must be a top-level expression in a query or +subquery, as shown in these examples. You cannot, for example, do something like +`round(time_bucket_gapfill(...))` or cast the result of the gapfill call. The +only exception is if you use it as a subquery, where the outer query does the +type cast. -The `time_bucket_gapfill` must be a top-level expression in a query or -subquery, as shown in the above examples. You cannot, for example, do -something like `round(time_bucket_gapfill(...))` or cast the result of the gapfill -call (unless as a subquery where the outer query does the type cast). +For more information about gapfilling and interpolation functions, see the +[hyperfunctions documentation][hyperfunctions-gapfilling]. 
-### Required Arguments +## Required arguments |Name|Type|Description| -|---|---|---| -| `bucket_width` | INTERVAL | A PostgreSQL time interval for how long each bucket is | -| `time` | TIMESTAMP | The timestamp to bucket | +|-|-|-| +|`bucket_width`|INTERVAL|A PostgreSQL time interval for how long each bucket is| +|`time`|TIMESTAMP|The timestamp to bucket| -### Optional Arguments +## Optional arguments |Name|Type|Description| -|---|---|---| -| `start` | TIMESTAMP | The start of the gapfill period | -| `finish` | TIMESTAMP | The end of the gapfill period | +|-|-|-| +|`start`|TIMESTAMP|The start of the gapfill period| +|`finish`|TIMESTAMP|The end of the gapfill period| -Note that explicitly provided `start` and `stop` or derived from WHERE clause values -need to be simple expressions. Such expressions should be evaluated to constants -at the query planning. For example, simple expressions can contain constants or -call to `now()`, but cannot reference to columns of a table. +In TimescaleDB 1.3 and later, `start` and `finish` are optional arguments. If +they are not supplied, the parameters are inferred from the `WHERE` clause. We +recommend using a `WHERE` clause if possible, instead of `start` and `finish` +arguments. This is because `start` and `finish` arguments do not filter input +rows. If you do not provide a `WHERE` clause, TimescaleDB's planner selects all +data, and does not perform constraint exclusion to exclude chunks from further +processing, which is less performant. -### For Integer Time Inputs +Values explicitly provided in `start` and `stop` arguments, or values derived +from `WHERE` clause values, must be simple expressions. They should be evaluated +to constants at query planning. For example, simple expressions can contain +constants or call to `now()`, but cannot reference columns of a table. -#### Required Arguments +## For integer time inputs -|Name|Type|Description| -|---|---|---| -| `bucket_width` | INTEGER | integer interval for how long each bucket is | -| `time` | INTEGER | The timestamp to bucket | - -### Optional Arguments +### Required arguments |Name|Type|Description| -|---|---|---| -| `start` | INTEGER | The start of the gapfill period | -| `finish` | INTEGER | The end of the gapfill period | - -Starting with version 1.3.0 `start` and `finish` are optional arguments and will -be inferred from the WHERE clause if not supplied as arguments. +|-|-|-| +|`bucket_width`|INTEGER|integer interval for how long each bucket is| +|`time`|INTEGER|The timestamp to bucket| -### Sample Usage - -Get the metric value every day over the last 7 days: +## Optional arguments +|Name|Type|Description| +|-|-|-| +|`start`|INTEGER|The start of the gapfill period| +|`finish`|INTEGER|The end of the gapfill period| + +In TimescaleDB 1.3 and later, `start` and `finish` are optional arguments. If +they are not supplied, the parameters are inferred from the `WHERE` clause. We +recommend using a `WHERE` clause if possible, instead of `start` and `finish` +arguments. This is because `start` and `finish` arguments do not filter input +rows. If you do not provide a `WHERE` clause, TimescaleDB's planner selects all +data, and does not perform constraint exclusion to exclude chunks from further +processing, which is less performant. 
+ +### Sample usage +Get the metric value every day over the last seven days: ```sql SELECT time_bucket_gapfill('1 day', time) AS day, @@ -85,8 +90,8 @@ ORDER BY day; (7 row) ``` -Get the metric value every day over the last 7 days carrying forward the previous seen value if none is available in an interval: - +Get the metric value every day over the last seven days, carrying forward the +previous seen value if none is available in an interval: ```sql SELECT time_bucket_gapfill('1 day', time) AS day, @@ -109,8 +114,8 @@ ORDER BY day; 2019-01-16 01:00:00+01 | 1 | 9.0 | 9.0 ``` -Get the metric value every day over the last 7 days interpolating missing values: - +Get the metric value every day over the last seven days, interpolating missing +values: ```sql SELECT time_bucket_gapfill('5 minutes', time) AS day, @@ -132,3 +137,6 @@ ORDER BY day; 2019-01-15 01:00:00+01 | 1 | 8.0 | 8.0 2019-01-16 01:00:00+01 | 1 | 9.0 | 9.0 ``` + + +[hyperfunctions-gapfilling]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/gapfilling-interpolation/ diff --git a/api/time_delta.md b/api/time_delta.md new file mode 100644 index 000000000000..cd4e58280d23 --- /dev/null +++ b/api/time_delta.md @@ -0,0 +1,44 @@ +# time_delta() +The observed change in time. Calculated by subtracting the first observed time +from the last observed time over the period aggregated. Measured in seconds. + +```sql +time_delta( + summary CounterSummary +) RETURNS DOUBLE PRECISION +``` + +For more information about counter aggregation functions, see the +[hyperfunctions documentation][hyperfunctions-counter-agg]. + +## Required arguments + +|Name|Type|Description| +|-|-|-| +|summary|CounterSummary|The input CounterSummary from a counter_agg call| + +## Returns + +|Name|Type|Description| +|-|-|-| +|time_delta|DOUBLE PRECISION|The total duration in seconds between the first and last observed times in the CounterSummary| + +## Sample usage + +```sql +SELECT + id, + bucket, + time_delta(summary) +FROM ( + SELECT + id, + time_bucket('15 min'::interval, ts) AS bucket, + counter_agg(ts, val) AS summary + FROM foo + GROUP BY id, time_bucket('15 min'::interval, ts) +) t +``` + + +[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/ diff --git a/api/time_weight.md b/api/time_weight.md index 514a9185bb5d..77ef614f9703 100644 --- a/api/time_weight.md +++ b/api/time_weight.md @@ -14,6 +14,9 @@ NB: Only two values for `method` are currently supported: `linear` and `LOCF`, a any capitalization is accepted. See [interpolation methods](#interpolation-methods-details) for more information. +For more information about time-weighted average functions, see the +[hyperfunctions documentation][hyperfunctions-time-weight-average]. + ### Required arguments |Name|Type|Description| @@ -50,42 +53,42 @@ FROM t; ``` ## Advanced usage notes -Most cases will work out of the box, but for power users, or those who want to +Most cases will work out of the box, but for power users, or those who want to dive deeper, we've included a bit more context below. ### Interpolation methods details -Discrete time values don't always allow for an obvious calculation of the time -weighted average. In order to calculate a time weighted average we need to choose -how to weight each value. The two methods we currently use are last observation +Discrete time values don't always allow for an obvious calculation of the time +weighted average. 
In order to calculate a time weighted average we need to choose +how to weight each value. The two methods we currently use are last observation carried forward (LOCF) and linear interpolation. -In the LOCF approach, the value is treated as if it remains constant until the -next value is seen. The LOCF approach is commonly used when the sensor or +In the LOCF approach, the value is treated as if it remains constant until the +next value is seen. The LOCF approach is commonly used when the sensor or measurement device sends measurement only when there is a change in value. -The linear interpolation approach treats the values between any two measurements -as if they lie on the line connecting the two measurements. The linear -interpolation approach is used to account for irregularly sampled data where the +The linear interpolation approach treats the values between any two measurements +as if they lie on the line connecting the two measurements. The linear +interpolation approach is used to account for irregularly sampled data where the sensor doesn't provide any guarantees. ### Parallelism and ordering -The time weighted average calculations we perform require a strict ordering of -inputs and therefore the calculations are not parallelizable in the strict -Postgres sense. This is because when Postgres does parallelism it hands out rows -randomly, basically as it sees them to workers. However, if your parallelism can -guarantee disjoint (in time) sets of rows, the algorithm can be parallelized, just -so long as within some time range, all rows go to the same worker. This is the -case for both continuous aggregates and for distributed hypertables (as long as -the partitioning keys are in the group by, though the aggregate itself doesn't +The time weighted average calculations we perform require a strict ordering of +inputs and therefore the calculations are not parallelizable in the strict +Postgres sense. This is because when Postgres does parallelism it hands out rows +randomly, basically as it sees them to workers. However, if your parallelism can +guarantee disjoint (in time) sets of rows, the algorithm can be parallelized, just +so long as within some time range, all rows go to the same worker. This is the +case for both continuous aggregates and for distributed hypertables (as long as +the partitioning keys are in the group by, though the aggregate itself doesn't horribly make sense otherwise). -We throw an error if there is an attempt to combine overlapping `TimeWeightSummaries`, -for instance, in our example above, if you were to try to combine summaries across -`measure_ids` it would error. This is because the interpolation techniques really -only make sense within a given time series determined by a single `measure_id`. -However, given that the time weighted average produced is a dimensionless -quantity, a simple average of time weighted average should better represent the -variation across devices, so the recommendation for things like baselines across +We throw an error if there is an attempt to combine overlapping `TimeWeightSummaries`, +for instance, in our example above, if you were to try to combine summaries across +`measure_ids` it would error. This is because the interpolation techniques really +only make sense within a given time series determined by a single `measure_id`. 
+However, given that the time weighted average produced is a dimensionless +quantity, a simple average of time weighted average should better represent the +variation across devices, so the recommendation for things like baselines across many timeseries would be something like: ```sql @@ -99,17 +102,17 @@ SELECT avg(time_weighted_average) -- use the normal avg function to average our FROM t; ``` -Internally, the first and last points seen as well as the calculated weighted sum -are stored in each `TimeWeightSummary` and used to combine with a neighboring -`TimeWeightSummary` when re-aggregation or the Postgres combine function is called. -In general, the functions support partial aggregation and partitionwise aggregation -in the multinode context, but are not parallelizable (in the Postgres sense, +Internally, the first and last points seen as well as the calculated weighted sum +are stored in each `TimeWeightSummary` and used to combine with a neighboring +`TimeWeightSummary` when re-aggregation or the Postgres combine function is called. +In general, the functions support partial aggregation and partitionwise aggregation +in the multinode context, but are not parallelizable (in the Postgres sense, which requires them to accept potentially overlapping input). -Because they require ordered sets, the aggregates build up a buffer of input -data, sort it and then perform the proper aggregation steps. In cases where -memory is proving to be too small to build up a buffer of points causing OOMs -or other issues, a multi-level aggregate can be useful. Following our example +Because they require ordered sets, the aggregates build up a buffer of input +data, sort it and then perform the proper aggregation steps. In cases where +memory is proving to be too small to build up a buffer of points causing OOMs +or other issues, a multi-level aggregate can be useful. Following our example from above: ```sql @@ -126,3 +129,6 @@ SELECT measure_id, FROM t GROUP BY measure_id; ``` + + +[hyperfunctions-time-weight-average]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/time-weighted-averages/ diff --git a/api/uddsketch.md b/api/uddsketch.md index a966d979bcdb..312b584c2752 100644 --- a/api/uddsketch.md +++ b/api/uddsketch.md @@ -1,42 +1,4 @@ -## uddsketch() Toolkit -Timescale's UddSketch implementation is provided as an aggregate function in -PostgreSQL. The output is currently only suitable as input to the -the percentile approximation functions. This can be directly as part of a one-off -SQL query, or as transient data stored in a Continuous Aggregate that is queried -later with these functions and using the UddSketch data as input. - -## Implementation details - -[UddSketch](https://arxiv.org/pdf/2004.08604.pdf) is a specialization of the -[DDSketch](https://arxiv.org/pdf/1908.10693.pdf) data structure. It follows the -same approach of breaking the data range into a series of logarithmically sized -buckets such that it can guarantee a maximum relative error for any percentile -estimate as long as it knows which bucket that percentile falls in. - -Where UddSketch differs from DDSketch is in its behavior when the number of buckets -required by a set of values exceeds some predefined maximum. In these circumstances -DDSketch will maintain it's original error bound, but only for a subset of the -range of percentiles. UddSketch, on the other hand, will combine buckets in such -a way that it loosens the error bound, but can still estimate all percentile values. 
-
-As an example, assume both sketches were trying to capture an large set of values
-to be able to estimate percentiles with 1% relative error but were given too few
-buckets to do so. The DDSketch implementation would still guarantee 1% relative
-error, but may only be able to provides estimates in the range (0.05, 0.95). The
-UddSketch implementation however, might end up only able to guarantee 2% relative
-error, but would still be able to estimate all percentiles at that error.
-
-Timescale's UddSketch implementation is provided as an aggregate function in
-PostgreSQL. It does not support moving-aggregate mode, and is not a ordered-set
-aggregate. It currently only works with `DOUBLE PRECISION` types, but we're
-intending to relax this constraint as needed. UddSketches are partializable and
-are good candidates for [continuous aggregation](https://docs.timescale.com/latest/using-timescaledb/continuous-aggregates).
-
-It's also worth noting that attempting to set the relative error too small or
-large can result in breaking behavior. For this reason, the error is required
-to fall into the range [1.0e-12, 1.0).
-
-## uddsketch() usage
+# uddsketch() Toolkit

 ```SQL ,ignore
 uddsketch(
@@ -46,45 +8,57 @@ uddsketch(
 ) RETURNS UddSketch
 ```

-This will construct and return a new UddSketch with at most `size` buckets.
-The maximum relative error of the UddSketch will be bounded by `max_error` unless
-it is impossible to do so while with the bucket bound. If the sketch has had to
-combine buckets, the new error can be found with the [uddsketch_error](#error)
-command.
+This constructs and returns a new `UddSketch` with at most `size` buckets.
+The maximum relative error of the `UddSketch` is bounded by `max_error` unless
+it is impossible to do so within the bucket bound.

-Note that since the error will be increased automatically (roughly doubling at
-each step) as the number of buckets is exceeded, it is probably worth erring on
-the side of too small unless you have a good understanding of exactly what your
-error should be.
+If the sketch has to combine buckets, the new error can be found with the
+[uddsketch_error][error] command. Because the error is increased automatically
+(roughly doubling at each step) as the number of buckets is exceeded, start
+smaller unless you have a good understanding of exactly what your error should
+be.

-### Required arguments
-|Name| Type |Description|
-|---|---|---|
-| `size` | `INTEGER` | Maximum number of buckets in the sketch. Providing a larger value here will make it more likely that the aggregate will able to maintain the desired error, though will potentially increase the memory usage. |
-| `max_error` | `DOUBLE PRECISION` | This is the starting maximum relative error of the sketch, as a multiple of the actual value. The true error may exceed this if too few buckets are provided for the data distribution. |
-| `value` | `DOUBLE PRECISION` | Column to aggregate.
+Timescale's `UddSketch` implementation is provided as an aggregate function in
+PostgreSQL. The output is currently only suitable as input to the
+percentile approximation functions. It can be used directly as part of a one-off
+SQL query, or as transient data stored in a continuous aggregate that is queried
+later with these functions and using the `UddSketch` data as input.

+* For more information about percentile approximation functions, see the
+  [hyperfunctions documentation][hyperfunctions-percentile-approx].
+* For some more technical details and usage examples of this algorithm,
+  see the [developer documentation][gh-uddsketch].

-### Returns
+## Required arguments
+|Name| Type |Description|
+|-|-|-|
+|`size`|`INTEGER`|Maximum number of buckets in the sketch. Providing a larger value here will make it more likely that the aggregate will be able to maintain the desired error, though will potentially increase the memory usage.|
+|`max_error`|`DOUBLE PRECISION`|This is the starting maximum relative error of the sketch, as a multiple of the actual value. The true error may exceed this if too few buckets are provided for the data distribution.|
+|`value`|`DOUBLE PRECISION`|Column to aggregate|

-|Column|Type|Description|
-|---|---|---|
-| `uddsketch` | `UddSketch` | A UddSketch object which may be passed to other UddSketch APIs. |
+## Returns

+|Column|Type|Description|
+|-|-|-|
+|`uddsketch`|`UddSketch`|A UddSketch object which can be passed to other UddSketch APIs|

-### Sample usage
-For this example assume we have a table 'samples' with a column 'data' holding
-`DOUBLE PRECISION` values. The following will simply return a sketch over that column
+## Sample usage
+This example uses a table called `samples` with a column called `data` that
+holds `DOUBLE PRECISION` values. This query returns a `UddSketch` over that
+column:

 ```SQL
 SELECT uddsketch(100, 0.01, data) FROM samples;
 ```

-It may be more useful to build a view from the aggregate that we can later pass
-to other uddsketch functions.
-
+This example builds a view from the aggregate that you can pass to other
+`UddSketch` functions:
 ```SQL
 CREATE VIEW sketch AS SELECT uddsketch(100, 0.01, data) FROM samples;
 ```
+
+[hyperfunctions-percentile-approx]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/
+[gh-uddsketch]: https://github.com/timescale/timescaledb-toolkit/blob/main/docs/uddsketch.md
+[error]: /hyperfunctions/percentile-approximation/error/
diff --git a/api/with_bounds.md b/api/with_bounds.md
new file mode 100644
index 000000000000..994d8729a9f1
--- /dev/null
+++ b/api/with_bounds.md
@@ -0,0 +1,53 @@
+# with_bounds()
+A utility function to add bounds to an already-computed CounterSummary. The
+bounds represent the outer limits of the timestamps allowed for this
+CounterSummary as well as the edges of the range to extrapolate to in functions
+that allow it.
+
+```sql
+with_bounds(
+    summary CounterSummary,
+    bounds TSTZRANGE
+) RETURNS CounterSummary
+```
+
+For more information about counter aggregation functions, see the
+[hyperfunctions documentation][hyperfunctions-counter-agg].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|summary|CounterSummary|The input CounterSummary|
+|bounds|TSTZRANGE|A range of timestamptz representing the largest and smallest allowed times in this CounterSummary|
+
+## Returns
+
+|Name|Type|Description|
+|-|-|-|
+|counter_agg|CounterSummary|A CounterSummary object that can be passed to accessor functions or other objects in the counter aggregate API|
+
+## Sample usage
+
+```sql
+SELECT
+    id,
+    bucket,
+    extrapolated_rate(
+        with_bounds(
+            summary,
+            time_bucket_range('15 min'::interval, bucket)
+        )
+    )
+FROM (
+    SELECT
+        id,
+        time_bucket('15 min'::interval, ts) AS bucket,
+        counter_agg(ts, val) AS summary
+    FROM foo
+    GROUP BY id, time_bucket('15 min'::interval, ts)
+) t
+```
+
+
+[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
diff --git a/timescaledb/how-to-guides/hyperfunctions/about-hyperfunctions.md b/timescaledb/how-to-guides/hyperfunctions/about-hyperfunctions.md
new file mode 100644
index 000000000000..d7dad52e2346
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/about-hyperfunctions.md
@@ -0,0 +1,140 @@
+# About Timescale hyperfunctions
+Timescale hyperfunctions are a specialized set of functions that allow you to
+analyze time-series data. You can use hyperfunctions to analyze anything you
+have stored as time-series data, including IoT devices, IT systems, marketing
+analytics, user behavior, financial metrics, and cryptocurrency.
+
+Hyperfunctions allow you to perform critical time-series queries quickly,
+analyze time-series data, and extract meaningful information. They aim to
+identify, build, and combine all of the functionality SQL needs to perform
+time-series analysis into a single extension.
+
+Some hyperfunctions are included in the default TimescaleDB product. For
+additional hyperfunctions, you need to install the
+[Timescale Toolkit][install-toolkit] PostgreSQL extension.
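+
+If the Toolkit package is available on your system, enabling it is typically a
+single statement. This is a sketch only; the extension name
+`timescaledb_toolkit` is taken from the Toolkit project, and your installation
+method can differ:
+
+```sql
+-- Enable the Toolkit extension in the current database.
+CREATE EXTENSION IF NOT EXISTS timescaledb_toolkit;
+
+-- After upgrading the Toolkit package, update the extension objects.
+ALTER EXTENSION timescaledb_toolkit UPDATE;
+```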
+
+|Hyperfunction family|Types|API Calls|Included by default|Toolkit required|
+|-|-|-|-|-|
+|Approximate count distincts|Hyperloglog|`hyperloglog`|❌|✅|
+|||`rollup`|❌|✅|
+|||`distinct_count`|❌|✅|
+|||`stderror`|❌|✅|
+|Statistical aggregates|Statistical functions|`average`|✅|❌|
+|||`stats_agg`|❌|✅|
+|||`rollup`|❌|✅|
+|||`rolling`|❌|✅|
+|||`sum`|✅|❌|
+|||`num_vals`|✅|❌|
+|||`stddev`|✅|❌|
+|||`variance`|✅|❌|
+|||`skewness`|✅|❌|
+|||`kurtosis`|✅|❌|
+||Regression functions|`slope`|✅|❌|
+|||`intercept`|✅|❌|
+|||`x_intercept`|✅|❌|
+|||`corr`|✅|❌|
+|||`covariance`|✅|❌|
+|||`skewness`|✅|❌|
+|||`kurtosis`|✅|❌|
+|||`determination_coeff`|✅|❌|
+|Gapfilling and interpolation|Time bucket gapfill|`time_bucket_gapfill`|✅|❌|
+||Last observation carried forward|`locf`|✅|❌|
+|||`interpolate`|✅|❌|
+|Percentile approximation|Approximate percentile|`percentile_agg`|❌|✅|
+|||`approx_percentile`|❌|✅|
+|||`approx_percentile_rank`|❌|✅|
+|||`rollup`|❌|✅|
+|||`max_val`|✅|❌|
+|||`mean`|✅|❌|
+|||`error`|✅|❌|
+|||`min_val`|✅|❌|
+|||`num_vals`|✅|❌|
+||Advanced aggregation methods|`uddsketch`|❌|✅|
+|||`tdigest`|❌|✅|
+|Counter aggregation|Counter aggregates|`counter_agg`|❌|✅|
+|||`rollup`|❌|✅|
+|||`corr`|✅|❌|
+|||`counter_zero_time`|✅|❌|
+|||`delta`|✅|❌|
+|||`extrapolated_delta`|✅|❌|
+|||`extrapolated_rate`|✅|❌|
+|||`idelta`|✅|❌|
+|||`intercept`|✅|❌|
+|||`irate`|✅|❌|
+|||`num_changes`|✅|❌|
+|||`num_elements`|✅|❌|
+|||`num_resets`|✅|❌|
+|||`rate`|✅|❌|
+|||`slope`|✅|❌|
+|||`time_delta`|✅|❌|
+|||`with_bounds`|❌|✅|
+|Time-weighted averages|Time-weighted averages|`time_weight`|❌|✅|
+|||`rollup`|❌|✅|
+|||`average`|❌|✅|
+
+For more information about each of the API calls listed in this table, see our
+[hyperfunction API documentation][api-hyperfunctions].
+
+## Function pipelines
+Function pipelines are an experimental feature designed to radically improve
+the developer ergonomics of analyzing data in PostgreSQL and SQL, by applying
+principles from functional programming and popular tools like Python’s pandas
+and PromQL.
+
+SQL is the best language for data analysis, but it is not perfect, and at times
+can get quite unwieldy. For example, this query gets data from the last day from
+the measurements table, sorts the data by the time column, calculates the delta
+between the values, takes the absolute value of the delta, and then takes the
+sum of the result of the previous steps:
+```SQL
+SELECT device_id,
+    sum(abs_delta) as volatility
+FROM (
+    SELECT device_id,
+        abs(val - lag(val) OVER (PARTITION BY device_id ORDER BY ts)) as abs_delta
+    FROM measurements
+    WHERE ts >= now() - '1 day'::interval
+) calc_delta
+GROUP BY device_id;
+```
+
+You can express the same query with a function pipeline like this:
+```SQL
+SELECT device_id,
+    timevector(ts, val) -> sort() -> delta() -> abs() -> sum() as volatility
+FROM measurements
+WHERE ts >= now() - '1 day'::interval
+GROUP BY device_id;
+```
+
+Function pipelines are completely SQL compliant, meaning that any tool that
+speaks SQL is able to support data analysis using function pipelines.
+
+For more information about how function pipelines work, read our
+[blog post][blog-function-pipelines].
+
+## Toolkit feature development
+Timescale Toolkit features are developed in the open. As features are developed,
+they are categorized as experimental, beta, stable, or deprecated. This
+documentation covers the stable features, but more information on our
+experimental features in development can be found in the
+[Toolkit repository][gh-docs].
+
+## Contribute to Timescale Toolkit
+We want and need your feedback!
+What are the frustrating parts of analyzing time-series data? What takes far
+more code than you feel it should? What runs slowly, or only runs quickly after
+many rewrites? We want to solve community-wide problems and incorporate as much
+feedback as possible.
+
+* Join the [discussion][gh-discussions].
+* Check out the [proposed features][gh-proposed].
+* Explore the current [feature requests][gh-requests].
+* Add your own [feature request][gh-newissue].
+
+
+[install-toolkit]: /how-to-guides/hyperfunctions/install-toolkit
+[api-hyperfunctions]: /api/:currentVersion:/hyperfunctions
+[gh-docs]: https://github.com/timescale/timescale-analytics/tree/main/docs
+[blog-function-pipelines]: http://tsdb.co/function-pipelines
+[gh-discussions]: https://github.com/timescale/timescale-analytics/discussions
+[gh-proposed]: https://github.com/timescale/timescale-analytics/labels/proposed-feature
+[gh-requests]: https://github.com/timescale/timescale-analytics/labels/feature-request
+[gh-newissue]: https://github.com/timescale/timescale-analytics/issues/new?assignees=&labels=feature-request&template=feature-request.md&title=
diff --git a/timescaledb/how-to-guides/hyperfunctions/advanced-agg.md b/timescaledb/how-to-guides/hyperfunctions/advanced-agg.md
new file mode 100644
index 000000000000..f00ff8e24e77
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/advanced-agg.md
@@ -0,0 +1,75 @@
+# Percentile approximation advanced aggregation methods
+Timescale uses approximation algorithms to calculate a percentile without
+requiring all of the data. This also makes them more compatible with continuous
+aggregates. By default, TimescaleDB uses `uddsketch`, but you can also choose to
+use `tdigest`. This section describes the different methods, and helps you to
+decide which one you should use.
+
+`uddsketch` is the default algorithm. It uses exponentially sized buckets to
+guarantee the approximation falls within a known error range, relative to the
+true discrete percentile. This algorithm offers the ability to tune the size and
+maximum error target of the sketch.
+
+`tdigest` buckets data more aggressively toward the center of the quantile
+range, giving it greater accuracy at the tails of the range, around 0.001 or
+0.995.
+
+## Choose the right algorithm
+Each algorithm has different features, which can make one better than another
+depending on your use case. Here are some of the differences to consider when
+choosing an algorithm.
+
+Before you begin, it is important to understand that the formal definition of
+a percentile is imprecise, and there are different methods for determining what
+the true percentile actually is. In PostgreSQL, given a target percentile `p`,
+[`percentile_disc`][pg-percentile] returns the smallest element of a set, so
+that `p` percent of the set is less than that element. However,
+[`percentile_cont`][pg-percentile] returns an interpolated value between the two
+nearest matches for `p`. In practice, the difference between these methods is
+very small, but if it matters to your use case, keep in mind that `tdigest`
+approximates the continuous percentile, while `uddsketch` provides an estimate
+of the discrete value.
+
+Think about the types of percentiles you're most interested in. `tdigest` is
+optimized for more accurate estimates at the extremes, and less accurate
+estimates near the median. If your workflow involves estimating 99th
+percentiles, then choose `tdigest`.
+If you're more concerned about getting highly accurate median estimates,
+choose `uddsketch`.
+
+The algorithms differ in the way they estimate data. `uddsketch` has a stable
+bucketing function, so it always returns the same percentile estimate for the
+same underlying data, regardless of how it is ordered or re-aggregated. On the
+other hand, `tdigest` builds up incremental buckets based on the average of
+nearby points, which can result in some subtle differences in estimates based on
+the same data, unless the order and batching of the aggregation is strictly
+controlled, which is sometimes difficult to do in PostgreSQL. If stable
+estimates are important to you, choose `uddsketch`.
+
+Calculating precise error bars for `tdigest` can be difficult, especially when
+merging multiple sub-digests into a larger one. This can occur through summary
+aggregation, or parallelization of the normal point aggregate. If you need to
+tightly characterize your errors, choose `uddsketch`. However, because
+`uddsketch` uses exponential bucketing to provide a guaranteed relative error,
+it can cause some wildly varying absolute errors if the dataset covers a large
+range. For example, if the data is evenly distributed over the range `[1,100]`,
+estimates at the high end of the percentile range have about 100 times the
+absolute error of those at the low end of the range. This gets much more extreme
+if the data range is `[0,100]`. If having a stable absolute error is important to
+your use case, choose `tdigest`.
+
+While both algorithms will probably get smaller and faster with future
+optimizations, `uddsketch` generally requires a smaller memory footprint than
+`tdigest`, and a correspondingly smaller disk footprint for any continuous
+aggregates. Regardless of the algorithm you choose, the best way to improve the
+accuracy of your percentile estimates is to increase the number of buckets,
+which is simpler to do with `uddsketch`. If your use case does not get a clear
+benefit from using `tdigest`, the default `uddsketch` is your best choice.
+
+For some more technical details and usage examples of the different algorithms,
+see the developer documentation for [uddsketch][gh-uddsketch] and
+[tdigest][gh-tdigest].
+
+[pg-percentile]: https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE
+[gh-uddsketch]: https://github.com/timescale/timescaledb-toolkit/blob/main/docs/uddsketch.md
+[gh-tdigest]: https://github.com/timescale/timescaledb-toolkit/blob/main/docs/tdigest.md
diff --git a/timescaledb/how-to-guides/hyperfunctions/approx-count-distincts.md b/timescaledb/how-to-guides/hyperfunctions/approx-count-distincts.md
new file mode 100644
index 000000000000..984e8d650a53
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/approx-count-distincts.md
@@ -0,0 +1,10 @@
+# Approximate count distincts
+Approximate count distincts are used to find the number of unique values, or
+cardinality, in a large dataset. When you calculate cardinality in a dataset,
+the time it takes to process the query is proportional to how large the dataset
+is. So if you wanted to find the cardinality of a dataset that contained only 20
+entries, the calculation would be very fast. Finding the cardinality of a
+dataset that contains 20,000 or 20 million entries, however, can take a
+significant amount of time and compute resources.
+Approximate count distincts do not calculate the exact cardinality of a
+dataset, but rather estimate the number of unique values, in order to improve
+compute time.
diff --git a/timescaledb/how-to-guides/hyperfunctions/approximate-percentile.md b/timescaledb/how-to-guides/hyperfunctions/approximate-percentile.md
new file mode 100644
index 000000000000..4c80f6503e07
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/approximate-percentile.md
@@ -0,0 +1,54 @@
+# Approximate percentiles
+Timescale uses approximation algorithms to calculate a percentile without
+requiring all of the data. This also makes them more compatible with continuous
+aggregates.
+
+By default, Timescale Toolkit uses `uddsketch`, but you can also choose to use
+`tdigest`. For more information about these algorithms, see the
+[advanced aggregation methods][advanced-agg] documentation.
+
+## Run an approximate percentile query
+In this procedure, we use an example table called `response_times` that contains
+information about how long a server takes to respond to API calls.
+
+
+
+### Running an approximate percentile query
+1. At the `psql` prompt, create a continuous aggregate that computes the
+   daily aggregates:
+   ```sql
+   CREATE MATERIALIZED VIEW response_times_daily
+   WITH (timescaledb.continuous)
+   AS SELECT
+      time_bucket('1 day'::interval, ts) as bucket,
+      percentile_agg(response_time_ms)
+   FROM response_times
+   GROUP BY 1;
+   ```
+1. Re-aggregate the aggregate to get the last 30 days, and look for the 95th
+   percentile:
+   ```sql
+   SELECT approx_percentile(0.95, percentile_agg(percentile_agg)) as threshold
+   FROM response_times_daily
+   WHERE bucket >= time_bucket('1 day'::interval, now() - '30 days'::interval);
+   ```
+1. You can also create an alert:
+   ```sql
+   WITH t as (SELECT approx_percentile(0.95, percentile_agg(percentile_agg)) as threshold
+   FROM response_times_daily
+   WHERE bucket >= time_bucket('1 day'::interval, now() - '30 days'::interval))
+
+   SELECT count(*)
+   FROM response_times
+   WHERE ts > now() - '1 minute'::interval
+   AND response_time_ms > (SELECT threshold FROM t);
+   ```
+
+
+
+For more information about percentile approximation API calls, see the
+[hyperfunction API documentation][hyperfunctions-api-approx-percentile].
+
+
+[advanced-agg]: /how-to-guides/hyperfunctions/advanced-agg
+[hyperfunctions-api-approx-percentile]: /api/:currentVersion:/hyperfunctions/percentile-approximation/
diff --git a/timescaledb/how-to-guides/hyperfunctions/counter-aggregation.md b/timescaledb/how-to-guides/hyperfunctions/counter-aggregation.md
new file mode 100644
index 000000000000..9b6f84be53a7
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/counter-aggregation.md
@@ -0,0 +1,15 @@
+# Counter aggregation
+When you are monitoring application performance, there are two main types of
+metrics that you can collect: gauges and counters. Gauges fluctuate up and
+down, like temperature or speed, while counters always increase, like the total
+number of miles travelled in a vehicle.
+
+Counter data usually resets to zero if there is an interruption. Counter
+aggregation functions are used to continue accumulating data while ignoring any
+interruptions or resets.
+
+For more information about counter aggregation API calls, see the
+[hyperfunction API documentation][hyperfunctions-api-counter-agg].
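+
+For example, here is a minimal sketch, assuming a hypothetical `readings` table
+of counter values that includes a reset. The `counter_agg` aggregate and the
+`delta` accessor, described in the counter aggregates section, account for the
+reset instead of treating it as a negative change:
+
+```sql
+-- Hypothetical readings, in time order: 5, 10, 0, 10
+-- (the drop from 10 to 0 is a counter reset)
+SELECT toolkit.delta(
+    toolkit.counter_agg(ts, val)
+) FROM readings;
+-- Returns 15: +5 before the reset, then +10 after it,
+-- rather than the naive last-minus-first value of 5.
+```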
+
+
+[hyperfunctions-api-counter-agg]: /api/:currentVersion:/hyperfunctions/counter_aggs/
diff --git a/timescaledb/how-to-guides/hyperfunctions/counter-aggs.md b/timescaledb/how-to-guides/hyperfunctions/counter-aggs.md
new file mode 100644
index 000000000000..890846764a26
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/counter-aggs.md
@@ -0,0 +1,157 @@
+# Counter aggregates
+When you process counter data, it is usually assumed that if the value of the
+counter goes down, the counter has been reset. For example, if you wanted to
+count the total number of miles travelled in a vehicle, you would expect the
+values to continuously increase: 1, 2, 3, 4, and so on. If the counter reset to
+0, you would expect that this was a new trip, or an entirely new vehicle. This
+can become a problem if you want to continue counting from where you left off,
+rather than resetting to 0. A reset could occur because of a short server
+outage, or for any number of other reasons. To get around this, you can analyze
+counter data by looking at the change over time, which accounts for resets.
+
+Accounting for resets can be difficult to do in SQL, so Timescale has developed
+aggregate and accessor functions that handle calculations for counters in a more
+practical way.
+
+
+Counter aggregates can be used in continuous aggregates, even though they are
+not parallelizable in PostgreSQL. For more information, see the section on
+parallelism and ordering.
+
+
+## Run a counter aggregate query using a delta function
+In this procedure, we are using an example table called `example` that contains
+counter data.
+
+
+
+### Running a counter aggregate query using a delta function
+1. Create a table called `example`:
+   ```sql
+   CREATE TABLE example (
+       measure_id      BIGINT,
+       ts              TIMESTAMPTZ,
+       val             DOUBLE PRECISION,
+       PRIMARY KEY (measure_id, ts)
+   );
+   ```
+1. Create a counter aggregate and the delta accessor function. This gives you
+   the change in the counter's value over the time period, accounting for any
+   resets. This allows you to search for fifteen minute periods where the
+   counter increased by a larger or smaller amount:
+   ```sql
+   SELECT measure_id,
+       toolkit.delta(
+           toolkit.counter_agg(ts, val)
+       )
+   FROM example
+   GROUP BY measure_id;
+   ```
+1. You can also use the `time_bucket` function to produce a series of deltas
+   over fifteen minute increments:
+   ```sql
+   SELECT measure_id,
+       time_bucket('15 min'::interval, ts) as bucket,
+       toolkit.delta(
+           toolkit.counter_agg(ts, val)
+       )
+   FROM example
+   GROUP BY measure_id, time_bucket('15 min'::interval, ts);
+   ```
+
+
+
+## Run a counter aggregate query using an extrapolated delta function
+If your series is less regular, the deltas are affected by the number of samples
+in each fifteen minute period. You can improve this by using the
+`extrapolated_delta` function. To do this, you need to provide bounds that
+define where to extrapolate to. In this example, we use the `time_bucket_range`
+function, which works in the same way as `time_bucket` but produces an open
+ended range of all the times in the bucket. This example also uses a CTE to do
+the counter aggregation, which makes it a little easier to understand what's
+going on in each part.
+
+
+
+### Running a counter aggregate query using an extrapolated delta function
+1. Create a table called `example`:
+   ```sql
+   CREATE TABLE example (
+       measure_id      BIGINT,
+       ts              TIMESTAMPTZ,
+       val             DOUBLE PRECISION,
+       PRIMARY KEY (measure_id, ts)
+   );
+   ```
+1. Create a counter aggregate and the extrapolated delta function:
+   ```sql
+   WITH t as (
+       SELECT measure_id,
+           time_bucket('15 min'::interval, ts) as bucket,
+           toolkit.counter_agg(ts, val,
+               bounds => toolkit.time_bucket_range('15 min'::interval, ts))
+       FROM example
+       GROUP BY measure_id, time_bucket('15 min'::interval, ts))
+   SELECT bucket,
+       toolkit.extrapolated_delta(counter_agg, method => 'prometheus')
+   FROM t;
+   ```


+In this procedure, we used `prometheus` to do the extrapolation. Timescale's
+current extrapolation method is built to mimic the Prometheus project's
+`increase` function, which measures the change of a counter extrapolated to the
+edges of the queried region.




+## Run a counter aggregate query with a continuous aggregate
+Your counter aggregate might be more useful if you make a continuous aggregate
+out of it.
+
+
+
+### Running a counter aggregate query with a continuous aggregate
+1. Create a hypertable partitioned on the `ts` column:
+   ```sql
+   SELECT create_hypertable('example', 'ts', chunk_time_interval => '15 days'::interval, migrate_data => true);
+   ```
+1. Create the continuous aggregate:
+   ```sql
+   CREATE MATERIALIZED VIEW example_15
+   WITH (timescaledb.continuous)
+   AS SELECT measure_id,
+       time_bucket('15 min'::interval, ts) as bucket,
+       toolkit.counter_agg(ts, val, bounds => toolkit.time_bucket_range('15 min'::interval, ts))
+   FROM example
+   GROUP BY measure_id, time_bucket('15 min'::interval, ts);
+   ```
+1. You can also re-aggregate from the continuous aggregate into a larger
+   bucket size:
+   ```sql
+   SELECT
+       measure_id,
+       time_bucket('1 day'::interval, bucket),
+       toolkit.delta(
+           toolkit.rollup(counter_agg)
+       )
+   FROM example_15
+   GROUP BY measure_id, time_bucket('1 day'::interval, bucket);
+   ```
+
+
+
+## Parallelism and ordering
+The counter reset calculations require a strict ordering of inputs, which means
+they are not parallelizable in PostgreSQL. This is because PostgreSQL handles
+parallelism by issuing rows randomly to workers. However, the algorithm can be
+parallelized if each worker is guaranteed a set of rows that is disjoint in
+time, so that all rows in a given time range go to the same worker. This is the
+case for both continuous aggregates and for distributed hypertables, as long as
+the partitioning keys are in the `group by` clause, although the aggregate is
+not very meaningful without them anyway.
+
+For more information about parallelism and ordering, see our
+[developer documentation][gh-parallelism-ordering].
+
+
+[gh-parallelism-ordering]: https://github.com/timescale/timescaledb-toolkit/blob/main/docs/counter_agg.md#counter-agg-ordering
diff --git a/timescaledb/how-to-guides/hyperfunctions/gapfilling-interpolation.md b/timescaledb/how-to-guides/hyperfunctions/gapfilling-interpolation.md
new file mode 100644
index 000000000000..44dffd38aa97
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/gapfilling-interpolation.md
@@ -0,0 +1,22 @@
+# Gapfilling and interpolation
+Most time-series data analysis techniques aggregate data into fixed time
+intervals, which smooths the data and makes it easier to interpret and analyze.
+When you write queries for data in this form, you need an efficient way to
+aggregate raw observations, which are often noisy and irregular, into fixed
+time intervals. TimescaleDB does this using time bucketing, which gives a clear
+picture of the important data trends using a concise, declarative SQL query.
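+
+For example, here is a minimal sketch of time bucketing, assuming a
+hypothetical `conditions` table with `ts` and `temperature` columns, that
+aggregates raw readings into five-minute averages:
+
+```sql
+-- Average the raw readings within each five-minute bucket:
+SELECT time_bucket('5 minutes'::interval, ts) AS bucket,
+    avg(temperature) AS avg_temperature
+FROM conditions
+GROUP BY bucket
+ORDER BY bucket;
+```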
+
+Sorting data into time buckets works well in most cases, but problems can arise
+if there are gaps in the data. This can happen if you have irregular sampling
+intervals, or if you have experienced an outage of some sort. You can use a
+gapfilling function to create additional rows of data in any gaps, ensuring that
+the returned rows are in chronological order, and contiguous.
+
+* For more information about how gapfilling works, read our
+  [gapfilling blog][blog-gapfilling].
+* For more information about gapfilling and interpolation API calls, see the
+  [hyperfunction API documentation][hyperfunctions-api-gapfilling].
+
+
+[blog-gapfilling]: https://blog.timescale.com/blog/sql-functions-for-time-series-analysis/
+[hyperfunctions-api-gapfilling]: /api/:currentVersion:/hyperfunctions/gapfilling-interpolation/
diff --git a/timescaledb/how-to-guides/hyperfunctions/hyperloglog.md b/timescaledb/how-to-guides/hyperfunctions/hyperloglog.md
new file mode 100644
index 000000000000..1a3931c75199
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/hyperloglog.md
@@ -0,0 +1,26 @@
+# Hyperloglog
+Hyperloglog is used to find the cardinality of very large datasets. If you want
+to find the number of unique values, or cardinality, in a dataset, the time it
+takes to process this query is proportional to how large the dataset is. So if
+you wanted to find the cardinality of a dataset that contained only 20 entries,
+the calculation would be very fast. Finding the cardinality of a dataset that
+contains 20,000 or 20 million entries, however, can take a significant amount of
+time and compute resources.
+
+Hyperloglog does not calculate the exact cardinality of a dataset, but rather
+estimates the number of unique values. It does this by converting the original
+data into a hash of random numbers that represents the cardinality of the
+dataset. This is not a perfect calculation of the cardinality, but it is usually
+within a margin of error of 2%.
+
+The benefit of hyperloglog on time-series data is that it can continue to
+calculate the approximate cardinality of a dataset as it changes over time. It
+does this by adding an entry to the hyperloglog hash as new data is retrieved,
+rather than recalculating the result for the entire dataset every time it is
+needed. This makes it an ideal candidate for use with continuous aggregates.
+
+For more information about approximate count distinct API calls, see the
+[hyperfunction API documentation][hyperfunctions-api-approx-count-distincts].
+
+
+[hyperfunctions-api-approx-count-distincts]: /api/:currentVersion:/hyperfunctions/approx_count_distincts/
diff --git a/timescaledb/how-to-guides/hyperfunctions/index.md b/timescaledb/how-to-guides/hyperfunctions/index.md
new file mode 100644
index 000000000000..f9e82a5b2cca
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/index.md
@@ -0,0 +1,36 @@
+# Hyperfunctions
+Hyperfunctions allow you to perform critical time-series queries quickly,
+analyze time-series data, and extract meaningful information.
+
+Some hyperfunctions are included in the default TimescaleDB product. For
+additional hyperfunctions, you need to install the
+[Timescale Toolkit][install-toolkit] PostgreSQL extension.
+
+* [Learn about hyperfunctions][about-hyperfunctions] to understand how they
+  work before you begin using them.
+* Install the [Toolkit extension][install-toolkit] to access more
+  hyperfunctions.
+* Use the [approximate count distinct][hyperfunctions-approx-count-distinct]
+  functions.
+* Use the [statistical aggregate][hyperfunctions-stats-agg]
+  functions.
+* Use the [gapfilling and interpolation][hyperfunctions-gapfilling]
+  functions.
+* Use the [approximate percentile][hyperfunctions-approximate-percentile]
+  functions.
+* Use the [counter aggregation][hyperfunctions-counteragg] functions.
+* Use the [time-weighted average][hyperfunctions-time-weighted-averages]
+  functions.
+
+For more information about hyperfunctions, read our [blog post][hyperfunctions-blog].
+
+
+[about-hyperfunctions]: /how-to-guides/hyperfunctions/about-hyperfunctions
+[install-toolkit]: /how-to-guides/hyperfunctions/install-toolkit
+[hyperfunctions-approx-count-distinct]: /how-to-guides/hyperfunctions/approx-count-distincts
+[hyperfunctions-stats-agg]: /how-to-guides/hyperfunctions/stats-aggs
+[hyperfunctions-gapfilling]: /how-to-guides/hyperfunctions/gapfilling-interpolation
+[hyperfunctions-approximate-percentile]: /how-to-guides/hyperfunctions/approximate-percentile
+[hyperfunctions-time-weighted-averages]: /how-to-guides/hyperfunctions/time-weighted-averages
+[hyperfunctions-counteragg]: /how-to-guides/hyperfunctions/counter-aggregation
+[hyperfunctions-blog]: https://blog.timescale.com/blog/time-series-analytics-for-postgresql-introducing-the-timescale-analytics-project/
diff --git a/timescaledb/how-to-guides/hyperfunctions/install-toolkit.md b/timescaledb/how-to-guides/hyperfunctions/install-toolkit.md
new file mode 100644
index 000000000000..f0285fef46e3
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/install-toolkit.md
@@ -0,0 +1,54 @@
+# Install TimescaleDB Toolkit
+Some hyperfunctions are included in the default TimescaleDB product. For
+additional hyperfunctions, you need to install the Timescale Toolkit PostgreSQL
+extension.
+
+If you are using [Timescale Cloud][], the Toolkit is already installed.
+
+On [Managed TimescaleDB][], run this command on each database you want to use
+the Toolkit with:
+```sql
+CREATE EXTENSION timescaledb_toolkit;
+```
+
+You can update an installed version of the Toolkit using this command:
+```sql
+ALTER EXTENSION timescaledb_toolkit UPDATE;
+```
+
+## Install Toolkit on self-hosted TimescaleDB
+If you are hosting your own TimescaleDB database, you can install the Toolkit
+extension from the command prompt.
+
+
+
+### Installing Toolkit on self-hosted TimescaleDB
+1. The extension requires the `rust`, `rustfmt`, `clang`, and `pgx` packages, as
+   well as the PostgreSQL headers for your installed version of PostgreSQL.
+   Install these using your native package manager. For instructions on how to
+   install Rust, see the [Rust installation instructions][rust-install].
+1. Install the TimescaleDB `pgx` package using Cargo:
+   ```bash
+   cargo install --git https://github.com/JLockerman/pgx.git --branch timescale2 cargo-pgx && \
+   cargo pgx init --pg13 pg_config
+   ```
+1. Clone the Toolkit repository, and change into the new directory:
+   ```bash
+   git clone https://github.com/timescale/timescaledb-toolkit && \
+   cd timescaledb-toolkit/extension
+   ```
+1. Use Cargo to complete the installation:
+   ```bash
+   cargo pgx install --release && \
+   cargo run --manifest-path ../tools/post-install/Cargo.toml -- pg_config
+   ```
+
+
+
+For more information about installing Toolkit from source, see our
+[developer documentation][toolkit-gh-docs].
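+
+One way to check that the extension is available and installed is to query the
+standard PostgreSQL catalog view `pg_available_extensions`:
+
+```sql
+-- Shows the default and currently installed Toolkit versions, if any:
+SELECT name, default_version, installed_version
+FROM pg_available_extensions
+WHERE name = 'timescaledb_toolkit';
+```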
+
+[Timescale Cloud]: /cloud/:currentVersion:/
+[Managed TimescaleDB]: /mst/:currentVersion:/
+[rust-install]: https://www.rust-lang.org/tools/install
+[toolkit-gh-docs]: https://github.com/timescale/timescaledb-toolkit#-installing-from-source
diff --git a/timescaledb/how-to-guides/hyperfunctions/locf.md b/timescaledb/how-to-guides/hyperfunctions/locf.md
new file mode 100644
index 000000000000..66767f90c251
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/locf.md
@@ -0,0 +1,10 @@
+# Last observation carried forward
+Last observation carried forward (LOCF) is a form of interpolation used to fill
+gaps in your data. It takes the last known value and carries it forward as a
+replacement for the missing data.
+
+For more information about gapfilling and interpolation API calls, see the
+[hyperfunction API documentation][hyperfunctions-api-gapfilling].
+
+
+[hyperfunctions-api-gapfilling]: /api/:currentVersion:/hyperfunctions/gapfilling-interpolation/
diff --git a/timescaledb/how-to-guides/hyperfunctions/percentile-approx.md b/timescaledb/how-to-guides/hyperfunctions/percentile-approx.md
new file mode 100644
index 000000000000..3c54ae0e14fd
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/percentile-approx.md
@@ -0,0 +1,42 @@
+# Percentile approximation
+In general, percentiles are useful for understanding the distribution of data.
+The 50th percentile is the point at which half of your data is greater and half
+is lesser. The 10th percentile is the point at which 90% of the data is greater,
+and 10% is lesser. The 99th percentile is the point at which 1% is greater, and
+99% is lesser.
+
+The 50th percentile, or median, is often a more useful measure than the average,
+especially when your data contains outliers. Outliers can dramatically change
+the average, but do not affect the median as much. For example, if you have
+three rooms in your house and two of them are 40℉ (4℃) and one is 130℉ (54℃),
+the average room temperature is 70℉ (21℃), which doesn't tell you much. However,
+the 50th percentile temperature is 40℉ (4℃), which tells you that at least half
+your rooms are at refrigerator temperatures (also, you should probably get your
+heating checked!).
+
+Percentiles are sometimes avoided because calculating them requires more CPU and
+memory than an average or other aggregate measures. This is because an exact
+computation of the percentile needs the full dataset as an ordered list.
+Timescale uses approximation algorithms to calculate a percentile without
+requiring all of the data. This also makes them more compatible with continuous
+aggregates. By default, TimescaleDB uses `uddsketch`, but you can also choose to
+use `tdigest`. For more information about these algorithms, see the
+[advanced aggregation methods][advanced-agg] documentation.
+
+
+Technically, a percentile divides a group into 100 equally sized pieces, while a
+quantile divides a group into an arbitrary number of pieces. Because we don't
+always use exactly 100 buckets, "quantile" is the more technically correct term
+in this case. However, we use the word "percentile" because it's a more common
+word for this type of function.
+
+
+* For more information about how percentile approximation works, read our
+  [percentile approximation blog][blog-percentile-approx].
+* For more information about percentile approximation API calls, see the
+  [hyperfunction API documentation][hyperfunctions-api-approx-percentile].
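+
+As a quick sketch of the two-step pattern these functions use, assuming a
+hypothetical `response_times` table with a `response_time_ms` column, the
+`percentile_agg` aggregate builds the sketch and the `approx_percentile`
+accessor reads a percentile from it:
+
+```sql
+-- Estimate the median (50th percentile) response time:
+SELECT approx_percentile(0.5, percentile_agg(response_time_ms))
+FROM response_times;
+```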
+
+
+[advanced-agg]: /how-to-guides/hyperfunctions/advanced-agg
+[blog-percentile-approx]: https://blog.timescale.com/blog/how-percentile-approximation-works-and-why-its-more-useful-than-averages/
+[hyperfunctions-api-approx-percentile]: /api/:currentVersion:/hyperfunctions/percentile-approximation/
diff --git a/timescaledb/how-to-guides/hyperfunctions/regression-functions.md b/timescaledb/how-to-guides/hyperfunctions/regression-functions.md
new file mode 100644
index 000000000000..0e7ceb26904d
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/regression-functions.md
@@ -0,0 +1,13 @@
+# Regression functions
+Regression functions are part of the statistical aggregates family of
+hyperfunctions. They describe the relationship between two variables in your
+data using a least squares fit, and include functions such as `slope`,
+`intercept`, `x_intercept`, `corr`, `covariance`, and `determination_coeff`.
+
+For more information about statistical aggregation API calls, see the
+[hyperfunction API documentation][hyperfunctions-api-stats-agg].
+
+
+[hyperfunctions-api-stats-agg]: /api/:currentVersion:/hyperfunctions/stats_aggs/
diff --git a/timescaledb/how-to-guides/hyperfunctions/stats-aggs.md b/timescaledb/how-to-guides/hyperfunctions/stats-aggs.md
new file mode 100644
index 000000000000..168b7d0020fe
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/stats-aggs.md
@@ -0,0 +1,25 @@
+# Statistical aggregation
+To make common statistical aggregates easier to work with in window functions
+and continuous aggregates, Timescale provides common statistical aggregates in
+a slightly different form than those otherwise available in PostgreSQL and
+TimescaleDB.
+
+
+
+This uses a two-step aggregation process. The first step is an aggregate, which
+creates a machine-readable intermediate form of the data. The second step is an
+accessor, which produces a human-readable output from that intermediate form.
+This makes it easier to construct your queries, because it distinguishes the
+parameters, and makes it clear which aggregates are being re-aggregated or
+stacked. Additionally, because this query syntax is used in all Timescale
+Toolkit queries, once you are familiar with it, you can use it to construct
+increasingly complex queries.
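+
+As a minimal sketch of the two-step pattern, assuming a hypothetical
+`measurements` table with a `val` column, the `stats_agg` aggregate is the
+first step and the `average` accessor is the second:
+
+```sql
+-- Step one builds the summary; step two reads a statistic from it:
+SELECT average(stats_agg(val)) FROM measurements;
+```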
+
+* For some more technical details and usage examples of the two-step
+  aggregation method, see the [developer documentation][gh-two-step-agg].
+* For more information about statistical aggregation API calls, see the
+  [hyperfunction API documentation][hyperfunctions-api-stats-agg].
+
+
+[gh-two-step-agg]: https://github.com/timescale/timescaledb-toolkit/blob/main/docs/two-step_aggregation.md
+[hyperfunctions-api-stats-agg]: /api/:currentVersion:/hyperfunctions/stats_aggs/
diff --git a/timescaledb/how-to-guides/hyperfunctions/stats-functions.md b/timescaledb/how-to-guides/hyperfunctions/stats-functions.md
new file mode 100644
index 000000000000..831161db49f9
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/stats-functions.md
@@ -0,0 +1,14 @@
+# Statistical functions
+Statistical functions are part of the statistical aggregates family of
+hyperfunctions. They compute common summary statistics over your data, and
+include functions such as `average`, `sum`, `num_vals`, `stddev`, `variance`,
+`skewness`, and `kurtosis`.
+
+For more information about statistical aggregation API calls, see the
+[hyperfunction API documentation][hyperfunctions-api-stats-agg].
+
+
+[hyperfunctions-api-stats-agg]: /api/:currentVersion:/hyperfunctions/stats_aggs/
diff --git a/timescaledb/how-to-guides/hyperfunctions/time-bucket-gapfill.md b/timescaledb/how-to-guides/hyperfunctions/time-bucket-gapfill.md
new file mode 100644
index 000000000000..566eb769dcdc
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/time-bucket-gapfill.md
@@ -0,0 +1,16 @@
+# Time bucket gapfill
+Sometimes data sorted into time buckets can have gaps. This can happen if you
+have irregular sampling intervals, or you have experienced an outage of some
+sort. If you have a time bucket that has no data at all, the average returned
+from the time bucket is `NULL`, which could cause problems. You can use a
+gapfilling function to create additional rows of data in any gaps, ensuring that
+the returned rows are in chronological order, and contiguous. The time bucket
+gapfill function creates a contiguous set of time buckets, but does not fill the
+rows with data. You can create data for the new rows using another function,
+such as last observation carried forward (LOCF), or interpolation.
+
+For more information about gapfilling and interpolation API calls, see the
+[hyperfunction API documentation][hyperfunctions-api-gapfilling].
+
+
+[hyperfunctions-api-gapfilling]: /api/:currentVersion:/hyperfunctions/gapfilling-interpolation/
diff --git a/timescaledb/how-to-guides/hyperfunctions/time-weighted-average.md b/timescaledb/how-to-guides/hyperfunctions/time-weighted-average.md
new file mode 100644
index 000000000000..d3f8507f1c89
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/time-weighted-average.md
@@ -0,0 +1,40 @@
+# Time-weighted average
+Timescale's time-weighted average is implemented as an aggregate that
+weights each value using last observation carried forward (LOCF), or linear
+interpolation. The aggregate is not parallelizable, but it is supported with
+[continuous aggregation][caggs].
+
+## Run a time-weighted average query
+In this procedure, we are using an example table called `freezer_temps` that
+contains data about internal freezer temperatures.
+
+
+
+### Running a time-weighted average query
+1. At the `psql` prompt, find the average and the time-weighted average of
+   the data:
+   ```sql
+   SELECT freezer_id,
+       avg(temperature),
+       average(time_weight('Linear', ts, temperature)) as time_weighted_average
+   FROM freezer_temps
+   GROUP BY freezer_id;
+   ```
+1. To determine if the freezer has been out of temperature range for more
+   than 15 minutes at a time, use a time-weighted average in a window function:
+   ```sql
+   SELECT *,
+       average(
+           time_weight('Linear', ts, temperature) OVER (PARTITION BY freezer_id ORDER BY ts RANGE '15 minutes'::interval PRECEDING)
+       ) as rolling_twa
+   FROM freezer_temps
+   ORDER BY freezer_id, ts;
+   ```
+
+
+
+For more information about time-weighted average API calls, see the
+[hyperfunction API documentation][hyperfunctions-api-timeweight].
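+
+As a sketch of how this composes with continuous aggregates, assuming the same
+hypothetical `freezer_temps` table, you can store the `time_weight` aggregate
+in a continuous aggregate, then use the `rollup` and `average` functions to
+re-aggregate it over longer periods:
+
+```sql
+-- Hourly time-weight summaries, materialized as a continuous aggregate:
+CREATE MATERIALIZED VIEW freezer_temps_hourly
+WITH (timescaledb.continuous) AS
+SELECT freezer_id,
+    time_bucket('1 hour'::interval, ts) AS bucket,
+    time_weight('Linear', ts, temperature) AS tw
+FROM freezer_temps
+GROUP BY freezer_id, bucket;
+
+-- Combine the hourly summaries into a daily time-weighted average:
+SELECT freezer_id,
+    time_bucket('1 day'::interval, bucket) AS day,
+    average(rollup(tw)) AS daily_twa
+FROM freezer_temps_hourly
+GROUP BY freezer_id, day;
+```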
+
+[hyperfunctions-api-timeweight]: /api/:currentVersion:/hyperfunctions/time-weighted-averages/
+[caggs]: /how-to-guides/continuous-aggregates
diff --git a/timescaledb/how-to-guides/hyperfunctions/time-weighted-averages.md b/timescaledb/how-to-guides/hyperfunctions/time-weighted-averages.md
new file mode 100644
index 000000000000..6ab3ee3610bb
--- /dev/null
+++ b/timescaledb/how-to-guides/hyperfunctions/time-weighted-averages.md
@@ -0,0 +1,27 @@
+# Time-weighted averages
+Time-weighted averages are used in cases where a time series is not evenly
+sampled. Time series data points are often evenly spaced, for example every 30
+seconds, or every hour. But sometimes data points are recorded irregularly, for
+example if a value has a large change, or changes quickly. Computing an average
+using data that is not evenly sampled is not always useful.
+
+For example, if you have a lot of ice cream in freezers, you need to make sure
+the ice cream stays within a 0-10℉ (-20 to -12℃) temperature range. The
+temperature in the freezer can vary if folks are opening and closing the door,
+but the ice cream will only have a problem if the temperature is out of range
+for a long time. You can set your sensors in the freezer to sample every five
+minutes while the temperature is in range, and every 30 seconds while the
+temperature is out of range. If the results are generally stable, but with some
+quick moving transients, an average of all the data points weights the transient
+values too highly. A time-weighted average weights each value by the duration
+over which it occurred based on the points around it, producing much more
+accurate results.
+
+* For more information about how time-weighted averages work, read our
+  [time-weighted averages blog][blog-timeweight].
+* For more information about time-weighted average API calls, see the
+  [hyperfunction API documentation][hyperfunctions-api-timeweight].
+
+
+[blog-timeweight]: https://blog.timescale.com/blog/what-time-weighted-averages-are-and-why-you-should-care/
+[hyperfunctions-api-timeweight]: /api/:currentVersion:/hyperfunctions/time-weighted-averages/
diff --git a/timescaledb/how-to-guides/install-timescaledb-toolkit.md b/timescaledb/how-to-guides/install-timescaledb-toolkit.md
deleted file mode 100644
index 01dd4fed6ec6..000000000000
--- a/timescaledb/how-to-guides/install-timescaledb-toolkit.md
+++ /dev/null
@@ -1,20 +0,0 @@
-# Installing TimescaleDB Toolkit
-
-In order to use functions from the TimescaleDB Toolkit, you'll need to install
-it. If you are using [Timescale Cloud][] to host your database, the Toolkit is already
-installed.
-
-On [Managed TimescaleDB][] you may need to run `CREATE EXTENSION timescaledb_toolkit;`
-in each database that you need to use the functions with.
-
-If you already have it installed and are updating to the latest version, run
-`ALTER EXTENSION timescaledb_toolkit UPDATE;`.
-
-## Self-hosted install
-If you are hosting your own TimescaleDB database and need to install the TimescaleDB
-Toolkit first, follow the instructions provided at the GitHub repo to [install it
-from source][install-source].
-
-[Timescale Cloud]: /cloud/:currentVersion:/
-[Managed TimescaleDB]: /mst/:currentVersion:/
-[install-source]: https://github.com/timescale/timescaledb-toolkit#-installing-from-source
\ No newline at end of file
diff --git a/timescaledb/how-to-guides/page-index/page-index.js b/timescaledb/how-to-guides/page-index/page-index.js
index c3c12f89207a..daeab4c12a2b 100644
--- a/timescaledb/how-to-guides/page-index/page-index.js
+++ b/timescaledb/how-to-guides/page-index/page-index.js
@@ -277,19 +277,12 @@ module.exports = [
        },
      ],
    },
-    {
-      title: 'Install TimescaleDB Toolkit',
-      href: 'install-timescaledb-toolkit',
-      tags: ['toolkit', 'install', 'timescaledb'],
-      keywords: ['TimescaleDB', 'install', 'toolkit'],
-      excerpt: 'Install the TimescaleDB toolkit',
-    },
    {
      title: 'Connecting to TimescaleDB',
      href: 'connecting',
-      tags: ['toolkit', 'install', 'timescaledb'],
-      keywords: ['TimescaleDB', 'install', 'toolkit'],
-      excerpt: 'Connect to the TimescaleDB toolkit',
+      tags: ['psql', 'install', 'timescaledb'],
+      keywords: ['TimescaleDB', 'install', 'psql'],
+      excerpt: 'Connect to TimescaleDB with psql',
      children: [
        {
          href: 'psql',
@@ -907,7 +900,117 @@ module.exports = [
        },
      ],
    },
-
+    {
+      title: 'Hyperfunctions',
+      href: 'hyperfunctions',
+      children: [
+        {
+          title: 'About hyperfunctions',
+          href: 'about-hyperfunctions',
+          tags: ['hyperfunctions', 'toolkit', 'timescaledb'],
+          keywords: ['TimescaleDB', 'hyperfunctions', 'Toolkit'],
+          excerpt: 'Learn about TimescaleDB hyperfunctions for additional analysis'
+        },
+        {
+          title: 'Install TimescaleDB Toolkit',
+          href: 'install-toolkit',
+          tags: ['toolkit', 'install', 'hyperfunctions', 'timescaledb'],
+          keywords: ['TimescaleDB', 'install', 'toolkit'],
+          excerpt: 'Install the TimescaleDB toolkit',
+        },
+        {
+          title: 'Approximate count distincts',
+          href: 'approx-count-distincts',
+          type: 'directory',
+          children: [
+            {
+              title: 'Hyperloglog',
+              href: 'hyperloglog',
+              tags: ['hyperfunctions', 'toolkit', 'query', 'timescaledb'],
+              keywords: ['TimescaleDB', 'hyperfunctions', 'Toolkit'],
+              excerpt: 'Learn about the hyperloglog hyperfunction'
+            }
+          ],
+        },
+        {
+          title: 'Statistical aggregates',
+          href: 'stats-aggs',
+          tags: ['hyperfunctions', 'toolkit', 'query', 'timescaledb'],
+          keywords: ['TimescaleDB', 'hyperfunctions', 'Toolkit'],
+          excerpt: 'Learn about the statistical aggregates hyperfunction'
+        },
+        {
+          title: 'Gapfilling and interpolation',
+          href: 'gapfilling-interpolation',
+          type: 'directory',
+          children: [
+            {
+              title: 'Time bucket gapfill',
+              href: 'time-bucket-gapfill',
+              tags: ['hyperfunctions', 'toolkit', 'query', 'timescaledb'],
+              keywords: ['TimescaleDB', 'hyperfunctions', 'Toolkit'],
+              excerpt: 'Learn about the time bucket gapfilling hyperfunction'
+            },
+            {
+              title: 'Last observation carried forward',
+              href: 'locf',
+              tags: ['hyperfunctions', 'toolkit', 'query', 'timescaledb'],
+              keywords: ['TimescaleDB', 'hyperfunctions', 'Toolkit'],
+              excerpt: 'Learn about the locf hyperfunction'
+            },
+          ],
+        },
+        {
+          title: 'Percentile approximation',
+          href: 'percentile-approx',
+          type: 'directory',
+          children: [
+            {
+              title: 'Approximate percentile',
+              href: 'approximate-percentile',
+              tags: ['hyperfunctions', 'toolkit', 'query', 'timescaledb'],
+              keywords: ['TimescaleDB', 'hyperfunctions', 'Toolkit'],
+              excerpt: 'Learn about the approximate percentile hyperfunction'
+            },
+            {
+              title: 'Advanced aggregation methods',
+              href: 'advanced-agg',
+              tags: ['hyperfunctions', 'toolkit', 'query', 'timescaledb'],
+              keywords: ['TimescaleDB', 'hyperfunctions', 'Toolkit'],
+ excerpt: 'Learn about advanced aggregation methods for hyperfunctions' + } + ] + }, + { + title: 'Counter aggregation', + href: 'counter-aggregation', + type: 'directory', + children: [ + { + title: 'Counter aggregates', + href: 'counter-aggs', + tags: ['hyperfunctions', 'toolkit', 'query', 'timescaledb'], + keywords: ['TimescaleDB', 'hyperfunctions', 'Toolkit'], + excerpt: 'Learn about the counter aggregate hyperfunction' + } + ] + }, + { + title: 'Time-weighted averages', + href: 'time-weighted-averages', + type: 'directory', + children: [ + { + title: 'Time-weighted averages', + href: 'time-weighted-average', + tags: ['hyperfunctions', 'toolkit', 'query', 'timescaledb'], + keywords: ['TimescaleDB', 'hyperfunctions', 'Toolkit'], + excerpt: 'Learn about the time-weighted averages hyperfunction' + }, + ] + }, + ], + }, { title: 'Alerting', href: 'alerting', diff --git a/timescaledb/how-to-guides/toolkit/about-toolkit.md b/timescaledb/how-to-guides/toolkit/about-toolkit.md deleted file mode 100644 index 61df66759982..000000000000 --- a/timescaledb/how-to-guides/toolkit/about-toolkit.md +++ /dev/null @@ -1,68 +0,0 @@ -# About Timescale Toolkit -Timescale Toolkit is a PostreSQL extension containing a specialized set of -functions that allow you to to analyze time-series data. You can use it to -analyze anything you have stored as time-series data, including IoT devices, IT -systems, marketing analytics, user behavior, financial metrics, and -cryptocurrency. - -Timescale Toolkit allows you to perform critical time-series queries quickly, -analyze time-series data, and extract meaningful information. It aims to -identify, build, and combine all of the functionality SQL needs to perform -time-series analysis into a single extension. - -## Tools for graphing -Timescale Toolkit brings graphing functions to the database. This allows you -to choose your graphing front-end based on how well it does graphing, not on how -well it does data analytics. It also allows you to run queries that stay -consistent across all front-end tools and consumers of your data. Additionally, -by doing all the graphing work in the database, you need to send a much smaller -number of data points over the network. - -## Simplifying queries -SQL queries can get long, especially if you have multiple layers of aggregation -and function-calls. There are many scenarios where it's possible to write a -query in native SQL, but the resulting code is relatively complicated to write, -and to understand. Timescale Toolkit can greatly simplify your queries by -using a two-step calling convention. - -For example, a typical Timescale Toolkit query to get the time-weighted -average of a set of values could look like this: ```sql SELECT -average(time_weight('LOCF', value)) as time_weighted_average FROM foo; ``` - -The first step in this query is to call the inner aggregate function, such as -`time_weighted_average`. The second step is to call the accessor function, such -as `average`. - -This makes it easier to construct your queries, because it distinguishes the -parameters, and makes it clear which aggregates are being re-aggregated or -stacked. Additionally, because this query syntax is used in all Timescale -Toolkit queries, when you are used to it, you can use it to construct more and -more complicated queries. - -## Toolkit features -Timescale Toolkit features are developed in the open. As features are developed they are categorized as experimental, beta, stable, or deprecated. 
The documentation on this page will focus on the stable features, but more information on our experimental features in development can be found in the [Toolkit repository][gh-docs]. - -|Feature|Notes|More information| -|-------|-----|----------------| -|Percentile Approximation|Efficient approximation of percentiles|[Percentile Approximation documentation][approx-percentile]| -|Time-weighted averages|Average that weights each value based on duration|[Time-weighted average documentation][time-weighted-avg]| - -## Contribute to Timescale Toolkit -We want and need your feedback! What are the frustrating parts of analyzing -time-series data? What takes far more code than you feel it should? What runs -slowly, or only runs quickly after many rewrites? We want to solve -community-wide problems and incorporate as much feedback as possible. - -* Join the [discussion][gh-discussions]. -* Check out the [proposed features][gh-proposed]. -* Explore the current [feature requests][gh-requests]. -* Add your own [feature request][gh-newissue]. - -[gh-docs]: https://github.com/timescale/timescale-analytics/tree/main/docs -[approx-percentile]: /how-to-guides/toolkit/approximate_percentile.md -[time-weighted-avg]: /how-to-guides/toolkit/time-weighted-averages.md -[doc-promscale]: /tutorials/promscale -[gh-discussions]: https://github.com/timescale/timescale-analytics/discussions -[gh-proposed]: https://github.com/timescale/timescale-analytics/labels/proposed-feature -[gh-requests]: https://github.com/timescale/timescale-analytics/labels/feature-request -[gh-newissue]: https://github.com/timescale/timescale-analytics/issues/new?assignees=&labels=feature-request&template=feature-request.md&title= diff --git a/timescaledb/how-to-guides/toolkit/approximate-percentile.md b/timescaledb/how-to-guides/toolkit/approximate-percentile.md deleted file mode 100644 index 05f2068b2eb2..000000000000 --- a/timescaledb/how-to-guides/toolkit/approximate-percentile.md +++ /dev/null @@ -1,68 +0,0 @@ -# Approximate percentiles -In general, percentiles are useful for understanding the distribution of data. -The 50th percentile is the point at which half of your data is greater and half -is lesser. The 10th percentile is the point at which 90% of the data is greater, -and 10% is lesser. The 99th percentile is the point at which 1% is greater, and -99% is lesser. - -The 50th percentile, or median, is often a more useful measure than the average, -especially when your data contains outliers. Outliers can dramatically change -the average, but do not affect the median as much. For example, if you have -three rooms in your house and two of them are 40℉ (4℃) and one is 130℉ (54℃), -the fact that the average room is 70℉ (21℃) doesn't matter much. However, the -50th percentile temperature is 40℉ (4℃), and tells you that at least half your -rooms are at refrigerator temperatures (also, you should probably get your -heating checked!) - -Percentiles are sometimes used less frequently because they can use more CPU and -memory to calculate than an average or another aggregate measure. This is -because an exact computation of the percentile needs the full dataset as an -ordered list. Timescale Toolkit uses approximation algorithms to calculate a -percentile without requiring all of the data. This also makes them more -compatible with continuous aggregates. By default, Timescale Toolkit uses -`uddsketch`, but you can also choose to use `tdigest`. 
See -the [Toolkit documentation][gh-analytics-algorithms] for more information -about these algorithms. - - -Technically, a percentile divides a group into 100 equally sized pieces, while a -quantile divides a group into an arbitrary number of pieces. Because we don't -always use exactly 100 buckets, "quantile" is the more technically correct term -in this case. However, we use the word "percentile" because it's a more common -word for this type of function. - - -## Run an approximate percentage query -In this procedure, we are using an example table called `response_times` that contains information about how long a server takes to respond to API calls. - -### Procedure: Running an approximate percentage query -1. At the `psql` prompt, create a continuous aggregate that computes the daily aggregates: - ```sql - CREATE MATERIALIZED VIEW response_times_daily - WITH (timescaledb.continuous) - AS SELECT - time_bucket('1 day'::interval, ts) as bucket, - percentile_agg(response_time_ms) - FROM response_times - GROUP BY 1; - ``` -1. Re-aggregate the aggregate to get the last 30 days, and look for the 95th percentile: - ```sql - SELECT approx_percentile(0.95, percentile_agg(percentile_agg)) as threshold - FROM response_times_daily - WHERE bucket >= time_bucket('1 day'::interval, now() - '30 days'::interval); - ``` -1. You can also create an alert: - ```sql - WITH t as (SELECT approx_percentile(0.95, percentile_agg(percentile_agg)) as threshold - FROM response_times_daily - WHERE bucket >= time_bucket('1 day'::interval, now() - '30 days'::interval)) - - SELECT count(*) - FROM response_times - WHERE ts > now()- '1 minute'::interval - AND response_time_ms > (SELECT threshold FROM t); - ``` - - -[gh-analytics-algorithms]: https://github.com/timescale/timescale-analytics/blob/main/docs/percentile_approximation.md#advanced-usage diff --git a/timescaledb/how-to-guides/toolkit/index.md b/timescaledb/how-to-guides/toolkit/index.md deleted file mode 100644 index 302ca5a63228..000000000000 --- a/timescaledb/how-to-guides/toolkit/index.md +++ /dev/null @@ -1,19 +0,0 @@ -# Toolkit -Timescale Toolkit is a PostgreSQL extension that provides a specialized set of -functions for time-series analysis. Timescale Toolkit allows you to perform -critical time-series queries quickly, analyze the data, and extract meaningful -information. - -* [Learn about Toolkit][about-toolkit] to understand how it works before you - begin using it. -* Use [general analytic queries][toolkit-general] to get started. -* Use the [approximate percentile][toolkit-approximate-percentile] function. -* Use the [time-weighted average][toolkit-time-weighted-averages] function. - -For more information about Timescale Toolkit, read our [blog post][toolkit-blog]. 
- -[about-toolkit]: how-to-guides/toolkit/about-toolkit -[toolkit-general]: how-to-guides/toolkit/general-analytic-queries -[toolkit-approximate-percentile]: how-to-guides/toolkit/approximate_percentile -[toolkit-time-weighted-averages]: how-to-guides/toolkit/time-weighted-averages -[toolkit-blog]: https://blog.timescale.com/blog/time-series-analytics-for-postgresql-introducing-the-timescale-analytics-project/ diff --git a/timescaledb/how-to-guides/toolkit/time-weighted-averages.md b/timescaledb/how-to-guides/toolkit/time-weighted-averages.md deleted file mode 100644 index 9b633634d4b6..000000000000 --- a/timescaledb/how-to-guides/toolkit/time-weighted-averages.md +++ /dev/null @@ -1,53 +0,0 @@ -# Time-weighted averages -Time weighted averages are used in cases where a time series is not evenly -sampled. Time series data points are often evenly spaced, for example every 30 -seconds, or every hour. But sometimes data points are recorded irregularly, for -example if a value has a large change, or changes quickly. Computing an average -on data that is not evenly sampled is not always useful. - -For example, if you have a lot of ice cream in freezers, you need to make sure -the ice cream stays within a 0-10℉ (-20 to -12℃) temperature range. The -temperature in the freezer can vary if folks are opening and closing the door, -but the ice cream will only have a problem if the temperature is out of range -for a long time. You can set your sensors in the freezer to sample every five -minutes while the temperature is in range, and every 30 seconds while the -temperature is out of range. If the results are generally stable, but with some -quick moving transients, an average of all the data points weights the transient -values too highly. A time weighted average weights each value by the duration -over which it occurred based on the points around it, producing much more -accurate results. - -Timescale Toolkit' time weighted average is implemented as an aggregate that -weights each value using last observation carried forward (LOCF), or linear -interpolation. The aggregate is not parallelizable, but it is supported with -[continuous aggregation][caggs]. See the Toolkit documentation for more -information about [interpolation methods][gh-interpolation], -and [parallelism and ordering][gh-parallelism]. - - -## Run a time-weighted average query -In this procedure, we are using an example table called `freezer_temps` that contains data about internal freezer temperatures. - -### Procedure: Running a time-weighted average query -1. At the `psql`prompt, find the average and the time-weighted average of the data: - ```sql - SELECT freezer_id, - avg(temperature), - average(time_weight('Linear', ts, temperature)) as time_weighted_average - FROM freezer_temps - GROUP BY freezer_id; - ``` -1. To determine if the freezer has been out of temperature range for more than 15 minutes at a time, use a time-weighted average in a window function: - ```sql - SELECT *, - average( - time_weight('Linear', ts, temperature) OVER (PARTITION BY freezer_id ORDER BY ts RANGE '15 minutes'::interval PRECEDING ) - ) as rolling_twa - FROM freezer_temps - ORDER BY freezer_id, ts; - ``` - - -[caggs]: /how-to-guides/continuous-aggregates -[gh-interpolation]: https://github.com/timescale/timescale-analytics/blob/main/docs/time_weighted_average.md#interpolation-methods-details -[gh-parallelism]: https://github.com/timescale/timescale-analytics/blob/main/docs/time_weighted_average.md#notes-on-parallelism-and-ordering