diff --git a/api/add_dimension.md b/api/add_dimension.md
index 0da4d86683..fd0eab9a3f 100644
--- a/api/add_dimension.md
+++ b/api/add_dimension.md
@@ -11,79 +11,61 @@ api:
 # add_dimension()
 
-Add an additional partitioning dimension to a Timescale hypertable. You can only execute this
-`add_dimension` command on an empty hypertable. To convert a normal table to a hypertable,
+Add an additional partitioning dimension to a Timescale hypertable.
+
+Best practice is to not use additional dimensions, because Timescale Cloud transparently provides seamless storage scaling,
+both in terms of storage capacity and available storage IOPS/bandwidth.
+
+You can only execute this `add_dimension` command on an empty hypertable. To convert a normal table to a hypertable,
 call [create hypertable][create_hypertable].
 
 The column you select as the dimension can use either:
 
 - Interval partitions: For example, for a second range partition.
-- [hash partitions][hash-partition]: for [distributed hypertables][distributed-hypertables]
-
-
+- [hash partitions][hash-partition]: to enable parallelization across multiple disks.
 
 This page describes the generalized hypertable API introduced in [TimescaleDB v2.13.0][rn-2130].
-
 For information about the deprecated interface, see [add_dimension(), deprecated interface][add-dimension-old].
-
 
 ### Hash partitions
 
-To achieve efficient scale-out performance, best practice is to use hash partitions
-for [distributed hypertables][distributed-hypertables]. For [regular hypertables][regular-hypertables]
-that exist on a single node only, it is possible to configure additional partitioning
-for specialized use cases. However, this is is an expert option.
-
-Every distinct item in hash partitioning is hashed to one of
-*N* buckets. Remember that we are already using (flexible) range
-intervals to manage chunk sizes; the main purpose of hash
-partitioning is to enable parallelization across multiple
-data nodes (in the case of distributed hypertables) or
-across multiple disks within the same time interval
-(in the case of single-node deployments).
-
-### Parallelizing queries across multiple data nodes
-
-In a distributed hypertable, hash partitioning enables inserts to be
-parallelized across data nodes, even while the inserted rows share
-timestamps from the same time interval, and thus increases the ingest rate.
-Query performance also benefits by being able to parallelize queries
-across nodes, particularly when full or partial aggregations can be
-"pushed down" to data nodes (for example, as in the query
-`avg(temperature) FROM conditions GROUP BY hour, location`
-when using `location` as a hash partition). Please see our
-[best practices about partitioning in distributed hypertables][distributed-hypertable-partitioning-best-practices]
-for more information.
-
-### Parallelizing disk I/O on a single node
-
-Parallel I/O can benefit in two scenarios: (a) two or more concurrent
-queries should be able to read from different disks in parallel, or
-(b) a single query should be able to use query parallelization to read
-from multiple disks in parallel.
-
-Thus, users looking for parallel I/O have two options:
-
-1. Use a RAID setup across multiple physical disks, and expose a
-single logical disk to the hypertable (that is, via a single tablespace).
-
-1. For each physical disk, add a separate tablespace to the
-database.
-Timescale allows you to actually add multiple tablespaces
-to a *single* hypertable (although under the covers, a hypertable's
-chunks are spread across the tablespaces associated with that hypertable).
-
-We recommend a RAID setup when possible, as it supports both forms of
-parallelization described above (that is, separate queries to separate
-disks, single query to multiple disks in parallel). The multiple
-tablespace approach only supports the former. With a RAID setup,
-*no spatial partitioning is required*.
-
-That said, when using hash partitions, we recommend using 1
-hash partition per disk.
-
-Timescale does *not* benefit from a very large number of hash
-partitions (such as the number of unique items you expect in partition
-field). A very large number of such partitions leads both to poorer
+Every distinct item in hash partitioning is hashed to one of *N* buckets. By default,
+TimescaleDB uses flexible range intervals to manage chunk sizes. The main purpose of hash
+partitioning is to enable parallelization across multiple disks within the same time
+interval.
+
+### Parallelizing disk I/O
+
+You use parallel I/O in the following scenarios:
+
+- Two or more concurrent queries should be able to read from different disks in parallel.
+- A single query should be able to use query parallelization to read from multiple disks in parallel.
+
+To set up parallel I/O, you have the following options:
+
+- **RAID**: use a RAID setup across multiple physical disks, and expose a single logical disk to the hypertable.
+  That is, use a single tablespace.
+
+  Best practice is to use RAID when possible, as you do not need to manually manage tablespaces
+  in the database.
+
+- **Multiple tablespaces**: for each physical disk, add a separate tablespace to the database. TimescaleDB allows you to add
+  multiple tablespaces to a *single* hypertable. Under the hood, a hypertable's
+  chunks are spread across the tablespaces associated with that hypertable.
+
+When using multiple tablespaces, a best practice is to also add a second hash-partitioned dimension to your hypertable
+and to have at least one hash partition per disk. While a single time dimension would also work, it would mean that the
+first chunk is written to one tablespace, the second to another, and so on, and thus would parallelize only if a
+query's time range exceeds a single chunk.
+
+When you add a hash-partitioned dimension, set the number of partitions to a multiple of the number of disks. That is,
+the number of partitions P = N * Pd, where N is the number of disks and Pd is the number of partitions per disk. This
+enables you to add more disks later and move partitions from the existing disks to the new disk.
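+
+For example, a minimal sketch of this setup. It assumes two tablespaces named `disk1` and `disk2` that were already
+created with `CREATE TABLESPACE` and point at separate physical disks, and an empty `conditions` hypertable
+range-partitioned on `time`:
+
+```sql
+-- Attach both tablespaces so the hypertable's chunks are spread across the two disks
+SELECT attach_tablespace('disk1', 'conditions');
+SELECT attach_tablespace('disk2', 'conditions');
+
+-- 2 disks x 2 partitions per disk = 4 hash partitions
+SELECT add_dimension('conditions', by_hash('device_id', 4));
+```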
+
+TimescaleDB does *not* benefit from a very large number of hash
+partitions, such as the number of unique items you expect in the partition
+field. A very large number of hash partitions leads both to poorer
 per-partition load balancing (the mapping of items to partitions using
 hashing), as well as much increased planning latency for some types of
 queries.
@@ -133,13 +115,13 @@ partitionining on `device_id`.
 
 ```sql
 SELECT create_hypertable('conditions', by_range('time'));
-SELECT add_dimension('conditions', , by_hash('location', 2));
+SELECT add_dimension('conditions', by_hash('location', 2));
 SELECT add_dimension('conditions', by_range('time_received', INTERVAL '1 day'));
 SELECT add_dimension('conditions', by_hash('device_id', 2));
 SELECT add_dimension('conditions', by_hash('device_id', 2), if_not_exists => true);
 ```
 
-Now in a multi-node example for distributed hypertables with a cluster
+In a multi-node example for distributed hypertables with a cluster
 of one access node and two data nodes, configure the access node for
 access to the two data nodes. Then, convert table `conditions` to
 a distributed hypertable with just range partitioning on column `time`,
@@ -150,7 +132,7 @@ with two partitions (as the number of the attached data nodes).
 SELECT add_data_node('dn1', host => 'dn1.example.com');
 SELECT add_data_node('dn2', host => 'dn2.example.com');
 SELECT create_distributed_hypertable('conditions', 'time');
 SELECT add_dimension('conditions', by_hash('location', 2));
 ```
 
 [create_hypertable]: /api/:currentVersion:/hypertable/create_hypertable/
diff --git a/api/dimension_info.md b/api/dimension_info.md
index 3a58f00c80..8895b55b47 100644
--- a/api/dimension_info.md
+++ b/api/dimension_info.md
@@ -1,41 +1,39 @@
 # Dimension Builders
 
-
-Dimension builders were introduced in TimescaleDB 2.13.
-
-
-The `create_hypertable` and `add_dimension` are used together with
-dimension builders to specify the dimensions to partition a
-hypertable on.
+You call [`create_hypertable`][create_hypertable] and [`add_dimension`][add_dimension] to specify the dimensions to
+partition a hypertable on. TimescaleDB supports partitioning [`by_range`][by-range] and [`by_hash`][by-hash]. You can
+partition `by_range` on its own.
 
-TimescaleDB currently supports two partition types: partitioning by
-range and partitioning by hash.
+Hypertables must always have a primary range dimension, followed by an arbitrary number of additional dimensions that
+can be either range or hash. Typically, this is just one hash dimension.
+
 
-For incompatible data types (for example, `jsonb`) you can specify a function to
+For incompatible data types such as `jsonb`, you can specify a function to
 the `partition_func` argument of the dimension build to extract a
 compatible data type. Look in the example section below.
-
+
+Dimension builders were introduced in TimescaleDB 2.13.
+
 
 ## Partition Function
 
-It is possible to specify a custom partitioning function for both
+If you do not set custom partitioning, TimescaleDB calls PostgreSQL's internal hash function for the given type.
+You use a custom partitioning function for value types that do not have a native PostgreSQL hash
+function.
+
+You can specify a custom partitioning function for both
 range and hash partitioning. A partitioning function should take a
 `anyelement` argument as the only parameter and return a positive
-`integer` hash value. Note that this hash value is _not_ a partition
-identifier, but rather the inserted value's position in the
-dimension's key space, which is then divided across the partitions.
-
-If no custom partitioning function is specified, the default
-partitioning function is used, which calls PostgreSQL's internal hash
-function for the given type. Thus, a custom partitioning function can
-be used for value types that do not have a native PostgreSQL hash
-function.
+`integer` hash value.
+This hash value is _not_ a partition identifier, but rather the
+inserted value's position in the dimension's key space, which is then divided across
+the partitions.
+
 ## by_range()
 
-Creates a by-range dimension builder that can be used with
-`create_hypertable` and `add_dimension`.
+Create a by-range dimension builder that can be used with
+[`create_hypertable`][create_hypertable] and [`add_dimension`][add_dimension].
 
 ### Required Arguments
 
@@ -59,19 +57,15 @@ information.
 
 ### Notes
 
-The `partition_interval` should be specified as follows:
+Specify the `partition_interval` as follows, depending on the type of the column to be partitioned:
 
-- If the column to be partitioned is a `TIMESTAMP`, `TIMESTAMPTZ`, or
-  `DATE`, this length should be specified either as an `INTERVAL` type
+- `TIMESTAMP`, `TIMESTAMPTZ`, or `DATE`: specify `partition_interval` either as an `INTERVAL` type
   or an integer value in *microseconds*.
 
-- If the column is some other integer type, this length should be an
-  integer that reflects the column's underlying semantics (for example, the
-  `partition_interval` should be given in milliseconds if this column
-  is the number of milliseconds since the UNIX epoch).
+- Another integer type: specify `partition_interval` as an integer that reflects the column's
+  underlying semantics. For example, if this column stores the number of milliseconds since the
+  UNIX epoch, specify `partition_interval` in milliseconds.
 
-A summary of the partition type and default value depending on the
-column type is summarized below.
+The partition type and default value for each column type are:
 
 | Column Type                   | Partition Type   | Default value |
 |-------------------------------|------------------|---------------|
@@ -90,13 +84,12 @@ The simplest usage is to partition on a time column:
 SELECT create_hypertable('my_table', by_range('time'));
 ```
 
-In this case, the dimension builder can be excluded since
-`create_hypertable` by default assumes that a single provided column
+In this case, the dimension builder can be excluded because, by default,
+`create_hypertable` assumes that a single provided column
 is range partitioned by time.
 
-If you have a table with a non-time column containing the time, for
-example a JSON column, you can add a partition function to extract the
-time.
+If you have a table with a non-time column containing the time, such as
+a JSON column, add a partition function to extract the time.
 
 ```sql
 CREATE TABLE my_table (
@@ -131,3 +124,10 @@ SELECT create_hypertable('my_table', by_range('data', '1 day', 'get_time'));
 
 A *dimension builder*, which is an opaque type
 `_timescaledb_internal.dimension_info`, holding the dimension
 information.
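+
+As an illustration of the partition function contract described in the Partition Function section above, the
+following sketch defines a hypothetical `device_hash` function that hashes devices case-insensitively using
+PostgreSQL's built-in `hashtext`, and passes it to the `partition_func` argument of `by_hash`. It assumes an
+empty hypertable named `conditions` with a `device_id` column:
+
+```sql
+-- Hypothetical custom hash partition function: normalize the value, then hash it.
+-- Returns a positive integer position in the dimension's key space.
+CREATE FUNCTION device_hash(value anyelement) RETURNS integer
+    LANGUAGE plpgsql IMMUTABLE AS $$
+BEGIN
+    RETURN abs(hashtext(lower(value::text)));
+END;
+$$;
+
+SELECT add_dimension('conditions', by_hash('device_id', 4, partition_func => 'device_hash'));
+```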
+
+[create_hypertable]: /api/:currentVersion:/hypertable/create_hypertable/
+[add_dimension]: /api/:currentVersion:/hypertable/add_dimension/
+[dimension_builders]: /api/:currentVersion:/hypertable/dimension_info/
+[by-range]: /api/:currentVersion:/hypertable/dimension_info/#by_range
+[by-hash]: /api/:currentVersion:/hypertable/dimension_info/#by_hash
+
diff --git a/mlc_config.json b/mlc_config.json
index 2c1d6c7c87..dd840f21b2 100644
--- a/mlc_config.json
+++ b/mlc_config.json
@@ -13,6 +13,12 @@
     {
       "pattern": "^https://console.aws.amazon.com/rds/home#databases"
     },
+    {
+      "pattern": "^https?://dbeaver.io/"
+    },
+    {
+      "pattern": "^https?://dbeaver.io/download/"
+    },
     {
       "pattern": "^https?://localhost"
     },
diff --git a/use-timescale/page-index/page-index.js b/use-timescale/page-index/page-index.js
index 7f69cbd25e..fa38e44496 100644
--- a/use-timescale/page-index/page-index.js
+++ b/use-timescale/page-index/page-index.js
@@ -716,7 +716,7 @@ module.exports = [
         excerpt: "Change the schema of a hypertable",
       },
       {
-        title: "Index",
+        title: "Index data",
         href: "indexing",
         excerpt: "Create an index on a hypertable",
       },
diff --git a/use-timescale/schema-management/indexing.md b/use-timescale/schema-management/indexing.md
index a767647d06..ad317d029b 100644
--- a/use-timescale/schema-management/indexing.md
+++ b/use-timescale/schema-management/indexing.md
@@ -5,7 +5,7 @@ products: [cloud, mst, self_hosted]
 keywords: [hypertables, indexes]
 ---
 
-# Indexing data
+# Index data
 
 You can use an index on your database to speed up read operations. You can
 create an index on any combination of columns, as long as you include the `time`
@@ -36,7 +36,7 @@ use this command:
 CREATE INDEX ON conditions (time DESC);
 ```
 
-When you create a hypertable with the `create_hypertable` command, and you
+When you create a hypertable with [`create_hypertable`][create_hypertable], and you
 specify an optional hash partition in addition to time, such as a `location`
 column, an additional index is created on the optional column and time. For
 example:
@@ -57,13 +57,12 @@ SELECT create_hypertable('conditions', by_range('time'))
 CREATE_DEFAULT_INDEXES false;
 ```
 
-
-The `by_range` dimension builder is an addition to TimescaleDB 2.13.
-
+The [`by_range`][by-range] [dimension builder][dimension_builders] was introduced in TimescaleDB v2.13.
+
 
 ## Best practices for indexing
 
-If you have sparse data, with columns that are often NULL, you can add a clause
+If you have sparse data with columns that are often NULL, you can add a clause
 to the index, saying `WHERE column IS NOT NULL`. This prevents the index from
 indexing NULL data, which can lead to a more compact and efficient index. For
 example:
@@ -95,3 +94,5 @@ to perform indexing transactions on an individual chunk.
 [create_hypertable]: /api/:currentVersion:/hypertable/create_hypertable/
 [about-index]: /use-timescale/:currentVersion:/schema-management/about-indexing/
 [create-index]: https://docs.timescale.com/api/latest/hypertable/create_index/
+[by-range]: /api/:currentVersion:/hypertable/dimension_info/#by_range
+[dimension_builders]: /api/:currentVersion:/hypertable/dimension_info/
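+
+The chunk-by-chunk indexing behavior mentioned above can be requested when you create the index. A minimal sketch,
+assuming the `conditions` hypertable from the earlier examples and that the `timescaledb.transaction_per_chunk`
+storage parameter is available in your TimescaleDB version:
+
+```sql
+-- Build the index one chunk at a time, each chunk in its own transaction,
+-- instead of holding a single long-running transaction for the whole hypertable
+CREATE INDEX ON conditions (location, time DESC)
+    WITH (timescaledb.transaction_per_chunk);
+```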