Create Overview #2591

Open: Loquacity wants to merge 26 commits into latest from dev-overview-lana.
The diff below shows changes from 21 of the 26 commits.

Commits:
a88ff7d  Create pages (Loquacity, Aug 2, 2023)
82847f5  page index and move our of dir (Loquacity, Aug 2, 2023)
6066046  data tiering partials (Loquacity, Aug 2, 2023)
0f0e395  more data tiering (Loquacity, Aug 2, 2023)
d75d641  Hypertables partials (Loquacity, Aug 2, 2023)
618ead1  Time bucket partials (Loquacity, Aug 2, 2023)
cd4983d  Caggs partials (Loquacity, Aug 2, 2023)
c537bfc  Compression partials (Loquacity, Aug 2, 2023)
c2ab135  Merge branch 'latest' into dev-overview-lana (Loquacity, Aug 7, 2023)
c03e067  Move overview out of use ts (Loquacity, Aug 10, 2023)
291615b  Merge branch 'latest' into dev-overview-lana (Loquacity, Aug 10, 2023)
9980a40  Merge branch 'latest' into dev-overview-lana (Loquacity, Aug 14, 2023)
b5bb548  fix: add page index for overview section (charislam, Aug 15, 2023)
f1cfd7d  Merge branch 'latest' into dev-overview-lana (Loquacity, Aug 16, 2023)
8cd53aa  Add intro and value prop (Loquacity, Aug 16, 2023)
101669e  Merge branch 'latest' into dev-overview-lana (Loquacity, Aug 28, 2023)
63209e9  Merge branch 'latest' into dev-overview-lana (Loquacity, Sep 4, 2023)
3793e33  Change title of overview (Loquacity, Sep 4, 2023)
397458b  Merge branch 'latest' into dev-overview-lana (Loquacity, Sep 6, 2023)
a617950  Merge branch 'latest' into dev-overview-lana (Loquacity, Sep 8, 2023)
5c901aa  Merge branch 'latest' into dev-overview-lana (Loquacity, Sep 12, 2023)
8b15f09  Merge branch 'latest' into dev-overview-lana (Loquacity, Sep 13, 2023)
ed4cd2f  split page (Loquacity, Sep 13, 2023)
eb4c5e0  metadata (Loquacity, Sep 13, 2023)
75c1162  update top level content (Loquacity, Sep 13, 2023)
3692e67  Update title (Loquacity, Sep 13, 2023)

1 change: 1 addition & 0 deletions _partials/_architecture-overview.md
@@ -0,0 +1 @@
FIXME

Check warning on line 1 in _partials/_architecture-overview.md (GitHub Actions / prose): [vale] [Google.FixMe] Replace placeholder text.
18 changes: 18 additions & 0 deletions _partials/_caggs-next.md
@@ -0,0 +1,18 @@
Creating a continuous aggregate is a two-step process. You need to create the
view first, then enable a policy to keep the view refreshed. You can create the
view on a hypertable, or on top of another continuous aggregate. You can have
more than one continuous aggregate on each source table or view.

Continuous aggregates require a `time_bucket` on the time partitioning column of
the hypertable.

By default, views are automatically refreshed. You can adjust this by setting
Contributor suggested change:
- By default, views are automatically refreshed. You can adjust this by setting
+ By default, views are automatically refreshed when they are created. You can adjust this by using

the [WITH NO DATA](#using-the-with-no-data-option) option. Additionally, the
view can not be a [security barrier view][postgres-security-barrier].

Check failure on line 11 in _partials/_caggs-next.md (GitHub Actions / prose): [vale] [Google.MarkdownLinks] Remember to include the link.

Continuous aggregates use hypertables in the background, which means that they
Contributor suggested change:
- Continuous aggregates use hypertables in the background, which means that they
+ Continuous aggregates use hypertables internally, which means that they

also use chunk time intervals. By default, the continuous aggregate's chunk time
interval is 10 times what the original hypertable's chunk time interval is. For
example, if the original hypertable's chunk time interval is 7 days, the
continuous aggregates that are on top of it have a 70 day chunk time
interval.
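
As an illustration of the two-step process this partial describes, here is a minimal sketch, assuming a hypertable named `conditions` with `time` and `temperature` columns (all names are illustrative, not from the PR):

```sql
-- Step 1: create the continuous aggregate view, bucketing by day.
CREATE MATERIALIZED VIEW conditions_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '1 day', time) AS bucket,
       avg(temperature) AS avg_temp
FROM conditions
GROUP BY bucket;

-- Step 2: add a refresh policy to keep the view up to date.
SELECT add_continuous_aggregate_policy('conditions_daily',
  start_offset      => INTERVAL '3 days',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');
```

If the view is created with the `WITH NO DATA` option mentioned above, the initial materialization is deferred and left to the policy.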
21 changes: 21 additions & 0 deletions _partials/_compression-next.md
@@ -0,0 +1,21 @@
Research has shown that when data is newly ingested, the queries are more likely
to be shallow in time, and wide in columns. Generally, they are debugging
queries, or queries that cover the whole system, rather than specific, analytic
queries. An example of the kind of query more likely for new data is "show the
current CPU usage, disk usage, energy consumption, and I/O for a particular
server". When this is the case, the uncompressed data has better query
performance, so the native PostgreSQL row-based format is the best option.
Comment on lines +1 to +7 (Contributor): Hmm... not sure if "debugging queries" is the best categorization of this.
Suggested change:
- Research has shown that when data is newly ingested, the queries are more likely
- to be shallow in time, and wide in columns. Generally, they are debugging
- queries, or queries that cover the whole system, rather than specific, analytic
- queries. An example of the kind of query more likely for new data is "show the
- current CPU usage, disk usage, energy consumption, and I/O for a particular
- server". When this is the case, the uncompressed data has better query
- performance, so the native PostgreSQL row-based format is the best option.
+ For newly ingested data, the queries are usually
+ shallow in time, and wide in columns. At this stage, the queries delve into details of the system. An example of the kind of query more likely for new data is "show the
+ current CPU usage, disk usage, energy consumption, and I/O for a particular
+ server". When this is the case, the uncompressed data has better query
+ performance, so the native PostgreSQL row-based format is the best option.


However, as data ages, queries are likely to change. They become more
analytical, and involve fewer columns. An example of the kind of query run on
older data is "calculate the average disk usage over the last month." This type
of query runs much faster on compressed, columnar data.

To take advantage of this and increase your query efficiency, you want to run
queries on new data that is uncompressed, and on older data that is compressed.
Setting the right compression policy interval means that recent data is ingested
in an uncompressed, row format for efficient shallow and wide queries, and then
automatically converted to a compressed, columnar format after it ages and is
more likely to be queried using deep and narrow queries. Therefore, one
consideration for choosing the age at which to compress the data is when your
query patterns change from shallow and wide to deep and narrow.
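
A minimal sketch of such a policy, assuming a hypertable named `metrics` with a `device_id` column (names are illustrative):

```sql
-- Store compressed data segmented by device, which suits
-- deep-and-narrow queries against a single device.
ALTER TABLE metrics SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_id'
);

-- Compress chunks once they are older than 7 days, the assumed point
-- where query patterns shift from shallow-and-wide to deep-and-narrow.
SELECT add_compression_policy('metrics', INTERVAL '7 days');
```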
6 changes: 6 additions & 0 deletions _partials/_data-tiering-intro.md
@@ -0,0 +1,6 @@
Timescale includes traditional disk storage, and a low-cost object-storage
layer built on Amazon S3. You can move your hypertable data across the different
storage tiers to get the best price performance. You can use primary storage for
data that requires quick access, and low-cost object storage for historical
data. Regardless of where your data is stored, you can query it with standard
SQL.
17 changes: 17 additions & 0 deletions _partials/_data-tiering-next.md
@@ -0,0 +1,17 @@
Data tiering works by periodically and asynchronously moving older chunks to S3
storage. There, it's stored in the Apache Parquet format, which is a compressed
columnar format well-suited for S3. Data remains accessible both during and
after migration.

When you run regular SQL queries, a behind-the-scenes process transparently
pulls data from wherever it's located: disk storage, object storage, or both.
Various SQL optimizations limit what needs to be read from S3:

* Chunk exclusion avoids processing chunks that fall outside the query's time
window
* The database uses metadata about row groups and columnar offsets, so only
part of an object needs to be read from S3

The result is transparent queries across standard PostgreSQL storage and S3
storage, so your queries fetch the same data as before, with minimal added
latency.
Comment on lines +15 to +17 (Contributor): Focus on the utility, not on performance.
Suggested change:
- The result is transparent queries across standard PostgreSQL storage and S3
- storage, so your queries fetch the same data as before, with minimal added
- latency.
+ As a result, you can write queries seamlessly reading and involving both tiered and untiered data.
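
A sketch of what this looks like in practice, assuming the Timescale Cloud tiering functions are available and a hypertable named `metrics` with `time` and `cpu_usage` columns (names are illustrative):

```sql
-- Asynchronously move chunks older than 3 months to object storage.
SELECT add_tiering_policy('metrics', INTERVAL '3 months');

-- Standard SQL is unchanged: the query transparently reads from
-- disk storage, object storage, or both.
SELECT time_bucket(INTERVAL '1 day', time) AS bucket,
       avg(cpu_usage) AS avg_cpu
FROM metrics
GROUP BY bucket
ORDER BY bucket;
```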

1 change: 1 addition & 0 deletions _partials/_elastic-compute-intro.md
@@ -0,0 +1 @@
FIXME

Check warning on line 1 in _partials/_elastic-compute-intro.md (GitHub Actions / prose): [vale] [Google.FixMe] Replace placeholder text.
20 changes: 20 additions & 0 deletions _partials/_hypertables-next.md
@@ -0,0 +1,20 @@
When you create and use a hypertable, it automatically partitions data by time,
and optionally by space.
Comment on lines +1 to +2 (Contributor) suggested change:
- When you create and use a hypertable, it automatically partitions data by time,
- and optionally by space.
+ Hypertables are used to automatically partition data: traditionally using time, but hypertables can also be used to partition data in other dimensions.


Each hypertable is made up of child tables called chunks. Each chunk is assigned
a range of time, and only contains data from that range. If the hypertable is
also partitioned by space, each chunk is also assigned a subset of the space
values.
Comment on lines +4 to +7 (Contributor): You can partition using multiple time dimensions and multiple space dimensions, so suggest to elaborate a little on this.
Suggested change:
- Each hypertable is made up of child tables called chunks. Each chunk is assigned
- a range of time, and only contains data from that range. If the hypertable is
- also partitioned by space, each chunk is also assigned a subset of the space
- values.
+ Each hypertable is made up of child tables called chunks. Each chunk is assigned
+ a range of time, and only contains data from that range. If the hypertable is
+ also partitioned by other dimensions, each chunk is also assigned a subset of the values in that dimension.


Each chunk of a hypertable only holds data from a specific time range. When you
insert data from a time range that doesn't yet have a chunk, Timescale
automatically creates a chunk to store it.

By default, each chunk covers 7 days. You can change this to better suit your
needs. For example, if you set `chunk_time_interval` to 1 day, each chunk stores
data from the same day. Data from different days is stored in different chunks.

<img class="main-content__illustration"
src="https://assets.timescale.com/docs/images/getting-started/hypertables-chunks.webp"
alt="A normal table compared to a hypertable. The normal table holds data for 3 different days in one container. The hypertable contains 3 containers, called chunks, each of which holds data for a separate day."
/>
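
A minimal sketch of the behavior described above, assuming a `conditions` table with `time` and `device_id` columns (names are illustrative):

```sql
-- Convert a regular table into a hypertable partitioned by time,
-- with one chunk per day instead of the 7-day default.
SELECT create_hypertable('conditions', 'time',
  chunk_time_interval => INTERVAL '1 day');

-- Optionally add a space dimension: hash-partition each time slice
-- across 4 partitions by device_id.
SELECT add_dimension('conditions', 'device_id', number_partitions => 4);
```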
22 changes: 22 additions & 0 deletions _partials/_time-bucket-intro.md
@@ -0,0 +1,22 @@
The [`time_bucket`][time_bucket] function allows you to aggregate data into
buckets of time, for example: 5 minutes, 1 hour, or 3 days. It's similar to
PostgreSQL's [`date_bin`][date_bin] function, but it gives you more
flexibility in bucket size and start time.

Check failures on lines 1 and 3 in _partials/_time-bucket-intro.md (GitHub Actions / prose): [vale] [Google.MarkdownLinks] Remember to include the link.

Time bucketing is essential to working with time-series data. You can use it to
roll up data for analysis or downsampling. For example, you can calculate
5-minute averages for a sensor reading over the last day. You can perform these
rollups as needed, or pre-calculate them in [continuous aggregates][caggs].

Check failure on line 9 in _partials/_time-bucket-intro.md (GitHub Actions / prose): [vale] [Google.MarkdownLinks] Remember to include the link.

Time bucketing groups data into time intervals. With `time_bucket`, the interval
length can be any number of microseconds, milliseconds, seconds, minutes, hours,
days, weeks, months, years, or centuries.

The `time_bucket` function is usually used in combination with `GROUP BY` to
aggregate data. For example, you can calculate the average, maximum, minimum, or
sum of values within a bucket.

<img class="main-content__illustration"
src="https://assets.timescale.com/docs/images/getting-started/time-bucket.webp"
alt="Diagram showing time-bucket aggregating data into daily buckets, and calculating the daily sum of a value"
/>
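
For example, a sketch of the `GROUP BY` pattern described above, assuming a `readings` table with `time`, `sensor_id`, and `reading` columns (names are illustrative):

```sql
-- 5-minute averages per sensor over the last day.
SELECT time_bucket(INTERVAL '5 minutes', time) AS bucket,
       sensor_id,
       avg(reading) AS avg_reading
FROM readings
WHERE time > now() - INTERVAL '1 day'
GROUP BY bucket, sensor_id
ORDER BY bucket;
```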
11 changes: 9 additions & 2 deletions _partials/_timescale-intro.md
@@ -1,2 +1,9 @@
-Timescale extends PostgreSQL for time-series and analytics, so you can build
-faster, scale further, and stay under budget.
+Timescale is a database platform engineered to deliver speed and scale to
+resource-intensive workloads, which makes it great for things like time series,
+event, and analytics data. Timescale is built on PostgreSQL, so you have access
+to the entire PostgreSQL ecosystem, with a user-friendly interface that
+simplifies database deployment and management. Timescale dramatically improves
+your database performance with hypertables and continuous aggregates, and can
+save you money with features like compression, usage-based storage, and data
+tiering. And the Timescale expert support team is available to assist at no
+extra charge 24 hours a day, 7 days a week, 365 days a year.
10 changes: 10 additions & 0 deletions _partials/_timescale-value-prop.md
@@ -0,0 +1,10 @@
Timescale works for you end-to-end. Converting your PostgreSQL tables to
hypertables instantly improves query and insert performance, and gives you
immediate access to continuous aggregates and compression. Continuous aggregates
continuously and incrementally materialize your aggregate queries, giving you
updated insights as soon as new data arrives. Compression immediately improves
database performance and, with usage-based storage, also saves you money. Pair
all this with data tiering to automatically archive older data, saving money,
but retaining access when you need it. Need to know more? Keep reading, and
remember a world-class support team is here to help you if you need it, every
step of the way.
106 changes: 106 additions & 0 deletions overview/index.md
@@ -0,0 +1,106 @@
---
title: Timescale overview
excerpt: Learn about core Timescale concepts, architecture, and features
products: [cloud, mst, self_hosted]
keywords: [learn, architecture, hypertables, time buckets, compression, continuous aggregates]
---

import CaggsIntro from "versionContent/_partials/_caggs-intro.mdx";
import CaggsTypes from "versionContent/_partials/_caggs-types.mdx";
import CaggsNext from "versionContent/_partials/_caggs-next.mdx";
import CloudIntro from "versionContent/_partials/_cloud-intro.mdx";
import HypertablesIntro from "versionContent/_partials/_hypertables-intro.mdx";
import HypertablesNext from "versionContent/_partials/_hypertables-next.mdx";
import TimeSeriesIntro from "versionContent/_partials/_timeseries-intro.mdx";
import TimescaleIntro from "versionContent/_partials/_timescale-intro.mdx";
import TimescaleValueProp from "versionContent/_partials/_timescale-value-prop.mdx";
import CompressionIntro from "versionContent/_partials/_compression-intro.mdx";
import CompressionNext from "versionContent/_partials/_compression-next.mdx";
import UbsIntro from "versionContent/_partials/_usage-based-storage-intro.mdx";
import ElasticComputeIntro from "versionContent/_partials/_elastic-compute-intro.mdx";
import DataTieringIntro from "versionContent/_partials/_data-tiering-intro.mdx";
import DataTieringNext from "versionContent/_partials/_data-tiering-next.mdx";
import Architecture from "versionContent/_partials/_architecture-overview.mdx";
import TimeBucketIntro from "versionContent/_partials/_time-bucket-intro.mdx";

# Timescale overview

<TimescaleIntro />

<TimescaleValueProp />

This section provides an overview of the Timescale architecture, and introduces
key Timescale concepts and features.

## Time-series data

<TimeSeriesIntro />

## Timescale architecture

<Architecture />

## Timescale

<CloudIntro />

## Hypertables

<HypertablesIntro />

<HypertablesNext />

For more information about hypertables, see the
[hypertables section][hypertables].

## Time buckets

<TimeBucketIntro />

For more information about time bucketing, see the
[time buckets section][time-buckets].

## Data tiering

<DataTieringIntro />

<DataTieringNext />

For more information about data tiering, see the
[data tiering section][data-tiering].

## Continuous aggregation

<CaggsIntro />

<CaggsTypes />

<CaggsNext />

For more information about continuous aggregation, see the
[continuous aggregates section][caggs].

## Compression

<CompressionIntro />

<CompressionNext />

For more information about compression, see the
[compression section][compression].

## Elastic compute and usage-based storage

<UbsIntro />

<ElasticComputeIntro />

For more information about elastic compute and usage-based storage, see the
[billing section][billing].

[hypertables]: /use-timescale/:currentVersion:/hypertables/
[time-buckets]: /use-timescale/:currentVersion:/time-buckets/
[data-tiering]: /use-timescale/:currentVersion:/data-tiering/
[caggs]: /use-timescale/:currentVersion:/continuous-aggregates/
[compression]: /use-timescale/:currentVersion:/compression/
[billing]: /use-timescale/:currentVersion:/account-management/
9 changes: 9 additions & 0 deletions overview/page-index/page-index.js
@@ -0,0 +1,9 @@
module.exports = [
{
title: "What is Timescale?",
href: "overview",
filePath: "index.md",
excerpt:
"What is Timescale?",
},
];
3 changes: 3 additions & 0 deletions page-index/page-index.js
@@ -7,8 +7,11 @@ const navigationPageIndex = require("../navigation/page-index/page-index");
const tutorialsPageIndex = require("../tutorials/page-index/page-index.js");
const codeQuickStartsPageIndex = require("../quick-start/page-index/page-index.js");
const timescaleAboutPageIndex = require("../about/page-index/page-index");
+const overviewPageIndex = require("../overview/page-index/page-index");


module.exports = [
+  ...overviewPageIndex,
...gsgPageIndex,
...timescaleUsingPageIndex,
...tutorialsPageIndex,
26 changes: 4 additions & 22 deletions use-timescale/compression/about-compression.md
Expand Up @@ -7,6 +7,8 @@ keywords: [compression, hypertables]

import CompressionIntro from 'versionContent/_partials/_compression-intro.mdx';

+import CompressionNext from "versionContent/_partials/_compression-next.mdx";

# About compression

<CompressionIntro />
@@ -104,33 +106,13 @@ compression.
## Compression policy intervals

Data is usually compressed after an interval of time, and not
immediately. In the "Enabling compression" procedure, you used a seven day
immediately. In the example in this section, you used a seven day
compression interval. Choosing a good compression interval can make your queries
more efficient, and also allow you to handle data that is out of order.

### Query efficiency

-Research has shown that when data is newly ingested, the queries are more likely
-to be shallow in time, and wide in columns. Generally, they are debugging
-queries, or queries that cover the whole system, rather than specific, analytic
-queries. An example of the kind of query more likely for new data is "show the
-current CPU usage, disk usage, energy consumption, and I/O for a particular
-server". When this is the case, the uncompressed data has better query
-performance, so the native PostgreSQL row-based format is the best option.
-
-However, as data ages, queries are likely to change. They become more
-analytical, and involve fewer columns. An example of the kind of query run on
-older data is "calculate the average disk usage over the last month." This type
-of query runs much faster on compressed, columnar data.
-
-To take advantage of this and increase your query efficiency, you want to run
-queries on new data that is uncompressed, and on older data that is compressed.
-Setting the right compression policy interval means that recent data is ingested
-in an uncompressed, row format for efficient shallow and wide queries, and then
-automatically converted to a compressed, columnar format after it ages and is
-more likely to be queried using deep and narrow queries. Therefore, one
-consideration for choosing the age at which to compress the data is when your
-query patterns change from shallow and wide to deep and narrow.
+<CompressionNext />

### Modified data

@@ -78,7 +78,8 @@ WHERE t1.id IN (1, 2, 3, 4)
GROUP BY ...
```

-`INNER JOIN` on a single equality condition specified in `WHERE` clause, this is allowed but not recommended:
+`INNER JOIN` on a single equality condition specified in `WHERE` clause, this is
+allowed but not recommended:

```sql
CREATE MATERIALIZED VIEW my_view WITH (timescaledb.continuous) AS
@@ -99,7 +100,8 @@ JOIN table_2 t2 ON t1.t2_id = t2.id AND t1.t2_id_2 = t2.id
GROUP BY ...
```

-A `JOIN` with a single equality condition specified in `WHERE` clause cannot be combined with further conditions in the `WHERE` clause.
+A `JOIN` with a single equality condition specified in `WHERE` clause cannot be
+combined with further conditions in the `WHERE` clause.

```sql
CREATE MATERIALIZED VIEW my_view WITH (timescaledb.continuous) AS
@@ -5,26 +5,11 @@ products: [cloud, mst, self_hosted]
keywords: [continuous aggregates, create]
---

-# Create continuous aggregates
+import CaggsNext from "versionContent/_partials/_caggs-next.mdx";

-Creating a continuous aggregate is a two-step process. You need to create the
-view first, then enable a policy to keep the view refreshed. You can create the
-view on a hypertable, or on top of another continuous aggregate. You can have
-more than one continuous aggregate on each source table or view.
+# Create continuous aggregates

-Continuous aggregates require a `time_bucket` on the time partitioning column of
-the hypertable.
-
-By default, views are automatically refreshed. You can adjust this by setting
-the [WITH NO DATA](#using-the-with-no-data-option) option. Additionally, the
-view can not be a [security barrier view][postgres-security-barrier].
-
-Continuous aggregates use hypertables in the background, which means that they
-also use chunk time intervals. By default, the continuous aggregate's chunk time
-interval is 10 times what the original hypertable's chunk time interval is. For
-example, if the original hypertable's chunk time interval is 7 days, the
-continuous aggregates that are on top of it have a 70 day chunk time
-interval.
+<CaggsNext />

## Create a continuous aggregate
