Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: data index #1410

Merged
merged 4 commits into from
Dec 26, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/user-guide/concepts/key-concepts.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ Find all the supported data types in [Data Types](/reference/sql/data-types.md).

## Index

The `index` is a performance-tuning method that allows faster retrieval of records. GreptimeDB uses the [inverted index](/contributor-guide/datanode/data-persistence-indexing.md#inverted-index) to accelerate queries.
The `index` is a performance-tuning method that allows faster retrieval of records. GreptimeDB provides various kinds of [indexes](/user-guide/manage-data/data-index.md) to accelerate queries.

## View

Expand Down
100 changes: 100 additions & 0 deletions docs/user-guide/manage-data/data-index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
# Data Index

GreptimeDB provides various indexing mechanisms to accelerate query performance. Indexes are essential database structures that help optimize data retrieval operations by creating efficient lookup paths to specific data.

## Overview

Indexes in GreptimeDB are specified during table creation and are designed to improve query performance for different types of data and query patterns. The database currently supports these types of indexes:

- Inverted Index
- Skipping Index
- Fulltext Index

Notice that in this chapter we are narrowing the word "index" to those related to data value indexing. PRIMARY KEY and TIME INDEX can also be treated as index in some scenarios, but they are not covered here.

## Index Types

### Inverted Index

An inverted index is particularly useful for tag columns. It creates a mapping between unique tag values and their corresponding rows, enabling fast lookups for specific tag values.

**Typical Use Cases:**
- Querying data by tag values
- Filtering operations on string columns
- Point queries on tag columns

Example:
```sql
CREATE TABLE monitoring_data (
host STRING,
region STRING PRIMARY KEY,
cpu_usage DOUBLE,
timestamp TIMESTAMP TIME INDEX,
INDEX INVERTED_INDEX(host, region)
);
```

However, when you have a large number of unique tag values (Cartesian product among all columns indexed by inverted index), the inverted index may not be the best choice due to the overhead of maintaining the index. It may bring high memory consumption and large index size. In this case, you may consider using the skipping index.

### Skipping Index

Skipping index suits for columnar data systems like GreptimeDB. It maintains metadata about value ranges within data blocks, allowing the query engine to skip irrelevant data blocks during range queries efficiently. This index also has smaller size compare to others.

**Use Cases:**
- When certain values are sparse, such as MAC address codes in logs.
- Querying specific values that occur infrequently within large datasets

Example:
```sql
CREATE TABLE sensor_data (
domain STRING PRIMARY KEY,
device_id STRING SKIPPING INDEX,
temperature DOUBLE,
timestamp TIMESTAMP TIME INDEX,
);
```

Skipping index can't handle complex filter conditions, and usually has a lower filtering performance compared to inverted index or fulltext index.

### Fulltext Index

Fulltext index is designed for text search operations on string columns. It enables efficient searching of text content using word-based matching and text search capabilities. You can query text data with flexible keywords, phrases, or pattern matching queries.

**Use Cases:**
- Text search operations
- Pattern matching queries
- Large text filtering

Example:
```sql
CREATE TABLE logs (
message STRING FULLTEXT INDEX,
level STRING PRIMARY KEY,
timestamp TIMESTAMP TIME INDEX,
);
```

Fulltext index usually comes with following drawbacks:

- Higher storage overhead compared to regular indexes due to storing word tokens and positions
- Increased flush and compaction latency as each text document needs to be tokenized and indexed
- May not be optimal for simple prefix or suffix matching operations

Consider using fulltext index only when you need advanced text search capabilities and flexible query patterns.

## Best Practices

1. Choose the appropriate index type based on your data type and query patterns
2. Index only the columns that are frequently used in WHERE clauses
3. Consider the trade-off between query performance, ingest performance and resource consumption
4. Monitor index usage and performance to optimize your indexing strategy continuously

## Performance Considerations

While indexes can significantly improve query performance, they come with some overhead:

- Additional storage space required for index structures
- Impact on flush and compaction performance due to index maintenance
- Memory usage for index caching

Choose indexes carefully based on your specific use case and performance requirements.
2 changes: 1 addition & 1 deletion sidebars.ts
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ const sidebars: SidebarsConfig = {
'user-guide/query-data/query-external-data',
],
},
{ type: 'category', label: 'Manage Data', items: ['user-guide/manage-data/overview'] },
{ type: 'category', label: 'Manage Data', items: ['user-guide/manage-data/overview', 'user-guide/manage-data/data-index'] },
{
type: 'category',
label: 'Integrations',
Expand Down
Loading