chore: adjust image size (#1412)
nicecui authored Dec 26, 2024
1 parent a8a10a9 commit 51d967a
Showing 4 changed files with 8 additions and 8 deletions.
4 changes: 2 additions & 2 deletions docs/contributor-guide/datanode/data-persistence-indexing.md
@@ -17,7 +17,7 @@ First, clustering data by column makes file scanning more efficient, especially

Second, data of the same column tends to be homogeneous, which helps with compression when applying techniques like dictionary encoding and Run-Length Encoding (RLE).

- ![Parquet file format](/parquet-file-format.png)
+ <img src="/parquet-file-format.png" alt="Parquet file format" width="500"/>

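The compression benefit of homogeneous column data can be illustrated with a toy Run-Length Encoding pass. This is a minimal sketch in plain Python, not GreptimeDB's or Parquet's actual codec; `rle_encode` is a hypothetical helper name:

```python
def rle_encode(values):
    """Collapse consecutive repeated values into (value, count) runs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([v, 1])  # start a new run
    return [(v, n) for v, n in runs]

# A homogeneous column collapses well: ten values become two runs.
homogeneous = ["us-west"] * 6 + ["us-east"] * 4
print(rle_encode(homogeneous))  # [('us-west', 6), ('us-east', 4)]
```

A shuffled, heterogeneous column would produce roughly one run per value and compress poorly, which is why the columnar layout above pays off.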
## Data Persistence

@@ -29,7 +29,7 @@ When the size of data buffered in MemTables reaches that threshold, GreptimeDB w

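Elsewhere in this commit's docs the flush threshold is exposed as the `storage.flush.global_write_buffer_size` configuration option. As a minimal sketch of how it might appear in a TOML config file (the section layout and the value shown are assumptions, not taken from the GreptimeDB configuration reference):

```toml
# Hypothetical snippet: flush MemTables once buffered data reaches this size.
[storage.flush]
global_write_buffer_size = "1GB"
```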
Apache Parquet file format provides inherent statistics in headers of column chunks and data pages, which are used for pruning and skipping.

- ![Column chunk header](/column-chunk-header.png)
+ <img src="/column-chunk-header.png" alt="Column chunk header" width="350"/>

For example, in the above Parquet file, if you want to filter rows where `name` = `Emily`, you can easily skip row group 0 because the max value for the `name` field is `Charlie`. This statistical information reduces IO operations.

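The pruning decision described above can be sketched in a few lines of plain Python. This is a toy model of the per-row-group min/max statistics, not GreptimeDB's actual Parquet reader:

```python
# Each row group carries min/max statistics for the `name` column,
# mirroring the Parquet column-chunk headers described above.
row_groups = [
    {"min": "Alice", "max": "Charlie"},  # row group 0
    {"min": "David", "max": "Frank"},    # row group 1
]

def groups_to_scan(predicate_value, groups):
    """Keep only row groups whose [min, max] range could contain the value."""
    return [
        i for i, g in enumerate(groups)
        if g["min"] <= predicate_value <= g["max"]
    ]

print(groups_to_scan("Emily", row_groups))  # [1]: row group 0 is pruned
```

Only row group 1's range (`David` to `Frank`) can contain `Emily`, so row group 0 is skipped without reading its data pages.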
@@ -17,7 +17,7 @@ Parquet has a hierarchical structure, similar to "row group - column - data page".

Second, data in the same column tends to be homogeneous (for example, having similar values), which helps with compression when applying techniques like dictionary encoding and Run-Length Encoding (RLE).

- ![Parquet file format](/parquet-file-format.png)
+ <img src="/parquet-file-format.png" alt="Parquet file format" width="500"/>

## Data Persistence

@@ -28,7 +28,7 @@ GreptimeDB provides the `storage.flush.global_write_buffer_size` configuration option to set this threshold

The Apache Parquet file format provides built-in statistics in the headers of column chunks and data pages, which are used for pruning and skipping.

- ![Column chunk header](/column-chunk-header.png)
+ <img src="/column-chunk-header.png" alt="Column chunk header" width="350"/>

For example, in the above Parquet file, if you want to filter rows where `name` equals `Emily`, you can easily skip row group 0 because the max value for the `name` field is `Charlie`. This statistical information reduces IO operations.

