Commit

Add page index caveat
alamb committed Jun 21, 2024
1 parent 28dceb7 commit d2f6477
Showing 1 changed file with 6 additions and 4 deletions.
datafusion-examples/examples/advanced_parquet_index.rs: 10 changes (6 additions, 4 deletions)
@@ -83,7 +83,7 @@ use url::Url;
 /// 1. Use [`ParquetFileReaderFactory`] to avoid re-reading parquet metadata on each query
 /// 2. Use [`PruningPredicate`] for predicate analysis
 /// 3. Pass a row group selection to [`ParuetExec`]
-/// 4. Pass a row selection (within a row group) to [`ParuetExec`]
+/// 4. Pass a row selection (within a row group) to [`ParquetExec`]
 ///
 /// Note this is a *VERY* low level example for people who want to build their
 /// own custom indexes (e.g. for low latency queries). Most users should use
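Steps 3 and 4 in the list above work at two granularities: a whole row group is either skipped or scanned, and inside a row group a row selection keeps only some row ranges. The std-only sketch below models that idea so it runs on its own. `Selector`, `RowGroupAccess`, and `selected_rows` are hypothetical names invented for illustration, not the DataFusion or `parquet` crate API; the run-length skip/select encoding is only similar in spirit to the `parquet` crate's `RowSelector`.

```rust
/// Hypothetical run-length selector: skip or select the next N rows
/// of the current row group.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Selector {
    Skip(usize),
    Select(usize),
}

/// Hypothetical per-row-group access decision.
#[derive(Debug, Clone, PartialEq)]
pub enum RowGroupAccess {
    /// Row group pruned entirely (e.g. by statistics); nothing is read.
    Skip,
    /// Row group read in full.
    Scan,
    /// Only the rows chosen by the selectors are read.
    Selection(Vec<Selector>),
}

/// Returns the file-level row indices a plan would read, given the number
/// of rows in each row group (one plan entry per row group).
pub fn selected_rows(row_group_sizes: &[usize], plan: &[RowGroupAccess]) -> Vec<usize> {
    assert_eq!(row_group_sizes.len(), plan.len(), "one access entry per row group");
    let mut out = Vec::new();
    let mut group_start = 0;
    for (&size, access) in row_group_sizes.iter().zip(plan) {
        match access {
            RowGroupAccess::Skip => {}
            RowGroupAccess::Scan => out.extend(group_start..group_start + size),
            RowGroupAccess::Selection(selectors) => {
                // Walk the selectors in order, advancing a cursor through the group.
                let mut pos = group_start;
                for sel in selectors {
                    match *sel {
                        Selector::Skip(n) => pos += n,
                        Selector::Select(n) => {
                            out.extend(pos..pos + n);
                            pos += n;
                        }
                    }
                }
                assert!(pos <= group_start + size, "selection runs past the end of the row group");
            }
        }
        group_start += size;
    }
    out
}
```

For example, with three row groups of 4 rows each and the plan `[Skip, Selection(vec![Skip(1), Select(2)]), Scan]`, `selected_rows` returns `[5, 6, 8, 9, 10, 11]`: the first group is never touched, only two rows of the second are read, and the third is read in full.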
Expand Down Expand Up @@ -125,13 +125,14 @@ use url::Url;
///
/// Within a Row Group, Column Chunks store data in DataPages. This example also
/// shows how to configure the ParquetExec to read a `RowSelection` (row ranges)
/// which will skip unneeded data pages:
/// which will skip unneeded data pages. This requires that the Parquet file has
/// a [Page Index].
///
/// ```text
/// ┌───────────────────────┐ If the RowSelection does not include any
/// │ ... │ rows from a particular Data Page, that
/// │ │ Data Page is not fetched or decoded
/// │ ┌───────────────────┐ │
/// │ │ Data Page is not fetched or decoded.
/// │ ┌───────────────────┐ │ Note this requires a PageIndex
/// │ │ ┌──────────┐ │ │
/// Row │ │ │DataPage 0│ │ │ ┌────────────────────┐
/// Groups │ │ └──────────┘ │ │ │ │
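The Page Index caveat follows from how page skipping works: roughly, a reader can only avoid fetching a data page if it knows, before reading, which rows that page holds, and the Page Index's offset index records each page's location and first row index. The std-only sketch below assumes those first-row values are already available as a slice and computes which pages of one column chunk overlap a set of selected row ranges. `pages_to_fetch` is a hypothetical helper for illustration, not part of DataFusion or the `parquet` crate.

```rust
use std::ops::Range;

/// Returns the indices of the data pages that overlap at least one selected
/// row range. `page_first_rows[i]` is the first row of page `i` within the
/// row group (assumed sorted, as an offset index would provide), and
/// `num_rows` is the row group's total row count.
pub fn pages_to_fetch(
    page_first_rows: &[usize],
    num_rows: usize,
    selected: &[Range<usize>],
) -> Vec<usize> {
    (0..page_first_rows.len())
        .filter(|&i| {
            let start = page_first_rows[i];
            // A page ends where the next page begins, or at the end of the row group.
            let end = page_first_rows.get(i + 1).copied().unwrap_or(num_rows);
            // Half-open interval overlap test; empty selections match nothing.
            selected
                .iter()
                .any(|r| !r.is_empty() && r.start < end && start < r.end)
        })
        .collect()
}
```

With pages starting at rows `[0, 100, 200, 300]` in a 400-row group and selected ranges `[50..60, 310..320]`, only pages 0 and 3 overlap, so `pages_to_fetch` returns `[0, 3]` and pages 1 and 2 would not need to be fetched or decoded.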
@@ -153,6 +154,7 @@ use url::Url;
 /// ```
 ///
 /// [`ListingTable`]: datafusion::datasource::listing::ListingTable
+/// [Page Index](https://github.com/apache/parquet-format/blob/master/PageIndex.md)
 #[tokio::main]
 async fn main() -> Result<()> {
     // the object store is used to read the parquet files (in this case, it is