Skip RowFilter and page pruning for fully matched row groups by xudong963 · Pull Request #21637 · apache/datafusion

xudong963 · 2026-04-15T05:33:43Z

Which issue does this PR close?

Closes #19028 (Do not evaluate predicates if they can be proven to be false).

Rationale for this change

When DataFusion evaluates a Parquet scan with filter pushdown, it uses row group statistics to determine which row groups contain matching rows. The RowGroupAccessPlanFilter already tracks which row groups are "fully matched" — where statistics prove that all rows satisfy the predicate (via is_fully_matched).

However, this information was not propagated downstream. Even for fully matched row groups:

Page index pruning still evaluated page-level statistics (wasted work since no pages can be pruned)
RowFilter evaluation still decoded filter columns and evaluated the predicate for every row (wasted work since every row passes)

This is especially costly when filter columns are expensive to decode (e.g., large strings) or when predicates are complex. Common real-world examples include time-range filters where entire row groups fall within the range, or WHERE status != 'DELETED' on data with no deleted rows.
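As a rough illustration of what "fully matched" means, here is a minimal sketch for a predicate of the form `x < limit` on one integer column. It assumes exact min/max/null-count statistics are available; `ColumnStats`, `RowGroupMatch`, and `classify_less_than` are hypothetical names used only for this example, not DataFusion's actual types.

```rust
/// Min/max/null-count statistics for one column in one row group.
/// Hypothetical simplified type, for illustration only.
#[derive(Debug, Clone, Copy)]
pub struct ColumnStats {
    pub min: i64,
    pub max: i64,
    pub null_count: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RowGroupMatch {
    /// Statistics prove no row can satisfy `x < limit`: skip the row group.
    Pruned,
    /// Some rows may match: the filter must still run.
    Partial,
    /// Statistics prove every row satisfies `x < limit`: the filter is redundant.
    FullyMatched,
}

/// Classify a row group for the predicate `x < limit` using statistics only.
pub fn classify_less_than(stats: ColumnStats, limit: i64) -> RowGroupMatch {
    if stats.min >= limit {
        // Even the smallest value fails the predicate (NULLs never pass).
        RowGroupMatch::Pruned
    } else if stats.max < limit && stats.null_count == 0 {
        // Even the largest value passes, and there are no NULL rows.
        RowGroupMatch::FullyMatched
    } else {
        RowGroupMatch::Partial
    }
}
```

With the benchmark predicate `x < 200`, a row group whose `x` ranges over 0..=199 with no NULLs would be classified as `FullyMatched` in this sketch, so decoding `x` just to evaluate the filter would be wasted work.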

What changes are included in this PR?

DataFusion changes (this PR)

row_group_filter.rs: RowGroupAccessPlanFilter::build() now returns (ParquetAccessPlan, Vec<usize>) — the access plan plus the indices of fully matched row groups.
page_filter.rs: prune_plan_with_page_index() accepts a fully_matched_row_groups parameter and skips page-level pruning for those row groups.
opener.rs: Wires fully matched row groups through the pipeline — passes them to page pruning and to the ParquetPushDecoderBuilder via with_fully_matched_row_groups().
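The data flow described above can be sketched roughly as follows. This is a simplified model, not the real code: `Verdict`, `build_plan`, and `row_groups_needing_page_pruning` are hypothetical stand-ins, and the access plan is reduced to a plain list of row group indices.

```rust
/// Outcome of row-group statistics pruning for one row group (hypothetical).
pub enum Verdict {
    Pruned,
    Partial,
    FullyMatched,
}

/// Stand-in for a build step that returns the row groups to scan together
/// with the indices of the fully matched ones.
pub fn build_plan(verdicts: &[Verdict]) -> (Vec<usize>, Vec<usize>) {
    let mut to_scan = Vec::new();
    let mut fully_matched = Vec::new();
    for (idx, verdict) in verdicts.iter().enumerate() {
        match verdict {
            Verdict::Pruned => {}
            Verdict::Partial => to_scan.push(idx),
            Verdict::FullyMatched => {
                // Fully matched row groups are still scanned.
                to_scan.push(idx);
                fully_matched.push(idx);
            }
        }
    }
    (to_scan, fully_matched)
}

/// Row groups for which page-level pruning is still worth evaluating.
pub fn row_groups_needing_page_pruning(
    to_scan: &[usize],
    fully_matched: &[usize],
) -> Vec<usize> {
    to_scan
        .iter()
        .copied()
        .filter(|rg| !fully_matched.contains(rg))
        .collect()
}
```

The point of returning both values is that downstream stages (page pruning and the decoder) can consult the fully matched list without re-deriving it from statistics.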

Arrow-rs dependency (apache/arrow-rs#9694)

The new ArrowReaderBuilder::with_fully_matched_row_groups() API in arrow-rs allows skipping RowFilter evaluation during Parquet decoding for specified row groups. This PR uses [patch.crates-io] pointing to the arrow-rs fork branch until that PR is merged and released.

Benchmark

Includes a criterion benchmark (parquet_fully_matched_filter) using ParquetPushDecoder directly — the same code path DataFusion's async opener uses. Dataset: 20 row groups × 50K rows, with a 1KB string payload column and predicate x < 200 (all row groups fully matched).

Scenario	Time	vs. baseline
Filter pushdown, no skip	~43 ms	baseline
Filter pushdown, with skip	~20 ms	~2.2x faster
No pushdown at all	~24 ms	—

Are these changes tested?

All 82 existing non-submodule datafusion-datasource-parquet tests pass (16 failures are pre-existing, caused by missing parquet-testing submodule)
The benchmark verifies correctness by asserting the expected row count
Clippy and fmt pass

Are there any user-facing changes?

No user-facing API changes. This is a transparent performance optimization — queries that previously worked will now be faster when row group statistics prove all rows match the predicate.

Note: This PR originally depended on apache/arrow-rs#9694, with a [patch.crates-io] entry in Cargo.toml. That dependency is no longer needed: all logic is on the DataFusion side now.

xudong963 · 2026-05-06T05:20:05Z

@alamb thanks for the review. Before merging the PR, I think it would be good for you to look at the comment #21637 (comment) and its fix commit da7db27 (this is the lowest-cost way I found to fix the metric; let me know if you have other thoughts).

xudong963 · 2026-05-06T05:20:45Z

run benchmark clickbench_extended

env:
  DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
  DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true

adriangbot · 2026-05-06T05:23:26Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4385338380-2036-5z6mw 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing datafusion/issue-19028-benchmark (da7db27) to ba038e9 (merge-base) diff using: clickbench_extended
Results will be posted here when complete

adriangbot · 2026-05-06T05:44:52Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and datafusion_issue-19028-benchmark
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Query     ┃                                      HEAD ┃          datafusion_issue-19028-benchmark ┃         Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ QQuery 0  │         811.88 / 821.08 ±7.70 / 834.76 ms │        800.65 / 816.09 ±10.06 / 831.13 ms │      no change │
│ QQuery 1  │         196.31 / 198.30 ±3.50 / 205.30 ms │         191.05 / 194.30 ±5.52 / 205.30 ms │      no change │
│ QQuery 2  │         483.55 / 487.48 ±3.15 / 492.28 ms │         469.04 / 470.24 ±1.37 / 472.90 ms │      no change │
│ QQuery 3  │         310.29 / 311.17 ±0.67 / 312.01 ms │         308.38 / 310.78 ±2.99 / 316.41 ms │      no change │
│ QQuery 4  │         661.10 / 671.06 ±5.58 / 677.71 ms │         663.47 / 677.74 ±9.50 / 693.45 ms │      no change │
│ QQuery 5  │ 10480.76 / 10749.26 ±135.39 / 10838.42 ms │ 10381.10 / 10707.66 ±229.67 / 11069.32 ms │      no change │
│ QQuery 6  │           29.83 / 41.34 ±15.31 / 69.33 ms │            28.00 / 32.99 ±8.92 / 50.81 ms │  +1.25x faster │
│ QQuery 7  │        771.70 / 787.65 ±13.90 / 803.77 ms │        750.14 / 768.13 ±16.43 / 787.55 ms │      no change │
│ QQuery 8  │        378.07 / 403.59 ±35.92 / 474.57 ms │        380.54 / 399.98 ±32.19 / 464.04 ms │      no change │
│ QQuery 9  │     2872.22 / 2922.21 ±31.65 / 2960.94 ms │     2795.45 / 2887.03 ±72.17 / 2963.15 ms │      no change │
│ QQuery 10 │         641.11 / 647.49 ±4.55 / 653.68 ms │        643.03 / 671.47 ±43.17 / 757.36 ms │      no change │
│ QQuery 11 │     2185.14 / 2209.10 ±19.59 / 2233.01 ms │     2165.30 / 2213.30 ±42.41 / 2275.07 ms │      no change │
│ QQuery 12 │        197.95 / 215.56 ±29.02 / 273.46 ms │        193.75 / 212.64 ±25.73 / 262.96 ms │      no change │
│ QQuery 13 │        540.86 / 560.53 ±11.38 / 571.47 ms │            14.03 / 14.18 ±0.10 / 14.31 ms │ +39.54x faster │
└───────────┴───────────────────────────────────────────┴───────────────────────────────────────────┴────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 21025.80ms │
│ Total Time (datafusion_issue-19028-benchmark)   │ 20376.52ms │
│ Average Time (HEAD)                             │  1501.84ms │
│ Average Time (datafusion_issue-19028-benchmark) │  1455.47ms │
│ Queries Faster                                  │          2 │
│ Queries Slower                                  │          0 │
│ Queries with No Change                          │         12 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_extended — base (merge-base)

Metric	Value
Wall time	110.0s
Peak memory	30.9 GiB
Avg memory	23.9 GiB
CPU user	1019.6s
CPU sys	39.8s
Peak spill	0 B

clickbench_extended — branch

Metric	Value
Wall time	105.0s
Peak memory	31.7 GiB
Avg memory	24.2 GiB
CPU user	985.3s
CPU sys	37.3s
Peak spill	0 B

alamb · 2026-05-06T17:59:10Z

Maybe we should just add a new metric on ParquetScanMetrics 🤔

datafusion/datafusion/datasource-parquet/src/metrics.rs

Line 30 in 4c909ba

pub struct ParquetFileMetrics {

When row group statistics prove that ALL rows satisfy the filter predicate, skip both RowFilter evaluation (late materialization) and page index pruning for those row groups. This avoids wasted work decoding filter columns and evaluating predicates that produce no useful filtering. Depends on apache/arrow-rs#9694 for the `with_fully_matched_row_groups()` builder API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When a row group has NULL values in predicate columns, those rows evaluate to NULL (not true) in the filter. The inverted predicate approach can incorrectly mark such row groups as "fully matched" because NULLs satisfy neither the predicate nor its inverse. Check null_count statistics for predicate columns before marking a row group as fully matched. If any predicate column has NULLs, the row group is not fully matched and the filter must still run. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
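The NULL problem in this commit message can be shown with a small model of SQL three-valued logic, where `None` stands for NULL. The function names are hypothetical and the predicate is fixed to `x < limit` for illustration.

```rust
/// SQL `x < limit` under three-valued logic; `None` models SQL NULL.
pub fn less_than(x: Option<i64>, limit: i64) -> Option<bool> {
    x.map(|v| v < limit)
}

/// SQL `NOT p`: NULL stays NULL.
pub fn not(p: Option<bool>) -> Option<bool> {
    p.map(|b| !b)
}

/// A row passes a filter only when the predicate is exactly TRUE.
pub fn passes(p: Option<bool>) -> bool {
    p == Some(true)
}

/// True when no row satisfies the inverted predicate `NOT (x < limit)`.
pub fn inverse_matches_nothing(rows: &[Option<i64>], limit: i64) -> bool {
    rows.iter().all(|&x| !passes(not(less_than(x, limit))))
}

/// True when every row actually passes `x < limit`.
pub fn all_rows_pass(rows: &[Option<i64>], limit: i64) -> bool {
    rows.iter().all(|&x| passes(less_than(x, limit)))
}
```

For rows `[Some(1), None, Some(150)]` and `limit = 200`, the inverted predicate matches nothing, yet the NULL row does not pass the original predicate. That gap is why a non-zero `null_count` has to disqualify a row group from being fully matched.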

…ed row groups The fully matched row group optimization skips page index pruning, reducing the page_index_pages_pruned count from 6 to 4 (the 2 pages in the fully matched row group are no longer evaluated). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fully matched row groups now skip page index pruning, reducing the page_index_pages_pruned counts in limit_pruning.slt and dynamic_filter_pushdown_config.slt. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…-rs changes)

Instead of adding a `with_fully_matched_row_groups` API to arrow-rs, implement the optimization entirely in DataFusion by creating separate ParquetPushDecoders for row groups that need filtering vs those that are fully matched.

Key changes:
- Split row groups into consecutive runs of same filter requirement via `split_decoder_runs()`, preserving original row group ordering for ordered scans.
- Each filtered run gets its own RowFilter; fully-matched runs skip it.
- Use VecDeque<ParquetPushDecoder> in PushDecoderStreamState to chain decoders sequentially.
- Remove [patch.crates-io] arrow-rs fork dependency.

This aligns with the direction of per-row-group morsels: each decoder run can naturally become a morsel when that infrastructure lands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
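A minimal sketch of the run-splitting idea follows. The function name `split_decoder_runs` comes from the commit message, but the signature, the `DecoderRun` type, and the body here are assumptions made for illustration; the actual implementation may differ.

```rust
/// A consecutive run of row groups sharing the same filter requirement.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub struct DecoderRun {
    pub row_groups: Vec<usize>,
    /// False when every row group in the run is fully matched.
    pub needs_filter: bool,
}

/// Split `row_groups` (in scan order) into consecutive runs, so that each
/// run can be handed to its own decoder, with or without a row filter.
pub fn split_decoder_runs(row_groups: &[usize], fully_matched: &[usize]) -> Vec<DecoderRun> {
    let mut runs: Vec<DecoderRun> = Vec::new();
    for &rg in row_groups {
        let needs_filter = !fully_matched.contains(&rg);
        let extends_last =
            matches!(runs.last(), Some(run) if run.needs_filter == needs_filter);
        if extends_last {
            // Same requirement as the previous row group: grow the current run.
            if let Some(run) = runs.last_mut() {
                run.row_groups.push(rg);
            }
        } else {
            // Requirement changed: start a new run, keeping the original order.
            runs.push(DecoderRun { row_groups: vec![rg], needs_filter });
        }
    }
    runs
}
```

Because runs are only ever formed from adjacent row groups, concatenating the runs reproduces the original row group order, which is what ordered scans rely on.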

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

## Which issue does this PR close?

Split from apache#21637.

## Rationale for this change

The benchmark binaries currently reject enabling both `snmalloc` and `mimalloc`. However, workspace-wide checks such as `cargo clippy --all-targets --all-features` enable both feature flags, which makes these binaries fail before clippy can run.

## What changes are included in this PR?

This PR removes the explicit compile error from the benchmark binaries and makes allocator selection deterministic: `snmalloc` is used when enabled, otherwise `mimalloc` is used when enabled.

## Are these changes tested?

## Are there any user-facing changes?

No. This only affects benchmark binary feature combinations used by development checks.

github-actions · 2026-05-07T05:33:11Z

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details

     Cloning apache/main
    Building datafusion-datasource-parquet v53.1.0 (current)
       Built [  43.029s] (current)
     Parsing datafusion-datasource-parquet v53.1.0 (current)
      Parsed [   0.026s] (current)
    Building datafusion-datasource-parquet v53.1.0 (baseline)
       Built [  42.813s] (baseline)
     Parsing datafusion-datasource-parquet v53.1.0 (baseline)
      Parsed [   0.025s] (baseline)
    Checking datafusion-datasource-parquet v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.142s] 222 checks: 220 pass, 2 fail, 0 warn, 30 skip

--- failure auto_trait_impl_removed: auto trait no longer implemented ---

Description:
A public type has stopped implementing one or more auto traits. This can break downstream code that depends on the traits being implemented.
        ref: https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/auto_trait_impl_removed.ron

Failed in:
  type ParquetFileMetrics is no longer UnwindSafe, in /home/runner/work/datafusion/datafusion/datafusion/datasource-parquet/src/metrics.rs:31
  type ParquetFileMetrics is no longer RefUnwindSafe, in /home/runner/work/datafusion/datafusion/datafusion/datasource-parquet/src/metrics.rs:31

--- failure constructible_struct_adds_private_field: struct no longer constructible due to new private field ---

Description:
A struct constructible with a struct literal has a new non-public field. It can no longer be constructed using a struct literal outside of its crate.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_private_field.ron

Failed in:
  field ParquetFileMetrics.page_index_pages_skipped_by_fully_matched in /home/runner/work/datafusion/datafusion/datafusion/datasource-parquet/src/metrics.rs:75

     Summary semver requires new major version: 2 major and 0 minor checks failed
    Finished [  88.643s] datafusion-datasource-parquet
    Building datafusion-physical-expr-common v53.1.0 (current)
       Built [  20.482s] (current)
     Parsing datafusion-physical-expr-common v53.1.0 (current)
      Parsed [   0.020s] (current)
    Building datafusion-physical-expr-common v53.1.0 (baseline)
       Built [  20.210s] (baseline)
     Parsing datafusion-physical-expr-common v53.1.0 (baseline)
      Parsed [   0.020s] (baseline)
    Checking datafusion-physical-expr-common v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.196s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  42.486s] datafusion-physical-expr-common
    Building datafusion-sqllogictest v53.1.0 (current)
       Built [ 136.503s] (current)
     Parsing datafusion-sqllogictest v53.1.0 (current)
      Parsed [   0.022s] (current)
    Building datafusion-sqllogictest v53.1.0 (baseline)
       Built [ 135.436s] (baseline)
     Parsing datafusion-sqllogictest v53.1.0 (baseline)
      Parsed [   0.023s] (baseline)
    Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.085s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 277.236s] datafusion-sqllogictest

xudong963 · 2026-05-07T05:42:31Z

Thanks @alamb, I agree that adding a separate metric is cleaner.

I changed the PR 3f2401e to keep page_index_pages_pruned reporting only pages that were actually evaluated by page-index pruning, and added page_index_pages_skipped_by_fully_matched for pages where page-index pruning was skipped because row-group statistics already proved the row group was fully matched.

For example, the metrics can now look like:

row_groups_pruned_statistics=4 total → 3 matched -> 1 fully matched,
page_index_pages_pruned=2 total → 2 matched,
page_index_pages_skipped_by_fully_matched=1

I would read this as:

row-group statistics evaluated 4 row groups, 3 matched, and 1 of those was fully matched;
page-index pruning actually evaluated 2 pages, and both matched;
1 additional page belonged to the fully matched row group, so page-index pruning was skipped for that page. The page is still scanned; only page-index predicate evaluation was skipped.

This avoids counting statistics-derived fully matched pages as page-index matched pages.
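The accounting described above can be modelled with a short sketch. `PageMetrics`, `RowGroupPages`, and `collect_page_metrics` are hypothetical names; only the split between evaluated pages and skipped pages reflects the comment.

```rust
/// Page-level counters (hypothetical simplified model of the metrics).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct PageMetrics {
    /// Pages actually evaluated by page-index pruning ("total").
    pub evaluated: usize,
    /// Evaluated pages that may contain matching rows ("matched").
    pub matched: usize,
    /// Pages whose page-index evaluation was skipped (still scanned).
    pub skipped_by_fully_matched: usize,
}

/// A row group that survived row-group statistics pruning.
pub struct RowGroupPages {
    pub fully_matched: bool,
    /// One entry per page: true if page-level statistics say it may match.
    pub page_may_match: Vec<bool>,
}

pub fn collect_page_metrics(row_groups: &[RowGroupPages]) -> PageMetrics {
    let mut metrics = PageMetrics::default();
    for rg in row_groups {
        if rg.fully_matched {
            // Page-index evaluation is skipped; the pages are not counted as matched.
            metrics.skipped_by_fully_matched += rg.page_may_match.len();
        } else {
            metrics.evaluated += rg.page_may_match.len();
            metrics.matched += rg.page_may_match.iter().filter(|&&m| m).count();
        }
    }
    metrics
}
```

Feeding in two ordinary row groups with one matching page each plus one fully matched row group with one page reproduces the example: 2 evaluated, 2 matched, 1 skipped.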

xudong963 · 2026-05-07T05:42:56Z

+    /// because the containing row group was fully matched by row-group statistics.
+    ///
+    /// These pages are still scanned; only page-index predicate evaluation is skipped.
+    page_index_pages_skipped_by_fully_matched: LazyParquetSummaryCount,


It is registered lazily so normal Parquet scans do not show an extra page_index_pages_skipped_by_fully_matched=0 metric.
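One way lazy registration could work is sketched below. This is not the implementation of `LazyParquetSummaryCount`; `MetricsRegistry` and `LazyCount` are hypothetical stand-ins, and the rule that a counter is registered on its first non-zero increment is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical registry: only registered counters appear in the output.
#[derive(Default)]
pub struct MetricsRegistry {
    counters: BTreeMap<&'static str, usize>,
}

impl MetricsRegistry {
    /// Render registered counters as `name=value`, comma separated.
    pub fn render(&self) -> String {
        self.counters
            .iter()
            .map(|(name, value)| format!("{name}={value}"))
            .collect::<Vec<_>>()
            .join(", ")
    }
}

/// A counter that registers itself on its first non-zero increment.
pub struct LazyCount {
    name: &'static str,
}

impl LazyCount {
    pub const fn new(name: &'static str) -> Self {
        Self { name }
    }

    pub fn add(&self, registry: &mut MetricsRegistry, n: usize) {
        if n > 0 {
            // Registration happens here, not when the counter is created.
            *registry.counters.entry(self.name).or_insert(0) += n;
        }
    }
}
```

In this model, a scan with no fully matched row groups never calls `add` with a positive value, so the rendered metrics contain no `page_index_pages_skipped_by_fully_matched=0` entry.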

github-actions bot added the datasource label (Changes to the datasource crate) on Apr 15, 2026

xudong963 force-pushed the datafusion/issue-19028-benchmark branch from 54a4166 to 5da11ea on April 15, 2026 05:36