
alamb commented Mar 31, 2025

This is still a draft as the branch appears to hang on certain queries:

(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ cargo test --test sqllogictests -- parquet_filter_pushdown
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.24s
     Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-ae4ca2e4c85de797)
[00:00:00] ##########################--------------      11/17      "parquet_filter_pushdown.slt"
.. hangs indefinitely ..

Which issue does this PR close?

Rationale for this change

This PR is designed to verify the changes from @XiangpengHao's pushdown decoder.

What changes are included in this PR?

  1. Pin the arrow-rs dependency to arrow-rs#6921 (Experimental parquet decoder with first-class selection pushdown support)
  2. Enable filter pushdown by default
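With filter pushdown, the Parquet reader evaluates the predicate while decoding. It then decodes the remaining projected columns only for the rows that pass, which is often called late materialization. In DataFusion this scan behavior is governed by the `datafusion.execution.parquet.pushdown_filters` setting. Below is a minimal, std-only Rust sketch of the idea under simplifying assumptions: whole columns already in memory, a single `i64` filter column, and a hypothetical `Run` type loosely modeled on the skip/select runs (`RowSelector`) that arrow-rs uses to build a `RowSelection`. It is not the arrow-rs#6921 decoder or any DataFusion API.

```rust
/// A run of consecutive rows that are either all skipped or all selected.
/// Hypothetical type, loosely modeled on arrow-rs `RowSelector`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Run {
    pub row_count: usize,
    pub skip: bool,
}

/// Run-length encode the predicate result so the reader can skip whole
/// ranges of rows instead of testing them one at a time.
pub fn selection_from_mask(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &keep in mask {
        let skip = !keep;
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                // Same kind as the previous run: extend it.
                last.row_count += 1;
                continue;
            }
        }
        runs.push(Run { row_count: 1, skip });
    }
    runs
}

/// "Decode" only the selected rows of a projected column.
pub fn apply_selection<T: Clone>(column: &[T], runs: &[Run]) -> Vec<T> {
    let mut out = Vec::new();
    let mut offset = 0;
    for run in runs {
        if !run.skip {
            out.extend_from_slice(&column[offset..offset + run.row_count]);
        }
        // Skipped runs advance the offset without touching the data.
        offset += run.row_count;
    }
    out
}

/// Filter pushdown in miniature: evaluate the predicate on the filter
/// column first, then materialize the projected column for matching rows only.
pub fn scan_with_pushdown<T, F>(filter_col: &[i64], predicate: F, projected_col: &[T]) -> Vec<T>
where
    T: Clone,
    F: Fn(i64) -> bool,
{
    let mask: Vec<bool> = filter_col.iter().map(|&v| predicate(v)).collect();
    let runs = selection_from_mask(&mask);
    apply_selection(projected_col, &runs)
}
```

The run-length form matters because long skipped runs let a real decoder avoid decoding entire pages. When the predicate selects many short, scattered runs, the extra bookkeeping can cost more than it saves. That is one plausible reason some queries could get slower, though the benchmarks here do not establish the cause.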

Are these changes tested?

Are there any user-facing changes?

Benchmarks

Not too shabby!

I need to look at the queries that report being slower to see if there is something we can do to recover the lost speed.


--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_filter_pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.57ms │                0.58ms │     no change │
│ QQuery 1     │    68.16ms │               70.64ms │     no change │
│ QQuery 2     │   116.52ms │              118.28ms │     no change │
│ QQuery 3     │   123.86ms │              123.76ms │     no change │
│ QQuery 4     │   776.15ms │              797.72ms │     no change │
│ QQuery 5     │   848.82ms │              880.01ms │     no change │
│ QQuery 6     │    64.80ms │               67.14ms │     no change │
│ QQuery 7     │    77.36ms │               93.04ms │  1.20x slower │
│ QQuery 8     │   957.21ms │              983.79ms │     no change │
│ QQuery 9     │  1239.49ms │             1275.73ms │     no change │
│ QQuery 10    │   299.68ms │              318.01ms │  1.06x slower │
│ QQuery 11    │   344.19ms │              360.58ms │     no change │
│ QQuery 12    │   945.45ms │             1058.68ms │  1.12x slower │
│ QQuery 13    │  1323.58ms │             1548.38ms │  1.17x slower │
│ QQuery 14    │   885.86ms │             1063.57ms │  1.20x slower │
│ QQuery 15    │  1110.56ms │             1134.68ms │     no change │
│ QQuery 16    │  1834.24ms │             1789.95ms │     no change │
│ QQuery 17    │  1662.90ms │             1650.05ms │     no change │
│ QQuery 18    │  3176.45ms │             3164.17ms │     no change │
│ QQuery 19    │   116.43ms │              123.84ms │  1.06x slower │
│ QQuery 20    │  1206.01ms │             1204.47ms │     no change │
│ QQuery 21    │  1445.16ms │             1351.41ms │ +1.07x faster │
│ QQuery 22    │  2708.56ms │             2401.18ms │ +1.13x faster │
│ QQuery 23    │  8690.72ms │             5234.73ms │ +1.66x faster │
│ QQuery 24    │   509.28ms │              684.55ms │  1.34x slower │
│ QQuery 25    │   426.36ms │              553.26ms │  1.30x slower │
│ QQuery 26    │   581.56ms │              802.46ms │  1.38x slower │
│ QQuery 27    │  1797.38ms │             2464.11ms │  1.37x slower │
│ QQuery 28    │ 13274.54ms │            14650.91ms │  1.10x slower │
│ QQuery 29    │   629.26ms │              598.10ms │     no change │
│ QQuery 30    │   970.56ms │             1286.68ms │  1.33x slower │
│ QQuery 31    │  1008.49ms │             1398.40ms │  1.39x slower │
│ QQuery 32    │  3220.54ms │             3249.25ms │     no change │
│ QQuery 33    │  3948.22ms │             3595.57ms │ +1.10x faster │
│ QQuery 34    │  3968.56ms │             3536.51ms │ +1.12x faster │
│ QQuery 35    │  1477.42ms │             1317.28ms │ +1.12x faster │
│ QQuery 36    │   310.58ms │              277.12ms │ +1.12x faster │
│ QQuery 37    │   144.55ms │              136.09ms │ +1.06x faster │
│ QQuery 38    │   188.40ms │              167.13ms │ +1.13x faster │
│ QQuery 39    │   537.25ms │              419.03ms │ +1.28x faster │
│ QQuery 40    │    73.01ms │              110.69ms │  1.52x slower │
│ QQuery 41    │    83.59ms │              105.50ms │  1.26x slower │
│ QQuery 42    │    91.66ms │               96.24ms │     no change │
└──────────────┴────────────┴───────────────────────┴───────────────┘

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)               │ 63263.96ms │
│ Total Time (alamb_filter_pushdown)   │ 62263.28ms │
│ Average Time (main_base)             │  1471.25ms │
│ Average Time (alamb_filter_pushdown) │  1447.98ms │
│ Queries Faster                       │         10 │
│ Queries Slower                       │         15 │
│ Queries with No Change               │         18 │
└──────────────────────────────────────┴────────────┘
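The "Change" column can be reproduced from the two timing columns. The sketch below assumes a ±5% noise band. That value is inferred from the rows above, not taken from the benchmark tool's source: Query 11 at about 1.048x is reported as "no change", while Query 10 at about 1.061x is "1.06x slower". Speedups are printed as base/new and slowdowns as new/base.

```rust
/// Outcome of comparing one query's time on the base and new branches.
#[derive(Debug, PartialEq)]
pub enum Change {
    NoChange,
    Faster(f64), // base_ms / new_ms
    Slower(f64), // new_ms / base_ms
}

/// Assumed noise band: ratios within 5% of 1.0 count as "no change".
pub const NOISE: f64 = 0.05;

pub fn classify(base_ms: f64, new_ms: f64) -> Change {
    let ratio = new_ms / base_ms;
    if ratio > 1.0 + NOISE {
        Change::Slower(ratio)
    } else if ratio < 1.0 - NOISE {
        Change::Faster(base_ms / new_ms)
    } else {
        Change::NoChange
    }
}

/// Render a result the way the table's "Change" column does.
pub fn label(change: &Change) -> String {
    match change {
        Change::NoChange => "no change".to_string(),
        Change::Faster(x) => format!("+{:.2}x faster", x),
        Change::Slower(x) => format!("{:.2}x slower", x),
    }
}
```

For example, Query 7 (77.36ms vs 93.04ms) yields "1.20x slower" and Query 23 (8690.72ms vs 5234.73ms) yields "+1.66x faster". Query 29 (629.26ms vs 598.10ms) lands just inside the band at a ratio of about 0.9505, so it is reported as "no change".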
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_filter_pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2255.92ms │             1949.07ms │ +1.16x faster │
│ QQuery 1     │   750.84ms │              716.29ms │     no change │
│ QQuery 2     │  1618.66ms │             1410.83ms │ +1.15x faster │
│ QQuery 3     │   739.24ms │              703.75ms │     no change │
│ QQuery 4     │  1659.53ms │             1715.83ms │     no change │
│ QQuery 5     │ 18684.93ms │            17202.28ms │ +1.09x faster │
└──────────────┴────────────┴───────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)               │ 25709.12ms │
│ Total Time (alamb_filter_pushdown)   │ 23698.05ms │
│ Average Time (main_base)             │  4284.85ms │
│ Average Time (alamb_filter_pushdown) │  3949.67ms │
│ Queries Faster                       │          3 │
│ Queries Slower                       │          0 │
│ Queries with No Change               │          3 │
└──────────────────────────────────────┴────────────┘
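The summary rows are plain aggregates of the per-query times: the total is the sum and the average is the total divided by the number of queries. Below is a small sketch that reproduces them from the clickbench_extended numbers above. The `overall_speedup` helper is a hypothetical addition that the benchmark output does not print. Here it gives 25709.12 / 23698.05 ≈ 1.08x faster for the whole suite.

```rust
/// Aggregate timing for one branch across all queries in a benchmark.
#[derive(Debug)]
pub struct Summary {
    pub total_ms: f64,
    pub average_ms: f64,
}

pub fn summarize(times_ms: &[f64]) -> Summary {
    let total_ms: f64 = times_ms.iter().sum();
    // Guard against an empty benchmark rather than dividing by zero.
    let average_ms = if times_ms.is_empty() {
        0.0
    } else {
        total_ms / times_ms.len() as f64
    };
    Summary { total_ms, average_ms }
}

/// Hypothetical helper: suite-level speedup as base total / new total.
pub fn overall_speedup(base_ms: &[f64], new_ms: &[f64]) -> f64 {
    summarize(base_ms).total_ms / summarize(new_ms).total_ms
}
```

A ratio of totals is dominated by the longest queries (Query 5 here accounts for most of the time), so it can hide regressions in short queries. The per-query "Change" column remains the better guide for spotting those.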
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_filter_pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     3.56ms │                2.75ms │ +1.30x faster │
│ QQuery 1     │    40.80ms │               44.22ms │  1.08x slower │
│ QQuery 2     │   101.20ms │               95.75ms │ +1.06x faster │
│ QQuery 3     │   104.30ms │              100.38ms │     no change │
│ QQuery 4     │   837.17ms │              760.17ms │ +1.10x faster │
│ QQuery 5     │   980.95ms │              872.30ms │ +1.12x faster │
│ QQuery 6     │    40.34ms │               37.64ms │ +1.07x faster │
│ QQuery 7     │    44.91ms │               63.34ms │  1.41x slower │
│ QQuery 8     │  1069.04ms │              953.13ms │ +1.12x faster │
│ QQuery 9     │  1450.28ms │             1225.24ms │ +1.18x faster │
│ QQuery 10    │   310.34ms │              301.07ms │     no change │
│ QQuery 11    │   349.23ms │              351.62ms │     no change │
│ QQuery 12    │  1095.13ms │             1081.91ms │     no change │
│ QQuery 13    │  1635.08ms │             1496.78ms │ +1.09x faster │
│ QQuery 14    │   988.22ms │             1101.26ms │  1.11x slower │
│ QQuery 15    │  1185.88ms │             1116.01ms │ +1.06x faster │
│ QQuery 16    │  2003.29ms │             1804.50ms │ +1.11x faster │
│ QQuery 17    │  1822.90ms │             1638.95ms │ +1.11x faster │
│ QQuery 18    │  3521.78ms │             3125.60ms │ +1.13x faster │
│ QQuery 19    │    93.58ms │              100.39ms │  1.07x slower │
│ QQuery 20    │  1260.89ms │             1150.78ms │ +1.10x faster │
│ QQuery 21    │  1528.21ms │             1303.30ms │ +1.17x faster │
│ QQuery 22    │  2742.69ms │             2316.73ms │ +1.18x faster │
│ QQuery 23    │  9454.13ms │             4883.92ms │ +1.94x faster │
│ QQuery 24    │   520.27ms │              703.58ms │  1.35x slower │
│ QQuery 25    │   435.35ms │              490.67ms │  1.13x slower │
│ QQuery 26    │   606.38ms │              765.75ms │  1.26x slower │
│ QQuery 27    │  1837.71ms │             2154.70ms │  1.17x slower │
│ QQuery 28    │ 13463.85ms │            13430.95ms │     no change │
│ QQuery 29    │   558.16ms │              536.66ms │     no change │
│ QQuery 30    │   934.19ms │             1342.03ms │  1.44x slower │
│ QQuery 31    │   985.71ms │             1388.86ms │  1.41x slower │
│ QQuery 32    │  3225.41ms │             2742.11ms │ +1.18x faster │
│ QQuery 33    │  3887.36ms │             3454.26ms │ +1.13x faster │
│ QQuery 34    │  3852.77ms │             3411.24ms │ +1.13x faster │
│ QQuery 35    │  1530.92ms │             1322.73ms │ +1.16x faster │
│ QQuery 36    │   265.16ms │              237.61ms │ +1.12x faster │
│ QQuery 37    │   105.59ms │              100.96ms │     no change │
│ QQuery 38    │   142.28ms │              142.17ms │     no change │
│ QQuery 39    │   522.51ms │              425.74ms │ +1.23x faster │
│ QQuery 40    │    60.05ms │               89.14ms │  1.48x slower │
│ QQuery 41    │    50.38ms │               78.62ms │  1.56x slower │
│ QQuery 42    │    62.24ms │               70.20ms │  1.13x slower │
└──────────────┴────────────┴───────────────────────┴───────────────┘


--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_filter_pushdown ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │  125.02ms │              125.37ms │    no change │
│ QQuery 2     │   24.36ms │               24.40ms │    no change │
│ QQuery 3     │   36.76ms │               35.86ms │    no change │
│ QQuery 4     │   20.87ms │               20.76ms │    no change │
│ QQuery 5     │   57.47ms │               56.27ms │    no change │
│ QQuery 6     │    8.11ms │                8.39ms │    no change │
│ QQuery 7     │  102.83ms │              103.92ms │    no change │
│ QQuery 8     │   26.51ms │               26.58ms │    no change │
│ QQuery 9     │   62.19ms │               63.73ms │    no change │
│ QQuery 10    │   60.77ms │               60.92ms │    no change │
│ QQuery 11    │   13.11ms │               13.00ms │    no change │
│ QQuery 12    │   37.41ms │               38.59ms │    no change │
│ QQuery 13    │   30.73ms │               29.83ms │    no change │
│ QQuery 14    │    9.86ms │               10.15ms │    no change │
│ QQuery 15    │   25.03ms │               26.15ms │    no change │
│ QQuery 16    │   25.89ms │               24.94ms │    no change │
│ QQuery 17    │   95.52ms │               98.14ms │    no change │
│ QQuery 18    │  253.23ms │              252.32ms │    no change │
│ QQuery 19    │   28.84ms │               30.05ms │    no change │
│ QQuery 20    │   40.94ms │               41.14ms │    no change │
│ QQuery 21    │  172.30ms │              181.19ms │ 1.05x slower │
│ QQuery 22    │   18.05ms │               17.22ms │    no change │
└──────────────┴───────────┴───────────────────────┴──────────────┘


github-actions bot added the sqllogictest, common, and functions labels Mar 31, 2025

alamb commented Apr 1, 2025

I wrote up a performance analysis here:

github-actions bot commented

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

github-actions bot added the Stale label Jun 11, 2025
github-actions bot closed this Jun 20, 2025