If I run a query like the following:
SELECT MIN(fare_amount), MAX(fare_amount) FROM tripdata
I see this logical plan:
Logical plan: Aggregate: groupBy=[[]], aggr=[[MIN(#10), MAX(#10)]]
TableScan: tripdata projection=None
This means that every column is being loaded into arrays rather than just the two columns that I care about, resulting in terrible performance.
Reporter: Andy Grove / @andygrove
Assignee: Andy Grove / @andygrove
PRs and other links:
Note: This issue was originally created as ARROW-4589. Please see the migration documentation for further details.