
[native_datafusion] Partition count mismatch when creating HashJoinExec #2660

@andygrove

Describe the bug

When using the native_datafusion scan, we sometimes try to construct a HashJoinExec whose left and right inputs have different numbers of partitions. This issue is currently hidden because we wrap the inputs in a CopyExec, which always reports an output partition count of 1.

Here is the debug logging for one example. The left input has 5 partitions and the right input has 1 partition.

LEFT: FilterExec: c0@0 IS NOT NULL
  DataSourceExec: file_groups={5 groups: [[tmp/CometFuzzTestSuite_1761747873664.parquet/part-00002-e5c142ac-d0f8-4eb4-891c-4484865ded05-c000.snappy.parquet:0..133654], [tmp/CometFuzzTestSuite_1761747873664.parquet/part-00003-e5c142ac-d0f8-4eb4-891c-4484865ded05-c000.snappy.parquet:0..133566], [tmp/CometFuzzTestSuite_1761747873664.parquet/part-00000-e5c142ac-d0f8-4eb4-891c-4484865ded05-c000.snappy.parquet:0..133492], [tmp/CometFuzzTestSuite_1761747873664.parquet/part-00004-e5c142ac-d0f8-4eb4-891c-4484865ded05-c000.snappy.parquet:0..133471], [tmp/CometFuzzTestSuite_1761747873664.parquet/part-00001-e5c142ac-d0f8-4eb4-891c-4484865ded05-c000.snappy.parquet:0..132287]]}, projection=[c0], file_type=parquet, predicate=c0@0 IS NOT NULL, pruning_predicate=c0_null_count@1 != row_count@0, required_guarantees=[]

RIGHT: ScanExec: source=[BroadcastQueryStage (unknown), Statistics(sizeInBytes=1384.0 B, rowCount=1.00E+3)], schema=[col_0: Boolean]
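The following minimal Rust sketch shows how a wrapper that always reports one partition can mask this mismatch. It does not use DataFusion or Comet. `PlanNode`, `Scan`, `CopyWrapper`, and `check_join_inputs` are hypothetical stand-ins, and the check assumes that the join requires both inputs to report the same partition count, as the issue title implies. `CopyWrapper` models the CopyExec behavior described above.

```rust
/// Hypothetical, simplified plan node: only the reported number of
/// output partitions matters for this illustration.
trait PlanNode {
    fn name(&self) -> String;
    fn output_partition_count(&self) -> usize;
}

/// Hypothetical leaf node standing in for a scan with a fixed partition count.
struct Scan {
    name: &'static str,
    partitions: usize,
}

impl PlanNode for Scan {
    fn name(&self) -> String {
        self.name.to_string()
    }
    fn output_partition_count(&self) -> usize {
        self.partitions
    }
}

/// Hypothetical stand-in for a CopyExec-style wrapper: it always reports
/// a single output partition, whatever its input reports.
struct CopyWrapper {
    input: Box<dyn PlanNode>,
}

impl PlanNode for CopyWrapper {
    fn name(&self) -> String {
        format!("Copy({})", self.input.name())
    }
    fn output_partition_count(&self) -> usize {
        1
    }
}

/// Hypothetical validation, assuming the join needs equal partition counts
/// on both sides. Returns the shared count, or an error describing the mismatch.
fn check_join_inputs(left: &dyn PlanNode, right: &dyn PlanNode) -> Result<usize, String> {
    let (l, r) = (left.output_partition_count(), right.output_partition_count());
    if l == r {
        Ok(l)
    } else {
        Err(format!(
            "partition count mismatch: left {} has {}, right {} has {}",
            left.name(),
            l,
            right.name(),
            r
        ))
    }
}
```

With the 5-partition left input and 1-partition right input from the example above, calling `check_join_inputs` on the bare nodes returns an error. Wrapping both in `CopyWrapper` makes the check pass with a count of 1, which is how the mismatch stays hidden.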

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

Labels: bug