[BUG]: IndexError in _decompose when running pdsh query 17 with decimal data #20408

@TomAugspurger

Describe the bug

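Running PDSH query 17 with decimal data at scale factor 10, using the streaming executor, fails with IndexError: list index out of range, raised from _decompose in cudf_polars/experimental/expressions.py.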

Steps/Code to reproduce bug

Generate the data:

❯ tpchgen-cli -s 10 --format parquet --output-dir scale-10

Run the query:

POLARS_GPU_ENABLE_CUDA_MANAGED_MEMORY=0 python python/cudf_polars/cudf_polars/experimental/benchmarks/pdsh.py --path scale-10 --no-print-results --no-summarize --executor streaming --iterations 1 17

which outputs:

❯ POLARS_GPU_ENABLE_CUDA_MANAGED_MEMORY=0 python python/cudf_polars/cudf_polars/experimental/benchmarks/pdsh.py --path scale-10 --no-print-results --no-summarize --executor streaming --iterations 1 17
❌ query=17 iteration=0 failed!
Traceback (most recent call last):
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 222, in __call__
    return self.cache[value]
           ~~~~~~~~~~^^^^^^^
KeyError: Select({'avg_yearly': <DataType(polars=Decimal(precision=None, scale=6), plc=<type_id.DECIMAL128: 27>)>}, (NamedExpr(avg_yearly, UnaryFunction(<DataType(polars=Decimal(precision=None, scale=6), plc=<type_id.DECIMAL128: 27>)>, 'round', (2, 'half_to_even'), Cast(<DataType(polars=Decimal(precision=None, scale=6), plc=<type_id.DECIMAL128: 27>)>, BinOp(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, <binary_operator.TRUE_DIV: 4>, Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Agg(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'sum', None, <ExecutionContext.FRAME: 1>, Col(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice'))), Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Literal(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, Decimal('7.00'))))))),), True, Projection({'key': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'avg_quantity': <DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>}, Filter({'key': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'avg_quantity': <DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, NamedExpr(l_quantity, BinOp(<DataType(polars=Boolean, plc=<type_id.BOOL8: 11>)>, <binary_operator.LESS: 23>, Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Col(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_quantity')), Col(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, 'avg_quantity'))), Join({'key': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 
'avg_quantity': <DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, (NamedExpr(key, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'key')),), (NamedExpr(p_partkey, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_partkey')),), ('Inner', False, None, '_right', True, 'none'), Select({'key': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'avg_quantity': <DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>}, (NamedExpr(key, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_partkey')), NamedExpr(avg_quantity, Col(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, 'avg_quantity'))), True, Select({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'avg_quantity': <DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>}, (NamedExpr(p_partkey, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_partkey')), NamedExpr(avg_quantity, BinOp(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, <binary_operator.MUL: 2>, Literal(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, 0.2), Col(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, '____________1')))), True, GroupBy({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, '____________1': <DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>}, (NamedExpr(p_partkey, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_partkey')),), (NamedExpr(____________1, Agg(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, 'mean', None, <ExecutionContext.GROUPBY: 2>, Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Col(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_quantity')))),), False, None, Projection({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, 
scale=2), plc=<type_id.DECIMAL128: 27>)>}, Cache({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, 84423446721521749262699096325989021065, None, Projection({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, Join({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_container': <DataType(polars=String, plc=<type_id.STRING: 23>)>, 'p_brand': <DataType(polars=String, plc=<type_id.STRING: 23>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, (NamedExpr(p_partkey, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_partkey')),), (NamedExpr(l_partkey, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_partkey')),), ('Left', False, None, '_right', True, 'none'), Scan({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_container': <DataType(polars=String, plc=<type_id.STRING: 23>)>, 'p_brand': <DataType(polars=String, plc=<type_id.STRING: 23>)>}, 'parquet', {'schema': None, 'parallel': 'Auto', 'low_memory': False, 'use_statistics': True}, {'max_retries': 2, 'file_cache_ttl': 3600, 'config': None, 'credential_provider': None}, ['scale-10/part.parquet'], ['p_partkey', 'p_container', 'p_brand'], 0, -1, None, None, NamedExpr(p_container, BinOp(<DataType(polars=Boolean, plc=<type_id.BOOL8: 11>)>, <binary_operator.NULL_LOGICAL_AND: 32>, BinOp(<DataType(polars=Boolean, plc=<type_id.BOOL8: 11>)>, <binary_operator.EQUAL: 21>, Col(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 
'p_container'), Literal(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 'MED BOX')), BinOp(<DataType(polars=Boolean, plc=<type_id.BOOL8: 11>)>, <binary_operator.EQUAL: 21>, Col(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 'p_brand'), Literal(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 'Brand#23')))), ParquetOptions(chunked=True, n_output_chunks=1, chunk_read_limit=0, pass_read_limit=0, max_footer_samples=3, max_row_group_samples=1)), Scan({'l_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, 'parquet', {'schema': None, 'parallel': 'Auto', 'low_memory': False, 'use_statistics': True}, {'max_retries': 2, 'file_cache_ttl': 3600, 'config': None, 'credential_provider': None}, ['scale-10/lineitem.parquet'], ['l_partkey', 'l_quantity', 'l_extendedprice'], 0, -1, None, None, None, ParquetOptions(chunked=True, n_output_chunks=1, chunk_read_limit=0, pass_read_limit=0, max_footer_samples=3, max_row_group_samples=1))))))))), Projection({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, Cache({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, 84423446721521749262699096325989021065, None, Projection({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), 
plc=<type_id.DECIMAL128: 27>)>}, Join({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_container': <DataType(polars=String, plc=<type_id.STRING: 23>)>, 'p_brand': <DataType(polars=String, plc=<type_id.STRING: 23>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, (NamedExpr(p_partkey, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_partkey')),), (NamedExpr(l_partkey, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_partkey')),), ('Left', False, None, '_right', True, 'none'), Scan({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_container': <DataType(polars=String, plc=<type_id.STRING: 23>)>, 'p_brand': <DataType(polars=String, plc=<type_id.STRING: 23>)>}, 'parquet', {'schema': None, 'parallel': 'Auto', 'low_memory': False, 'use_statistics': True}, {'max_retries': 2, 'file_cache_ttl': 3600, 'config': None, 'credential_provider': None}, ['scale-10/part.parquet'], ['p_partkey', 'p_container', 'p_brand'], 0, -1, None, None, NamedExpr(p_container, BinOp(<DataType(polars=Boolean, plc=<type_id.BOOL8: 11>)>, <binary_operator.NULL_LOGICAL_AND: 32>, BinOp(<DataType(polars=Boolean, plc=<type_id.BOOL8: 11>)>, <binary_operator.EQUAL: 21>, Col(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 'p_container'), Literal(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 'MED BOX')), BinOp(<DataType(polars=Boolean, plc=<type_id.BOOL8: 11>)>, <binary_operator.EQUAL: 21>, Col(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 'p_brand'), Literal(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 'Brand#23')))), ParquetOptions(chunked=True, n_output_chunks=1, chunk_read_limit=0, pass_read_limit=0, max_footer_samples=3, max_row_group_samples=1)), Scan({'l_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), 
plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, 'parquet', {'schema': None, 'parallel': 'Auto', 'low_memory': False, 'use_statistics': True}, {'max_retries': 2, 'file_cache_ttl': 3600, 'config': None, 'credential_provider': None}, ['scale-10/lineitem.parquet'], ['l_partkey', 'l_quantity', 'l_extendedprice'], 0, -1, None, None, None, ParquetOptions(chunked=True, n_output_chunks=1, chunk_read_limit=0, pass_read_limit=0, max_footer_samples=3, max_row_group_samples=1))))))))))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 222, in __call__
    return self.cache[value]
           ~~~~~~~~~~^^^^^^^
KeyError: UnaryFunction(<DataType(polars=Decimal(precision=None, scale=6), plc=<type_id.DECIMAL128: 27>)>, 'round', (2, 'half_to_even'), Cast(<DataType(polars=Decimal(precision=None, scale=6), plc=<type_id.DECIMAL128: 27>)>, BinOp(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, <binary_operator.TRUE_DIV: 4>, Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Agg(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'sum', None, <ExecutionContext.FRAME: 1>, Col(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice'))), Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Literal(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, Decimal('7.00'))))))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 222, in __call__
    return self.cache[value]
           ~~~~~~~~~~^^^^^^^
KeyError: Cast(<DataType(polars=Decimal(precision=None, scale=6), plc=<type_id.DECIMAL128: 27>)>, BinOp(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, <binary_operator.TRUE_DIV: 4>, Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Agg(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'sum', None, <ExecutionContext.FRAME: 1>, Col(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice'))), Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Literal(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, Decimal('7.00')))))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 222, in __call__
    return self.cache[value]
           ~~~~~~~~~~^^^^^^^
KeyError: BinOp(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, <binary_operator.TRUE_DIV: 4>, Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Agg(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'sum', None, <ExecutionContext.FRAME: 1>, Col(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice'))), Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Literal(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, Decimal('7.00'))))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 222, in __call__
    return self.cache[value]
           ~~~~~~~~~~^^^^^^^
KeyError: Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Literal(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, Decimal('7.00')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/benchmarks/utils.py", line 894, in run_polars
    result = execute_query(q_id, i, q, run_config, args, engine)
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/benchmarks/utils.py", line 566, in execute_query
    return q.collect(engine=engine)
           ~~~~~~~~~^^^^^^^^^^^^^^^
  File "/raid/toaugspurger/envs/gh/rapidsai/cudf/lib/python3.13/site-packages/polars/_utils/deprecation.py", line 97, in wrapper
    return function(*args, **kwargs)
  File "/raid/toaugspurger/envs/gh/rapidsai/cudf/lib/python3.13/site-packages/polars/lazyframe/opt_flags.py", line 330, in wrapper
    return function(*args, **kwargs)
  File "/raid/toaugspurger/envs/gh/rapidsai/cudf/lib/python3.13/site-packages/polars/lazyframe/frame.py", line 2407, in collect
    return wrap_df(ldf.collect(engine, callback))
                   ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/raid/toaugspurger/envs/gh/rapidsai/cudf/lib/python3.13/site-packages/polars/_utils/scan.py", line 27, in _execute_from_rust
    return function(with_columns, *args)
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/callback.py", line 249, in _callback
    return evaluate_streaming(ir, config_options).to_polars()
           ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/parallel.py", line 248, in evaluate_streaming
    ir, partition_info = lower_ir_graph(ir, config_options)
                         ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/parallel.py", line 94, in lower_ir_graph
    return mapper(ir)
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 224, in __call__
    return self.cache.setdefault(value, self.fn(value, self))
                                        ~~~~~~~^^^^^^^^^^^^^
  File "/home/nfs/toaugspurger/.local/share/uv/python/cpython-3.13.3-linux-x86_64-gnu/lib/python3.13/functools.py", line 934, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/select.py", line 174, in _
    return decompose_select(
        ir,
    ...<3 lines>...
        rec.state["stats"],
    )
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/select.py", line 80, in decompose_select
    new_ne, partial_input_ir, _partition_info = decompose_expr_graph(
                                                ~~~~~~~~~~~~~~~~~~~~^
        ne,
        ^^^
    ...<4 lines>...
        stats.column_stats.get(select_ir.children[0], {}),
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 592, in decompose_expr_graph
    expr, input_ir, partition_info = mapper(named_expr.value)
                                     ~~~~~~^^^^^^^^^^^^^^^^^^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 224, in __call__
    return self.cache.setdefault(value, self.fn(value, self))
                                        ~~~~~~~^^^^^^^^^^^^^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 497, in _decompose
    children, input_irs, _partition_info = zip(
                                           ~~~^
        *(rec(c) for c in expr.children), strict=True
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 498, in <genexpr>
    *(rec(c) for c in expr.children), strict=True
      ~~~^^^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 224, in __call__
    return self.cache.setdefault(value, self.fn(value, self))
                                        ~~~~~~~^^^^^^^^^^^^^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 497, in _decompose
    children, input_irs, _partition_info = zip(
                                           ~~~^
        *(rec(c) for c in expr.children), strict=True
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 498, in <genexpr>
    *(rec(c) for c in expr.children), strict=True
      ~~~^^^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 224, in __call__
    return self.cache.setdefault(value, self.fn(value, self))
                                        ~~~~~~~^^^^^^^^^^^^^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 497, in _decompose
    children, input_irs, _partition_info = zip(
                                           ~~~^
        *(rec(c) for c in expr.children), strict=True
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 498, in <genexpr>
    *(rec(c) for c in expr.children), strict=True
      ~~~^^^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 224, in __call__
    return self.cache.setdefault(value, self.fn(value, self))
                                        ~~~~~~~^^^^^^^^^^^^^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 522, in _decompose
    input_ir = unique_input_irs[0]
               ~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
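
The chained KeyErrors above are cache misses at traversal.py line 222, each followed by a call into self.fn at line 224, so read in order they trace the path the mapper took down the expression tree. The snippet below is only a reading aid, not part of cudf-polars: keyerror_heads is a hypothetical helper, and sample is an abbreviated copy of the KeyError lines with the arguments elided.

```python
import re


def keyerror_heads(traceback_text: str) -> list[str]:
    """Return the outermost node name of each 'KeyError: Node(...)' line, in order."""
    return re.findall(r"^KeyError: (\w+)\(", traceback_text, flags=re.MULTILINE)


# Abbreviated copy of the KeyError lines from the traceback above;
# "..." marks arguments elided here for brevity.
sample = """\
KeyError: Select({'avg_yearly': ...}, ...)
KeyError: UnaryFunction(..., 'round', (2, 'half_to_even'), ...)
KeyError: Cast(..., BinOp(...))
KeyError: BinOp(..., <binary_operator.TRUE_DIV: 4>, ...)
KeyError: Cast(..., Literal(..., Decimal('7.00')))
"""

heads = keyerror_heads(sample)
print(" -> ".join(heads))
```

Read this way, the innermost node in flight when the IndexError fired appears to be Cast(Float64, Literal(Decimal('7.00'))), the right-hand side of the TRUE_DIV. That matches the frames: three passes through line 497 of expressions.py, then line 522.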

Expected behavior

The query should complete without raising an error.

Additional context

The same query runs fine against the scale-1 data.

The following standalone script was an attempt at a smaller reproducer, but it does not trigger the error (i.e. it succeeds):

import polars as pl


a = pl.LazyFrame({
    "a_key": pl.Series(list(range(2000000))),
})
b = pl.LazyFrame({
    "b_key": pl.Series(list(range(60000000 // 4)) * 4),
    "c": pl.Series(list(range(60000000)), dtype=pl.Decimal(15, 2)),
})
a.sink_parquet("a.parquet")
b.sink_parquet("b.parquet")

aa = pl.scan_parquet("a.parquet")
bb = pl.scan_parquet("b.parquet")

q = (
    aa
    .join(bb, how="left", left_on="a_key", right_on="b_key")
    .select(
        (pl.col("c").sum() / 7.0).round(2).alias("avg_yearly")
    )
)

q.collect(engine=pl.GPUEngine())
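
The last frame of the traceback indexes unique_input_irs[0], so that list must have been empty for the expression being decomposed. The sketch below is a toy model of that failure shape, not the cudf-polars code: pick_input_ir, its arguments, and the rule used to build the list are all hypothetical. It only illustrates how a guard could replace the bare IndexError with a message naming the expression, which might make the scale-10 failure easier to localize.

```python
def pick_input_ir(expr_repr: str, input_irs: list[str], known_irs: dict[str, int]) -> str:
    """Hypothetical helper: choose the single input IR for an expression.

    `known_irs` maps IR names to partition counts. Only IRs that appear in
    both `known_irs` and `input_irs` are kept, in `known_irs` order.
    """
    unique_input_irs = [ir for ir in known_irs if ir in input_irs]
    if not unique_input_irs:
        # Guard: fail with context instead of a bare "list index out of range".
        raise ValueError(f"no input IR found while decomposing {expr_repr}")
    return unique_input_irs[0]


# An expression whose input IR is known resolves normally.
print(pick_input_ir("Col('l_extendedprice')", ["scan"], {"scan": 8}))

# An expression whose input IR is not among the known ones hits the guard.
try:
    pick_input_ir("Cast(Literal(Decimal('7.00')))", ["other"], {"scan": 8})
except ValueError as err:
    print(err)
```

Whether the real list ends up empty for this reason is an assumption; the traceback confirms only that it was empty.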

I hope to keep looking at this tomorrow.

Labels: bug, cudf-polars