Open
Labels: bug, cudf-polars
Description
Describe the bug
Running PDSH query 17 on decimal data at scale 10 with the streaming executor fails with IndexError: list index out of range.
Steps/Code to reproduce bug
Generate the data:
❯ tpchgen-cli -s 10 --format parquet --output-dir scale-10
Run the query:
POLARS_GPU_ENABLE_CUDA_MANAGED_MEMORY=0 python python/cudf_polars/cudf_polars/experimental/benchmarks/pdsh.py --path scale-10 --no-print-results --no-summarize --executor streaming --iterations 1 17
which outputs
❯ POLARS_GPU_ENABLE_CUDA_MANAGED_MEMORY=0 python python/cudf_polars/cudf_polars/experimental/benchmarks/pdsh.py --path scale-10 --no-print-results --no-summarize --executor streaming --iterations 1 17
❌ query=17 iteration=0 failed!
Traceback (most recent call last):
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 222, in __call__
return self.cache[value]
~~~~~~~~~~^^^^^^^
KeyError: Select({'avg_yearly': <DataType(polars=Decimal(precision=None, scale=6), plc=<type_id.DECIMAL128: 27>)>}, (NamedExpr(avg_yearly, UnaryFunction(<DataType(polars=Decimal(precision=None, scale=6), plc=<type_id.DECIMAL128: 27>)>, 'round', (2, 'half_to_even'), Cast(<DataType(polars=Decimal(precision=None, scale=6), plc=<type_id.DECIMAL128: 27>)>, BinOp(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, <binary_operator.TRUE_DIV: 4>, Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Agg(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'sum', None, <ExecutionContext.FRAME: 1>, Col(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice'))), Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Literal(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, Decimal('7.00'))))))),), True, Projection({'key': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'avg_quantity': <DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>}, Filter({'key': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'avg_quantity': <DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, NamedExpr(l_quantity, BinOp(<DataType(polars=Boolean, plc=<type_id.BOOL8: 11>)>, <binary_operator.LESS: 23>, Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Col(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_quantity')), Col(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, 'avg_quantity'))), Join({'key': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 
'avg_quantity': <DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, (NamedExpr(key, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'key')),), (NamedExpr(p_partkey, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_partkey')),), ('Inner', False, None, '_right', True, 'none'), Select({'key': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'avg_quantity': <DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>}, (NamedExpr(key, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_partkey')), NamedExpr(avg_quantity, Col(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, 'avg_quantity'))), True, Select({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'avg_quantity': <DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>}, (NamedExpr(p_partkey, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_partkey')), NamedExpr(avg_quantity, BinOp(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, <binary_operator.MUL: 2>, Literal(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, 0.2), Col(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, '____________1')))), True, GroupBy({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, '____________1': <DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>}, (NamedExpr(p_partkey, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_partkey')),), (NamedExpr(____________1, Agg(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, 'mean', None, <ExecutionContext.GROUPBY: 2>, Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Col(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_quantity')))),), False, None, Projection({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, 
scale=2), plc=<type_id.DECIMAL128: 27>)>}, Cache({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, 84423446721521749262699096325989021065, None, Projection({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, Join({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_container': <DataType(polars=String, plc=<type_id.STRING: 23>)>, 'p_brand': <DataType(polars=String, plc=<type_id.STRING: 23>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, (NamedExpr(p_partkey, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_partkey')),), (NamedExpr(l_partkey, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_partkey')),), ('Left', False, None, '_right', True, 'none'), Scan({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_container': <DataType(polars=String, plc=<type_id.STRING: 23>)>, 'p_brand': <DataType(polars=String, plc=<type_id.STRING: 23>)>}, 'parquet', {'schema': None, 'parallel': 'Auto', 'low_memory': False, 'use_statistics': True}, {'max_retries': 2, 'file_cache_ttl': 3600, 'config': None, 'credential_provider': None}, ['scale-10/part.parquet'], ['p_partkey', 'p_container', 'p_brand'], 0, -1, None, None, NamedExpr(p_container, BinOp(<DataType(polars=Boolean, plc=<type_id.BOOL8: 11>)>, <binary_operator.NULL_LOGICAL_AND: 32>, BinOp(<DataType(polars=Boolean, plc=<type_id.BOOL8: 11>)>, <binary_operator.EQUAL: 21>, Col(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 
'p_container'), Literal(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 'MED BOX')), BinOp(<DataType(polars=Boolean, plc=<type_id.BOOL8: 11>)>, <binary_operator.EQUAL: 21>, Col(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 'p_brand'), Literal(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 'Brand#23')))), ParquetOptions(chunked=True, n_output_chunks=1, chunk_read_limit=0, pass_read_limit=0, max_footer_samples=3, max_row_group_samples=1)), Scan({'l_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, 'parquet', {'schema': None, 'parallel': 'Auto', 'low_memory': False, 'use_statistics': True}, {'max_retries': 2, 'file_cache_ttl': 3600, 'config': None, 'credential_provider': None}, ['scale-10/lineitem.parquet'], ['l_partkey', 'l_quantity', 'l_extendedprice'], 0, -1, None, None, None, ParquetOptions(chunked=True, n_output_chunks=1, chunk_read_limit=0, pass_read_limit=0, max_footer_samples=3, max_row_group_samples=1))))))))), Projection({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, Cache({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, 84423446721521749262699096325989021065, None, Projection({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), 
plc=<type_id.DECIMAL128: 27>)>}, Join({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_container': <DataType(polars=String, plc=<type_id.STRING: 23>)>, 'p_brand': <DataType(polars=String, plc=<type_id.STRING: 23>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, (NamedExpr(p_partkey, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_partkey')),), (NamedExpr(l_partkey, Col(<DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_partkey')),), ('Left', False, None, '_right', True, 'none'), Scan({'p_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'p_container': <DataType(polars=String, plc=<type_id.STRING: 23>)>, 'p_brand': <DataType(polars=String, plc=<type_id.STRING: 23>)>}, 'parquet', {'schema': None, 'parallel': 'Auto', 'low_memory': False, 'use_statistics': True}, {'max_retries': 2, 'file_cache_ttl': 3600, 'config': None, 'credential_provider': None}, ['scale-10/part.parquet'], ['p_partkey', 'p_container', 'p_brand'], 0, -1, None, None, NamedExpr(p_container, BinOp(<DataType(polars=Boolean, plc=<type_id.BOOL8: 11>)>, <binary_operator.NULL_LOGICAL_AND: 32>, BinOp(<DataType(polars=Boolean, plc=<type_id.BOOL8: 11>)>, <binary_operator.EQUAL: 21>, Col(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 'p_container'), Literal(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 'MED BOX')), BinOp(<DataType(polars=Boolean, plc=<type_id.BOOL8: 11>)>, <binary_operator.EQUAL: 21>, Col(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 'p_brand'), Literal(<DataType(polars=String, plc=<type_id.STRING: 23>)>, 'Brand#23')))), ParquetOptions(chunked=True, n_output_chunks=1, chunk_read_limit=0, pass_read_limit=0, max_footer_samples=3, max_row_group_samples=1)), Scan({'l_partkey': <DataType(polars=Int64, plc=<type_id.INT64: 4>)>, 'l_quantity': <DataType(polars=Decimal(precision=15, scale=2), 
plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice': <DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>}, 'parquet', {'schema': None, 'parallel': 'Auto', 'low_memory': False, 'use_statistics': True}, {'max_retries': 2, 'file_cache_ttl': 3600, 'config': None, 'credential_provider': None}, ['scale-10/lineitem.parquet'], ['l_partkey', 'l_quantity', 'l_extendedprice'], 0, -1, None, None, None, ParquetOptions(chunked=True, n_output_chunks=1, chunk_read_limit=0, pass_read_limit=0, max_footer_samples=3, max_row_group_samples=1))))))))))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 222, in __call__
return self.cache[value]
~~~~~~~~~~^^^^^^^
KeyError: UnaryFunction(<DataType(polars=Decimal(precision=None, scale=6), plc=<type_id.DECIMAL128: 27>)>, 'round', (2, 'half_to_even'), Cast(<DataType(polars=Decimal(precision=None, scale=6), plc=<type_id.DECIMAL128: 27>)>, BinOp(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, <binary_operator.TRUE_DIV: 4>, Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Agg(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'sum', None, <ExecutionContext.FRAME: 1>, Col(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice'))), Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Literal(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, Decimal('7.00'))))))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 222, in __call__
return self.cache[value]
~~~~~~~~~~^^^^^^^
KeyError: Cast(<DataType(polars=Decimal(precision=None, scale=6), plc=<type_id.DECIMAL128: 27>)>, BinOp(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, <binary_operator.TRUE_DIV: 4>, Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Agg(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'sum', None, <ExecutionContext.FRAME: 1>, Col(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice'))), Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Literal(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, Decimal('7.00')))))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 222, in __call__
return self.cache[value]
~~~~~~~~~~^^^^^^^
KeyError: BinOp(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, <binary_operator.TRUE_DIV: 4>, Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Agg(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'sum', None, <ExecutionContext.FRAME: 1>, Col(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, 'l_extendedprice'))), Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Literal(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, Decimal('7.00'))))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 222, in __call__
return self.cache[value]
~~~~~~~~~~^^^^^^^
KeyError: Cast(<DataType(polars=Float64, plc=<type_id.FLOAT64: 10>)>, Literal(<DataType(polars=Decimal(precision=15, scale=2), plc=<type_id.DECIMAL128: 27>)>, Decimal('7.00')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/benchmarks/utils.py", line 894, in run_polars
result = execute_query(q_id, i, q, run_config, args, engine)
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/benchmarks/utils.py", line 566, in execute_query
return q.collect(engine=engine)
~~~~~~~~~^^^^^^^^^^^^^^^
File "/raid/toaugspurger/envs/gh/rapidsai/cudf/lib/python3.13/site-packages/polars/_utils/deprecation.py", line 97, in wrapper
return function(*args, **kwargs)
File "/raid/toaugspurger/envs/gh/rapidsai/cudf/lib/python3.13/site-packages/polars/lazyframe/opt_flags.py", line 330, in wrapper
return function(*args, **kwargs)
File "/raid/toaugspurger/envs/gh/rapidsai/cudf/lib/python3.13/site-packages/polars/lazyframe/frame.py", line 2407, in collect
return wrap_df(ldf.collect(engine, callback))
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
File "/raid/toaugspurger/envs/gh/rapidsai/cudf/lib/python3.13/site-packages/polars/_utils/scan.py", line 27, in _execute_from_rust
return function(with_columns, *args)
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/callback.py", line 249, in _callback
return evaluate_streaming(ir, config_options).to_polars()
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/parallel.py", line 248, in evaluate_streaming
ir, partition_info = lower_ir_graph(ir, config_options)
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/parallel.py", line 94, in lower_ir_graph
return mapper(ir)
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 224, in __call__
return self.cache.setdefault(value, self.fn(value, self))
~~~~~~~^^^^^^^^^^^^^
File "/home/nfs/toaugspurger/.local/share/uv/python/cpython-3.13.3-linux-x86_64-gnu/lib/python3.13/functools.py", line 934, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/select.py", line 174, in _
return decompose_select(
ir,
...<3 lines>...
rec.state["stats"],
)
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/select.py", line 80, in decompose_select
new_ne, partial_input_ir, _partition_info = decompose_expr_graph(
~~~~~~~~~~~~~~~~~~~~^
ne,
^^^
...<4 lines>...
stats.column_stats.get(select_ir.children[0], {}),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 592, in decompose_expr_graph
expr, input_ir, partition_info = mapper(named_expr.value)
~~~~~~^^^^^^^^^^^^^^^^^^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 224, in __call__
return self.cache.setdefault(value, self.fn(value, self))
~~~~~~~^^^^^^^^^^^^^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 497, in _decompose
children, input_irs, _partition_info = zip(
~~~^
*(rec(c) for c in expr.children), strict=True
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 498, in <genexpr>
*(rec(c) for c in expr.children), strict=True
~~~^^^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 224, in __call__
return self.cache.setdefault(value, self.fn(value, self))
~~~~~~~^^^^^^^^^^^^^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 497, in _decompose
children, input_irs, _partition_info = zip(
~~~^
*(rec(c) for c in expr.children), strict=True
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 498, in <genexpr>
*(rec(c) for c in expr.children), strict=True
~~~^^^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 224, in __call__
return self.cache.setdefault(value, self.fn(value, self))
~~~~~~~^^^^^^^^^^^^^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 497, in _decompose
children, input_irs, _partition_info = zip(
~~~^
*(rec(c) for c in expr.children), strict=True
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 498, in <genexpr>
*(rec(c) for c in expr.children), strict=True
~~~^^^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/traversal.py", line 224, in __call__
return self.cache.setdefault(value, self.fn(value, self))
~~~~~~~^^^^^^^^^^^^^
File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/expressions.py", line 522, in _decompose
input_ir = unique_input_irs[0]
~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
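A note on reading the traceback above: the repeated KeyError blocks appear to be ordinary cache misses in the memoizing traversal (line 222 looks up self.cache[value], line 224 computes the value inside the handler), so the only real failure is the final IndexError. The sketch below reproduces that chaining pattern in isolation. MemoMapper, visit, and the tuple-based tree are hypothetical stand-ins written for illustration, not cudf-polars code, and the sketch does not attempt to explain why unique_input_irs ends up empty.

```python
class MemoMapper:
    """Hypothetical caching traversal (illustration only, not the cudf-polars class)."""

    def __init__(self, fn):
        self.fn = fn
        self.cache = {}

    def __call__(self, value):
        try:
            return self.cache[value]
        except KeyError:
            # Cache miss: fn runs inside the handler, so any exception it
            # raises is chained onto this KeyError as its __context__.
            return self.cache.setdefault(value, self.fn(value, self))


def visit(node, rec):
    # A node is a tuple of (name, *children). "broken" mimics indexing an empty list.
    name, *children = node
    if name == "broken":
        return [][0]  # raises IndexError: list index out of range
    return (name, *[rec(child) for child in children])


def context_chain(exc):
    # Collect the type names of the implicitly chained exceptions.
    names = []
    ctx = exc.__context__
    while ctx is not None:
        names.append(type(ctx).__name__)
        ctx = ctx.__context__
    return names


tree = ("round", ("cast", ("div", ("broken",))))

try:
    MemoMapper(visit)(tree)
except IndexError as err:
    chain = context_chain(err)

print(chain)
```

Each level of recursion contributes one KeyError to the chain, which is why the report shows one "During handling of the above exception" block per nested expression before the IndexError.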
Expected behavior
No error
Environment overview (please complete the following information)
- Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
- Method of cuDF install: [conda, Docker, or from source]
- If method of install is [Docker], provide docker pull & docker run commands used
Environment details
Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details
Additional context
The same query runs fine at scale 1.
The following smaller example does not reproduce the error (i.e. it succeeds):
import polars as pl

# Left table: 2,000,000 unique keys.
a = pl.LazyFrame({
    "a_key": pl.Series(list(range(2000000))),
})
# Right table: 60,000,000 rows; each of 15,000,000 keys appears 4 times.
b = pl.LazyFrame({
    "b_key": pl.Series(list(range(60000000 // 4)) * 4),
    "c": pl.Series(list(range(60000000)), dtype=pl.Decimal(15, 2)),
})
a.sink_parquet("a.parquet")
b.sink_parquet("b.parquet")

aa = pl.scan_parquet("a.parquet")
bb = pl.scan_parquet("b.parquet")

# Same shape as the failing expression: sum a decimal column, divide, round.
q = (
    aa
    .join(bb, how="left", left_on="a_key", right_on="b_key")
    .select(
        (pl.col("c").sum() / 7.0).round(2).alias("avg_yearly")
    )
)
q.collect(engine=pl.GPUEngine())

I hope to keep looking at this tomorrow.