[data] rename_columns doesn't work with from_pandas #57709

@richardliaw

Description

What happened + What you expected to happen

RayTaskError(UserCodeException): ray::Project() (pid=52375, ip=127.0.0.1)
  File "/Users/rliaw/miniconda3/envs/oss/lib/python3.10/site-packages/pandas/core/generic.py", line 6318, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'rename_columns'

The above exception was the direct cause of the following exception:

ray::Project() (pid=52375, ip=127.0.0.1)
  File "/Users/rliaw/miniconda3/envs/oss/lib/python3.10/site-packages/ray/data/_internal/execution/operators/map_operator.py", line 564, in _map_task
    for b_out in map_transformer.apply_transform(iter(blocks), ctx):
  File "/Users/rliaw/miniconda3/envs/oss/lib/python3.10/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 102, in __call__
    yield from self._post_process(results)
  File "/Users/rliaw/miniconda3/envs/oss/lib/python3.10/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 84, in _shape_blocks
    for result in results:
  File "/Users/rliaw/miniconda3/envs/oss/lib/python3.10/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 412, in _apply_transform
    yield from self._block_fn(blocks, ctx)
  File "/Users/rliaw/miniconda3/envs/oss/lib/python3.10/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 620, in transform_fn
    out_block = fn(block)
  File "/Users/rliaw/miniconda3/envs/oss/lib/python3.10/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 151, in _project_block
    _try_wrap_udf_exception(e)
  File "/Users/rliaw/miniconda3/envs/oss/lib/python3.10/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 439, in _try_wrap_udf_exception
    raise UserCodeException("UDF failed to process a data block.") from e
ray.exceptions.UserCodeException: UDF failed to process a data block.

Versions / Dependencies

master

Reproduction script

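import pandas as pd
import ray

# Build a small pandas DataFrame with a single "doc_id" column.
df = pd.DataFrame({"doc_id": range(100)})

# from_pandas yields pandas-backed blocks; rename_columns then raises the
# AttributeError shown above when the Project task runs.
ray.data.from_pandas(df).rename_columns({"doc": "doc_id"}).materialize()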
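
The traceback suggests the projection step calls `rename_columns` directly on the block, which exists on Arrow tables but not on a pandas `DataFrame`. A minimal sketch of a block-type-aware rename follows. The helper name `rename_block_columns` is hypothetical and this is not Ray's actual implementation or fix; the mapping `{"doc_id": "doc"}` is also an assumption, chosen so that it renames the column that actually exists in the frame.

```python
import pandas as pd


def rename_block_columns(block, mapping):
    """Hypothetical helper: rename columns on a pandas or Arrow-style block."""
    if isinstance(block, pd.DataFrame):
        # pandas has no rename_columns(); rename(columns=...) is the equivalent.
        return block.rename(columns=mapping)
    # Arrow-style tables expose column_names and rename_columns(list_of_names).
    new_names = [mapping.get(name, name) for name in block.column_names]
    return block.rename_columns(new_names)


df = pd.DataFrame({"doc_id": range(100)})

# Calling the Arrow-style method on a pandas block fails, as in the traceback.
try:
    df.rename_columns({"doc_id": "doc"})
    pandas_has_rename_columns = True
except AttributeError:
    pandas_has_rename_columns = False

# The type-aware helper returns a renamed copy and leaves df unchanged.
renamed = rename_block_columns(df, {"doc_id": "doc"})
```

Dispatching on the block type keeps the Arrow path untouched and only adds a pandas branch, which is one plausible shape for a fix here.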

Issue Severity

None

Metadata

Labels

bug: Something that is supposed to be working; but isn't
data: Ray Data-related issues
good-first-issue: Great starter issue for someone just starting to contribute to Ray
triage: Needs triage (eg: priority, bug/not-bug, and owning component)
