Binary operations on Series + DataFrame doesn't work #4578

Open
dchigarev opened this issue Jun 15, 2022 · 1 comment
Labels
bug 🦗 Something isn't working P2 Minor bugs or low-priority feature requests pandas concordance 🐼 Functionality that does not match pandas

Comments

@dchigarev
Collaborator

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Any
  • Modin version (modin.__version__): 4ec7f63
  • Python version: 3.8.10
  • Code we can use to reproduce:
import modin.pandas as pd


df = pd.DataFrame({"a": [1, 2, 3]}).T
sr = pd.Series([10, 20, 30])

print(f"Pandas:\n{sr._to_pandas() + df._to_pandas()}")
print(f"Modin:\n{sr + df}")
Output
Pandas:      
    0   1   2
a  11  22  33
Traceback (most recent call last):
  File "t3.py", line 8, in <module>
    print(f"Modin:\n{sr + df}")
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\logging\logger_metaclass.py", line 68, in log_wrap
    return method(*args, **kwargs)
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\pandas\series.py", line 163, in __add__
    return self.add(right)
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\logging\logger_metaclass.py", line 68, in log_wrap
    return method(*args, **kwargs)
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\pandas\series.py", line 514, in add
    return super(Series, new_self).add(
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\logging\logger_metaclass.py", line 68, in log_wrap
    return method(*args, **kwargs)
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\pandas\base.py", line 593, in add
    return self._binary_op(
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\logging\logger_metaclass.py", line 68, in log_wrap
    return method(*args, **kwargs)
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\pandas\base.py", line 431, in _binary_op
    new_query_compiler = getattr(self._query_compiler, op)(other, **kwargs)
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\logging\logger_metaclass.py", line 68, in log_wrap
    return method(*args, **kwargs)
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\core\dataframe\algebra\binary.py", line 92, in caller
    query_compiler._modin_frame.binary_op(
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\logging\logger_metaclass.py", line 68, in log_wrap
    return method(*args, **kwargs)
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\core\dataframe\pandas\dataframe\dataframe.py", line 115, in run_f_on_minimally_updated_metadata
    result = f(self, *args, **kwargs)
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\core\dataframe\pandas\dataframe\dataframe.py", line 2516, in binary_op
    return self.__constructor__(
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\logging\logger_metaclass.py", line 68, in log_wrap
    return method(*args, **kwargs)
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\core\dataframe\pandas\dataframe\dataframe.py", line 210, in __init__
    ErrorMessage.catch_bugs_and_request_email(
  File "C:\Users\rp-re\OneDrive\Desktop\rep\modin\modin\error_message.py", line 70, in catch_bugs_and_request_email        
    raise Exception(
Exception: Internal Error. Please visit https://github.com/modin-project/modin/issues to file an issue with the traceback and the command that caused this error. If you can't file a GitHub issue, please email [email protected].
Column widths: 1 != 4

Describe the problem

The code fails on the column widths check when constructing the binary operation result.

The problem is that binary_op is designed for df + df operations only. Mixing a frame and a series has to be handled by broadcasting the series to every column of the frame rather than by attempting to align the shapes of the two operands. We already have the broadcasting logic inside the Binary operator; it is triggered when the broadcast parameter is True (which happens for df + series). However, the parameter appears to be False for series + df.
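To make the intended behavior concrete, here is a minimal pure-Python sketch of routing both operand orders through one broadcasting path. `ToySeries`, `ToyFrame`, `frame_op_series` and this `binary_op` are hypothetical names invented for illustration; none of them is Modin's API, and this is not a proposed patch. The assumption is only that series + df should reuse the df + series broadcasting with the operands reflected.

```python
import operator


# Hypothetical toy containers for illustration; these are not Modin classes.
class ToySeries:
    def __init__(self, values):
        # label -> value; the labels play the role of the Series index
        self.values = dict(values)


class ToyFrame:
    def __init__(self, rows):
        # row label -> {column label -> value}
        self.rows = {r: dict(cols) for r, cols in rows.items()}


def frame_op_series(frame, series, op, reflected=False):
    """Broadcast `series` across the rows of `frame`, aligning on column labels."""
    columns = []
    for cols in frame.rows.values():
        for c in cols:
            if c not in columns:
                columns.append(c)
    for label in series.values:
        if label not in columns:
            columns.append(label)

    nan = float("nan")
    out = {}
    for row_label, cols in frame.rows.items():
        out[row_label] = {}
        for c in columns:
            a = cols.get(c, nan)           # value from the frame
            b = series.values.get(c, nan)  # value from the series
            # reflected=True computes `series op frame` instead of `frame op series`
            out[row_label][c] = op(b, a) if reflected else op(a, b)
    return ToyFrame(out)


def binary_op(left, right, op):
    """Route both operand orders to the same broadcasting path."""
    if isinstance(left, ToyFrame) and isinstance(right, ToySeries):
        return frame_op_series(left, right, op)
    if isinstance(left, ToySeries) and isinstance(right, ToyFrame):
        # series + df: reuse the df + series path with the operands reflected
        return frame_op_series(right, left, op, reflected=True)
    raise TypeError("this sketch only covers mixed Series/DataFrame operands")


df = ToyFrame({"a": {0: 1, 1: 2, 2: 3}})
sr = ToySeries({0: 10, 1: 20, 2: 30})

result = binary_op(sr, df, operator.add)
print(result.rows)  # {'a': {0: 11, 1: 22, 2: 33}}
```

The `reflected` flag matters for non-commutative operations such as subtraction, where series - df and df - series must give different results.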

@vnlitvinov
Collaborator

We no longer fail with an exception, but the output is still wrong:

>>> print(f"Pandas:\n{sr._to_pandas() + df._to_pandas()}")
Pandas:
    0   1   2
a  11  22  33
>>> print(f"Modin:\n{sr + df}")
Modin:
   __reduced__   0   1   2
0          NaN NaN NaN NaN
1          NaN NaN NaN NaN
2          NaN NaN NaN NaN
a          NaN NaN NaN NaN
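One possible reading of this output, sketched in pure Python without Modin: the all-NaN result has the shape you would get if the series were treated as a one-column frame labelled `__reduced__` and added to the frame as df + df with outer label alignment. That the series is represented this way internally is an assumption drawn from the column label in the output; `add_frames` is a hypothetical helper, not a Modin function.

```python
import math


def add_frames(left, right):
    """Add two label-indexed tables (row -> {col -> value}) with outer alignment."""
    rows = list(left) + [r for r in right if r not in left]
    cols = []
    for table in (left, right):
        for row in table.values():
            for c in row:
                if c not in cols:
                    cols.append(c)
    nan = float("nan")
    # A cell missing on either side contributes NaN, so the sum is NaN.
    return {
        r: {c: left.get(r, {}).get(c, nan) + right.get(r, {}).get(c, nan) for c in cols}
        for r in rows
    }


# Assumption: the series viewed as a one-column frame named "__reduced__".
sr_as_frame = {0: {"__reduced__": 10}, 1: {"__reduced__": 20}, 2: {"__reduced__": 30}}
df = {"a": {0: 1, 1: 2, 2: 3}}

result = add_frames(sr_as_frame, df)
print(list(result))     # [0, 1, 2, 'a']
print(list(result[0]))  # ['__reduced__', 0, 1, 2]
print(all(math.isnan(v) for row in result.values() for v in row.values()))  # True
```

No row label and column label pair exists in both operands, so every cell is NaN, which matches the 4x4 result shown above.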

@vnlitvinov added the "P2" (Minor bugs or low-priority feature requests) and "pandas concordance 🐼" (Functionality that does not match pandas) labels on Sep 9, 2022