Merged
Changes from 2 commits
18 changes: 13 additions & 5 deletions python/ray/dataframe/dataframe.py
@@ -1189,9 +1189,17 @@ def as_blocks(self, copy=True):
             "github.com/ray-project/ray.")

     def as_matrix(self, columns=None):
-        raise NotImplementedError(
-            "To contribute to Pandas on Ray, please visit "
-            "github.com/ray-project/ray.")
+        """Convert the frame to its Numpy-array representation.
+
+        Args:
+            columns: If None, return all columns, otherwise,
+                returns specified columns.
+
+        Returns:
+            values: ndarray
+        """
+        # TODO this is very inefficient, also see __array__
+        return to_pandas(self).as_matrix(columns)
Contributor

__array__ does the same thing here. Would be better if this called that under the hood, so that it can be optimized in one place later.

Contributor Author

Since as_matrix takes the columns kwarg, do you think we should leave as_matrix as it is now and have __array__ use return self.as_matrix()? That way it is still optimized in one place, and we can handle columns.

Contributor

Sure, that works too.

Member

After review, it looks like they do similar things, but __array__ takes a dtype (which our implementation was disregarding for some reason) and as_matrix takes columns, so let's keep them separate for now. I'll add a TODO here as well, though.
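
A minimal sketch of the "optimize in one place" idea from this thread. The thread settles on keeping the two methods separate for now, so this only illustrates how a later unification might look. `MiniFrame` and `_to_numpy` are hypothetical names, not part of Ray's dataframe code, and the class stores plain Python lists rather than distributed partitions.

```python
import numpy as np


class MiniFrame:
    """Hypothetical stand-in for a DataFrame-like class (not Ray's)."""

    def __init__(self, data):
        # data: dict mapping column name -> list of values
        self._data = dict(data)
        self.columns = list(self._data)

    def _to_numpy(self, columns=None, dtype=None):
        # Single conversion path: an optimization made here would
        # benefit both as_matrix and __array__.
        cols = self.columns if columns is None else list(columns)
        stacked = np.column_stack([np.asarray(self._data[c]) for c in cols])
        return stacked if dtype is None else stacked.astype(dtype)

    def as_matrix(self, columns=None):
        # Column selection is the only thing as_matrix adds.
        return self._to_numpy(columns=columns)

    def __array__(self, dtype=None, copy=None):
        # dtype is forwarded instead of being silently dropped;
        # copy is accepted only for compatibility with newer NumPy.
        return self._to_numpy(dtype=dtype)


frame = MiniFrame({"A": [1, 2], "B": [3, 4]})
subset = frame.as_matrix(["B"])          # only column B
as_float = np.asarray(frame, dtype=float)  # goes through __array__
```

Both public methods stay thin wrappers, so the `columns` and `dtype` arguments can coexist without either method calling the other.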


     def asfreq(self, freq, method=None, how=None, normalize=False,
                fill_value=None):
@@ -4588,8 +4596,8 @@ def __round__(self, decimals=0):
             "github.com/ray-project/ray.")

     def __array__(self, dtype=None):
-        # TODO: This is very inefficient and needs fix
-        return np.array(to_pandas(self))
+        # TODO: This is very inefficient and needs fix, also see as_matrix
+        return to_pandas(self).__array__(dtype=dtype)

     def __array_wrap__(self, result, context=None):
         raise NotImplementedError(
27 changes: 24 additions & 3 deletions python/ray/dataframe/test/test_dataframe.py
@@ -994,10 +994,31 @@ def test_as_blocks():


 def test_as_matrix():
-    ray_df = create_test_dataframe()
+    test_data = TestData()
Contributor

Can you use the fixture model for testing here and define the numpy matrix in the tests to compare against?

Member

Why does this need a fixture model?

Contributor

Fixture models are more in tune with what we have been using. The simplest way to do this test would be to run as_matrix or __array__ on both pd_df and ray_df and check equality.

Member

We will make a pass over the tests to unify them in a later PR.

Contributor

Sounds good.
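
A minimal sketch of the "convert both and check equality" approach suggested above, assuming both frames have already been converted to NumPy arrays. `assert_same_matrix` is a hypothetical helper name, not something from the Ray test suite. It relies on `np.testing.assert_array_equal`, which treats NaNs in matching positions as equal, so no per-element `np.isnan` branch is needed.

```python
import numpy as np


def assert_same_matrix(actual, expected):
    """Hypothetical helper: compare two array-likes, with NaN == NaN."""
    actual = np.asarray(actual)
    expected = np.asarray(expected)
    # Check the shape first so a mismatch gives a clear failure.
    assert actual.shape == expected.shape
    # NaNs at the same positions in both arrays do not raise here.
    np.testing.assert_array_equal(actual, expected)


# Stand-ins for the matrices produced from the two frames.
expected = np.array([[1.0, np.nan], [3.0, 4.0]])
assert_same_matrix(expected.copy(), expected)
```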

+    frame = rdf.DataFrame(test_data.frame)
+    mat = frame.as_matrix()
+
+    frameCols = frame.columns
Collaborator

frameCols seems a bit inconsistent with the naming convention, maybe frame_columns?

Member

Good point, I'll make this change.

+    for i, row in enumerate(mat):
+        for j, value in enumerate(row):
+            col = frameCols[j]
+            if np.isnan(value):
+                assert np.isnan(frame[col][i])
+            else:
+                assert value == frame[col][i]

-    with pytest.raises(NotImplementedError):
-        ray_df.as_matrix()
+    # mixed type
+    mat = rdf.DataFrame(test_data.mixed_frame).as_matrix(['foo', 'A'])
+    assert mat[0, 0] == 'bar'
+
+    df = rdf.DataFrame({'real': [1, 2, 3], 'complex': [1j, 2j, 3j]})
+    mat = df.as_matrix()
+    assert mat[0, 0] == 1j
+
+    # single block corner case
+    mat = rdf.DataFrame(test_data.frame).as_matrix(['A', 'B'])
+    expected = test_data.frame.reindex(columns=['A', 'B']).values
+    tm.assert_almost_equal(mat, expected)
Contributor

Can you test array(df) here too?

Member

Please clarify what you mean by array(df).

Contributor

Test the __array__ function here.

Member

We already have test___array__; why do we need to test that here as well?

Contributor

Sorry, forgot about that. In that case, this is not necessary.



def test_asfreq():