Merged
18 changes: 13 additions & 5 deletions python/ray/dataframe/dataframe.py
@@ -1189,9 +1189,17 @@ def as_blocks(self, copy=True):
             "github.com/ray-project/ray.")

     def as_matrix(self, columns=None):
-        raise NotImplementedError(
-            "To contribute to Pandas on Ray, please visit "
-            "github.com/ray-project/ray.")
+        """Convert the frame to its Numpy-array representation.
+
+        Args:
+            columns: If None, return all columns; otherwise,
+                return the specified columns.
+
+        Returns:
+            values: ndarray
+        """
+        # TODO this is very inefficient, also see __array__
+        return to_pandas(self).as_matrix(columns)

     def asfreq(self, freq, method=None, how=None, normalize=False,
                fill_value=None):
@@ -4588,8 +4596,8 @@ def __round__(self, decimals=0):
             "github.com/ray-project/ray.")

     def __array__(self, dtype=None):
-        # TODO: This is very inefficient and needs fix
-        return np.array(to_pandas(self))
+        # TODO: This is very inefficient and needs fix, also see as_matrix
+        return to_pandas(self).__array__(dtype=dtype)

     def __array_wrap__(self, result, context=None):
         raise NotImplementedError(
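Both changes in this file follow the same pattern: collect the frame into a local pandas object, then let pandas build the array. The sketch below shows that pattern in isolation. `FrameWrapper` and `_to_pandas` are hypothetical names invented for illustration, not the Ray classes, and `to_numpy()` stands in for `as_matrix`, which was removed in pandas 1.0.

```python
import numpy as np
import pandas as pd


class FrameWrapper:
    """Hypothetical stand-in for a distributed frame (not the Ray class)."""

    def __init__(self, pandas_df):
        self._pandas_df = pandas_df

    def _to_pandas(self):
        # Assumption: a real distributed frame would gather every partition
        # into one local pandas.DataFrame here, which is the costly step.
        return self._pandas_df

    def as_matrix(self, columns=None):
        """Return the frame as a NumPy array, optionally limited to `columns`."""
        local = self._to_pandas()
        if columns is not None:
            local = local.loc[:, list(columns)]
        # to_numpy() is the current pandas replacement for as_matrix().
        return local.to_numpy()

    def __array__(self, dtype=None, copy=None):
        # `copy` is accepted for NumPy 2 compatibility and ignored in this sketch.
        return self._to_pandas().to_numpy(dtype=dtype)


wrapped = FrameWrapper(
    pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]})
)
subset = wrapped.as_matrix(["A", "C"])  # array built from columns A and C only
as_array = np.asarray(wrapped)          # NumPy dispatches to __array__
```

Because every call materializes the whole frame locally, the cost grows with the full data size even when only a few columns are requested, which is presumably what the TODO comments refer to.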
27 changes: 24 additions & 3 deletions python/ray/dataframe/test/test_dataframe.py
@@ -994,10 +994,31 @@ def test_as_blocks():


 def test_as_matrix():
-    ray_df = create_test_dataframe()
+    test_data = TestData()
Contributor
Can you use the fixture model for testing here and define the numpy matrix in the tests to compare against?

Member
Why does this need a fixture model?

Contributor
Fixture models are more in tune with what we have been using. The simplest way to do this test would be to run as_matrix or __array__ on both pd_df and ray_df and check equality.

Member
We will make a pass over the tests to unify them in a later PR.

Contributor
Sounds good.
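For reference, the equality check suggested in this thread could look roughly like the sketch below. It is an illustration only: `assert_same_matrix` is a hypothetical helper, and `other_df` is a plain pandas copy standing in for the distributed frame. The point is that `nan == nan` is false, so a naive element-wise comparison fails on missing values, while `np.testing.assert_array_equal` accepts NaNs that sit in matching positions.

```python
import numpy as np
import pandas as pd


def assert_same_matrix(actual, expected):
    """Assert two arrays match element-wise, treating aligned NaNs as equal."""
    actual = np.asarray(actual)
    expected = np.asarray(expected)
    assert actual.shape == expected.shape
    # assert_array_equal raises AssertionError on any mismatch but does not
    # complain when both arrays hold NaN at the same position.
    np.testing.assert_array_equal(actual, expected)


pd_df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})
# Assumption: a pandas copy stands in for the frame under test.
other_df = pd_df.copy()

assert_same_matrix(other_df.to_numpy(), pd_df.to_numpy())
```

This replaces a nested loop with an explicit `np.isnan` branch by a single call, at the cost of a less specific failure location.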

+    frame = rdf.DataFrame(test_data.frame)
+    mat = frame.as_matrix()
+
+    frame_columns = frame.columns
+    for i, row in enumerate(mat):
+        for j, value in enumerate(row):
+            col = frame_columns[j]
+            if np.isnan(value):
+                assert np.isnan(frame[col][i])
+            else:
+                assert value == frame[col][i]

-    with pytest.raises(NotImplementedError):
-        ray_df.as_matrix()
+    # mixed type
+    mat = rdf.DataFrame(test_data.mixed_frame).as_matrix(['foo', 'A'])
+    assert mat[0, 0] == 'bar'
+
+    df = rdf.DataFrame({'real': [1, 2, 3], 'complex': [1j, 2j, 3j]})
+    mat = df.as_matrix()
+    assert mat[0, 0] == 1j
+
+    # single block corner case
+    mat = rdf.DataFrame(test_data.frame).as_matrix(['A', 'B'])
+    expected = test_data.frame.reindex(columns=['A', 'B']).values
+    tm.assert_almost_equal(mat, expected)
Contributor
Can you test array(df) here too?

Member
Please clarify what you mean by array(df).

Contributor
Test the __array__ function here.

Member
We already have a test___array__, why do we need to test that here also?

Contributor
Sorry, forgot about that. In that case, this is not necessary.



 def test_asfreq():