Commit

ARROW-5480: [Python] Add unit test asserting specifically that pandas.Categorical roundtrips to Parquet format without special options

This only works for string types for the moment. Once ARROW-6277 is addressed we can expand to other types.

Closes #5110 from wesm/ARROW-5480 and squashes the following commits:

a161b24 <Wes McKinney> Add missing pandas marks
f1f8082 <Wes McKinney> Don't use pandas's Parquet functions since they don't work in CI for some reason
9e98404 <Wes McKinney> Improve unit test for out-of-order values, nulls, unobserved category values
620b3b8 <Wes McKinney> Add unit test for ARROW-5480

Authored-by: Wes McKinney <[email protected]>
Signed-off-by: Wes McKinney <[email protected]>
wesm committed Aug 19, 2019
1 parent 0405116 commit c4b8cb6
Showing 1 changed file with 19 additions and 0 deletions.
19 changes: 19 additions & 0 deletions python/pyarrow/tests/test_parquet.py
@@ -3033,6 +3033,25 @@ def test_pandas_categorical_na_type_row_groups():
    assert result[1].equals(table[1])


@pytest.mark.pandas
def test_pandas_categorical_roundtrip():
    # ARROW-5480, this was enabled by ARROW-3246

    # Have one of the categories unobserved and include a null (-1)
    codes = np.array([2, 0, 0, 2, 0, -1, 2], dtype='int32')
    categories = ['foo', 'bar', 'baz']
    df = pd.DataFrame({'x': pd.Categorical.from_codes(
        codes, categories=categories)})

    buf = pa.BufferOutputStream()
    pq.write_table(pa.table(df), buf)

    result = pq.read_table(buf.getvalue()).to_pandas()
    assert result.x.dtype == 'category'
    assert (result.x.cat.categories == categories).all()
    tm.assert_frame_equal(result, df)


@pytest.mark.pandas
def test_multi_dataset_metadata(tempdir):
    filenames = ["ARROW-1983-dataset.0", "ARROW-1983-dataset.1"]
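
Note on the test data in test_pandas_categorical_roundtrip: the codes/categories pair is easier to read once decoded. The following is a minimal pure-Python sketch, not pyarrow or pandas code, and the helper names `decode_codes` and `unobserved` are hypothetical. It assumes the convention stated in the test's comment, that a code of -1 stands for a null, and shows which values the column holds and which category never occurs.

```python
def decode_codes(codes, categories):
    """Map integer codes to category labels; a code of -1 marks a null."""
    return [categories[c] if c >= 0 else None for c in codes]


def unobserved(codes, categories):
    """Return the categories that no code refers to."""
    seen = {c for c in codes if c >= 0}
    return [cat for i, cat in enumerate(categories) if i not in seen]


# Same codes and categories as the unit test in this commit.
codes = [2, 0, 0, 2, 0, -1, 2]
categories = ['foo', 'bar', 'baz']

values = decode_codes(codes, categories)
missing = unobserved(codes, categories)

print(values)   # decoded column values, with None for the null
print(missing)  # categories that are declared but never observed
```

Under these assumptions the column holds only 'baz', 'foo', and one null, while 'bar' is declared but never observed. That is why the test compares the full category list after the roundtrip rather than only the values.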
