GH-43112: [Python] Set nullable `Int64` `dtype` for integer columns with `None` values when converting to pandas #44538

attwelveDev · 2024-10-26T13:41:19Z

Rationale for this change

When calling the to_pandas method on a Table object, integer columns with at least one None value are converted to float64 in the resulting pandas DataFrame. For example, given a Table built with from_pydict from the Python dictionary {"col_name": [1, None]}, to_pandas returns a DataFrame in which the dtype of "col_name" is float64 and the values are [1.0, NaN]. This can lose precision, because not every 64-bit integer can be represented exactly as a float64 (integers beyond 2^53 in magnitude may be rounded).

What changes are included in this PR?

In the table_to_dataframe method, columns with the int64 dtype that contain at least one None value are now assigned the Int64 dtype in ext_columns_dtypes.

Various existing tests were modified to reflect this new behaviour.

Looking to close #43112.

Are these changes tested?

Tests have been added as part of test_pandas_dtype_conversions in /python/pyarrow/tests/test_pandas.py.

Are there any user-facing changes?

This may affect code that expects integer columns with None values to be converted to float64.
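Code that depends on the previous float64 result could, as one option, cast the column back explicitly. This is a sketch using plain pandas, with a hand-built `Int64` Series standing in for a converted column; the variable names are illustrative.

```python
import pandas as pd

# Stand-in for a column that now arrives with the nullable Int64 dtype.
nullable = pd.Series([1, None], dtype="Int64")

# Casting to float64 turns the missing value into NaN again.
as_float = nullable.astype("float64")
print(as_float.dtype)
print(as_float.tolist())
```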

Additional Notes

There are failing CI tests for other languages, possibly due to unrelated issues from the upstream main branch. All CI tests for Python have been verified to pass.

This is my first contribution to an open source project, and I appreciate any feedback.

GitHub Issue: Table.to_pandas() converts ints to doubles #43112

Isaac7777-cpu-school · 2024-10-27T01:54:38Z

Everything looks good. Why is the CI failing?

attwelveDev · 2024-10-27T04:00:58Z

> Everything looks good. Why is the CI failing?

I'm not actually sure. It appears TestConvertPrimitiveTypes.test_integer_with_nulls is failing, but I did update that test method, and the CI in my fork passed afterwards. I'll look into it.

attwelveDev and others added 22 commits October 19, 2024 18:16

Ensure integer arrays have Int64 dtype when creating pandas df

b8193f5

Test dtype conversions when creating pandas dfs

08d11b0

Remove unused import

81d05ac

Added comment to clarify nullable-integer array dtype in test

418525c

Revised logic to check if array has only integers and None values

e6c3c9d

Revised spacing to satisfy linting check

bd40e56

Added bool to prevent dtype being overridden

d6cd89f

Added logging

7540afe

Fix compilation issue

02943be

Added more logging

20daafd

Added more logging

f0a0378

Added conversion to Int64 dtype in _get_extension_dtypes method

c83d0fa

Fixed attribute error

5b90628

Added Int64 check

32d4d5f

Edited tests failing due to implementation

1bf1970

Edited more tests failing due to implementation

d7a02fd

Remove logs

196d2ca

Remove more logs

ec5da34

Changed failing doctests to implementation

ebbe1c7

Moved and parametrised dtype tests

ecf7209

Revert parametrisation of tests due to pd error

9ff8040

Merge branch 'apache:main' into main

2acb873

github-actions bot added Component: Python awaiting review Awaiting review labels Oct 26, 2024
