GH-43112: [Python] Set nullable Int64 dtype for integer columns with None values when converting to pandas #44538

Open: attwelveDev wants to merge 22 commits into base: main

@attwelveDev commented Oct 26, 2024

Rationale for this change

When to_pandas is called on a Table object, integer columns containing at least one None value are converted to float64 in the resulting pandas DataFrame. For example, for a Table built with from_pydict from the Python dictionary {"col_name": [1, None]}, to_pandas returns a DataFrame in which "col_name" has dtype float64 and values [1.0, NaN]. This can lose precision, because not every integer with magnitude above 2^53 can be represented exactly as a float64.
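
To make the precision concern concrete, here is a minimal sketch in plain Python (no pyarrow involved). It relies only on the fact that a Python float is an IEEE 754 double, the same representation as float64; the variable names are illustrative.

```python
# 2**53 + 1 is the smallest positive integer that a float64 cannot hold exactly.
big = 2**53 + 1

# Converting to float rounds to the nearest representable double.
as_float = float(big)

# Converting back shows that the original value was not preserved.
round_tripped = int(as_float)

print(big)            # the original integer
print(round_tripped)  # differs from the original by one
```

Any int64 value in that range, such as a large identifier or a nanosecond timestamp, would be altered in the same way if its column were silently converted to float64.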

What changes are included in this PR?

In the table_to_dataframe method, int64 columns that contain at least one None value are now assigned the Int64 dtype in ext_columns_dtypes.

Various existing tests were modified to reflect this new behaviour.

Looking to close #43112.

Are these changes tested?

Tests have been added as part of test_pandas_dtype_conversions in /python/pyarrow/tests/test_pandas.py.

Are there any user-facing changes?

This may affect instances where integer columns with None values are expected to be converted to float64.

Additional Notes

There are failing CI tests for other languages, possibly due to unrelated issues from the upstream main branch. All CI tests for Python have been verified to pass.

This is my first contribution to an open source project, and I appreciate any feedback.

@Isaac7777-cpu-school

Everything looks good. Why is the CI failing?

@attwelveDev (Author)

> Everything looks good. Why is the CI failing?

I'm not actually sure. It appears that TestConvertPrimitiveTypes.test_integer_with_nulls is failing, but I did update that test method, and the CI in my fork passed afterwards. I'll look into it.

Development

Successfully merging this pull request may close these issues.

Table.to_pandas() converts ints to doubles