Add functions to compare Column objects with iterable references and to compare DataFrame objects with mapping references by anmyachev · Pull Request #66 · data-apis/dataframe-api-compat

anmyachev · 2024-01-05T01:33:11Z

The changes aim to get rid of the interchange_to_pandas function, so that the tests are implementation-independent.

So far, the new functions have only been applied to the tests/column folder.
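As an illustration of the idea (not the PR's actual implementation), a minimal element-wise comparison against a plain Python reference might look like this. It assumes the column's values have already been extracted into a Python iterable; how that happens is backend-specific and not shown. The name `compare_values_with_reference` is hypothetical.

```python
import math
from collections.abc import Iterable
from typing import Any


def compare_values_with_reference(
    values: Iterable[Any], reference: Iterable[Any], *, rel_tol: float = 1e-9
) -> None:
    """Hypothetical helper: assert that `values` equals `reference` element-wise."""
    actual = list(values)
    expected = list(reference)
    assert len(actual) == len(expected), f"length mismatch: {len(actual)} != {len(expected)}"
    for i, (a, e) in enumerate(zip(actual, expected)):
        if isinstance(a, float) and isinstance(e, float):
            # NaN never compares equal to itself, so treat two NaNs as a match,
            # and compare other floats with a relative tolerance.
            if math.isnan(a) and math.isnan(e):
                continue
            assert math.isclose(a, e, rel_tol=rel_tol), f"index {i}: {a} != {e}"
        else:
            assert a == e, f"index {i}: {a!r} != {e!r}"


# Passes silently; any mismatch raises AssertionError.
compare_values_with_reference([1, 2, float("nan")], [1, 2, float("nan")])
```

Because the reference is an ordinary list, the same test body can run against any backend once its values are materialized.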

MarcoGorelli · 2024-01-05T06:46:09Z

nice idea! do we want to check the data type too?
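One way the dtype check could be added, sketched with stand-in classes. In the diff further down, expected dtypes appear to be passed as classes (e.g. `ns.Int64`) while the namespace returns dtype instances (e.g. `Namespace.Float32()`), so an `isinstance` check is a natural fit. `Int64`, `Float64` and `check_dtype` are hypothetical.

```python
class Int64:
    """Stand-in for a namespace dtype class such as ns.Int64 (hypothetical)."""


class Float64:
    """Stand-in for ns.Float64 (hypothetical)."""


def check_dtype(actual_dtype: object, expected_dtype: type) -> None:
    # Expected dtypes are given as classes; actual dtypes are instances.
    assert isinstance(actual_dtype, expected_dtype), (
        f"expected {expected_dtype.__name__}, got {type(actual_dtype).__name__}"
    )


check_dtype(Int64(), Int64)  # passes
```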

anmyachev · 2024-01-06T23:57:11Z

pyproject.toml

 force-single-line = true

+[tool.black]
+line-length = 90


To sync with pre-commit.

anmyachev · 2024-01-06T23:59:36Z

tests/utils.py

 import dataframe_api_compat.pandas_standard
 import dataframe_api_compat.polars_standard

-DType = TypeVar("DType")


Looks unused; I can restore it if needed.

anmyachev · 2024-01-07T00:44:17Z

tests/column/col_sorted_indices_test.py



 def test_column_sorted_indices_ascending(library: str) -> None:
-    df = integer_dataframe_6(library).persist()


I removed the .persist() call in several places, since the new comparison functions already call it. Calling it a second time generates a warning, which the repository settings turn into an error. If this is incorrect, then we need a public way to check the ._is_persisted field, so as not to call the method several times.
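A toy model of the problem: a hypothetical `LazyFrame` that warns when persisted twice, run under `warnings.simplefilter("error")`, which turns warnings into exceptions. pytest's `filterwarnings = error` setting has the same effect and is presumably what "repository settings" refers to here. The warning message and the `persist_once` helper are made up; `persist_once` shows the check the comment asks for, but it has to read the private `_is_persisted` flag.

```python
import warnings


class LazyFrame:
    """Hypothetical stand-in for a dataframe with a persist() method."""

    def __init__(self) -> None:
        self._is_persisted = False

    def persist(self) -> "LazyFrame":
        if self._is_persisted:
            # Made-up message; the real library's wording may differ.
            warnings.warn("persist() called on an already-persisted frame", UserWarning, stacklevel=2)
        self._is_persisted = True
        return self


def persist_once(df: LazyFrame) -> LazyFrame:
    # Only persist when needed; relies on the private flag, which is why a
    # public way to query the persisted state would be preferable.
    return df if df._is_persisted else df.persist()


with warnings.catch_warnings():
    warnings.simplefilter("error")  # warnings now raise, as under strict test settings
    df = LazyFrame().persist()
    persist_once(df)  # no second persist() call, so nothing is raised
    try:
        df.persist()  # second call: the warning surfaces as an exception
    except UserWarning as exc:
        print(f"raised: {exc}")
```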

anmyachev · 2024-01-07T00:47:04Z

tests/column/pow_test.py

-    pd.testing.assert_frame_equal(result_pd, expected)
+    expected = {"a": [1, 2, 3], "b": [4, 5, 6], "result": [1.0, 32.0, 729.0]}
+    expected_dtype = {"a": ns.Int64, "b": ns.Int64, "result": ns.Float64}
+    compare_dataframe_with_reference(result, expected, expected_dtype)  # type: ignore[arg-type]


I don't know exactly why mypy reports an error in some places; it appears to be a false positive, so I had to suppress it. The first thing that catches my eye is that the lists inside the dictionaries have different element types, for example int and float (i.e. the values are not homogeneous).
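A possible explanation, not verified against the PR: `dict` and `list` are invariant for mypy, so when a reference dictionary mixes `list[int]` and `list[float]` values, the type mypy infers for it may not match a parameter annotated with concrete `dict`/`list` types. Annotating the parameter with the read-only, covariant `Mapping` and `Sequence` types usually avoids the need for `# type: ignore`. The function below is hypothetical and is not the PR's `compare_dataframe_with_reference` signature.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def count_reference_values(reference: Mapping[str, Sequence[Any]]) -> int:
    # Mapping is covariant in its value type and Sequence in its item type,
    # so a dict mixing list[int] and list[float] values is accepted as-is.
    return sum(len(values) for values in reference.values())


expected = {"a": [1, 2, 3], "b": [4, 5, 6], "result": [1.0, 32.0, 729.0]}
print(count_reference_values(expected))  # 9
```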

anmyachev · 2024-01-07T00:49:20Z

dataframe_api_compat/pandas_standard/__init__.py

    if dtype == "Float32":
        return Namespace.Float32()
-    if dtype == "bool":
+    if dtype in ("bool", "boolean"):


I discovered this by accident while experimenting. It may no longer be necessary for the current changes.
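For context on why both spellings can appear: in pandas, a NumPy-backed boolean column has dtype `bool`, whereas the nullable extension dtype is named `boolean`. A minimal sketch of a string-to-dtype translation that accepts both, with hypothetical `Float32` and `Bool` classes standing in for the namespace's dtypes and a hypothetical `map_pandas_dtype` function:

```python
class Float32:
    """Stand-in for Namespace.Float32 (hypothetical)."""


class Bool:
    """Stand-in for the namespace's boolean dtype (hypothetical)."""


def map_pandas_dtype(dtype: str) -> object:
    """Hypothetical excerpt: translate a pandas dtype string into a namespace dtype."""
    if dtype == "Float32":
        return Float32()
    # "bool" is the NumPy-backed dtype, "boolean" the nullable pandas extension dtype.
    if dtype in ("bool", "boolean"):
        return Bool()
    raise NotImplementedError(f"dtype {dtype!r} is not handled in this sketch")
```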

anmyachev · 2024-01-07T00:50:51Z

dataframe_api_compat/pandas_standard/column_object.py

    "UInt16": "uint16",
    "UInt8": "uint8",
    "boolean": "bool",
+    "Float64": "float64",


I also discovered this by accident: it seems the float type was missing. If that was done on purpose, I can try to rework it.


anmyachev · 2024-01-07T00:57:10Z

@MarcoGorelli ready for review :)

anmyachev · 2024-01-11T11:51:23Z

@MarcoGorelli friendly ping :)

A bit of context: once I manage to rewrite the tests in a backend-independent manner, I will try to integrate Modin into your repository. These preliminary changes are needed to avoid code duplication.

MarcoGorelli

awesome!

sorry it took a while to get to

just got two minor comments, but this is great

MarcoGorelli · 2024-01-18T19:40:43Z

dataframe_api_compat/pandas_standard/column_object.py

    "UInt16": "uint16",
    "UInt8": "uint8",
    "boolean": "bool",
+    "Float64": "float64",


i probably just forgot it - let's add float32 too?
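With the suggestion applied, the mapping would gain a `Float32` entry next to `Float64`. A sketch of the excerpt from the diff, assuming (from the visible entries) that it maps pandas nullable dtype names to their NumPy-backed counterparts; the names `NULLABLE_TO_NUMPY` and `to_numpy_dtype` are hypothetical.

```python
# Hypothetical names; only the entries from the diff plus the suggested Float32 are shown.
NULLABLE_TO_NUMPY = {
    "UInt16": "uint16",
    "UInt8": "uint8",
    "boolean": "bool",
    "Float64": "float64",
    "Float32": "float32",  # added as suggested in review
}


def to_numpy_dtype(name: str) -> str:
    # Raise a clear error for names that are missing from the mapping.
    try:
        return NULLABLE_TO_NUMPY[name]
    except KeyError:
        raise ValueError(f"no NumPy-backed equivalent registered for {name!r}") from None
```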

MarcoGorelli · 2024-01-18T19:48:29Z

dataframe_api_compat/pandas_standard/__init__.py

+    if not hasattr(dtype, "startswith"):
+        dtype = str(dtype)


is it possible to do this in a less hacky way?

We can try to use the name attribute if it exists.
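A sketch of the `name`-attribute approach: NumPy dtypes and pandas extension dtypes both expose a string `.name` (for example `float64`, `Int64`, `boolean`), so the helper prefers it and only falls back to `str()` for objects without one. `dtype_name` and `FakeExtensionDtype` are hypothetical, not code from the PR.

```python
def dtype_name(dtype: object) -> str:
    """Hypothetical helper: normalise a dtype object or dtype string to a name."""
    if isinstance(dtype, str):
        return dtype
    name = getattr(dtype, "name", None)
    if isinstance(name, str):
        return name
    # Fallback equivalent to the original `str(dtype)` workaround.
    return str(dtype)


class FakeExtensionDtype:
    """Stand-in for a pandas extension dtype, which exposes a `name` attribute."""

    name = "Int64"


print(dtype_name(FakeExtensionDtype()))  # Int64
```

Checking for an explicit `name` states the intent directly, instead of probing for an unrelated string method such as `startswith`.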

anmyachev · 2024-01-19T14:10:23Z

@MarcoGorelli there are new deprecation warnings from the new polars release:

FAILED tests/groupby/aggregate_test.py::test_aggregate[polars-lazy] - DeprecationWarning: `pl.count()` is deprecated. Please use `pl.len()` instead.
FAILED tests/groupby/aggregate_test.py::test_aggregate_only_size[polars-lazy] - DeprecationWarning: `pl.count()` is deprecated. Please use `pl.len()` instead.
FAILED tests/groupby/size_test.py::test_group_by_size[polars-lazy] - DeprecationWarning: `count` is deprecated. It has been renamed to `len`.

What should I do in this case?

MarcoGorelli · 2024-01-19T16:05:22Z

easiest thing would be to address that in a separate PR, and to set the new polars release as the minimum version (polars is moving quite fast, so backwards compatibility is less of a concern there)

anmyachev · 2024-01-23T10:40:28Z

~~Blocked by #68~~

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

anmyachev · 2024-01-23T12:02:49Z

tests/integration/scale_column_test.py


 @pytest.mark.skipif(
-    tuple(int(v) for v in pl.__version__.split(".")) < (0, 19, 0),
+    parse(pl.__version__) < Version("0.19.0"),


This lets the tests work with pre-release versions such as polars==0.20.6rc1, which the old tuple-of-ints parsing cannot handle (int("6rc1") raises ValueError).
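A small demonstration of why the old check fails on release candidates and how `packaging` handles them. It assumes `parse` and `Version` come from `packaging.version` (the diff does not show the import); the helper names are hypothetical.

```python
from packaging.version import Version, parse


def naive_version_tuple(version: str) -> tuple[int, ...]:
    # Old approach: int("6rc1") raises ValueError for pre-release versions.
    return tuple(int(v) for v in version.split("."))


def is_at_least(version: str, minimum: str) -> bool:
    # packaging implements PEP 440 ordering, so "0.20.6rc1" sorts after "0.19.0"
    # and before the final "0.20.6" release.
    return parse(version) >= Version(minimum)


try:
    naive_version_tuple("0.20.6rc1")
except ValueError as exc:
    print(f"naive parsing failed: {exc}")

print(is_at_least("0.20.6rc1", "0.19.0"))  # True
```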

anmyachev · 2024-01-23T12:03:29Z

@MarcoGorelli ready for review

MarcoGorelli

thanks @anmyachev !

anmyachev · 2024-01-24T11:34:20Z

thanks for the review @MarcoGorelli!

anmyachev force-pushed the compare-columns branch 2 times, most recently from 00e34a6 to 91c4f31, January 5, 2024 01:44

anmyachev force-pushed the compare-columns branch 8 times, most recently from d05595f to a88360c, January 6, 2024 02:09

anmyachev force-pushed the compare-columns branch 2 times, most recently from 4a20813 to 55944ce, January 7, 2024 00:20

anmyachev changed the title ~~Add function to compare Column objects with iterable references~~ Add functions to compare Column objects with iterable references and to compare DataFrame objects with mapping references Jan 7, 2024

anmyachev force-pushed the compare-columns branch from cef7e10 to 6360940, January 7, 2024 00:53

anmyachev marked this pull request as ready for review January 7, 2024 00:56

MarcoGorelli reviewed Jan 18, 2024

anmyachev force-pushed the compare-columns branch from 54173c3 to f9aa10d, January 23, 2024 11:58

Add function to compare Column objects with iterable references

d18b9b1

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

anmyachev added 11 commits January 23, 2024 12:59

check Column dtype

7708bfb

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

another way to check Column dtype

05171fc

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

temp fix for mypy

f0005d8

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

add 'compare_dataframe_with_reference' func

65c1f66

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

use new functions for more files [part1]

efdf097

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

use new functions for more files [part2]

1ed8fbf

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

use new functions for more files [final part for column tests]

1658fa8

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

refactor

f64ee52

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

workarounds for mypy errors

90467c3

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

address review comments

01fa14d

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

use 'parse' and 'Version' to compare package versions

a4c4aee

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

anmyachev force-pushed the compare-columns branch from f9aa10d to a4c4aee, January 23, 2024 12:00

anmyachev mentioned this pull request Jan 23, 2024

Finally get rid of interchange_to_pandas #69

Merged

MarcoGorelli approved these changes Jan 24, 2024

MarcoGorelli merged commit 107969c into data-apis:main Jan 24, 2024

anmyachev deleted the compare-columns branch January 24, 2024 11:34




2 participants
