
Commit ac5ec64

Yikun authored and HyukjinKwon committed
[SPARK-38821][PYTHON] Skip nsmall/nlarge nan test under pandas 1.4.[0,1,2]
### What changes were proposed in this pull request?

Skip the nsmallest/nlargest NaN tests under pandas 1.4.[0,1,2].

pandas returns wrong results when the sorting column contains ``np.nan``, starting with pandas-dev/pandas@16d2f59 (v1.4.0). I confirmed that this issue is fixed by pandas-dev/pandas@2886388.

### Why are the changes needed?

No

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

CI passed

Closes #36356 from Yikun/SPARK-38821.

Authored-by: Yikun Jiang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
1 parent 028c472 commit ac5ec64

1 file changed: +12 -6 lines changed

python/pyspark/pandas/tests/test_dataframe.py

Lines changed: 12 additions & 6 deletions
@@ -1814,8 +1814,12 @@ def test_nlargest(self):
             index=np.random.rand(7),
         )
         psdf = ps.from_pandas(pdf)
-        self.assert_eq(psdf.nlargest(5, columns="a"), pdf.nlargest(5, columns="a"))
-        self.assert_eq(psdf.nlargest(5, columns=["a", "b"]), pdf.nlargest(5, columns=["a", "b"]))
+        # see also: https://github.com/pandas-dev/pandas/issues/46589
+        if not (LooseVersion("1.4.0") <= LooseVersion(pd.__version__) <= LooseVersion("1.4.2")):
+            self.assert_eq(psdf.nlargest(5, columns="a"), pdf.nlargest(5, columns="a"))
+            self.assert_eq(
+                psdf.nlargest(5, columns=["a", "b"]), pdf.nlargest(5, columns=["a", "b"])
+            )
         self.assert_eq(psdf.nlargest(5, columns=["c"]), pdf.nlargest(5, columns=["c"]))
         self.assert_eq(
             psdf.nlargest(5, columns=["c"], keep="first"),
@@ -1838,10 +1842,12 @@ def test_nsmallest(self):
             index=np.random.rand(7),
         )
         psdf = ps.from_pandas(pdf)
-        self.assert_eq(psdf.nsmallest(n=5, columns="a"), pdf.nsmallest(5, columns="a"))
-        self.assert_eq(
-            psdf.nsmallest(n=5, columns=["a", "b"]), pdf.nsmallest(5, columns=["a", "b"])
-        )
+        # see also: https://github.com/pandas-dev/pandas/issues/46589
+        if not (LooseVersion("1.4.0") <= LooseVersion(pd.__version__) <= LooseVersion("1.4.2")):
+            self.assert_eq(psdf.nsmallest(n=5, columns="a"), pdf.nsmallest(5, columns="a"))
+            self.assert_eq(
+                psdf.nsmallest(n=5, columns=["a", "b"]), pdf.nsmallest(5, columns=["a", "b"])
+            )
         self.assert_eq(psdf.nsmallest(n=5, columns=["c"]), pdf.nsmallest(5, columns=["c"]))
         self.assert_eq(
             psdf.nsmallest(n=5, columns=["c"], keep="first"),
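
Note on the version gate: both hunks skip the NaN-sensitive assertions only when the installed pandas falls in the closed range 1.4.0 to 1.4.2. The patch uses `LooseVersion` from `distutils`, which was deprecated in Python 3.10 and removed in Python 3.12. As a minimal sketch of the same range check using only the standard library, one could compare integer tuples instead. The helper names `release_tuple` and `pandas_has_nan_sort_bug` are hypothetical and not part of this commit, and the parser assumes a plain `major.minor[.patch]` prefix.

```python
import re

# Hypothetical bounds mirroring the range used in the patch (inclusive on both ends).
AFFECTED_FIRST = (1, 4, 0)
AFFECTED_LAST = (1, 4, 2)


def release_tuple(version: str) -> tuple:
    """Parse the leading 'major.minor[.patch]' of a version string into ints.

    Assumption: any suffix such as 'rc0' or '.dev0' is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def pandas_has_nan_sort_bug(version: str) -> bool:
    """Return True when the version lies in the affected 1.4.0 to 1.4.2 range."""
    return AFFECTED_FIRST <= release_tuple(version) <= AFFECTED_LAST
```

In a test, the guard would then read `if not pandas_has_nan_sort_bug(pd.__version__):` in place of the chained `LooseVersion` comparison. Tuple comparison keeps the bounds explicit and avoids the lexicographic pitfalls of comparing raw version strings.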
