[BUG] udf test failed cudf-py 23.04 ENV setup on databricks 11.3 runtime #7773

Closed
pxLi opened this issue Feb 15, 2023 · 7 comments · Fixed by #8233
Labels
bug Something isn't working cudf_dependency An issue or PR with this label depends on a new feature in cudf

Comments

@pxLi
Member

pxLi commented Feb 15, 2023

Describe the bug
As described in #7639

cudf-py no longer supports Python 3.9 (the default Python in the Databricks 11.3 runtime) in 23.04 or newer; see rapidsai/cudf#12764 (comment).

For now we just hardcode the ENV to Python 3.8 and skip udf_test and udf_cudf_test because the pandas dependency is broken (#7758).

We need to find a better way to re-enable the udf tests, and a way to set up the cudf-py ENV that does not break package dependencies.

Steps/Code to reproduce bug
Please provide a list of steps or a code sample to reproduce the issue.
Avoid posting private or sensitive data.

Expected behavior
Pass all tests

pxLi added the bug, "? - Needs Triage", and cudf_dependency labels and removed the "? - Needs Triage" label on Feb 15, 2023
@tgravescs
Collaborator

So for this we want to change the regular tests back so that they do not change the Python version. We shouldn't be testing with a Python version that isn't the same as the one a user would use on Databricks. Personally I would rather see the tests.sh script go back to installing pytest as in #7753, and just separate out the cudf_udf test to do something special there.

@tgravescs
Collaborator

Filed #7776, since I think this issue is tracking the cudf issue.

@NvTimLiu
Collaborator

Moved the cudf_udf tests out of jenkins/databricks/test.sh into the file jenkins/databricks/cudf_udf_test.sh (the cluster can be created either with or without init_cudf_udf.sh), and hardcoded Python to 3.8 for DB11.3 to work around the cudf-udf conda install.

jenkins/databricks/init_cudf_udf.sh
jenkins/databricks/cudf_udf_test.sh

......

../../src/main/python/udf_cudf_test.py::test_with_column[small data] PASSED [  9%]
../../src/main/python/udf_cudf_test.py::test_with_column[large data] PASSED [ 18%]
../../src/main/python/udf_cudf_test.py::test_sql PASSED (cudf_udf n...) [ 27%]
../../src/main/python/udf_cudf_test.py::test_select PASSED (cudf_ud...) [ 36%]
../../src/main/python/udf_cudf_test.py::test_map_in_pandas PASSED (...) [ 45%]
../../src/main/python/udf_cudf_test.py::test_group_apply PASSED (cu...) [ 54%]
../../src/main/python/udf_cudf_test.py::test_group_apply_in_pandas PASSED [ 63%]
../../src/main/python/udf_cudf_test.py::test_group_agg PASSED (cudf...) [ 72%]
../../src/main/python/udf_cudf_test.py::test_sql_group PASSED (cudf...) [ 81%]
../../src/main/python/udf_cudf_test.py::test_window PASSED (cudf_ud...) [ 90%]
../../src/main/python/udf_cudf_test.py::test_cogroup PASSED (cudf_u...) [100%]
============== 11 PASSED, 17684 deselected, 9 warnings in 16.16s ==============
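A separate cudf_udf script has to pick an interpreter and select only the cudf_udf tests, which would be consistent with the "deselected" count above. The following is a minimal sketch of assembling such a command. The marker name `cudf_udf`, the interpreter name `python3.8`, and the helper `cudf_udf_pytest_command` are assumptions for illustration; the actual contents of cudf_udf_test.sh are not shown in this thread.

```python
def cudf_udf_pytest_command(python="python3.8",
                            marker="cudf_udf",
                            test_root="../../src/main/python"):
    """Build an argv list that runs only tests carrying `marker`.

    All defaults are illustrative assumptions, not values taken from
    jenkins/databricks/cudf_udf_test.sh.
    """
    return [
        python, "-m", "pytest",  # run pytest under the chosen interpreter
        "-v",                    # one line per test, as in the log above
        "-m", marker,            # pytest marker expression; other tests are deselected
        test_root,
    ]


if __name__ == "__main__":
    print(" ".join(cudf_udf_pytest_command()))
```

Keeping the interpreter as a parameter means the regular tests can keep using the runtime's default Python while only this command is pinned.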

@sameerz
Collaborator

sameerz commented Apr 19, 2023

In the 23.06 release, RAPIDS will drop support for python 3.8 and add support for python 3.9. RAPIDS will continue support for python 3.10.

We will need to revert the changes for the Databricks 11.3 tests (since 3.9 will be supported), and make the same test changes for Databricks 10.4 (since 3.8 will be dropped).
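The version reasoning above can be written down as a small lookup. This is only a sketch: the tables restate what this thread says or implies (3.10 support in 23.04 is inferred from "continue support", and the 10.4 default of 3.8 is inferred from the sentence above), and the names `SUPPORTED_PYTHON`, `RUNTIME_DEFAULT_PYTHON`, and `needs_python_override` are hypothetical.

```python
# Python versions supported by each RAPIDS release, per this thread.
# The (3, 10) entry for 23.04 is an inference, not a quoted fact.
SUPPORTED_PYTHON = {
    "23.04": {(3, 8), (3, 10)},
    "23.06": {(3, 9), (3, 10)},
}

# Default Python of each Databricks runtime, as implied by this thread.
RUNTIME_DEFAULT_PYTHON = {
    "10.4": (3, 8),
    "11.3": (3, 9),
}


def needs_python_override(runtime, rapids_release):
    """Return True when the runtime's default Python cannot run cudf-py."""
    default = RUNTIME_DEFAULT_PYTHON[runtime]
    return default not in SUPPORTED_PYTHON[rapids_release]


if __name__ == "__main__":
    for release in ("23.04", "23.06"):
        for runtime in ("10.4", "11.3"):
            print(release, runtime, needs_python_override(runtime, release))
```

Under these assumptions the special handling simply flips between the two runtimes when moving from 23.04 to 23.06.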

@NvTimLiu
Collaborator

NvTimLiu commented Apr 20, 2023

I can help revert the changes that install the Python 3.8 cudf_udf packages on Databricks 11.3 once our update for the non-Databricks environments is done.

On Databricks 11.3, the default Python is 3.8 in batch mode with the Databricks CLI (e.g., when running integration tests on a DB11.3 cluster using the DB CLI), so we need to change the default Python to 3.9.

On Databricks 10.4, we first need to install Python 3.9 and then change the default Python to 3.9.

So we may need to update our Databricks init_scripts to handle the Python version for both Databricks 11.3 and Databricks 10.4.
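The per-runtime handling proposed in this comment could be sketched as a small planning function. The function name `python_setup_steps` and the step strings are hypothetical; they describe the intent stated here, not commands from the real init scripts.

```python
def python_setup_steps(runtime, target="3.9"):
    """List the setup steps this comment proposes for a Databricks runtime.

    The returned strings are descriptive placeholders, not shell commands.
    """
    switch_default = f"set default python to {target}"
    if runtime == "11.3":
        # The target Python is already present; only the default changes.
        return [switch_default]
    if runtime == "10.4":
        # The target Python must be installed before it can become the default.
        return [f"install python {target}", switch_default]
    raise ValueError(f"no plan described for Databricks runtime {runtime!r}")


if __name__ == "__main__":
    for rt in ("11.3", "10.4"):
        print(rt, python_setup_steps(rt))
```

Raising on unknown runtimes keeps an init script from silently applying the wrong plan to a runtime this thread does not cover.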

@pxLi
Member Author

pxLi commented Apr 21, 2023

> In Databricks 10.4, we need first to install python 3.9, then change default python to use 3.9.

@NvTimLiu Per the above comment, we would just skip the cudf udf test on Databricks 10.4 for 23.06.
No need to install an extra Python version.

@NvTimLiu
Collaborator

> > In Databricks 10.4, we need first to install python 3.9, then change default python to use 3.9.
>
> @NvTimLiu Per above comment, we would just skip cudf udf test in databricks 10.4 for 23.06. No need to install extra python version

Got it, thanks!
