Remove minhash conditional for 25.02 #558

Open
wants to merge 4 commits into
base: main


praateekmahajan
Collaborator

@praateekmahajan praateekmahajan commented Feb 18, 2025

Description

Now that stable points to 25.02, we can remove the old conditional logic we had for handling the various minhash APIs.

Also fixes an issue with the dask_cudf.read_parquet inconsistent-schema check; we hope rapidsai/cudf#17554 gets merged in 25.04.

Closes #557.

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: Praateek <[email protected]>
Signed-off-by: Praateek <[email protected]>

# TODO: remove when dask min version gets bumped
DASK_SHUFFLE_METHOD_ARG = _dask_version > parse_version("2024.1.0")
DASK_P2P_ERROR = _dask_version < parse_version("2023.10.0")
DASK_SHUFFLE_CAST_DTYPE = _dask_version > parse_version("2023.12.0")
- DASK_CUDF_PARQUET_READ_INCONSISTENT_SCHEMA = _dask_version > parse_version("2024.12")
+ DASK_CUDF_PARQUET_READ_INCONSISTENT_SCHEMA = _dask_cudf_version > parse_version(
Collaborator Author


This was a bug we had. I added the condition for 25.02 in the hope that rapidsai/cudf#17554 gets merged for 25.04.

Signed-off-by: Praateek <[email protected]>

# TODO: remove when dask min version gets bumped
DASK_SHUFFLE_METHOD_ARG = _dask_version > parse_version("2024.1.0")
DASK_P2P_ERROR = _dask_version < parse_version("2023.10.0")
DASK_SHUFFLE_CAST_DTYPE = _dask_version > parse_version("2023.12.0")
- DASK_CUDF_PARQUET_READ_INCONSISTENT_SCHEMA = _dask_version > parse_version("2024.12")
+ DASK_CUDF_PARQUET_READ_INCONSISTENT_SCHEMA = _dask_cudf_version > parse_version(
+     "2025.02"
Collaborator


Suggested change
- "2025.02"
+ "25.2.0"

(from https://pypi.org/project/dask-cudf-cu12/25.2.0/)?
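To illustrate why the version string matters here, the sketch below compares a few RAPIDS-style versions against both spellings. It assumes `parse_version` is `packaging.version.parse` (the import is not shown in the diff excerpt), and `is_newer` is a hypothetical helper used only for this illustration.

```python
from packaging.version import parse as parse_version  # assumed source of parse_version


def is_newer(installed: str, threshold: str) -> bool:
    """Hypothetical helper: True if `installed` is strictly newer than `threshold`."""
    return parse_version(installed) > parse_version(threshold)


# With a four-digit year, the first release segment is 2025, so every
# 25.x release compares as older and the flag would never turn on.
print(is_newer("25.2.0", "2025.02"))  # False
print(is_newer("25.4.0", "2025.02"))  # False

# With the PyPI-style spelling, later releases compare as newer.
print(is_newer("25.4.0", "25.2.0"))   # True

# The comparison is strict, so 25.2.0 itself does not pass.
print(is_newer("25.2.0", "25.2.0"))   # False

# Leading zeros and trailing ".0" segments do not change the parsed version.
print(parse_version("25.02") == parse_version("25.2.0"))  # True
```

Whether the check should be strict (`>`) or inclusive (`>=`) depends on which releases are meant to enable the flag; the sketch only shows how each spelling behaves.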

@ayushdg
Collaborator

ayushdg commented Feb 18, 2025

We should still hold off on merging this, since one mode of installation is the NeMo-FW container, which is currently tied to RAPIDS 24.10 (for 0.7.0). I believe the next release should include RAPIDS 25.02 or newer, but until then, this logic might fail in those containers.

Signed-off-by: Praateek <[email protected]>
Development

Successfully merging this pull request may close these issues.

Re-add test_read_data_different_columns_blocksize
3 participants