Skip to content

Enable rapidsmpf spilling in cudf-polars#18461

Merged
rapids-bot[bot] merged 19 commits intorapidsai:branch-25.06from
madsbk:rapidsmp_spilling
May 6, 2025
Merged

Enable rapidsmpf spilling in cudf-polars#18461
rapids-bot[bot] merged 19 commits intorapidsai:branch-25.06from
madsbk:rapidsmp_spilling

Conversation

@madsbk
Copy link
Member

@madsbk madsbk commented Apr 9, 2025

Wrap cudf_polars.DataFrame in a container that enables spilling using rapidsmp.

@madsbk madsbk added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 9, 2025
@madsbk madsbk self-assigned this Apr 9, 2025
@copy-pr-bot
Copy link

copy-pr-bot bot commented Apr 9, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels Apr 9, 2025
return obj


def unwrap_dataframe(obj: T) -> DataFrame | T:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be def unwrap_dataframe(obj: T | SpillableWrapper[DataFrame]) -> T | DataFrame: ... but note overload example too.

T = TypeVar("T")


def wrap_dataframe(obj: T) -> WrappedType | T:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WrappedType is just an unbound TypeVar, so I don't think this type signature makes sense. I think you mean T | SpillableWrapper[T].

Comment on lines 58 to 60
def wrap_func_spillable(
func: Callable[..., T], *, make_func_output_spillable: bool
) -> Callable[..., T]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type annotation is not correct if make_func_output_spillable is True (in which case you get (perhaps) a SpillableWrapper[T] as a return value.

I think you want something like:

from typing import Any, Generic, Literal, Protocol, TypeVar, overload

T = TypeVar("T")


class DataFrame:
    ...


class Spillable(Generic[T]):
    def __init__(self, obj: T):
        self.obj = obj

    def unwrap(self) -> T:
        return self.obj


@overload
def unwrap(obj: Spillable[T]) -> T: ...


@overload
def unwrap(obj: T) -> T: ...


def unwrap(obj):
    if isinstance(obj, Spillable):
        return obj.unwrap()
    return obj


class Function[T](Protocol):
    def __call__(self, *args: Any) -> T: ...


@overload
def wrap(obj: DataFrame) -> Spillable[DataFrame]: ...


@overload
def wrap(obj: T) -> T: ...


def wrap(obj):
    if isinstance(obj, DataFrame):
        return Spillable(obj)
    return obj


@overload
def wrap_func(
    func: Function[DataFrame], *, make_spillable: Literal[True]
) -> Function[Spillable[DataFrame]]: ...


@overload
def wrap_func(func: Function[T], *, make_spillable: bool) -> Function[T]: ...


def wrap_func(func, *, make_spillable: bool):
    def wrapper(*args: Any):
        result = func(*map(unwrap, args))
        if make_spillable:
            return wrap(result)
        return result

    return wrapper

@rjzamora rjzamora changed the title Dask-experimental: enable spilling Enable rapidsmpf spilling in cudf-polars Apr 16, 2025
@rjzamora rjzamora marked this pull request as ready for review May 5, 2025 14:30
@rjzamora rjzamora requested a review from a team as a code owner May 5, 2025 14:30
@rjzamora rjzamora requested review from rjzamora and vyasr May 5, 2025 14:30
Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally this looks correct to me, and AFAICT all @wence- 's typing change suggestions were applied and they look ok to me but I'm not entirely confident in my assessment. I left one minor suggestion, and I noticed the failure is due to coverage being lower than 100% so that probably needs to be addressed, but in the interest of allowing moving quickly I'm approving this. Thanks @madsbk @rjzamora .

@rjzamora
Copy link
Member

rjzamora commented May 5, 2025

@pentschev - Thank you for the review! While testing this with an older python version, I realized that the typing changes require Python 3.12.

My personal opinion is that the typing information is so convoluted, that it barely has any value at this point. My preference is to generalize the typing in this PR. I welcome others with more Mypy-fu skills to take on the typing either in this PR or a follow-up if they want.

@pentschev
Copy link
Member

My personal opinion is that the typing information is so convoluted, that it barely has any value at this point. My preference is to generalize the typing in this PR. I welcome others with more Mypy-fu skills to take on the typing either in this PR or a follow-up if they want.

I'm not well-versed in mypy either, so I won't pretend I'll be of much help in this aspect. However, since we're trying to move fast I'd support simplifying this to reduce pressure and increase development velocity and open an issue to revisit typing in a few weeks.

@rjzamora rjzamora requested a review from a team as a code owner May 5, 2025 20:21
@pentschev
Copy link
Member

There was missing coverage when registering rapidsmpf serialization because rapidsmpf is not yet included as a dependency and thus fails to import. I disabled coverage in 03eaf84, an alternative is to add rapidsmpf as a dependency for the tests which is IMO a better solution, but I'm not sure whether there's a reason this was not done yet.

@rjzamora
Copy link
Member

rjzamora commented May 6, 2025

/merge

@rapids-bot rapids-bot bot merged commit 2114099 into rapidsai:branch-25.06 May 6, 2025
109 checks passed
vyasr added a commit to vyasr/cudf that referenced this pull request May 6, 2025
@madsbk madsbk deleted the rapidsmp_spilling branch May 16, 2025 11:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cudf-polars Issues specific to cudf-polars improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

5 participants