Enable rapidsmpf spilling in cudf-polars #18461
rapids-bot[bot] merged 19 commits into rapidsai:branch-25.06
        return obj


    def unwrap_dataframe(obj: T) -> DataFrame | T:
This could be `def unwrap_dataframe(obj: T | SpillableWrapper[DataFrame]) -> T | DataFrame: ...`, but note the overload example too.
    T = TypeVar("T")


    def wrap_dataframe(obj: T) -> WrappedType | T:
`WrappedType` is just an unbound `TypeVar`, so I don't think this type signature makes sense. I think you mean `T | SpillableWrapper[T]`.
    def wrap_func_spillable(
        func: Callable[..., T], *, make_func_output_spillable: bool
    ) -> Callable[..., T]:
This type annotation is not correct if `make_func_output_spillable` is `True`, in which case you (perhaps) get a `SpillableWrapper[T]` as the return value.
I think you want something like:
    from typing import Any, Generic, Literal, Protocol, TypeVar, overload

    T = TypeVar("T")


    class DataFrame:
        ...


    class Spillable(Generic[T]):
        def __init__(self, obj: T):
            self.obj = obj

        def unwrap(self) -> T:
            return self.obj


    @overload
    def unwrap(obj: Spillable[T]) -> T: ...
    @overload
    def unwrap(obj: T) -> T: ...
    def unwrap(obj):
        # Strip the wrapper if present; pass anything else through unchanged.
        if isinstance(obj, Spillable):
            return obj.unwrap()
        return obj


    # PEP 695 generic class syntax; requires Python 3.12+.
    class Function[T](Protocol):
        def __call__(self, *args: Any) -> T: ...


    @overload
    def wrap(obj: DataFrame) -> Spillable[DataFrame]: ...
    @overload
    def wrap(obj: T) -> T: ...
    def wrap(obj):
        # Only DataFrame objects are wrapped; other values are returned as-is.
        if isinstance(obj, DataFrame):
            return Spillable(obj)
        return obj


    @overload
    def wrap_func(
        func: Function[DataFrame], *, make_spillable: Literal[True]
    ) -> Function[Spillable[DataFrame]]: ...
    @overload
    def wrap_func(func: Function[T], *, make_spillable: bool) -> Function[T]: ...
    def wrap_func(func, *, make_spillable: bool):
        def wrapper(*args: Any):
            # Unwrap every argument before calling the underlying function.
            result = func(*map(unwrap, args))
            if make_spillable:
                return wrap(result)
            return result

        return wrapper
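To see what the suggestion above does at runtime, here is a minimal, self-contained variant. It drops the overloads, which only matter to the type checker, and avoids the Python 3.12 syntax. The `DataFrame` stand-in with an `nrows` attribute and the `concat` helper are hypothetical and exist only for illustration; they are not the cudf-polars classes.

```python
from typing import Any, Callable, Generic, TypeVar

T = TypeVar("T")


class DataFrame:
    """Hypothetical stand-in for the real DataFrame type."""

    def __init__(self, nrows: int) -> None:
        self.nrows = nrows


class Spillable(Generic[T]):
    def __init__(self, obj: T) -> None:
        self.obj = obj

    def unwrap(self) -> T:
        return self.obj


def unwrap(obj: Any) -> Any:
    # Strip the wrapper if present; pass anything else through unchanged.
    return obj.unwrap() if isinstance(obj, Spillable) else obj


def wrap(obj: Any) -> Any:
    # Only DataFrame results are wrapped.
    return Spillable(obj) if isinstance(obj, DataFrame) else obj


def wrap_func(func: Callable[..., Any], *, make_spillable: bool) -> Callable[..., Any]:
    def wrapper(*args: Any) -> Any:
        result = func(*map(unwrap, args))
        return wrap(result) if make_spillable else result

    return wrapper


def concat(a: DataFrame, b: DataFrame) -> DataFrame:
    """Hypothetical task that combines two frames."""
    return DataFrame(a.nrows + b.nrows)


# Inputs may be wrapped or not; the wrapper unwraps them either way.
spillable_out = wrap_func(concat, make_spillable=True)(Spillable(DataFrame(2)), DataFrame(3))
plain_out = wrap_func(concat, make_spillable=False)(Spillable(DataFrame(2)), DataFrame(3))
```

With `make_spillable=True` the result comes back as a `Spillable`; with `make_spillable=False` it is the bare `DataFrame`. That difference is exactly why a single `Callable[..., T]` return annotation cannot describe both cases.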
pentschev left a comment
Functionally this looks correct to me. AFAICT all of @wence-'s suggested typing changes were applied and they look OK to me, though I'm not entirely confident in my assessment. I left one minor suggestion. I also noticed that the failure is due to coverage being lower than 100%, so that probably needs to be addressed, but in the interest of moving quickly I'm approving this. Thanks @madsbk @rjzamora.
@pentschev - Thank you for the review! While testing this with an older Python version, I realized that the typing changes require Python 3.12. My personal opinion is that the typing information is so convoluted that it barely has any value at this point. My preference is to generalize the typing in this PR. I welcome others with more Mypy-fu to take on the typing, either in this PR or in a follow-up, if they want.
I'm not well-versed in mypy either, so I won't pretend I'll be of much help here. However, since we're trying to move fast, I'd support simplifying this to reduce pressure and increase development velocity, and opening an issue to revisit typing in a few weeks.
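For reference, the Python 3.12 requirement presumably comes from the PEP 695 spelling `class Function[T](Protocol)` in the earlier suggestion; that is an assumption, since the PR's final code is not shown here. A sketch of the same protocol in the older spelling, which works on earlier Python versions, looks like this (the `apply` helper is hypothetical and only there to exercise the protocol):

```python
from typing import Any, Protocol, TypeVar

T = TypeVar("T")
# A protocol that only returns T should declare the type variable covariant.
T_co = TypeVar("T_co", covariant=True)


class Function(Protocol[T_co]):
    """Pre-3.12 equivalent of `class Function[T](Protocol)`."""

    def __call__(self, *args: Any) -> T_co: ...


def apply(func: Function[T], *args: Any) -> T:
    # Hypothetical helper: the return type is inferred from the callable.
    return func(*args)


length = apply(len, [1, 2, 3])
```

This keeps the typing information while dropping the version requirement, at the cost of one extra `TypeVar` declaration.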
There was missing coverage when registering rapidsmpf serialization because rapidsmpf is not yet included as a dependency and thus fails to import. I disabled coverage in 03eaf84. An alternative is to add rapidsmpf as a dependency for the tests, which is IMO a better solution, but I'm not sure whether there's a reason this has not been done yet.
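The contents of commit 03eaf84 are not shown here, so the following is only a sketch of one common way to handle this situation: guard the optional import and mark the lines that can only run when the dependency is installed with coverage.py's `# pragma: no cover` comment. The `register_if_available` helper and its `register` callback are hypothetical names, not part of cudf-polars or rapidsmpf.

```python
import importlib
from types import ModuleType
from typing import Callable


def register_if_available(
    module_name: str, register: Callable[[ModuleType], None]
) -> bool:
    """Call `register(module)` only if `module_name` can be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Optional dependency is missing: skip registration.
        return False
    # These lines only execute when the optional dependency is installed,
    # so a test environment without it would exclude them from coverage.
    register(module)  # pragma: no cover
    return True  # pragma: no cover


seen: list = []
found = register_if_available("json", seen.append)
missing = register_if_available("surely_not_an_installed_module_xyz", seen.append)
```

Adding the dependency to the test environment, as suggested above, would make the pragma unnecessary because both branches could then be exercised.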
/merge
This reverts commit 2114099.
Wrap `cudf_polars.DataFrame` in a container that enables spilling using rapidsmpf.
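As a rough illustration of what such a container does, the toy class below keeps its payload either live or in a serialized form and restores it on access. It is a sketch under loud assumptions: `ToySpillable` is a hypothetical name, and `pickle` stands in for whatever device-to-host transfer the real implementation performs. It is not the rapidsmpf API.

```python
import pickle
from typing import Generic, Optional, TypeVar, cast

T = TypeVar("T")


class ToySpillable(Generic[T]):
    """Hypothetical container; NOT the rapidsmpf implementation."""

    def __init__(self, obj: T) -> None:
        self._obj: Optional[T] = obj
        self._spilled: Optional[bytes] = None

    @property
    def is_spilled(self) -> bool:
        return self._spilled is not None

    def spill(self) -> int:
        """Serialize the payload and drop the live object.

        Returns the size in bytes of the serialized payload.
        """
        if self._spilled is None:
            self._spilled = pickle.dumps(self._obj)
            self._obj = None
        return len(self._spilled)

    def unwrap(self) -> T:
        """Return the live payload, restoring it first if it was spilled."""
        if self._spilled is not None:
            self._obj = pickle.loads(self._spilled)
            self._spilled = None
        return cast(T, self._obj)


box = ToySpillable({"a": [1, 2, 3]})
nbytes = box.spill()
was_spilled = box.is_spilled
restored = box.unwrap()
```

Callers only ever go through `unwrap()`, so they never need to know whether the payload was spilled in the meantime.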