[SPARK-41212][CONNECT][PYTHON] Implement DataFrame.isEmpty
#38734
Conversation
I think we talked about not doing the caching now, but thinking about building a general caching layer cc @HyukjinKwon
here #38546 (comment)
I am fine with that, but we could even do the caching in the DataFrame instead of in Spark Connect, I guess (?)
Oh, let me update it. Maybe it is time to design how to do the caching; I will work on it.
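To illustrate the "caching in the DataFrame" idea being discussed, here is a minimal sketch. The class, attribute, and method names are hypothetical and do not reflect Spark Connect's actual design:

```python
# Hypothetical sketch only: names are illustrative, not Spark Connect's actual design.
from typing import Any, Optional


class ConnectDataFrame:
    def __init__(self, plan: Any) -> None:
        self._plan = plan
        self._cached_schema: Optional[Any] = None  # reset whenever the plan changes

    @property
    def schema(self) -> Any:
        # Fetch the schema from the server once, then reuse the cached value
        # so repeated accesses avoid extra round trips.
        if self._cached_schema is None:
            self._cached_schema = self._fetch_schema_from_server()
        return self._cached_schema

    def _fetch_schema_from_server(self) -> Any:
        ...  # placeholder for the actual RPC call
```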
Nit:
When I read this, I'm not sure whether it means the size of the DataFrame or that the DataFrame is None. I guess it is the size. Not sure if there is a better way to clarify this.
This doc was copied from the PySpark and Scala APIs. I think the DataFrame itself is expected to be non-None; otherwise, this method cannot be called.
LGTM
Force-pushed from e6769ba to 34accba.
bool
    Whether it's empty DataFrame or not.
"""
return len(self.take(1)) == 0
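For context, the behavior this enables from the user side, with the same semantics as the existing PySpark API. This usage example assumes an active Spark (Connect) session bound to `spark`:

```python
# Assumes an active SparkSession (or Spark Connect session) named `spark`.
df = spark.createDataFrame([(1, "a")], ["id", "value"])
print(df.isEmpty())                    # False
print(df.filter(df.id > 1).isEmpty())  # True -- only take(1) runs, no full count
```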
This makes me think that maybe we should have some shared code between PySpark and the Python Connect client, to share API definitions (API docs, function signatures) and functions with default implementations. cc @HyukjinKwon
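To make that suggestion concrete, one possible shape is a shared base class that both clients inherit, so the docstring, signature, and default implementation live in one place. This is purely an illustrative sketch; the class name and structure are assumptions, not an agreed design:

```python
# Illustrative only -- nothing here reflects an actual shared module in Spark.
from typing import Any, List


class SharedDataFrameAPI:
    """API surface shared by the classic PySpark DataFrame and the Connect client."""

    def take(self, num: int) -> List[Any]:
        raise NotImplementedError  # each client provides its own execution path

    def isEmpty(self) -> bool:
        """Returns True if this DataFrame is empty."""
        # Default implementation reused by both clients; only `take` differs.
        return len(self.take(1)) == 0
```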
Merged into master, thanks all for the reviews.
What changes were proposed in this pull request?
Implement `DataFrame.isEmpty`

Why are the changes needed?
API coverage

Does this PR introduce any user-facing change?
Yes, new API

How was this patch tested?
Added a unit test
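The added unit test itself is not shown on this page; a test along the following lines would exercise the new method (the fixture name and sample data are assumptions, not the PR's actual test):

```python
# Sketch only; the test actually added by the PR may be structured differently.
def test_is_empty(spark):
    df = spark.createDataFrame([(1, "a")], ["id", "value"])
    assert not df.isEmpty()
    assert df.filter("id > 1").isEmpty()
```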