[SPARK-41201][CONNECT][PYTHON] Implement DataFrame.SelectExpr in Python client
#38723
So this becomes an unresolved attribute and just works out of the box?
Actually, it becomes Expression.Builder().setExpressionString(), which carries a SQL expression string.
A str can mean different things in the DataFrame API.
This was not exactly the question I had, but I was not correctly remembering the type interface of Project:
def __init__(self, child: Optional["LogicalPlan"], *columns: "ExpressionOrString") -> None:
In this case SQLExpression is an expression, so it just works.
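To make the exchange above concrete, here is a minimal, self-contained sketch of how a selectExpr-style call might wrap each string in a SQL expression object before handing it to a Project-like node. This is not the actual pyspark.sql.connect code: the classes are simplified stand-ins, and `describe`, `select`, and `select_expr` are hypothetical names chosen for illustration. The assumption that a bare string is treated as an unresolved attribute comes from the reviewer's question, not from the implementation itself.

```python
from typing import List, Optional, Union


class Expression:
    """Hypothetical base class standing in for a Connect expression."""

    def describe(self) -> str:
        raise NotImplementedError


class SQLExpression(Expression):
    """Wraps a raw SQL expression string such as "age + 1"."""

    def __init__(self, expr: str) -> None:
        self.expr = expr

    def describe(self) -> str:
        return f"expression_string({self.expr!r})"


ExpressionOrString = Union[Expression, str]


class Project:
    """Simplified stand-in for the Project node whose signature is quoted above."""

    def __init__(self, child: Optional["Project"], *columns: ExpressionOrString) -> None:
        self.child = child
        self.columns = columns

    def describe(self) -> List[str]:
        # Assumption: a bare str is a column name; an Expression describes itself.
        return [
            f"unresolved_attribute({c!r})" if isinstance(c, str) else c.describe()
            for c in self.columns
        ]


def select(*cols: str) -> Project:
    # Strings are passed through unchanged and treated as column names.
    return Project(None, *cols)


def select_expr(*exprs: str) -> Project:
    # Each string is wrapped first, so Project only ever sees Expression objects.
    return Project(None, *[SQLExpression(e) for e in exprs])
```

Because the wrapping happens before Project is constructed, the same `str` input ends up with two different meanings depending on which method the caller used.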
python/pyspark/sql/connect/column.py
Outdated
should we assert here that expr is string and not another expression?
In this implementation we don't need to, because the caller has already verified the type (so mypy does not complain).
I think this is a good question:
Generally speaking, for a public API we should throw a user-facing exception; for an internal API, we can assert when we want to defensively check for unexpected input.
So the question is whether we want to enforce checking across all public and private APIs (each in its corresponding way). I guess maybe not now, but it would be worth doing at the right time (maybe before the 3.4 release).
I think that's fine. One point of having type hints is to avoid asserts on those types too.
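The split discussed in this thread could look roughly like the sketch below: the public entry point raises a user-facing exception, while the internal helper leans on type hints and, at most, a defensive assert. Both function names and the `("sql", ...)` tuples are hypothetical and are not taken from the PR.

```python
from typing import List, Tuple


def _to_sql_expressions(exprs: Tuple[str, ...]) -> List[Tuple[str, str]]:
    """Internal helper: the caller is trusted, so type hints carry the contract."""
    # Optional defensive check; it guards against programming errors, not user input.
    assert all(isinstance(e, str) for e in exprs), "caller must pass strings"
    return [("sql", e) for e in exprs]


def select_expr_checked(*exprs: object) -> List[Tuple[str, str]]:
    """Public entry point: validate input and raise a user-facing error."""
    for e in exprs:
        if not isinstance(e, str):
            raise TypeError(
                f"selectExpr expects str arguments, got {type(e).__name__}"
            )
    return _to_sql_expressions(tuple(exprs))
```

Asserts can be disabled with `python -O`, which is one reason they suit internal invariants better than validation of user input.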
grundprinzip
left a comment
Just one question.
Can one of the admins verify this patch?
python/pyspark/sql/connect/column.py
Outdated
I still don't like this kind of naming, but it is at least somewhat consistent with what we have in DSv2, so I am fine.
Let's see in the future... I guess we will need to name more...
comment with a JIRA ID?
done.
Didn't we fix this to grpc.RpcError?
Hmm, I guess that was lost during conflict resolution, so the good fix is gone.
3df9929 to 7802dcc
Merged to master.
        .toPandas(),
    )

    @unittest.skip("test_fill_na is flaky")
@amaliujia why did you disable this test again?
Ah, I didn't notice this. Can we re-enable it?
I think so, will send a followup for it
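For readers unfamiliar with the decorator in the diff above, this small sketch shows why a stray `@unittest.skip` is easy to miss: the suite still passes, and the test is only reported as skipped. `FillNaTests` and `run_suite` are hypothetical names; only the skip reason string is taken from the diff.

```python
import unittest


class FillNaTests(unittest.TestCase):
    @unittest.skip("test_fill_na is flaky")
    def test_fill_na(self) -> None:
        # Never executed while the decorator is present.
        self.fail("this body does not run")

    def test_other(self) -> None:
        self.assertEqual(1 + 1, 2)


def run_suite() -> unittest.TestResult:
    """Run the test case and return the collected result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FillNaTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Inspecting `result.skipped` (a list of `(test, reason)` pairs) is one way to spot tests that were disabled unintentionally.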
I am pretty sure I removed this after conflict resolution.
Actually, Martin pointed out another case: #38723 (comment)
Basically, it seems to have happened more than once that, after resolving a code conflict, the code I wanted to keep was gone.
Maybe I should always squash my commits with an interactive rebase (-i), in case rebasing more than one commit causes unexpected results.
I will follow up on this soon.
I am really just guessing here: if I have more than one commit locally and I resolve the conflict in the first one, a following commit might silently add something back.
No worries, it is added back in #38763
### What changes were proposed in this pull request?
Reenable test_fill_na

### Why are the changes needed?
`test_fill_na` was disabled by mistake in #38723

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
reenabled test

Closes #38763 from zhengruifeng/connect_reenable_test_fillna.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
…thon client

### What changes were proposed in this pull request?
Implement `DataFrame.SelectExpr` in Python client. `SelectExpr` also has a good amount of usage.

### Why are the changes needed?
API coverage.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
UT

Closes apache#38723 from amaliujia/support_select_expr_in_python.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
What changes were proposed in this pull request?
Implement `DataFrame.SelectExpr` in Python client. `SelectExpr` also has a good amount of usage.
Why are the changes needed?
API coverage.
Does this PR introduce any user-facing change?
NO
How was this patch tested?
UT