Skip to content

Conversation

@amaliujia
Copy link
Contributor

What changes were proposed in this pull request?

Implement DataFrame.SelectExpr in Python client. SelectExpr also has a good amount of usage.

Why are the changes needed?

API coverage.

Does this PR introduce any user-facing change?

NO

How was this patch tested?

UT

@amaliujia
Copy link
Contributor Author

@zhengruifeng @HyukjinKwon

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this becomes an unresolved attribute and just works out of the box?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it becomes Expression.Builder().setExpressionString() which are SQL expression strings.

str could be different things in DataFrame API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not necessarily the question that I had, but I was not remembering correctly the type interface to Project:

def __init__(self, child: Optional["LogicalPlan"], *columns: "ExpressionOrString") -> None:

In this case SQLExpression is an expression and it just works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we assert here that expr is string and not another expression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this implementation, we don't need to because the caller has verified the type (thus mypy does not complain).

I think this is a good question:
Generally speaking, I think for public API, we should throw user-facing exception, for internal API, we can assert when we want to defensive check unexpected input.

So it is a question of if we want to enforce checking cross all public/private API (by corresponding ways). I guess maybe not now but worth it at a right time (maybe before 3.4 release).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's fine. One point of having type hints is to avoid asserts on those types too.

Copy link
Contributor

@grundprinzip grundprinzip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one question.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't like kind of naming .. but this is at least somewhat consistent with what we have in DSv2 so I am fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see in the future... I guess we will need to name more...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment with a JIRA ID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we fix this to grpc.RPCError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm I guess that was gone during code conflict and resolution then good fix is gone.

@amaliujia amaliujia force-pushed the support_select_expr_in_python branch from 3df9929 to 7802dcc Compare November 23, 2022 00:24
@HyukjinKwon
Copy link
Member

Merged to master.

.toPandas(),
)

@unittest.skip("test_fill_na is flaky")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amaliujia why you disable this test again?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah .. I didn't notice this. Can we enable this back?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, will send a followup for it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure I removed this after conflict resolution.

Actually Martin pointed out another case: #38723 (comment)

Basically it seems happened more than once that after code conflict resolution, the code I want to keep is gone.|

Maybe I should always do a -i commits square to in case more than 1 commit rebase causing unexpected result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will follow up this soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am really guessing if I have more than 1 commit locally, if the first one I resolve the conflict, the following commit that might add something back silently.....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no worries, add it back in #38763

zhengruifeng added a commit that referenced this pull request Nov 23, 2022
### What changes were proposed in this pull request?
Reenable test_fill_na

### Why are the changes needed?
`test_fill_na` was disabled by mistake in #38723

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
reenabled test

Closes #38763 from zhengruifeng/connect_reenable_test_fillna.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…thon client

### What changes were proposed in this pull request?

Implement `DataFrame.SelectExpr` in Python client. `SelectExpr` also has a good amount of usage.

### Why are the changes needed?

API coverage.

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

UT

Closes apache#38723 from amaliujia/support_select_expr_in_python.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
…thon client

### What changes were proposed in this pull request?

Implement `DataFrame.SelectExpr` in Python client. `SelectExpr` also has a good amount of usage.

### Why are the changes needed?

API coverage.

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

UT

Closes apache#38723 from amaliujia/support_select_expr_in_python.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
### What changes were proposed in this pull request?
Reenable test_fill_na

### Why are the changes needed?
`test_fill_na` was disabled by mistake in apache#38723

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
reenabled test

Closes apache#38763 from zhengruifeng/connect_reenable_test_fillna.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
…thon client

### What changes were proposed in this pull request?

Implement `DataFrame.SelectExpr` in Python client. `SelectExpr` also has a good amount of usage.

### Why are the changes needed?

API coverage.

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

UT

Closes apache#38723 from amaliujia/support_select_expr_in_python.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
### What changes were proposed in this pull request?
Reenable test_fill_na

### Why are the changes needed?
`test_fill_na` was disabled by mistake in apache#38723

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
reenabled test

Closes apache#38763 from zhengruifeng/connect_reenable_test_fillna.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants