Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: calling with_column twice and using a window function for the 2nd call returns an error #12428

Closed
wants to merge 5 commits into from

Conversation

Michael-J-Ward
Copy link
Contributor

DRAFT

My first attempt was to convert with_column to a sugar API for df.select(vec![wildcard(), expr.alias(name)]), which mostly matches my mental model of what the function should do.

There are currently failing test cases surrounding ambiguous and/or "over-write an existing column" use cases. I suspect proper use of WildcardOptions will let me resolve those.

Which issue does this PR close?

Closes #12425.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

My mental model for `with_column` is sugar for `df.select([wildcard(), expr.alias(name)])`.

The bug test-case passes with this implementation, but a few other test cases fail.
Those test cases resolve around ambiguity in column names, which I *think* I can solve
by using `WildcardOptions`.
@github-actions github-actions bot added the core Core DataFusion crate label Sep 11, 2024
@Michael-J-Ward Michael-J-Ward changed the title With column bug fix: calling with_column twice and using a window function for the 2nd call returns an error Sep 11, 2024
@timsaucer
Copy link
Contributor

Alternative approach: #12431

The sugared version uses a wildcard to express the previous columns
The ambiguous projection now does not occur until the `physical` stage.

I assume there's a way to expand the wildcard and catch it during the `logical` stage, but I haven't found it yet.
…ing wildcardoptions

There's still a TODO because I'm not sure **how** to construct the WildcardOptions,
but this captures the semantic behavior.
@comphead
Copy link
Contributor

is this PR similar to #12431 ?

@Michael-J-Ward
Copy link
Contributor Author

@comphead yes. I started this draft late last night but hit a block, then Tim put up his working implementation this morning.

Purely for the benefit of my own understanding, could you take a look at my proposed impl and let me know if that captures the intended semantic behavior of the function?

I'm still building my understanding of the datafusion internals.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Calling with_column twice generates an error when the second call uses a window function
3 participants