Support Schema Field Metadata in User-Defined Aggregate Functions (UDAFs) #17085
Closed
27 commits
1d31dfe Merge branch 'main' into udaf-schema-16997 (kosiew)
66ee0c5 Merge branch 'main' into udaf-schema-16997 (kosiew)
15e89aa feat: Implement SchemaBasedAggregateUdf and enhance accumulator argum… (kosiew)
fea8c1c refactor: Rename test function for schema-based aggregate UDF metadata (kosiew)
c386ba0 docs(aggregate): clarify AccumulatorArgs schema handling and usage (kosiew)
c8678e5 refactor: Extract AccumulatorArgs construction into a separate method… (kosiew)
25df8c6 test: Add unit tests for AggregateUDF implementation and args_schema … (kosiew)
2991acc refactor: Consolidate use statements for improved readability in aggr… (kosiew)
664367b refactor(tests): Simplify argument passing in AggregateExprBuilder tests (kosiew)
20d7a93 docs: Mark code examples as ignored in AggregateUDF and AccumulatorAr… (kosiew)
99824bd Merge branch 'main' into udaf-schema-16997 (kosiew)
8692423 Enhance DummyUdf struct by deriving PartialEq, Eq, and Hash traits (kosiew)
f2a2d51 Enhance SchemaBasedAggregateUdf struct by deriving PartialEq, Eq, and… (kosiew)
f4484be Merge branch 'main' into udaf-schema-16997 (kosiew)
35014e0 Merge branch 'main' into udaf-schema-16997 (kosiew)
9e166ac feat: refactor DummyUdf initialization and enhance args_schema handling (kosiew)
0cd5d33 feat(aggregate): refactor accumulator argument building for clarity (kosiew)
03579a1 refactor(tests): reorganize and enhance DummyUdf implementation and t… (kosiew)
7175121 Merge branch 'main' into udaf-schema-16997 (kosiew)
4dd8bb2 Revert to last good point (kosiew)
fee3fe9 Merge branch 'main' into udaf-schema-16997 (kosiew)
b5a931b Merge branch 'main' into udaf-schema-16997 (kosiew)
f6bb7ec docs(udaf): improve documentation for AccumulatorArgs usage in Aggreg… (kosiew)
f5a36ee refactor(tests): move tests to bottom (kosiew)
07f6fab Merge branch 'main' into udaf-schema-16997 (kosiew)
34b2d83 docs(udaf, accumulator): enhance documentation for AccumulatorArgs an… (kosiew)
c36b3aa Clarify AccumulatorArgs documentation (kosiew)
I'm having some trouble understanding this example; I can understand the part for getting the metadata of a field given the context of the PR, but why do we also include an example for getting the return field?
The snippet is meant to illustrate the sentence immediately above it: you pair `acc_args.exprs` with `acc_args.schema` to recover the full `FieldRef` for argument `i`. Pulling the metadata out of `schema.field(i)` is one common use case, and the follow-up line shows how you would then obtain the complete `FieldRef` (name, type, metadata) via: ... using the same pairing.
I'll tweak the wording.
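The metadata use case described above can be pictured with a minimal, std-only sketch. It does not use DataFusion or Arrow. `Field`, `Schema`, `AccArgs`, and `arg_metadata` are hypothetical stand-ins. The sketch also assumes that argument `i` lines up with field `i` of the schema. Only the schema half of the pairing is modeled here. `ARROW:extension:name` is the standard Arrow metadata key for extension types.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins (not the real arrow_schema / DataFusion types):
// a field with Arrow-style string metadata, a schema, and trimmed-down
// "accumulator args" that only carry the input schema.
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    // Borrowed access to field `i`; panics on an out-of-range index in this sketch.
    fn field(&self, i: usize) -> &Field {
        &self.fields[i]
    }
}

struct AccArgs {
    schema: Schema,
}

/// Hypothetical helper: looks up one metadata entry for argument `i`,
/// assuming argument `i` corresponds to schema field `i`.
fn arg_metadata<'a>(args: &'a AccArgs, i: usize, key: &str) -> Option<&'a str> {
    args.schema.field(i).metadata.get(key).map(String::as_str)
}

fn main() {
    let mut md = HashMap::new();
    md.insert("ARROW:extension:name".to_string(), "arrow.uuid".to_string());
    let args = AccArgs {
        schema: Schema {
            fields: vec![
                Field { name: "id".to_string(), metadata: md },
                Field { name: "value".to_string(), metadata: HashMap::new() },
            ],
        },
    };

    // Inspect each argument's metadata, e.g. to pick accumulator behavior.
    for i in 0..args.schema.fields.len() {
        let name = &args.schema.field(i).name;
        match arg_metadata(&args, i, "ARROW:extension:name") {
            Some(ext) => println!("arg {i} ({name}) has extension type {ext}"),
            None => println!("arg {i} ({name}) has no extension type"),
        }
    }
}
```

Running it prints `arg 0 (id) has extension type arrow.uuid` followed by `arg 1 (value) has no extension type`.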
This may be a silly question, but what's the difference between `acc_args.exprs[i].return_field(&acc_args.schema)` and `acc_args.schema.field(i)`?
Not at all 😄
- `acc_args.schema.field(i)`: returns the raw Arrow `Field` from the (physical) input schema at position `i` (name, type, nullability, and metadata exactly as in that schema).
- `acc_args.exprs[i].return_field(&acc_args.schema)?`: asks the expression for the effective `FieldRef` for argument `i` given the full schema. It incorporates expression semantics (casts, literals, computed types, extension metadata, nullability changes, etc.) and returns an owned, `Arc`-backed `FieldRef` (and can fail), not just a borrowed `&Field`.
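The following std-only Rust sketch makes the distinction concrete. `Field`, `Schema`, `FieldRef`, the `Expr` trait, `Column`, and `Cast` are hypothetical simplified stand-ins, not the real Arrow or DataFusion APIs. The raw schema field keeps the input type. Asking a cast expression for its return field yields the effective type instead, and that lookup can fail. In this sketch the cast copies the input metadata forward. That choice belongs to the sketch and is not a claim about how a real cast expression treats metadata.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, simplified stand-ins for Arrow's DataType/Field/Schema/FieldRef.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    metadata: HashMap<String, String>,
}

type FieldRef = Arc<Field>;

struct Schema {
    fields: Vec<FieldRef>,
}

impl Schema {
    // Raw, borrowed field at position `i`: no expression semantics applied.
    fn field(&self, i: usize) -> &Field {
        &self.fields[i]
    }
}

// Hypothetical expression trait: "what field do you produce over this schema?"
trait Expr {
    fn return_field(&self, schema: &Schema) -> Result<FieldRef, String>;
}

// A column reference forwards the input field unchanged (cheap Arc clone).
struct Column {
    index: usize,
}

impl Expr for Column {
    fn return_field(&self, schema: &Schema) -> Result<FieldRef, String> {
        schema
            .fields
            .get(self.index)
            .cloned()
            .ok_or_else(|| format!("column index {} out of range", self.index))
    }
}

// A cast changes the data type; this sketch keeps name, nullability and metadata.
struct Cast {
    input: Box<dyn Expr>,
    to: DataType,
}

impl Expr for Cast {
    fn return_field(&self, schema: &Schema) -> Result<FieldRef, String> {
        let input = self.input.return_field(schema)?;
        let mut field = (*input).clone();
        field.data_type = self.to.clone();
        Ok(Arc::new(field))
    }
}

fn main() -> Result<(), String> {
    let mut metadata = HashMap::new();
    metadata.insert("unit".to_string(), "ms".to_string());
    let schema = Schema {
        fields: vec![Arc::new(Field {
            name: "latency".to_string(),
            data_type: DataType::Int32,
            nullable: true,
            metadata,
        })],
    };

    // Argument 0 of the aggregate is CAST(latency AS Int64).
    let exprs: Vec<Box<dyn Expr>> = vec![Box::new(Cast {
        input: Box::new(Column { index: 0 }),
        to: DataType::Int64,
    })];

    let raw: &Field = schema.field(0);
    let effective: FieldRef = exprs[0].return_field(&schema)?;

    println!("raw type: {:?}, effective type: {:?}", raw.data_type, effective.data_type);
    println!("metadata preserved: {:?}", effective.metadata.get("unit"));
    Ok(())
}
```

Running it prints `raw type: Int32, effective type: Int64` followed by `metadata preserved: Some("ms")`.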