feat: `.list.append` Expression #5159
Greptile Summary
This PR adds a new `.list.append` expression to Daft that appends individual values to list columns. The implementation exposes a Python API through the `append` method on list expressions, backed by a Rust implementation.
The change introduces several key components:
- Python API: A new `append` method in `daft/expressions/expressions.py` that follows the established pattern of list operations, taking another expression as input and returning a new expression with appended values.
- Rust Implementation: The core functionality is implemented in `src/daft-functions-list/src/append.rs` as a scalar UDF that handles type validation and delegates to series extension methods. The function supports broadcasting single values across multiple rows and ensures type compatibility between the list elements and appended values.
- Series Extension: The `list_append` method in `src/daft-functions-list/src/series.rs` handles the actual data processing, supporting both regular `List` and `FixedSizeList` types by converting `FixedSizeList` to `List` internally and using growable arrays for efficient memory management.
- Function Registration: The new function is registered in the function registry through `src/daft-functions-list/src/lib.rs`.
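The row-wise behavior described in the list above (one value appended per row, with a single value broadcast across all rows) can be sketched in plain Python. This is a minimal model, not the PR's Rust code: the name `list_append_model` is hypothetical, and propagating a null list row as null is an assumption, since the summary does not state how null rows are handled.

```python
from typing import Any, List, Optional, Sequence


def list_append_model(
    lists: Sequence[Optional[list]], values: Sequence[Any]
) -> List[Optional[list]]:
    """Pure-Python model of appending one value per row to a list column.

    `list_append_model` is a hypothetical name used for illustration only.
    """
    n = len(lists)
    if len(values) == 1:
        # Broadcast a single value across every row.
        values = list(values) * n
    elif len(values) != n:
        raise ValueError(
            f"expected 1 or {n} values to append, got {len(values)}"
        )

    out: List[Optional[list]] = []
    for row, value in zip(lists, values):
        if row is None:
            # Assumption: a null list row stays null.
            out.append(None)
        else:
            # Build a new list so the input column is left untouched.
            out.append(row + [value])
    return out
```

For example, `list_append_model([[1, 2], [3], []], [9])` broadcasts `9` to every row, while passing one value per row appends them pairwise.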
The implementation follows Daft's established architectural patterns for list operations, staying consistent with existing functionality such as `list_get` and `list_slice`. The feature supports null handling and broadcasting of scalar values, and includes test coverage for both variable-size and fixed-size lists. According to the PR description, this functionality is needed for implementing a "100% native connected components algorithm", which suggests its use in graph processing, where lists of connected nodes are built up dynamically.
Confidence score: 4/5
- This PR is safe to merge with minimal risk as it adds new functionality without modifying existing behavior
- Score reflects solid implementation following established patterns, though some edge cases in type handling and null processing could benefit from additional validation
- Pay close attention to `src/daft-functions-list/src/series.rs` for potential memory efficiency concerns with large lists
6 files reviewed, no comments
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #5159 +/- ##
==========================================
- Coverage 74.26% 72.79% -1.48%
==========================================
Files 956 957 +1
Lines 123101 125050 +1949
==========================================
- Hits 91424 91030 -394
- Misses 31677 34020 +2343
@srilman thoughts on adding this outside of the list namespace as just |
@kevinzwang Does |
Changes Made
Add `.list.append` expression. Useful for implementing a 100% native connected components algorithm.