Skip to content

Conversation

srilman
Copy link
Contributor

@srilman srilman commented Sep 8, 2025

Changes Made

Add .list.append expression. Useful for implementing a 100% native connected components algorithm.

@github-actions github-actions bot added the feat label Sep 8, 2025
@srilman srilman changed the title feat: .list.append expression feat: .list.append Expression Sep 8, 2025
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR adds a new .list.append expression to Daft that allows appending individual values to list columns. The implementation provides a clean Python API through the append method on list expressions, with comprehensive Rust backend support.

The change introduces several key components:

  1. Python API: A new append method in daft/expressions/expressions.py that follows the established pattern of list operations, taking another expression as input and returning a new expression with appended values.

  2. Rust Implementation: The core functionality is implemented in src/daft-functions-list/src/append.rs as a scalar UDF that handles type validation and delegates to series extension methods. The function supports broadcasting single values across multiple rows and ensures type compatibility between the list elements and appended values.

  3. Series Extension: The list_append method in src/daft-functions-list/src/series.rs handles the actual data processing, supporting both regular List and FixedSizeList types by converting FixedSizeList to List internally and using growable arrays for efficient memory management.

  4. Function Registration: The new function is properly registered in the function registry through src/daft-functions-list/src/lib.rs.

The implementation follows Daft's established architectural patterns for list operations, maintaining consistency with existing functionality like list_get and list_slice. The feature supports null handling, broadcasting of scalar values, and includes comprehensive test coverage for both variable-size and fixed-size lists. According to the PR description, this functionality is specifically needed for implementing a "100% native connected components algorithm", indicating its importance for graph processing operations where lists of connected nodes need to be dynamically built.

Confidence score: 4/5

  • This PR is safe to merge with minimal risk as it adds new functionality without modifying existing behavior
  • Score reflects solid implementation following established patterns, though some edge cases in type handling and null processing could benefit from additional validation
  • Pay close attention to src/daft-functions-list/src/series.rs for potential memory efficiency concerns with large lists

6 files reviewed, no comments

Edit Code Review Bot Settings | Greptile

Copy link

codecov bot commented Sep 8, 2025

Codecov Report

❌ Patch coverage is 88.88889% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.79%. Comparing base (1f59628) to head (1c9f885).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-functions-list/src/append.rs 77.14% 8 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5159      +/-   ##
==========================================
- Coverage   74.26%   72.79%   -1.48%     
==========================================
  Files         956      957       +1     
  Lines      123101   125050    +1949     
==========================================
- Hits        91424    91030     -394     
- Misses      31677    34020    +2343     
Files with missing lines Coverage Δ
daft/expressions/expressions.py 96.56% <100.00%> (+<0.01%) ⬆️
src/daft-functions-list/src/kernels.rs 95.46% <ø> (ø)
src/daft-functions-list/src/lib.rs 100.00% <100.00%> (ø)
src/daft-functions-list/src/series.rs 70.58% <100.00%> (+4.37%) ⬆️
src/daft-functions-list/src/append.rs 77.14% <77.14%> (ø)

... and 40 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@kevinzwang
Copy link
Member

@srilman thoughts on adding this outside of the list namespace as just Expression.append? I'm working on moving away from the expressions namespacing and append is probably clear enough as a function name.

@srilman
Copy link
Contributor Author

srilman commented Sep 8, 2025

@kevinzwang Does .append apply to anything else other then lists? Normally if its specific to a datatype I'd prefer to include the datatypes name. Plus, its called list_append in DuckDB and other SQL

@srilman srilman requested a review from kevinzwang September 8, 2025 19:38
@kevinzwang
Copy link
Member

kevinzwang commented Sep 8, 2025

@kevinzwang Does .append apply to anything else other then lists? Normally if its specific to a datatype I'd prefer to include the datatypes name. Plus, its called list_append in DuckDB and other SQL

list_append works too. I'm happy with either, append is just a method in Python lists so it might be more familiar to Python users, but it does seem like somewhat of a convention in SQL so I don't mind either way

@srilman srilman enabled auto-merge (squash) September 8, 2025 22:23
@srilman srilman merged commit feffe55 into main Sep 8, 2025
39 checks passed
@srilman srilman deleted the slade/list-append branch September 8, 2025 23:08
kevinzwang pushed a commit that referenced this pull request Sep 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants