Skip to content

Conversation

rchowell
Copy link
Contributor

Changes Made

Strategy is not support in a cross join, so raise with details.

Related Issues

Closes #4944

Checklist

  • Documented in API Docs (if applicable)
  • Documented in User Guide (if applicable)
  • If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

@rchowell rchowell requested a review from kevinzwang August 11, 2025 17:39
@github-actions github-actions bot added the fix label Aug 11, 2025
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR adds input validation to the DataFrame join() method to prevent users from specifying the strategy parameter when performing cross joins. The change adds a simple check at lines 2626-2627 in daft/dataframe/dataframe.py that raises a ValueError with a clear error message: "In a cross join, strategy cannot be set" when the strategy parameter is provided for cross joins.

This validation addresses a user experience issue where attempting to use join strategies (like 'broadcast') with cross joins would previously result in a cryptic error message about missing join columns. Since cross joins don't require join columns by definition (they produce a Cartesian product of all rows), they also don't support join strategies. The fix provides immediate, clear feedback to users about this invalid parameter combination, preventing confusion and improving the developer experience.

The change integrates seamlessly with the existing join method implementation, adding the validation early in the cross join code path before any complex processing occurs.

Confidence score: 5/5

  • This PR is safe to merge with minimal risk as it only adds input validation
  • Score reflects the simple nature of the change and clear error handling improvement
  • No files require special attention - the change is straightforward validation logic

1 file reviewed, no comments

Edit Code Review Bot Settings | Greptile

Copy link

codecov bot commented Aug 11, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 79.27%. Comparing base (00386c9) to head (10cdf9d).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
daft/dataframe/dataframe.py 50.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #4952   +/-   ##
=======================================
  Coverage   79.27%   79.27%           
=======================================
  Files         910      910           
  Lines      126141   126143    +2     
=======================================
+ Hits        99994    99997    +3     
+ Misses      26147    26146    -1     
Files with missing lines Coverage Δ
daft/dataframe/dataframe.py 86.77% <50.00%> (-0.07%) ⬇️

... and 6 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@rchowell rchowell merged commit 22b6456 into main Aug 11, 2025
49 of 50 checks passed
@rchowell rchowell deleted the rchowell/join_strategy_warning branch August 11, 2025 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cross join w/ broadcasted dataframe failing (in Ray)
2 participants