-
Notifications
You must be signed in to change notification settings - Fork 1.2k
[core][spark] Support push down transform predicate #6506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8b37699 to
fe5b4e9
Compare
fe5b4e9 to
7b13471
Compare
| /** Represents a transform function. */ | ||
| public interface Transform extends Serializable { | ||
|
|
||
| List<Object> inputs(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can introduce a Literal object in the future, so that our inputs will only be of either Literal or FieldRef type now.
|
|
||
| /** Leaf node of a {@link Predicate} tree. Compares a field in the row with literals. */ | ||
| public class LeafPredicate implements Predicate { | ||
| public class LeafPredicate extends TransformPredicate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since LeafPredicate extends TransformPredicate, and we have
T visit(LeafPredicate predicate);
T visit(TransformPredicate predicate);
Maybe we can combine them in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for pushing down CONCAT expressions to Paimon for Spark 3.4+. The implementation introduces a transform-based predicate system that can handle not only simple field references but also general expressions like CONCAT.
Key Changes:
- Introduced a new
Transforminterface hierarchy to represent field transformations and expressions - Refactored predicate handling to work with
Transformobjects instead of just field indices - Added support for CONCAT expression pushdown in Spark filters
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
SparkExpressionConverter.scala |
New utility to convert Spark expressions to Paimon transforms and literals |
SparkV2FilterConverter.scala |
Refactored to use Transform-based predicates instead of field indices |
PredicateBuilder.java |
Added Transform-based predicate builder methods alongside index-based ones |
TransformPredicate.java |
Enhanced with factory method and field name extraction |
Transform.java |
Added withNewInputs method for transform reconstruction |
FieldTransform.java |
New class representing field extraction as a Transform |
ConcatTransform.java |
Added withNewInputs implementation |
ConcatWsTransform.java |
Added withNewInputs implementation |
LeafPredicate.java |
Refactored to extend TransformPredicate |
PaimonPushDownTestBase.scala |
Added test for CONCAT pushdown |
PaimonCatalogUtils.scala |
Removed unused imports |
ExpressionHelper.scala |
Removed unused import |
PaimonBasePushDown.scala |
Removed unused import and improved code style |
PaimonScan.scala |
Updated isSupportedRuntimeFilter call to use instance method |
PaimonBaseScan.scala |
Removed unused imports |
TransformPredicateTest.java |
Updated to use TransformPredicate.of factory method |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left minor comments.
| } | ||
|
|
||
| @Override | ||
| public Transform withNewInputs(List<Object> inputs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy
|
|
||
| /** Transform that extracts a field from a row. */ | ||
| public class FieldTransform implements Transform { | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add ser id.
| public class LeafPredicate implements Predicate { | ||
| public class LeafPredicate extends TransformPredicate { | ||
|
|
||
| private static final long serialVersionUID = 1L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change ser id.
|
+1 |
* master: (162 commits) [Python] Rename to BATCH_COMMIT_IDENTIFIER in snapshot.py [Python] Suppport multi prepare commit in the same TableWrite (apache#6526) [spark] Fix drop temporary view (apache#6529) [core] skip validate main branch before orphan files cleaning (apache#6524) [core][spark] Introduce upper transform (apache#6521) [Python] Keep the variable names of Identifier consistent with Java (apache#6520) [core] Remove hash lookup to simplify interface (apache#6519) [core][format] Format Table plan partitions should ignore hidden & illegal dirs (apache#6522) [hotfix] Print partition spec and type when error in InternalRowPartitionComputer [hotfix] Add more informat to check partition spec in InternalRowPartitionComputer [hotfix] Use deleteDirectoryQuietly in TempFileCommitter.clean [core] format table: support write file in _temporary at first (apache#6510) [core] Support non null column with write type (apache#6513) [core][fix] Blob with rolling file failed (apache#6518) [core][rest] Support schema validation and infer for external paimon table (apache#6501) [hotfix] Correct visitors for TransformPredicate [hotfix] Rename to copy from withNewInputs in TransformPredicate [core][spark] Support push down transform predicate (apache#6506) [spark] Implement SupportsReportStatistics for PaimonFormatTableBaseScan (apache#6515) [docs] add docs for auto-clustering of historical partitions (apache#6516) ...
Purpose
Tests
API and Format
Documentation