-
Notifications
You must be signed in to change notification settings - Fork 180
Add compare_ip operator udfs #3821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add compare_ip operator udfs #3821
Conversation
Signed-off-by: Xinyu Hao <[email protected]>
Signed-off-by: Xinyu Hao <[email protected]>
Signed-off-by: Xinyu Hao <[email protected]>
Signed-off-by: Xinyu Hao <[email protected]>
Signed-off-by: Xinyu Hao <[email protected]>
|
I think we should implement After that, for Ip comparison, we can convert string to ExprIP. And then we can implement some generic comparison functions for types of comparable, via calling their But current approach is still acceptable if we won't have other UDT in the future. |
qianheng-aws
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add IT for ip comparison.
| // EXPR_IP is mapped to SqlTypeFamily.NULL in | ||
| // UserDefinedFunctionUtils.convertRelDataTypeToSqlTypeName | ||
| return UDFOperandMetadata.wrap(OperandTypes.STRING_STRING); | ||
| return UDFOperandMetadata.wrap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should use new PPLTypeChecker here and override its checkOperandTypes method to only allow accept string or ip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handled.
| OpenSearchTypeFactory.TYPE_FACTORY.createSqlType(SqlTypeName.INTEGER)); | ||
| case ANY, IGNORE -> List.of( | ||
| OpenSearchTypeFactory.TYPE_FACTORY.createSqlType(SqlTypeName.ANY)); | ||
| // We borrow SqlTypeFamily.NULL to represent EXPR_IP. This is a workaround |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we stop use CompositeOperandTypeChecker and create a new TypeChecker ourself, maybe we can avoid using SqlTypeFamily.NULL to represent EXPR_IP and check operands on RelDataType level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qianheng-aws How about this implementation ishaoxy#1
It doesn't use NULL for IP, but created two other classes for IP type checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just came to me that there's not only compare that checks IP type. CIDR_MATCH also has to validate IP types. If we make a specific IP type checker for compare operators, we may also have to create one for each other functions like cidrmatch and geoip.
|
|
||
| void populate() { | ||
| // register operators for IP comparing | ||
| registerOperator(NOTEQUAL, PPLBuiltinOperators.NOT_EQUALS_IP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Questions, Do we need to support IP as UDT in PPL engine?
In OpenSearch PPL, IP handling depends on index field type. CIDR-based filtering should use ip_range queries for ip fields, script pushdown / in-memory processing for keyword, text, and runtime string fields.
| Field Type | Use Case | Expectation |
|---|---|---|
| IP Field | search index=log ip="192.168.0.0/16" | Rewrite as term query |
| search index | where cidrmatch("192.168.0.0/16", ip) | Rewrite as term query | |
| Keyword Field | search index=log ip="192.168.0.0/16" | Rewrite as term query, extactally keyword match |
| search index | where cidrmatch("192.168.0.0/16", ip) | Script pushdown — ip field is a string, not rewrite as term query. | |
| Text Field | search index=log ip="192.168.0.0/16" | Rewrite as query_string query, full text search |
| search index | where cidrmatch("192.168.0.0/16", ip) | Script pushdown — ip field is a string, not rewrite as term query. | |
| Runtime Field | search index=log | parse ip=regex(...)| where ip="192.168.0.0/16" | Script pushdown — ip field is a string, it is a string comparsion query |
| search index=log | parse ip=regex(...)| where cidrmatch("192.168.0.0/16", ip) | Script pushdown — ip field is a string, not rewrite as term query. |
Signed-off-by: Xinyu Hao <[email protected]>
Yes, I think that's what we need. In the future when we have type conversion, that IPTypeChecker should only accept type of IP, and type conversion should convert
Yes since we treat |
Signed-off-by: Xinyu Hao <[email protected]>
Signed-off-by: Xinyu Hao <[email protected]>
Signed-off-by: Yuanchun Shen <[email protected]>
Make IpComparisonOperators an inner enum of CompareIPFunction
| registerOperator(EQUAL, PPLBuiltinOperators.EQUALS_IP); | ||
| registerOperator(GREATER, PPLBuiltinOperators.GREATER_IP); | ||
| registerOperator(GTE, PPLBuiltinOperators.GTE_IP); | ||
| registerOperator(LESS, PPLBuiltinOperators.LESS_IP); | ||
| registerOperator(LTE, PPLBuiltinOperators.LTE_IP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SqlStdOperatorTable.EQUALS` is a very basic std operator which widely used in calcite internal, I am worry that it may introduce potential bugs and performance regression for pushdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But Calcite's built-in operators like SqlStdOperatorTable.EQUALS does not handle our UDT like IP. These new operators are only effective to IP comparison, controlled via type checkers. Comparison between other types will still falls to Calcite's built-in comparison operators.
There was two solutions:
- one is to convert IP UDT to a type that is comparable by calcite's comparators.
- another is to add new operators exclusively for IP comparision
We opted the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to keep the comparison logic the same as v2, I chose to add these specific operator udfs for IP comparison instead of converting it to string type that calcite can compare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These new operators are only effective to IP comparison, controlled via type checkers. Comparison between other types will still falls to Calcite's built-in comparison operators.
Can we move above logic to a specific method registerOverrideOperator, it quite confused me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or modify to
registerOperator(EQUAL, check(PPLBuiltinOperators.EQUALS_IP,SqlStdOperatorTable.EQUALS));
and leverage check() to manage the specific register logic via typeChecker.
check() can name to case() or set() or queue()...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or modify to
registerOperator(EQUAL, check(PPLBuiltinOperators.EQUALS_IP,SqlStdOperatorTable.EQUALS));and leverage
check()to manage the specific register logic via typeChecker.
check()can name tocase()orset()orqueue()...
Modified, improving codes readability.
Signed-off-by: Xinyu Hao <[email protected]>
Signed-off-by: Xinyu Hao <[email protected]>
Signed-off-by: Xinyu Hao <[email protected]>
add type checker for cidr
| case EXPR_IP -> SqlTypeName.VARCHAR; | ||
| // EXPR_IP is mapped to SqlTypeName.OTHER since there is no | ||
| // corresponding SqlTypeName in Calcite. | ||
| case EXPR_IP -> SqlTypeName.OTHER; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please update the PR description to align changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reminding me.
| } else { | ||
| typeChecker = operator.getOperandTypeChecker(); | ||
| } | ||
| public void registerOperator(BuiltinFunctionName functionName, SqlOperator... operators) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a javadoc for this method to explain what are the operators for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
Signed-off-by: Xinyu Hao <[email protected]>
| IPAddress addr1 = IPUtils.toAddress(ip1); | ||
| IPAddress addr2 = IPUtils.toAddress(ip2); | ||
| int result = IPUtils.compare(addr1, addr2); | ||
| return switch (comparisonType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For efficiency consideration, we'd better move this switch case to the implement method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
|
||
| public static boolean compare(Object obj1, Object obj2, ComparisonType comparisonType) { | ||
| try { | ||
| String ip1 = extractIpString(obj1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not leverage ExprIpValue::compare? Then we don't need to do extra transformation for ExprIPValue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for only focusing on IPUtils before and ignoring the functions encapsulated in ExprIpValue. Now fixed.
Signed-off-by: Xinyu Hao <[email protected]>
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-3821-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 6c3efa1475f815fc9025a2e3c533b7b21d9ea8cf
# Push it to GitHub
git push --set-upstream origin backport/backport-3821-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-devThen, create a pull request where the |
|
@ishaoxy please follow the instructions to backport manually |
* ip_compare operator added Signed-off-by: Xinyu Hao <[email protected]> * only type checker issue left Signed-off-by: Xinyu Hao <[email protected]> * fix by modifying ip.sqlTypeName from OTHER to NULL in type checker Signed-off-by: Xinyu Hao <[email protected]> * fix less Signed-off-by: Xinyu Hao <[email protected]> * modify the CalcitePPLFunctionTypeTest text Signed-off-by: Xinyu Hao <[email protected]> * allow CalciteIPComparisonIT in CalciteNoPushdownIT Signed-off-by: Xinyu Hao <[email protected]> * Modify the signature description in udf Signed-off-by: Xinyu Hao <[email protected]> * fix some typing errors Signed-off-by: Xinyu Hao <[email protected]> * modify the udfs for better style Signed-off-by: Xinyu Hao <[email protected]> * Make IpComparisonOperators an inner enum of CompareIPFunction Signed-off-by: Yuanchun Shen <[email protected]> * modify registerOperator Signed-off-by: Xinyu Hao <[email protected]> * modify registerOperator Signed-off-by: Xinyu Hao <[email protected]> * add type checker for cidr Signed-off-by: Xinyu Hao <[email protected]> * add javadoc Signed-off-by: Xinyu Hao <[email protected]> * move switch case to the implement method Signed-off-by: Xinyu Hao <[email protected]> --------- Signed-off-by: Xinyu Hao <[email protected]> Signed-off-by: Yuanchun Shen <[email protected]> Co-authored-by: Yuanchun Shen <[email protected]> (cherry picked from commit 6c3efa1)
* Add compare_ip operator udfs (#3821) * ip_compare operator added Signed-off-by: Xinyu Hao <[email protected]> * only type checker issue left Signed-off-by: Xinyu Hao <[email protected]> * fix by modifying ip.sqlTypeName from OTHER to NULL in type checker Signed-off-by: Xinyu Hao <[email protected]> * fix less Signed-off-by: Xinyu Hao <[email protected]> * modify the CalcitePPLFunctionTypeTest text Signed-off-by: Xinyu Hao <[email protected]> * allow CalciteIPComparisonIT in CalciteNoPushdownIT Signed-off-by: Xinyu Hao <[email protected]> * Modify the signature description in udf Signed-off-by: Xinyu Hao <[email protected]> * fix some typing errors Signed-off-by: Xinyu Hao <[email protected]> * modify the udfs for better style Signed-off-by: Xinyu Hao <[email protected]> * Make IpComparisonOperators an inner enum of CompareIPFunction Signed-off-by: Yuanchun Shen <[email protected]> * modify registerOperator Signed-off-by: Xinyu Hao <[email protected]> * modify registerOperator Signed-off-by: Xinyu Hao <[email protected]> * add type checker for cidr Signed-off-by: Xinyu Hao <[email protected]> * add javadoc Signed-off-by: Xinyu Hao <[email protected]> * move switch case to the implement method Signed-off-by: Xinyu Hao <[email protected]> --------- Signed-off-by: Xinyu Hao <[email protected]> Signed-off-by: Yuanchun Shen <[email protected]> Co-authored-by: Yuanchun Shen <[email protected]> (cherry picked from commit 6c3efa1) * backport to java11 Signed-off-by: Xinyu Hao <[email protected]> * backport to java11 Signed-off-by: Xinyu Hao <[email protected]> * backport to java11 Signed-off-by: Xinyu Hao <[email protected]> * backport to java11 Signed-off-by: Xinyu Hao <[email protected]> * backport to java11 Signed-off-by: Xinyu Hao <[email protected]> * backport to java11 Signed-off-by: Xinyu Hao <[email protected]> --------- Signed-off-by: Xinyu Hao <[email protected]> Signed-off-by: Yuanchun Shen <[email protected]> Co-authored-by: Yuanchun Shen <[email protected]>
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-manually manually
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-manually
# Create a new branch
git switch --create backport/backport-3821-to-manually
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 6c3efa1475f815fc9025a2e3c533b7b21d9ea8cf
# Push it to GitHub
git push --set-upstream origin backport/backport-3821-to-manually
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-manuallyThen, create a pull request where the |
Description
CompareIpFunctionfor IP address comparisons.SqlTypeName.OTHER.registerOperatormethod to support registering one or multiple operators under a single function name, enabling function overloading based on operand types.Related Issues
Resolves #3776
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.