feat(frontend): add shrink cast for condition.split_to_scan_ranges() #7962
Do you think this implementation works? @xiangjinwu @chenzl25 PTAL. If it works, I can complete the remaining TODOs in this PR or in a later PR. TODO:
LGTM. @xiangjinwu PTAL. I am not quite sure whether it is the best way to deal with the implicit cast and 'shrink' cast.
src/frontend/src/expr/mod.rs
Outdated
    _ => return Ok(CastResult::Failed),
};
if let Some(scalar) = self.eval_row_const()? {
    let value = scalar.as_integral();
Hmm, will we be able to handle float and decimal as well?
Yes
Good question... So it is not enough to check only the minimum and maximum. For where id < 2.5 and where id > 2.5, we need to round in different directions: ceil to id < 3 and floor to id > 2. When it is id = 2.5, this should evaluate to false. Likewise, where id <= 2.5 becomes where id <= 2, and where id >= 2.5 becomes where id >= 3.
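The rounding rules above can be sketched roughly as follows. This is only an illustration, not the code in this PR: CmpOp, Rewritten and shrink_float_cmp are hypothetical names, the column is assumed to be an integer type that fits in i64, and any constant outside the range where f64 represents every integer exactly is simply left unoptimized.

```rust
/// Comparison operators between an integer column and a constant (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CmpOp {
    Lt,
    Le,
    Gt,
    Ge,
    Eq,
}

/// Outcome of rewriting `col <op> <float constant>` into an integer-only predicate.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Rewritten {
    /// Equivalent predicate `col <op> <integer>`.
    Cmp(CmpOp, i64),
    /// No integer value can satisfy the predicate.
    AlwaysFalse,
    /// Not handled; the caller keeps the original predicate.
    Skip,
}

/// Best-effort rewrite for an integer column compared with an f64 constant.
pub fn shrink_float_cmp(op: CmpOp, c: f64) -> Rewritten {
    // Assumption: only constants with |c| < 2^53 are handled, so that
    // floor/ceil convert to i64 without any loss.
    if !c.is_finite() || c.abs() >= 9_007_199_254_740_992.0 {
        return Rewritten::Skip;
    }
    let floor = c.floor() as i64;
    let ceil = c.ceil() as i64;
    match op {
        // id < 2.5 is the same as id < 3
        CmpOp::Lt => Rewritten::Cmp(CmpOp::Lt, ceil),
        // id <= 2.5 is the same as id <= 2
        CmpOp::Le => Rewritten::Cmp(CmpOp::Le, floor),
        // id > 2.5 is the same as id > 2
        CmpOp::Gt => Rewritten::Cmp(CmpOp::Gt, floor),
        // id >= 2.5 is the same as id >= 3
        CmpOp::Ge => Rewritten::Cmp(CmpOp::Ge, ceil),
        // id = 4.0 is the same as id = 4
        CmpOp::Eq if floor == ceil => Rewritten::Cmp(CmpOp::Eq, floor),
        // id = 2.5 can never hold for an integer column
        CmpOp::Eq => Rewritten::AlwaysFalse,
    }
}
```

When the constant has no fractional part, floor and ceil coincide, so all four inequality arms reduce to the unchanged comparison.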
There may be more complicated edge cases we are missing. Without the optimization, the system always casts to a "larger" type and compares in that type. But in this optimization, we always want to compare in the type of the order column, which may not be the "larger" type. Mimicking CMP_B(A_TO_B(x), A_TO_B(y)) via CMP_A(B_TO_A(x), B_TO_A(y)) requires us to find the proper B_TO_A function, given the existing A_TO_B, CMP_B and CMP_A, for all possible inputs.
More edge cases (PostgreSQL):
test=# select bigint '9223372036854775807' = double precision '9223372036854775296';
?column?
----------
t
(1 row)
test=# select (double precision '9223372036854775296')::bigint;
ERROR: bigint out of range
test=# select (double precision '9223372036854775295')::bigint;
int8
---------------------
9223372036854774784
(1 row)
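The same boundary can be reproduced outside PostgreSQL. The sketch below is an illustration under assumptions, not code from this PR: f64_to_i64_exact is a hypothetical helper that refuses any double which is not exactly an in-range integer. Note that a plain Rust `as` cast saturates rather than raising an error, so the range check has to be explicit.

```rust
/// Convert an f64 to i64 only when the value is an integer that fits exactly.
/// Hypothetical helper; returns None where a best-effort optimization should bail out.
pub fn f64_to_i64_exact(x: f64) -> Option<i64> {
    // -2^63 is exactly representable as f64 and is a valid i64.
    const LOWER: f64 = -9_223_372_036_854_775_808.0;
    // 2^63 is the first f64 that is too large for i64.
    const UPPER: f64 = 9_223_372_036_854_775_808.0;
    if x.is_nan() || x < LOWER || x >= UPPER || x.fract() != 0.0 {
        return None;
    }
    // Safe: x is an integer inside [-2^63, 2^63).
    Some(x as i64)
}

/// Compare an i64 with an f64 the way the unoptimized path would:
/// widen the integer to f64 first, then compare as doubles.
pub fn eq_as_double(a: i64, b: f64) -> bool {
    (a as f64) == b
}
```

With these definitions, eq_as_double(i64::MAX, 9223372036854775296.0) is true because both sides round to 2^63, while f64_to_i64_exact(9223372036854775296.0) is None, matching the "out of range" error in the session above.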
Given that it is an optimization, my suggestion is to do it on a best-effort basis and leave cases that are hard to handle correctly unoptimized. int16/int32/int64 may be a good subset.
Also, given its best-effort nature, the "cast_shrink" series of helper functions should be limited to a mod for this special optimization, rather than living on ExprImpl.
src/frontend/src/expr/mod.rs
Outdated
/// Shorthand to create cast expr to `target` type in implicit context.
pub fn cast_implicit(self, target: DataType) -> Result<ExprImpl> {
pub fn cast_implicit(self, target: DataType) -> std::result::Result<ExprImpl, ErrorCode> {
Hmm, why is it necessary to change the error type?
An RwError captures the backtrace, so the cost of constructing one is high. There is a known issue, #6131.
In cast_implicit_or_shrink, we store the error returned from cast_implicit before calling cast_shrink, so if cast_implicit returns an RwError, the performance is unacceptable.
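A rough sketch of that trade-off, under assumptions: LightError and HeavyError are hypothetical stand-ins for a cheap error code and a backtrace-carrying error, and the three cast functions below are toy versions with made-up signatures, not the ones in this PR. The point is only that the fallback path holds a cheap error, and the backtrace is captured once at the boundary where the error is really reported.

```rust
use std::backtrace::Backtrace;

/// Cheap error: plain data, no backtrace (stand-in for an error code).
#[derive(Debug, PartialEq)]
pub enum LightError {
    CannotCast { from: &'static str, to: &'static str },
}

/// Expensive error: captures a backtrace when constructed.
#[derive(Debug)]
pub struct HeavyError {
    pub inner: LightError,
    pub backtrace: Backtrace,
}

impl From<LightError> for HeavyError {
    fn from(inner: LightError) -> Self {
        // The backtrace is captured only here, at the reporting boundary.
        HeavyError { inner, backtrace: Backtrace::capture() }
    }
}

/// Toy rule: bigint -> integer is never an implicit cast.
pub fn cast_implicit(_v: i64) -> Result<i32, LightError> {
    Err(LightError::CannotCast { from: "bigint", to: "integer" })
}

/// Toy shrink cast: succeeds only when the value fits in the smaller type.
pub fn cast_shrink(v: i64) -> Option<i32> {
    i32::try_from(v).ok()
}

/// Try the implicit cast first; keep its cheap error while trying the shrink cast.
pub fn cast_implicit_or_shrink(v: i64) -> Result<i32, LightError> {
    match cast_implicit(v) {
        Ok(x) => Ok(x),
        // The stored error is only returned if the shrink cast also fails.
        Err(e) => cast_shrink(v).ok_or(e),
    }
}
```

A caller that really needs to surface the failure can convert with HeavyError::from at that point, so the common case where the shrink cast succeeds never pays for a backtrace.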
dc25f4a to 77dd7c4
src/frontend/src/utils/condition.rs
Outdated
_ => Err(ErrorCode::BindError(format!(
    "Cannot cast type \"{}\" to \"{}\".",
    const_expr.return_type(),
    target
))
This error is now thrown all the way up to the user:
dev=> create table t (v1 int primary key);
CREATE_TABLE
dev=> select * from t where v1 < 1.5;
ERROR: QueryError: Bind error: Cannot cast type "numeric" to "integer".
But it should just be a signal to skip the scan range optimization. As for the behavior of its caller, analyze_eq_const_expr, it should do one of three things:
- return Ok(true) so that the upper layer does return Ok(false_cond())
- assign eq_conds and return Ok(false)
- signal the upper layer that it cannot match a known optimization, so the upper layer should just do other_conds.push(expr);
We are lacking the last possibility right now.
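One way to make the three outcomes explicit is a dedicated enum instead of a bool. The sketch below is a toy model under assumptions: Const, EqAnalysis, Split, analyze_eq_const and split are hypothetical names, the column is assumed to be int32, and float constants are deliberately treated as "not matched" to show the fallback path.

```rust
/// Constant on the right-hand side of `col = <const>` (toy model).
#[derive(Debug, Clone, PartialEq)]
pub enum Const {
    Int(i64),
    Float(f64),
}

/// The three possible outcomes of analyzing `col = <const>`.
#[derive(Debug, PartialEq)]
pub enum EqAnalysis {
    /// The condition can never hold; the whole conjunction is false.
    AlwaysFalse,
    /// The condition became an equality on the column type.
    EqCond(i32),
    /// No known optimization applies; keep the expression as-is.
    NotMatched,
}

pub fn analyze_eq_const(c: &Const) -> EqAnalysis {
    match c {
        Const::Int(v) => match i32::try_from(*v) {
            Ok(x) => EqAnalysis::EqCond(x),
            // An int32 column can never equal an out-of-range constant.
            Err(_) => EqAnalysis::AlwaysFalse,
        },
        // Best effort: floats are not handled in this sketch.
        Const::Float(_) => EqAnalysis::NotMatched,
    }
}

#[derive(Debug, Default, PartialEq)]
pub struct Split {
    pub always_false: bool,
    pub eq_conds: Vec<i32>,
    pub other_conds: Vec<Const>,
}

/// Upper layer: route each constant according to the analysis result.
pub fn split(consts: &[Const]) -> Split {
    let mut out = Split::default();
    for c in consts {
        match analyze_eq_const(c) {
            EqAnalysis::AlwaysFalse => {
                return Split { always_false: true, ..Split::default() };
            }
            EqAnalysis::EqCond(x) => out.eq_conds.push(x),
            // Skipping the optimization is not an error for the user.
            EqAnalysis::NotMatched => out.other_conds.push(c.clone()),
        }
    }
    out
}
```

With this shape, a query such as select * from t where v1 < 1.5 would fall into the third arm and run unoptimized rather than surface a bind error.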
77dd7c4 to dcdf1e2
Summary of the TODO work:
For this TODO work, I added the related plan tests named
Codecov Report
@@ Coverage Diff @@
## main #7962 +/- ##
=======================================
Coverage 71.57% 71.58%
=======================================
Files 1132 1132
Lines 182874 182962 +88
=======================================
+ Hits 130897 130976 +79
- Misses 51977 51986 +9
The current implementation looks correct, but I would like to try harder and see if the code structure, function names and doc comments can be easier to understand 🤯
dcdf1e2 to 4b46b98
The previous interface and process were indeed complex. 😭 I have tried to make them clearer. Please let me know if any changes are still needed.
LGTM. Thanks for the refactor!
src/frontend/src/utils/condition.rs
Outdated
};

let Some(new_cond) = new_expr.eval_row_const()? else {
    // column = NULL, PK column never be NULL
FYI even NULL = NULL is not true. The result is NULL. Anything = NULL is always NULL.
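SQL's three-valued equality can be modeled with Option, where None stands for NULL. This is a small illustrative sketch; sql_eq and passes_where are hypothetical names, not functions from the codebase.

```rust
/// SQL `a = b` under three-valued logic: None models NULL.
pub fn sql_eq(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        // If either side is NULL, the result is NULL, not false and not true.
        _ => None,
    }
}

/// A WHERE clause keeps a row only when the predicate is exactly true.
pub fn passes_where(pred: Option<bool>) -> bool {
    pred == Some(true)
}
```

So a filter on column = NULL never keeps a row whether or not the column is nullable: sql_eq(None, None) is None, and passes_where(None) is false.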
3f70dbd to bf41eea
bf41eea to 269912a
Hey @ZENOTME, this pull request failed to merge and has been dequeued from the merge train. If you believe your PR failed in the merge train because of a flaky test, requeue it by clicking "Update branch" or pushing an empty commit with |
…7962)
* specify integral type in pgwire-extended
* support mismatch process
* support more plan test
* refine implementation
* refine plan test
* refine implementation
---------
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
- Assign a type suffix to smallint, int and bigint to fix the bug: prepared statement ignores explicit types for actual query results #7900
- Add a shrink cast, which can cast a type with a larger range to a type with a smaller range, e.g. bigint -> int
Related issue: #7916
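For the integer-only case, a shrink cast of a constant might look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's implementation: Shrunk and shrink_i64_to_i32 are hypothetical names, and the Below/Above variants exist so that a caller can still decide whether a comparison is always true or always false.

```rust
/// Result of shrinking a bigint constant to the int column type (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Shrunk {
    /// The constant fits in i32.
    Value(i32),
    /// The constant is smaller than every i32.
    Below,
    /// The constant is larger than every i32.
    Above,
}

pub fn shrink_i64_to_i32(v: i64) -> Shrunk {
    match i32::try_from(v) {
        Ok(x) => Shrunk::Value(x),
        // try_from fails only when v is outside the i32 range.
        Err(_) if v < 0 => Shrunk::Below,
        Err(_) => Shrunk::Above,
    }
}
```

For example, col = 4294967296 on an int column maps to Shrunk::Above, which a caller could treat as an always-false equality.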
Checklist
- ./risedev check (or alias, ./risedev c)
Documentation
Types of user-facing changes
Release note