@MBkkt MBkkt commented Nov 1, 2025

What is done?

Different handling of "and", "or", "not"
Different makeOr
Use real notEqual in more cases

Why is it good?

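See Axiom TPC-H q19: in the lineitem scan, the l_quantity disjunction moves from the remaining filter into a MultiRange range filter.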

Before

Executable Velox plan:

Fragment 0:  numWorkers=0:
-- Aggregation[5][SINGLE revenue := sum("dt1.__p121")] -> revenue:DOUBLE
  -- Project[4][expressions: (dt1.__p121:DOUBLE, multiply("l_extendedprice",minus(1,"l_discount")))] -> "dt1.__p121":DOUBLE
    -- Filter[0][expression: or(and(between(cast("p_size" as BIGINT),1,15),and(lte("l_quantity",30),and(gte("l_quantity",20),and(eq("p_brand",Brand#34),in("p_container",{LG CASE, LG BOX, LG PACK, LG PKG}))))),or(and(between(cast("p_size" as BIGINT),1,5),and(lte("l_quantity",11),and(gte("l_quantity",1),and(eq("p_brand",Brand#12),in("p_container",{SM CASE, SM BOX, SM PACK, SM PKG}))))),and(between(cast("p_size" as BIGINT),1,10),and(lte("l_quantity",20),and(gte("l_quantity",10),and(eq("p_brand",Brand#23),in("p_container",{MED BAG, MED BOX, MED PKG, MED PACK})))))))] -> l_quantity:DOUBLE, l_extendedprice:DOUBLE, l_discount:DOUBLE, p_brand:VARCHAR, p_size:INTEGER, p_container:VARCHAR
      -- HashJoin[3][INNER l_partkey=p_partkey] -> l_quantity:DOUBLE, l_extendedprice:DOUBLE, l_discount:DOUBLE, p_brand:VARCHAR, p_size:INTEGER, p_container:VARCHAR
        -- TableScan[1][table: lineitem, range filters: [(l_shipinstruct, Filter(BytesValues, deterministic, null not allowed)), (l_shipmode, Filter(BytesValues, deterministic, null not allowed))], remaining filter: (or(and(gte("l_quantity",20),lte("l_quantity",30)),or(and(gte("l_quantity",1),lte("l_quantity",11)),and(gte("l_quantity",10),lte("l_quantity",20))))), data columns: ROW<l_orderkey:BIGINT,l_partkey:BIGINT,l_suppkey:BIGINT,l_linenumber:INTEGER,l_quantity:DOUBLE,l_extendedprice:DOUBLE,l_discount:DOUBLE,l_tax:DOUBLE,l_returnflag:VARCHAR,l_linestatus:VARCHAR,l_shipdate:DATE,l_commitdate:DATE,l_receiptdate:DATE,l_shipinstruct:VARCHAR,l_shipmode:VARCHAR,l_comment:VARCHAR>] -> l_partkey:BIGINT, l_quantity:DOUBLE, l_extendedprice:DOUBLE, l_discount:DOUBLE
        -- TableScan[2][table: part, remaining filter: (or(and(between(cast("p_size" as BIGINT),1,15),and(eq("p_brand",Brand#34),in("p_container",{LG CASE, LG BOX, LG PACK, LG PKG}))),or(and(between(cast("p_size" as BIGINT),1,5),and(eq("p_brand",Brand#12),in("p_container",{SM CASE, SM BOX, SM PACK, SM PKG}))),and(between(cast("p_size" as BIGINT),1,10),and(eq("p_brand",Brand#23),in("p_container",{MED BAG, MED BOX, MED PKG, MED PACK})))))), data columns: ROW<p_partkey:BIGINT,p_name:VARCHAR,p_mfgr:VARCHAR,p_brand:VARCHAR,p_type:VARCHAR,p_size:INTEGER,p_container:VARCHAR,p_retailprice:DOUBLE,p_comment:VARCHAR>] -> p_partkey:BIGINT, p_brand:VARCHAR, p_size:INTEGER, p_container:VARCHAR

After

Executable Velox plan:

Fragment 0:  numWorkers=0:
-- Aggregation[5][SINGLE revenue := sum("dt1.__p121")] -> revenue:DOUBLE
  -- Project[4][expressions: (dt1.__p121:DOUBLE, multiply("l_extendedprice",minus(1,"l_discount")))] -> "dt1.__p121":DOUBLE
    -- Filter[0][expression: or(and(between(cast("p_size" as BIGINT),1,15),and(lte("l_quantity",30),and(gte("l_quantity",20),and(eq("p_brand",Brand#34),in("p_container",{LG CASE, LG BOX, LG PACK, LG PKG}))))),or(and(between(cast("p_size" as BIGINT),1,5),and(lte("l_quantity",11),and(gte("l_quantity",1),and(eq("p_brand",Brand#12),in("p_container",{SM CASE, SM BOX, SM PACK, SM PKG}))))),and(between(cast("p_size" as BIGINT),1,10),and(lte("l_quantity",20),and(gte("l_quantity",10),and(eq("p_brand",Brand#23),in("p_container",{MED BAG, MED BOX, MED PKG, MED PACK})))))))] -> l_quantity:DOUBLE, l_extendedprice:DOUBLE, l_discount:DOUBLE, p_brand:VARCHAR, p_size:INTEGER, p_container:VARCHAR
      -- HashJoin[3][INNER l_partkey=p_partkey] -> l_quantity:DOUBLE, l_extendedprice:DOUBLE, l_discount:DOUBLE, p_brand:VARCHAR, p_size:INTEGER, p_container:VARCHAR
        -- TableScan[1][table: lineitem, range filters: [(l_quantity, Filter(MultiRange, deterministic, null not allowed)), (l_shipinstruct, Filter(BytesValues, deterministic, null not allowed)), (l_shipmode, Filter(BytesValues, deterministic, null not allowed))], data columns: ROW<l_orderkey:BIGINT,l_partkey:BIGINT,l_suppkey:BIGINT,l_linenumber:INTEGER,l_quantity:DOUBLE,l_extendedprice:DOUBLE,l_discount:DOUBLE,l_tax:DOUBLE,l_returnflag:VARCHAR,l_linestatus:VARCHAR,l_shipdate:DATE,l_commitdate:DATE,l_receiptdate:DATE,l_shipinstruct:VARCHAR,l_shipmode:VARCHAR,l_comment:VARCHAR>] -> l_partkey:BIGINT, l_quantity:DOUBLE, l_extendedprice:DOUBLE, l_discount:DOUBLE
        -- TableScan[2][table: part, remaining filter: (or(and(between(cast("p_size" as BIGINT),1,15),and(eq("p_brand",Brand#34),in("p_container",{LG CASE, LG BOX, LG PACK, LG PKG}))),or(and(between(cast("p_size" as BIGINT),1,5),and(eq("p_brand",Brand#12),in("p_container",{SM CASE, SM BOX, SM PACK, SM PKG}))),and(between(cast("p_size" as BIGINT),1,10),and(eq("p_brand",Brand#23),in("p_container",{MED BAG, MED BOX, MED PKG, MED PACK})))))), data columns: ROW<p_partkey:BIGINT,p_name:VARCHAR,p_mfgr:VARCHAR,p_brand:VARCHAR,p_type:VARCHAR,p_size:INTEGER,p_container:VARCHAR,p_retailprice:DOUBLE,p_comment:VARCHAR>] -> p_partkey:BIGINT, p_brand:VARCHAR, p_size:INTEGER, p_container:VARCHAR
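
To make the difference between the two lineitem scans concrete, here is a minimal standalone sketch. It is not Velox code: `Range`, `remainingFilter`, `multiRangeTest`, and `agreeOnGrid` are hypothetical names introduced only for illustration. It checks that the l_quantity expression from the "Before" remaining filter accepts the same values as a plain list of closed ranges, which is the shape a multi-range filter evaluates.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical closed interval [lo, hi]; not a Velox type.
struct Range {
  double lo;
  double hi;
};

// The "Before" remaining filter on l_quantity, written out as an expression.
bool remainingFilter(double q) {
  return (q >= 20 && q <= 30) || (q >= 1 && q <= 11) || (q >= 10 && q <= 20);
}

// The "After" shape: a value passes if any range contains it.
bool multiRangeTest(const std::vector<Range>& ranges, double q) {
  return std::any_of(ranges.begin(), ranges.end(), [q](const Range& r) {
    return q >= r.lo && q <= r.hi;
  });
}

// Compares both predicates on a grid of sample values in [-5, 40].
bool agreeOnGrid() {
  const std::vector<Range> ranges = {{20, 30}, {1, 11}, {10, 20}};
  for (double q = -5.0; q <= 40.0; q += 0.5) {
    if (remainingFilter(q) != multiRangeTest(ranges, q)) {
      return false;
    }
  }
  return true;
}
```

Calling `agreeOnGrid()` compares the two forms on sample values only; it illustrates the equivalence and says nothing about how Velox evaluates either form internally.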

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 1, 2025
@MBkkt MBkkt force-pushed the mbkkt/improve-subfield-filters branch 2 times, most recently from ab6aac8 to d8824ad Compare November 1, 2025 13:17
@MBkkt MBkkt requested review from Yuhta and mbasmanova November 1, 2025 13:21
@MBkkt MBkkt force-pushed the mbkkt/improve-subfield-filters branch from d8824ad to 1daf10f Compare November 1, 2025 14:05
  const core::TypedExprPtr& expr,
- core::ExpressionEvaluator*);
+ core::ExpressionEvaluator* evaluator,
+ bool negated = false);
Contributor

It looks like this PR may combine multiple changes. Some might be optimizations that do not change behavior, others are API changes. Would it be possible to split these into separate PRs and add tests for newly introduced functionality?

Collaborator Author

This doesn't change the API.

Collaborator Author

This just adds a recursive case. It could be implemented as a separate recursive function in an anonymous namespace, but why? So I don't think the API was changed.
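
For readers following the thread, a recursive case driven by a `negated` flag can be sketched as below. This is an assumption about the general technique (pushing negation down through "and", "or", and "not" with De Morgan's laws), not code taken from the PR. `Expr`, `Kind`, `leaf`, `call`, `evalDirect`, and `evalPushed` are hypothetical names, and the sketch ignores SQL NULL semantics.

```cpp
#include <memory>
#include <utility>
#include <vector>

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

enum class Kind { kLeaf, kAnd, kOr, kNot };

// Hypothetical mini expression tree; not Velox's typed expressions.
struct Expr {
  Kind kind;
  bool value;                   // Meaningful only for kLeaf.
  std::vector<ExprPtr> inputs;  // Children of kAnd, kOr, kNot.
};

ExprPtr leaf(bool value) {
  return std::make_shared<Expr>(Expr{Kind::kLeaf, value, {}});
}

ExprPtr call(Kind kind, std::vector<ExprPtr> inputs) {
  return std::make_shared<Expr>(Expr{kind, false, std::move(inputs)});
}

// Reference evaluation: applies "not" where it appears in the tree.
bool evalDirect(const ExprPtr& e) {
  switch (e->kind) {
    case Kind::kLeaf:
      return e->value;
    case Kind::kNot:
      return !evalDirect(e->inputs[0]);
    case Kind::kAnd:
      for (const auto& in : e->inputs) {
        if (!evalDirect(in)) return false;
      }
      return true;
    case Kind::kOr:
      for (const auto& in : e->inputs) {
        if (evalDirect(in)) return true;
      }
      return false;
  }
  return false;
}

// Pushed-down evaluation: "not" only flips the flag, and a negated "and"
// behaves like an "or" over negated children (and vice versa).
bool evalPushed(const ExprPtr& e, bool negated = false) {
  switch (e->kind) {
    case Kind::kLeaf:
      return negated ? !e->value : e->value;
    case Kind::kNot:
      return evalPushed(e->inputs[0], !negated);
    case Kind::kAnd:
    case Kind::kOr: {
      const bool conjunction = (e->kind == Kind::kAnd) != negated;
      for (const auto& in : e->inputs) {
        const bool r = evalPushed(in, negated);
        if (r != conjunction) return r;  // Short-circuit.
      }
      return conjunction;  // Also the result for an empty input list.
    }
  }
  return false;
}
```

With the flag defaulted to `false`, a caller that only ever passes the expression sees the same result as `evalDirect`; the flag matters only inside the recursion.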

Contributor

This function is exposed in the header file, hence I assume it is public API. It used to have 2 parameters, now it has 3. This seems like a change. If we do not expect users to specify the 3rd argument, then we should remove it and add a helper function to the .cpp file.

I see this API wasn't documented before. Would you help document it?

Collaborator Author

> This function is exposed in the header file, hence, I assume it is public API. It used to have 2 parameters, now it has 3. This seems like a change.

It has a default value, so it's not changed, it's extended.

> I see this API wasn't documented before. Would you help document it?

OK.
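
The two positions in this thread can be compared in a small standalone sketch. All names here (`toFilterA`, `toFilterB`, `toFilterImpl`, the `int evaluatorId` stand-in) are hypothetical and unrelated to the real Velox signature. Option A adds a defaulted third parameter to the public function; option B keeps the public signature at two parameters and hides the flag in a helper in an anonymous namespace.

```cpp
#include <string>
#include <type_traits>

// Option A: public function extended with a defaulted parameter.
std::string toFilterA(
    const std::string& expr,
    int /*evaluatorId*/,
    bool negated = false) {
  return (negated ? "not " : "") + expr;
}

// Option B: the flag lives only in an internal helper.
namespace {
std::string toFilterImpl(const std::string& expr, bool negated) {
  return (negated ? "not " : "") + expr;
}
} // namespace

std::string toFilterB(const std::string& expr, int /*evaluatorId*/) {
  return toFilterImpl(expr, /*negated=*/false);
}

// Two-argument calls compile against both options, but the function types
// differ: option A's type now has three parameters.
static_assert(
    std::is_same<
        decltype(&toFilterA),
        std::string (*)(const std::string&, int, bool)>::value,
    "option A has a three-parameter function type");
static_assert(
    std::is_same<
        decltype(&toFilterB),
        std::string (*)(const std::string&, int)>::value,
    "option B keeps the two-parameter function type");
```

Existing two-argument call sites keep compiling under option A, which supports the "extended, not changed" reading. The function's type does change, though, so code that stores its address in a two-parameter function pointer would no longer compile; option B avoids that at the cost of an extra helper.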

@MBkkt MBkkt force-pushed the mbkkt/improve-subfield-filters branch from 1daf10f to 1645bdc Compare November 3, 2025 10:55
@Yuhta Yuhta changed the title from "perf: Improve subfield filters" to "perf: Improve expr to subfield filters" Nov 3, 2025
return std::make_unique<common::BigintMultiRange>(
std::move(newRanges), false);
} catch (...) {
// Found overlapping ranges, fall back to MultiRange.
Contributor

Log some WARNING here

Collaborator Author

It's not a warning. We just haven't written the code that merges overlapping ranges; it doesn't mean something is wrong with the user's request.
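
For context, merging overlapping ranges is a standard sort-and-sweep. The sketch below is hypothetical and not part of the PR: `Range` and `mergeOverlapping` are illustrative names, ranges are closed `[first, second]` pairs with `first <= second` assumed, and adjacent but non-overlapping ranges such as [1, 5] and [6, 10] are deliberately left separate.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical closed range [first, second]; not a Velox type.
using Range = std::pair<int64_t, int64_t>;

// Returns sorted, pairwise non-overlapping ranges covering the same values.
std::vector<Range> mergeOverlapping(std::vector<Range> ranges) {
  std::sort(ranges.begin(), ranges.end());
  std::vector<Range> merged;
  for (const auto& r : ranges) {
    if (!merged.empty() && r.first <= merged.back().second) {
      // Overlaps the previous range: extend its upper bound if needed.
      merged.back().second = std::max(merged.back().second, r.second);
    } else {
      merged.push_back(r);
    }
  }
  return merged;
}
```

Applied to the q19 quantity bounds [20, 30], [1, 11], and [10, 20], this collapses the three ranges into the single range [1, 30].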

bool conjunction = call->name() == "and";
if (conjunction || call->name() == "or") {
if (call->inputs().empty()) {
return {};
Contributor

Always true/false

Collaborator Author

That's wrong; I added a comment explaining why.

return bigintOr(
asUniquePtr<common::BigintRange>(std::move(a)),
asUniquePtr<common::BigintRange>(std::move(b)));
VELOX_DCHECK_NOT_NULL(a);
Contributor

The change on makeOrFilter probably should be a separate PR

Collaborator Author

I don't have a lot of time for this PR.
I made this in my free time, just because it was easy to fix and looks overall better.

What can we do with this?

Contributor

@MBkkt Maybe you can try to find someone else who'd be interested in picking up this work. Otherwise, we may have to abandon it.

Collaborator Author

@mbasmanova I created issue #15426.

If you think it's better to keep the code with all these flaws, there's nothing else I can do without allocating additional time to this.


std::unique_ptr<common::Filter> lessThanFilter =
makeLessThanFilter(valueExpr, evaluator);
switch (value->typeKind()) {
Contributor

The change on makeNotEqualFilter is another PR

Collaborator Author

I don't have a lot of time for this PR.
I made this in my free time, just because it was easy to fix and looks overall better.

What can we do with this?

@MBkkt MBkkt force-pushed the mbkkt/improve-subfield-filters branch 3 times, most recently from f341c39 to 8703b32 Compare November 6, 2025 13:26
3 participants