- Materialized performance 101 (skip forward to 5:13)
- Materialized internals 101.
- Introduction to one-off queries.
- Materialize Decorrelation explained in Jamie Brandon’s Blog.
Representations:
SQL
— source languageAST
— a parsed version of a SQL query.HIR
— high-level intermediate representation.MIR
— mid-level intermediate representation.LIR
— low-level intermediate representation.TDO
— target language (timely & differential operators).
Transformations in the compile-time lifecycle of a dataflow.
SQL ⇒ AST
.- Parsing the SQL query.
AST ⇒ AST
- Resolving names against the catalog.
CatalogItemType
lists the kinds of objects that can be resolved against the catalog.
- Resolving names against the catalog.
AST ⇒ HIR
.- Resolving column references, column aliases, and table aliases
- If the SQL query is a one-off, the outermost
TopK
is converted to a RowSetFinishing at this point. EXPLAIN RAW PLAN
returns the result of transformations up to this point.
HIR ⇒ HIR
HIR ⇒ MIR
.- Decorrelation:
- Correlated queries are rewritten as graphs with join and distinct.
- Lowering — express SQL-specific concepts as dataflow sub-graphs:
- Outer joins are decomposed into multiple inner joins (see README.md).
- Machinery for introducing defaults in empty global aggregates.
- Machinery for introducing errors for
SELECT
subqueries with more than one return value.
EXPLAIN DECORRELATED PLAN
returns the result of transformations up to this point.
- Decorrelation:
MIR ⇒ MIR
.- If the query is a view definition, run per-view logical optimizations against the SQL query. The catalog stores the result of transformations up to this point.
- Construct a dataflow for the query:
- If the query depends on not-materialized views, the definitions of the not-materialized views get inlined.
- For each materialized view that a query depends on, import all of its
materializations. (This corresponds to all indexes on that view, which
you can see if you call
SHOW INDEXES IN <view>
).
- Run optimizations against the dataflow:
- Per-view logical.
- Cross-view logical.
- Propagating source information up: optimize_dataflow_monotonic
- Pushing optimizations down to sources:
LinearOperators
- View inlining.
- Theoretically supports producing more than one index/sink in the same dataflow.
- Per-view logical (second round).
- Per-view physical.
EXPLAIN OPTIMIZED PLAN
returns the result of transformations up to this point.
MIR ⇒ LIR
.- Decisions are made regarding rendering.
- All aggregations are created equal in MIR, but from the rendering perspective, aggregations are evaluated differently according to what data needs to be kept to recalculate the aggregation after receiving a diff. A pictorial version can be found here.
- Joins are broken down into multiple stages, and filters + projects run between each stage to shrink the intermediate result.
- RelationTypes (column types + unique keys) are discarded since we do no key or type of validation at render time.
EXPLAIN PHYSICAL PLAN
returns the result of transformations up to this point.
- Decisions are made regarding rendering.
LIR ⇒ TDO
.
For a one-off query, we run all the transformations until the MIR stage. Then we
determine whether we need to serve the query on the "slow path", that is,
creating a temporary dataflow and then deleting it. If we don't need to serve
the query on the "slow path", then we can skip the MIR ⇒ LIR
and the LIR ⇒ TDO
steps.
Existing "fast paths" include:
Currently, the optimization team is mostly concerned with the HIR ⇒ MIR
and MIR ⇒ MIR
stages.
- Sqllogictest
- Philip’s RQG tests will be in this format.
- Add Philip to any PR where query plans may change.
- A PR can be merged if it passes Fast SLT.
- A PR does not need to pass Full SLT tests (
test/sqllogictest/sqlite
) to be merged.- Full SLT tests take 2-3 hours.
- You can manually initiate full SLT tests on your branch here.
- Philip’s RQG tests will be in this format.
- Testdrive
- We generally do not use testdrive except to see linear operators in action.
- Datadriven
- Transform unit tests currently allow:
- testing each transformation independently of the others.
- Printing out which block of transformations change the plan and how.
- Unit tests in the mz-expr crate currently allow:
- Testing the simplifying MirScalarExpr, predicates, join equivalences.
- Testing MapFilterProject.
- There is a DSL to specifying arbitrary MIRs.
- DSL to specify arbitrary enums and structs.
- Transform unit tests currently allow: