diff --git a/presto-docs/src/main/sphinx/index.rst b/presto-docs/src/main/sphinx/index.rst index 8a91e003b350..f99bd4c5c5ee 100644 --- a/presto-docs/src/main/sphinx/index.rst +++ b/presto-docs/src/main/sphinx/index.rst @@ -10,6 +10,7 @@ Presto Documentation installation security admin + optimizer connector functions language diff --git a/presto-docs/src/main/sphinx/optimizer.rst b/presto-docs/src/main/sphinx/optimizer.rst new file mode 100644 index 000000000000..ac160e3f9271 --- /dev/null +++ b/presto-docs/src/main/sphinx/optimizer.rst @@ -0,0 +1,10 @@ +*************** +Query Optimizer +*************** + +.. toctree:: + :maxdepth: 1 + + optimizer/statistics + optimizer/cost-in-explain + optimizer/cost-based-optimizations diff --git a/presto-docs/src/main/sphinx/optimizer/cost-based-optimizations.rst b/presto-docs/src/main/sphinx/optimizer/cost-based-optimizations.rst new file mode 100644 index 000000000000..f76f47dce53c --- /dev/null +++ b/presto-docs/src/main/sphinx/optimizer/cost-based-optimizations.rst @@ -0,0 +1,76 @@ +======================== +Cost based optimizations +======================== + +Presto supports several cost based optimizations, described below. + +Join Enumeration +---------------- + +The order in which joins are executed in a query can have a significant impact +on the query's performance. The aspect of join ordering that has the largest +impact on performance is the size of the data being processed and transferred +over the network. If a join that produces a lot of data is performed early in +the execution, then subsequent stages will need to process large amounts of +data for longer than necessary, increasing the time and resources needed for +the query. + +With cost based join enumeration, Presto uses +:doc:`/optimizer/statistics` provided by connectors to estimate +the costs for different join orders and automatically pick the +join order with the lowest computed costs. + +The join enumeration strategy is governed by the ``join_reordering_strategy`` +session property, with the ``optimizer.join-reordering-strategy`` +configuration property providing the default value. + +The valid values are: + * ``AUTOMATIC`` (default) - full automatic join enumeration enabled + * ``ELIMINATE_CROSS_JOINS`` - eliminate unnecessary cross joins + * ``NONE`` - purely syntactic join order + +If using ``AUTOMATIC`` and statistics are not available, or if for any other +reason a cost could not be computed, the ``ELIMINATE_CROSS_JOINS`` strategy is +used instead. + +Join Distribution Selection +--------------------------- + +Presto uses a hash based join algorithm. That implies that for each join +operator a hash table must be created from one join input (called build side). +The other input (probe side) is then iterated and for each row the hash table is +queried to find matching rows. + +There are two types of join distributions: + * Partitioned: each node participating in the query builds a hash table + from only fraction of the data + * Broadcast: each node participating in the query builds a hash table + from all of the data (data is replicated to each node) + +Each type have their trade offs. Partitioned joins require redistributing both +tables using a hash of the join key. This can be slower (sometimes +substantially) than broadcast joins, but allows much larger joins. In +particular, broadcast joins will be faster if the build side is much smaller +than the probe side. However, broadcast joins require that the tables on the +build side of the join after filtering fit in memory on each node, whereas +distributed joins only need to fit in distributed memory across all nodes. + +With cost based join distribution selection, Presto automatically chooses to +use a partitioned or broadcast join. With cost based join enumeration, Presto +automatically chooses which side is the probe and which is the build. + +The join distribution strategy is governed by the ``join_distribution_type `` +session property, with the ``join-distribution-type`` configuration property +providing the default value. + +The valid values are: + * ``AUTOMATIC`` (default) - join distribution type is determined automatically + for each join + * ``BROADCAST`` - broadcast join distribution is used for all joins + * ``PARTITIONED`` - partitioned join distribution is used for all join + +Connector Implementations +------------------------- + +In order for the Presto optimizer to use the cost based strategies, +the connection implementation must provide :doc:`statistics`. diff --git a/presto-docs/src/main/sphinx/optimizer/cost-in-explain.rst b/presto-docs/src/main/sphinx/optimizer/cost-in-explain.rst new file mode 100644 index 000000000000..4f8a38d3cffd --- /dev/null +++ b/presto-docs/src/main/sphinx/optimizer/cost-in-explain.rst @@ -0,0 +1,44 @@ +=============== +Cost in EXPLAIN +=============== + +During planning, the cost associated with each node of the plan is computed +based on the table statistics for the tables in the query. This calculated +cost is printed as part of the output of an :doc:`/sql/explain` statement. + +Cost information is displayed in the plan tree using the format ``{rows: XX +(XX), cpu: XX, memory: XX, network: XX}``. ``rows`` refers to the expected +number of rows output by each plan node during execution. The value in the +parentheses following the number of rows refers to the expected size of the data +output by each plan node in bytes. Other parameters indicate the estimated +amount of CPU, memory, and network utilized by the execution of a plan node. +These values do not represent any actual unit, but are numbers that are used to +compare the relative costs between plan nodes, allowing the optimizer to choose +the best plan for executing a query. If any of the values is not known, a ``?`` +is printed. + +For example: + +.. code-block:: none + + presto:default> EXPLAIN SELECT comment FROM tpch.sf1.nation WHERE nationkey > 3; + + - Output[comment] => [[comment]] + Estimates: {rows: 22 (1.69kB), cpu: 6148.25, memory: 0.00, network: 1734.25} + - RemoteExchange[GATHER] => [[comment]] + Estimates: {rows: 22 (1.69kB), cpu: 6148.25, memory: 0.00, network: 1734.25} + - ScanFilterProject[table = tpch:nation:sf1.0, filterPredicate = ("nationkey" > BIGINT '3')] => [[comment]] + Estimates: {rows: 25 (1.94kB), cpu: 2207.00, memory: 0.00, network: 0.00}/{rows: 22 (1.69kB), cpu: 4414.00, memory: 0.00, network: 0.00}/{rows: 22 (1.69kB), cpu: 6148.25, memory: 0.00, network: 0.00} + nationkey := tpch:nationkey + comment := tpch:comment + +Generally, there is only one cost printed for each plan node. However, when a +``Scan`` operator is combined with a ``Filter`` and/or ``Project`` operator, +then multiple cost structures will be printed, each corresponding to an +individual logical part of the combined operator. For example, three cost +structures will be printed for a ``ScanFilterProject`` operator, corresponding +to the ``Scan``, ``Filter``, and ``Project`` parts of the operator, in that order. + +Estimated cost is also printed in :doc:`/sql/explain-analyze` in addition to actual +runtime statistics. + diff --git a/presto-docs/src/main/sphinx/optimizer/statistics.rst b/presto-docs/src/main/sphinx/optimizer/statistics.rst new file mode 100644 index 000000000000..a0c562abeb4a --- /dev/null +++ b/presto-docs/src/main/sphinx/optimizer/statistics.rst @@ -0,0 +1,52 @@ +================ +Table Statistics +================ + +Presto supports statistics based optimizations for queries. For a query to take +advantage of these optimizations, Presto must have statistical information for +the tables in that query. + +Table statistics are provided to the query planner by connectors. Currently, the +only connector that supports statistics is the :doc:`/connector/hive`. + +Table Layouts +------------- + +Statistics are exposed to the query planner by a table layout. A table layout +represents a subset of a table's data and contains information about the +organizational properties of that data (like sort order and bucketing). + +The number of table layouts available for a table and the details of those table +layouts are specific to each connector. Using the Hive connector as an example: + +* Non-partitioned tables have just one table layout representing all data in the table +* Partitioned tables have a family of table layouts. Each set of partitions to + be scanned represents one table layout. Presto will try to pick a table + layout consisting of the smallest number of partitions based on filtering + predicates from the query. + +Available Statistics +-------------------- + +The following statistics are available in Presto: + + * For a table: + + * **row count**: the total number of rows in the table layout + + * For each column in a table: + + * **data size**: the size of the data that needs to be read + * **nulls fraction**: the fraction of null values + * **distinct value count**: the number of distinct values + * **low value**: the smallest value in the column + * **high value**: the largest value in the column + +The set of statistics available for a particular query depends on the connector +being used and can also vary by table or even by table layout. For example, the +Hive connector does not currently provide statistics on data size. + +Table statistics can be displayed via the Presto SQL interface using the +:doc:`/sql/show-stats` command. For the Hive connector, refer to the +:ref:`Hive connector ` documentation to learn how to update table +statistics.