prestodb · rschlussel · Jan 7, 2020 · Aug 13, 2018 · Aug 13, 2018
@@ -10,6 +10,7 @@ Presto Documentation
     installation
     security
     admin
+    optimizer
     connector
     functions
     language

@@ -0,0 +1,10 @@
+***************
+Query Optimizer
+***************
+
+.. toctree::
+    :maxdepth: 1
+
+    optimizer/statistics
+    optimizer/cost-in-explain
+    optimizer/cost-based-optimizations
@@ -0,0 +1,76 @@
+========================
+Cost based optimizations
+========================
+
+Presto supports several cost based optimizations, described below.
+
+Join Enumeration
+----------------
+
+The order in which joins are executed in a query can have a significant impact
+on the query's performance. The aspect of join ordering that has the largest
+impact on performance is the size of the data being processed and transferred
+over the network. If a join that produces a lot of data is performed early in
+the execution, then subsequent stages will need to process large amounts of
+data for longer than necessary, increasing the time and resources needed for
+the query.
+
+With cost based join enumeration, Presto uses
+cdoc:`/optimizer/statistics` provided by connectors to estimate
+the costs for different join orders and automatically pick the
+join order with the lowest computed costs.
+
+The join enumeration strategy is governed by the ``join_reordering_strategy``
+session property, with the ``optimizer.join-reordering-strategy``
+configuration property providing the default value.
+
+The valid values are:
+ * ``AUTOMATIC`` - full automatic join enumeration enabled
+ * ``ELIMINATE_CROSS_JOINS`` (default) - eliminate unnecessary cross joins
+ * ``NONE`` - purely syntactic join order
+
+If using ``AUTOMATIC`` and statistics are not available, or if for any other
+reason a cost could not be computed, the ``ELIMINATE_CROSS_JOINS`` strategy is
+used instead.
+
+Join Distribution Selection
+---------------------------
+
+Presto uses a hash based join algorithm. That implies that for each join
+operator a hash table must be created from one join input (called build side).
+The other input (probe side) is then iterated and for each row the hash table is
+queried to find matching rows.
+
+There are two types of join distributions:
+ * Partitioned: each node participating in the query builds a hash table
+   from only a fraction of the data
+ * Broadcast: each node participating in the query builds a hash table
+   from all of the data (data is replicated to each node)
+
+Each type have its trade offs. Partitioned joins require redistributing both
+tables using a hash of the join key. This can be slower (sometimes
+substantially) than broadcast joins, but allows much larger joins. In
+particular, broadcast joins will be faster if the build side is much smaller
+than the probe side. However, broadcast joins require that the tables on the
+build side of the join after filtering fit in memory on each node, whereas
+distributed joins only need to fit in distributed memory across all nodes.
+
+With cost based join distribution selection, Presto automatically chooses whether to
+use a partitioned or broadcast join. With both cost based join enumeration and cost based join distribution, Presto
+automatically chooses which side is the probe and which is the build.
+
+The join distribution type is governed by the ``join_distribution_type``
+session property, with the ``join-distribution-type`` configuration
+property providing the default value.
+
+The valid values are:
+ * ``AUTOMATIC`` - join distribution type is determined automatically
+   for each join
+ * ``BROADCAST`` - broadcast join distribution is used for all joins
+ * ``PARTITIONED`` (default) - partitioned join distribution is used for all join
+
+Connector Implementations
+-------------------------
+
+In order for the Presto optimizer to use the cost based strategies,
+the connector implementation must provide :doc:`statistics`.
@@ -0,0 +1,44 @@
+===============
+Cost in EXPLAIN
+===============
+
+During planning, the cost associated with each node of the plan is computed
+based on the table statistics for the tables in the query. This calculated
+cost is printed as part of the output of an :doc:`/sql/explain` statement.
+
+Cost information is displayed in the plan tree using the format ``{rows: XX
+(XX), cpu: XX, memory: XX, network: XX}``.  ``rows`` refers to the expected
+number of rows output by each plan node during execution.  The value in the
+parentheses following the number of rows refers to the expected size of the data
+output by each plan node in bytes. Other parameters indicate the estimated
+amount of CPU, memory, and network utilized by the execution of a plan node.
+These values do not represent any actual unit, but are numbers that are used to
+compare the relative costs between plan nodes, allowing the optimizer to choose
+the best plan for executing a query. If any of the values is not known, a ``?``
+is printed.
+
+For example:
+
+.. code-block:: none
+
+ presto:default> EXPLAIN SELECT comment FROM tpch.sf1.nation WHERE nationkey > 3;
+
+ - Output[comment] => [[comment]]
+         Estimates: {rows: 22 (1.69kB), cpu: 6148.25, memory: 0.00, network: 1734.25}
+     - RemoteExchange[GATHER] => [[comment]]
+             Estimates: {rows: 22 (1.69kB), cpu: 6148.25, memory: 0.00, network: 1734.25}
+         - ScanFilterProject[table = tpch:nation:sf1.0, filterPredicate = ("nationkey" > BIGINT '3')] => [[comment]]
+                 Estimates: {rows: 25 (1.94kB), cpu: 2207.00, memory: 0.00, network: 0.00}/{rows: 22 (1.69kB), cpu: 4414.00, memory: 0.00, network: 0.00}/{rows: 22 (1.69kB), cpu: 6148.25, memory: 0.00, network: 0.00}
+                 nationkey := tpch:nationkey
+                 comment := tpch:comment
+
+Generally, there is only one cost printed for each plan node.  However, when a
+``Scan`` operator is combined with a ``Filter`` and/or ``Project`` operator,
+then multiple cost structures will be printed, each corresponding to an
+individual logical part of the combined operator. For example, three cost
+structures will be printed for a ``ScanFilterProject`` operator, corresponding
+to the ``Scan``, ``Filter``, and ``Project`` parts of the operator, in that order.
+
+Estimated cost is also printed in :doc:`/sql/explain-analyze` in addition to actual
+runtime statistics.
+
@@ -0,0 +1,52 @@
+================
+Table Statistics
+================
+
+Presto supports statistics based optimizations for queries. For a query to take
+advantage of these optimizations, Presto must have statistical information for
+the tables in that query.
+
+Table statistics are provided to the query planner by connectors.  Currently, the
+only connector that supports statistics is the :doc:`/connector/hive`.
+
+Table Layouts
+-------------
+
+Statistics are exposed to the query planner by a table layout. A table layout
+represents a subset of a table's data and contains information about the
+organizational properties of that data (like sort order and bucketing).
+
+The number of table layouts available for a table and the details of those table
+layouts are specific to each connector.  Using the Hive connector as an example:
+
+* Non-partitioned tables have just one table layout representing all data in the table
+* Partitioned tables have a family of table layouts. Each set of partitions to
+  be scanned represents one table layout.  Presto will try to pick a table
+  layout consisting of the smallest number of partitions based on filtering
+  predicates from the query.
+
+Available Statistics
+--------------------
+
+The following statistics are available in Presto:
+
+ * For a table:
+
+   * **row count**: the total number of rows in the table layout
+
+ * For each column in a table:
+
+   * **data size**: the size of the data that needs to be read
+   * **nulls fraction**: the fraction of null values
+   * **distinct value count**: the number of distinct values
+   * **low value**: the smallest value in the column
+   * **high value**: the largest value in the column
+
+The set of statistics available for a particular query depends on the connector
+being used and can also vary by table or even by table layout. For example, the
+Hive connector does not currently provide statistics on data size.
+
+Table statistics can be displayed via the Presto SQL interface using the
+:doc:`/sql/show-stats` command. For the Hive connector, refer to the
+:ref:`Hive connector <hive_analyze>` documentation to learn how to update table
+statistics.