Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -63,11 +63,17 @@
import static org.hamcrest.Matchers.nullValue;

/**
* Creates indices with all supported fields and fetches values from them to
* confirm that release builds correctly handle data types, even if they were
* introduced in later versions.
* Queries like {@code FROM * | KEEP *} can include columns of unsupported types,
* and we can run into serialization and correctness issues in mixed version clusters/CCS
* when support for a type is added in a later version.
* <p>
* Entirely skipped in snapshot builds; data types that are under
* This creates indices with all index-able fields and fetches values from them to
* confirm that we correctly handle data types, even if they were introduced in later versions.
* Generally, this means that a type is treated as unsupported if any older node is involved.
* See {@link org.elasticsearch.xpack.esql.session.Versioned} for more details on how ESQL
* handles planning for mixed version clusters.
* <p>
* This suite is entirely skipped in snapshot builds; data types that are under
* construction are normally tested well enough in spec tests, skipping
* old versions via {@link org.elasticsearch.xpack.esql.action.EsqlCapabilities}.
* <p>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,22 @@
* processing pipeline, and types which the language doesn't support, but require
* special handling anyway (e.g. {@link DataType#OBJECT})
*
* <h2>Behavior of new, previously unsupported data types</h2>
*
* Data types that have support in ES indices, but are not yet supported in ES|QL, are
* treated as {@link #UNSUPPORTED} by ES|QL. Fields of that type are filled with
* {@code null} values, and no functions support them.
* In query plans, these fields amount to
* {@link org.elasticsearch.xpack.esql.expression.function.UnsupportedAttribute}s.
* <p>
* When such a type gets support in ES|QL, query plans cannot contain it
* unless all nodes in the cluster (and remote clusters participating in the query)
* support it to avoid serialization errors and semantically invalid results.
* This is an example of version-aware query planning,
* see {@link org.elasticsearch.xpack.esql.session.Versioned}.
*
* <h2>Process for adding a new data type</h2>
*
* We assume that the data type is already supported in ES indices, but not in
* ES|QL. Types that aren't yet enabled in ES will require some adjustments to
* the process.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,17 @@
import static org.elasticsearch.xpack.esql.plan.logical.join.InlineJoin.firstSubPlan;
import static org.elasticsearch.xpack.esql.session.SessionUtils.checkPagesBelowSize;

/**
* Combines all components necessary for the coordinating node to plan and execute an ESQL query,
* including (pre-)analyzing, optimizing and running the physical plan.
* <p>
* In particular, this is where we perform remote calls to pre-analyze the query,
* that is, to resolve indices, enrich policies and their mappings.
* <p>
* Note that this is not a session in the traditional sense of a stateful connection. This will
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is not a session, I wonder if it justifies to rename the class to something else.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. Thing is, we also have PlanExecutor that has overlap in responsibilities, because EsqlSession also has methods to execute ESQL queries. Maybe the execution stuff should wholly be moved to PlanExecutor and what remains in EsqlSession would then be query planning logic - justifying a rename to QueryPlanner, maybe?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, that's something for another PR IMHO.

* produce a single result set that is either returned to the user directly or stored for
* later retrieval if the query was async.
*/
public class EsqlSession {

private static final Logger LOGGER = LogManager.getLogger(EsqlSession.class);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,69 @@
import org.elasticsearch.TransportVersion;

/**
* A wrapper for objects to pass along the minimum transport version available in the cluster (and remotes),
* esp. when dealing with listeners. Where this object gets consumed, we generally need to assume that all nodes in the cluster
* (and remotes) are at the minimum transport version, so that we don't use features not supported everywhere.
* ESQL sometimes can only create query plans in a certain way if all nodes in the cluster (and participating remotes)
* are at a certain minimum transport version. This is a wrapper to carry the minimum version along with objects that
* get created during planning. Where this object gets consumed, we need to assume that all nodes are at the minimum
* transport version, so that we don't use features not supported everywhere.
*
* <h2>How does this work, in general?</h2>
*
* When an ESQL request comes in, we determine the minimum transport version of all nodes in the cluster based on the
* {@link org.elasticsearch.cluster.ClusterState}. In case of cross-cluster search, we also get the minimum versions of
* remote clusters from the field caps response during main index resolution. The minimum of the local and remote cluster
* versions is the overall minimum transport version for the request.
* <p>
* The minimum version is available in the analyzer and optimizers and can be used to make query plans that only
* work if all nodes are at or above a certain version. This is not required for new language features (commands,
* functions etc.) as it's fine to fail on the transport layer when a user uses a feature that is not supported on
* all nodes. Examples where this is required include:
* <ul>
* <li>
* When a previously unsupported ES data type gets support: when an old node participates in the query,
* the type needs to be treated as unsupported to avoid serialization errors and semantically invalid results.
* See {@link org.elasticsearch.xpack.esql.core.type.DataType}</li>
* <li>
* When an optimization relies on a feature that is only available in newer versions, but it would affect
* queries that are already supported on older clusters.</li>
* </ul>
*
* <h2>How to make version-sensitive changes</h2>
*
* Let's say that we want to create an optimization for ESQL queries like
* {@code ... | STATS ... BY field | SORT field DESC | LIMIT 10}. The optimization would fuse the commands
* into a single {@code TopNAggregate}, which corresponds to a new operator that only keeps at most N=10 groups in memory,
* as only the highest values for {@code field} will make it into the output.
* <p>
* Old nodes don't support this new {@code TopNAggregate} class, but they can already execute this query. Therefore, we
* need to make sure that this optimization is only applied when all nodes are at or above the version that introduced
* {@code TopNAggregate}, or we will break backward compatibility.
* <p>
* To achieve this, the optimizer rule can use the minimum transport version from the
* {@link org.elasticsearch.xpack.esql.optimizer.LogicalOptimizerContext}. If the minimum version is at or above the required version,
* the optimization can be applied; otherwise, it must be skipped.
* <p>
* The minimum version is available throughout the planning process; it can also be used in the analyzer
* ({@link org.elasticsearch.xpack.esql.analysis.AnalyzerContext}), or during physical planning and optimization
* ({@link org.elasticsearch.xpack.esql.planner.mapper.Mapper}, {@link org.elasticsearch.xpack.esql.optimizer.PhysicalOptimizerContext}).
*
* <h2>TRIPLE CAUTION: Backports of version-aware query planning PRs</h2>
*
* Because we use a single, global minimum transport version, backporting changes that introduce a version-sensitive planning
* change need to be handled with great care. Backporting such a change to multiple prior versions leads to situations where
* the minimum transport version indicates that a feature is supported, but some nodes don't actually support it
* because they're not on the latest patch version.
* <p>
* Let's say we introduce an optimization in 9.5.0 that requires a different query plan that older nodes
* don't support. Backporting this to 9.4.9 is likely fine, as the minimum transport version of a mixed 9.4.x/9.5+ setup
* (cluster or CCS) will be 9.4.x, so this optimization will be (correctly) disabled on pre-9.4.9 nodes.
* <p>
* However, if we also backport to 9.3.7, we have a problem: A 9.3.7/9.4.x/9.5+ setup will have a minimum transport version
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a solution, or we leave that to the author to figure out when the face this issue? Or is this a case where we simple cannot backport fixes to more than one minor, ever?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To solve this we'd have to be aware of not only the global minimum version, but at least of the minimum version per cluster. (Assuming that each cluster isn't in a rolling upgrade from a version with support to a version without support for a given version-aware planning change. Theoretically, knowing all different versions that participate is needed to solve this in all situations.)

I didn't want to write this here because I don't think that's complexity we want to introduce. For the time being, I'd say that backporting a version-aware planning change more than 1 minor version is just not supported.

* of 9.3.7, so the optimization will be enabled even if the 9.4.x nodes don't support the new plan because they're not patched
* to 9.4.9 yet.
* <p>
* Rolling upgrades from 9.3.7 to 9.4.8 would have the same problem. That's generally ok, as rolling upgrades should be performed
* to the latest patch version of a given minor version; however, if 9.3.7 gets released before 9.4.9, users may not be
* able to perform rolling upgrades safely at all.
*
*/
public record Versioned<T>(T inner, TransportVersion minimumVersion) {}