-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ESQL: Add javadoc that explains version-aware planning #139706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
6f297b1
1bfdc3e
ae1494a
60ae208
18a233e
9a8ba11
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -10,8 +10,69 @@ | |
| import org.elasticsearch.TransportVersion; | ||
|
|
||
| /** | ||
| * A wrapper for objects to pass along the minimum transport version available in the cluster (and remotes), | ||
| * esp. when dealing with listeners. Where this object gets consumed, we generally need to assume that all nodes in the cluster | ||
| * (and remotes) are at the minimum transport version, so that we don't use features not supported everywhere. | ||
| * ESQL sometimes can only create query plans in a certain way if all nodes in the cluster (and participating remotes) | ||
| * are at a certain minimum transport version. This is a wrapper to carry the minimum version along with objects that | ||
| * get created during planning. Where this object gets consumed, we need to assume that all nodes are at the minimum | ||
| * transport version, so that we don't use features not supported everywhere. | ||
| * | ||
| * <h2>How does this work, in general?</h2> | ||
| * | ||
| * When an ESQL request comes in, we determine the minimum transport version of all nodes in the cluster based on the | ||
| * {@link org.elasticsearch.cluster.ClusterState}. In case of cross-cluster search, we also get the minimum versions of | ||
| * remote clusters from the field caps response during main index resolution. The minimum of the local and remote cluster | ||
| * versions is the overall minimum transport version for the request. | ||
| * <p> | ||
| * The minimum version is available in the analyzer and optimizers and can be used to make query plans that only | ||
| * work if all nodes are at or above a certain version. This is not required for new language features (commands, | ||
| * functions etc.) as it's fine to fail on the transport layer when a user uses a feature that is not supported on | ||
| * all nodes. Examples where this is required include: | ||
| * <ul> | ||
| * <li> | ||
| * When a previously unsupported ES data type gets support: when an old node participates in the query, | ||
| * the type needs to be treated as unsupported to avoid serialization errors and semantically invalid results. | ||
| * See {@link org.elasticsearch.xpack.esql.core.type.DataType}</li> | ||
| * <li> | ||
| * When an optimization relies on a feature that is only available in newer versions, but it would affect | ||
| * queries that are already supported on older clusters.</li> | ||
| * </ul> | ||
| * | ||
| * <h2>How to make version-sensitive changes</h2> | ||
| * | ||
| * Let's say that we want to create an optimization for ESQL queries like | ||
| * {@code ... | STATS ... BY field | SORT field DESC | LIMIT 10}. The optimization would fuse the commands | ||
| * into a single {@code TopNAggregate}, which corresponds to a new operator that only keeps at most N=10 groups in memory, | ||
| * as only the highest values for {@code field} will make it into the output. | ||
| * <p> | ||
| * Old nodes don't support this new {@code TopNAggregate} class, but they can already execute this query. Therefore, we | ||
| * need to make sure that this optimization is only applied when all nodes are at or above the version that introduced | ||
| * {@code TopNAggregate}, or we will break backward compatibility. | ||
| * <p> | ||
| * To achieve this, the optimizer rule can use the minimum transport version from the | ||
| * {@link org.elasticsearch.xpack.esql.optimizer.LogicalOptimizerContext}. If the minimum version is at or above the required version, | ||
| * the optimization can be applied; otherwise, it must be skipped. | ||
| * <p> | ||
| * The minimum version is available throughout the planning process; it can also be used in the analyzer | ||
| * ({@link org.elasticsearch.xpack.esql.analysis.AnalyzerContext}), or during physical planning and optimization | ||
| * ({@link org.elasticsearch.xpack.esql.planner.mapper.Mapper}, {@link org.elasticsearch.xpack.esql.optimizer.PhysicalOptimizerContext}). | ||
| * | ||
| * <h2>TRIPLE CAUTION: Backports of version-aware query planning PRs</h2> | ||
| * | ||
| * Because we use a single, global minimum transport version, backporting changes that introduce a version-sensitive planning | ||
| * change need to be handled with great care. Backporting such a change to multiple prior versions leads to situations where | ||
| * the minimum transport version indicates that a feature is supported, but some nodes don't actually support it | ||
| * because they're not on the latest patch version. | ||
| * <p> | ||
| * Let's say we introduce an optimization in 9.5.0 that requires a different query plan that older nodes | ||
| * don't support. Backporting this to 9.4.9 is likely fine, as the minimum transport version of a mixed 9.4.x/9.5+ setup | ||
| * (cluster or CCS) will be 9.4.x, so this optimization will be (correctly) disabled on pre-9.4.9 nodes. | ||
| * <p> | ||
| * However, if we also backport to 9.3.7, we have a problem: A 9.3.7/9.4.x/9.5+ setup will have a minimum transport version | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we have a solution, or we leave that to the author to figure out when the face this issue? Or is this a case where we simple cannot backport fixes to more than one minor, ever?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To solve this we'd have to be aware of not only the global minimum version, but at least of the minimum version per cluster. (Assuming that each cluster isn't in a rolling upgrade from a version with support to a version without support for a given version-aware planning change. Theoretically, knowing all different versions that participate is needed to solve this in all situations.) I didn't want to write this here because I don't think that's complexity we want to introduce. For the time being, I'd say that backporting a version-aware planning change more than 1 minor version is just not supported. |
||
| * of 9.3.7, so the optimization will be enabled even if the 9.4.x nodes don't support the new plan because they're not patched | ||
| * to 9.4.9 yet. | ||
| * <p> | ||
| * Rolling upgrades from 9.3.7 to 9.4.8 would have the same problem. That's generally ok, as rolling upgrades should be performed | ||
| * to the latest patch version of a given minor version; however, if 9.3.7 gets released before 9.4.9, users may not be | ||
| * able to perform rolling upgrades safely at all. | ||
| * | ||
| */ | ||
| public record Versioned<T>(T inner, TransportVersion minimumVersion) {} | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is not a session, I wonder if it justifies to rename the class to something else.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. Thing is, we also have
PlanExecutorthat has overlap in responsibilities, becauseEsqlSessionalso has methods to execute ESQL queries. Maybe the execution stuff should wholly be moved toPlanExecutorand what remains inEsqlSessionwould then be query planning logic - justifying a rename toQueryPlanner, maybe?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, that's something for another PR IMHO.