[ES|QL] Enable CCS tests for subqueries #137776

Merged
fang-xing-esql merged 26 commits into elastic:main from fang-xing-esql:enable-subquery-ccs-tests
Dec 5, 2025

Conversation

@fang-xing-esql
Member

@fang-xing-esql fang-xing-esql commented Nov 8, 2025

This feature is only available in snapshot builds and has not been formally released yet.

This PR supports #136035 and https://github.com/elastic/esql-planning/issues/88

This PR addresses the following items:

  • Enabled CCS tests with subqueries in the following tests:

    • Added CrossClusterSubqueryIT
    • Added CrossClusterSubqueryUnavailableRemotesIT
    • Updated MultiClusterSpecIT to convert queries with subqueries to use remote index patterns, by recognizing multiple from commands in the query
    • Updated subquery.csv-spec to allow more queries to run with MultiClusterSpecIT: removed the fork tags and removed metadata from some of the queries.
  • Prune subqueries for which no valid index pattern is found, so that the query can continue as long as there is a valid IndexResolution for a subset of the subqueries.

    • Updated EsqlSession to recognize subquery index patterns that do not have a valid IndexResolution, and mark them as EMPTY_SUBQUERY
    • Added a new rule, PruneEmptyUnionAllBranch, to the Analyzer to prune empty subqueries.

Items that will be addressed in follow-up PRs:

  • There are still some restrictions on subqueries (with remote index patterns) combined with lookup join. These are marked as TODO in CrossClusterSubqueryIT and will be addressed in a separate PR.
  • The status and metrics collected in EsqlExecutionInfo may not correctly reflect what happens with fork or subqueries. This will be addressed in a separate PR, as we need to clarify the execution-time behavior: when something goes wrong with one subquery, how do we process the other subqueries?
  • Future changes will likely be needed to support CPS with subqueries.

**Prerequisite**:

This PR ensures consistent results for subquery and fork with CCS.

@fang-xing-esql fang-xing-esql added the test-release Trigger CI checks against release build label Nov 8, 2025
@elasticsearchmachine
Collaborator

Hi @fang-xing-esql, I've created a changelog YAML for you.

@fang-xing-esql fang-xing-esql marked this pull request as ready for review November 12, 2025 00:37
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 12, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Contributor

@smalyshev smalyshev left a comment

I reviewed part of it, will finish up tomorrow.

for (int i = 0; i <= input.length() - delimiterLength; i++) {
    char c = input.charAt(i);

    if (c == '(') {
Contributor

I hope we don't have any '(' characters inside strings anywhere; otherwise I think this would break.

Member Author

We don't have queries like this in CsvTests yet, and we don't check for tokens such as from, metadata, '|', ')' or '(' either. This is a very lightweight parsing utility that targets the existing subqueries in CsvTests. If more complex subqueries are added to CsvTests in the future, it will need to be enhanced.
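
To make the limitation concrete, here is a minimal sketch of a depth-tracking splitter. It is not the utility added in this PR; the names TopLevelSplitter and splitTopLevel are hypothetical. It splits on a delimiter only at parenthesis depth 0 and, like the utility discussed above, it does not understand string literals, so a '(' inside a quoted string would throw off the depth counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the code from this PR.
final class TopLevelSplitter {

    // Splits input on delimiter, ignoring delimiters nested inside parentheses.
    // Assumption: parentheses never appear inside string literals.
    static List<String> splitTopLevel(String input, char delimiter) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == delimiter && depth == 0) {
                // Only a delimiter outside all parentheses ends the current part.
                parts.add(input.substring(start, i).strip());
                start = i + 1;
            }
        }
        parts.add(input.substring(start).strip());
        return parts;
    }
}
```

Calling splitTopLevel("employees, (FROM sample_data | EVAL x = f(a, b)), logs", ',') returns three parts, because the comma inside the subquery sits at depth 2 and is ignored.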

/**
 * Convert index patterns and subqueries in FROM commands to use remote indices.
 */
private static String convertSubqueryToRemoteIndices(String testQuery) {
Contributor

What about SET commands that can precede FROM? Are those supported?

Member Author

SET is not tested, and special characters that the query parser could treat as tokens are not tested here either. We are not implementing a full query parser here, just making sure that all the queries in CsvTests that require the subquery capability work in this PR. If more complex queries are added to CsvTests, we will need to enhance it.

for (String indexPatternOrSubquery : indexPatternsAndSubqueries) {
    // remove the from keyword if it's there
    indexPatternOrSubquery = indexPatternOrSubquery.strip();
    if (indexPatternOrSubquery.toLowerCase(Locale.ROOT).startsWith("from ")) {
Contributor

This seems to imply that every element of indexPatternsAndSubqueries can start with FROM, but there could be only one FROM followed by a comma-separated list of expressions... Not sure if it's important for the particular queries in the tests, but it seems incorrect.

Member Author

@fang-xing-esql fang-xing-esql Nov 17, 2025

Each indexPatternOrSubquery could be one of the following:

  • from + index patterns: from index1
  • index patterns: index2
  • subqueries: (from index3 | where a > 0), subqueries will be processed recursively.
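
The three shapes listed above can be handled with a small recursive rewrite. The following is a sketch under stated assumptions, not the PR's convertSubqueryToRemoteIndices: the class name RemoteIndexRewriter, its methods, and the "remote:" prefix are all hypothetical, and METADATA options, SET commands and string literals are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; the names and the "remote:" prefix are assumptions.
final class RemoteIndexRewriter {

    static final String REMOTE_PREFIX = "remote:"; // assumed cluster alias

    // Rewrites a FROM source list such as "from a, b, (from c | where x > 0)".
    static String rewriteSources(String sources) {
        List<String> rewritten = new ArrayList<>();
        int start = 0;
        int comma;
        while ((comma = indexOfTopLevel(sources, ',', start)) >= 0) {
            rewritten.add(rewriteElement(sources.substring(start, comma).strip()));
            start = comma + 1;
        }
        rewritten.add(rewriteElement(sources.substring(start).strip()));
        return String.join(", ", rewritten);
    }

    static String rewriteElement(String element) {
        if (element.startsWith("(") && element.endsWith(")")) {
            // Subquery: rewrite its source list recursively, keep the rest of the pipeline.
            String inner = element.substring(1, element.length() - 1).strip();
            int pipe = indexOfTopLevel(inner, '|', 0);
            String from = pipe < 0 ? inner : inner.substring(0, pipe).strip();
            String rest = pipe < 0 ? "" : " " + inner.substring(pipe);
            return "(" + rewriteSources(from) + rest + ")";
        }
        if (element.toLowerCase(Locale.ROOT).startsWith("from ")) {
            // "from" keyword followed by an index pattern: keep the keyword as written.
            return element.substring(0, 5) + REMOTE_PREFIX + element.substring(5).strip();
        }
        // Bare index pattern.
        return REMOTE_PREFIX + element;
    }

    // Index of the first target character at parenthesis depth 0, or -1 if none.
    static int indexOfTopLevel(String s, char target, int fromIndex) {
        int depth = 0;
        for (int i = fromIndex; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == target && depth == 0) {
                return i;
            }
        }
        return -1;
    }
}
```

With these assumptions, rewriteSources("from index1, index2, (from index3 | where a > 0)") produces "from remote:index1, remote:index2, (from remote:index3 | where a > 0)".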

Contributor

Wait, are you saying you can do FROM index1, FROM index2, FROM index3?

Member Author

@fang-xing-esql fang-xing-esql Nov 18, 2025

> Wait, are you saying you can do FROM index1, FROM index2, FROM index3?

I don't think this is a valid CsvTests query. This method might be able to deal with it, but it will not see this pattern in queries from CsvTests.

// take care of removing the subquery during analysis.
// If all subqueries have invalid index resolution, we should fail in Analyzer's verifier.
if (r.indexResolution.isEmpty() == false // it is not a row
    && r.indexResolution.size() > 1 // there is a subquery
Contributor

Can subquery be the only clause in FROM? I understand it's kinda weird to do it, but is it possible?

Member Author

If a subquery is the only clause in a FROM, it will be merged into the main query by the parser; there is a test here.

String clusterAlias = entry.getKey();
String indexExpression = entry.getValue();
EsqlExecutionInfo.Cluster cluster = executionInfo.getCluster(clusterAlias);
if (indexExpression.equals(cluster.getIndexExpression())) {
Contributor

Not sure I understand this check: if it's the entry under this cluster's alias, why do we need to check again?

Member Author

clustersWithInvalidIndexResolutions is a map: its key is the cluster alias and its value is all of the index patterns that are marked as invalid because they are missing on that cluster.

If all of the indices are missing on a cluster, the cluster is marked as SKIPPED here in EsqlExecutionInfo. If some of the indices are available and some are missing, the cluster is not marked as SKIPPED. We do this check here mainly for situations where a remote cluster is referenced by multiple subqueries.
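
As a toy model of that check (a sketch only; SkippedClusterCheck and clustersToSkip are hypothetical names, and plain maps stand in for EsqlExecutionInfo), a cluster is skipped only when the invalid index expression recorded for it equals everything that was requested on it:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the check; types and names are hypothetical, not the EsqlExecutionInfo API.
final class SkippedClusterCheck {

    // Returns the aliases of clusters on which every requested index pattern was invalid.
    static Set<String> clustersToSkip(
        Map<String, String> requestedIndexExpressions, // alias -> everything requested on that cluster
        Map<String, String> invalidIndexExpressions    // alias -> the part that could not be resolved
    ) {
        Set<String> skipped = new TreeSet<>();
        for (Map.Entry<String, String> entry : invalidIndexExpressions.entrySet()) {
            String requested = requestedIndexExpressions.get(entry.getKey());
            // Skip only when nothing requested on this cluster resolved.
            if (entry.getValue().equals(requested)) {
                skipped.add(entry.getKey());
            }
        }
        return skipped;
    }
}
```

If remote1 was asked for "logs,metrics" and only "metrics" is invalid, remote1 stays in play; if remote2 was asked for "missing" and "missing" is invalid, remote2 is the only cluster returned.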

required_capability: subquery_in_from_command

FROM employees, (FROM sample_data | EVAL x = client_ip::keyword ) metadata _index
FROM employees, (FROM sample_data metadata _index | EVAL x = client_ip::keyword ) metadata _index
Contributor

What about this check:

        assumeFalse("can't test with _index metadata", (remoteMetadata == false) && hasIndexMetadata(testCase.query));

wouldn't it interfere with this test?

Member Author

It does. I'd like to keep some metadata tests for subqueries here, so I removed the metadata option from a few queries in this file, hoping to cover more subquery tests in MultiClusterSpecIT.

@@ -3,7 +3,6 @@
//

subqueryInFrom
Contributor

Question for my education - if we have two sets of metadata fields, like this: FROM index1, (FROM index2 METADATA _index, _id) METADATA _index, _version - what is the resulting field set here? Is it a union, an intersection or something else?

Member Author

It is a UnionAll of the results of all subqueries; if there are duplicate records among the subqueries, they are not removed.
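
As a minimal illustration of the row semantics described here (a sketch with a hypothetical UnionAllSketch class, not the engine's implementation), UnionAll amounts to concatenating the branch results without deduplication. The row order in this sketch is simply the concatenation order and should not be read as a guarantee about the engine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of UNION ALL row semantics.
final class UnionAllSketch {

    // Concatenates the rows of every branch; duplicates across branches are kept.
    static <T> List<T> unionAll(List<List<T>> branches) {
        List<T> rows = new ArrayList<>();
        for (List<T> branch : branches) {
            rows.addAll(branch);
        }
        return rows;
    }
}
```

For branches ["a", "b"] and ["b", "c"], the result contains "b" twice.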

Contributor

I am not asking about rows, but about column names: if there's _index in the subquery and in the main expression, how does it work? Does the main query's _index only fill the column for actual indices, not subqueries?

Member Author

The test has metadata on both the main query and the subquery; the indices referenced by both the main query and the subquery are returned.

) {
    // check if all clusters involved have skip_unavailable = true
    boolean allClustersSkipUnavailable = true;
    var groupedIndices = indicesGrouper.groupIndices(
Contributor

We can no longer rely on indicesGrouper, as it is not going to be available for CPS queries.

Contributor

I am going to open a PR with an alternative for it this week. I will keep you updated.

Contributor

I have merged #138396
You should be able to access grouped original or concrete indices from EsRelation or EsIndex now.
Please let me know if you need any help with it.

Member Author

The reference to IndicesExpressionGrouper when identifying empty subquery is removed.

}
// Check if a subquery need to be pruned. If some but not all the subqueries has invalid index resolution,
// and all the clusters referenced by the subquery that has invalid index resolution have skipUnavailable=true,
// try to prune it by setting IndexResolution to EMPTY_SUBQUERY. Analyzer.PruneEmptyUnionAllBranch will
Contributor

I believe this should also add a warning with remote and failure.

Member Author

@fang-xing-esql fang-xing-esql Nov 18, 2025

> I believe this should also add a warning with remote and failure.

Empty subqueries for which no valid index pattern is found after field caps are logged at debug level in this PR. EsqlExecutionInfo seems to be a good place to add details (for both planning and execution) at the subquery level. It has been added as a subtask here.
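
The pruning condition spelled out in the code comment under review can be sketched as a small decision function. This is an illustration under assumptions, not the logic in EsqlSession: SubqueryPruningSketch, Branch and Decision are hypothetical names, and NOT_PRUNED only means the sketch leaves the branch alone (per the code comments quoted in this review, the Analyzer's verifier is expected to fail when all subqueries are invalid).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the pruning decision; not the code in EsqlSession.
final class SubqueryPruningSketch {

    enum Decision { KEEP, PRUNE_AS_EMPTY_SUBQUERY, NOT_PRUNED }

    // Stand-in for the resolution outcome of one subquery (branch).
    record Branch(boolean hasValidResolution, boolean allClustersSkipUnavailable) {}

    static List<Decision> decide(List<Branch> branches) {
        boolean anyValid = false;
        for (Branch b : branches) {
            anyValid |= b.hasValidResolution();
        }
        List<Decision> decisions = new ArrayList<>();
        for (Branch b : branches) {
            if (b.hasValidResolution()) {
                decisions.add(Decision.KEEP);
            } else if (anyValid && b.allClustersSkipUnavailable()) {
                // Some other branch can still return data, and every cluster
                // behind this branch has skip_unavailable=true.
                decisions.add(Decision.PRUNE_AS_EMPTY_SUBQUERY);
            } else {
                decisions.add(Decision.NOT_PRUNED);
            }
        }
        return decisions;
    }
}
```

With one valid branch and one invalid branch whose clusters all have skip_unavailable=true, the second branch is pruned; when every branch is invalid, nothing is pruned.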

Contributor

@idegtiarenko idegtiarenko left a comment

We should not introduce more usages of indicesGrouper since it is not going to be available in CPS.

@fang-xing-esql
Member Author

Thank you so much for reviewing, @smalyshev @idegtiarenko! Stas's comments have been addressed (let me know if I missed anything), and I'll wait for Ievgen's PR and then adjust the usage of indicesGrouper.

// miss the results from a branch and return wrong results. Both RUNNING and SUCCESSFUL mean we should continue
// processing the next compute on remote cluster in the queue. If it is partial, it is fine to skip the subsequent
// branches as something went wrong already.
if (clusterStatus != EsqlExecutionInfo.Cluster.Status.RUNNING
Member Author

@smalyshev I'd like to discuss a bit more with you about the change here.

Contributor

This change would redefine how statuses work, and I am not sure that's the right thing to do. SUCCESSFUL means "we are done with this cluster and all went well"; PARTIAL means "we are done with this cluster and we had to skip some data". I don't think we should put a cluster into the SUCCESSFUL state before the main query is finished. Maybe we need more statuses here?

Contributor

@smalyshev smalyshev Nov 17, 2025

Also, I am not sure this:

> If it is partial, it is fine to skip the subsequent branches as something went wrong already.

is correct. If one subquery partially failed, but we did not set the cluster to SKIPPED, I am not convinced it's ok to skip the other subqueries. In fact, we may have to revisit when we set clusters to SKIPPED - e.g. if there's a missing index in one of the subqueries, and we have partial results on, it shouldn't skip this cluster in all other subqueries I think. Also, if one index is corrupt, it doesn't mean other subqueries will fail. Again, we may need to consider maybe we need more fine-grained checks here now that we have subqueries.

Member Author

@fang-xing-esql fang-xing-esql Nov 18, 2025

Discussed with @smalyshev offline. Today, the way the cluster status is updated in EsqlExecutionInfo does not quite work with the branch model of fork/subquery. The SUCCESSFUL status in EsqlExecutionInfo is set at the branch level (the same goes for the metrics saved in EsqlExecutionInfo), not at the whole-query level, so we might lose data from branches if branches overlap on a remote cluster.

This needs more discussion and should be addressed in a separate PR. We will need to find a better way to set cluster status and metrics in EsqlExecutionInfo. I'm going to remove this change from this PR and have this PR only add stable tests for subqueries with CCS (avoiding remote clusters that overlap between subqueries (branches) for now).
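
The data-loss scenario can be reproduced with a toy model. This is a deliberately simplified sketch, not the real compute service: BranchStatusSketch, Compute and run are hypothetical, and only the status names RUNNING and SUCCESSFUL are taken from the discussion. Each cluster's status is recorded as soon as one branch finishes on it, and a guard skips any compute whose cluster is no longer RUNNING.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of branch-level status tracking; not Elasticsearch code.
final class BranchStatusSketch {

    enum Status { RUNNING, SUCCESSFUL }

    // One unit of remote work: a branch (subquery) executing on a cluster.
    record Compute(String branch, String cluster) {}

    // Processes the queue in order and returns the computes that actually ran.
    static List<String> run(List<Compute> queue) {
        Map<String, Status> clusterStatus = new HashMap<>();
        List<String> executed = new ArrayList<>();
        for (Compute compute : queue) {
            Status status = clusterStatus.getOrDefault(compute.cluster(), Status.RUNNING);
            if (status != Status.RUNNING) {
                continue; // guard that assumes one compute per cluster
            }
            executed.add(compute.branch() + "@" + compute.cluster());
            // The status is recorded when the branch finishes, not when the whole query does.
            clusterStatus.put(compute.cluster(), Status.SUCCESSFUL);
        }
        return executed;
    }
}
```

When branch1 and branch2 both target remote1, the second compute on remote1 is dropped in this model, which is the kind of overlap the tests in this PR avoid for now.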

Member Author

This is the PR that ensures subquery/fork return deterministic results.

@fang-xing-esql
Member Author

> We should not introduce more usages of indicesGrouper since it is not going to be available in CPS.

@idegtiarenko The usage of indicesGrouper added by this PR has been removed. Could you take another look to see if there is anything left that might have a negative impact on CPS?

@idegtiarenko idegtiarenko self-requested a review December 2, 2025 12:00
    .anyMatch(ir -> ir.isValid() && ir.get().originalIndices().get(clusterAlias) != null);
if (clusterHasValidIndex == false) {
    String errorMsg = "no valid indices found in any subquery " + EsqlCCSUtils.inClusterName(clusterAlias);
    LOGGER.debug(errorMsg);
Contributor

Do we need this log statement here? If so, could we convert it to logger formatting instead of unconditional string concatenation?

Member Author

Done. This message is also attached to the failure/warning message for the skipped cluster.
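
To show why placeholder formatting was requested instead of string concatenation, here is a self-contained sketch. The Log class below is a hypothetical stand-in for a logger with "{}" placeholders, not the Log4j API; CountingArg counts how often it is rendered to text.

```java
// Hypothetical stand-in types for illustration; not the logging API used by Elasticsearch.
final class LazyLoggingSketch {

    // Counts how many times it has been converted to a string.
    static final class CountingArg {
        int rendered = 0;

        @Override
        public String toString() {
            rendered++;
            return "[remote1]";
        }
    }

    static final class Log {
        final boolean debugEnabled;

        Log(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        // Pre-built message: the caller has already paid for building the string.
        void debug(String message) {
            if (debugEnabled) {
                System.out.println(message);
            }
        }

        // Placeholder form: the argument is rendered only if debug is enabled.
        void debug(String pattern, Object arg) {
            if (debugEnabled) {
                System.out.println(pattern.replace("{}", String.valueOf(arg)));
            }
        }
    }
}
```

With debug disabled, log.debug("no valid indices found in any subquery " + arg) still renders arg once, while log.debug("no valid indices found in any subquery {}", arg) does not render it at all.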

@idegtiarenko idegtiarenko self-requested a review December 3, 2025 14:10
Comment on lines +616 to +617
String errorMsg = "no valid indices found in any subquery {}";
LOGGER.debug(errorMsg, EsqlCCSUtils.inClusterName(clusterAlias));
Contributor

Suggested change
- String errorMsg = "no valid indices found in any subquery {}";
- LOGGER.debug(errorMsg, EsqlCCSUtils.inClusterName(clusterAlias));
+ LOGGER.debug("no valid indices found in any subquery {}", EsqlCCSUtils.inClusterName(clusterAlias));

Contributor

@idegtiarenko idegtiarenko left a comment

Looks good from index resolution point of view 👍

@fang-xing-esql fang-xing-esql merged commit 733eee4 into elastic:main Dec 5, 2025
35 checks passed
@fang-xing-esql
Member Author

Thank you for reviewing @idegtiarenko @smalyshev !

@fang-xing-esql fang-xing-esql deleted the enable-subquery-ccs-tests branch January 30, 2026 14:39
Labels

  • :Analytics/ES|QL (AKA ESQL)
  • >enhancement
  • Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))
  • test-release (Trigger CI checks against release build)
  • v9.3.0

4 participants