Skip to content

CSV IT spec#142585

Merged
idegtiarenko merged 37 commits intoelastic:mainfrom
idegtiarenko:csv_it
Feb 27, 2026
Merged

CSV IT spec#142585
idegtiarenko merged 37 commits intoelastic:mainfrom
idegtiarenko:csv_it

Conversation

@idegtiarenko
Copy link
Copy Markdown
Contributor

@idegtiarenko idegtiarenko commented Feb 17, 2026

This change aims to replace CsvTests (that uses query engine with lot of stubs) with CsvIT (build on top of single node InternalTestCluster).

This allows:

  • reduce amount of test stubs and rely on actual production code when executing tests
  • reduce amount of assumptions: existing test skips ~1.6k or ~31% of scenarios. The new one supports much wider set of features (including views, subqueries and lookup join) and will likely only need to skip infference related spec (5-10%)
  • debug wider set of tests from ide
  • speedup test execution (~1m20s vs ~3m15s): the new test is faster as it only indexes data ones opposed to existing test that has to read data from resource files every time it is queried.

@idegtiarenko idegtiarenko added >test Issues or PRs that are addressing/adding tests Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v9.4.0 labels Feb 17, 2026
Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between CsvIT and the single-node EsqlSpecIT?

Does the InternalTestCluster run on the same jvm as the test suite and thus we can debug easier without starting Debug Elasticsearch from within IntelliJ, or something like that?

@idegtiarenko
Copy link
Copy Markdown
Contributor Author

What's the difference between CsvIT and the single-node EsqlSpecIT?

EsqlSpecIT is blackbox, it runs in separate jvm and we query it via rest only. CsvIT runs in in the same jvm, we query transport dirrectly.

@alex-spies
Copy link
Copy Markdown
Contributor

Can confirm these run faster than the regular csv tests, btw (twice as fast!), so for that alone this change is quite nice already.

CsvTests:
image

CsvIT:
image

Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't provide a deep review, but agree with the approach here. As mentioned above, I can see that this is a big improvement.

;

count_over_time_of_date_nanos_promql
count_over_time_of_date_nanos_promql-Ignore
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we disabling this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is implicitly skipped by relying on a non existing capability promql_date_nanos_support_v0.
I believe it is more obvious to mark it Ignore in order to skip.

@astefan astefan self-requested a review February 18, 2026 14:02
Copy link
Copy Markdown
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @idegtiarenko, LGTM

I left just one minor comment.

I think it could still make sense to keep the old CsvTests for now, just because they are not too expensive in general, compared to the total CI execution time, and because they could give us some flexibility for some specific tests (e.g. they already have some manipulation of the optimization rules in there, and we could extend that to randomize the optimizations in the future)

EMPLOYEES_NOT_REHIRED
);

public static final List<InferenceConfig> INFERENCE_CONFIGS = List.of(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see these are only used in findInferenceConfigByName(), but not at creation time (eg. line 882). Is it on purpose? Can't we iterate on these in createInferenceEndpoints?

idegtiarenko and others added 4 commits February 20, 2026 16:07
# Conflicts:
#	x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/EsqlSpecTestCase.java
#	x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/GenerativeRestTest.java
#	x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvAssert.java
#	x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvTestUtils.java
#	x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvTestsDataLoader.java
@idegtiarenko idegtiarenko marked this pull request as ready for review February 20, 2026 15:31
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, consider a consistent and comprehensive javadoc for CsvIT class. We already have people confused about how those csv-spec files are used in multiple unit tests/integration tests and what have you. Look at the example in CsvTests for inspiration. Thank you.

import static org.hamcrest.Matchers.greaterThan;
import static org.hamcrest.Matchers.hasSize;

public class CsvIT extends ESTestCase {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class needs a different name and a very good Javadoc to explain the difference between it and CsvTests. This will confuse a lot of us on the minor difference between these two and which one is doing what

new ViewConfig("employees_rehired"),
new ViewConfig("employees_not_rehired")
);
).collect(toMap(ViewConfig::name, Function.identity()));;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double semi-colon


private static void loadEnrichPolicy(EnrichPolicyResolver.LookupRequest request) {
for (var name : request.policyNames) {
enrich.maybeLoad(CsvTestsDataLoader.ENRICH_POLICIES.get(name));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here could you adjust for the possibility of .get(name) returning null?

}

private static void loadInference(GetInferenceModelAction.Request request) {
inference.maybeLoad(INFERENCE_CONFIGS.get(request.getInferenceEntityId()));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here could you adjust for the possibility of .get(request.....) returning null?

| STATS foo = MAX(num), count = COUNT(num) BY country
| WHERE country IS NULL
;
warning:Line 1:23 (in view [employees_not_rehired]): evaluation of [is_rehired == false] failed, treating result as null. Only first 20 failures recorded.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove the line:column values? Imho, these are an essential part of the UX when writing ESQL queries.

Stream.of(request.indices()).flatMap(pattern -> {
assert pattern.contains("<") == false : "Date-math is not supported in test";
if (pattern.contains("*")) {
assert pattern.endsWith("*") : "Only suffix patterns are supported in test";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this restriction? Can you add a comment in code for this, please?

@idegtiarenko idegtiarenko merged commit 4afc895 into elastic:main Feb 27, 2026
35 checks passed
@idegtiarenko idegtiarenko deleted the csv_it branch February 27, 2026 07:42
PeteGillinElastic pushed a commit to PeteGillinElastic/elasticsearch that referenced this pull request Feb 27, 2026
szybia added a commit to szybia/elasticsearch that referenced this pull request Feb 27, 2026
…cations

* upstream/main: (35 commits)
  Create ARM bulk sqrI8 implementation (elastic#142461)
  Rework get-snapshots predicates (elastic#143161)
  Refactor downsampling fetchers and producers (elastic#140357)
  ESQL: Unmute test and add extra logging to generative test validation (elastic#143168)
  Fix metadata fields being nullified/loaded by unmapped_fields setting (elastic#143155)
  Determine remote cluster version (elastic#142494)
  Populate failure message for aborted clones (elastic#143206)
  Allow kibana_system role to read and manage logs streams (elastic#143053)
  Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:eval.DocsLength} elastic#143224
  Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:eval.DocsByteLength} elastic#143223
  Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:docs.DocsBitLength} elastic#143222
  Fix FloatVectorScorerSupplier bulkScore bug (elastic#143211)
  ESQL: Add data node execution for external sources (elastic#143209)
  [ESQL] Cleanup commands docs (elastic#143058)
  [ML]Fix latest transforms disregarding updates when sort and sync fields are non-monotonic (elastic#142856)
  Mute org.elasticsearch.index.mapper.IpFieldMapperTests testSyntheticSourceInObject elastic#143212
  Tests: Fix StoreDirectoryMetricsIT (elastic#143084)
  ESQL: Add distribution strategy for external sources (elastic#143194)
  CSV IT spec (elastic#142585)
  Fix VectorScorerOSQBenchmark.score to read corrections properly (elastic#143137)
  ...
tballison pushed a commit to tballison/elasticsearch that referenced this pull request Mar 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >test Issues or PRs that are addressing/adding tests v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants