[ES|QL|DS] Wire parallel parsing into production for text formats by costin · Pull Request #143997 · elastic/elasticsearch

costin · 2026-03-11T09:43:46Z

ParallelParsingCoordinator was fully built and tested but never called from production code — text files (CSV, NDJSON) were always read single-threaded per driver regardless of size. This wires SegmentableFormatReader implementations through parallelRead() so large files are split into byte-range segments and parsed concurrently.
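As a rough illustration of what "split into byte-range segments and parsed concurrently" can look like, here is a minimal, self-contained sketch. It is not the ParallelParsingCoordinator implementation: the class `SegmentSplitSketch`, the `Segment` record, and both methods are hypothetical names invented for this example. It also assumes every `\n` byte ends a record, so it ignores quoted newlines inside CSV fields.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from the Elasticsearch code base.
public class SegmentSplitSketch {

    /** A half-open byte range [start, end) that begins at the start of a record. */
    public record Segment(int start, int end) {}

    /** Splits newline-delimited data into at most maxSegments ranges aligned to line boundaries. */
    public static List<Segment> split(byte[] data, int maxSegments) {
        List<Segment> segments = new ArrayList<>();
        if (data.length == 0) {
            return segments;
        }
        int n = Math.max(1, maxSegments);
        int target = (data.length + n - 1) / n; // ceiling division keeps the segment count <= n
        int start = 0;
        while (start < data.length) {
            int end = Math.min(data.length, start + target);
            // Extend the range until it ends right after a newline, so no record is cut in half.
            while (end < data.length && data[end - 1] != '\n') {
                end++;
            }
            segments.add(new Segment(start, end));
            start = end;
        }
        return segments;
    }

    /** Parses each segment on its own thread and concatenates the rows in file order. */
    public static List<String> parseParallel(byte[] data, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, parallelism));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Segment s : split(data, parallelism)) {
                Callable<List<String>> task = () -> parseSegment(data, s);
                futures.add(pool.submit(task));
            }
            List<String> rows = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                rows.addAll(f.get()); // futures are consumed in submission order
            }
            return rows;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Stand-in for a real format parser: returns the non-empty lines of one segment. */
    static List<String> parseSegment(byte[] data, Segment s) {
        String text = new String(data, s.start(), s.end() - s.start(), StandardCharsets.UTF_8);
        List<String> rows = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (!line.isEmpty()) {
                rows.add(line);
            }
        }
        return rows;
    }
}
```

Aligning on the `\n` byte is safe for UTF-8 input because that byte never occurs inside a multi-byte sequence. A real CSV reader would additionally need to decide how to treat newlines inside quoted fields.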

Adds a parsing_parallelism QueryPragma (defaults to allocated processors, overridable per query) and propagates it through the operator factory. Parallel parsing is gated on file size and skipped when row limits are pushed down.
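The gating described above can be pictured with a small sketch. Everything here is an assumption made for illustration: the class `ParallelGateSketch`, its method names, and especially the `MIN_PARALLEL_BYTES` threshold, since the description does not state the actual size cut-off. `Runtime.availableProcessors()` stands in for the allocated-processors default.

```java
// Hypothetical sketch of the gating logic; names and threshold are not from the PR.
public class ParallelGateSketch {

    // Assumed cut-off for illustration only; the real threshold is not given in the description.
    static final long MIN_PARALLEL_BYTES = 64L * 1024 * 1024;

    /** Resolves the effective parallelism: a per-query override wins, otherwise use the processor count. */
    public static int resolveParallelism(Integer pragmaOverride) {
        int defaultParallelism = Runtime.getRuntime().availableProcessors();
        return pragmaOverride == null ? defaultParallelism : Math.max(1, pragmaOverride);
    }

    /** Parallel parsing is used only for large files and only when no row limit was pushed down. */
    public static boolean useParallelParsing(long fileSizeBytes, int parallelism, boolean limitPushedDown) {
        return parallelism > 1 && limitPushedDown == false && fileSizeBytes >= MIN_PARALLEL_BYTES;
    }
}
```

Skipping the parallel path when a limit is pushed down is a sensible trade-off under this reading: a query that needs only the first few rows would otherwise pay for parsing segments whose output is discarded.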

Developed using AI-assisted tooling

ParallelParsingCoordinator was fully built and tested but never called from production code. Text files were always read single-threaded per driver regardless of size. Route SegmentableFormatReader implementations (CSV, NDJSON) through ParallelParsingCoordinator.parallelRead() so large files are split into byte-range segments and parsed concurrently. Closes elastic/esql-planning#341

elasticsearchmachine · 2026-03-11T09:44:11Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2026-03-11T09:44:11Z

Hi @costin, I've created a changelog YAML for you.

The first segment of parallel parsing used readSplit() with resolvedAttributes, which skips header parsing. For CSV files this meant the header line was treated as data. Use read() for the first segment so the format reader handles its own header internally, and readSplit() only for subsequent segments.
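To make the header bug concrete, the following sketch mimics the two entry points with simplified stand-ins. The `read` and `readSplit` methods below are hypothetical simplifications that share only their names with the real format-reader methods; their signatures, the class `HeaderAwareSegmentsSketch`, and the string-based segments are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: simplified stand-ins for the header-aware and split read paths.
public class HeaderAwareSegmentsSketch {

    /** Split read: the schema is already resolved, so every line is treated as data. */
    static List<List<String>> readSplit(String segment) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : segment.split("\n")) {
            if (!line.isEmpty()) {
                rows.add(Arrays.asList(line.split(",")));
            }
        }
        return rows;
    }

    /** Header-aware read: consumes the first line as the header and returns only data rows. */
    static List<List<String>> read(String segment) {
        List<List<String>> rows = readSplit(segment);
        return rows.isEmpty() ? rows : new ArrayList<>(rows.subList(1, rows.size()));
    }

    /** The fix: read() for the first segment, readSplit() for all subsequent segments. */
    public static List<List<String>> readAll(List<String> segments) {
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) {
            rows.addAll(i == 0 ? read(segments.get(i)) : readSplit(segments.get(i)));
        }
        return rows;
    }
}
```

With segments `"id,name\n1,a\n"` and `"2,b\n3,c\n"`, calling `readSplit` on the first segment would yield `[id, name]` as a data row, which is the symptom described in the commit message; `readAll` drops it.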

bpintea · 2026-03-11T17:55:08Z

...c/main/java/org/elasticsearch/xpack/esql/datasources/AsyncExternalSourceOperatorFactory.java

 */
 public class AsyncExternalSourceOperatorFactory implements SourceOperator.SourceOperatorFactory {

+    private static final Logger logger = LogManager.getLogger(AsyncExternalSourceOperatorFactory.class);


Nit: not used.

…elocations

* upstream/main: (54 commits)
  [ES|QL|DS] Wire parallel parsing into production for text formats (elastic#143997)
  ESQL: Allow EXTERNAL commands be run part of the CsvTests suite (elastic#143970)
  [ESQL] Push stats to external source via metadata (elastic#143940)
  Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:approximation.Approximate stats with stats where} elastic#144051
  Refactored SortedNumericDocValuesSyntheticFieldLoader into a Layer (elastic#143912)
  Enable extended doc_values params feature flag in RandomizedRollingUpgradeIT (elastic#143918)
  Mute org.elasticsearch.xpack.esql.qa.multi_node.EsqlSpecIT test {csv-spec:approximation.Approximate stats with sample} elastic#144022
  Ensure we use float values for rolling upgrade float vectors (elastic#144032)
  Remove sensitive info from reindex task description (elastic#143635)
  Fix HistogramUnionState.equals (elastic#143990)
  Use dedicated IndexRouting API in ShardSplittingQuery (elastic#143776)
  Engine/Store DistributedArchitectureGuide doc (elastic#143818)
  Mute org.elasticsearch.snapshots.ConcurrentSnapshotsIT testDeletesAreBatched elastic#144034
  Avoid serializing exceptions as JSON in remote write endpoint (elastic#143987)
  allow testLoadDocSequenceReturnsCorrectResultsText to circuit break, it happens in serverless occasionally (elastic#144023)
  [ESQL] Adds memory accounting to GroupedLimitOperator (elastic#143941)
  Adjust ESIntegTestCase.getLiveDocs method to account for pruned sequence numbers (elastic#143999)
  Support target bucket count in `TBUCKET` with explicit from/to date range (elastic#142747)
  TSDBDocValuesFormatSingleNodeTests with and without synthetic id (elastic#144002)
  Fix circuit breaker leak in BreakingTDigestHolder (elastic#143873)
  ...

costin added the >enhancement, :Analytics/ES|QL (AKA ESQL), v9.4.0, and ES|QL|DS (ES|QL datasources) labels Mar 11, 2026

costin requested a review from bpintea March 11, 2026 09:43

elasticsearchmachine added the Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) label Mar 11, 2026

Update docs/changelog/143997.yaml

2ab5b73

costin enabled auto-merge (squash) March 11, 2026 09:44

costin added 2 commits March 11, 2026 14:55

Merge branch 'main' into esql/wire-parallel-parsing

95c4a0f

bpintea approved these changes Mar 11, 2026

costin merged commit ad604d4 into elastic:main Mar 11, 2026
36 checks passed

costin deleted the esql/wire-parallel-parsing branch March 11, 2026 17:55

costin merged 4 commits into elastic:main from costin:esql/wire-parallel-parsing

3 participants