
Partition time-series source #140475

Merged
dnhatn merged 9 commits into elastic:main from dnhatn:partition-time-series-source
Jan 21, 2026
Conversation

@dnhatn (Member) commented Jan 9, 2026

Today, for rate aggregations, we can't partition a target shard by DOC or SEGMENT, only by SHARD. Because of this, we rely on ROUND_TO from time_bucket to partition the target shard into multiple slices. However, the number of slices can be either too many or too few. Too few large slices might under-utilize the CPUs and require a large amount of memory to buffer data points. Many smaller slices solve these issues but add significant overhead.

This change replaces the ROUND_TO partitioning with time-based partitioning for time-series sources. One potential overhead of this method is that ROUND_TO partitioning never has to compute round_to per timestamp: each slice maps to a single rounding point, so the query source supplies the rounded value as a constant.
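
To make the approach concrete, here is a minimal sketch of time-based slicing (not the actual operator code from this PR), assuming slices are aligned to the time_bucket interval and sized from a target slice count:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: split a shard's [minTs, maxTs] timestamp range (epoch millis)
// into contiguous slices aligned to the bucket interval, so that no time
// bucket ever spans two slices.
final class TimeSliceSketch {
    static List<long[]> timeSlices(long minTs, long maxTs, long intervalMillis, int targetSlices) {
        long buckets = (maxTs - minTs) / intervalMillis + 1;
        long bucketsPerSlice = Math.max(1, (buckets + targetSlices - 1) / targetSlices); // ceil division
        List<long[]> slices = new ArrayList<>();
        long start = minTs - Math.floorMod(minTs, intervalMillis); // align down to a bucket boundary
        while (start <= maxTs) {
            long end = start + bucketsPerSlice * intervalMillis - 1;
            slices.add(new long[] { Math.max(start, minTs), Math.min(end, maxTs) });
            start = end + 1;
        }
        return slices;
    }
}
```

Unlike deriving one slice per ROUND_TO point, the slice count here is decoupled from the bucket count, which is what avoids the too-many/too-few slices problem described above.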

I benchmarked this change (query latency, before -> after):

  • 100_hosts_1h: 350ms -> 220ms
  • 100_hosts_5m: 280ms -> 310ms (the 30ms difference is due to round_to, which the previous partitioning could avoid).
  • 10k_hosts_1h: 21s -> 20s
  • 10k_hosts_5m: 125s -> 60s

Although there is a performance regression for the 100_hosts_5m query, I think we should proceed with this change, as overall it should significantly speed up queries and manage memory better. I will look into reducing the overhead of round_to for time-series.

Related to #139186

@dnhatn dnhatn force-pushed the partition-time-series-source branch from 800ee11 to 41dcfd1 on January 11, 2026 06:43
@elastic elastic deleted a comment from elasticmachine Jan 11, 2026
@dnhatn dnhatn force-pushed the partition-time-series-source branch 3 times, most recently from 42d1175 to 4bc6ea2 on January 16, 2026 22:00
@dnhatn dnhatn force-pushed the partition-time-series-source branch 8 times, most recently from 81d2883 to 5a07957 on January 20, 2026 20:19
@elastic elastic deleted a comment from elasticmachine Jan 20, 2026
@dnhatn dnhatn force-pushed the partition-time-series-source branch 5 times, most recently from f141f16 to 7c758db on January 21, 2026 01:28
@dnhatn dnhatn force-pushed the partition-time-series-source branch from 7c758db to 1b4cd60 on January 21, 2026 04:48
@dnhatn dnhatn added the :StorageEngine/ES|QL (Timeseries / metrics / PromQL / logsdb capabilities in ES|QL) and >enhancement labels Jan 21, 2026
@elasticsearchmachine (Collaborator):

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn requested a review from kkrik-es January 21, 2026 05:49
);

public static final Setting<ByteSizeValue> RATE_BUFFER_SIZE = new Setting<>("esql.rate_buffer_size", settings -> {
long oneThird = JvmInfo.jvmInfo().getMem().getHeapMax().getBytes() / 2;
Contributor:

half? :)

Contributor:

Should this be less aggressive, e.g. 1/10? As is, two concurrent queries will trip the circuit breaker, or am I misreading this?

Member Author:

Yes, I updated the estimate and reduced this to 1/10.
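
For reference, the revised default might look like the sketch below. This is an assumption extrapolated from the snippet above, not the merged code; the parser argument and NodeScope property in particular are guesses.

```java
// Sketch only: cap the default rate buffer at one tenth of the max heap
// instead of half. Parser and property choices are assumed, not confirmed.
public static final Setting<ByteSizeValue> RATE_BUFFER_SIZE = new Setting<>(
    "esql.rate_buffer_size",
    settings -> {
        long oneTenth = JvmInfo.jvmInfo().getMem().getHeapMax().getBytes() / 10;
        return ByteSizeValue.ofBytes(oneTenth).getStringRep();
    },
    s -> ByteSizeValue.parseBytesSizeValue(s, "esql.rate_buffer_size"),
    Setting.Property.NodeScope
);
```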

}
String filter2 = "";
if (queryEndTs != null || randomBoolean()) {
filter2 = "| WHERE @timestamp <= \"" + (queryEndTs != null ? queryEndTs : maxTimestampFromData) + "\"";
Contributor:

TRANGE works too? Maybe add a test for it.

Member Author:

I added a test for TRANGE in dbe959a
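
For context, a TRANGE variant of the filter might look like the sketch below. The TRANGE argument shape is an unverified assumption (check dbe959a for the real syntax), and queryStartTs is a hypothetical counterpart to queryEndTs.

```java
// Hypothetical: assumes TRANGE(start, end) bounds @timestamp to [start, end].
String trangeFilter = "| WHERE TRANGE(\"" + queryStartTs + "\", \"" + queryEndTs + "\")";
```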

@kkrik-es (Contributor) left a comment:

This is very promising! Some minor comments and suggestions.

@kkrik-es (Contributor):

++ this looks like the right tradeoff. The performance for 100_hosts_5m is already decent, and we can improve it separately.

@dnhatn dnhatn marked this pull request as ready for review January 21, 2026 21:27
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@dnhatn (Member Author) commented Jan 21, 2026:

Thanks Kostas!

@dnhatn dnhatn added the v9.3.1 and auto-backport (Automatically create backport pull requests when merged) labels Jan 21, 2026
@dnhatn dnhatn merged commit 54ee110 into elastic:main Jan 21, 2026
33 of 35 checks passed
@dnhatn dnhatn deleted the partition-time-series-source branch January 21, 2026 22:01
szybia added a commit to szybia/elasticsearch that referenced this pull request Jan 21, 2026
…-tests

* upstream/main: (104 commits)
  Partition time-series source (elastic#140475)
  Mute org.elasticsearch.xpack.esql.heap_attack.HeapAttackSubqueryIT testManyRandomKeywordFieldsInSubqueryIntermediateResultsWithSortManyFields elastic#141083
  Reindex relocation: skip nodes marked for shutdown (elastic#141044)
  Make fails on fixture caching not fail image building (elastic#140959)
  Add multi-project tests for get and list reindex (elastic#140980)
  Painless docs overhaul (reference) (elastic#137211)
  Panama vector implementation of codePointCount (elastic#140693)
  Enable PromQL in release builds (elastic#140808)
  Update rest-api-spec for Jina embedding task (elastic#140696)
  [CI] ShardSearchPhaseAPMMetricsTests testUniformCanMatchMetricAttributesWhenPlentyOfDocumentsInIndex failed (elastic#140848)
  Combine hash computation with bloom filter writes/reads (elastic#140969)
  Refactor posting iterators to provide more information (elastic#141058)
  Wait for cluster to recover to yellow before checking index health (elastic#141057) (elastic#141065)
  Fix repo analysis read count assertions (elastic#140994)
  Fixed a bug in logsdb rolling upgrade sereverless tests involving par… (elastic#141022)
  Fix readiness edge case on startup (elastic#140791)
  PromQL: fix quantile function (elastic#141033)
  ignore `mmr` command for check (in development) (elastic#140981)
  Use Double.compare to compare doubles in tdigest.Sort (elastic#141049)
  Migrate third party module tests using legacy test clusters framework (elastic#140991)
  ...
@dnhatn (Member Author) commented Jan 21, 2026:

💚 All backports created successfully

Branch: 9.3

Questions? Please refer to the Backport tool documentation.

dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jan 21, 2026
Today, for rate aggregations, we can't partition a target shard by DOC
or SEGMENT, only by SHARD. Because of this, we rely on ROUND_TO from
time_bucket to partition the target shard into multiple slices. However,
the number of slices can be either too many or too few. Too few large
slices might under-utilize the CPUs and require a large amount of memory
to buffer data points. Many smaller slices solve these issues but add
significant overhead.

This change replaces the ROUND_TO partitioning with time-based
partitioning for time-series sources. One potential overhead of this
method compared to ROUND_TO is that, with ROUND_TO, we do not need to
perform round_to for the timestamp, as the query source provides this
constant.

(cherry picked from commit 54ee110)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/k8s-timeseries.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
dnhatn added a commit that referenced this pull request Jan 22, 2026
Today, for rate aggregations, we can't partition a target shard by DOC
or SEGMENT, only by SHARD. Because of this, we rely on ROUND_TO from
time_bucket to partition the target shard into multiple slices. However,
the number of slices can be either too many or too few. Too few large
slices might under-utilize the CPUs and require a large amount of memory
to buffer data points. Many smaller slices solve these issues but add
significant overhead.

This change replaces the ROUND_TO partitioning with time-based
partitioning for time-series sources. One potential overhead of this
method compared to ROUND_TO is that, with ROUND_TO, we do not need to
perform round_to for the timestamp, as the query source provides this
constant.

(cherry picked from commit 54ee110)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/k8s-timeseries.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
);

String filter = randomFrom(
"\"2023-11-20T12:16:03.360Z\" <= @timestamp AND @timestamp <= \"2023-11-20T12:57:02.250Z\"",
Contributor:

First one should be >= ?
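
For reference, the two forms express the same lower bound; only the operand order differs (hypothetical variable names):

```java
// Both predicates constrain @timestamp from below; they are equivalent.
String constantOnLeft = "\"2023-11-20T12:16:03.360Z\" <= @timestamp";
String constantOnRight = "@timestamp >= \"2023-11-20T12:16:03.360Z\"";
```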

dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jan 23, 2026
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jan 24, 2026
dnhatn added a commit that referenced this pull request Jan 24, 2026
The competitive benchmark shows a significant performance regression 
from #140475. This is likely due to the high number of CPUs in the 
benchmark, which results in many small slices and adds significant
overhead.

It is better to revert this change for now and adjust the partitioning 
for cases with high CPU and more than one shard before merging again.

Relates #14047
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jan 24, 2026
The competitive benchmark shows a significant performance regression
from elastic#140475. This is likely due to the high number of CPUs in the
benchmark, which results in many small slices and adds significant
overhead.

It is better to revert this change for now and adjust the partitioning
for cases with high CPU and more than one shard before merging again.

Relates elastic#14047

(cherry picked from commit 1146b88)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/k8s-timeseries.csv-spec
elasticsearchmachine pushed a commit that referenced this pull request Jan 24, 2026
The competitive benchmark shows a significant performance regression
from #140475. This is likely due to the high number of CPUs in the
benchmark, which results in many small slices and adds significant
overhead.

It is better to revert this change for now and adjust the partitioning
for cases with high CPU and more than one shard before merging again.

Relates #14047

(cherry picked from commit 1146b88)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/k8s-timeseries.csv-spec

Labels

auto-backport (Automatically create backport pull requests when merged), >enhancement, :StorageEngine/ES|QL (Timeseries / metrics / PromQL / logsdb capabilities in ES|QL), Team:StorageEngine, v9.3.1, v9.4.0
