Collect dimensions only once per tsid when downsampling#145089

Merged
elasticsearchmachine merged 17 commits into elastic:main from gmarouli:downsampling-collect-dimensions-once-per-tsid
Apr 6, 2026

@gmarouli
Contributor

We are checking whether downsampling performance improves if we collect the dimensions only once per tsid.
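As a rough illustration of the idea (not the code in this PR): assuming documents are visited grouped by tsid and every document of a tsid carries the same dimension values, a collector can read a dimension for the first document of each tsid and skip the read for the rest. All names below (OncePerTsidSketch, Doc, DimensionCollector, collectIfEmpty, countDimensionReads) are hypothetical, and documents are modelled as plain objects rather than doc values.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical sketch; names and structure are assumptions, not the Elasticsearch implementation. */
final class OncePerTsidSketch {

    /** A document reduced to its tsid and one dimension value. */
    static final class Doc {
        final String tsid;
        final String host;

        Doc(String tsid, String host) {
            this.tsid = tsid;
            this.host = host;
        }
    }

    /** Holds one dimension value for the tsid that is currently being processed. */
    static final class DimensionCollector {
        private Object lastValue;
        private boolean isEmpty = true;
        private int reads = 0; // how often the value was actually read, for illustration only

        /** Reads the value only if nothing has been collected yet for the current tsid. */
        void collectIfEmpty(Supplier<Object> valueReader) {
            if (isEmpty == false) {
                return; // already collected for this tsid, skip the read
            }
            lastValue = Objects.requireNonNull(valueReader.get());
            isEmpty = false;
            reads++;
        }

        /** Clears the collected value when a new tsid starts. */
        void reset() {
            lastValue = null;
            isEmpty = true;
        }

        int reads() {
            return reads;
        }
    }

    /** Walks documents that are assumed to be grouped by tsid and returns the number of dimension reads. */
    static int countDimensionReads(List<Doc> docsGroupedByTsid) {
        DimensionCollector collector = new DimensionCollector();
        String currentTsid = null;
        for (Doc doc : docsGroupedByTsid) {
            if (doc.tsid.equals(currentTsid) == false) {
                collector.reset();
                currentTsid = doc.tsid;
            }
            collector.collectIfEmpty(() -> doc.host);
        }
        return collector.reads();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("a", "h1"), new Doc("a", "h1"), new Doc("a", "h1"),
            new Doc("b", "h2"), new Doc("b", "h2")
        );
        System.out.println(countDimensionReads(docs)); // prints 2: one read per tsid, not one per document
    }
}
```

In this toy run five documents cause only two dimension reads. Whether that saving is measurable in a real downsampling run is what the benchmarks in this thread try to establish.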

@gmarouli
Contributor Author

Buildkite benchmark this with tsdb please

@gmarouli
Contributor Author

Buildkite benchmark this with tsdb please

@gmarouli
Contributor Author

Buildkite benchmark this with tsdb please

@gmarouli
Contributor Author

gmarouli commented Mar 30, 2026

TSDB Benchmark

Aggregate

# Baseline
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7090709], indexed downsampled doc [7090709], failed [0], took [4.1m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [231486], indexed downsampled doc [231486], failed [0], took [2.1m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [119308], indexed downsampled doc [119308], failed [0], took [1.9m]

# Contender
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7090709], indexed downsampled doc [7090709], failed [0], took [4.1m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [231486], indexed downsampled doc [231486], failed [0], took [2m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [119308], indexed downsampled doc [119308], failed [0], took [1.9m]

Last value

# Baseline
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7089492], indexed downsampled doc [7089492], failed [0], took [3.9m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [229256], indexed downsampled doc [229256], failed [0], took [2m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [116859], indexed downsampled doc [116859], failed [0], took [2m]

# Contender
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7089492], indexed downsampled doc [7089492], failed [0], took [3.8m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [229256], indexed downsampled doc [229256], failed [0], took [2m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [116859], indexed downsampled doc [116859], failed [0], took [1.9m]

@gmarouli
Contributor Author

Buildkite benchmark this with tsdb please

@gmarouli
Contributor Author

gmarouli commented Mar 31, 2026

TSDB Benchmark

Aggregate

# Baseline
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7090709], indexed downsampled doc [7090709], failed [0], took [4.1m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [231486], indexed downsampled doc [231486], failed [0], took [2m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [119308], indexed downsampled doc [119308], failed [0], took [1.9m]

# Contender
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7090709], indexed downsampled doc [7090709], failed [0], took [3.9m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [231486], indexed downsampled doc [231486], failed [0], took [1.9m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [119308], indexed downsampled doc [119308], failed [0], took [1.8m]

Last value

# Baseline
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7089492], indexed downsampled doc [7089492], failed [0], took [3.9m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [229256], indexed downsampled doc [229256], failed [0], took [2m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [116859], indexed downsampled doc [116859], failed [0], took [2m]

# Contender
Shard [[tsdb][0]] successfully sent [116633696], received source doc [7089492], indexed downsampled doc [7089492], failed [0], took [3.7m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [229256], indexed downsampled doc [229256], failed [0], took [1.9m]
Shard [[tsdb][0]] successfully sent [116633696], received source doc [116859], indexed downsampled doc [116859], failed [0], took [1.9m]

@gmarouli gmarouli changed the title from "[PoC] Collect dimensions only once per tsid when downsampling" to "Collect dimensions only once per tsid when downsampling" Apr 1, 2026
@gmarouli
Contributor Author

gmarouli commented Apr 1, 2026

Buildkite benchmark this with tsdb please

@gmarouli gmarouli added :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data >enhancement labels Apr 1, 2026
@gmarouli gmarouli marked this pull request as ready for review April 1, 2026 10:26
@elasticsearchmachine
Collaborator

Hi @gmarouli, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

assert value.equals(this.lastValue) != false
: "Dimension value changed without tsid change [" + value + "] != [" + this.lastValue + "]";
}
assert docValueCount == 1;

Member

Is this needed? It should hold in practice, but there may be cases with multi-values (e.g. ips).

Member

You can have a list of objects as well, and use that instead? One of them should be set.

Contributor Author

I do not feel strongly about it. It's there because the original code had an equivalent assertion: collectOnce(docValues.nextValue()) asserts that the collector is still empty, so if a dimension had multiple values, reading the second value would trigger that assertion.

If we think this is not necessary I prefer to remove it. Do you agree?

Member

We do support multi-values in dimensions. The logic was added fairly recently. We may've missed updating the assert but the downsample index still gets all values iiuc?

Contributor Author

@gmarouli gmarouli Apr 1, 2026

No, in this case it will only keep the last:

    void collectOnce(final Object value) {
        assert isEmpty;
        Objects.requireNonNull(value);
        this.lastValue = value;
        this.isEmpty = false;
    }

Here value comes from var value = docValues.nextValue(); which, to the best of my understanding, is just one of the values.
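To make the difference concrete, here is a minimal sketch under simplifying assumptions: the doc values of one document are modelled as a plain list, and the assert isEmpty check from the quoted method is left out so the overwrite behaviour is visible. The class and method names (MultiValueDimensionSketch, keepLast, keepAll) are hypothetical and do not describe how the follow-up fix is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch; not the Elasticsearch collector and not the follow-up fix. */
final class MultiValueDimensionSketch {

    /** Single slot: every value overwrites the previous one, so only the last survives. */
    static Object keepLast(List<?> docValues) {
        Object lastValue = null;
        for (Object value : docValues) {
            lastValue = Objects.requireNonNull(value);
        }
        return lastValue;
    }

    /** One possible alternative: accumulate every value of the multi-valued dimension. */
    static List<Object> keepAll(List<?> docValues) {
        List<Object> values = new ArrayList<>();
        for (Object value : docValues) {
            values.add(Objects.requireNonNull(value));
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> ips = List.of("10.0.0.1", "10.0.0.2");
        System.out.println(keepLast(ips)); // prints 10.0.0.2
        System.out.println(keepAll(ips));  // prints [10.0.0.1, 10.0.0.2]
    }
}
```

With assertions enabled, the quoted collectOnce would instead fail on the second value, which is the behaviour discussed earlier in this thread.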

Contributor Author

I will open a PR to fix this

Member

@felixbarny we did miss this one.

Contributor Author

This is the change #112645, right?

Contributor Author

Fix: #145458

@gmarouli
Contributor Author

gmarouli commented Apr 2, 2026

TSDB Benchmark

| Downsampling config | Baseline | Contender |
| --- | --- | --- |
| Aggregate 1m (7090709) | 3.8m | 4.1m |
| Aggregate 1h (231486) | 1.9m | 1.8m |
| Aggregate 1d (119308) | 1.8m | 1.8m |
| Last value 1m (7089492) | 3.9m | 3.7m |
| Last value 1h (229256) | 2m | 2.2m |
| Last value 1d (116859) | 1.9m | 2.1m |

I think the results are flaky again: the contender looks worse here mainly because the baseline Aggregate 1m run came in at 3.8m, which we rarely see.
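For a sense of scale, the sketch below computes the relative change between two reported durations and the size of one reporting step. It assumes durations are rounded to 0.1m, as they appear in these runs. The class and method names are hypothetical helpers for illustration only.

```java
import java.util.Locale;

/** Hypothetical helper for reading the benchmark numbers; not part of the PR. */
final class BenchmarkDeltaSketch {

    /** Relative change of contender versus baseline in percent; negative means the contender was faster. */
    static double relativeChangePercent(double baselineMinutes, double contenderMinutes) {
        return (contenderMinutes - baselineMinutes) / baselineMinutes * 100.0;
    }

    /** Size of a single 0.1m reporting step relative to the baseline, in percent (assumes 0.1m rounding). */
    static double stepPercent(double baselineMinutes) {
        return 0.1 / baselineMinutes * 100.0;
    }

    public static void main(String[] args) {
        // Aggregate 1m from this run: baseline 3.8m, contender 4.1m
        System.out.println(String.format(Locale.ROOT, "%.1f", relativeChangePercent(3.8, 4.1))); // prints 7.9
        // One reporting step on a 1.9m run
        System.out.println(String.format(Locale.ROOT, "%.1f", stepPercent(1.9))); // prints 5.3
    }
}
```

On the roughly 2m configurations a single 0.1m step is already about 5% of the run time, so differences of one step in either direction are hard to tell apart from noise.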

@gmarouli
Contributor Author

gmarouli commented Apr 2, 2026

Buildkite benchmark this with tsdb please

@elasticmachine
Collaborator

elasticmachine commented Apr 2, 2026

💚 Build Succeeded

This build ran two tsdb benchmarks to evaluate performance impact of this PR.

@gmarouli
Contributor Author

gmarouli commented Apr 3, 2026

TSDB Benchmark

| Downsampling config | Baseline | Contender |
| --- | --- | --- |
| Aggregate 1m (7090709) | 4.3m | 4m |
| Aggregate 1h (231486) | 2m | 2m |
| Aggregate 1d (119308) | 1.9m | 1.9m |
| Last value 1m (7089492) | 4m | 3.9m |
| Last value 1h (229256) | 2m | 2m |
| Last value 1d (116859) | 1.9m | 2.1m |

@gmarouli gmarouli requested a review from kkrik-es April 3, 2026 07:57
@gmarouli gmarouli added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Apr 6, 2026
@elasticsearchmachine elasticsearchmachine merged commit fff9a10 into elastic:main Apr 6, 2026
35 checks passed
@gmarouli gmarouli deleted the downsampling-collect-dimensions-once-per-tsid branch April 6, 2026 09:14
mromaios pushed a commit to mromaios/elasticsearch that referenced this pull request Apr 9, 2026
We are checking to see if there will be any improvement to downsampling
if we only collect the dimension once per tsid.

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >enhancement :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:StorageEngine v9.4.0

4 participants