Skip to content

Add support for multi-value dimensions#112645

Merged
elasticsearchmachine merged 18 commits intoelastic:mainfrom
felixbarny:allow-array-dimensions-tsdb
Sep 23, 2024
Merged

Add support for multi-value dimensions#112645
elasticsearchmachine merged 18 commits intoelastic:mainfrom
felixbarny:allow-array-dimensions-tsdb

Conversation

@felixbarny
Copy link
Copy Markdown
Member

@felixbarny felixbarny commented Sep 9, 2024

Closes #110387

Having this in now affords us not having to introduce version checks in the ES exporter later. We can simply use the same serialization logic for metric attributes as we do for other signals.
This also enables us to properly map *.ip fields to the ip field type as ip fields containing a list of IPs are not converted to a comma-separated list.

Closes elastic#110387

Having this in now affords us not having to introduce version checks in the ES exporter later.
@felixbarny felixbarny added the :StorageEngine/Mapping The storage related side of mappings label Sep 9, 2024
@felixbarny felixbarny requested a review from a team as a code owner September 9, 2024 06:27
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Sep 9, 2024

Documentation preview:

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine elasticsearchmachine added Team:StorageEngine external-contributor Pull request authored by a developer outside the Elasticsearch team labels Sep 9, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @felixbarny, I've created a changelog YAML for you.

@kkrik-es
Copy link
Copy Markdown
Member

I'm still on the fence for adding IP fields to the tsid, it does seem wasteful.. I see that they're part of the ECS template, but this will likely penalize indexing performance. Do we believe there's value in doing so? @martijnvg fyi.

Copy link
Copy Markdown
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM.

if (values.size() == 1) {
// converts the immutable list that's optimized for the common case of having only one value to a mutable list
BytesReference previousValue = values.get(0);
values = new ArrayList<>(4);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the multi valued case, do you often expect 4 ip values?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is assuming that when there are multiple values, we'll likely have less than 10 (the default size for ArrayLists), but probably more than 2.

@martijnvg martijnvg added auto-backport-and-merge v8.16.0 and removed external-contributor Pull request authored by a developer outside the Elasticsearch team labels Sep 13, 2024
@felixbarny felixbarny requested a review from carsonip September 13, 2024 16:36
Copy link
Copy Markdown
Member

@carsonip carsonip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tried to understand the code change, but I built an image from this PR and it works with our otel setup.

Indexed document in ES with an array dimension named "arraydimension"

{
  "_index": ".ds-metrics-generic.otel-default-2024.09.13-000001",
  "_id": "cEnVhVfwBBT5RW1zAAABkezcIVI",
  "_version": 1,
  "_score": 0,
  "_source": {
    "@timestamp": "2024-09-13T19:28:55.122Z",
    "attributes": {
      "arraydimension": [
        "bar",
        "foo"
      ],
      "event.outcome": "success",
      "metricset.interval": "1m",
      "metricset.name": "transaction",
      "processor.event": "metric",
      "transaction.name": "parent",
      "transaction.result": "Success",
      "transaction.root": true,
      "transaction.type": "unknown"
    },
    "data_stream": {
      "dataset": "generic.otel",
      "namespace": "default",
      "type": "metrics"
    },
    "metrics": {
      "transaction.duration.histogram": {
        "values": [
          15000
        ],
        "counts": [
          1
        ]
      }
    },
    "resource": {
      "attributes": {
        "agent.name": "opentelemetry/go",
        "agent.version": "1.28.0",
        "metricset.interval": "1m",
        "service.name": "sendotlp",
        "some.resource.attribute": "resource.attr",
        "telemetry.sdk.language": "go",
        "telemetry.sdk.name": "opentelemetry",
        "telemetry.sdk.version": "1.28.0"
      },
      "dropped_attributes_count": 0
    },
    "scope": {
      "dropped_attributes_count": 0,
      "name": "otelcol/spanmetricsconnectorv2"
    }
  },
  "fields": {
    "resource.attributes.agent.version": [
      "1.28.0"
    ],
    "scope.name": [
      "otelcol/spanmetricsconnectorv2"
    ],
    "resource.attributes.telemetry.sdk.name": [
      "opentelemetry"
    ],
    "resource.attributes.agent.name": [
      "opentelemetry/go"
    ],
    "resource.attributes.telemetry.sdk.language": [
      "go"
    ],
    "scope.dropped_attributes_count": [
      0
    ],
    "service.language.name": [
      "go"
    ],
    "transaction.result": [
      "Success"
    ],
    "telemetry.sdk.name": [
      "opentelemetry"
    ],
    "attributes.transaction.result": [
      "Success"
    ],
    "telemetry.sdk.language": [
      "go"
    ],
    "transaction.root": [
      true
    ],
    "processor.event": [
      "metric"
    ],
    "resource.attributes.metricset.interval": [
      "1m"
    ],
    "agent.name": [
      "opentelemetry/go"
    ],
    "telemetry.sdk.version": [
      "1.28.0"
    ],
    "attributes.metricset.name": [
      "transaction"
    ],
    "metrics.transaction.duration.histogram": [
      {
        "values": [
          15000
        ],
        "counts": [
          1
        ]
      }
    ],
    "attributes.transaction.root": [
      true
    ],
    "event.outcome": [
      "success"
    ],
    "service.name": [
      "sendotlp"
    ],
    "attributes.arraydimension": [
      "bar",
      "foo"
    ],
    "data_stream.namespace": [
      "default"
    ],
    "attributes.transaction.name": [
      "parent"
    ],
    "resource.attributes.some.resource.attribute": [
      "resource.attr"
    ],
    "some.resource.attribute": [
      "resource.attr"
    ],
    "metricset.interval": [
      "1m"
    ],
    "data_stream.type": [
      "metrics"
    ],
    "transaction.duration.histogram": [
      {
        "values": [
          15000
        ],
        "counts": [
          1
        ]
      }
    ],
    "transaction.type": [
      "unknown"
    ],
    "metricset.name": [
      "transaction"
    ],
    "attributes.processor.event": [
      "metric"
    ],
    "@timestamp": [
      "2024-09-13T19:28:55.122Z"
    ],
    "resource.attributes.telemetry.sdk.version": [
      "1.28.0"
    ],
    "data_stream.dataset": [
      "generic.otel"
    ],
    "attributes.event.outcome": [
      "success"
    ],
    "attributes.transaction.type": [
      "unknown"
    ],
    "agent.version": [
      "1.28.0"
    ],
    "arraydimension": [
      "bar",
      "foo"
    ],
    "attributes.metricset.interval": [
      "1m"
    ],
    "transaction.name": [
      "parent"
    ],
    "resource.dropped_attributes_count": [
      0
    ],
    "resource.attributes.service.name": [
      "sendotlp"
    ]
  }
}

@felixbarny felixbarny added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 23, 2024
@elasticsearchmachine elasticsearchmachine merged commit 8d223cb into elastic:main Sep 23, 2024
@felixbarny felixbarny deleted the allow-array-dimensions-tsdb branch September 23, 2024 07:31
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 112645

@felixbarny
Copy link
Copy Markdown
Member Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

felixbarny added a commit to felixbarny/elasticsearch that referenced this pull request Sep 23, 2024
Closes elastic#110387

Having this in now affords us not having to introduce version checks in
the ES exporter later. We can simply use the same serialization logic
for metric attributes as we do for other signals. This also enables us
to properly map `*.ip` fields to the ip field type as ip fields
containing a list of IPs are not converted to a comma-separated list.

(cherry picked from commit 8d223cb)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
elasticsearchmachine pushed a commit that referenced this pull request Sep 23, 2024
* Add support for multi-value dimensions (#112645)

Closes #110387

Having this in now affords us not having to introduce version checks in
the ES exporter later. We can simply use the same serialization logic
for metric attributes as we do for other signals. This also enables us
to properly map `*.ip` fields to the ip field type as ip fields
containing a list of IPs are not converted to a comma-separated list.

(cherry picked from commit 8d223cb)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java

* Remove skip test for 8.x

This was just needed for 8.x to 9.0 compatibility tests
@felixbarny
Copy link
Copy Markdown
Member Author

@carsonip this has been merged and back ported to 8.x now. Could you take on the work to remove the workarounds from the ES exporter and the spec?

@carsonip
Copy link
Copy Markdown
Member

carsonip commented Sep 23, 2024

Could you take on the work to remove the workarounds from the ES exporter and the spec?

The es exporter PR is ready to go: open-telemetry/opentelemetry-collector-contrib#35291

elasticsearchmachine pushed a commit that referenced this pull request Apr 2, 2026
#112645 introduced support
for multi-value dimensions but downsampling was not handling that
correctly. 

**Observed behaviour**

Downsampling will store only one value in the downsampled document, if a
document has more than one values for a dimension field.

**Fix**

After this fix, downsampling will create an object array with all the
values and store that in the downsampled document. We also added extra
tests.
gmarouli added a commit to gmarouli/elasticsearch that referenced this pull request Apr 2, 2026
)

elastic#112645 introduced support
for multi-value dimensions but downsampling was not handling that
correctly.

**Observed behaviour**

Downsampling will store only one value in the downsampled document, if a
document has more than one values for a dimension field.

**Fix**

After this fix, downsampling will create an object array with all the
values and store that in the downsampled document. We also added extra
tests.

(cherry picked from commit 63dfd4a)

# Conflicts:
#	modules/data-streams/src/main/java/org/elasticsearch/datastreams/DataStreamFeatures.java
#	x-pack/plugin/downsample/src/main/java/org/elasticsearch/xpack/downsample/DimensionFieldDownsampler.java
gmarouli added a commit to gmarouli/elasticsearch that referenced this pull request Apr 2, 2026
)

elastic#112645 introduced support
for multi-value dimensions but downsampling was not handling that
correctly.

**Observed behaviour**

Downsampling will store only one value in the downsampled document, if a
document has more than one values for a dimension field.

**Fix**

After this fix, downsampling will create an object array with all the
values and store that in the downsampled document. We also added extra
tests.

(cherry picked from commit 63dfd4a)

# Conflicts:
#	x-pack/plugin/downsample/src/main/java/org/elasticsearch/xpack/downsample/DimensionFieldProducer.java
gmarouli added a commit to gmarouli/elasticsearch that referenced this pull request Apr 2, 2026
)

elastic#112645 introduced support
for multi-value dimensions but downsampling was not handling that
correctly.

**Observed behaviour**

Downsampling will store only one value in the downsampled document, if a
document has more than one values for a dimension field.

**Fix**

After this fix, downsampling will create an object array with all the
values and store that in the downsampled document. We also added extra
tests.

(cherry picked from commit 63dfd4a)

# Conflicts:
#	x-pack/plugin/downsample/src/main/java/org/elasticsearch/xpack/downsample/DimensionFieldDownsampler.java
gmarouli added a commit that referenced this pull request Apr 2, 2026
…) (#145536)

* Support multi-value dimensions in downsampling (bug fix) (#145458)

#112645 introduced support
for multi-value dimensions but downsampling was not handling that
correctly.

**Observed behaviour**

Downsampling will store only one value in the downsampled document, if a
document has more than one values for a dimension field.

**Fix**

After this fix, downsampling will create an object array with all the
values and store that in the downsampled document. We also added extra
tests.

(cherry picked from commit 63dfd4a)

# Conflicts:
#	x-pack/plugin/downsample/src/main/java/org/elasticsearch/xpack/downsample/DimensionFieldProducer.java

* Fix test

* Remove unnecessary warning
gmarouli added a commit that referenced this pull request Apr 2, 2026
…) (#145539)

* Support multi-value dimensions in downsampling (bug fix) (#145458)

#112645 introduced support
for multi-value dimensions but downsampling was not handling that
correctly.

**Observed behaviour**

Downsampling will store only one value in the downsampled document, if a
document has more than one values for a dimension field.

**Fix**

After this fix, downsampling will create an object array with all the
values and store that in the downsampled document. We also added extra
tests.

(cherry picked from commit 63dfd4a)

# Conflicts:
#	x-pack/plugin/downsample/src/main/java/org/elasticsearch/xpack/downsample/DimensionFieldDownsampler.java

* Fix assertion

* Remove unnecessary warning

* Sampling method was implemented not in 9.2
gmarouli added a commit that referenced this pull request Apr 2, 2026
) (#145535)

* Support multi-value dimensions in downsampling (bug fix) (#145458)

#112645 introduced support
for multi-value dimensions but downsampling was not handling that
correctly.

**Observed behaviour**

Downsampling will store only one value in the downsampled document, if a
document has more than one values for a dimension field.

**Fix**

After this fix, downsampling will create an object array with all the
values and store that in the downsampled document. We also added extra
tests.

(cherry picked from commit 63dfd4a)

# Conflicts:
#	modules/data-streams/src/main/java/org/elasticsearch/datastreams/DataStreamFeatures.java
#	x-pack/plugin/downsample/src/main/java/org/elasticsearch/xpack/downsample/DimensionFieldDownsampler.java

* Remove allowed warnings

* Fix assertion

* Remove sampling method, it was not implemented in 8.19
mromaios pushed a commit to mromaios/elasticsearch that referenced this pull request Apr 9, 2026
)

elastic#112645 introduced support
for multi-value dimensions but downsampling was not handling that
correctly. 

**Observed behaviour**

Downsampling will store only one value in the downsampled document, if a
document has more than one values for a dimension field.

**Fix**

After this fix, downsampling will create an object array with all the
values and store that in the downsampled document. We also added extra
tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport pending >enhancement :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.16.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support array values for dimension fields

6 participants