Add support for multi-value dimensions#112645
Add support for multi-value dimensions#112645elasticsearchmachine merged 18 commits intoelastic:mainfrom
Conversation
Closes elastic#110387 Having this in now affords us not having to introduce version checks in the ES exporter later.
|
Documentation preview: |
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
Hi @felixbarny, I've created a changelog YAML for you. |
modules/data-streams/src/yamlRestTest/resources/rest-api-spec/test/data_stream/150_tsdb.yml
Outdated
Show resolved
Hide resolved
modules/data-streams/src/yamlRestTest/resources/rest-api-spec/test/data_stream/150_tsdb.yml
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/routing/IndexRouting.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/routing/IndexRouting.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Outdated
Show resolved
Hide resolved
|
I'm still on the fence for adding IP fields to the tsid, it does seem wasteful.. I see that they're part of the ECS template, but this will likely penalize indexing performance. Do we believe there's value in doing so? @martijnvg fyi. |
server/src/main/java/org/elasticsearch/cluster/routing/IndexRouting.java
Show resolved
Hide resolved
| if (values.size() == 1) { | ||
| // converts the immutable list that's optimized for the common case of having only one value to a mutable list | ||
| BytesReference previousValue = values.get(0); | ||
| values = new ArrayList<>(4); |
There was a problem hiding this comment.
In the multi valued case, do you often expect 4 ip values?
There was a problem hiding this comment.
This is assuming that when there are multiple values, we'll likely have less than 10 (the default size for ArrayLists), but probably more than 2.
carsonip
left a comment
There was a problem hiding this comment.
I haven't tried to understand the code change, but I built an image from this PR and it works with our otel setup.
Indexed document in ES with an array dimension named "arraydimension"
{
"_index": ".ds-metrics-generic.otel-default-2024.09.13-000001",
"_id": "cEnVhVfwBBT5RW1zAAABkezcIVI",
"_version": 1,
"_score": 0,
"_source": {
"@timestamp": "2024-09-13T19:28:55.122Z",
"attributes": {
"arraydimension": [
"bar",
"foo"
],
"event.outcome": "success",
"metricset.interval": "1m",
"metricset.name": "transaction",
"processor.event": "metric",
"transaction.name": "parent",
"transaction.result": "Success",
"transaction.root": true,
"transaction.type": "unknown"
},
"data_stream": {
"dataset": "generic.otel",
"namespace": "default",
"type": "metrics"
},
"metrics": {
"transaction.duration.histogram": {
"values": [
15000
],
"counts": [
1
]
}
},
"resource": {
"attributes": {
"agent.name": "opentelemetry/go",
"agent.version": "1.28.0",
"metricset.interval": "1m",
"service.name": "sendotlp",
"some.resource.attribute": "resource.attr",
"telemetry.sdk.language": "go",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.28.0"
},
"dropped_attributes_count": 0
},
"scope": {
"dropped_attributes_count": 0,
"name": "otelcol/spanmetricsconnectorv2"
}
},
"fields": {
"resource.attributes.agent.version": [
"1.28.0"
],
"scope.name": [
"otelcol/spanmetricsconnectorv2"
],
"resource.attributes.telemetry.sdk.name": [
"opentelemetry"
],
"resource.attributes.agent.name": [
"opentelemetry/go"
],
"resource.attributes.telemetry.sdk.language": [
"go"
],
"scope.dropped_attributes_count": [
0
],
"service.language.name": [
"go"
],
"transaction.result": [
"Success"
],
"telemetry.sdk.name": [
"opentelemetry"
],
"attributes.transaction.result": [
"Success"
],
"telemetry.sdk.language": [
"go"
],
"transaction.root": [
true
],
"processor.event": [
"metric"
],
"resource.attributes.metricset.interval": [
"1m"
],
"agent.name": [
"opentelemetry/go"
],
"telemetry.sdk.version": [
"1.28.0"
],
"attributes.metricset.name": [
"transaction"
],
"metrics.transaction.duration.histogram": [
{
"values": [
15000
],
"counts": [
1
]
}
],
"attributes.transaction.root": [
true
],
"event.outcome": [
"success"
],
"service.name": [
"sendotlp"
],
"attributes.arraydimension": [
"bar",
"foo"
],
"data_stream.namespace": [
"default"
],
"attributes.transaction.name": [
"parent"
],
"resource.attributes.some.resource.attribute": [
"resource.attr"
],
"some.resource.attribute": [
"resource.attr"
],
"metricset.interval": [
"1m"
],
"data_stream.type": [
"metrics"
],
"transaction.duration.histogram": [
{
"values": [
15000
],
"counts": [
1
]
}
],
"transaction.type": [
"unknown"
],
"metricset.name": [
"transaction"
],
"attributes.processor.event": [
"metric"
],
"@timestamp": [
"2024-09-13T19:28:55.122Z"
],
"resource.attributes.telemetry.sdk.version": [
"1.28.0"
],
"data_stream.dataset": [
"generic.otel"
],
"attributes.event.outcome": [
"success"
],
"attributes.transaction.type": [
"unknown"
],
"agent.version": [
"1.28.0"
],
"arraydimension": [
"bar",
"foo"
],
"attributes.metricset.interval": [
"1m"
],
"transaction.name": [
"parent"
],
"resource.dropped_attributes_count": [
0
],
"resource.attributes.service.name": [
"sendotlp"
]
}
}
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
Closes elastic#110387 Having this in now affords us not having to introduce version checks in the ES exporter later. We can simply use the same serialization logic for metric attributes as we do for other signals. This also enables us to properly map `*.ip` fields to the ip field type as ip fields containing a list of IPs are not converted to a comma-separated list. (cherry picked from commit 8d223cb) # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
* Add support for multi-value dimensions (#112645) Closes #110387 Having this in now affords us not having to introduce version checks in the ES exporter later. We can simply use the same serialization logic for metric attributes as we do for other signals. This also enables us to properly map `*.ip` fields to the ip field type as ip fields containing a list of IPs are not converted to a comma-separated list. (cherry picked from commit 8d223cb) # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java * Remove skip test for 8.x This was just needed for 8.x to 9.0 compatibility tests
|
@carsonip this has been merged and back ported to 8.x now. Could you take on the work to remove the workarounds from the ES exporter and the spec? |
The es exporter PR is ready to go: open-telemetry/opentelemetry-collector-contrib#35291 |
#112645 introduced support for multi-value dimensions but downsampling was not handling that correctly. **Observed behaviour** Downsampling will store only one value in the downsampled document, if a document has more than one values for a dimension field. **Fix** After this fix, downsampling will create an object array with all the values and store that in the downsampled document. We also added extra tests.
) elastic#112645 introduced support for multi-value dimensions but downsampling was not handling that correctly. **Observed behaviour** Downsampling will store only one value in the downsampled document, if a document has more than one values for a dimension field. **Fix** After this fix, downsampling will create an object array with all the values and store that in the downsampled document. We also added extra tests. (cherry picked from commit 63dfd4a) # Conflicts: # modules/data-streams/src/main/java/org/elasticsearch/datastreams/DataStreamFeatures.java # x-pack/plugin/downsample/src/main/java/org/elasticsearch/xpack/downsample/DimensionFieldDownsampler.java
) elastic#112645 introduced support for multi-value dimensions but downsampling was not handling that correctly. **Observed behaviour** Downsampling will store only one value in the downsampled document, if a document has more than one values for a dimension field. **Fix** After this fix, downsampling will create an object array with all the values and store that in the downsampled document. We also added extra tests. (cherry picked from commit 63dfd4a) # Conflicts: # x-pack/plugin/downsample/src/main/java/org/elasticsearch/xpack/downsample/DimensionFieldProducer.java
) elastic#112645 introduced support for multi-value dimensions but downsampling was not handling that correctly. **Observed behaviour** Downsampling will store only one value in the downsampled document, if a document has more than one values for a dimension field. **Fix** After this fix, downsampling will create an object array with all the values and store that in the downsampled document. We also added extra tests. (cherry picked from commit 63dfd4a) # Conflicts: # x-pack/plugin/downsample/src/main/java/org/elasticsearch/xpack/downsample/DimensionFieldDownsampler.java
…) (#145536) * Support multi-value dimensions in downsampling (bug fix) (#145458) #112645 introduced support for multi-value dimensions but downsampling was not handling that correctly. **Observed behaviour** Downsampling will store only one value in the downsampled document, if a document has more than one values for a dimension field. **Fix** After this fix, downsampling will create an object array with all the values and store that in the downsampled document. We also added extra tests. (cherry picked from commit 63dfd4a) # Conflicts: # x-pack/plugin/downsample/src/main/java/org/elasticsearch/xpack/downsample/DimensionFieldProducer.java * Fix test * Remove unnecessary warning
…) (#145539) * Support multi-value dimensions in downsampling (bug fix) (#145458) #112645 introduced support for multi-value dimensions but downsampling was not handling that correctly. **Observed behaviour** Downsampling will store only one value in the downsampled document, if a document has more than one values for a dimension field. **Fix** After this fix, downsampling will create an object array with all the values and store that in the downsampled document. We also added extra tests. (cherry picked from commit 63dfd4a) # Conflicts: # x-pack/plugin/downsample/src/main/java/org/elasticsearch/xpack/downsample/DimensionFieldDownsampler.java * Fix assertion * Remove unnecessary warning * Sampling method was implemented not in 9.2
) (#145535) * Support multi-value dimensions in downsampling (bug fix) (#145458) #112645 introduced support for multi-value dimensions but downsampling was not handling that correctly. **Observed behaviour** Downsampling will store only one value in the downsampled document, if a document has more than one values for a dimension field. **Fix** After this fix, downsampling will create an object array with all the values and store that in the downsampled document. We also added extra tests. (cherry picked from commit 63dfd4a) # Conflicts: # modules/data-streams/src/main/java/org/elasticsearch/datastreams/DataStreamFeatures.java # x-pack/plugin/downsample/src/main/java/org/elasticsearch/xpack/downsample/DimensionFieldDownsampler.java * Remove allowed warnings * Fix assertion * Remove sampling method, it was not implemented in 8.19
) elastic#112645 introduced support for multi-value dimensions but downsampling was not handling that correctly. **Observed behaviour** Downsampling will store only one value in the downsampled document, if a document has more than one values for a dimension field. **Fix** After this fix, downsampling will create an object array with all the values and store that in the downsampled document. We also added extra tests.
Closes #110387
Having this in now affords us not having to introduce version checks in the ES exporter later. We can simply use the same serialization logic for metric attributes as we do for other signals.
This also enables us to properly map
*.ipfields to the ip field type as ip fields containing a list of IPs are not converted to a comma-separated list.