Add aws.firehose.arn, aws.firehose.request_id and aws.metrics_names_fingerprint fields#11239
kaiyan-sheng merged 13 commits into elastic:main
Conversation
axw
left a comment
Why would we want these as dimensions? These fields describe the conduit for the data, rather than anything related to the metrics themselves. So I would see them as metadata rather than dimensions.
@axw For metrics coming in from the same Firehose stream, I see cases where two documents have the same timestamp, dimension, namespace, accountID, exportARN and region BUT come from two different requests. Without specifying request_id as a dimension, one of the documents gets dropped. I'm still testing whether aws.firehose.arn needs to be a dimension for the use case of the same metrics being ingested from two different Firehose streams. But I think in this case the exportARN will be different, so we should be OK.
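The collision described above can be sketched with a toy model. This is only an illustration, not how Elasticsearch is implemented: it assumes a time series document is identified by its timestamp plus its dimension values, and that a later document with the same identity is dropped. The helper names (`tsdb_key`, `ingest`) and the flattened field names used as dimensions are hypothetical.

```python
# Toy model (assumption): a document's identity in a time series index is
# its @timestamp plus the values of its dimension fields.
def tsdb_key(doc, dimension_fields):
    dims = tuple((f, str(doc.get(f))) for f in sorted(dimension_fields))
    return (doc["@timestamp"], dims)


def ingest(docs, dimension_fields):
    """Return (kept, dropped); a doc is dropped when its key already exists."""
    kept, dropped = {}, []
    for doc in docs:
        key = tsdb_key(doc, dimension_fields)
        if key in kept:
            dropped.append(doc)
        else:
            kept[key] = doc
    return kept, dropped


# Two requests, same timestamp and dimensions, different metric data points.
shared = {
    "@timestamp": "2024-09-25T23:12:00.000Z",
    "cloud.account.id": "627286350134",
    "cloud.region": "eu-central-1",
    "aws.cloudwatch.namespace": "AWS/Kinesis",
    "aws.dimensions.StreamName": "test-esf-encrypted",
}
doc_a = {**shared, "aws.firehose.request_id": "7628c293-ba54-4ea4-9abe-d256976c1dbd",
         "aws.kinesis.metrics.GetRecords_Success": 240}
doc_b = {**shared, "aws.firehose.request_id": "70222524-bdc0-47f4-b7f8-adb7c6998bc7",
         "aws.kinesis.metrics.ReadProvisionedThroughputExceeded": 0}

base_dims = ["cloud.account.id", "cloud.region",
             "aws.cloudwatch.namespace", "aws.dimensions.StreamName"]

_, dropped_without = ingest([doc_a, doc_b], base_dims)
_, dropped_with = ingest([doc_a, doc_b], base_dims + ["aws.firehose.request_id"])
print(len(dropped_without), len(dropped_with))
```

In this model the second document is lost with the base dimensions alone, and both documents survive once `aws.firehose.request_id` is part of the key.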
Testing with firehose integration assets, I was able to ingest documents with the same timestamp, dimension, namespace, accountID, exportARN and region but different But adding |
Do we know why there are multiple requests with all the same dimensions? Are they retries, where one of them should get dropped? |
@axw These are multiple requests with the same dimensions BUT different metric data points. For example |
Also discussed on Slack. We're having to choose the best of a bad lot here:
I don't like the second option (what this PR implements), but I think it's the most reasonable solution at the moment. The reason I don't like it is that it would make automatic rollups ineffective, without some knowledge that request_id is something to ignore in rollups. Not a problem for today, so we can kick that can down the road a bit longer. |
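To make the rollup concern concrete, here is a minimal sketch that groups documents into series by their dimension values. It assumes, for illustration only, that each Firehose request carries a unique request ID and that a rollup aggregates the points within one series; `group_into_series` and the sample documents are hypothetical.

```python
from collections import defaultdict


def group_into_series(docs, dimension_fields):
    """Group document timestamps by the tuple of their dimension values."""
    series = defaultdict(list)
    for doc in docs:
        key = tuple(doc[f] for f in dimension_fields)
        series[key].append(doc["@timestamp"])
    return dict(series)


# Three one-minute data points for the same stream, each from its own request.
docs = [
    {"@timestamp": "2024-09-25T23:12:00.000Z", "StreamName": "test-esf-encrypted", "request_id": "req-1"},
    {"@timestamp": "2024-09-25T23:13:00.000Z", "StreamName": "test-esf-encrypted", "request_id": "req-2"},
    {"@timestamp": "2024-09-25T23:14:00.000Z", "StreamName": "test-esf-encrypted", "request_id": "req-3"},
]

without_request_id = group_into_series(docs, ["StreamName"])
with_request_id = group_into_series(docs, ["StreamName", "request_id"])
print(len(without_request_id), len(with_request_id))
```

Without the request ID the three points form one series that a rollup could downsample. With it, every point sits in its own single-point series, so there is nothing left to aggregate unless the rollup knows to ignore that dimension.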
axw
left a comment
@felixbarny and I had a chat on Slack about this, as it's another issue related to elastic/elasticsearch#99123. Although we can technically work around it using the request ID, this will reduce the storage efficiency of TSDB.
It's still a workaround, and not ideal, but we should probably hash the metric names instead. The hash can then be included as a dimension. That has been done for the Prometheus integration, so I'd suggest following suit. I think we could do it in the integration ingest pipeline?
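A rough sketch of the hashing idea follows. It is not the ingest pipeline from this PR: the rule for picking metric fields (keys containing `.metrics.`), the newline separator, the use of SHA-256, and the function names are all assumptions made for illustration.

```python
import hashlib


def metric_names(doc):
    # Assumption: metric fields are the top-level keys containing ".metrics.".
    return sorted(key for key in doc if ".metrics." in key)


def metrics_names_fingerprint(doc):
    """Hash the sorted list of metric names; metric values are ignored."""
    joined = "\n".join(metric_names(doc))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


doc_a = {
    "@timestamp": "2024-09-25T23:12:00.000Z",
    "aws.firehose.request_id": "7628c293-ba54-4ea4-9abe-d256976c1dbd",
    "aws.kinesis.metrics.GetRecords_Success": {"count": 240, "sum": 240},
    "aws.kinesis.metrics.GetRecords_Bytes": {"count": 240, "sum": 0},
}
doc_b = {
    "@timestamp": "2024-09-25T23:12:00.000Z",
    "aws.firehose.request_id": "70222524-bdc0-47f4-b7f8-adb7c6998bc7",
    "aws.kinesis.metrics.ReadProvisionedThroughputExceeded": {"count": 240, "sum": 0},
}

# Different metric name sets give different fingerprints, so the two
# documents no longer share the same set of dimension values.
print(metrics_names_fingerprint(doc_a) != metrics_names_fingerprint(doc_b))
```

Because only the names are hashed, documents that report the same set of metrics share one fingerprint regardless of request, which keeps the number of distinct dimension values far lower than a per-request ID would.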
🚀 Benchmarks report
| Data stream | Previous EPS | New EPS | Diff (%) | Result |
|---|---|---|---|---|
| invocation | 480.54 | 364.83 | -115.71 (-24.08%) | 💔 |

Package awsfirehose 👍(0) 💚(1) 💔(1)
| Data stream | Previous EPS | New EPS | Diff (%) | Result |
|---|---|---|---|---|
| metrics | 35714.29 | 10638.3 | -25075.99 (-70.21%) | 💔 |

To see the full report comment with /test benchmark fullreport
…ws_bedrock packages
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations) |
axw
left a comment
Looks good, thanks. Main issue is regarding the change in field access in routing rules - doesn't seem related to the main change?
efd6
left a comment
LGTM as codeowner for aws_bedrock with one nit.
Some nits in other files.
💚 Build Succeeded
Package aws_bedrock - 0.10.0 containing this change is available at https://epr.elastic.co/search?package=aws_bedrock |
Package awsfirehose - 1.3.0 containing this change is available at https://epr.elastic.co/search?package=awsfirehose |
…ingerprint fields (elastic#11239) This PR adds aws.firehose.arn and aws.firehose.request_id field definitions for the firehose integration. This PR also adds aws.metrics_names_fingerprint which is a hash of the list of metric names exist in each document. This way we don't have to count request_id as a dimension.



Proposed commit message
This PR adds aws.firehose.arn and aws.firehose.request_id field definitions for the firehose integration.
This PR also adds aws.metrics_names_fingerprint, which is a hash of the list of metric names that exist in each document. This way we don't have to count request_id as a dimension.
Checklist
changelog.yml file.
How to test this PR locally?
metrics-aws.cloudwatch-default data stream to mimic metrics coming in from Firehose: POST metrics-aws.cloudwatch-default/_doc/
```
POST metrics-aws.cloudwatch-default/_doc/
{
  "@timestamp": "2024-09-25T23:12:00.000Z",
  "agent.type": "firehose",
  "cloud.provider": "aws",
  "cloud.account.id": "627286350134",
  "cloud.region": "eu-central-1",
  "aws.exporter.arn": "arn:aws:cloudwatch:eu-central-1:627286350134:metric-stream/CustomFull-KefpMG",
  "aws.cloudwatch.namespace": "AWS/Kinesis",
  "aws.firehose.arn": "arn:aws:firehose:eu-central-1:627286350134:deliverystream/KS-PUT-ELI-2",
  "aws.firehose.request_id": "7628c293-ba54-4ea4-9abe-d256976c1dbd",
  "aws.dimensions": { "StreamName": "test-esf-encrypted" },
  "aws.kinesis.metrics.GetRecords_Success": { "count": 240, "sum": 240, "avg": 1, "max": 1, "min": 1 },
  "aws.kinesis.metrics.GetRecords_Bytes": { "count": 240, "sum": 0, "avg": 0, "max": 0, "min": 0 },
  "aws.kinesis.metrics.GetRecords_Latency": { "count": 240, "sum": 1864, "avg": 7.766667, "max": 18, "min": 6 },
  "aws.kinesis.metrics.GetRecords_IteratorAge": { "count": 240, "sum": 0, "avg": 0, "max": 0, "min": 0 },
  "aws.kinesis.metrics.GetRecords_Records": { "count": 240, "sum": 0, "avg": 0, "min": 0, "max": 0 },
  "aws.firehose.parameters.X-Found-Cluster": "bb8e51259abe4c21996954f5cfe90af1",
  "data_stream.type": "metrics",
  "data_stream.dataset": "aws.cloudwatch",
  "data_stream.namespace": "default"
}
```
```
POST metrics-aws.cloudwatch-default/_doc/
{
  "@timestamp": "2024-09-25T23:12:00.000Z",
  "agent.type": "firehose",
  "cloud.provider": "aws",
  "cloud.account.id": "627286350134",
  "cloud.region": "eu-central-1",
  "aws.exporter.arn": "arn:aws:cloudwatch:eu-central-1:627286350134:metric-stream/CustomFull-KefpMG",
  "aws.cloudwatch.namespace": "AWS/Kinesis",
  "aws.firehose.arn": "arn:aws:firehose:eu-central-1:627286350134:deliverystream/KS-PUT-ELI-2",
  "aws.firehose.request_id": "70222524-bdc0-47f4-b7f8-adb7c6998bc7",
  "aws.dimensions": { "StreamName": "test-esf-encrypted" },
  "aws.kinesis.metrics.GetRecords_IteratorAgeMilliseconds": { "count": 240, "sum": 0, "avg": 0, "max": 0, "min": 0 },
  "aws.kinesis.metrics.ReadProvisionedThroughputExceeded": { "count": 240, "sum": 0, "avg": 0, "max": 0, "min": 0 },
  "aws.firehose.parameters.X-Found-Cluster": "bb8e51259abe4c21996954f5cfe90af1",
  "data_stream.type": "metrics",
  "data_stream.dataset": "aws.cloudwatch",
  "data_stream.namespace": "default"
}
```
```
{
  "_index": ".ds-metrics-aws.kinesis-default-2024.09.25-000001",
  "_id": "kpqzKhZo8euJ-o_TAAABkiuFKMA",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "_seq_no": 1,
  "_primary_term": 1
}
```