[AWS Usage] Overlapping documents when enabling TSDB - no more dimensions available #6783
Comments
Been running the usage integration all day at 1hr collection periods and also not seen any dups. Under what circumstances have you all seen this happen? I'm conflicted about what to do about this, because if we decide to go ahead with TSDB for these datasets, it would be very unlikely we would see this issue again (the issue is masked). But I have a feeling that for the cases we have seen here, the duplicate values are acting like cumulative counters over the collection period - i.e. the newer value is the one we should take, and the opposite happens with TSDB (the first value is the one we take). I think figuring out how to repro this, and determining whether the above theory about cumulative values is correct, should be the next step. Although it doesn't seem like a particularly impactful issue, metric accuracy is important for us, and we should do what we can to figure it out.
I honestly have no idea. I left AWS Usage running and the overlapping documents just appeared after some time. I had many, many documents at that point. Sometimes I run the TSDB test on 40k documents and I do not see any overlap, so I have no idea what could be causing this. How are you checking whether you can reproduce this, @tommyers-elastic? My suggestion would be to leave AWS Usage running for 1 day and test on all documents; I am sure you would end up seeing the overlap.
It seems there are a few (potential) issues. How to reproduce:
@constanca-m can you please check if you can reproduce it with the usage data_stream? For this case it is not clear why the documents are added with the same timestamp. The second use case - the one when
I don't think it is necessary to go that far. I was testing using Elastic Cloud and the documents I had in the aws.usage data stream. I checked the overwritten documents, and indeed, some of them only differ in those fields: And another example in the same data stream: So the document is the same, but it is odd that a metric value changes sometimes: I believe this last case is even harder to find. From the set of 10 documents, I think only one had a change of value on a metric. I think these documents are all the same, in which case this is exactly what TSDB is for: discarding identical documents to save storage space.
With TSDB enabled those duplicated documents will be silently dropped, but they are still generated, processed on the beats side and sent to Elasticsearch, which is not optimal.
One thing we can try is to calculate a document ID based on the unique identifiers of that document. Right now we don't specify the ID, so when metricbeat/agent restarts, two documents will be sent to ES with different IDs but the same metrics.
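A minimal sketch of that idea, assuming the timestamp plus the dimension-like fields uniquely identify a data point; the field names below are hypothetical, and the real implementation would live on the beats/agent side rather than in Python:

```python
import hashlib
import json


def deterministic_doc_id(doc: dict, key_fields: list[str]) -> str:
    """Build a stable document _id from the fields that identify a data point.

    With an explicit _id, re-sending the same data point after a restart
    results in an overwrite instead of a second, duplicate document.
    """
    # Serialize the identifying fields in a fixed order so the hash does not
    # depend on field ordering.
    key_material = json.dumps({f: doc.get(f) for f in sorted(key_fields)}, sort_keys=True)
    return hashlib.sha256(key_material.encode("utf-8")).hexdigest()


# Hypothetical aws.usage document; field names and values are illustrative only.
doc = {
    "@timestamp": "2023-06-29T12:00:00.000Z",
    "aws.dimensions.Service": "EC2",
    "aws.dimensions.Type": "Resource",
    "aws.dimensions.Resource": "vCPU",
    "aws.usage.metrics.CallCount.sum": 7,
}

doc_id = deterministic_doc_id(
    doc,
    key_fields=["@timestamp", "aws.dimensions.Service", "aws.dimensions.Type", "aws.dimensions.Resource"],
)
print(doc_id)  # the same input always yields the same _id
```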
Currently there is no way to distinguish between some documents from AWS Usage. If we enable TSDB with the dimensions set as of now, they will not be enough and we will end up losing data. However, there are no keyword fields available to differentiate between these sets of documents. Example:
Document 1
Document 2
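The concrete field values of the two documents were not captured above, but a hypothetical pair (field names and values are assumptions, not taken from the data stream) illustrates the shape of the problem: every field that could serve as a TSDB dimension is identical, and only a metric value differs.

```python
# Hypothetical aws.usage documents; field names and values are illustrative only.
doc_1 = {
    "@timestamp": "2023-06-29T12:00:00.000Z",
    "aws.dimensions.Service": "EC2",
    "aws.dimensions.Type": "Resource",
    "aws.dimensions.Resource": "vCPU",
    "aws.usage.metrics.CallCount.sum": 7,
}
doc_2 = {**doc_1, "aws.usage.metrics.CallCount.sum": 9}

# All non-metric fields, including the timestamp, match, so with TSDB enabled both
# documents would map to the same time series and timestamp - one of them is lost.
identifying_fields = [k for k in doc_1 if not k.startswith("aws.usage.metrics")]
assert all(doc_1[k] == doc_2[k] for k in identifying_fields)
```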
This issue might be hard to reproduce. When testing, I got the output:
Out of 40000 documents from the index .ds-metrics-aws.usage-default-2023.06.29-000001, 429 of them were discarded.
That means this is happening for roughly 1% of the documents.
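For reference, a rough sketch of how such a check could be approximated directly against Elasticsearch, assuming the `elasticsearch` Python client and illustrative dimension field names; the actual test script used above may differ:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Build a per-document key from the timestamp plus illustrative dimension fields,
# then count how many documents share each key. Any bucket with more than one
# document would collide (and be dropped) once TSDB is enabled.
resp = es.search(
    index=".ds-metrics-aws.usage-default-2023.06.29-000001",
    size=0,
    track_total_hits=True,
    runtime_mappings={
        "tsid_key": {
            "type": "keyword",
            "script": {
                "source": (
                    "emit(doc['@timestamp'].value.toString() + '|' "
                    "+ doc['aws.dimensions.Service'].value + '|' "
                    "+ doc['aws.dimensions.Type'].value + '|' "
                    "+ doc['aws.dimensions.Resource'].value)"
                )
            },
        }
    },
    aggs={"by_key": {"terms": {"field": "tsid_key", "min_doc_count": 2, "size": 10000}}},
)

discarded = sum(b["doc_count"] - 1 for b in resp["aggregations"]["by_key"]["buckets"])
total = resp["hits"]["total"]["value"]
print(f"Out of {total} documents, {discarded} would be discarded")
```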