[Telemetry] Report data shippers by afharo · Pull Request #64935 · elastic/kibana

afharo · 2020-04-30T17:44:40Z

Summary

Report if well-known data shippers are used to index documents in the cluster.
Tested on OSS, X-Pack and Monitoring collectors.

Manual tests to explain the behaviour

Start a clean cluster:
The expected payload is an empty array stack_stats.data = []

I installed packetbeat on my machine and started it. stack_stats.data is now:

  "data": [
    {
      "shipper": "packetbeat",
      "index_count": 1,
      "ecs_index_count": 1,
      "doc_count": 56686,
      "size_in_bytes": 29042232
    }
  ],

For pre-7.0 packetbeat indices, the _meta.beat information is not available, so we report them under pattern_name instead. We still report the shipper property because we know that this index pattern is strictly tied to that shipper. The index pattern is defined as { pattern: 'packetbeat-*', patternName: 'packetbeat', shipper: 'packetbeat' }:
```
    {
      "pattern_name": "packetbeat",
      "shipper": "packetbeat",
      "index_count": 1,
      "ecs_index_count": 0,
      "doc_count": 0,
      "size_in_bytes": 208
    }
```

If I create a new index called citrix-1234, matching the pattern *citrix*, the following is added to the array.

    {
      "pattern_name": "citrix",
      "index_count": 1,
      "ecs_index_count": 0,
      "doc_count": 0,
      "size_in_bytes": 208
    }

For the pattern { pattern: '*logs*', patternName: 'third-party-logs' }, I create some indices containing logs in their name:

PUT logs-custom-index-1234
PUT <logs-custom-index-{now%2Fd}-12345>
PUT <custom-logs-index-{now%2Fd}-12345>

The following payload is added to the data array:

    {
      "pattern_name": "third-party-logs",
      "index_count": 3,
      "ecs_index_count": 0,
      "doc_count": 0,
      "size_in_bytes": 624
    }
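
The pattern matching exercised in the manual tests above can be sketched roughly as follows. This is only an illustration, not the PR's implementation: the `IndexPatternDef`, `IndexStats` and `PatternSummary` shapes, the `wildcardToRegExp` and `summarizeByPattern` helpers, and the resolved date-math index names are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; output field names mirror the payloads above.
interface IndexPatternDef {
  pattern: string;
  patternName: string;
  shipper?: string;
}

interface IndexStats {
  name: string;
  docCount: number;
  sizeInBytes: number;
}

interface PatternSummary {
  pattern_name: string;
  shipper?: string;
  index_count: number;
  doc_count: number;
  size_in_bytes: number;
}

// Convert a wildcard pattern such as '*logs*' into an anchored RegExp.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp('^' + escaped + '$');
}

// Group indices under the name of every pattern they match and sum their stats.
function summarizeByPattern(indices: IndexStats[], defs: IndexPatternDef[]): PatternSummary[] {
  const result: PatternSummary[] = [];
  for (const def of defs) {
    const matcher = wildcardToRegExp(def.pattern);
    for (const index of indices) {
      if (!matcher.test(index.name)) continue;
      let entry = result.find((e) => e.pattern_name === def.patternName);
      if (!entry) {
        entry = {
          pattern_name: def.patternName,
          ...(def.shipper ? { shipper: def.shipper } : {}),
          index_count: 0,
          doc_count: 0,
          size_in_bytes: 0,
        };
        result.push(entry);
      }
      entry.index_count += 1;
      entry.doc_count += index.docCount;
      entry.size_in_bytes += index.sizeInBytes;
    }
  }
  return result;
}

// The date-math names are written here as already-resolved, made-up examples.
const summary = summarizeByPattern(
  [
    { name: 'logs-custom-index-1234', docCount: 0, sizeInBytes: 208 },
    { name: 'logs-custom-index-2020.04.30-12345', docCount: 0, sizeInBytes: 208 },
    { name: 'custom-logs-index-2020.04.30-12345', docCount: 0, sizeInBytes: 208 },
    { name: 'citrix-1234', docCount: 0, sizeInBytes: 208 },
  ],
  [
    { pattern: '*citrix*', patternName: 'citrix' },
    { pattern: '*logs*', patternName: 'third-party-logs' },
  ]
);
console.log(JSON.stringify(summary, null, 2));
```

In this sketch an index is counted under every pattern it matches, so the three `*logs*` indices add up to `index_count: 3` and `size_in_bytes: 624`.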

If I create an index following the New Indexing Strategy

PUT events-something-namespace-123124
{
  "mappings": {
    "_meta": {
      "beat": "my-beat"
    },
    "properties": {
      "ecs": {
        "properties": {
          "version": {
            "type": "keyword"
          }
        }
      },
      "dataset": {
        "properties": {
          "name": {
            "type": "constant_keyword",
            "value": "something"
          },
          "type": {
            "type": "constant_keyword",
            "value": "events"
          }
        }
      }
    }
  }
}

We read the values from the mappings and add the following object to the data array:

    {
      "dataset": {
        "name": "something",
        "type": "events"
      },
      "shipper": "my-beat",
      "index_count": 1,
      "ecs_index_count": 1,
      "doc_count": 0,
      "size_in_bytes": 208
    }
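
As a rough sketch of how those values could be read from the mappings shown in the PUT request above: the `IndexMappings` and `MappingInfo` types and the `readMappingInfo` helper are hypothetical names introduced for this example, and the logic is an assumption about the approach rather than the code in this PR.

```typescript
// Hypothetical, partial typing of the mappings object from the PUT request above.
interface IndexMappings {
  _meta?: { beat?: string };
  properties?: {
    ecs?: { properties?: { version?: { type?: string } } };
    dataset?: {
      properties?: {
        name?: { type?: string; value?: string };
        type?: { type?: string; value?: string };
      };
    };
  };
}

interface MappingInfo {
  shipper?: string;
  isECS: boolean;
  dataset?: { name?: string; type?: string };
}

function readMappingInfo(mappings: IndexMappings): MappingInfo {
  // An index is treated as ECS when the mappings declare the ecs.version field.
  const info: MappingInfo = {
    isECS: Boolean(mappings.properties?.ecs?.properties?.version?.type),
  };

  // The shipper comes from _meta.beat when it is present.
  const shipper = mappings._meta?.beat;
  if (shipper) info.shipper = shipper;

  // dataset.name and dataset.type are read from their constant_keyword values.
  const datasetProps = mappings.properties?.dataset?.properties;
  const datasetName = datasetProps?.name?.value;
  const datasetType = datasetProps?.type?.value;
  if (datasetName || datasetType) {
    info.dataset = {
      ...(datasetName ? { name: datasetName } : {}),
      ...(datasetType ? { type: datasetType } : {}),
    };
  }
  return info;
}

const exampleInfo = readMappingInfo({
  _meta: { beat: 'my-beat' },
  properties: {
    ecs: { properties: { version: { type: 'keyword' } } },
    dataset: {
      properties: {
        name: { type: 'constant_keyword', value: 'something' },
        type: { type: 'constant_keyword', value: 'events' },
      },
    },
  },
});
console.log(JSON.stringify(exampleInfo));
```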

Final object after all the steps above

{
  "stack_stats": {
    "data": [
      {
        "shipper": "packetbeat",
        "index_count": 1,
        "ecs_index_count": 1,
        "doc_count": 56686,
        "size_in_bytes": 29042232
      },
      {
        "pattern_name": "packetbeat",
        "shipper": "packetbeat",
        "index_count": 1,
        "ecs_index_count": 0,
        "doc_count": 0,
        "size_in_bytes": 208
      },
      {
        "dataset": {
          "name": "something",
          "type": "events"
        },
        "shipper": "my-beat",
        "index_count": 1,
        "ecs_index_count": 1,
        "doc_count": 0,
        "size_in_bytes": 208
      },
      {
        "pattern_name": "citrix",
        "index_count": 1,
        "ecs_index_count": 0,
        "doc_count": 0,
        "size_in_bytes": 208
      },
      {
        "pattern_name": "third-party-logs",
        "index_count": 3,
        "ecs_index_count": 0,
        "doc_count": 0,
        "size_in_bytes": 624
      }
    ],
    "kibana": "..."
  }
}

When Monitoring is ON

With the currently limited information we can retrieve in Monitoring, the reported payload in the same scenario would be:

      "data": [
        {
          "pattern_name": "citrix",
          "index_count": 1,
          "doc_count": 0,
          "size_in_bytes": 208
        },
        {
          "pattern_name": "third-party-logs",
          "index_count": 2,
          "doc_count": 0,
          "size_in_bytes": 416
        },
        {
          "pattern_name": "packetbeat",
          "index_count": 2,
          "doc_count": 86535,
          "size_in_bytes": 43978411
        }
      ]

So I've removed the collection of this information when using Monitoring until we find a way to accurately retrieve it (#68998).

TODO:

Change the permissions of the role kibana_system (Add monitor and view_index_metadata to built-in kibana_system role elasticsearch#57755)
Finish up with the full listing: https://github.com/afharo/kibana/blob/telemetry/report-data-providers/src/plugins/telemetry/server/telemetry_collection/get_data_telemetry/constants.ts#L36 => @alexfrancoeur will provide an updated list. The only thing left is to confirm some of the patterns that are a bit too vague (e.g. *bro*).
Try to identify if the mapping includes the field ecs.version
There is a concern that, for security reasons, the Kibana user might not be able to retrieve the stats of certain indices. We need to test it thoroughly (any help is appreciated here) => Tested! We are still able to list the indices, but we can't retrieve the doc_count or the size. I've got confirmation that this is OK as long as those 2 fields are optional.
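
To illustrate what "optional" could mean for those two fields, here is a minimal sketch. The `DataTelemetryEntry` and `MaybeStats` types and the `buildEntry` helper are hypothetical; the point is only that a missing stat is omitted from the payload while a genuine 0 is still reported.

```typescript
interface DataTelemetryEntry {
  pattern_name: string;
  index_count: number;
  doc_count?: number;
  size_in_bytes?: number;
}

// Hypothetical input: stats may be missing when the Kibana user cannot read them.
interface MaybeStats {
  docCount?: number;
  sizeInBytes?: number;
}

function buildEntry(patternName: string, indexCount: number, stats: MaybeStats): DataTelemetryEntry {
  const entry: DataTelemetryEntry = { pattern_name: patternName, index_count: indexCount };
  // Check the type instead of truthiness so that a real value of 0 is kept.
  if (typeof stats.docCount === 'number') entry.doc_count = stats.docCount;
  if (typeof stats.sizeInBytes === 'number') entry.size_in_bytes = stats.sizeInBytes;
  return entry;
}

// Stats unavailable: only the pattern name and the index count are reported.
console.log(JSON.stringify(buildEntry('citrix', 1, {})));
// Stats available, with an empty index: doc_count 0 is still included.
console.log(JSON.stringify(buildEntry('citrix', 1, { docCount: 0, sizeInBytes: 208 })));
```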

~~Adding the v7.8.0 label as tentative only.~~

Checklist

Delete any items that are not applicable to this PR.

Unit or functional tests were updated or added to match the most common scenarios

For maintainers

This was checked for breaking API changes and was labeled appropriately


afharo · 2020-05-04T17:10:48Z

src/plugins/telemetry/server/telemetry_collection/ingest_solutions/ingest_solutions.ts

+    // GET _cluster/state/metadata/<index>?filter_path=metadata.indices.*.version
+    callCluster<ClusterState>('cluster.state', {
+      index,
+      metric: 'metadata',
+      filterPath: [
+        // The payload is huge and we are only after the name (no other useful stuff so far)
+        'metadata.indices.*.version',
+        // Does it have `ecs.version` in the mappings?
+        'metadata.indices.*.mappings._doc.properties.ecs.properties.version.type',
+      ],
+    }),


There is a concern about using this API because its documentation says:

The response is an internal representation of the cluster state and its format may change from version to version. If possible, you should obtain any information from the cluster state using the other, more stable, cluster APIs.

I've added a functional test in test/api_integration/apis/telemetry/telemetry_local.js to make sure it works as expected, although it brings up the risk of flaky tests in the future if the API changes the way it works.

How much tech debt do we want to get ourselves into? If it is possible to use a more stable API, I think it's worth changing.

I think the only alternative so far is to only use the GET <index>/_stats/ API. But that requires the kibana user to have a more permissive role.

Alternatively, there are some ongoing talks with the ES team to modify the same Cluster State API to provide the aggregated data all at once (meaning ES shouldn't drop support or make any changes in that API).

Can we discuss with other teams the possibility of adding those permissions to the kibana role instead of using this API? I'm not completely against using this API if we have no other way, but I think it would make more sense to add those extra permissions and use the _stats API.

Also, if you check the compatibility grid: https://github.com/elastic/kibana/#version-compatibility-with-elasticsearch

Are we sure this API does not change behavior across the compatibility grid? Our tests will cover the exactly matching ES version, but not other compatible versions where the ES minor/patch version is newer or the ES patch version is lower than Kibana's.

I'm not against updating the roles to be more permissive (that will allow us to consistently be able to retrieve the doc_count and size_in_bytes properties) but it involves some additional security concerns.

If we can't retrieve the data from the cluster state API because of a version mismatch within the compatibility grid, the only consequence is that this parameter is missing from the telemetry (it is returned as {}); it shouldn't break any other logic unless the API request itself throws an error.
The warning in that API's documentation is about the format possibly changing, though, not the API behaviour as such.

N.B.: I just pushed a commit to catch any errors and safely return {} if any of the API calls fail.
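
A minimal sketch of that kind of guard, assuming a generic wrapper around each request: `withFallback` and `failingClusterStateCall` are hypothetical names used only for this example, and the actual commit may be structured differently.

```typescript
// Hypothetical helper: run an async request and return a fallback value on any error.
async function withFallback<T>(request: () => Promise<T>, fallback: T): Promise<T> {
  try {
    return await request();
  } catch (err) {
    // Telemetry is best-effort: swallow the error and report the fallback instead.
    return fallback;
  }
}

// Stand-in for an Elasticsearch call that fails, e.g. because of missing privileges.
async function failingClusterStateCall(): Promise<Record<string, unknown>> {
  throw new Error('security_exception');
}

withFallback(failingClusterStateCall, {}).then((result) => {
  console.log(JSON.stringify(result)); // prints the empty fallback object
});
```

Keeping the wrapper generic means every API call in the collector can share the same failure behaviour.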

I think this approach is safer than opening the kibana_system role to be able to read from any index. But I'm happy to revisit this implementation if we think that's the way to go, or to consider any other approach (like ES providing this kind of info already embedded in the _cluster/stats API, if they are willing to do the change and the aggregation on their end).

Opening up the kibana_system role is not ideal. Could we explore creating a telemetry_system role that has all the permissions needed?
cc @kobelb

I don't think creating a telemetry_system role would necessarily help us here... If we added the telemetry_system role to the kibana_system user we'd have an equivalent "threat profile". If we created a new telemetry_system user which had the telemetry_system role, it would make setup more complicated and have a slightly different "threat profile" as both users' credentials would be stored in the kibana.yml.

Augmenting the _cluster/stats API so we don't have to change the kibana_system privileges at all would be the safest option.

However, if we had to give the kibana_system role the monitor privilege on indices so we can use the index stats API, it's a lot safer than giving it access to read the documents themselves.

alexfrancoeur · 2020-05-04T19:10:47Z

@afharo minor suggestion, and fine to leave as is if this blocks anything, but would it be possible to rename ingest_solutions to ingest? At least initially, we aren't mapping the data providers directly to solutions but instead the shippers themselves

afharo · 2020-05-05T14:20:47Z

> @afharo minor suggestion, and fine to leave as is if this blocks anything, but would it be possible to rename ingest_solutions to ingest? At least initially, we aren't mapping the data providers directly to solutions but instead the shippers themselves

@alexfrancoeur absolutely, it's a minor change. The only reason I decided not to use ingest on its own is that it may be too vague. We already have the concepts ingest pipelines, ingest nodes, ...

But happy to change it if you think it would make things easier :)

chrisronline

LGTM from stack monitoring

igoristic · 2020-05-05T15:45:56Z

x-pack/plugins/monitoring/server/telemetry_collection/get_ingest_solutions.ts

+) {
+  const responses = await Promise.all(
+    clusterUuids.map(async clusterUuid => {
+      // Should we take into consideration CCS? https://github.com/elastic/kibana/blob/3a396027f669803e1a3143237578973fb1ab20d0/x-pack/plugins/monitoring/server/routes/api/v1/elasticsearch/indices.js#L42


That's a good question. Are there any repercussions if we do use CCS by default? (like will it be less efficient, or slower?). If not then my other concern is it would probably need to be conditional based on licensing (and maybe also if there are any config options tied with it). I think the terms are kind of confusing since we actually mean "Multi-stack monitoring" here, right?

Looks like it's only available for Gold license and above

However, I think CCS is available for all licenses:

Source: https://www.elastic.co/subscriptions

I think it's fine the way it is right now, and CCS support can be added later if needed

igoristic

Looks good from Stack Monitoring pov (code and functionality) ✅

TinaHeiligers

The solution looks good, although I did add a couple of questions.
I also pulled the code and ran the changed and added tests, all of which passed.

src/plugins/telemetry/server/telemetry_collection/ingest_solutions/constants.ts

src/plugins/telemetry/server/telemetry_collection/ingest_solutions/ingest_solutions.ts

TinaHeiligers · 2020-05-05T21:45:00Z

src/plugins/telemetry/server/telemetry_collection/ingest_solutions/ingest_solutions.ts


How much tech debt do we want to get ourselves into? If it is possible to use a more stable API, I think it's worth changing.

afharo · 2020-05-06T11:00:58Z

@elasticmachine merge upstream

TinaHeiligers · 2020-06-18T17:34:44Z

@elasticmachine merge upstream


afharo · 2020-06-23T11:07:02Z

@elasticmachine merge upstream

afharo · 2020-06-23T11:35:15Z

src/plugins/telemetry/server/telemetry_collection/get_data_telemetry/get_data_telemetry.ts

+  }
+
+  // Otherwise, try with the list of known index patterns
+  return DATA_DATASETS_INDEX_PATTERNS.find(({ pattern }) => {


After a conversation with @alexfrancoeur and @kobelb, I need to change this to a .filter
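
The difference between the two array methods can be shown with a small sketch. `KNOWN_PATTERNS` is a made-up two-entry subset, and `globMatches` is a hypothetical matcher; neither is taken from the PR's code.

```typescript
interface IndexPatternDef {
  pattern: string;
  patternName: string;
}

// Hypothetical subset of the known index patterns, for illustration only.
const KNOWN_PATTERNS: IndexPatternDef[] = [
  { pattern: '*logs*', patternName: 'third-party-logs' },
  { pattern: '*citrix*', patternName: 'citrix' },
];

// Hypothetical matcher: '*' matches any run of characters, everything else is literal.
function globMatches(pattern: string, indexName: string): boolean {
  const source = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp('^' + source + '$').test(indexName);
}

const indexName = 'citrix-logs-1234';

// .find stops at the first matching pattern, so 'citrix' is never reported.
const firstOnly = KNOWN_PATTERNS.find(({ pattern }) => globMatches(pattern, indexName));

// .filter returns every matching pattern, so the index counts towards both.
const allMatches = KNOWN_PATTERNS.filter(({ pattern }) => globMatches(pattern, indexName));

console.log(firstOnly ? firstOnly.patternName : undefined);
console.log(allMatches.map((p) => p.patternName));
```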

afharo · 2020-06-25T14:20:20Z

@elasticmachine merge upstream


afharo · 2020-07-01T15:19:56Z

@elasticmachine merge upstream

kibanamachine · 2020-07-01T18:34:57Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request
Commit: 7434e99

Build metrics

✅ unchanged

History

💛 Build #57908 was flaky aaac999
💚 Build #57563 succeeded edbf2f5
💚 Build #57501 succeeded f69c491
💔 Build #57463 failed f13ccb8
💚 Build #56325 succeeded 8746be1

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

* master: (46 commits)
  [Visualize] Add missing advanced settings and custom label for pipeline aggs (elastic#69688)
  Use dynamic: false for config saved object mappings (elastic#70436)
  [Ingest Pipelines] Error messages (elastic#70167)
  [APM] Show transaction rate per minute on Observability Overview page (elastic#70336)
  Filter out error when calculating a label (elastic#69934)
  [Visualizations] Each visType returns its supported triggers (elastic#70177)
  [Telemetry] Report data shippers (elastic#64935)
  Reduce SavedObjects mappings for Application Usage (elastic#70475)
  [Lens] fix dimension label performance issues (elastic#69978)
  Skip failing endgame tests (elastic#70548)
  [SIEM] Reenabling Cypress tests (elastic#70397)
  [SIEM][Security Solution][Endpoint] Endpoint Artifact Manifest Management + Artifact Download and Distribution (elastic#67707)
  [Security] Adds field mapping support to rule creation (elastic#70288)
  SECURITY-ENDPOINT: add fields for events to metadata document (elastic#70491)
  Fixed assertion in hybrid index pattern test to iterate through indices (elastic#70130)
  [SIEM][Exceptions] - Exception builder component (elastic#67013)
  [Ingest Manager] Rename data sources to package configs (elastic#70259)
  skip suites blocking es snapshot promomotion (elastic#70532)
  [Metrics UI] Fix asynchronicity and error handling in Snapshot API (elastic#70503)
  fix export response (elastic#70473)
  ...

afharo added the Feature:Telemetry, v8.0.0, release_note:skip, :Telemetry, v7.8.0 and v7.9.0 labels Apr 30, 2020

afharo force-pushed the telemetry/report-data-providers branch from 436f7a4 to c51aa04 Compare May 4, 2020 09:03

[Telemetry] Report data providers

db8096b

afharo force-pushed the telemetry/report-data-providers branch from c51aa04 to db8096b Compare May 4, 2020 09:25

afharo added 5 commits May 4, 2020 13:19

Detect if the index uses ECS

9f2b355

Report only the ingest providers that are found

7e30867

Revert telemetry_collection_manager changes for testing

ec31d30

Functional tests to make sure the _clusters/state API returns the exp…

73d6a1a

…ected payload

Report docCount and size 0 if that is the obtained value

cf640bc

afharo commented May 4, 2020


afharo marked this pull request as ready for review May 4, 2020 17:48

afharo requested a review from a team as a code owner May 4, 2020 17:48

afharo requested a review from a team May 4, 2020 17:48

kobelb self-requested a review May 4, 2020 19:28

chrisronline approved these changes May 5, 2020


igoristic reviewed May 5, 2020


igoristic approved these changes May 5, 2020


TinaHeiligers removed the v7.8.0 label May 5, 2020

TinaHeiligers reviewed May 5, 2020


Split buildIngestSolutionsPayload into a more functional approach

9ff6cd6

elasticmachine and others added 2 commits May 6, 2020 07:01

Merge branch 'master' into telemetry/report-data-providers

6f28129

Wrap the API calls in a try...catch block

03da076

elasticmachine and others added 3 commits June 18, 2020 11:34

Merge branch 'master' into telemetry/report-data-providers

4479fcd

Update src/plugins/telemetry/server/telemetry_collection/get_data_tel…

7b60dfd

…emetry/get_data_telemetry.test.ts Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>

Update src/plugins/telemetry/server/telemetry_collection/get_data_tel…

4e362a0

…emetry/get_data_telemetry.test.ts Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>

Merge branch 'master' into telemetry/report-data-providers

3af6fed

afharo commented Jun 23, 2020


afharo added 2 commits June 23, 2020 12:54

Report all matching index patterns instead of the first one

b000747

Update list of index patterns

bbbed77

elasticmachine and others added 5 commits June 25, 2020 08:20

Merge branch 'master' into telemetry/report-data-providers

8746be1

Merge branch 'master' of github.com:elastic/kibana into telemetry/rep…

b90e6ee

…ort-data-providers

Update list of index-patterns

f13ccb8

APICaller is now LegacyAPICaller

f69c491

Merge branch 'master' of github.com:elastic/kibana into telemetry/rep…

4803f96

…ort-data-providers

afharo force-pushed the telemetry/report-data-providers branch from 9395313 to 4803f96 Compare June 30, 2020 15:42

afharo added 2 commits June 30, 2020 16:42

Disable dataset.* fields collection for now

edbf2f5

Disable *wp*, *wix* and *aem* for now so we can merge

b31f1c0

elasticmachine and others added 2 commits July 1, 2020 09:20

Merge branch 'master' into telemetry/report-data-providers

aaac999

New Security entries in the list

7434e99

afharo merged commit 6607bf7 into elastic:master Jul 2, 2020

afharo deleted the telemetry/report-data-providers branch July 2, 2020 07:08

afharo mentioned this pull request Jul 2, 2020

[7.x] [Telemetry] Report data shippers (#64935) #70557

Merged

afharo mentioned this pull request Jul 13, 2020

[Telemetry] Data: Report dataset info only if there is known metadata #71419

Merged

2 tasks

afharo mentioned this pull request Jun 24, 2022

SAR: Add telemetry data for SAR usage elastic/elastic-serverless-forwarder#50

Closed


