[Telemetry] Report data shippers#64935
Conversation
436f7a4 to
c51aa04
Compare
c51aa04 to
db8096b
Compare
| // GET _cluster/state/metadata/<index>?filter_path=metadata.indices.*.version | ||
| callCluster<ClusterState>('cluster.state', { | ||
| index, | ||
| metric: 'metadata', | ||
| filterPath: [ | ||
| // The payload is huge and we are only after the name (no other useful stuff so far) | ||
| 'metadata.indices.*.version', | ||
| // Does it have `ecs.version` in the mappings? | ||
| 'metadata.indices.*.mappings._doc.properties.ecs.properties.version.type', | ||
| ], | ||
| }), |
There was a problem hiding this comment.
There is a concern about using this API because its documentation says:
The response is an internal representation of the cluster state and its format may change from version to version. If possible, you should obtain any information from the cluster state using the other, more stable, cluster APIs.
I've added a functional test in test/api_integration/apis/telemetry/telemetry_local.js to make sure it works as expected. Although it brings up the risk of flaky tests in the future if the API changes the way it works.
There was a problem hiding this comment.
How much tech debt do we want to get ourselves into? If it is possible to use a more stable API, I think it's worth changing.
There was a problem hiding this comment.
I think the only alternative so far is to only use the GET <index>/_stats/ API. But that requires the kibana user to have a more permissive role.
Alternatively, there are some ongoing talks with the ES team to modify the same Cluster State API to provide the aggregated data all at once (meaning ES shouldn't drop support or make any changes in that API).
There was a problem hiding this comment.
Can we discuss with other teams the possiblity of adding those permissions to the kibana role instead of using this API? im not completely against using this API if we have no other way, but i think it would make more sense to add those extra permissions and use the _stats api.
Also if you check the compatiblity grid: https://github.com/elastic/kibana/#version-compatibility-with-elasticsearch
Are we sure this API does not change behavior across the compatiblity grid? Our tests will test exact ES matching version, but not other compatible versions where ES minor/patch versinos are newer or ES patch version is lower than kibana's.
There was a problem hiding this comment.
I'm not against updating the roles to be more permissive (that will allow us to consistently be able to retrieve the doc_count and size_in_bytes properties) but it involves some additional security concerns.
Not being able to retrieve the data from the cluster state API because of the compatibility grid will only result in not being able to provide this parameter in the telemetry (it will return it as {}) but it shouldn't break any other logic unless the API request itself throws any errors.
The warning in that API is about the format may change though, not the API behaviour as such.
N.B.: I just pushed a commit to catch the method and safely return {} if any of the API calls fail.
I think this approach is safer than opening the kibana_system role to be able to read from any index. But I'm happy to revisit this implementation if we think that's the way to go or any other approach (like ES providing this kind of info already embedded in the _cluster/stats API if they are willing the do the change and aggregation on their end).
There was a problem hiding this comment.
Opening up the kibana_system role is not ideal. Could we explore creating a telemetry_system role that has all the permissions needed?
cc @kobelb
There was a problem hiding this comment.
I don't think creating a telemetry_system roles would necessarily help us here... If we added the telemetry_system role to the kibana_system user we'd have an equivalent "threat profile". If we created a new telemetry_system user which had the telemetry_system role, it would make setup more complicated and have a slightly different "threat profile" as both user's credentials would be stored in the kibana.yml.
Augmenting the _cluster/stats API so we don't have to change the kibana_system privileges at all would be the safest option.
However, if we had to give the kibana_system role the monitor privilege on indices so we can use the index stats API, it's a lot safer than giving it access to read the documents themselves.
|
@afharo minor suggestion, and fine to leave as is if this blocks anything, but would it be possible to rename |
@alexfrancoeur absolutely, it's a minor change. The only reason I decided not to use But happy to change it you think it would make things easier :) |
chrisronline
left a comment
There was a problem hiding this comment.
LGTM from stack monitoring
| ) { | ||
| const responses = await Promise.all( | ||
| clusterUuids.map(async clusterUuid => { | ||
| // Should we take into consideration CCS? https://github.com/elastic/kibana/blob/3a396027f669803e1a3143237578973fb1ab20d0/x-pack/plugins/monitoring/server/routes/api/v1/elasticsearch/indices.js#L42 |
There was a problem hiding this comment.
That's a good question. Are there any repercussions if we do use CCS by default? (like will it be less efficient, or slower?). If not then my other concern is it would probably need to be conditional based on licensing (and maybe also if there are any config options tied with it). I think the terms are kind of confusing since we actually mean "Multi-stack monitoring" here, right?

Looks like it's only available for Gold license and above
However, I think CSS is available for all licenses:

Source: https://www.elastic.co/subscriptions
I think it's fine the way it is right now and can be added later if anything
igoristic
left a comment
There was a problem hiding this comment.
Looks good from Stack Monitoring pov (code and functionality) ✅
src/plugins/telemetry/server/telemetry_collection/ingest_solutions/constants.ts
Outdated
Show resolved
Hide resolved
src/plugins/telemetry/server/telemetry_collection/ingest_solutions/ingest_solutions.ts
Outdated
Show resolved
Hide resolved
| // GET _cluster/state/metadata/<index>?filter_path=metadata.indices.*.version | ||
| callCluster<ClusterState>('cluster.state', { | ||
| index, | ||
| metric: 'metadata', | ||
| filterPath: [ | ||
| // The payload is huge and we are only after the name (no other useful stuff so far) | ||
| 'metadata.indices.*.version', | ||
| // Does it have `ecs.version` in the mappings? | ||
| 'metadata.indices.*.mappings._doc.properties.ecs.properties.version.type', | ||
| ], | ||
| }), |
There was a problem hiding this comment.
How much tech debt do we want to get ourselves into? If it is possible to use a more stable API, I think it's worth changing.
|
@elasticmachine merge upstream |
|
@elasticmachine merge upstream |
…emetry/get_data_telemetry.test.ts Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
…emetry/get_data_telemetry.test.ts Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
|
@elasticmachine merge upstream |
| } | ||
|
|
||
| // Otherwise, try with the list of known index patterns | ||
| return DATA_DATASETS_INDEX_PATTERNS.find(({ pattern }) => { |
There was a problem hiding this comment.
After a conversation with @alexfrancoeur and @kobelb, I need to change this to a .filter
|
@elasticmachine merge upstream |
9395313 to
4803f96
Compare
|
@elasticmachine merge upstream |
💚 Build SucceededBuild metrics
History
To update your PR or re-run it, just comment with: |
Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* master: (46 commits) [Visualize] Add missing advanced settings and custom label for pipeline aggs (elastic#69688) Use dynamic: false for config saved object mappings (elastic#70436) [Ingest Pipelines] Error messages (elastic#70167) [APM] Show transaction rate per minute on Observability Overview page (elastic#70336) Filter out error when calculating a label (elastic#69934) [Visualizations] Each visType returns its supported triggers (elastic#70177) [Telemetry] Report data shippers (elastic#64935) Reduce SavedObjects mappings for Application Usage (elastic#70475) [Lens] fix dimension label performance issues (elastic#69978) Skip failing endgame tests (elastic#70548) [SIEM] Reenabling Cypress tests (elastic#70397) [SIEM][Security Solution][Endpoint] Endpoint Artifact Manifest Management + Artifact Download and Distribution (elastic#67707) [Security] Adds field mapping support to rule creation (elastic#70288) SECURITY-ENDPOINT: add fields for events to metadata document (elastic#70491) Fixed assertion in hybrid index pattern test to iterate through indices (elastic#70130) [SIEM][Exceptions] - Exception builder component (elastic#67013) [Ingest Manager] Rename data sources to package configs (elastic#70259) skip suites blocking es snapshot promomotion (elastic#70532) [Metrics UI] Fix asynchronicity and error handling in Snapshot API (elastic#70503) fix export response (elastic#70473) ...
Summary
Closes #64790
Report if well-known data shippers are used to index documents in the cluster.
Tested on OSS, X-Pack and Monitoring collectors.
Manual tests to explain the behaviour
Start a clean cluster:
The expected payload is an empty array
stack_stats.data = []I installed
packetbeatin my machine and started it.stack_stats.datais now...For
packetbeatindices pre-7.0, we didn't have the_meta.beatinformation, so we'll report it underpattern_nameinstead. But we'll report theshipperproperty because we know that index pattern is strictly linked to that shipper. The index pattern is defined as{ pattern: 'packetbeat-*', patternName: 'packetbeat', shipper: 'packetbeat' }{ "pattern_name": "packetbeat", "shipper": "packetbeat", "index_count": 1, "ecs_index_count": 0, "doc_count": 0, "size_in_bytes": 208 }If I create a new index called
citrix-1234, matching the pattern*citrix*, the following is added to the array.{ "pattern_name": "citrix", "index_count": 1, "ecs_index_count": 0, "doc_count": 0, "size_in_bytes": 208 }For the pattern
{ pattern: '*logs*', patternName: 'third-party-logs' }, I create some indices containinglogsin their name:The following payload is added to the
dataarray:{ "pattern_name": "third-party-logs", "index_count": 3, "ecs_index_count": 0, "doc_count": 0, "size_in_bytes": 624 }If I create an index following the New Indexing Strategy
We read the values from the mappings and include the following object to the
dataarray:{ "dataset": { "name": "something", "type": "events" }, "shipper": "my-beat", "index_count": 1, "ecs_index_count": 1, "doc_count": 0, "size_in_bytes": 208 }Final object after all
{ "stack_stats": { "data": [ { "shipper": "packetbeat", "index_count": 1, "ecs_index_count": 1, "doc_count": 56686, "size_in_bytes": 29042232 }, { "pattern_name": "packetbeat", "shipper": "packetbeat", "index_count": 1, "ecs_index_count": 0, "doc_count": 0, "size_in_bytes": 208 }, { "dataset": { "name": "something", "type": "events" }, "shipper": "my-beat", "index_count": 1, "ecs_index_count": 1, "doc_count": 0, "size_in_bytes": 208 }, { "pattern_name": "citrix", "index_count": 1, "ecs_index_count": 0, "doc_count": 0, "size_in_bytes": 208 }, { "pattern_name": "third-party-logs", "index_count": 3, "ecs_index_count": 0, "doc_count": 0, "size_in_bytes": 624 } ], "kibana": "..." } }When Monitoring is ON
With the currently limited information we can retrieve in Monitoring, the reported payload in the same scenario would be:
So I've removed the collection of this information when using Monitoring until we find a way to accurately retrieve it (#68998).
TODO:
kibana_system(Add monitor and view_index_metadata to built-inkibana_systemrole elasticsearch#57755)*bro*).ecs.versiondoc_countnor thesize. I've got confirmation that that's OK if those 2 fields are optional.Adding thev7.8.0label as tentative only.Checklist
Delete any items that are not applicable to this PR.
For maintainers