
V2: Elastic agent monitoring data stream names should not be per process #1814

Open · cmacknz opened this issue Nov 28, 2022 · 28 comments

Labels: bug, Team:Elastic-Agent, V2-Architecture

Comments

@cmacknz
Member

cmacknz commented Nov 28, 2022

The changes in #1702 changed the naming convention of the data streams used for sending agent process logs to Fleet to match how the processes are modelled in V2.

This has resulted in monitoring logs for spawned processes failing to be indexed in Elasticsearch when monitoring is enabled in Fleet. The root cause is that Fleet hard codes the list of expected monitoring data streams and uses them when adding index permissions into the agent policy. The result is that the agent has no permission to write to data streams using the new naming convention.

Example errors:

action [indices:admin/auto_create] is unauthorized for API key id [...] of user [elastic/fleet-server] on indices [metrics-elastic_agent.log_default-default], this action is granted by the index privileges [auto_configure,create_index,manage,all]

action [indices:admin/auto_create] is unauthorized for API key id [...] of user [elastic/fleet-server] on indices [metrics-elastic_agent.system/metrics_default-default], this action is granted by the index privileges [auto_configure,create_index,manage,all]

The existing monitoring data streams are defined in Fleet at https://github.com/elastic/kibana/blob/c5f20721e1879f1ebe161b0fa3b207f10ed2b6f7/x-pack/plugins/fleet/common/constants/agent_policy.ts#L15-L28

Monitoring permissions are generated here using this list: https://github.com/elastic/kibana/blob/19413b7daae983b95dbb9f5c7b39cb8f3578ebfa/x-pack/plugins/fleet/server/services/agent_policies/monitoring_permissions.ts#L21-L52

We need to either modify Fleet to generate index permissions for the new data stream names, or update the agent to use the index names expected in v1 (effectively, the binary names).

Note that logs for the agent itself can successfully be sent to Fleet.

@kpollich
Member

Could you confirm the list of new indices we'd need permissions for here? I'm happy to update the hardcoded list in Fleet. What I see so far from the error messages includes:

// Agent v2 datasets
'elastic_agent.filestream_monitoring',
'elastic_agent.log_default', // Why the `_default` here?
'elastic_agent.system',

It's probably helpful to look at Fleet's snapshot tests for these permissions as well. The full possible set of permissions is

"logs-elastic_agent-testnamespace123",
"logs-elastic_agent.elastic_agent-testnamespace123",
"logs-elastic_agent.apm_server-testnamespace123",
"logs-elastic_agent.filebeat-testnamespace123",
"logs-elastic_agent.fleet_server-testnamespace123",
"logs-elastic_agent.metricbeat-testnamespace123",
"logs-elastic_agent.osquerybeat-testnamespace123",
"logs-elastic_agent.packetbeat-testnamespace123",
"logs-elastic_agent.endpoint_security-testnamespace123",
"logs-elastic_agent.auditbeat-testnamespace123",
"logs-elastic_agent.heartbeat-testnamespace123",
"logs-elastic_agent.cloudbeat-testnamespace123",
"metrics-elastic_agent-testnamespace123",
"metrics-elastic_agent.elastic_agent-testnamespace123",
"metrics-elastic_agent.apm_server-testnamespace123",
"metrics-elastic_agent.filebeat-testnamespace123",
"metrics-elastic_agent.fleet_server-testnamespace123",
"metrics-elastic_agent.metricbeat-testnamespace123",
"metrics-elastic_agent.osquerybeat-testnamespace123",
"metrics-elastic_agent.packetbeat-testnamespace123",
"metrics-elastic_agent.endpoint_security-testnamespace123",
"metrics-elastic_agent.auditbeat-testnamespace123",
"metrics-elastic_agent.heartbeat-testnamespace123",
"metrics-elastic_agent.cloudbeat-testnamespace123",

@blakerouse
Contributor

@kpollich Is it possible to use wildcards? There are a lot of possibilities, and the new data streams are based on what is defined in the policy.

It is defined as:

elastic_agent.%{input_type}_%{output_name}.%{monitoring_namespace}
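
Read against the error messages in the issue description, that template seems to expand roughly as follows (this breakdown is my own reading, not something confirmed in the thread; the data_stream.* field names are just the standard data stream naming scheme fields):

data_stream:
  type: metrics                        # the "metrics-" prefix
  dataset: elastic_agent.log_default   # elastic_agent.{input_type}_{output_name}
  namespace: default                   # monitoring namespace, the "-default" suffix
# Full concrete name: metrics-elastic_agent.log_default-default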

Also, how is this handled on upgrade? Will it result in a new policy revision? Existing keys will need to be upgraded, and the older data streams must not be removed, since pre-8.6 agents will still push to them.

@kpollich
Member

Is it possible to use wildcards? There are a lot of possibilities, and the new data streams are based on what is defined in the policy.

Maybe? I don't really know how these permissions get handled once they're written to the agent policy. The Fleet code we're looking at here generates this chunk of an agent policy:

output_permissions:
  default:
    _elastic_agent_monitoring:
      indices:
        - names:
            - logs-elastic_agent.apm_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.apm_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.auditbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.auditbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.cloudbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.cloudbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.elastic_agent-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.endpoint_security-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.endpoint_security-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.filebeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.filebeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.fleet_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.fleet_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.heartbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.heartbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.metricbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.metricbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.osquerybeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.osquerybeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.packetbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.packetbeat-default
          privileges:
            - auto_configure
            - create_doc

I think Fleet Server parses these permissions when it generates API keys for the agent? If that's correct, then wildcards should be supported as far as I know: https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-role.html#security-api-put-role-request-body

If I understand correctly, the reason we don't use wildcards today is a security precaution to prevent arbitrary access to Elasticsearch by agents. Whether that holds up in this case or not, I'm not sure. @joshdover can probably help out a bit here with some of the gaps in my knowledge.

Also, how is this handled on upgrade? Will it result in a new policy revision? Existing keys will need to be upgraded, and the older data streams must not be removed, since pre-8.6 agents will still push to them.

Yes, we'd need to migrate all policies to the new version that includes any new permissions, which would generate a new revision.

@cmacknz
Member Author

cmacknz commented Nov 29, 2022

Just adding another note here: on the Fleet agent details page there is a view for the logs that includes a way to filter by monitoring data stream. I think these are just prepared queries using the hard-coded list of data streams that exist today. This would need to be adjusted in the UI as well. If you click through to the Kibana logs view via the "Open in Logs" link, you can filter on whatever you want though.

[Screenshot: "Screen Shot 2022-11-28 at 5 17 42 PM"]

@cmacknz
Member Author

cmacknz commented Nov 30, 2022

I am increasingly convinced we should use the existing data streams for monitoring data in 8.6 and deal with migrating to the format we want to use afterwards in 8.7. The hard-coded data streams will cause problems as we add new input types to monitor; I'm just not convinced there is a way to remove the hard-coded names in the time we have left in 8.6.

@kpollich
Member

+1 to @cmacknz's suggestion - I think there are enough side effects to track down as a result of these hardcoded Fleet values that it'd be best if we could solve this on the agent side for 8.6, then iterate in 8.7.

@joshdover
Contributor

If I understand correctly, the reason we don't use wildcards today is a security precaution to prevent arbitrary access to Elasticsearch by agents. Whether that holds up in this case or not, I'm not sure. @joshdover can probably help out a bit here with some of the gaps in my knowledge.

Yes, in theory ES should accept wildcards in the index privilege list here and it should work. This would need to be tested though.

I think these are just prepared queries using the hard-coded list of data streams that exist today. This would need to be adjusted in the UI as well. If you click through to the Kibana logs view via the "Open in Logs" link, you can filter on whatever you want though.

I don't think that's the case. IIRC we do a terms enum query on the data_stream.dataset field of logs-elastic_agent* to populate this list. Ingesting the data into a new data stream would be feasible. We would also need to update the elastic_agent integration package for all the new stream names. Ideally, we'd switch to using the dataset_is_prefix option that APM uses, to get a wildcard in the index template naming scheme. That way we don't need to add new templates for every new agent process.
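
For reference, a rough sketch (my own, not from this thread) of the kind of terms enum request described here, using the Elasticsearch terms enum API and shown as YAML for readability even though the real request body is JSON:

# POST logs-elastic_agent*/_terms_enum
field: data_stream.dataset   # enumerate the dataset values present in the matched indices
string: ""                   # optional prefix typed by the user in the filter dropdown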

That said, I'm +1 on using the existing names if at all possible for this release.

@blakerouse
Contributor

I am happy to revert to the old names for 8.6 only, but 8.7 needs to use the new names for the data streams.

Wildcarding is the best path forward. I believe giving the Elastic Agent write-only access to logs-elastic_agent.* is all that we need. That allows the monitoring namespace to be changed, and new spawned components and new integrations to be added, without having to generate a new API key just to ingest monitoring logs from the Elastic Agent.

@blakerouse
Contributor

[Screenshot: "Screen Shot 2022-11-28 at 5 17 42 PM"]

This UI element doesn't seem hard-coded, as mine shows the new stream names when my elastic-agent is ingesting with superuser permissions.

@blakerouse
Contributor

Wildcard for metrics-elastic_agent.* will also be needed.
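
Putting the two wildcard patterns together, a minimal sketch of what the monitoring chunk of the policy could look like (untested, and assuming Fleet Server and Elasticsearch accept wildcards in these index names; dropping the namespace suffix is also an assumption, so the monitoring namespace can change without a new API key):

output_permissions:
  default:
    _elastic_agent_monitoring:
      indices:
        - names:
            - logs-elastic_agent.*      # all agent/component monitoring logs
            - metrics-elastic_agent.*   # all agent/component monitoring metrics
          privileges:
            - auto_configure            # same write-only privileges as today
            - create_doc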

@cmacknz
Member Author

cmacknz commented Nov 30, 2022

Wildcarding is the best path forward. I believe giving the Elastic Agent write-only access to logs-elastic_agent.* is all that we need.

Agreed, let's fix this for 8.6 and then open a separate issue summarizing the changes needed for 8.7. I have this issue listed as a blocker for the 8.6 release so we'll want to close it once 8.6 is fixed.

@blakerouse
Contributor

Since this issue describes what is needed for 8.7, I would rather leave this one open and create a new one just for 8.6.

@cmacknz
Member Author

cmacknz commented Nov 30, 2022

Makes sense. I created #1847 for 8.6 and will update it as the tracking issue everywhere.

@cmacknz cmacknz changed the title V2: Spawned process logs cannot be sent to fleet. V2: Elastic agent monitoring data stream names should be per input, not per process Dec 7, 2022
@cmacknz cmacknz changed the title V2: Elastic agent monitoring data stream names should be per input, not per process V2: Elastic agent monitoring data stream names should be per input not per process Dec 7, 2022
@joshdover
Contributor

While I agree that per-process data streams don't fit the long-term architecture we want to provide (where the processes are implementation details and not something the user should know or care about), I think we should discuss further what the data streams should really be to provide a good experience, especially around applying features like ILM and index authorization.

IMO we should be following the pattern laid down by the data stream naming scheme in general, that is, that data stream names correlate to the kind of data contained in the data stream rather than where the data came from. Of course we probably still want an elastic_agent.input field for filtering, but I doubt users will want to apply different data retention policies to filestream monitoring data vs. httpjson data.

Here's a quick rough idea of groupings that could make more sense from my perspective:

  • logs-elastic_agent.core
  • logs-elastic_agent.output
  • logs-elastic_agent.input
  • metrics-elastic_agent.core
  • metrics-elastic_agent.output
  • metrics-elastic_agent.input
  • metrics-elastic_agent.cpu
  • metrics-elastic_agent.memory
  • metrics-elastic_agent.network
  • metrics-elastic_agent.filesystem

This type of grouping would allow users to retain information like input and output metrics for longer periods of time than the agent's CPU and memory metrics. This may be useful for more general ingest performance tracking over time, to help pinpoint data sources that are causing a lot more noise or have seasonal patterns. CPU and memory don't have the same useful shelf life IMO, as they're more useful for interactively debugging something you're trying to optimize.
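
As an illustration of that retention argument (everything below is hypothetical, not an agreed design; the ILM policy names are made up), per-category datasets would let each group be matched by its own index template and ILM policy:

# Hypothetical index templates keyed to the proposed datasets; only the ILM policy differs.
- name: metrics-elastic_agent.output
  index_patterns: ["metrics-elastic_agent.output-*"]
  template:
    settings:
      index.lifecycle.name: agent-monitoring-long-retention    # keep ingest history longer
- name: metrics-elastic_agent.cpu
  index_patterns: ["metrics-elastic_agent.cpu-*"]
  template:
    settings:
      index.lifecycle.name: agent-monitoring-short-retention   # interactive debugging only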

@joshdover
Contributor

cc @nimarezainia I think you likely have useful info to weigh in here

@cmacknz cmacknz changed the title V2: Elastic agent monitoring data stream names should be per input not per process V2: Elastic agent monitoring data stream names should not be per process Dec 8, 2022
@blakerouse
Contributor

@joshdover At the moment we map the data streams to the component.dataset. I like @joshdover's idea; per-component ILM is probably not a level of control customers will want. We can still offer the filtering in the log viewer, with ILM mapping to groupings similar to those listed above.

@nimarezainia
Contributor

+1 on this approach. CPU and memory metrics don't need longevity, and giving users the ability to apply other Elastic features like ILM to the data they collect is important.

Question: as an example, what grouping would "logs-elastic_agent.apm_server-testnamespace123" map to under this proposal?

@joshdover
Contributor

As an example, what grouping would "logs-elastic_agent.apm_server-testnamespace123" map to under this proposal?

Those logs would go to logs-elastic_agent.input-testnamespace123 with a field like unit: apm-server for filtering.
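
As a sketch of what such an event might carry (field names other than data_stream.* are placeholders; the thread only suggests "a field like unit"):

data_stream:
  type: logs
  dataset: elastic_agent.input   # shared dataset for all input processes
  namespace: testnamespace123
unit: apm-server                 # originating component, used for filtering in the UI
message: "..."                   # the original APM Server log line
# Target data stream: logs-elastic_agent.input-testnamespace123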

@jlind23
Contributor

jlind23 commented Jan 5, 2023

@blakerouse @cmacknz What is the status here?
@joshdover will your PR fix all the problems mentioned here?

@blakerouse
Contributor

@jlind23 We just need to come to 100% agreement on the index names we want to use. Then we need to make the change in the Elastic Agent, as well as to the permissions that are requested for the API key.

@jlind23 jlind23 assigned cmacknz and unassigned blakerouse Jan 5, 2023
@jlind23 jlind23 removed the v8.7.0 label Jan 5, 2023
@cmacknz
Member Author

cmacknz commented Jan 5, 2023

I think Josh's suggestions in #1814 (comment) are what we should go with; nobody seems opposed to them.

@cmacknz cmacknz removed their assignment Jan 5, 2023
@cmacknz cmacknz added the v8.8.0 label Jan 5, 2023
@joshdover
Contributor

@joshdover will your PR fix all the problems mentioned here?

My PR doesn't address this. Agree we should be unblocked to move forward. It will require changes across Agent, Fleet Server, and the elastic_agent package.

@blakerouse
Contributor

The more I think about this structure, the more I see an issue.

What goes into the *.output data stream? Only the shipper logs? At the moment the Beats do not place unit.id in their log messages, which the Elastic Agent would need in order to tell the difference between a log message for an input versus one for an output.

Do we need to separate the *.input and *.output events? Maybe we could just go with *.component and have all component info go into that index?

@cmacknz
Member Author

cmacknz commented Jan 9, 2023

Do we need to separate the *.input and *.output events?

It seems like a useful separation to me, particularly for allowing output metrics to be kept under a possibly different ILM policy than input metrics. The output performance metrics can be quite critical for diagnosing performance issues and tuning ingest performance.

I do agree that we are unlikely to use the logs-elastic_agent.output initially since we'd have to modify Beats to write to it. The first use of this will be the shipper in the near-ish future. I don't know that it is a problem for this to be empty initially.

@blakerouse
Contributor

Well, given that the Beats logs cannot easily be divided into .input and .output, I believe that calling it .input would be incorrect.

@cmacknz
Member Author

cmacknz commented Jan 9, 2023

The Beats logs could likely be divided with enough effort but it's probably not worth doing. This input and output division problem is solved by the introduction of the shipper, so we could just not implement this until the shipper is ready.

We could go with components instead of the input and output division, but I wouldn't want to do that if it is just to work around a temporary state of the V2 architecture where the shipper isn't usable yet.

@blakerouse
Contributor

I like the idea of just placing them all under components. Do we really see different ILM being needed between a shipper and a component? To me it seems like they would almost always share the same lifecycle.

@cmacknz
Member Author

cmacknz commented Jan 24, 2023

The output/shipper logs and metrics will have a much stronger correlation with overall ingest performance, so to me they will always be more important and it seems valuable to separate them out for that reason.

We could combine everything into a components index to start and then split the output index out later, I suppose. That is definitely much easier to start with and avoids having to deal with routing the Beat output metrics and logs to the right index initially.
