-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
V2: Elastic agent monitoring data stream names should not be per process #1814
Comments
Could you confirm the list of new indices we'd need permissions for here? I'm happy to update the hardcoded list in Fleet. What I see so far from the error messages includes:
It's probably helpful to look at Fleet's snapshot tests for these permissions as well. The full possible set of of permissions is
|
@kpollich Is it possible to use wildcards? As there is a lot of possibilities and the new data_streams are based on what is defined in the policy. It is defined as:
Also how is this handled on upgrade? Will it result in a new policy revision? As existing keys will need to be upgraded, as well as not removed the older data-streams as pre-8.6 will still push to those older data-streams. |
Maybe? I don't really know how these permissions get handled once they're written to the agent policy. The Fleet code we're looking at here generates this chunk of an agent policy
I think Fleet Server parses these permissions when it generates API keys for agent? If that's correct then wildcards should be supported as far as I know: https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-role.html#security-api-put-role-request-body If I understand correctly, the reason we don't use wildcards today is a security precaution to prevent arbitrary access to Elasticsearch by agents. Whether that holds up in this case or not, I'm not sure. @joshdover can probably help out a bit here with some of the gaps in my knowledge.
Yes we'd need to migrate all policies to the new version that includes any new permissions, which would generate a new revision. |
Just adding another note here, in the Fleet agent details page there is a view for the logs that includes a way to filter by monitoring data stream. I think these are just prepared queries using the hard coded list of data streams that exist today. This would need to be adjusted in the UI as well. If you click through to the Kibana logs view via the "Open in Logs" link you can filter on whatever you want though. |
I am increasingly convinced we should use the existing data streams for monitoring data in 8.6 and deal with migrating to the format we want to use afterwards in 8.7. The hard coded data streams will cause problems as we add new input types to monitor, I'm just not convinced there is a way to remove the hard coded names in the time we have left in 8.6 |
+1 to @cmacknz suggestion - I think there are enough side effects to track down as a result of these hardcoded Fleet values that it'd be best if we could solve this on the agent side for 8.6, then iterate in 8.7. |
Yes, in theory ES should accept wildcards in the index privilege list here and it should work. This would need to be tested though.
I don't think that's the case. IIRC we do terms enum query on that data_stream.dataset field on That said, I'm +1 to use the existing names if it all possible for this release. |
I am happy to revert 8.6 only to use old names, but 8.7 needs to use the new names for the data streams. Wildcarding is the best path forward, I believe the Elastic Agent being able to write-only to the |
Wildcard for |
Agreed, let's fix this for 8.6 and then open a separate issue summarizing the changes needed for 8.7. I have this issue listed as a blocker for the 8.6 release so we'll want to close it once 8.6 is fixed. |
Being this issue describes what is needed for 8.7, I would rather leave this one open and create a new one just for 8.6. |
Makes sense, I created #1847 for 8.6 I'll update it as the tracking issue everywhere. |
While I agree that per-process data streams doesn't fit the long-term architecture we want to provide (where the processes are implementation details and not something the user should know or care about), I think we should discuss more about what the data streams should really be to provide a good experience, especially around applying features like ILM and index authorization. IMO we should be following the pattern laid down by the data stream naming scheme in general, that is that the data stream names correlate to what kind of data is contained in the data stream rather than where the data came from. Of course we probably still want an Here's a quick rough idea of groupings that could make more sense from my perspective:
This type of grouping would allow users to retain information like input and output metrics for longer periods of time than cpu and memory metrics of the agent. This may be useful for more general ingest performance tracking over time to help pinpoint data sources that may be causing a lot more noise or have seasonal patterns. cpu and memory don't have that same useful shelf-life IMO as they're more useful for debugging interactively something that you're trying to optimize. |
cc @nimarezainia I think you likely have useful info to weigh in here |
@joshdover At the moment we map the data streams to the |
+1 on this approach. CPU and Mem metrics don't need longevity and giving users the ability to apply other Elastic features like ILM to the data they collect is important. Question: |
Those logs would go to |
@blakerouse @cmacknz What is the status here? |
@jlind23 We just need to come to a 100% agreement on the index names we want to use. Then we need to make the change in the Elastic Agent as well as the permissions that are requested for the API key. |
I think Josh's suggestions in #1814 (comment) are what we should go with, nobody seems opposed to it. |
My PR doesn't address this. Agree we should be unblocked to move forward. It will require changes across Agent, Fleet Server, and the elastic_agent package. |
The more I think about this structure I do see an issue. What does into the Do we need to seperate the |
It seems like a useful separation to me, particularly for allowing output metrics to be separated and with a possibly different ILM policy than input metrics. The output performance metrics can be quite critical to diagnosing performance issues and tuning ingest performance. I do agree that we are unlikely to use the |
Well being that beats cannot be divided easily the |
The Beats logs could likely be divided with enough effort but it's probably not worth doing. This input and output division problem is solved by the introduction of the shipper, so we could just not implement this until the shipper is ready. We could go with |
I like the idea of just placing them all under |
The output/shipper logs and metrics will have a much stronger correlation with overall ingest performance, so to me they will always be more important and it seems valuable to separate them out for that reason. We could combine everything into a components index to start and then split the output index out later I suppose. That is definitely much easier to start and avoids having to deal with routing the Beat output metrics and logs to the right index initially. |
The changes in #1702 changed the naming convention of the data streams used for sending agent process logs to Fleet to match how the processed are modelled in V2.
This has resulted in monitoring logs for spawned processes failing to be indexed in Elasticsearch when monitoring is enabled in Fleet. The root cause is that Fleet hard codes the list of expected monitoring data streams and uses them when adding index permissions into the agent policy. The result is that the agent has no permission to write to data streams using the new naming convention.
Example errors:
The existing monitoring data streams are defined in Fleet at https://github.com/elastic/kibana/blob/c5f20721e1879f1ebe161b0fa3b207f10ed2b6f7/x-pack/plugins/fleet/common/constants/agent_policy.ts#L15-L28
Monitoring permissions are generated here using this list: https://github.com/elastic/kibana/blob/19413b7daae983b95dbb9f5c7b39cb8f3578ebfa/x-pack/plugins/fleet/server/services/agent_policies/monitoring_permissions.ts#L21-L52
We need to either modify Fleet to generate index permissions for the new data stream names or we need to update the agent to use the index names expected in v1 (effectively they match the binary name).
Note that logs for the agent itself can successfully be sent to Fleet.
The text was updated successfully, but these errors were encountered: