Selective monitoring of Kubernetes workloads using annotations for agent standalone #613
We had an initial chat about this with @gizas and, in order to start pushing this forward, we first need to decide on the approach we will follow to make this feature available to our users. Keep in mind that this feature requires cross-team effort from the Cloudnative, Agent, and Fleet UI teams. The implementation can happen across one or more teams accordingly, but first we need a solid plan about the way forward. Once we have decided the what and the how, we can create smaller, detail-specific issues for the various parts of the feature's implementation. Here is a first outline of the current status of the issue and a high-level proposal for its implementation across the "stack".

Hint based autodiscovery in standalone Agent

Related issues from the past
[1] Hints based autodiscovery

Current Status

What we have so far?
The implementation to watch for Pods and collect metadata for them is already available.

What we need
We need a mechanism to receive those hints and match them with inputs/integrations. This mechanism can "live" either in Agent to cover standalone mode, in Fleet UI, or in an external component like an additional "operator". In this proposal we only focus on standalone mode.

Standalone mode
The Agent needs to know what packages/inputs look like in order to match them with the hints. One way to achieve this is by using input templates. This would make the flow quite similar to Beats.

Feature design proposal (technical/high-level)
As already mentioned, Agent can identify Pods (and other workloads) in Kubernetes and communicate this information to Agent's controller so as to enable and populate inputs' configuration accordingly.
How templates are created
One important "detail" that needs to be discussed is how the templates would become available locally in Elastic Agent. At elastic/beats#24054 (comment) (Part 2) it was proposed to ship the templates along with the Agent binary, which implies that we would include them in the build/packaging stage after retrieving them from the Package Registry. This could work, but we came up with an easier solution (kudos to @gizas :D) which proposes to leave this work to our users and give them the option to download the templates from Fleet UI in the same way they do for the standalone policy. So Fleet UI can create a

User journey
Having described the high level of the technical part, we also need to describe how users will use the feature. Here is an example user journey:
** The UX details/wording etc. can be improved but this is the high-level idea.

Proposed implementation issues
1.
|
The nice part about the proposed solution is that steps 1 and 2 can happen fully independently from Fleet. Also, having support for input templates in Elastic Agent means it does not only support templates coming from Fleet; users could also build their own if needed. And if later on we have an orchestrator solution, the templates will still be useful. We should consider bundling some templates like

An additional argument for this approach is that it does not only work with k8s but also with any other standalone setup.
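To illustrate the "users could build their own" point, here is a minimal sketch of what a user-maintained input template could look like, following the template shape proposed later in this thread; the nginx package name, version, and hint keys are illustrative assumptions, not a confirmed format:

inputs:
  # Hypothetical user-built template for nginx status metrics.
  - name: templates.d/nginx/1.0.0
    type: nginx/metrics
    use_output: default
    streams:
      - data_stream:
          dataset: nginx.stubstatus
          type: metrics
        metricsets:
          - stubstatus
        hosts:
          # Fall back to a local default when no hint is provided.
          - "${kubernetes.hints.nginx.stubstatus.host|'127.0.0.1:80'}"
        period: "${kubernetes.hints.nginx.stubstatus.period|'10s'}"
        # Only enable the stream when the workload opts in via hints.
        condition: "${kubernetes.hints.nginx.stubstatus.enabled|false}"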
|

As pointed out in elastic/beats#24054 (comment), I am looking at the workflow suggested in that comment. Concerning the download from the UI, we could instead make the agent policy bigger by embedding the template in it, something like this:

inputs:
- type: filestream
- type: tcp
# Instead of looking up on disk we can look at this instead, which could be a VFS representation of the structure on disk.
template.d:
nginx/1.8:
....
The user only has one thing to download and deploy; the yaml is bigger but this reduces the risk of a mismatch.

|
Thanks for your reviews @ruflin and @ph :) Heads-up on this. After the initial feedback and some more brainstorming around this, it seems that for the Hints feature implementation we don't necessarily need the template support. The rationale behind such a solution is that we can just re-use the k8s provider to emit specific (hints-specific) mappings that would then be capable of populating the proper "templates" produced by the Fleet UI with the proper conventions. Let me put some examples together to illustrate the point:

Updated proposal

"Templates" format
We take for granted that templates are constructed by Fleet UI, using the available packages' definitions, and are placed in the

For simplicity we can add only the latest packages in the list of templates, but we can easily extend this using the proper conditions accordingly. Find below a "template sample":

inputs:
...
- name: templates.d/redis/0.3.6
type: redis/metrics
use_output: default
# condition: "${docker.hints.redis.version|kubernetes.hints.redis.version == "0.3.6"}"
meta:
package:
name: redis
version: 0.3.6
data_stream:
namespace: default
streams:
- data_stream:
dataset: redis.info
type: metrics
metricsets:
- info
hosts:
- "${docker.hints.redis.info.host|kubernetes.hints.redis.info.host|'127.0.0.1:6379'}"
idle_timeout: 20s
maxconn: 10
network: tcp
period: "${docker.hints.redis.info.period|kubernetes.hints.redis.info.period|'10s'}"
condition: "${docker.hints.redis.info.enabled|kubernetes.hints.redis.info.enabled|true}"
- data_stream:
dataset: redis.key
type: metrics
metricsets:
- key
hosts:
- "${docker.hints.redis.key.host|kubernetes.hints.redis.key.host|'127.0.0.1:6379'}"
idle_timeout: 20s
key.patterns:
- limit: 20
pattern: '*'
maxconn: 10
network: tcp
period: "${docker.hints.redis.key.period|kubernetes.hints.redis.key.period|'10s'}"
condition: "${docker.hints.redis.key.enabled|kubernetes.hints.redis.key.enabled|false}" For now we can follow a simple approach and support only a few "hints" similarly to what we do in Metricbeat/Filebeat. In the above example I only leverage "period" and "host" hints but we can extend it as much as we think. Note that the conditions per data_stream are suffixed with an Hints' specific mappingsHaving defined the "templates" as above we can now leverage the So in the kubernetes provider we can check for specific annotations like A Redis Pod would be annotated like this: annotations:
co.elastic.hints/package: redis
co.elastic.hints/data_streams: info, key
co.elastic.hints/host: '${kubernetes.pod.ip}:6379'
co.elastic.hints/period: 1m

Note that the settings can also be data_stream specific if users want that:

annotations:
co.elastic.hints/package: redis
co.elastic.hints/data_streams: info, key
co.elastic.hints/host: '${kubernetes.pod.ip}:6379'
co.elastic.hints/info.period: 1m
co.elastic.hints/key.period: 10m

If settings are not data_stream specific then common settings will be used across the data_streams. With the above annotations, the following mapping will be emitted:

{
"kubernetes": {
"hints": {
"redis": {
"info": {
"enabled": true,
"period": "1m",
"host": "152.10.67.976:6379"
},
"key": {
"enabled": true,
"period": "10m",
"host": "152.10.67.976:6379"
}
}
}
}
}

Note that having the above mapping we can emit this at
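To ground where these annotations live, here is a minimal sketch of a complete Pod manifest carrying them; the Pod name and image are illustrative assumptions:

apiVersion: v1
kind: Pod
metadata:
  name: redis            # illustrative name
  annotations:
    # Hints consumed by the kubernetes provider, as proposed above.
    co.elastic.hints/package: redis
    co.elastic.hints/data_streams: info, key
    co.elastic.hints/host: '${kubernetes.pod.ip}:6379'
    co.elastic.hints/info.period: 1m
    co.elastic.hints/key.period: 10m
spec:
  containers:
    - name: redis
      image: redis:6.2   # illustrative image tag
      ports:
        - containerPort: 6379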
The rendered input configuration will be the following:

inputs:
...
- name: templates.d/redis/0.3.6
type: redis/metrics
use_output: default
meta:
package:
name: redis
version: 0.3.6
data_stream:
namespace: default
streams:
- data_stream:
dataset: redis.info
type: metrics
metricsets:
- info
hosts:
- "152.10.67.976:6379"
idle_timeout: 20s
maxconn: 10
network: tcp
period: 1m
- data_stream:
dataset: redis.key
type: metrics
metricsets:
- key
hosts:
- "152.10.67.976:6379"
idle_timeout: 20s
key.patterns:
- limit: 20
pattern: '*'
maxconn: 10
network: tcp
period: 10m
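As a quick illustration of how the variable fallbacks above resolve (my reading of the proposal, using the sample values from this thread): the first defined value in the pipe-separated list wins, and the quoted literal acts as the default:

# From the template:
period: "${docker.hints.redis.info.period|kubernetes.hints.redis.info.period|'10s'}"
# Emitted by the kubernetes provider for the annotated Pod:
#   kubernetes.hints.redis.info.period: "1m"
# docker.hints.* is undefined here, so the rendered result is:
period: 1m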
Additional considerations

Since we are considering the templates' construction in Fleet UI, @jen-huang could you chime in and provide feedback on this and more specifically on

|
@kpollich in case you want to provide feedback from the Fleet side.

|
I like that the above approach allows us to mostly use existing mechanisms. To make it as easy as possible for users to update their "redis" template for example, it would be great to have an

@ph I know we have discussed

|
@ruflin as far as I can see

This seems to be more structured, but compared to @ph's suggestion to include everything in

However, could you elaborate more on what a user journey would look like in that case? I guess users would need to download the templates (in batch) from Fleet UI in a second step after the standalone policy and extract them into the

Something we have missed so far is that in Kubernetes everything can be quite packed before deployment time. For example, in order to deploy Elastic Agent (standalone) in k8s, users would use https://github.com/elastic/elastic-agent/blob/main/deploy/kubernetes/elastic-agent-standalone-kubernetes.yaml following the docs, or they can use the UX recently added in Fleet UI with elastic/kibana#114439. For us this would mean that we have several options here:
My approach here would be that the https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes/elastic-agent-standalone manifests are shipped with a dummy

Having said this, I see the following implementation parts:
|
You have a good point here. It is really neat that we can offer the user a single yml file to download and it just works. For the Elastic Agent configs, I started to think about separating output configs from input configs. The output config in the standalone case rarely changes; inputs change much more often and are likely to eventually be managed by the user in Git or similar. Could we still have a single yaml but with multiple sections inside? I have some questions around this, will ping you directly.
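For what it's worth, a minimal sketch of what such a single file with separate sections might look like; the section split and values are my illustration of the idea, not a confirmed format:

outputs:
  default:
    type: elasticsearch
    hosts:
      - "https://elasticsearch:9200"  # output section: rarely changes
inputs:
  # input section: changes more often; could be user-managed in Git
  - type: filestream
    id: my-logs
    paths:
      - /var/log/*.log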
|

@ChrsMark looking at the user journey, how do we satisfy the two items below?
|
Quick answer is:
Beyond this we can go quite a bit further and maybe provide some extra troubleshooting functionality with

However, I think these should not affect the base implementation we are discussing here, since they are mostly enhancements on top of the core implementation?

|
@ruflin Yes, we do have support for

We can support both workflows without too much trouble: the IaC one and a more click-and-get one.

@ChrsMark I am +1 for the proposal above. Concerning 2, we might want to clean up the logs to make sure we reduce the noise-to-problem ratio.

|
Just wanted to chime in from the Fleet UI side of things and make sure I'm understanding correctly. At first glance this looks reasonable. So, Fleet UI would be responsible for generating a

Then, the user will be responsible for manually managing the

For context, here's an example standalone manifest generated today by Fleet for an integration policy with

apiVersion: v1
kind: ConfigMap
metadata:
name: agent-node-datastreams
namespace: kube-system
labels:
k8s-app: elastic-agent
data:
agent.yml: |-
id: d3a50f00-f64c-11ec-82c5-8b396fc6520e
outputs:
default:
type: elasticsearch
hosts:
- 'http://192.168.65.2:9200'
username: '{ES_USERNAME}'
password: '{ES_PASSWORD}'
inputs:
- id: logfile-system-e7822224-fa5c-4291-acd8-49b69e4a44ac
revision: 1
name: system-6
type: logfile
data_stream:
namespace: default
use_output: default
streams:
- id: logfile-system.auth-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: logs
dataset: system.auth
paths:
- /var/log/auth.log*
- /var/log/secure*
exclude_files:
- .gz$
multiline:
pattern: ^\s
match: after
processors:
- add_locale: null
- id: logfile-system.syslog-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: logs
dataset: system.syslog
paths:
- /var/log/messages*
- /var/log/syslog*
exclude_files:
- .gz$
multiline:
pattern: ^\s
match: after
processors:
- add_locale: null
meta:
package:
name: system
version: 1.16.2
- id: winlog-system-e7822224-fa5c-4291-acd8-49b69e4a44ac
revision: 1
name: system-6
type: winlog
data_stream:
namespace: default
use_output: default
streams:
- id: winlog-system.application-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: logs
dataset: system.application
name: Application
condition: '${host.platform} == ''windows'''
ignore_older: 72h
- id: winlog-system.security-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: logs
dataset: system.security
name: Security
condition: '${host.platform} == ''windows'''
ignore_older: 72h
- id: winlog-system.system-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: logs
dataset: system.system
name: System
condition: '${host.platform} == ''windows'''
ignore_older: 72h
meta:
package:
name: system
version: 1.16.2
- id: system/metrics-system-e7822224-fa5c-4291-acd8-49b69e4a44ac
revision: 1
name: system-6
type: system/metrics
data_stream:
namespace: default
use_output: default
streams:
- id: system/metrics-system.fsstat-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: metrics
dataset: system.fsstat
metricsets:
- fsstat
period: 1m
processors:
- drop_event.when.regexp:
system.fsstat.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)
- id: >-
system/metrics-system.filesystem-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: metrics
dataset: system.filesystem
metricsets:
- filesystem
period: 1m
processors:
- drop_event.when.regexp:
system.filesystem.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)
- id: system/metrics-system.diskio-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: metrics
dataset: system.diskio
metricsets:
- diskio
diskio.include_devices: null
period: 10s
- id: system/metrics-system.cpu-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: metrics
dataset: system.cpu
metricsets:
- cpu
cpu.metrics:
- percentages
- normalized_percentages
period: 10s
- id: system/metrics-system.process-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: metrics
dataset: system.process
metricsets:
- process
period: 10s
process.include_top_n.by_cpu: 5
process.include_top_n.by_memory: 5
process.cmdline.cache.enabled: true
process.cgroups.enabled: false
process.include_cpu_ticks: false
processes:
- .*
- id: system/metrics-system.memory-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: metrics
dataset: system.memory
metricsets:
- memory
period: 10s
- id: system/metrics-system.network-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: metrics
dataset: system.network
metricsets:
- network
period: 10s
network.interfaces: null
- id: >-
system/metrics-system.socket_summary-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: metrics
dataset: system.socket_summary
metricsets:
- socket_summary
period: 10s
- id: >-
system/metrics-system.process.summary-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: metrics
dataset: system.process.summary
metricsets:
- process_summary
period: 10s
- id: system/metrics-system.load-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: metrics
dataset: system.load
metricsets:
- load
condition: '${host.platform} != ''windows'''
period: 10s
- id: system/metrics-system.uptime-e7822224-fa5c-4291-acd8-49b69e4a44ac
data_stream:
type: metrics
dataset: system.uptime
metricsets:
- uptime
period: 10s
meta:
package:
name: system
version: 1.16.2
- id: kubernetes/metrics-kubelet-1e00917a-19e7-4c02-8050-c21ba0578e27
revision: 1
name: kubernetes-1
type: kubernetes/metrics
data_stream:
namespace: default
use_output: default
streams:
- id: >-
kubernetes/metrics-kubernetes.container-1e00917a-19e7-4c02-8050-c21ba0578e27
data_stream:
type: metrics
dataset: kubernetes.container
metricsets:
- container
add_metadata: true
hosts:
- 'https://${env.NODE_NAME}:10250'
period: 10s
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
ssl.verification_mode: none
- id: >-
kubernetes/metrics-kubernetes.node-1e00917a-19e7-4c02-8050-c21ba0578e27
data_stream:
type: metrics
dataset: kubernetes.node
metricsets:
- node
add_metadata: true
hosts:
- 'https://${env.NODE_NAME}:10250'
period: 10s
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
ssl.verification_mode: none
- id: >-
kubernetes/metrics-kubernetes.pod-1e00917a-19e7-4c02-8050-c21ba0578e27
data_stream:
type: metrics
dataset: kubernetes.pod
metricsets:
- pod
add_metadata: true
hosts:
- 'https://${env.NODE_NAME}:10250'
period: 10s
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
ssl.verification_mode: none
- id: >-
kubernetes/metrics-kubernetes.system-1e00917a-19e7-4c02-8050-c21ba0578e27
data_stream:
type: metrics
dataset: kubernetes.system
metricsets:
- system
add_metadata: true
hosts:
- 'https://${env.NODE_NAME}:10250'
period: 10s
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
ssl.verification_mode: none
- id: >-
kubernetes/metrics-kubernetes.volume-1e00917a-19e7-4c02-8050-c21ba0578e27
data_stream:
type: metrics
dataset: kubernetes.volume
metricsets:
- volume
add_metadata: true
hosts:
- 'https://${env.NODE_NAME}:10250'
period: 10s
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
ssl.verification_mode: none
meta:
package:
name: kubernetes
version: 1.21.1
- id: kubernetes/metrics-kube-apiserver-1e00917a-19e7-4c02-8050-c21ba0578e27
revision: 1
name: kubernetes-1
type: kubernetes/metrics
data_stream:
namespace: default
use_output: default
streams:
- id: >-
kubernetes/metrics-kubernetes.apiserver-1e00917a-19e7-4c02-8050-c21ba0578e27
data_stream:
type: metrics
dataset: kubernetes.apiserver
metricsets:
- apiserver
hosts:
- >-
https://${env.KUBERNETES_SERVICE_HOST}:${env.KUBERNETES_SERVICE_PORT}
period: 30s
condition: '${kubernetes_leaderelection.leader} == true'
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
ssl.certificate_authorities:
- /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
meta:
package:
name: kubernetes
version: 1.21.1
- id: kubernetes/metrics-kube-proxy-1e00917a-19e7-4c02-8050-c21ba0578e27
revision: 1
name: kubernetes-1
type: kubernetes/metrics
data_stream:
namespace: default
use_output: default
streams:
- id: >-
kubernetes/metrics-kubernetes.proxy-1e00917a-19e7-4c02-8050-c21ba0578e27
data_stream:
type: metrics
dataset: kubernetes.proxy
metricsets:
- proxy
hosts:
- 'localhost:10249'
period: 10s
meta:
package:
name: kubernetes
version: 1.21.1
- id: kubernetes/metrics-events-1e00917a-19e7-4c02-8050-c21ba0578e27
revision: 1
name: kubernetes-1
type: kubernetes/metrics
data_stream:
namespace: default
use_output: default
streams:
- id: >-
kubernetes/metrics-kubernetes.event-1e00917a-19e7-4c02-8050-c21ba0578e27
data_stream:
type: metrics
dataset: kubernetes.event
metricsets:
- event
period: 10s
add_metadata: true
skip_older: true
condition: '${kubernetes_leaderelection.leader} == true'
meta:
package:
name: kubernetes
version: 1.21.1
- id: filestream-container-logs-1e00917a-19e7-4c02-8050-c21ba0578e27
revision: 1
name: kubernetes-1
type: filestream
data_stream:
namespace: default
use_output: default
streams:
- id: >-
filestream-kubernetes.container_logs-1e00917a-19e7-4c02-8050-c21ba0578e27
data_stream:
type: logs
dataset: kubernetes.container_logs
paths:
- '/var/log/containers/*${kubernetes.container.id}.log'
prospector.scanner.symlinks: true
parsers:
- container:
stream: all
format: auto
meta:
package:
name: kubernetes
version: 1.21.1
- id: logfile-redis-fc56b140-3154-4907-9eb3-991ea4ab6569
revision: 1
name: redis-1
type: logfile
data_stream:
namespace: default
use_output: default
streams:
- id: logfile-redis.log-fc56b140-3154-4907-9eb3-991ea4ab6569
data_stream:
type: logs
dataset: redis.log
paths:
- /var/log/redis/redis-server.log*
- /var/log/redis/my-cool-redis.log*
tags:
- redis-log
exclude_files:
- .gz$
exclude_lines:
- '^\s+[\-`(''.|_]'
meta:
package:
name: redis
version: 1.3.1
- id: redis-redis-fc56b140-3154-4907-9eb3-991ea4ab6569
revision: 1
name: redis-1
type: redis
data_stream:
namespace: default
use_output: default
streams:
- id: redis-redis.slowlog-fc56b140-3154-4907-9eb3-991ea4ab6569
data_stream:
type: logs
dataset: redis.slowlog
hosts:
- '127.0.0.1:6379'
- 'https://my.redis:6379'
password: ''
meta:
package:
name: redis
version: 1.3.1
- id: redis/metrics-redis-fc56b140-3154-4907-9eb3-991ea4ab6569
revision: 1
name: redis-1
type: redis/metrics
data_stream:
namespace: default
use_output: default
streams:
- id: redis/metrics-redis.info-fc56b140-3154-4907-9eb3-991ea4ab6569
data_stream:
type: metrics
dataset: redis.info
metricsets:
- info
hosts:
- '127.0.0.1:6379'
- 'https://my.redis:6379'
idle_timeout: 20s
maxconn: 10
network: tcp
period: 10s
- id: redis/metrics-redis.key-fc56b140-3154-4907-9eb3-991ea4ab6569
data_stream:
type: metrics
dataset: redis.key
metricsets:
- key
hosts:
- '127.0.0.1:6379'
- 'https://my.redis:6379'
idle_timeout: 20s
key.patterns:
- limit: 20
pattern: '*'
maxconn: 10
network: tcp
period: 10s
- id: redis/metrics-redis.keyspace-fc56b140-3154-4907-9eb3-991ea4ab6569
data_stream:
type: metrics
dataset: redis.keyspace
metricsets:
- keyspace
hosts:
- '127.0.0.1:6379'
- 'https://my.redis:6379'
idle_timeout: 20s
maxconn: 10
network: tcp
period: 10s
meta:
package:
name: redis
version: 1.3.1
revision: 3
agent:
monitoring:
namespace: default
use_output: default
enabled: true
logs: true
metrics: true
output_permissions:
default:
_elastic_agent_monitoring:
indices:
- names:
- logs-elastic_agent.apm_server-default
privileges: &ref_0
- auto_configure
- create_doc
- names:
- metrics-elastic_agent.apm_server-default
privileges: *ref_0
- names:
- logs-elastic_agent.auditbeat-default
privileges: *ref_0
- names:
- metrics-elastic_agent.cloudbeat-default
privileges: *ref_0
- names:
- metrics-elastic_agent.elastic_agent-default
privileges: *ref_0
- names:
- logs-elastic_agent.cloudbeat-default
privileges: *ref_0
- names:
- metrics-elastic_agent.endpoint_security-default
privileges: *ref_0
- names:
- logs-elastic_agent-default
privileges: *ref_0
- names:
- metrics-elastic_agent.auditbeat-default
privileges: *ref_0
- names:
- logs-elastic_agent.endpoint_security-default
privileges: *ref_0
- names:
- logs-elastic_agent.filebeat-default
privileges: *ref_0
- names:
- metrics-elastic_agent.filebeat-default
privileges: *ref_0
- names:
- logs-elastic_agent.fleet_server-default
privileges: *ref_0
- names:
- logs-elastic_agent.heartbeat-default
privileges: *ref_0
- names:
- metrics-elastic_agent.fleet_server-default
privileges: *ref_0
- names:
- logs-elastic_agent.metricbeat-default
privileges: *ref_0
- names:
- metrics-elastic_agent.metricbeat-default
privileges: *ref_0
- names:
- metrics-elastic_agent.heartbeat-default
privileges: *ref_0
- names:
- logs-elastic_agent.osquerybeat-default
privileges: *ref_0
- names:
- logs-elastic_agent.packetbeat-default
privileges: *ref_0
- names:
- metrics-elastic_agent.osquerybeat-default
privileges: *ref_0
- names:
- metrics-elastic_agent.packetbeat-default
privileges: *ref_0
_elastic_agent_checks:
cluster:
- monitor
system-6:
indices:
- names:
- logs-system.auth-default
privileges: *ref_0
- names:
- logs-system.syslog-default
privileges: *ref_0
- names:
- logs-system.application-default
privileges: *ref_0
- names:
- logs-system.security-default
privileges: *ref_0
- names:
- logs-system.system-default
privileges: *ref_0
- names:
- metrics-system.fsstat-default
privileges: *ref_0
- names:
- metrics-system.filesystem-default
privileges: *ref_0
- names:
- metrics-system.diskio-default
privileges: *ref_0
- names:
- metrics-system.cpu-default
privileges: *ref_0
- names:
- metrics-system.process-default
privileges: *ref_0
- names:
- metrics-system.memory-default
privileges: *ref_0
- names:
- metrics-system.network-default
privileges: *ref_0
- names:
- metrics-system.socket_summary-default
privileges: *ref_0
- names:
- metrics-system.process.summary-default
privileges: *ref_0
- names:
- metrics-system.load-default
privileges: *ref_0
- names:
- metrics-system.uptime-default
privileges: *ref_0
kubernetes-1:
indices:
- names:
- metrics-kubernetes.container-default
privileges: *ref_0
- names:
- metrics-kubernetes.node-default
privileges: *ref_0
- names:
- metrics-kubernetes.pod-default
privileges: *ref_0
- names:
- metrics-kubernetes.system-default
privileges: *ref_0
- names:
- metrics-kubernetes.volume-default
privileges: *ref_0
- names:
- metrics-kubernetes.apiserver-default
privileges: *ref_0
- names:
- metrics-kubernetes.proxy-default
privileges: *ref_0
- names:
- metrics-kubernetes.event-default
privileges: *ref_0
- names:
- logs-kubernetes.container_logs-default
privileges: *ref_0
redis-1:
indices:
- names:
- logs-redis.log-default
privileges: *ref_0
- names:
- logs-redis.slowlog-default
privileges: *ref_0
- names:
- metrics-redis.info-default
privileges: *ref_0
- names:
- metrics-redis.key-default
privileges: *ref_0
- names:
- metrics-redis.keyspace-default
privileges: *ref_0
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: elastic-agent
namespace: kube-system
labels:
app: elastic-agent
spec:
selector:
matchLabels:
app: elastic-agent
template:
metadata:
labels:
app: elastic-agent
spec:
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
serviceAccountName: elastic-agent
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
containers:
- name: elastic-agent
image: docker.elastic.co/beats/elastic-agent:8.4.0
args: [
"-c", "/etc/agent.yml",
"-e",
"-d", "'*'",
]
env:
- name: ES_USERNAME
value: "elastic"
- name: ES_PASSWORD
value: "changeme"
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
securityContext:
runAsUser: 0
resources:
limits:
memory: 500Mi
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: datastreams
mountPath: /etc/agent.yml
readOnly: true
subPath: agent.yml
- name: proc
mountPath: /hostfs/proc
readOnly: true
- name: etc-kubernetes
mountPath: /hostfs/etc/kubernetes
- name: var-lib
mountPath: /hostfs/var/lib
readOnly: true
- name: cgroup
mountPath: /hostfs/sys/fs/cgroup
readOnly: true
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
- name: varlog
mountPath: /var/log
readOnly: true
- name: passwd
mountPath: /hostfs/etc/passwd
readOnly: true
- name: group
mountPath: /hostfs/etc/group
readOnly: true
- name: systemd
mountPath: /hostfs/etc/systemd
readOnly: true
volumes:
- name: datastreams
configMap:
defaultMode: 0640
name: agent-node-datastreams
- name: proc
hostPath:
path: /proc
- name: etc-kubernetes
hostPath:
path: /etc/kubernetes
- name: var-lib
hostPath:
path: /var/lib
- name: passwd
hostPath:
path: /etc/passwd
- name: group
hostPath:
path: /etc/group
- name: cgroup
hostPath:
path: /sys/fs/cgroup
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: varlog
hostPath:
path: /var/log
- name: systemd
hostPath:
path: /etc/systemd
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: elastic-agent
subjects:
- kind: ServiceAccount
name: elastic-agent
namespace: kube-system
roleRef:
kind: ClusterRole
name: elastic-agent
apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
namespace: kube-system
name: elastic-agent
subjects:
- kind: ServiceAccount
name: elastic-agent
namespace: kube-system
roleRef:
kind: Role
name: elastic-agent
apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: elastic-agent-kubeadm-config
namespace: kube-system
subjects:
- kind: ServiceAccount
name: elastic-agent
namespace: kube-system
roleRef:
kind: Role
name: elastic-agent-kubeadm-config
apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: elastic-agent
labels:
k8s-app: elastic-agent
rules:
- apiGroups: [""]
resources:
- nodes
- namespaces
- events
- pods
- services
- configmaps
- serviceaccounts
verbs: ["get", "list", "watch"]
# Enable this rule only if planing to use kubernetes_secrets provider
#- apiGroups: [""]
# resources:
# - secrets
# verbs: ["get"]
- apiGroups: ["extensions"]
resources:
- replicasets
verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
- deployments
- replicasets
verbs: ["get", "list", "watch"]
- apiGroups: ["batch"]
resources:
- jobs
- cronjobs
verbs: ["get", "list", "watch"]
- apiGroups:
- ""
resources:
- nodes/stats
verbs:
- get
# required for apiserver
- nonResourceURLs:
- "/metrics"
verbs:
- get
# required for cloudbeat
- apiGroups: ["rbac.authorization.k8s.io"]
resources:
- clusterrolebindings
- clusterroles
- rolebindings
- roles
verbs: ["get", "list", "watch"]
- apiGroups: ["networking.k8s.io"]
resources:
- ingressclasses
- ingresses
verbs: ["get", "list", "watch"]
- apiGroups: ["policy"]
resources:
- podsecuritypolicies
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: elastic-agent
# should be the namespace where elastic-agent is running
namespace: kube-system
labels:
k8s-app: elastic-agent
rules:
- apiGroups:
- coordination.k8s.io
resources:
- leases
verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: elastic-agent-kubeadm-config
namespace: kube-system
labels:
k8s-app: elastic-agent
rules:
- apiGroups: [""]
resources:
- configmaps
resourceNames:
- kubeadm-config
verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: elastic-agent
namespace: kube-system
labels:
k8s-app: elastic-agent
---

The relevant section is probably the various redis inputs, e.g.

- id: logfile-redis-fc56b140-3154-4907-9eb3-991ea4ab6569
revision: 1
name: redis-1
type: logfile
data_stream:
namespace: default
use_output: default
streams:
- id: logfile-redis.log-fc56b140-3154-4907-9eb3-991ea4ab6569
data_stream:
type: logs
dataset: redis.log
paths:
- /var/log/redis/redis-server.log*
- /var/log/redis/my-cool-redis.log*
tags:
- redis-log
exclude_files:
- .gz$
exclude_lines:
- '^\s+[\-`(''.|_]'
meta:
package:
name: redis
version: 1.3.1
- id: redis-redis-fc56b140-3154-4907-9eb3-991ea4ab6569
revision: 1
name: redis-1
type: redis
data_stream:
namespace: default
use_output: default
streams:
- id: redis-redis.slowlog-fc56b140-3154-4907-9eb3-991ea4ab6569
data_stream:
type: logs
dataset: redis.slowlog
hosts:
- '127.0.0.1:6379'
- 'https://my.redis:6379'
password: ''
meta:
package:
name: redis
version: 1.3.1
- id: redis/metrics-redis-fc56b140-3154-4907-9eb3-991ea4ab6569
revision: 1
name: redis-1
type: redis/metrics
data_stream:
namespace: default
use_output: default
streams:
- id: redis/metrics-redis.info-fc56b140-3154-4907-9eb3-991ea4ab6569
data_stream:
type: metrics
dataset: redis.info
metricsets:
- info
hosts:
- '127.0.0.1:6379'
- 'https://my.redis:6379'
idle_timeout: 20s
maxconn: 10
network: tcp
period: 10s
- id: redis/metrics-redis.key-fc56b140-3154-4907-9eb3-991ea4ab6569
data_stream:
type: metrics
dataset: redis.key
metricsets:
- key
hosts:
- '127.0.0.1:6379'
- 'https://my.redis:6379'
idle_timeout: 20s
key.patterns:
- limit: 20
pattern: '*'
maxconn: 10
network: tcp
period: 10s
- id: redis/metrics-redis.keyspace-fc56b140-3154-4907-9eb3-991ea4ab6569
data_stream:
type: metrics
dataset: redis.keyspace
metricsets:
- keyspace
hosts:
- '127.0.0.1:6379'
- 'https://my.redis:6379'
idle_timeout: 20s
maxconn: 10
network: tcp
period: 10s
meta:
package:
name: redis
version: 1.3.1

So, rather than a block like this, which uses the Redis integration's stream template:

hosts:
- '127.0.0.1:6379'
- 'https://my.redis:6379'

Fleet would output a block like this:

hosts:
- "${docker.hints.redis.info.host|kubernetes.hints.redis.info.host|'127.0.0.1:6379'}" My question here is how does "know" about the hint string? Is it included in the package definition somewhere? e.g. would the
Also, do we ignore user input in cases where we're using these hint strings instead, or are they intended to serve as default values overridden by the user? e.g. should the

|
For the first, I was thinking that as long as we produce the templates of the inputs.d directory we can also produce the list of available hints to be used. But this can be done even in later stages of the project. We can even render only the relevant templates according to the integrations enabled per case, if that makes sense.

|
Thanks for the extra feedback folks! @kpollich trying to answer your questions below:
In k8s we can avoid having users manually place the

The only thing Fleet UI needs to do is to produce sth like the following:

apiVersion: v1
kind: ConfigMap
metadata:
name: elastic-agent-standalone-inputs
data:
redis.yml: |-
- data_stream: ...
....
apache.yml: |-
....
nginx.yml: |-
....

Then we just need to mount it inside the Agent's Pod as a directory, using sth like the following:

...
volumeMounts:
- name: inputs
mountPath: /usr/share/elastic-agent/inputs.d
readOnly: true
volumes:
- name: inputs
configMap:
defaultMode: 0640
name: elastic-agent-standalone-inputs
...

See https://github.com/elastic/beats/blob/main/deploy/kubernetes/metricbeat-kubernetes.yaml#L197-L220 for a complete example.

In terms of UX, since we are maintaining all those manifests, I see 2 possible paths that could be supported at the same time.
Ofc we can improve the UX a lot here but this can happen on top of the basic/core implementation. At some point we could even let users only use step 1 instead of downloading the manifests from upstream. I would leave it to @mlunadia to provide better directions here, but as I mentioned, for now we can just expose it in the 2 simple ways provided above.
The current idea is to only provide support for a specific set/list of hints like

Also, keeping the list of hints specific provides more control over what we accept as input from the users, and it's also more secure than providing full flexibility to set everything via the hints. Keep in mind that this mechanism will be exposed to everyone that has access to

So to answer your question: if we have a specific set of hints that we support, then Fleet UI can just look up and replace the respective settings, leveraging their default values etc. Nothing changes in the way packages are developed. I guess this would be doable?
For all the settings that will not be exposed as hints we will use the default values as defined in the packages' specs. For those that are exposed as hints (as proposed above) we will have logic to set them like

Since I have already started experimenting a bit with the backend implementation, here is a ConfigMap that we would need to have:

apiVersion: v1
kind: ConfigMap
metadata:
name: elastic-agent-standalone-inputs
data:
redis.yml: |-
inputs:
- name: templates.d/redis/0.3.6
type: redis/metrics
data_stream.namespace: default
use_output: default
streams:
- data_stream:
dataset: redis.info
type: metrics
metricsets:
- info
hosts:
- "${kubernetes.hints.redis.info.host|'127.0.0.1:6379'}"
idle_timeout: 20s
maxconn: 10
network: tcp
period: "${kubernetes.hints.redis.info.period|'10s'}"
condition: ${kubernetes.hints.redis.info.enabled} == true
- data_stream:
dataset: redis.key
type: metrics
metricsets:
- key
hosts:
- "${kubernetes.hints.redis.key.host|'127.0.0.1'}:${kubernetes.hints.redis.info.port|'6379'}"
idle_timeout: 20s
key.patterns:
- limit: 20
pattern: '*'
maxconn: 10
network: tcp
period: "${kubernetes.hints.redis.key.period|'10s'}"
condition: ${kubernetes.hints.redis.key.enabled} == true
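For completeness, and following the annotation convention proposed earlier in this thread, my understanding is that the minimal set of annotations a Redis Pod would need in order to activate, say, the info stream of this template would look like this:

annotations:
  co.elastic.hints/package: redis
  # Listing a data_stream enables it via the condition above; host/period
  # fall back to the template defaults when the corresponding hints are absent.
  co.elastic.hints/data_streams: info
  co.elastic.hints/host: '${kubernetes.pod.ip}:6379'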
|

I understand Fleet's responsibility in terms of generating these
I'm assuming that number 3 above is accurate, based on this point:
This makes sense to me. We'll have a hardcoded list of supported settings and their respective template strings stored in the Fleet codebase.
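As a rough illustration only (a hypothetical shape, not anything confirmed for the Fleet codebase), such a list might encode hint names against their placeholder patterns:

# Hypothetical registry of supported hints and their template placeholders;
# <package>, <data_stream>, and <default> would be filled in per stream.
supported_hints:
  host: "${kubernetes.hints.<package>.<data_stream>.host|'<default>'}"
  period: "${kubernetes.hints.<package>.<data_stream>.period|'<default>'}"
  enabled: "${kubernetes.hints.<package>.<data_stream>.enabled|false}"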
I'm thinking of the case where
Does this actually happen or matter? I'm just worried about confusion in the UX if we continue to allow configuration of these integrations via Fleet's policy editor, but the actual configuration provided is ignored in favor of the hardcoded hints strings.

|
Thanks @kpollich for the extra round of feedback. I see where the confusion comes from. The key point here is: the new

So in order for this to happen, we leverage the structure that the available packages provide (from the registry etc.) and we construct config-blocks/templates using the default values the packages provide, adding some extra conditions along with hints' placeholders. Users should have NO access to these templates at creation time, so Fleet does not take into account the installed integrations. So the flow for creating this new
To make it clearer, we could use other options to make the generation of this long configuration template available, such as

|
Thanks @ChrsMark - I understand now that we're working with more of a "static" templating implementation as opposed to dynamically generating the

|
Follow-up / project-specific implementation issues created:
|
Realizing I think I misunderstood. The hints themselves aren't coming from the package spec, right? We're just talking about the

|
Yeap, the hints list will be defined in Kibana/Fleet code.

|
Thanks @ChrsMark, really great work here overall. Let us sync once more on the two quality criteria we had defined:
After that we can close it.

|
@gizas Based on the experience of using and maintaining the feature in Beats, I had provided my input at #613 (comment). Any other ideas? Keep in mind that the feature is still in beta, so if we identify specific implementation details that are missing we can file them as follow-up issues and add them as enhancements.

|
Also known as hints-based auto-discovery for agent standalone
Picking up context from elastic/beats#23876
We want to enable users of agent standalone to monitor Kubernetes workloads from the resource side.
User outcome
Kubernetes users can declare their workloads and include hints, similar to those described in the Beats documentation and the metricbeat and filebeat manifests, and then expect Agent, while running in standalone mode, to pick up these hints and launch the proper config for them.
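For reference, the Beats-style hints mentioned above look like the following on a Pod; these co.elastic.metrics/* and co.elastic.logs/* annotations are the ones documented for Metricbeat/Filebeat hints-based autodiscover:

annotations:
  # Metricbeat hints: enable the redis module against the Pod's IP.
  co.elastic.metrics/module: redis
  co.elastic.metrics/hosts: '${data.host}:6379'
  co.elastic.metrics/period: 10s
  # Filebeat hints: enable log collection for this Pod.
  co.elastic.logs/enabled: "true"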
Acceptance criteria
On top of the outcome described above
Implementation issues
Add support for template construction compatible with kubernetes hints kibana#135624

will be implemented in CI jobs