Selective monitoring of Kubernetes workloads using annotations for agent standalone #613

Closed
3 tasks done
mlunadia opened this issue Jun 8, 2022 · 24 comments
Labels: 8.5-candidate, Team:Cloudnative-Monitoring, v8.5.0

Comments

@mlunadia

mlunadia commented Jun 8, 2022

Also known as hints-based autodiscovery for agent standalone
Picking up context from elastic/beats#23876

We want to enable users of agent standalone to monitor Kubernetes workloads from the resource side.

User outcome
Kubernetes users can declare their workloads with hints, similar to those described in the Beats documentation for the Metricbeat and Filebeat manifests, and then expect the agent, while running in standalone mode, to pick up these hints and launch the proper config for them.
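For reference, the Beats hints mentioned above are plain Pod annotations. A minimal sketch (based on the Metricbeat/Filebeat hints documentation; the Pod and values are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: redis
  annotations:
    # Metricbeat hints: which module to enable and how to reach the workload
    co.elastic.metrics/module: redis
    co.elastic.metrics/hosts: "${data.host}:6379"
    co.elastic.metrics/period: 10s
    # Filebeat hint: collect this Pod's container logs
    co.elastic.logs/enabled: "true"
spec:
  containers:
    - name: redis
      image: redis:6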

Acceptance criteria
On top of the outcome described above

  • Application operator (developer) should have an accessible way to find the list of available hints
  • Application operator (developer) should receive feedback for when a hint is not valid

Implementation issues

@mlunadia mlunadia added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Jun 8, 2022
@mlunadia mlunadia changed the title Enable monitoring of Kubernetes workloads from the resource side for agent standalone Selective monitoring of Kubernetes workloads from the resource side for agent standalone Jun 16, 2022
@ChrsMark
Member

ChrsMark commented Jun 20, 2022

We had an initial chat about this with @gizas, and in order to start pushing this forward we first need to decide on the approach we will follow to make this feature available to our users. Keep in mind that this feature requires cross-team effort from the Cloudnative, Agent, and Fleet UI teams. The implementation can happen in one or more teams accordingly, but first we need a solid plan about the way forward. Once we have decided the what and the how, we can create smaller, detail-specific issues for the various parts of the feature's implementation. Here is a first outline of the current status of the issue and a high-level proposal for its implementation across the "stack".

Hint based autodiscovery in standalone Agent

Related issues from the past

[1] Hints based autodiscovery
[2] Templates support

Current Status

What we have so far

The implementation to watch Pods and collect their metadata is already available.
We can easily identify Pods (Nodes, etc.) and we can easily extract the hints from the resources' annotations.

What we need

We need a mechanism to receive those hints and match them with inputs/integrations. This mechanism can either "live" in Agent to cover standalone mode or in Fleet UI OR in an external component like an additional "operator". In this proposal we only focus on standalone mode.

Standalone mode

The Agent needs to know what packages/inputs look like in order to match them with the hints. One way to achieve this is by using input templates. This would make the flow quite similar to Beats.

Feature design proposal (technical/high-level)

As already mentioned, Agent can identify Pods (and other workloads) in Kubernetes and communicate this information to Agent's controller so as to enable and populate the inputs' configuration accordingly.
The current proposal is based on the template support that was already discussed in the past.
The proposed way to add hints support is the following:

  1. Identify hints in Pods' annotations and emit a "hints event" at the same time the provider emits the mappings for the composable controller at
    _ = p.comm.AddOrUpdate(data.uid, PodPriority, data.mapping, data.processors)
    We can use a different channel to emit these events or the same one, but this is an implementation detail to be decided later.
  2. Agent's composable controller (or a new kind of controller) receives the hints and tries to utilise them in order to populate the values of the respective template which would be available in data/templates.d. This functionality was outlined at [Elastic Agent] Support for input templates beats#24054 (comment). We can figure out the details accordingly but this is the high level idea.
  3. After the successful evaluation of the hints and the construction of the input configuration, Agent adds the new configuration block to the local policy and the input is enabled.

How templates are created

One important "detail" that needs to be discussed is how the templates would become available in Elastic Agent locally. At elastic/beats#24054 (comment) (Part 2) it was proposed to ship the templates along with the Agent binary which implies that we would include them in the build/packaging stage after retrieving them from the Package Registry. This could work but we come with an easier solution (kudos to @gizas :D) which proposes to leave this work to our users and give them the option to download the templates from Fleet UI in the same was they do for the standalone policy. So Fleet UI can create a templates.d.tar on the fly using the latest available packages and then users can extract this zipped directory at data/templates.d.
This gives us the opportunity to make the delivery of the templates easier using the latest versions of the available packages from Fleet UI, and at the same time we follow similar UX with the one we have for the standalone policy. It would be Fleet UI's responsibility to collect the packages and construct the templates to make them available to the users.
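To make this concrete, the downloaded templates.d.tar could expand into one file per package under data/templates.d, each holding an input skeleton whose values the controller would later fill in from the hints. A purely illustrative sketch (file name, placeholder syntax, and layout are hypothetical, not an agreed format):

# data/templates.d/redis.yml (hypothetical)
- type: redis/metrics
  use_output: default
  meta:
    package:
      name: redis
      version: 0.3.6
  streams:
    - data_stream:
        dataset: redis.info
        type: metrics
      metricsets:
        - info
      hosts:
        - "${host}"         # filled in from the host hint, e.g. <pod_ip>:6379
      period: "${period}"   # filled in from the period hint, default 10s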

User journey

Having described the technical part at a high level, we also need to describe how users will use the feature. Here is an example user journey:

  1. Users download the Elastic Agent binary locally to run in standalone mode. The process is the same as today (https://www.elastic.co/guide/en/fleet/current/install-standalone-elastic-agent.html).
  2. In the next step users would need to create and download the policy as explained at https://www.elastic.co/guide/en/fleet/current/create-standalone-agent-policy.html
  3. The final step is that users download the templates from Fleet UI the same way they do in step 2.
    (screenshot of the Fleet UI download step omitted)

** The UX details/wording etc can be improved but this is the high level idea.

Proposed implementation issues

1. Implement the hints identification mechanism on the k8s provider side and make hints events available to Agent's controller/core.

a. specify the list of supported hints
b. look up supported hints in the Pod's annotations
c. emit a "hints event" to a specific channel to be handled by the codebase described in item 2 below.

code repo: https://github.com/elastic/elastic-agent/tree/7db7406cc04d6dd8f2d47f8a37e299da4598056b/internal/pkg/composable/providers/kubernetes
maintainer team: cloudnative-monitoring

2. Implement hints-to-inputs logic based on input templates (see related issue [2], Templates support)

a. Define template schema. Provide a sample.
b. Implement support for input templates in Agent. This is actually what was proposed with elastic/beats#24054 but it seems it was closed after some time.
c. Implement the linking codebase that would receive hints events (item 1) and match them with the proper template. Then enable the respective input.

code repo: https://github.com/elastic/elastic-agent/tree/7db7406cc04d6dd8f2d47f8a37e299da4598056b/internal/pkg/composable
maintainer team: Elastic Agent

3. Implement template construction out of the packages on Fleet UI and make them available to users in the same way the standalone policy is provided

code repo: https://github.com/elastic/kibana/tree/main/x-pack/plugins/fleet
maintainer team: Fleet UI

@gizas @mlunadia @ruflin let me know what you think about this and how we could move this forward.

@ruflin
Member

ruflin commented Jun 22, 2022

The nice part about the proposed solution is that steps 1 and 2 can happen fully independently of Fleet. Also, having support for input templates in Elastic Agent means it not only supports templates coming from Fleet, but users could also build their own if needed.

Also if later on we have an orchestrator solution, the templates will still be useful.

We should consider bundling some templates, like system, with the Elastic Agent by default for convenience.

An additional argument for this approach is that it works not only with k8s but also with any other standalone setup.

@ph
Contributor

ph commented Jun 22, 2022

As pointed out in elastic/beats#24054 (comment), looking at the workflow suggested in that comment:
downloading the package or embedding it will work. One thing I believe we are missing is a way to add a runtime constraint on the package itself,
i.e. you can only run this package with Elastic Agent version 8.5+, or 8.6 and lower.
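To illustrate the idea, such a constraint could be declared next to the package/template definition, roughly like this (a purely hypothetical field, nothing like it exists today):

template.d:
  nginx/1.8:
    # hypothetical: only load this template on a compatible Elastic Agent version
    constraints:
      elastic_agent.version: ">=8.5.0, <8.7.0"
    ...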

Concerning the download from the UI, we could instead make the agent policy bigger, with the templates embedded in it, something like this:

inputs:
- type: filestream
- type: tcp

# Instead of looking up on disk we can look at this instead, which could be a VFS representation of the structure on disk.
template.d:
 nginx/1.8:
    ....

The user only has one thing to download and deploy; the YAML is bigger, but this reduces the risk of having a mismatch.

@ChrsMark
Member

Thanks for your reviews @ruflin and @ph :)

Heads-up on this: after the initial feedback and some more brainstorming around this, it seems that for the hints feature implementation we don't necessarily need the template support.
This implies that we can skip items 2 and 3 from the "Proposed implementation issues" list posted above.

The rationale behind such a solution is that we can just re-use the k8s provider to emit specific (hints-specific) mappings that would then be capable of populating the proper "templates" produced by the Fleet UI with the proper conventions.

Let me put some examples together to illustrate the point:

Updated proposal

"Templates" format

We take for granted that templates are constructed by Fleet UI, using the available packages' definitions, and are placed in the elastic-agent.yml as @ph mentioned above.
The difference is that those "templates" would not be listed at a separate level, but can be listed under the inputs level, annotated with the proper hints-based conditions. With these inputs/templates populated by Fleet UI, Agent can just leverage the "dynamic variable resolution" capabilities and the "kubernetes provider" to enable those inputs/templates if the proper hints are provided.

For simplicity we can add only the latest package versions to the list of templates, but we can easily extend this using the proper conditions accordingly.

Find below a "template sample":

inputs:
...
- name: templates.d/redis/0.3.6
  type: redis/metrics
  use_output: default
  # condition: "${docker.hints.redis.version|kubernetes.hints.redis.version == "0.3.6"}"
  meta:
    package:
      name: redis
      version: 0.3.6
  data_stream:
    namespace: default
  streams:
  - data_stream:
      dataset: redis.info
      type: metrics
    metricsets:
      - info
    hosts:
      - "${docker.hints.redis.info.host|kubernetes.hints.redis.info.host|'127.0.0.1:6379'}"
    idle_timeout: 20s
    maxconn: 10
    network: tcp
    period: "${docker.hints.redis.info.period|kubernetes.hints.redis.info.period|'10s'}"
    condition: "${docker.hints.redis.info.enabled|kubernetes.hints.redis.info.enabled|true}"
  - data_stream:
      dataset: redis.key
      type: metrics
    metricsets:
      - key
    hosts:
      - "${docker.hints.redis.key.host|kubernetes.hints.redis.key.host|'127.0.0.1:6379'}"
    idle_timeout: 20s
    key.patterns:
      - limit: 20
        pattern: '*'
    maxconn: 10
    network: tcp
    period: "${docker.hints.redis.key.period|kubernetes.hints.redis.key.period|'10s'}"
    condition: "${docker.hints.redis.key.enabled|kubernetes.hints.redis.key.enabled|false}"

For now we can follow a simple approach and support only a few "hints", similarly to what we do in Metricbeat/Filebeat. In the above example I only leverage the "period" and "host" hints, but we can extend the list as much as we see fit.

Note that the conditions per data_stream are suffixed with |true if the data_stream is enabled by default, or |false if it is disabled by default. This gives us the option to explicitly enable/disable data_streams using the hints, or to enable just the integration, which implies enabling the default data_streams.
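As a quick illustration of these semantics for the Redis sample above (assumed behaviour):

# info is enabled by default: with no hint set, the fallback |true keeps it running
condition: "${docker.hints.redis.info.enabled|kubernetes.hints.redis.info.enabled|true}"
# key is disabled by default: with no hint set, the fallback |false keeps it off
condition: "${docker.hints.redis.key.enabled|kubernetes.hints.redis.key.enabled|false}"
# Annotating the Pod with co.elastic.hints/data_streams: info, key would set both
# "enabled" flags to true, so both data_streams get collected.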

Hints' specific mappings

Having defined the "templates" as above we can now leverage the kubernetes provider in order to populate those with the respective values as settings and enable them.

So in the kubernetes provider we can check for specific annotations like co.elastic.hints/*.

A Redis Pod would be annotated like this:

annotations:
  co.elastic.hints/package: redis
  co.elastic.hints/data_streams: info, key
  co.elastic.hints/host: '${kubernetes.pod.ip}:6379'
  co.elastic.hints/period: 1m

Note that the settings can also be data_stream specific if users want it:

annotations:
  co.elastic.hints/package: redis
  co.elastic.hints/data_streams: info, key
  co.elastic.hints/host: '${kubernetes.pod.ip}:6379'
  co.elastic.hints/info.period: 1m
  co.elastic.hints/key.period: 10m

If settings are not data_stream specific then common settings will be used across the data_streams.

With the above annotations the kubernetes provider would produce the following mapping:

{
  "kubernetes": {
    "hints": {
      "redis": {
        "info": {
          "enabled": true,
          "period": "1m",
          "host": "152.10.67.976:6379"
        },
        "key": {
          "enabled": true,
          "period": "10m",
          "host": "152.10.67.976:6379"
        }
      }
    }
  }
}

Note that the host field can be populated by the kubernetes provider on the fly, since the provider already holds the Pod's metadata in the same place.

Having the above mapping we can emit this at

_ = p.comm.AddOrUpdate(data.uid, PodPriority, data.mapping, data.processors)
and this would be enough to enable the respective "template" provided in the standalone configuration.

The rendered input configuration will be the following:

inputs:
...
- name: templates.d/redis/0.3.6
  type: redis/metrics
  use_output: default
  meta:
    package:
      name: redis
      version: 0.3.6
  data_stream:
    namespace: default
  streams:
  - data_stream:
      dataset: redis.info
      type: metrics
    metricsets:
      - info
    hosts:
      - "152.10.67.976:6379"
    idle_timeout: 20s
    maxconn: 10
    network: tcp
    period: 1m
  - data_stream:
      dataset: redis.key
      type: metrics
    metricsets:
      - key
    hosts:
      - "152.10.67.976:6379"
    idle_timeout: 20s
    key.patterns:
      - limit: 20
        pattern: '*'
    maxconn: 10
    network: tcp
    period: 10m

Additional considerations

  1. One major issue here is how we would install assets like dashboards and pipelines, but this is not hints-specific; it is a generic issue when one runs standalone agent. Nevertheless, this is mentioned and documented accordingly at https://www.elastic.co/guide/en/fleet/current/install-standalone-elastic-agent.html and https://www.elastic.co/guide/en/fleet/current/install-uninstall-integration-assets.html#install-integration-assets.
  2. We should also think about whether another component should be responsible for constructing the "templates". For example, the elastic-package tool could provide this functionality, which would allow advanced users to use it directly and construct the templates on their end. This would also allow elastic-agent to re-use the tool and construct the templates at build time in order to include them in the final distribution. However, I'm leaning towards having Fleet UI construct these templates and embed them in the elastic-agent.yml policy configuration. This makes things easier and would provide a seamless UX, since users are already directed to download the standalone policy through Fleet UI. But we can discuss this more if there are different opinions.

Since we are considering the templates' construction in Fleet UI, @jen-huang could you chime in and provide feedback on this and more specifically on "Templates" construction by Fleet UI part?

@ph
Contributor

ph commented Jun 23, 2022

@kpollich in case you want to provide feedback from the Fleet side.

@ruflin
Member

ruflin commented Jun 24, 2022

I like that the above approach allows us to use mostly existing mechanisms. To make it as easy as possible for users to update their "redis" template, for example, it would be great to have an inputs.d/redis-036.yml file or similar. When downloading a new template from Fleet, a user would not have to figure out where their input list for redis starts or stops, but could just replace the file.

@ph I know we have discussed inputs.d support for Elastic Agent in the past. Did this ever happen?

@ChrsMark
Member

@ruflin as far as I can see inputs.d is already supported because of 3c2072d, so we are good here 🙂.

This seems more structured, but compared to @ph's suggestion to include everything in the elastic-agent.yml standalone policy, it might be more error-prone since we leave a lot of flexibility to users. Including everything in the standalone policy would "hide" everything from the users and allow us to handle it as an implementation detail for our purposes.

However, could you elaborate more on what the user journey would look like in that case? I guess users would need to download the templates (in batch) from Fleet UI in a second step after the standalone policy and extract them into the inputs.d directory?

Something we have missed so far is that in Kubernetes everything can be quite packed before deployment time. For example, in order to deploy Elastic Agent (standalone) in k8s, users would use https://github.com/elastic/elastic-agent/blob/main/deploy/kubernetes/elastic-agent-standalone-kubernetes.yaml following the docs, or they can use the recently added UX in Fleet UI from elastic/kibana#114439.

For us this would mean that we have several options here:

  1. We can easily include the templates in a ConfigMap which will be mounted inside the Agent's Pod. This is exactly what we do with Beats, for example at https://github.com/elastic/beats/blob/main/deploy/kubernetes/metricbeat/metricbeat-daemonset-configmap.yaml#L79
    which is then mounted in Metricbeat's Pod at https://github.com/elastic/beats/blob/main/deploy/kubernetes/metricbeat-kubernetes.yaml#L198. Similarly to the modules.d directory, we will have inputs.d.
  2. This ConfigMap can be produced through Fleet UI, similarly to what we already have for producing the k8s manifest from Fleet UI -> "In case of kubernetes integration detected return manifest in standalone agent layout instead of policy" kibana#114439

My approach here would be that the https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes/elastic-agent-standalone manifests are shipped with a dummy inputs.d ConfigMap (or even an empty one), and then users are directed to download the inputs.d ConfigMap from the Fleet UI page and replace it accordingly.
With this approach, k8s users/operators can even download the inputs.d ConfigMap from time to time and update it accordingly in order to push it to their repos for version-control purposes.
There is a lot of flexibility here in how the UX could look, but we can start with something and iterate accordingly (cc: @mlunadia @gizas).
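For illustration, the dummy/placeholder inputs.d ConfigMap shipped with the manifests could look roughly like this (name and contents are only an example):

apiVersion: v1
kind: ConfigMap
metadata:
  name: elastic-agent-standalone-inputs
  namespace: kube-system
data:
  # placeholder only; users replace this ConfigMap with the one generated by Fleet UI
  placeholder.yml: |-
    # no inputs enabled by default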

Having said this I see the following implementation parts:

  1. Update the kubernetes provider accordingly to extract hints from annotations and produce hints-specific mappings as described at Selective monitoring of Kubernetes workloads using annotations for agent standalone #613 (@elastic/obs-cloudnative-monitoring can take care of this)
  2. Update the manifests for standalone Agent at https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes/elastic-agent-standalone to include an inputs.d ConfigMap similar to Metricbeat's modules.d. Provide a dummy inputs.d as an example/placeholder. (@elastic/obs-cloudnative-monitoring can take care of this)
  3. Update Fleet UI accordingly to produce the inputs.d ConfigMap that will then be placed inside the elastic-agent-standalone/elastic-agent-standalone-daemonset-configmap.yaml

@ChrsMark ChrsMark transferred this issue from elastic/beats Jun 24, 2022
@ruflin
Member

ruflin commented Jun 24, 2022

However could you elaborate more on how a user journey would look like in that case? I guess users would need to download the templates (in batch) from Fleet UI in a second step after the standalone policy and extract them into the inputs.d directory?

You have a good point here. It is really neat that we can offer the user a single yml file to download and it just works. For the Elastic Agent configs, I started to think about separating output configs from input configs. The output config in the standalone case rarely changes; inputs change much more often and are likely to eventually be managed by the user in Git or similar. Could we still have a single yaml but with multiple sections inside? I have some questions around this and will ping you directly.

@mlunadia
Author

@ChrsMark looking at the user journey, how do we satisfy the two items below?

Acceptance criteria
On top of the outcome described above

  • Application operator (developer) should have an accessible way to find the list of available hints
  • Application operator (developer) should receive feedback for when a hint is not valid

@ChrsMark
Member

@ChrsMark looking at the user journey, how do we satisfy the two items below?

Acceptance criteria
On top of the outcome described above

  • Application operator (developer) should have an accessible way to find the list of available hints
  • Application operator (developer) should receive feedback for when a hint is not valid

Quick answer is:

  1. docs like https://www.elastic.co/guide/en/beats/metricbeat/current/configuration-autodiscover-hints.html
  2. Agent logs, as verbose as we want

Beyond this we can go quite a bit further and maybe provide some extra troubleshooting functionality with the elastic-agent inspect command.

However, I think these should not affect the base implementation we are discussing here, since they are mostly enhancements on top of the core implementation.

@ph
Contributor

ph commented Jun 27, 2022

@ruflin Yes, we do have support for inputs.d, but as I mentioned, having the possibility of making the configuration atomic and a single thing to deploy makes sense to reduce possible errors and ensure that the right thing at the right version is deployed.

We can support both workflows without too much trouble: the IaC one and a more click-and-get one.

@ChrsMark I am +1 for the proposal above.

Concerning 2, we might want to clean up the logs to make sure we reduce the noise-to-problem ratio.

@kpollich
Member

Update Fleet UI accordingly to produce the inputs.d ConfigMap that will then be placed inside the elastic-agent-standalone/elastic-agent-standalone-daemonset-configmap.yaml

Just wanted to chime in from the Fleet UI side of things and make sure I'm understanding correctly. At first glance this looks reasonable.

So, Fleet UI would be responsible for generating a ConfigMap similar to how we currently generate the manifest file in the "Add Agent" flyout, e.g.

(screenshot of the Fleet UI "Add Agent" flyout omitted)

Then, the user will be responsible for manually managing the ConfigMap and placing it correctly within the inputs.d directory.

For context, here's an example standalone manifest generated today by Fleet for an integration policy with system, redis, and kubernetes installed:

apiVersion: v1
kind: ConfigMap
metadata:
  name: agent-node-datastreams
  namespace: kube-system
  labels:
    k8s-app: elastic-agent
data:
  agent.yml: |-
    id: d3a50f00-f64c-11ec-82c5-8b396fc6520e
    outputs:
      default:
        type: elasticsearch
        hosts:
          - 'http://192.168.65.2:9200'
        username: '{ES_USERNAME}'
        password: '{ES_PASSWORD}'
    inputs:
      - id: logfile-system-e7822224-fa5c-4291-acd8-49b69e4a44ac
        revision: 1
        name: system-6
        type: logfile
        data_stream:
          namespace: default
        use_output: default
        streams:
          - id: logfile-system.auth-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: logs
              dataset: system.auth
            paths:
              - /var/log/auth.log*
              - /var/log/secure*
            exclude_files:
              - .gz$
            multiline:
              pattern: ^\s
              match: after
            processors:
              - add_locale: null
          - id: logfile-system.syslog-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: logs
              dataset: system.syslog
            paths:
              - /var/log/messages*
              - /var/log/syslog*
            exclude_files:
              - .gz$
            multiline:
              pattern: ^\s
              match: after
            processors:
              - add_locale: null
        meta:
          package:
            name: system
            version: 1.16.2
      - id: winlog-system-e7822224-fa5c-4291-acd8-49b69e4a44ac
        revision: 1
        name: system-6
        type: winlog
        data_stream:
          namespace: default
        use_output: default
        streams:
          - id: winlog-system.application-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: logs
              dataset: system.application
            name: Application
            condition: '${host.platform} == ''windows'''
            ignore_older: 72h
          - id: winlog-system.security-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: logs
              dataset: system.security
            name: Security
            condition: '${host.platform} == ''windows'''
            ignore_older: 72h
          - id: winlog-system.system-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: logs
              dataset: system.system
            name: System
            condition: '${host.platform} == ''windows'''
            ignore_older: 72h
        meta:
          package:
            name: system
            version: 1.16.2
      - id: system/metrics-system-e7822224-fa5c-4291-acd8-49b69e4a44ac
        revision: 1
        name: system-6
        type: system/metrics
        data_stream:
          namespace: default
        use_output: default
        streams:
          - id: system/metrics-system.fsstat-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: metrics
              dataset: system.fsstat
            metricsets:
              - fsstat
            period: 1m
            processors:
              - drop_event.when.regexp:
                  system.fsstat.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)
          - id: >-
              system/metrics-system.filesystem-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: metrics
              dataset: system.filesystem
            metricsets:
              - filesystem
            period: 1m
            processors:
              - drop_event.when.regexp:
                  system.filesystem.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)
          - id: system/metrics-system.diskio-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: metrics
              dataset: system.diskio
            metricsets:
              - diskio
            diskio.include_devices: null
            period: 10s
          - id: system/metrics-system.cpu-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: metrics
              dataset: system.cpu
            metricsets:
              - cpu
            cpu.metrics:
              - percentages
              - normalized_percentages
            period: 10s
          - id: system/metrics-system.process-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: metrics
              dataset: system.process
            metricsets:
              - process
            period: 10s
            process.include_top_n.by_cpu: 5
            process.include_top_n.by_memory: 5
            process.cmdline.cache.enabled: true
            process.cgroups.enabled: false
            process.include_cpu_ticks: false
            processes:
              - .*
          - id: system/metrics-system.memory-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: metrics
              dataset: system.memory
            metricsets:
              - memory
            period: 10s
          - id: system/metrics-system.network-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: metrics
              dataset: system.network
            metricsets:
              - network
            period: 10s
            network.interfaces: null
          - id: >-
              system/metrics-system.socket_summary-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: metrics
              dataset: system.socket_summary
            metricsets:
              - socket_summary
            period: 10s
          - id: >-
              system/metrics-system.process.summary-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: metrics
              dataset: system.process.summary
            metricsets:
              - process_summary
            period: 10s
          - id: system/metrics-system.load-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: metrics
              dataset: system.load
            metricsets:
              - load
            condition: '${host.platform} != ''windows'''
            period: 10s
          - id: system/metrics-system.uptime-e7822224-fa5c-4291-acd8-49b69e4a44ac
            data_stream:
              type: metrics
              dataset: system.uptime
            metricsets:
              - uptime
            period: 10s
        meta:
          package:
            name: system
            version: 1.16.2
      - id: kubernetes/metrics-kubelet-1e00917a-19e7-4c02-8050-c21ba0578e27
        revision: 1
        name: kubernetes-1
        type: kubernetes/metrics
        data_stream:
          namespace: default
        use_output: default
        streams:
          - id: >-
              kubernetes/metrics-kubernetes.container-1e00917a-19e7-4c02-8050-c21ba0578e27
            data_stream:
              type: metrics
              dataset: kubernetes.container
            metricsets:
              - container
            add_metadata: true
            hosts:
              - 'https://${env.NODE_NAME}:10250'
            period: 10s
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            ssl.verification_mode: none
          - id: >-
              kubernetes/metrics-kubernetes.node-1e00917a-19e7-4c02-8050-c21ba0578e27
            data_stream:
              type: metrics
              dataset: kubernetes.node
            metricsets:
              - node
            add_metadata: true
            hosts:
              - 'https://${env.NODE_NAME}:10250'
            period: 10s
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            ssl.verification_mode: none
          - id: >-
              kubernetes/metrics-kubernetes.pod-1e00917a-19e7-4c02-8050-c21ba0578e27
            data_stream:
              type: metrics
              dataset: kubernetes.pod
            metricsets:
              - pod
            add_metadata: true
            hosts:
              - 'https://${env.NODE_NAME}:10250'
            period: 10s
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            ssl.verification_mode: none
          - id: >-
              kubernetes/metrics-kubernetes.system-1e00917a-19e7-4c02-8050-c21ba0578e27
            data_stream:
              type: metrics
              dataset: kubernetes.system
            metricsets:
              - system
            add_metadata: true
            hosts:
              - 'https://${env.NODE_NAME}:10250'
            period: 10s
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            ssl.verification_mode: none
          - id: >-
              kubernetes/metrics-kubernetes.volume-1e00917a-19e7-4c02-8050-c21ba0578e27
            data_stream:
              type: metrics
              dataset: kubernetes.volume
            metricsets:
              - volume
            add_metadata: true
            hosts:
              - 'https://${env.NODE_NAME}:10250'
            period: 10s
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            ssl.verification_mode: none
        meta:
          package:
            name: kubernetes
            version: 1.21.1
      - id: kubernetes/metrics-kube-apiserver-1e00917a-19e7-4c02-8050-c21ba0578e27
        revision: 1
        name: kubernetes-1
        type: kubernetes/metrics
        data_stream:
          namespace: default
        use_output: default
        streams:
          - id: >-
              kubernetes/metrics-kubernetes.apiserver-1e00917a-19e7-4c02-8050-c21ba0578e27
            data_stream:
              type: metrics
              dataset: kubernetes.apiserver
            metricsets:
              - apiserver
            hosts:
              - >-
                https://${env.KUBERNETES_SERVICE_HOST}:${env.KUBERNETES_SERVICE_PORT}
            period: 30s
            condition: '${kubernetes_leaderelection.leader} == true'
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            ssl.certificate_authorities:
              - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        meta:
          package:
            name: kubernetes
            version: 1.21.1
      - id: kubernetes/metrics-kube-proxy-1e00917a-19e7-4c02-8050-c21ba0578e27
        revision: 1
        name: kubernetes-1
        type: kubernetes/metrics
        data_stream:
          namespace: default
        use_output: default
        streams:
          - id: >-
              kubernetes/metrics-kubernetes.proxy-1e00917a-19e7-4c02-8050-c21ba0578e27
            data_stream:
              type: metrics
              dataset: kubernetes.proxy
            metricsets:
              - proxy
            hosts:
              - 'localhost:10249'
            period: 10s
        meta:
          package:
            name: kubernetes
            version: 1.21.1
      - id: kubernetes/metrics-events-1e00917a-19e7-4c02-8050-c21ba0578e27
        revision: 1
        name: kubernetes-1
        type: kubernetes/metrics
        data_stream:
          namespace: default
        use_output: default
        streams:
          - id: >-
              kubernetes/metrics-kubernetes.event-1e00917a-19e7-4c02-8050-c21ba0578e27
            data_stream:
              type: metrics
              dataset: kubernetes.event
            metricsets:
              - event
            period: 10s
            add_metadata: true
            skip_older: true
            condition: '${kubernetes_leaderelection.leader} == true'
        meta:
          package:
            name: kubernetes
            version: 1.21.1
      - id: filestream-container-logs-1e00917a-19e7-4c02-8050-c21ba0578e27
        revision: 1
        name: kubernetes-1
        type: filestream
        data_stream:
          namespace: default
        use_output: default
        streams:
          - id: >-
              filestream-kubernetes.container_logs-1e00917a-19e7-4c02-8050-c21ba0578e27
            data_stream:
              type: logs
              dataset: kubernetes.container_logs
            paths:
              - '/var/log/containers/*${kubernetes.container.id}.log'
            prospector.scanner.symlinks: true
            parsers:
              - container:
                  stream: all
                  format: auto
        meta:
          package:
            name: kubernetes
            version: 1.21.1
      - id: logfile-redis-fc56b140-3154-4907-9eb3-991ea4ab6569
        revision: 1
        name: redis-1
        type: logfile
        data_stream:
          namespace: default
        use_output: default
        streams:
          - id: logfile-redis.log-fc56b140-3154-4907-9eb3-991ea4ab6569
            data_stream:
              type: logs
              dataset: redis.log
            paths:
              - /var/log/redis/redis-server.log*
              - /var/log/redis/my-cool-redis.log*
            tags:
              - redis-log
            exclude_files:
              - .gz$
            exclude_lines:
              - '^\s+[\-`(''.|_]'
        meta:
          package:
            name: redis
            version: 1.3.1
      - id: redis-redis-fc56b140-3154-4907-9eb3-991ea4ab6569
        revision: 1
        name: redis-1
        type: redis
        data_stream:
          namespace: default
        use_output: default
        streams:
          - id: redis-redis.slowlog-fc56b140-3154-4907-9eb3-991ea4ab6569
            data_stream:
              type: logs
              dataset: redis.slowlog
            hosts:
              - '127.0.0.1:6379'
              - 'https://my.redis:6379'
            password: ''
        meta:
          package:
            name: redis
            version: 1.3.1
      - id: redis/metrics-redis-fc56b140-3154-4907-9eb3-991ea4ab6569
        revision: 1
        name: redis-1
        type: redis/metrics
        data_stream:
          namespace: default
        use_output: default
        streams:
          - id: redis/metrics-redis.info-fc56b140-3154-4907-9eb3-991ea4ab6569
            data_stream:
              type: metrics
              dataset: redis.info
            metricsets:
              - info
            hosts:
              - '127.0.0.1:6379'
              - 'https://my.redis:6379'
            idle_timeout: 20s
            maxconn: 10
            network: tcp
            period: 10s
          - id: redis/metrics-redis.key-fc56b140-3154-4907-9eb3-991ea4ab6569
            data_stream:
              type: metrics
              dataset: redis.key
            metricsets:
              - key
            hosts:
              - '127.0.0.1:6379'
              - 'https://my.redis:6379'
            idle_timeout: 20s
            key.patterns:
              - limit: 20
                pattern: '*'
            maxconn: 10
            network: tcp
            period: 10s
          - id: redis/metrics-redis.keyspace-fc56b140-3154-4907-9eb3-991ea4ab6569
            data_stream:
              type: metrics
              dataset: redis.keyspace
            metricsets:
              - keyspace
            hosts:
              - '127.0.0.1:6379'
              - 'https://my.redis:6379'
            idle_timeout: 20s
            maxconn: 10
            network: tcp
            period: 10s
        meta:
          package:
            name: redis
            version: 1.3.1
    revision: 3
    agent:
      monitoring:
        namespace: default
        use_output: default
        enabled: true
        logs: true
        metrics: true
    output_permissions:
      default:
        _elastic_agent_monitoring:
          indices:
            - names:
                - logs-elastic_agent.apm_server-default
              privileges: &ref_0
                - auto_configure
                - create_doc
            - names:
                - metrics-elastic_agent.apm_server-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.auditbeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.cloudbeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.elastic_agent-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.cloudbeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.endpoint_security-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.auditbeat-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.endpoint_security-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.filebeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.filebeat-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.fleet_server-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.heartbeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.fleet_server-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.metricbeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.metricbeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.heartbeat-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.osquerybeat-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.packetbeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.osquerybeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.packetbeat-default
              privileges: *ref_0
        _elastic_agent_checks:
          cluster:
            - monitor
        system-6:
          indices:
            - names:
                - logs-system.auth-default
              privileges: *ref_0
            - names:
                - logs-system.syslog-default
              privileges: *ref_0
            - names:
                - logs-system.application-default
              privileges: *ref_0
            - names:
                - logs-system.security-default
              privileges: *ref_0
            - names:
                - logs-system.system-default
              privileges: *ref_0
            - names:
                - metrics-system.fsstat-default
              privileges: *ref_0
            - names:
                - metrics-system.filesystem-default
              privileges: *ref_0
            - names:
                - metrics-system.diskio-default
              privileges: *ref_0
            - names:
                - metrics-system.cpu-default
              privileges: *ref_0
            - names:
                - metrics-system.process-default
              privileges: *ref_0
            - names:
                - metrics-system.memory-default
              privileges: *ref_0
            - names:
                - metrics-system.network-default
              privileges: *ref_0
            - names:
                - metrics-system.socket_summary-default
              privileges: *ref_0
            - names:
                - metrics-system.process.summary-default
              privileges: *ref_0
            - names:
                - metrics-system.load-default
              privileges: *ref_0
            - names:
                - metrics-system.uptime-default
              privileges: *ref_0
        kubernetes-1:
          indices:
            - names:
                - metrics-kubernetes.container-default
              privileges: *ref_0
            - names:
                - metrics-kubernetes.node-default
              privileges: *ref_0
            - names:
                - metrics-kubernetes.pod-default
              privileges: *ref_0
            - names:
                - metrics-kubernetes.system-default
              privileges: *ref_0
            - names:
                - metrics-kubernetes.volume-default
              privileges: *ref_0
            - names:
                - metrics-kubernetes.apiserver-default
              privileges: *ref_0
            - names:
                - metrics-kubernetes.proxy-default
              privileges: *ref_0
            - names:
                - metrics-kubernetes.event-default
              privileges: *ref_0
            - names:
                - logs-kubernetes.container_logs-default
              privileges: *ref_0
        redis-1:
          indices:
            - names:
                - logs-redis.log-default
              privileges: *ref_0
            - names:
                - logs-redis.slowlog-default
              privileges: *ref_0
            - names:
                - metrics-redis.info-default
              privileges: *ref_0
            - names:
                - metrics-redis.key-default
              privileges: *ref_0
            - names:
                - metrics-redis.keyspace-default
              privileges: *ref_0

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: elastic-agent
  namespace: kube-system
  labels:
    app: elastic-agent
spec:
  selector:
    matchLabels:
      app: elastic-agent
  template:
    metadata:
      labels:
        app: elastic-agent
    spec:
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      serviceAccountName: elastic-agent
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: elastic-agent
          image: docker.elastic.co/beats/elastic-agent:8.4.0
          args: [
            "-c", "/etc/agent.yml",
            "-e",
            "-d", "'*'",
          ]
          env:
            - name: ES_USERNAME
              value: "elastic"
            - name: ES_PASSWORD
              value: "changeme"
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          securityContext:
            runAsUser: 0
          resources:
            limits:
              memory: 500Mi
            requests:
              cpu: 100m
              memory: 200Mi
          volumeMounts:
            - name: datastreams
              mountPath: /etc/agent.yml
              readOnly: true
              subPath: agent.yml
            - name: proc
              mountPath: /hostfs/proc
              readOnly: true
            - name: etc-kubernetes
              mountPath: /hostfs/etc/kubernetes
            - name: var-lib
              mountPath: /hostfs/var/lib
              readOnly: true
            - name: cgroup
              mountPath: /hostfs/sys/fs/cgroup
              readOnly: true
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: passwd
              mountPath: /hostfs/etc/passwd
              readOnly: true
            - name: group
              mountPath: /hostfs/etc/group
              readOnly: true
            - name: systemd
              mountPath: /hostfs/etc/systemd
              readOnly: true
      volumes:
        - name: datastreams
          configMap:
            defaultMode: 0640
            name: agent-node-datastreams
        - name: proc
          hostPath:
            path: /proc
        - name: etc-kubernetes
          hostPath:
            path: /etc/kubernetes
        - name: var-lib
          hostPath:
            path: /var/lib
        - name: passwd
          hostPath:
            path: /etc/passwd
        - name: group
          hostPath:
            path: /etc/group
        - name: cgroup
          hostPath:
            path: /sys/fs/cgroup
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: varlog
          hostPath:
            path: /var/log
        - name: systemd
          hostPath:
            path: /etc/systemd
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: elastic-agent
subjects:
  - kind: ServiceAccount
    name: elastic-agent
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: elastic-agent
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: kube-system
  name: elastic-agent
subjects:
  - kind: ServiceAccount
    name: elastic-agent
    namespace: kube-system
roleRef:
  kind: Role
  name: elastic-agent
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: elastic-agent-kubeadm-config
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: elastic-agent
    namespace: kube-system
roleRef:
  kind: Role
  name: elastic-agent-kubeadm-config
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: elastic-agent
  labels:
    k8s-app: elastic-agent
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - namespaces
      - events
      - pods
      - services
      - configmaps
      - serviceaccounts
    verbs: ["get", "list", "watch"]
  # Enable this rule only if planing to use kubernetes_secrets provider
  #- apiGroups: [""]
  #  resources:
  #  - secrets
  #  verbs: ["get"]
  - apiGroups: ["extensions"]
    resources:
      - replicasets
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources:
      - statefulsets
      - deployments
      - replicasets
    verbs: ["get", "list", "watch"]
  - apiGroups: ["batch"]
    resources:
      - jobs
      - cronjobs
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - ""
    resources:
      - nodes/stats
    verbs:
      - get
  # required for apiserver
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
  # required for cloudbeat
  - apiGroups: ["rbac.authorization.k8s.io"]
    resources:
      - clusterrolebindings
      - clusterroles
      - rolebindings
      - roles
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources:
      - ingressclasses
      - ingresses
    verbs: ["get", "list", "watch"]
  - apiGroups: ["policy"]
    resources:
      - podsecuritypolicies
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: elastic-agent
  # should be the namespace where elastic-agent is running
  namespace: kube-system
  labels:
    k8s-app: elastic-agent
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: elastic-agent-kubeadm-config
  namespace: kube-system
  labels:
    k8s-app: elastic-agent
rules:
  - apiGroups: [""]
    resources:
      - configmaps
    resourceNames:
      - kubeadm-config
    verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-agent
  namespace: kube-system
  labels:
    k8s-app: elastic-agent
---

The relevant section is probably the various redis inputs, e.g.

- id: logfile-redis-fc56b140-3154-4907-9eb3-991ea4ab6569
  revision: 1
  name: redis-1
  type: logfile
  data_stream:
    namespace: default
  use_output: default
  streams:
    - id: logfile-redis.log-fc56b140-3154-4907-9eb3-991ea4ab6569
      data_stream:
        type: logs
        dataset: redis.log
      paths:
        - /var/log/redis/redis-server.log*
        - /var/log/redis/my-cool-redis.log*
      tags:
        - redis-log
      exclude_files:
        - .gz$
      exclude_lines:
        - '^\s+[\-`(''.|_]'
  meta:
    package:
      name: redis
      version: 1.3.1
- id: redis-redis-fc56b140-3154-4907-9eb3-991ea4ab6569
  revision: 1
  name: redis-1
  type: redis
  data_stream:
    namespace: default
  use_output: default
  streams:
    - id: redis-redis.slowlog-fc56b140-3154-4907-9eb3-991ea4ab6569
      data_stream:
        type: logs
        dataset: redis.slowlog
      hosts:
        - '127.0.0.1:6379'
        - 'https://my.redis:6379'
      password: ''
  meta:
    package:
      name: redis
      version: 1.3.1
- id: redis/metrics-redis-fc56b140-3154-4907-9eb3-991ea4ab6569
  revision: 1
  name: redis-1
  type: redis/metrics
  data_stream:
    namespace: default
  use_output: default
  streams:
    - id: redis/metrics-redis.info-fc56b140-3154-4907-9eb3-991ea4ab6569
      data_stream:
        type: metrics
        dataset: redis.info
      metricsets:
        - info
      hosts:
        - '127.0.0.1:6379'
        - 'https://my.redis:6379'
      idle_timeout: 20s
      maxconn: 10
      network: tcp
      period: 10s
    - id: redis/metrics-redis.key-fc56b140-3154-4907-9eb3-991ea4ab6569
      data_stream:
        type: metrics
        dataset: redis.key
      metricsets:
        - key
      hosts:
        - '127.0.0.1:6379'
        - 'https://my.redis:6379'
      idle_timeout: 20s
      key.patterns:
        - limit: 20
          pattern: '*'
      maxconn: 10
      network: tcp
      period: 10s
    - id: redis/metrics-redis.keyspace-fc56b140-3154-4907-9eb3-991ea4ab6569
      data_stream:
        type: metrics
        dataset: redis.keyspace
      metricsets:
        - keyspace
      hosts:
        - '127.0.0.1:6379'
        - 'https://my.redis:6379'
      idle_timeout: 20s
      maxconn: 10
      network: tcp
      period: 10s
  meta:
    package:
      name: redis
      version: 1.3.1

So, rather than a block like this, which uses the Redis integration's stream template

hosts:
  - '127.0.0.1:6379'
  - 'https://my.redis:6379'

Fleet would output a block like this:

hosts:
  - "${docker.hints.redis.info.host|kubernetes.hints.redis.info.host|'127.0.0.1:6379'}"

My question here is how does "know" about the hint string? Is it included in the package definition somewhere? e.g. would the redis package include a k8s/templates/info.yml file or something like that for the info input? I'm imagining a workflow like

Fleet shows "add agent" flyout
User selects standalone option
Fleet generates ConfigMap YML block
  for each integration policy on the current agent policy:
    if the package exports a k8s input template:
      append a compiled input block to the ConfigMap YML based on the exported template
Fleet renders ConfigMap YML block

Also, do we ignore user input in cases where we're using these hint strings instead, or are they intended to serve as default values that can be overridden by the user? e.g. should the 127.0.0.1:6379 string at the end of the hint string honor user input, or is it just a hardcoded default?

@gizas
Contributor

gizas commented Jun 28, 2022

  • Application operator (developer) should have an accessible way to find the list of available hints

For the first, I was thinking that as long as we produce the templates of the inputs.d directory, we can also produce the list of available hints to be used. But this can be done even in later stages of the project. We can even render only the templates relevant to the integrations enabled in each case, if that makes sense.
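For example, such a generated reference could be as simple as a short annotated list published next to the inputs.d templates (illustrative only; the final set of supported hints is still to be decided):

# supported hints (illustrative)
co.elastic.hints/package:       # integration to enable, e.g. redis
co.elastic.hints/data_streams:  # comma-separated data_streams, e.g. info, key
co.elastic.hints/host:          # host/port to connect to, e.g. ${kubernetes.pod.ip}:6379
co.elastic.hints/period:        # collection period, e.g. 10s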

@ChrsMark
Member

Thanks for the extra feedback folks!

@kpollich trying to answer your questions below:

Then, the user will be responsible for manually managing the ConfigMap and placing it correctly within the inputs.d directory.

In k8s we can avoid having users manually place the ConfigMap in the inputs.d directory, since we can define the file "tree" inside the ConfigMap like what we do at https://github.com/elastic/beats/blob/main/deploy/kubernetes/metricbeat/metricbeat-daemonset-configmap.yaml#L79
which is then mounted in Metricbeat's Pod at https://github.com/elastic/beats/blob/main/deploy/kubernetes/metricbeat-kubernetes.yaml#L198.

The only thing Fleet UI needs to do is to produce something like the following:

apiVersion: v1
kind: ConfigMap
metadata:
  name: elastic-agent-standalone-inputs
data:
  redis.yml: |-
    - data_stream: ...
  ....
  apache.yml: |-
  ....
  nginx.yml: |-
  ....

Then we just need to mount it inside the Agent's Pod as a directory using something like the following:

...
   volumeMounts:
     - name: inputs
       mountPath: /usr/share/elastic-agent/inputs.d
       readOnly: true
   volumes:
     - name: inputs
       configMap:
         defaultMode: 0640
         name: elastic-agent-standalone-inputs
...

See https://github.com/elastic/beats/blob/main/deploy/kubernetes/metricbeat-kubernetes.yaml#L197-L220 for a complete example.

In terms of UX, since we are maintaining all those manifests I see 2 possible paths that could be supported at the same time.

  1. If users generate the manifest file in the "Add Agent" flyout, then everything will be constructed by Fleet UI and users just get the manifest with everything needed inside.
  2. If users are downloading the manifests from our upstream following https://www.elastic.co/guide/en/fleet/current/running-on-kubernetes-standalone.html, then we can just add an extra step to our docs pointing them to Fleet UI to generate only the inputs ConfigMap. This will be a single ConfigMap with only the "templates" inside (it is the same in step 1 too), which users should then place in the same directory as the elastic-agent-standalone-kubernetes.yaml manifest and deploy before deploying the Agent.

Ofc we can improve the UX a lot here, but this can happen on top of the basic/core implementation. At some point we could even have users only use step 1 instead of downloading the manifests from the upstream. I would leave it to @mlunadia to provide better directions here, but as I mentioned, for now we can just expose it in the 2 simple ways described above.

My question here is how does Fleet "know" about the hint string? Is it included in the package definition somewhere? e.g. would the redis package include a k8s/templates/info.yml file or something like that for the info input?

The current idea is to only provide support for a specific set/list of hints like host, period, timeout, data_stream, log_stream etc. This is how the feature was supported in Beats and we didn't have any strong indicator that users would need something different.

Also, keeping the list of hints specific gives us more control over what we accept as input from users, and it's more secure than providing full flexibility to set everything via hints. Keep in mind that this mechanism will be exposed to everyone who has access to deploy Pods on the cluster.

So to answer your question: if we have a specific set of hints that we support, then Fleet UI can just look up and replace the respective settings, leveraging their default values etc. Nothing changes in the way packages are developed. I guess this would be doable?

Also, do we ignore user input in cases where we're using these hint strings instead, or are they intended to serve as default values that can be overridden by the user? e.g. should the 127.0.0.1:6379 string at the end of the hint string honor user input or is it just a hardcoded default?

For all the settings that will not be exposed as hints we will use the default values as defined in the packages' specs. For those that are exposed as hints (as proposed above) the logic is to set them like "${kubernetes.hints.redis.info.host|'127.0.0.1:6379'}". So for such a setting we will either use the hint, if provided, or fall back to the package's default value.
Tbh I don't see how users could overwrite those :thinking_face: , the idea here is that we generate the "templates" in a low-level format meant to be populated by the hints mechanism. And since we have conditions like condition: "${kubernetes.hints.redis.key.enabled|}", it's almost impossible for such an input block to be enabled by anything other than the "hints" mechanism.

Since I have already started experimenting a bit with the backend implementation, here is a ConfigMap that we would need to have:

apiVersion: v1
kind: ConfigMap
metadata:
  name: elastic-agent-standalone-inputs
data:
  redis.yml: |-
    inputs:
     - name: templates.d/redis/0.3.6
       type: redis/metrics
       data_stream.namespace: default
       use_output: default
       streams:
         - data_stream:
             dataset: redis.info
             type: metrics
           metricsets:
           - info
           hosts:
           - "${kubernetes.hints.redis.info.host|'127.0.0.1:6379'}"
           idle_timeout: 20s
           maxconn: 10
           network: tcp
           period: "${kubernetes.hints.redis.info.period|'10s'}"
           condition: ${kubernetes.hints.redis.info.enabled} == true
         - data_stream:
             dataset: redis.key
             type: metrics
           metricsets:
           - key
           hosts:
           - "${kubernetes.hints.redis.key.host|'127.0.0.1'}:${kubernetes.hints.redis.info.port|'6379'}"
           idle_timeout: 20s
           key.patterns:
             - limit: 20
               pattern: '*'
           maxconn: 10
           network: tcp
           period: "${kubernetes.hints.redis.key.period|'10s'}"
           condition: ${kubernetes.hints.redis.key.enabled} == true
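
For completeness, a Pod that should activate the redis.info block above would carry hint annotations roughly like the following. The exact annotation prefix and keys are not finalized yet (they belong to the kubernetes provider part of the work), so treat this purely as a sketch; the idea is that the provider translates these annotations into the kubernetes.hints.redis.info.* variables and makes the enabled condition evaluate to true:

apiVersion: v1
kind: Pod
metadata:
  name: redis
  annotations:
    # hypothetical hint annotations; the kubernetes provider would resolve them into
    # kubernetes.hints.redis.info.host / kubernetes.hints.redis.info.period
    # and set kubernetes.hints.redis.info.enabled == true
    co.elastic.hints/package: redis
    co.elastic.hints/data_streams: info
    co.elastic.hints/host: '${kubernetes.pod.ip}:6379'
    co.elastic.hints/info.period: 10s
spec:
  containers:
    - name: redis
      image: redis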

@kpollich
Copy link
Member

The only thing Fleet UI needs to do is to produce something like the following: ...

data:
  redis.yml: |-
    - data_stream: ...
  ....
  apache.yml: |-
  ....
  nginx.yml: |-

I understand Fleet's responsibility in terms of generating these [integration].yml blocks, but my confusion is around how Fleet determines what data blocks to generate.

  1. Is there a subset of supported integrations that result in these unique data blocks being generated?
  2. Does Fleet UI always generate the same list of data blocks? e.g. is it a hard-coded list?
  3. Is the generated list of data blocks dependent on the user's set of installed integrations? e.g. if I install the redis integration to my policy, should I expect to see the redis.yml block in my data section of the ConfigMap?
  4. If number 3 is the case, should I be able to edit my integration policy in Fleet UI to set variables, namespace, etc? Should those settings be included in the redis.yml block?

I'm assuming that number 3 above is accurate, based on this point:

If users generate the manifest file in the "Add Agent" flyout, then everything will be constructed by Fleet UI and users just get the manifest with everything needed inside.


So to answer your question: if we have a specific set of hints that we support, then Fleet UI can just look up and replace the respective settings, leveraging their default values etc. Nothing changes in the way packages are developed. I guess this would be doable?

This makes sense to me. We'll have a hardcoded list of supported settings and their respective template strings stored in the Fleet codebase.
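
Conceptually, I'm imagining the mapping we'd maintain looking something like this (just a sketch of the idea, not the final shape; the placeholders in angle brackets are mine):

# supported setting -> template string Fleet substitutes into the generated block
host: "${kubernetes.hints.<package>.<data_stream>.host|<package default>}"
period: "${kubernetes.hints.<package>.<data_stream>.period|<package default>}"
timeout: "${kubernetes.hints.<package>.<data_stream>.timeout|<package default>}"
# plus the per-data_stream condition:
# condition: ${kubernetes.hints.<package>.<data_stream>.enabled} == true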


Tbh I don't see how users could overwrite those :thinking_face: , the idea here is that we generate the "templates" in a low-level format meant to be populated by the hints mechanism

I'm thinking of the case where

  1. User installs the k8s integration
  2. User installs redis integration
  3. User sets host variable in their redis integration policy
  4. Fleet UI displays the k8s ConfigMap when prompting for agent enrollment, but a hardcoded "hints" string is displayed instead of the user's input

Does this actually happen or matter? I'm just worried about confusion in the UX if we continue to allow for configuration of these integrations via Fleet's policy editor, but the actual configuration provided is ignored in favor of the hardcoded hints strings.

@ChrsMark
Copy link
Member

ChrsMark commented Jun 30, 2022

Thanks @kpollich for the extra round of feedback. I see where the confusion comes from.

The key point here is: the new ConfigMap with the "templates" inside will cover all the available integrations, not only the installed ones.

So in order for this to happen, we leverage the structure that the available packages provide (from the registry etc) and construct config blocks/templates using the default values the packages define, adding some extra conditions along with the hint placeholders.

Users should have NO access to these templates at creation time, so Fleet does not take the installed integrations into account.

So the flow for creating this new ConfigMap is like this:

  1. Fleet retrieves all the available packages/integrations from the Registry.
  2. One by one, it constructs the config blocks using the default values defined in the package spec.
    a. For every setting that is a known "hint", it populates the value with the hint placeholder/variable, e.g. ${kubernetes.hints.redis.info.host}. The fallback should be the default value, so the final value of the setting looks like "${kubernetes.hints.redis.info.host|'127.0.0.1:6379'}".
    b. For every data_stream in the config block we add the proper condition so that it can only be enabled by the hints mechanism: condition: ${kubernetes.hints.redis.key.enabled} == true (see the small before/after sketch below).
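
To illustrate steps 2a/2b with the redis info data_stream, the per-setting transformation is roughly the following (both values are taken from the examples earlier in the thread, so this is just the same content side by side):

# default value as defined in the package spec:
hosts:
  - '127.0.0.1:6379'

# becomes, in the generated template (hint variable with the package default as fallback,
# plus the condition that only the hints mechanism can satisfy):
hosts:
  - "${kubernetes.hints.redis.info.host|'127.0.0.1:6379'}"
condition: ${kubernetes.hints.redis.info.enabled} == true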

To make it more clear: we could use other options to generate this long configuration template, such as elastic-package. The goal here is to leverage the available packages in order to produce templates that are ready to be enabled by the hints. We chose Fleet UI as the component that generates those because Fleet already has access to the registry and handles packages, and in order to keep a seamless user experience.

@kpollich
Copy link
Member

Thanks @ChrsMark - I understand now that we're working with more of a "static" templating implementation as opposed to dynamically generating the ConfigMap based on what integrations are installed. All sounds good to me!

@ChrsMark
Copy link
Member

ChrsMark commented Jul 4, 2022

Follow-up/project specific implementation issues created:

  1. Add support for template construction compatible with kubernetes hints: kibana#135624 (will be implemented in CI jobs)
  2. Add support for hints' based autodiscovery in kubernetes provider #662

@mlunadia mlunadia changed the title Selective monitoring of Kubernetes workloads from the resource side for agent standalone Selective monitoring of Kubernetes workloads from the K8s cluster side for agent standalone Jul 5, 2022
@mlunadia mlunadia changed the title Selective monitoring of Kubernetes workloads from the K8s cluster side for agent standalone Selective monitoring of Kubernetes workloads from the K8s cluster for agent standalone Jul 5, 2022
@kpollich
Copy link
Member

kpollich commented Jul 5, 2022

One by one constructs the config blocks using the default values defined in the package spec.

@ChrsMark is there any work needed on the package spec to add support for this? If so can we link an implementation from the package-spec repo here as well?

Realizing now that I think I misunderstood. The hints themselves aren't coming from the package spec, right? We're just talking about the default: foo type values in a package manifest here, I think. The hints are hardcoded in the Kibana/Fleet code, correct?

@ChrsMark
Copy link
Member

ChrsMark commented Jul 5, 2022

Yeap the hints list will be defined in Kibana/Fleet code.
The package spec will not need any change at the moment.

@ChrsMark
Copy link
Member

@gizas @mlunadia all items of this one have been completed. The feature will be released as beta with 8.5 and there is still plenty of time for minor improvements in docs etc. In this regard, let me know if we can close this and handle anything else in targeted follow-ups.

@gizas
Copy link
Contributor

gizas commented Sep 19, 2022

Thanks @ChrsMark really great work here overall.

Let us sync once more on the two quality criteria we had defined:

Application operator (developer) should have an accessible way to find the list of available hints
Application operator (developer) should receive feedback for when a hint is not valid

After that we can close it

@ChrsMark
Copy link
Member

ChrsMark commented Sep 20, 2022

@gizas Based on the experience from using and maintaining the feature in Beats I had provided my input at #613 (comment). Any other ideas?

Keep in mind that the feature is still in beta, so if we identify specific implementation details that are missing we can file them as follow-up issues and add them as enhancements.

@rameshelastic
Copy link

Closing this as discussed with @ChrsMark and @gizas

@mlunadia mlunadia changed the title Selective monitoring of Kubernetes workloads from the K8s cluster for agent standalone Selective monitoring of Kubernetes workloads using annotations for agent standalone Oct 12, 2022