
Restart the Fluentbit pods when a Filter/Parser/Input/Output is created/removed. #465

Closed
Kristian-ZH opened this issue Nov 23, 2022 · 18 comments

@Kristian-ZH
Member

Is your feature request related to a problem? Please describe.

Currently, the operator consists of two controllers: fluentbit-controller and fluentbit-config-controller. The two controllers do not share resources, so when we deploy a ClusterFilter, for example, after the FB DaemonSet has been created, the new filter is not applied to Fluent Bit.
The fluentbit-config-controller picks up this Filter and adds it to the Secret containing all the configurations, but nothing restarts the fluent-bit pods.

I think that one operator should handle its components end-to-end, and such dynamic configuration should also be possible, so please consider this scenario.

Describe the solution you'd like

I was thinking about the following solution, which follows the K8S immutability approach.

  1. The data of the configuration Secret can be hashed and the hash added as a suffix to the Secret name (the last 5 chars are enough, I think).
  2. After the fluentbit-config-controller updates the Secret, it can add the Secret name or the Secret hash as an annotation on the FluentBit resource. This can be done with a simple Patch request.
  3. After the Patch request, the fb-controller picks up the new FluentBit object and updates the FluentBit DaemonSet with the new SecretRef (taken from the annotation).
  4. K8S then automatically restarts the pods when the SecretRef changes.

Also, with this approach the fluentbit-config-controller can keep only the last two config Secrets for debugging and delete the older ones.
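
To make the idea concrete, here is a minimal sketch of steps 1-2, assuming controller-runtime; the helper name, the annotation key, and the 5-char suffix are illustrative only, not an existing API of the operator:

    package controllers

    import (
        "context"
        "crypto/sha256"
        "encoding/hex"

        "sigs.k8s.io/controller-runtime/pkg/client"
    )

    // annotateWithConfigHash hashes the rendered config and patches the hash onto
    // the FluentBit object, so the DaemonSet controller can react to the change
    // by switching the mounted SecretRef (hypothetical helper, sketch only).
    func annotateWithConfigHash(ctx context.Context, c client.Client, fb client.Object, cfg []byte) error {
        sum := sha256.Sum256(cfg)
        hash := hex.EncodeToString(sum[:])
        suffix := hash[len(hash)-5:] // last 5 chars, as suggested above

        orig := fb.DeepCopyObject().(client.Object)
        ann := fb.GetAnnotations()
        if ann == nil {
            ann = map[string]string{}
        }
        ann["fluentbit.fluent.io/config-hash"] = suffix // hypothetical annotation key
        fb.SetAnnotations(ann)

        // Simple merge patch; the fb-controller would then roll the DaemonSet
        // by pointing it at the Secret named with the new suffix.
        return c.Patch(ctx, fb, client.MergeFrom(orig))
    }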

(Screenshot: 2022-11-23 13:53:57)

Additional context

If you like the mentioned approach, I can open a PR with the implementation. I am also open to discussing other possible approaches to this scenario.

@wanjunlei
Collaborator

Thanks for engaging, @Kristian-ZH.

First, the creation, update, and deletion of a Filter/Parser/Input/Output will trigger a configuration reload of Fluent Bit. This happens within the pod, so the pod does not need to restart.

I am not sure what you mean by "not be applied to the fluent bit". If it means that Fluent Bit does not restart, that is expected.
If it means the filter does not work, that may be a bug.

Can you provide more information to determine if this is a bug?

@Kristian-ZH
Member Author

Hi,

Yes, I did not know about this feature, because I see that FB itself still does not provide dynamic configuration support: fluent/fluent-bit#365

But this still does not work entirely in my setup. The new configurations are dynamically injected into the /fluent-bit/config directory, but the FB pods continue to use the old configurations and their behaviour does not change.

How to reproduce it:

  1. Deploy a CustomInput (tail), probably some ClusterFilters (in my setup I have Lua filters), and a ClusterOutput (in my setup I have a custom output plugin).
  2. Then try to delete the Filter or the Output.
  3. You can verify that the configurations in the fb config dir are changed, but the fb behaviour does not change and it still works with the initial configurations.

Can you please check and tell me whether the bug exists in your setup as well, or if it is something on my side?

@wanjunlei
Collaborator

I need the following information too:

  • The k8s version.
  • The container runtime of k8s.
  • The fluent operator version and the fluent-bit version.

@benjaminhuo
Member

benjaminhuo commented Nov 24, 2022

@Kristian-ZH Fluent Operator will restart the fluent-bit process whenever the config is updated, without restarting the fluent-bit pod. You can take a look at the code here:

https://github.com/fluent/fluent-operator/blob/master/cmd/fluent-watcher/fluentbit/main.go
https://github.com/fluent/fluent-operator/blob/master/cmd/fluent-watcher/fluentbit/Dockerfile#L16

It's a bug to fix if it doesn't pick up the latest config changes.
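
For context, a simplified sketch of the pattern that watcher follows, assuming fsnotify: run fluent-bit as a child process and restart it whenever the mounted config changes. This is an illustration of the idea, not the actual main.go:

    package main

    import (
        "log"
        "os"
        "os/exec"

        "github.com/fsnotify/fsnotify"
    )

    // Illustration only: supervise fluent-bit and restart the process whenever
    // the mounted config directory changes, so the pod itself never restarts.
    // The real cmd/fluent-watcher/fluentbit/main.go is more involved.
    func main() {
        watcher, err := fsnotify.NewWatcher()
        if err != nil {
            log.Fatal(err)
        }
        defer watcher.Close()
        if err := watcher.Add("/fluent-bit/config"); err != nil {
            log.Fatal(err)
        }

        for {
            cmd := exec.Command("/fluent-bit/bin/fluent-bit", "-c", "/fluent-bit/config/fluent-bit.conf")
            cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
            if err := cmd.Start(); err != nil {
                log.Fatal(err)
            }

            // Block until the config changes, then stop fluent-bit and relaunch
            // it with the new configuration.
            <-watcher.Events
            _ = cmd.Process.Kill()
            _ = cmd.Wait()
        }
    }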

@Kristian-ZH
Member Author

Kristian-ZH commented Nov 24, 2022

k8s version: 1.22.16
container runtime: containerd://1.5.13
fluent operator version: I use this commit: 953d596
fluent-bit version: fluent/fluent-bit:2.0.3-debug

I also install the operator via Helm: helm install fluent-operator --create-namespace -n fluent charts/fluent-operator/ --set containerRuntime=containerd

@wanjunlei
Collaborator

Native fluent-bit does not support dynamic configuration loading. You need to use our packaged image kubesphere/fluent-bit:v1.9.9.

@Kristian-ZH
Member Author

Ahh, got it. Will try it :), thanks.
Can you just confirm whether kubesphere/fluent-bit:v1.9.9 is the same as fluent/fluent-bit:v1.9.9 plus the dynamic configuration support?
E.g. all the features and bug fixes from the original fluent-bit are there, but with the additional dynamic config feature.

@benjaminhuo
Member

Its base image is the official fluent/fluent-bit:1.9.9; you'll understand it if you take a look at https://github.com/fluent/fluent-operator/blob/master/cmd/fluent-watcher/fluentbit/Dockerfile#L16

@Kristian-ZH
Member Author

I have tested it with kubesphere/fluent-bit:v1.9.9 now, but it still does not load the configuration dynamically.

@wenchajun
Member

I have tested it with the kubesphere/fluent-bit:v1.9.9 now but it still does not load the configuration dynamically

Can you show the logs for it? The configuration usually reloads as its Secret changes.

@Kristian-ZH
Member Author

Sure,

Here I deployed FB with just one input plugin (tail). After that, I applied a ClusterOutput (stdout).
In the logs there is no indication that a new configuration was loaded:

Fluent Bit v1.9.9
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/11/24 12:40:39] [ info] [fluent bit] version=1.9.9, commit=5c03b2e555, pid=1
[2022/11/24 12:40:39] [ info] [storage] version=1.3.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/11/24 12:40:39] [ info] [cmetrics] version=0.3.7
[2022/11/24 12:40:39] [ info] [output:null:null.0] worker #0 started
[2022/11/24 12:40:39] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2022/11/24 12:40:39] [ info] [sp] stream processor started
[2022/11/24 12:40:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=528601 watch_fd=1 name=/var/log/containers/apiserver-proxy-jmc8x_kube-system_proxy-63fdbc80edf4c7f501158707ab2c95d071a183ce1e889766e6b13cad8abe13e0.log
[2022/11/24 12:40:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=134431 watch_fd=2 name=/var/log/containers/csi-driver-node-l6cgw_kube-system_csi-driver-66c768bd4c72399cbe7f275c7b06397bd0756a62681be71139c5584d4bbd7d61.log
[2022/11/24 12:40:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=790551 watch_fd=3 name=/var/log/containers/fluent-bit-czkxh_fluent_install-plugin-11db44f0bd42790452918814579cdce3a63a3e97f5f940ea185f23070d6e5272.log
[2022/11/24 12:40:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=790607 watch_fd=4 name=/var/log/containers/fluent-bit-dc6lh_fluent_install-plugin-fe77ce468ab92dee06946d8e43a8a86dc3d0b9c6eb4a3f3184d1091b0b29d148.log
[2022/11/24 12:40:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=790443 watch_fd=5 name=/var/log/containers/loki-0_shoot--i355448--local-shoot_loki-b372376e5a6bde14b178b7c67d641aaa61ad642dd7abe4484b1397bf3f378733.log
[2022/11/24 12:40:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=396583 watch_fd=6 name=/var/log/containers/network-problem-detector-host-nmbcr_kube-system_network-problem-detector-host-d7ea484d7847c82dd0fb9fda729c17715cea3a37a9af1f9aded3948dfc017bb6.log
[2022/11/24 12:40:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=267989 watch_fd=7 name=/var/log/containers/network-problem-detector-pod-fdhrv_kube-system_network-problem-detector-pod-3c8e9c59b5d21ae4ce5664d68812a7338e60f4c9f2ddb148b8862510c90f9aec.log
[2022/11/24 12:40:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=790566 watch_fd=8 name=/var/log/containers/shoot-dns-service-55f98795b8-76b8t_shoot--i355448--local-shoot_shoot-dns-service-613204b12810e79ee79c10a3998bea5349a178422ae0fda44efe436415808a42.log
[2022/11/24 12:40:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=134454 watch_fd=9 name=/var/log/containers/calico-node-jkwmw_kube-system_calico-node-7e9bcd437bf2350233e060bcd91c94ba77000108bce68e6de2de8bfd2fef0216.log
[2022/11/24 12:40:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=790490 watch_fd=10 name=/var/log/containers/fluent-bit-dc6lh_fluent_fluent-bit-cdaea96840cd444442cd8e09745126f88131d3dfdcb91a74fe04a7c72febc639.log
[2022/11/24 12:40:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=134393 watch_fd=11 name=/var/log/containers/kube-proxy-worker-rm668-v1.22.16-h5z2g_kube-system_kube-proxy-c8d6fa28be0be85837efafa0dd2943060ab765a45133f915f921798c3f45e847.log
[2022/11/24 12:40:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=396589 watch_fd=12 name=/var/log/containers/ofd-fluentbit-qdwcv_logging_fluentbit-407ccd89c880d080d4f66db73ee16a07510a2d3dde3978fec21a43da2fc389b3.log
[2022/11/24 12:40:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=790615 watch_fd=13 name=/var/log/containers/fluent-bit-czkxh_fluent_fluent-bit-4777c4feb89d47450f983367aae322339affc91ff7c17b89d318f6bb7477162f.log
[2022/11/24 12:41:44] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=790490 watch_fd=10
[2022/11/24 12:41:44] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=790607 watch_fd=4
[2022/11/24 12:42:19] [ info] [input:tail:tail.0] inotify_fs_add(): inode=790482 watch_fd=14 name=/var/log/containers/shoot-dns-service-55f98795b8-76b8t_shoot--i355448--local-shoot_shoot-dns-service-c20fc4506fe0bb8e49f9eaae6531cfe1e51270d87ac7b1609d5701e8e596ba0a.log
[2022/11/24 12:42:19] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=790566 watch_fd=8

@Kristian-ZH
Member Author

Ahh, I think I found the problem.
Now I saw that in your Dockerfile you are using this entrypoint: ENTRYPOINT ["/fluent-bit/bin/fluent-bit-watcher"]
But in my FluentBit resource I have a command field which loads my custom output plugin, and it looks like this:

    - /fluent-bit/bin/fluent-bit
    - -e
    - /fluent-bit/plugins/out_loki.so
    - -c
    - /fluent-bit/config/fluent-bit.conf

And what I think is that it overrides your entrypoint and does not start the fluent-bit-watcher binary.

@benjaminhuo
Member

Ahh I think that I found the problem ... it overrides your entrypoint and does not start the fluent-bit-watcher binary

Bingo! Don't start to run before you can walk :)

@Kristian-ZH
Member Author

Now the question is: how can I start the watcher binary and at the same time load a custom plugin into fluent-bit? :D
As far as I can see, the watcher binary does not support the -e argument.

@benjaminhuo
Member

benjaminhuo commented Nov 24, 2022

@Kristian-ZH
Member Author

Yes...
Okay, thanks. As the problem is in my setup and is not the one I mentioned in the issue, I will close it.
Thanks for the support. I will probably open a PR with the additional flag once I test it and confirm it works as expected :)
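
For the record, a purely hypothetical sketch of what such a passthrough flag could look like in a watcher-style wrapper; the flag name and wiring are illustrative and not part of the existing fluent-bit-watcher:

    package main

    import (
        "flag"
        "log"
        "os"
        "os/exec"
    )

    // Hypothetical sketch: accept an external-plugin flag in the wrapper and
    // forward it to the fluent-bit child process as -e.
    func main() {
        plugin := flag.String("external-plugin", "", "path to a .so plugin forwarded to fluent-bit via -e")
        config := flag.String("c", "/fluent-bit/config/fluent-bit.conf", "fluent-bit config file")
        flag.Parse()

        args := []string{"-c", *config}
        if *plugin != "" {
            args = append(args, "-e", *plugin)
        }

        cmd := exec.Command("/fluent-bit/bin/fluent-bit", args...)
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
        if err := cmd.Run(); err != nil {
            log.Fatal(err)
        }
    }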

@benjaminhuo
Member

  • /fluent-bit/plugins/out_loki.so

Fluent Bit supports a Loki plugin, so why use a custom Loki plugin?

@Kristian-ZH
Member Author

Long story short: our setup needs a dynamic hostpath configuration based on the tags, and the Loki plugin is not powerful enough to achieve this. That's why we are using a fork of the plugin: https://github.com/gardener/logging
