Stale kernel pods #1027
-
We are using Enterprise Gateway with Kubernetes, and we are seeing stale kernel pods build up in our system.
However, as you can see, the EG itself was restarted just a few days back.
What's causing these stale kernel pods? When EG gets killed, doesn't it bring down the kernel pods it launched?
-
I think we need to figure out under what scenario these "leaks" are occurring.

Yes, when EG gets shut down (gracefully), it will attempt to cycle through the kernels it knows about and issue shutdown commands for those kernels. However, if the shutdown isn't long-lived enough, I suppose there could be some orphaned pods.

Another place to look for this is across kernel restarts, since a restart consists of shutting down the pod and starting a new one.

I suspect the kernel managers believe they are NOT tracking the kernels associated with these pods, and it's the pods themselves that are the issue.

Is culling enabled? If so, you should be able to monitor the EG logs for kernel activity, since each culling cycle (60 seconds by default) will produce a DEBUG entry for each kernel it knows about. If you see pods whose kernel ids are not in the logs for each culling poll cycle, that might help identify under what circumstances the leak was triggered.
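To make that cross-check concrete, here is a minimal sketch (an editorial addition, not part of the original reply) that compares the kernel ids labeled on running kernel pods against the kernel ids mentioned in an EG log file. The namespace name, the `component=kernel` and `kernel_id` labels, and the assumption that kernel ids are UUIDs are all placeholders; adjust them to your deployment.

```python
# Sketch: find kernel pods whose kernel id never appears in the EG log.
# Assumptions: kernel pods carry component=kernel and kernel_id labels,
# kernel ids are UUIDs, and the namespace below matches your deployment.
import re
import sys

from kubernetes import client, config  # pip install kubernetes


def kernel_ids_from_pods(namespace="enterprise-gateway"):
    """Collect kernel ids from pods labeled as EG kernel pods (assumed labels)."""
    config.load_kube_config()  # use load_incluster_config() when run inside the cluster
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace, label_selector="component=kernel")
    return {p.metadata.labels.get("kernel_id") for p in pods.items}


def kernel_ids_from_log(log_path):
    """Collect every UUID-shaped kernel id mentioned anywhere in the EG log."""
    uuid_re = re.compile(
        r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    )
    with open(log_path) as f:
        return set(uuid_re.findall(f.read()))


if __name__ == "__main__":
    pod_ids = kernel_ids_from_pods()
    log_ids = kernel_ids_from_log(sys.argv[1])
    orphans = pod_ids - log_ids
    print("Kernel pods unknown to EG (candidate orphans):", orphans or "none")
```

Running this against a log that spans at least one culling cycle should surface any pod whose kernel id EG is no longer tracking.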
-
Thanks a lot @kevin-bates, and sorry for the delay. Yes, we have culling enabled.
I think I understand how I can repro this issue consistently. We regularly do an EG helm upgrade. What I see is that the kernel pods which are still up, launched by the previous EG, don't go away, and the new EG obviously doesn't know about them to cull them. We are solving this by adding an extra script that kills all kernel namespaces after the helm upgrade; not a solution I am proud of, but it gets the job done.
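For reference, a post-upgrade cleanup along the lines described above might look like the sketch below, assuming a shared kernel namespace and a `component=kernel` label on kernel pods (both assumptions; match them to your helm values). If your deployment creates one namespace per kernel, the same idea applies using `list_namespace` / `delete_namespace` instead of the pod calls.

```python
# Sketch of a post-upgrade cleanup: delete any kernel pods left over from the
# previous EG release. The namespace and the component=kernel label are
# assumptions; adjust them to your deployment before using anything like this.
from kubernetes import client, config


def cleanup_orphaned_kernel_pods(namespace="enterprise-gateway"):
    config.load_kube_config()
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace, label_selector="component=kernel")
    for pod in pods.items:
        print(f"Deleting leftover kernel pod {pod.metadata.name}")
        v1.delete_namespaced_pod(pod.metadata.name, namespace)


if __name__ == "__main__":
    cleanup_orphaned_kernel_pods()
```

A safer variant would first confirm the pod's kernel id is absent from the new EG's culling log (as in the earlier sketch) before deleting it.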