KubeEye aims to find various problems on Kubernetes, such as application misconfiguration (using OPA), unhealthy cluster components, and node problems (using Node-Problem-Detector). Besides predefined rules, it also supports custom rules.
KubeEye gets cluster diagnostic data by calling the Kubernetes API, by regular-expression matching of key error messages in resources, and by rule matching of container syntax. See Architecture for details.
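The regular-expression matching of key error messages can be pictured with a small shell sketch. This is only an illustration, not KubeEye's implementation: the function name `classify_message` and the patterns are assumptions, chosen from standard Linux error strings, and the returned names are borrowed from the check items listed later in this document.

```shell
#!/bin/sh
# classify_message MESSAGE
# Hypothetical helper: maps one log or event message to a check-item name.
# The patterns below are illustrative assumptions, not KubeEye's built-in rules.
classify_message() {
    msg=$1
    if printf '%s\n' "$msg" | grep -Eiq 'no such file or directory'; then
        echo "PodNoSuchFileOrDirectory"
    elif printf '%s\n' "$msg" | grep -Eiq 'input/output error'; then
        echo "PodIOError"
    elif printf '%s\n' "$msg" | grep -Eiq 'no such device or address'; then
        echo "PodNoSuchDeviceOrAddress"
    elif printf '%s\n' "$msg" | grep -Eiq 'no space left on device'; then
        echo "PodNoSpaceLeftOnDevice"
    else
        echo "Unmatched"
    fi
}

# Example call with a made-up message.
classify_message "open /etc/app/config.yaml: no such file or directory"
```

The matching is case-insensitive, so the same pattern covers messages that capitalize the first word.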
- Install KubeEye on your machine
  - Download pre-built executables from Releases.
  - Or build from source code.

    Note: make install will create kubeeye in /usr/local/bin/ on your machine.
```shell
git clone https://github.com/kubesphere/kubeeye.git
cd kubeeye
make install
```
- [Optional] Install Node-Problem-Detector
Note: This command installs NPD on your cluster. It is only required if you want a detailed report.
kubeeye install -e npd
- Run KubeEye
Note: KubeEye sorts the results by resource kind.
root@node1:# kubeeye audit
NAMESPACE NAME KIND MESSAGE
default nginx Deployment [nginx CPU limits should be set. nginx CPU requests should be set. nginx image tag not specified, do not use 'latest'. nginx livenessProbe should be set. nginx memory limits should be set. nginx memory requests should be set. nginx priorityClassName can be set. nginx root file system should be set read only. nginx readinessProbe should be set. nginx runAsNonRoot can be set.]
default testcronjob CronJob [testcronjob CPU limits should be set. testcronjob CPU requests should be set. testcronjob allowPrivilegeEscalation should be set false. testcronjob have HighRisk capabilities. testcronjob hostIPC should not be set. testcronjob hostNetwork should not be set. testcronjob hostPID should not be set. testcronjob hostPort should not be set. testcronjob imagePullPolicy should be set 'Always'. testcronjob image tag not specified, do not use 'latest'. testcronjob have insecure capabilities. testcronjob livenessProbe should be set. testcronjob memory limits should be set. testcronjob memory requests should be set. testcronjob priorityClassName can be set. testcronjob privileged should be set false. testcronjob root file system should be set read only. testcronjob readinessProbe should be set.]
kube-system testrole Role [testrole can impersonate user. testrole can delete resources. testrole can modify workloads.]
testclusterrole ClusterRole [testclusterrole can impersonate user. testclusterrole can delete resource. testclusterrole can modify workloads.]
NAMESPACE SEVERITY PODNAME EVENTTIME REASON MESSAGE
kube-system Warning vpnkit-controller.16acd7f7536c62e8 2021-10-11T15:55:08+08:00 BackOff Back-off restarting failed container
NODENAME SEVERITY HEARTBEATTIME REASON MESSAGE
node18 Fatal 2020-11-19T10:32:03+08:00 NodeStatusUnknown Kubelet stopped posting node status.
node19 Fatal 2020-11-19T10:31:37+08:00 NodeStatusUnknown Kubelet stopped posting node status.
node2 Fatal 2020-11-19T10:31:14+08:00 NodeStatusUnknown Kubelet stopped posting node status.
node3 Fatal 2020-11-27T17:36:53+08:00 KubeletNotReady Container runtime not ready: RuntimeReady=false reason:DockerDaemonNotReady message:docker: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
NAME SEVERITY TIME MESSAGE
scheduler Fatal 2020-11-27T17:09:59+08:00 Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0 Fatal 2020-11-27T17:56:37+08:00 Get https://192.168.13.8:2379/health: dial tcp 192.168.13.8:2379: connect: connection refused
You can refer to the FAQ content to optimize your cluster.
- KubeEye validates your workloads' YAML specs against industry best practices and helps you keep your cluster stable.
- KubeEye can find problems in your cluster control plane, including kube-apiserver, kube-controller-manager, etcd, etc.
- KubeEye helps you detect all kinds of node problems, including memory/CPU/disk pressure, unexpected kernel error logs, etc.
YES/NO | CHECK ITEM | Description |
---|---|---|
✅ | NodeDockerHung | Docker hung, you can check docker log |
✅ | InsecureCapabilities | Have insecure options in capabilities such as KILL/SYS_CHROOT/CHOWN |
✅ | PodNoSuchFileOrDirectory | Go into the container to see if the corresponding file exists |
✅ | PodIOError | This is usually due to file IO performance bottlenecks |
✅ | PodNoSuchDeviceOrAddress | Check the corresponding network |
✅ | PodNoSpaceLeftOnDevice | Check disk and inode usage |
✅ | NodeApiServerExpiredPeriod | Checks whether the ApiServer certificate expires in less than 30 days |
| | NodeNotReadyAndUseOfClosedNetworkConnection | http2-max-streams-per-connection |
| | NodeNotReady | Failed to start ContainerManager Cannot set property TasksAccounting, or unknown property |
Embedded rules: the rules are packaged into kubeeye for easy use.
Function check rules provide more customized rule checks. For example, by using a shell and calling a third-party interface, you can wrap the check in a function and return its output in the agreed format, so that it is displayed uniformly in the report.
Commands and rules are managed separately: specify an external OPA rule directory, and kubeeye loads the rules in that directory and merges them with the default rules.
- Add custom OPA rule files
Note: The OPA package name must be selected from the following table.
type | package |
---|---|
RBAC | kubeeye_RBAC_rego |
workloads | kubeeye_workloads_rego |
nodes | kubeeye_nodes_rego |
events | kubeeye_events_rego |
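Because the package name has to be one of the four values in the table, it can help to check a rule file before running an audit. The following is a minimal sketch, assuming each rule file declares its package on a single line of its own; `check_rego_package` is a hypothetical helper and not part of kubeeye.

```shell
#!/bin/sh
# check_rego_package FILE
# Hypothetical helper: succeeds only if FILE contains a package declaration
# whose name is one of the four package names listed in the table above.
check_rego_package() {
    grep -Eq '^package[[:space:]]+kubeeye_(RBAC|workloads|nodes|events)_rego[[:space:]]*$' "$1"
}

# Example: report every .rego file in a directory (default: current directory)
# that does not declare one of the expected package names.
rules_dir=${RULES_DIR:-.}
for f in "$rules_dir"/*.rego; do
    [ -e "$f" ] || continue
    check_rego_package "$f" || echo "unexpected package name in $f" >&2
done
```

The check only inspects the package line; it does not validate the Rego syntax of the rest of the file.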
- Create a directory for OPA rules
mkdir opa
- Save the following rule to a rule file, such as imageRegistryRule.rego, to audit whether the image registry address complies with the rules.
package kubeeye_workloads_rego
deny[msg] {
resource := input
type := resource.Object.kind
resourcename := resource.Object.metadata.name
resourcenamespace := resource.Object.metadata.namespace
workloadsType := {"Deployment","ReplicaSet","DaemonSet","StatefulSet","Job"}
workloadsType[type]
not workloadsImageRegistryRule(resource)
msg := {
"Name": sprintf("%v", [resourcename]),
"Namespace": sprintf("%v", [resourcenamespace]),
"Type": sprintf("%v", [type]),
"Message": "ImageRegistryNotmyregistry"
}
}
workloadsImageRegistryRule(resource) {
regex.match(`^myregistry\.public\.kubesphere/basic/.+`, resource.Object.spec.template.spec.containers[_].image)
}
- Run KubeEye with custom rules
Note: Specify the path, and KubeEye will read all files in that directory that end with .rego.
root:# kubeeye audit -p ./opa -f ~/.kube/config
NAMESPACE NAME KIND MESSAGE
default nginx1 Deployment [ImageRegistryNotmyregistry NotReadOnlyRootFilesystem NotRunAsNonRoot]
default nginx11 Deployment [ImageRegistryNotmyregistry PrivilegeEscalationAllowed HighRiskCapabilities HostIPCAllowed HostPortAllowed ImagePullPolicyNotAlways ImageTagIsLatest InsecureCapabilities NoPriorityClassName PrivilegedAllowed NotReadOnlyRootFilesystem NotRunAsNonRoot]
default nginx111 Deployment [ImageRegistryNotmyregistry NoCPULimits NoCPURequests ImageTagMiss NoLivenessProbe NoMemoryLimits NoMemoryRequests NoPriorityClassName NotReadOnlyRootFilesystem NoReadinessProbe NotRunAsNonRoot]
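To preview which files in a rules directory carry the .rego suffix before running an audit, a sketch like the following may be useful. It assumes only the top level of the directory is considered, which this document does not confirm; `list_rego_files` is a hypothetical helper and not a kubeeye command.

```shell
#!/bin/sh
# list_rego_files DIR
# Hypothetical helper: prints every regular file directly inside DIR whose
# name ends with .rego, one path per line. Subdirectories are not searched
# (an assumption about how the rule directory is read).
list_rego_files() {
    dir=${1:-.}
    for f in "$dir"/*.rego; do
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Example: list the rule files in the directory given as the first argument.
list_rego_files "${1:-.}"
```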
Thanks goes to these wonderful people (emoji key):
- ruiyaoOps 💻 📖
- Forest 📖
- zryfish 📖
- shaowenchen 📖
- pixiake 📖
- pengfei 📖
- Harsh Thakur 💻
This project follows the all-contributors specification. Contributions of any kind welcome!