Hi,
So far this error happens only on Ubuntu 18.04. There are no errors when unbinding/binding manually as the root user on the host. With privileged: true added to the securityContext in the pod spec, the permission error is gone and unbinding works fine.
OS: Ubuntu 18.04.4 LTS
QAT device model: 8086:37c8
Kernel : 4.15.0-101-generic
Logs:
Device scan failed: open /sys/bus/pci/devices/0000:da:01.0/driver/unbind: permission denied
Unbinding from kernel driver failed for the device da:01.0
github.com/intel/intel-device-plugins-for-kubernetes/cmd/qat_plugin/dpdkdrv.(*DevicePlugin).bindDevice
/intel-device-plugins-for-kubernetes/cmd/qat_plugin/dpdkdrv/dpdkdrv.go:208
github.com/intel/intel-device-plugins-for-kubernetes/cmd/qat_plugin/dpdkdrv.(*DevicePlugin).scan
/intel-device-plugins-for-kubernetes/cmd/qat_plugin/dpdkdrv/dpdkdrv.go:292
github.com/intel/intel-device-plugins-for-kubernetes/cmd/qat_plugin/dpdkdrv.(*DevicePlugin).Scan
/intel-device-plugins-for-kubernetes/cmd/qat_plugin/dpdkdrv/dpdkdrv.go:86
github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin.(*Manager).Run.func1
/intel-device-plugins-for-kubernetes/pkg/deviceplugin/manager.go:96
runtime.goexit
/usr/lib/golang/src/runtime/asm_amd64.s:1357
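For context, the failing step in the log is the open of the sysfs unbind file. Below is a minimal sketch of that kind of operation, not the plugin's actual code: the helper names unbindPath and unbindDevice and the sysfsRoot parameter are made up for illustration. It can be run inside the container to reproduce the permission error outside of the plugin.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// unbindPath returns the sysfs "unbind" file of the driver currently bound
// to the PCI device at addr (e.g. "0000:da:01.0"). sysfsRoot is normally "/sys".
// Hypothetical helper, not taken from the plugin sources.
func unbindPath(sysfsRoot, addr string) string {
	return filepath.Join(sysfsRoot, "bus", "pci", "devices", addr, "driver", "unbind")
}

// unbindDevice opens the unbind file write-only and writes the PCI address
// into it, which asks the kernel to detach the device from its driver.
func unbindDevice(sysfsRoot, addr string) error {
	f, err := os.OpenFile(unbindPath(sysfsRoot, addr), os.O_WRONLY, 0)
	if err != nil {
		return fmt.Errorf("open unbind file: %w", err)
	}
	defer f.Close()

	if _, err := f.WriteString(addr); err != nil {
		return fmt.Errorf("write %q: %w", addr, err)
	}
	return nil
}

func main() {
	// Require an explicit PCI address so nothing is unbound by accident.
	if len(os.Args) != 2 {
		fmt.Println("usage: unbind <pci-address>, e.g. 0000:da:01.0")
		return
	}
	if err := unbindDevice("/sys", os.Args[1]); err != nil {
		fmt.Println("unbind failed:", err)
		return
	}
	fmt.Println("unbound", os.Args[1])
}
```

If the open fails here with "permission denied" in the container but the same write succeeds on the host, the restriction is being applied to the container rather than by sysfs itself.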
- Does master branch work?
  It happens with both v0.17.0 and the master branch (:devel image from DockerHub).
- Are container runtime versions the same when this fails on Ubuntu 18.04 and works with others?
  Yes, Docker version 19.03.9, build 9d988398e7.
- Is this vanilla Kubernetes?
  Yes, installed using Kubespray.
- Device plugin process capabilities. Perhaps the device plugin runs with limited CAPs on Ubuntu 18.04.
  Caps are listed below. I obtained the output from /proc/1/status in a session where the QAT DP process didn't crash because all interfaces were already bound to the vfio-pci driver.
sh-5.0# /bin/cat status
Name: intel_qat_devic
Umask: 0022
State: S (sleeping)
Tgid: 1
Ngid: 0
Pid: 1
PPid: 0
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 64
Groups:
NStgid: 1
NSpid: 1
NSpgid: 1
NSsid: 1
VmPeak: 1214692 kB
VmSize: 1149156 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 13276 kB
VmRSS: 13276 kB
RssAnon: 3848 kB
RssFile: 9428 kB
RssShmem: 0 kB
VmData: 220608 kB
VmStk: 132 kB
VmExe: 4964 kB
VmLib: 1808 kB
VmPTE: 224 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
Threads: 15
SigQ: 0/750305
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: ffffffffffc1feff
CapInh: 00000000a80425fb
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: ffffffff,ffffffff,ffffffff
Cpus_allowed_list: 0-95
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 183
nonvoluntary_ctxt_switches: 4
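The CapEff mask in the dump is easier to read once decoded into names. Here is a small sketch that does this, assuming the bit numbering from linux/capability.h up to CAP_AUDIT_READ (bit 37); decodeCaps is a made-up helper name.

```go
package main

import (
	"fmt"
	"strconv"
)

// capNames is indexed by capability bit number, following the order in
// linux/capability.h (bits 0 through 37).
var capNames = []string{
	"CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
	"CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
	"CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
	"CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
	"CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
	"CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
	"CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
	"CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
	"CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
	"CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
}

// decodeCaps turns a hex mask as printed in /proc/<pid>/status
// (for example the CapEff field) into the names of the set capabilities.
func decodeCaps(hexMask string) ([]string, error) {
	mask, err := strconv.ParseUint(hexMask, 16, 64)
	if err != nil {
		return nil, fmt.Errorf("parse capability mask %q: %w", hexMask, err)
	}
	var names []string
	for bit, name := range capNames {
		if mask&(uint64(1)<<uint(bit)) != 0 {
			names = append(names, name)
		}
	}
	return names, nil
}

func main() {
	// CapEff value copied from the status output above.
	names, err := decodeCaps("00000000a80425fb")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```

For this mask the result is 14 capabilities. CAP_DAC_OVERRIDE is among them and CAP_SYS_ADMIN is not, which as far as I can tell is the Docker default set. Since the process is root with CAP_DAC_OVERRIDE, plain file permissions seem an unlikely cause of the denied open.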
There's no problem when using privileged: true. AppArmor is running on the system, and the QAT device plugin process is in enforce mode, using the "docker-default" profile.
Had a look here:
https://github.com/moby/moby/blob/master/profiles/apparmor/template.go#L37
and it seems like write operations under most of /sys (everything outside /sys/fs/cgroup, as far as I can tell) are denied by default if AppArmor is running. I'm not sure what changes in privileged mode or how to make this work with AppArmor.
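To check whether the unbind path is covered by the template, here is a rough sketch. It assumes the profile contains a rule of the form `deny /sys/[^f]*/** wklx,` (my reading of the linked template, please verify against the file) and approximates that AppArmor glob with a regular expression. The name deniedWrite is made up, and the real AppArmor matching may differ in details such as symlink resolution.

```go
package main

import (
	"fmt"
	"regexp"
)

// denySysNonF approximates the assumed AppArmor rule
//   deny /sys/[^f]*/** wklx,
// "[^f]" becomes one character that is neither 'f' nor '/', "*" becomes
// any run of non-slash characters, and "/**" becomes a slash followed by
// at least one character of any kind.
var denySysNonF = regexp.MustCompile(`^/sys/[^f/][^/]*/.+$`)

// deniedWrite reports whether a write to path would match the
// approximated deny rule.
func deniedWrite(path string) bool {
	return denySysNonF.MatchString(path)
}

func main() {
	paths := []string{
		"/sys/bus/pci/devices/0000:da:01.0/driver/unbind", // path from the log
		"/sys/fs/cgroup/cpu/tasks",                        // starts with 'f', not matched by this rule
	}
	for _, p := range paths {
		fmt.Printf("%s denied=%v\n", p, deniedWrite(p))
	}
}
```

Under this approximation the unbind path from the log matches the deny rule, which would be consistent with the "permission denied" on open while the profile is enforced.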
Any help appreciated.