qat: permission denied error when trying to unbind device, results in crash  #381

@przemeklal

Hi,

So far, this error happens only on Ubuntu 18.04. Unbinding and binding the device manually as root on the host works without errors. With privileged: true added to the securityContext in the pod spec, the permission error goes away and unbinding works fine.

OS: Ubuntu 18.04.4 LTS
QAT device model: 8086:37c8
Kernel: 4.15.0-101-generic

Logs:

Device scan failed: open /sys/bus/pci/devices/0000:da:01.0/driver/unbind: permission denied
Unbinding from kernel driver failed for the device da:01.0
github.com/intel/intel-device-plugins-for-kubernetes/cmd/qat_plugin/dpdkdrv.(*DevicePlugin).bindDevice
        /intel-device-plugins-for-kubernetes/cmd/qat_plugin/dpdkdrv/dpdkdrv.go:208
github.com/intel/intel-device-plugins-for-kubernetes/cmd/qat_plugin/dpdkdrv.(*DevicePlugin).scan
        /intel-device-plugins-for-kubernetes/cmd/qat_plugin/dpdkdrv/dpdkdrv.go:292
github.com/intel/intel-device-plugins-for-kubernetes/cmd/qat_plugin/dpdkdrv.(*DevicePlugin).Scan
        /intel-device-plugins-for-kubernetes/cmd/qat_plugin/dpdkdrv/dpdkdrv.go:86
github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin.(*Manager).Run.func1
        /intel-device-plugins-for-kubernetes/pkg/deviceplugin/manager.go:96
runtime.goexit
        /usr/lib/golang/src/runtime/asm_amd64.s:1357
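
For context, the failing step is a plain write to the device's sysfs unbind file. Below is a minimal sketch of that operation, assuming the standard sysfs layout shown in the log; the helper names (unbindPath, unbind) are hypothetical and this is not the plugin's actual code from dpdkdrv.go. Run inside the container, it should show whether the open itself is refused with a permission error.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// unbindPath builds the sysfs path of the "unbind" file for a PCI device,
// e.g. /sys/bus/pci/devices/0000:da:01.0/driver/unbind.
// (Hypothetical helper, for illustration only.)
func unbindPath(sysfsRoot, bdf string) string {
	return filepath.Join(sysfsRoot, "bus/pci/devices", bdf, "driver/unbind")
}

// unbind detaches the device from its current kernel driver by writing
// the device address into the driver's unbind file.
func unbind(sysfsRoot, bdf string) error {
	f, err := os.OpenFile(unbindPath(sysfsRoot, bdf), os.O_WRONLY, 0)
	if err != nil {
		return fmt.Errorf("open unbind file: %w", err)
	}
	defer f.Close()

	if _, err := f.WriteString(bdf); err != nil {
		return fmt.Errorf("write %s: %w", bdf, err)
	}
	return nil
}

func main() {
	// Require an explicit device address so nothing is unbound by accident.
	if len(os.Args) != 2 {
		fmt.Println("usage: unbind <domain:bus:device.function>")
		return
	}
	err := unbind("/sys", os.Args[1])
	switch {
	case err == nil:
		fmt.Println("unbound", os.Args[1])
	case errors.Is(err, os.ErrPermission):
		// This is the case reported in the log above.
		fmt.Println("permission denied:", err)
	default:
		fmt.Println("unbind failed:", err)
	}
}
```

If the same binary succeeds as root on the host but fails in the container, the difference is in the container's confinement rather than in file ownership.
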
  • Does master branch work?
    Happens both with v0.17.0 and master branch (:devel image from DockerHub)
  • Are the container runtime versions the same where this fails (Ubuntu 18.04) and where it works?
    Yes, Docker version 19.03.9, build 9d988398e7
  • Is this vanilla Kubernetes?
    Yes, installed using Kubespray.
  • What are the device plugin process capabilities? Perhaps the device plugin runs with limited CAPs on Ubuntu 18.04.
    The capabilities are listed below. I took the output from /proc/1/status in a session where the QAT DP process didn't crash because all interfaces were already bound to the vfio-pci driver.
sh-5.0# /bin/cat status
Name:   intel_qat_devic
Umask:  0022
State:  S (sleeping)
Tgid:   1
Ngid:   0
Pid:    1
PPid:   0
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 64
Groups:
NStgid: 1
NSpid:  1
NSpgid: 1
NSsid:  1
VmPeak:  1214692 kB
VmSize:  1149156 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:     13276 kB
VmRSS:     13276 kB
RssAnon:            3848 kB
RssFile:            9428 kB
RssShmem:              0 kB
VmData:   220608 kB
VmStk:       132 kB
VmExe:      4964 kB
VmLib:      1808 kB
VmPTE:       224 kB
VmSwap:        0 kB
HugetlbPages:          0 kB
CoreDumping:    0
Threads:        15
SigQ:   0/750305
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: ffffffffffc1feff
CapInh: 00000000a80425fb
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
NoNewPrivs:     0
Seccomp:        0
Speculation_Store_Bypass:       thread vulnerable
Cpus_allowed:   ffffffff,ffffffff,ffffffff
Cpus_allowed_list:      0-95
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list:      0-1
voluntary_ctxt_switches:        183
nonvoluntary_ctxt_switches:     4
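
The CapEff value above is a hex bitmask, so it is easier to read once decoded. A minimal sketch, assuming the capability bit numbers from linux/capability.h (bits 0 to 37); decodeCaps is a hypothetical helper, not part of the plugin.

```go
package main

import (
	"fmt"
	"strconv"
)

// capNames maps capability bit numbers to names, following the order in
// linux/capability.h (bit 0 = CAP_CHOWN ... bit 37 = CAP_AUDIT_READ).
var capNames = []string{
	"CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
	"CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
	"CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
	"CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
	"CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
	"CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
	"CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
	"CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
	"CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
	"CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
}

// decodeCaps turns a hex mask from /proc/<pid>/status (CapEff, CapPrm, ...)
// into the list of capability names whose bits are set, in bit order.
func decodeCaps(hexMask string) ([]string, error) {
	mask, err := strconv.ParseUint(hexMask, 16, 64)
	if err != nil {
		return nil, fmt.Errorf("parse capability mask %q: %w", hexMask, err)
	}
	var caps []string
	for bit := 0; bit < 64; bit++ {
		if mask&(1<<uint(bit)) == 0 {
			continue
		}
		if bit < len(capNames) {
			caps = append(caps, capNames[bit])
		} else {
			// Bit not in the table above: report it by number.
			caps = append(caps, fmt.Sprintf("CAP_%d", bit))
		}
	}
	return caps, nil
}

func main() {
	caps, err := decodeCaps("00000000a80425fb") // CapEff from the output above
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, c := range caps {
		fmt.Println(c)
	}
}
```

For this mask the result is 14 capabilities, which matches Docker's default capability set, and CAP_SYS_ADMIN is not among them. Note that the failing call is an open for writing by root, and CAP_DAC_OVERRIDE is present, so the capability set alone does not obviously explain the denial.
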

There's no problem when using privileged: true. AppArmor is running on the system, and the QAT device plugin process is confined in enforce mode under the "docker-default" profile.
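
The confinement can also be checked from inside the container. A minimal sketch, assuming AppArmor is the active LSM so that /proc/self/attr/current holds a label such as "docker-default (enforce)" or "unconfined"; parseAppArmorLabel is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseAppArmorLabel splits a label like "docker-default (enforce)" into
// the profile name and the mode. A label without a parenthesised mode,
// such as "unconfined", is returned with an empty mode.
func parseAppArmorLabel(raw string) (profile, mode string) {
	label := strings.Trim(raw, "\x00 \t\n")
	if i := strings.LastIndex(label, " ("); i >= 0 && strings.HasSuffix(label, ")") {
		return label[:i], label[i+2 : len(label)-1]
	}
	return label, ""
}

func main() {
	// Assumption: AppArmor is the active LSM. With another LSM (for
	// example SELinux) this file has a different meaning and format.
	raw, err := os.ReadFile("/proc/self/attr/current")
	if err != nil {
		fmt.Println("cannot read LSM label:", err)
		return
	}
	profile, mode := parseAppArmorLabel(string(raw))
	fmt.Printf("profile=%q mode=%q\n", profile, mode)
}
```

Comparing the output between the default pod and the privileged: true pod should show whether the profile is the only difference between the failing and the working case.
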

Had a look here:
https://github.com/moby/moby/blob/master/profiles/apparmor/template.go#L37
and it seems like write operations under most of /sys (everything outside /sys/fs/cgroup, so /sys/bus is included) are denied by default when AppArmor is running. I'm not sure what changes in privileged mode or how to make this work with AppArmor.

Any help appreciated.
