Resource Topology Exporter with podresource API
 The kubelet exposes a gRPC endpoint at `/var/lib/kubelet/pod-resources/kubelet.sock` that reports which
 devices are assigned to which containers. It obtains this information from the internal state of the kubelet's
 Device Manager and returns a single PodResourcesResponse, so monitoring applications can poll for the resources
 allocated to pods and containers on the node. This makes the PodResource API a reasonable way of obtaining
 allocated resource information.

 However, the PodResource API (https://godoc.org/k8s.io/kubernetes/pkg/kubelet/apis/podresources/v1alpha1)
 currently exposes only devices as container resources, without topology information. We are therefore
 proposing a KEP (kubernetes/enhancements#1884) to extend it to expose CPU information along with
 device topology information.

 To use the pod-resource-api source in Resource Topology Exporter, we need a patched version of the kubelet
 (https://github.com/kubernetes/kubernetes/pull/93243/files) that implements the changes proposed in the
 aforementioned KEP. This will no longer be needed once the KEP and the PR are merged.

 - Added a command line argument to select the source, so the user can specify either cri or pod-resource-api
 - Created the PodResourceFinder struct and supporting methods to enable pod-resource-api as a way
   of gathering information about allocated resources
 - Created NodeResources struct to be used for storing node resource information
 - Moved functions (updateNUMAMap(), Aggregate() and makePCI2ResourceMap()) used by both crifinder.go and
   podresourcefinder.go to finder.go
 - Narrowed down volume mounts to the required subtree
 - Updated README
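
 The source selection added in this commit treats the CRI flags and the pod-resource-api flags as mutually exclusive. The following is a minimal sketch of that rule using only the standard library. The `options` struct and `validate` function are hypothetical names introduced for illustration, an empty string stands for "flag not given", and the error messages are not the ones used in main.go.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	criSource         = "cri"
	podResourceSource = "pod-resource-api"
)

// options is a hypothetical, trimmed-down stand-in for the parsed flags.
// An empty string means the flag was not given on the command line.
type options struct {
	Source            string
	CRISocket         string
	PodResourceSocket string
	ContainerRuntime  string
}

// validate rejects flags that do not belong to the selected source.
func validate(o options) error {
	switch o.Source {
	case podResourceSource:
		if o.CRISocket != "" || o.ContainerRuntime != "" {
			return errors.New("--cri-socket and --container-runtime do not apply to the pod-resource-api source")
		}
	case criSource:
		if o.PodResourceSocket != "" {
			return errors.New("--podresources-socket does not apply to the cri source")
		}
		if o.ContainerRuntime != "containerd" && o.ContainerRuntime != "cri-o" {
			return fmt.Errorf("invalid --container-runtime %q", o.ContainerRuntime)
		}
	default:
		return fmt.Errorf("invalid --source %q", o.Source)
	}
	return nil
}

func main() {
	cases := []options{
		{Source: podResourceSource, PodResourceSocket: "/host-var/lib/kubelet/pod-resources/kubelet.sock"},
		{Source: podResourceSource, CRISocket: "/host-run/containerd/containerd.sock"},
		{Source: criSource, ContainerRuntime: "containerd"},
		{Source: "sysfs"},
	}
	for _, o := range cases {
		fmt.Printf("source=%q -> %v\n", o.Source, validate(o))
	}
}
```

 Checking the combination in one place keeps the rule easy to test in isolation from the flag parser.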

Signed-off-by: Swati Sehgal <[email protected]>
swatisehgal committed Aug 12, 2020
1 parent 7055146 commit 3eae82b
Showing 10 changed files with 923 additions and 510 deletions.
10 changes: 9 additions & 1 deletion README.md
@@ -34,8 +34,16 @@ type NUMANodeResource struct {
Resources v1.ResourceList
}
```
## Design based on Pod Resource API
The kubelet exposes a gRPC endpoint at `/var/lib/kubelet/pod-resources/kubelet.sock` that reports which devices are assigned to which containers. It obtains this information from the internal state of the kubelet's Device Manager and returns a single PodResourcesResponse, so monitoring applications can poll for the resources allocated to pods and containers on the node. This makes the PodResource API a reasonable way of obtaining allocated resource information.
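
As a rough illustration of what a monitoring application does with such a response, the sketch below walks a pod → container → device hierarchy and counts allocated device IDs per resource name. The types are simplified local stand-ins modelled on the v1alpha1 messages, not the generated gRPC types, and the sample pod, container, and device IDs are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// Simplified stand-ins for the v1alpha1 podresources messages (assumption:
// only the fields needed for this walk are modelled here).
type ContainerDevices struct {
	ResourceName string
	DeviceIds    []string
}

type ContainerResources struct {
	Name    string
	Devices []ContainerDevices
}

type PodResources struct {
	Name       string
	Namespace  string
	Containers []ContainerResources
}

// countDevices tallies allocated device IDs per resource name across all pods.
func countDevices(pods []PodResources) map[string]int {
	counts := make(map[string]int)
	for _, pod := range pods {
		for _, cnt := range pod.Containers {
			for _, dev := range cnt.Devices {
				counts[dev.ResourceName] += len(dev.DeviceIds)
			}
		}
	}
	return counts
}

func main() {
	// Made-up sample data standing in for one polled response.
	pods := []PodResources{{
		Name:      "test-sriov-pod",
		Namespace: "rte",
		Containers: []ContainerResources{{
			Name: "app",
			Devices: []ContainerDevices{{
				ResourceName: "openshift.io/sriov",
				DeviceIds:    []string{"0000:05:02.0", "0000:05:02.1", "0000:05:02.2"},
			}},
		}},
	}}

	counts := countDevices(pods)
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	sort.Strings(names) // stable output order
	for _, name := range names {
		fmt.Printf("%s: %d\n", name, counts[name])
	}
}
```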

However, the [PodResource API](https://godoc.org/k8s.io/kubernetes/pkg/kubelet/apis/podresources/v1alpha1) currently exposes only devices as container resources, without topology information. We are therefore proposing a [KEP](https://github.com/kubernetes/enhancements/pull/1884) to extend it to expose CPU information along with device topology information.
To use the pod-resource-api source in Resource Topology Exporter, you will need a [patched version of the kubelet](https://github.com/kubernetes/kubernetes/pull/93243/files) that implements the changes proposed in the aforementioned KEP. This will no longer be needed once the KEP and the PR are merged.

Furthermore, a second [KEP](https://github.com/kubernetes/enhancements/pull/1926) proposes adding a Watch() endpoint to the PodResource API, so that monitoring applications are notified when resources are allocated, released, or updated. This would let Resource Topology Exporter become event-based instead of relying on its current polling mechanism.

## Design based on CRI
This daemon gathers resource information using the Container Runtime interface.
This daemon can also gather resource information using the Container Runtime interface.


The containerStatusResponse returned as a response to the ContainerStatus rpc contains `Info` field which is used by the container runtime for capturing ContainerInfo.
6 changes: 3 additions & 3 deletions go.mod
@@ -8,13 +8,12 @@ require (
github.com/evanphx/json-patch v4.5.0+incompatible // indirect
github.com/fromanirh/numalign v0.0.2
github.com/ghodss/yaml v1.0.0
github.com/googleapis/gnostic v0.3.1 // indirect
github.com/hashicorp/golang-lru v0.5.4 // indirect
github.com/intel/sriov-network-device-plugin v0.0.0-20200603101849-e116e9c7d0b8
github.com/json-iterator/go v1.1.10 // indirect
github.com/onsi/ginkgo v1.12.1
github.com/onsi/gomega v1.10.1
github.com/opencontainers/runtime-spec v1.0.0
github.com/opencontainers/runtime-spec v1.0.3-0.20200520003142-237cc4f519e2
github.com/swatisehgal/topologyapi v0.0.0-20200802230855-6f9c5ac0d357
golang.org/x/text v0.3.3 // indirect
google.golang.org/grpc v1.28.1
@@ -29,6 +28,7 @@ require (

// Pinned to kubernetes-1.18.6
replace (
github.com/googleapis/gnostic => github.com/googleapis/gnostic v0.4.0
k8s.io/api => k8s.io/api v0.18.6
k8s.io/apiextensions-apiserver => k8s.io/apiextensions-apiserver v0.18.6
k8s.io/apimachinery => k8s.io/apimachinery v0.18.6
@@ -47,7 +47,7 @@ replace (
k8s.io/kube-scheduler => k8s.io/kube-scheduler v0.18.6
k8s.io/kubectl => k8s.io/kubectl v0.18.6
k8s.io/kubelet => k8s.io/kubelet v0.18.6
k8s.io/kubernetes => k8s.io/kubernetes v1.18.6
k8s.io/kubernetes => ../../../k8s.io/kubernetes
k8s.io/legacy-cloud-providers => k8s.io/legacy-cloud-providers v0.18.6
k8s.io/metrics => k8s.io/metrics v0.18.6
k8s.io/sample-apiserver => k8s.io/sample-apiserver v0.18.6
191 changes: 191 additions & 0 deletions go.sum

Large diffs are not rendered by default.

80 changes: 62 additions & 18 deletions main.go
@@ -16,7 +16,11 @@ import (

const (
// ProgramName is the canonical name of this program
ProgramName = "resource-topology-exporter"
ProgramName = "resource-topology-exporter"
ContainerdRuntime = "containerd"
CRIORuntime = "cri-o"
CRISource = "cri"
PodResourceSource = "pod-resource-api"
)

func main() {
@@ -28,7 +32,6 @@ func main() {
if args.SRIOVConfigFile == "" {
log.Fatalf("missing SRIOV device plugin configuration file path")
}

klConfig, err := kubeconf.GetKubeletConfigFromLocalFile(args.KubeletConfigFile)
if err != nil {
log.Fatalf("error getting topology Manager Policy: %v", err)
@@ -44,7 +47,13 @@ func main() {
}

// Get new finder instance
instance, err := finder.NewFinder(args, pci2ResMap)
var finderInstance finder.Finder
if args.Source == CRISource {
finderInstance, err = finder.NewCRIFinder(args, pci2ResMap)
} else {
//args.Source == PodResourceSource
finderInstance, err = finder.NewPodResourceFinder(args, pci2ResMap)
}
if err != nil {
log.Fatalf("Failed to initialize Finder instance: %v", err)
}
@@ -54,14 +63,19 @@ func main() {
log.Fatalf("Failed to initialize crdExporter instance: %v", err)
}

// CAUTION: these resources are expected to change rarely, if ever, so we intentionally do this only once during the process lifecycle.
nodeResourceData, err := finder.NewNodeResources(args.SysfsRoot, pci2ResMap)
if err != nil {
log.Fatalf("Failed to obtain node resource information: %v", err)
}
for {
podResources, err := instance.Scan()
podResources, err := finderInstance.Scan(nodeResourceData.GetPCI2ResourceMap())
if err != nil {
log.Printf("CRI scan failed: %v\n", err)
log.Printf("Scan failed: %v\n", err)
continue
}

perNumaResources := instance.Aggregate(podResources)
perNumaResources := finder.Aggregate(podResources, nodeResourceData)
log.Printf("allocatedResourcesNumaInfo:%v", spew.Sdump(perNumaResources))

if err = crdExporter.CreateOrUpdate("default", perNumaResources); err != nil {
@@ -76,29 +90,29 @@ func main() {
// The argument argv is passed only for testing purposes.
func argsParse(argv []string) (finder.Args, error) {
args := finder.Args{
ContainerRuntime: "containerd",
CRIEndpointPath: "/host-run/containerd/containerd.sock",
Source: "pod-resource-api",
SleepInterval: time.Duration(3 * time.Second),
SysfsRoot: "/host-sys",
SRIOVConfigFile: "/etc/sriov-config/config.json",
KubeletConfigFile: "/host-etc/kubernetes/kubelet.conf",
}
usage := fmt.Sprintf(`Usage:
%s [--sleep-interval=<seconds>] [--cri-path=<path>] [--watch-namespace=<namespace>] [--sysfs=<mountpoint>] [--sriov-config-file=<path>] [--container-runtime=<runtime>] [--kubelet-config-file=<path>]
%s [--sleep-interval=<seconds>] [--source=<path>] [--container-runtime=<runtime>] [--cri-socket=<path>] [--podresources-socket=<path>] [--watch-namespace=<namespace>] [--sysfs=<mountpoint>] [--sriov-config-file=<path>] [--kubelet-config-file=<path>]
%s -h | --help
Options:
-h --help Show this screen.
--container-runtime=<runtime> Container Runtime to be used (containerd|cri-o). [Default: %v]
--cri-path=<path> CRI Endpoint file path to use. [Default: %v]
--source=<source> Evaluation source to be used (pod-resource-api|cri). [Default: %v]
--container-runtime=<runtime> Container Runtime to be used (containerd|cri-o).
--cri-socket=<path> CRI Socket path to use.
--podresources-socket=<path> Pod Resource Socket path to use.
--sleep-interval=<seconds> Time to sleep between updates. [Default: %v]
--watch-namespace=<namespace> Namespace to watch pods for. Use "" for all namespaces.
--sysfs=<mountpoint> Mount point of the sysfs. [Default: %v]
--sriov-config-file=<path> SRIOV device plugin config file path. [Default: %v]
--kubelet-config-file=<path> Kubelet config file path. [Default: %v]`,
ProgramName,
ProgramName,
args.ContainerRuntime,
args.CRIEndpointPath,
args.Source,
args.SleepInterval,
args.SysfsRoot,
args.SRIOVConfigFile,
@@ -118,12 +132,42 @@ func argsParse(argv []string) (finder.Args, error) {
args.KubeletConfigFile = kubeletConfigPath
}
args.SysfsRoot = arguments["--sysfs"].(string)
runtime := arguments["--container-runtime"].(string)
if !(runtime == "containerd" || runtime == "cri-o") {
return args, fmt.Errorf("invalid --container-runtime specified")

if source, ok := arguments["--source"].(string); ok {
args.Source = source
}
if args.Source != PodResourceSource && args.Source != CRISource {
return args, fmt.Errorf("invalid --source specified")

} else if args.Source == PodResourceSource {
//podresource source
if path, ok := arguments["--podresources-socket"].(string); ok {
args.PodResourceSocketPath = path
}
//return error in case cri-socket path is specified in case of pod-resource-socket source
if _, ok := arguments["--cri-socket"].(string); ok {
return args, fmt.Errorf("No need to specify CRI socket path in case pod-resource-api is specified as the source")
}
//return error in case container-runtime is specified in case of pod-resource-socket source
if _, ok := arguments["--container-runtime"].(string); ok {
return args, fmt.Errorf("No need to specify container runtime in case pod-resource-api is specified as the source")
}
} else {
//cri source
if path, ok := arguments["--cri-socket"].(string); ok {
args.CRISocketPath = path
}
runtime := arguments["--container-runtime"].(string)
if runtime != ContainerdRuntime && runtime != CRIORuntime {
return args, fmt.Errorf("invalid --container-runtime specified")
}
args.ContainerRuntime = runtime
//return error in case pod-resource-socket path is specified in case of cri source
if _, ok := arguments["--podresources-socket"].(string); ok {
return args, fmt.Errorf("No need to specify Pod Resource socket path in case CRI is specified as the source")
}
}
args.ContainerRuntime = runtime
args.CRIEndpointPath = arguments["--cri-path"].(string)

args.SleepInterval, err = time.ParseDuration(arguments["--sleep-interval"].(string))
if err != nil {
return args, fmt.Errorf("invalid --sleep-interval specified: %s", err.Error())
19 changes: 11 additions & 8 deletions manifests/resource-topology-exporter-ds.yaml
@@ -53,22 +53,22 @@ spec:
containers:
- name: resource-topology-exporter-container
image: quay.io/swsehgal/resource-topology-exporter:latest
# command:
# - /usr/local/bin/resource-topology-exporter
# - --sleep-interval=1s
command:
- sleep
args:
- "1000000"
- /usr/local/bin/resource-topology-exporter
- --watch-namespace=rte
- --source=pod-resource-api
- --podresources-socket=/host-var/lib/kubelet/pod-resources/kubelet.sock
volumeMounts:
- name: host-sys
mountPath: "/host-sys"
- name: host-run
mountPath: "/host-run"
- name: host-etc
mountPath: "/host-etc"
mountPath: "/host-etc/kubernetes"
- mountPath: /etc/sriov-config
name: my-sriov-config-vol
- name: host-podresources
mountPath: "/host-var/lib/kubelet/pod-resources"
volumes:
- name: host-sys
hostPath:
@@ -78,7 +78,10 @@ spec:
path: "/run"
- name: host-etc
hostPath:
path: "/etc"
path: "/etc/kubernetes"
- configMap:
name: sriovdp-config
name: my-sriov-config-vol
- name: host-podresources
hostPath:
path: "/var/lib/kubelet/pod-resources"
2 changes: 1 addition & 1 deletion manifests/test-sriov-pod-3.yaml
@@ -15,7 +15,7 @@ spec:
resources:
requests:
openshift.io/sriov: 3
cpu: 3
cpu: 3
memory: 200Mi
limits:
openshift.io/sriov: 3