Add OpenShift deployment support
- Add new RBAC roles & clusterroles for all stages
- Add OCP specific artifacts
- Updated examples
- Fixed some file permissions
- Make SCC objects conditional on OCP
- Add OSName fields back to state_shared_dp

Signed-off-by: Sebastian Jug <[email protected]>
sjug committed Mar 9, 2021
1 parent 9735c36 commit 40e3a76
Showing 26 changed files with 374 additions and 22 deletions.
13 changes: 13 additions & 0 deletions deploy/role.yaml
@@ -174,6 +174,13 @@ rules:
- get
- list
- watch
+- apiGroups:
+- rbac.authorization.k8s.io
+resources:
+- roles
+- rolebindings
+verbs:
+- '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
@@ -270,3 +277,9 @@ rules:
- update
- patch
- delete
+- apiGroups:
+- security.openshift.io
+resources:
+- securitycontextconstraints
+verbs:
+- '*'
4 changes: 2 additions & 2 deletions example/README.md
@@ -71,12 +71,12 @@ tools to test RDMA and GPU-Direct RDMA traffic.
##### RDMA
__Pod1:__ Run `ib_write_bw` as server
```bash
-# ib_write_bw -d <RDMA device e.g mlx5_0> -a -F --report_gbits -R -q 2
+# ib_write_bw -d <RDMA device e.g mlx5_0> -a -F --report_gbits -R
```

__Pod2:__ Run `ib_write_bw` as client
```bash
-# ib_write_bw -d <RDMA device e.g mlx5_0> -a -F --report_gbits -R -q 2 <Pod1 IP address>
+# ib_write_bw -d <RDMA device e.g mlx5_0> -a -F --report_gbits -R <Pod1 IP address>
```

##### GPU-Direct RDMA
45 changes: 45 additions & 0 deletions example/crs/mellanox.com_v1alpha1_nicclusterpolicy_cr-ocp.yaml
@@ -0,0 +1,45 @@
# Copyright 2020 NVIDIA
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: mofed
    repository: mellanox
    version: 5.2-1.0.4.0
  devicePlugin:
    image: k8s-rdma-shared-dev-plugin
    repository: mellanox
    version: v1.1.0
    # The config below directly propagates to k8s-rdma-shared-device-plugin configuration.
    # Replace 'devices' with your (RDMA capable) netdevice name.
    config: |
      {
        "configList": [
          {
            "resourceName": "hca_shared_devices_a",
            "rdmaHcaMax": 1000,
            "selectors": {
              "ifNames": ["ens2f0"]
            }
          }
        ]
      }
  nvPeerDriver:
    image: nv-peer-mem-driver
    repository: mellanox
    version: 1.0-9
    gpuDriverSourcePath: /run/nvidia/driver
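
The `config` field in the CR above is an opaque JSON string, so a typo in it is easy to miss until the device plugin starts. The following is a minimal sketch of checking that string before applying the CR. The struct types and the `parseConfig` helper are hypothetical and are not the device plugin's own types; only the JSON keys (`configList`, `resourceName`, `rdmaHcaMax`, `selectors`, `ifNames`) come from the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Hypothetical types mirroring only the keys used in the example CR.
type selectors struct {
	IfNames []string `json:"ifNames"`
}

type resourceConfig struct {
	ResourceName string    `json:"resourceName"`
	RdmaHcaMax   int       `json:"rdmaHcaMax"`
	Selectors    selectors `json:"selectors"`
}

type pluginConfig struct {
	ConfigList []resourceConfig `json:"configList"`
}

// exampleConfig is the JSON embedded in the example CR's devicePlugin.config.
const exampleConfig = `{
  "configList": [
    {
      "resourceName": "hca_shared_devices_a",
      "rdmaHcaMax": 1000,
      "selectors": {
        "ifNames": ["ens2f0"]
      }
    }
  ]
}`

// parseConfig decodes the JSON and rejects an empty configList.
func parseConfig(raw string) (pluginConfig, error) {
	var cfg pluginConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return pluginConfig{}, err
	}
	if len(cfg.ConfigList) == 0 {
		return pluginConfig{}, errors.New("configList is empty")
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig(exampleConfig)
	if err != nil {
		panic(err)
	}
	for _, rc := range cfg.ConfigList {
		fmt.Printf("%s: max %d, interfaces %v\n", rc.ResourceName, rc.RdmaHcaMax, rc.Selectors.IfNames)
	}
}
```

The check is deliberately shallow: it confirms the JSON parses and names at least one resource, nothing more.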
14 changes: 7 additions & 7 deletions example/rdma-gpu-test-pod1.yml
@@ -5,16 +5,16 @@ metadata:
annotations:
k8s.v1.cni.cncf.io/networks: rdma-net-ipam
# If a network with static IPAM is used replace network annotation with the below.
-#k8s.v1.cni.cncf.io/networks: '[
-# { "name": "rmda-net",
-# "ips": ["192.168.111.101/24"],
-# "gateway": ["192.168.111.1"]
-# }
-#]'
+# k8s.v1.cni.cncf.io/networks: '[
+# { "name": "rdma-net",
+# "ips": ["192.168.111.101/24"],
+# "gateway": ["192.168.111.1"]
+# }
+# ]'
spec:
nodeSelector:
# Note: Replace hostname or remove selector altogether
-kubernetes.io/hostname: ubuntu
+kubernetes.io/hostname: worker01
restartPolicy: OnFailure
containers:
- image: mellanox/cuda-perftest
6 changes: 3 additions & 3 deletions example/rdma-gpu-test-pod2.yml
@@ -6,15 +6,15 @@ metadata:
k8s.v1.cni.cncf.io/networks: rdma-net-ipam
# If a network with static IPAM is used replace network annotation with the below.
#k8s.v1.cni.cncf.io/networks: '[
-# { "name": "rmda-net",
-# "ips": ["192.168.111.101/24"],
+# { "name": "rdma-net",
+# "ips": ["192.168.111.102/24"],
# "gateway": ["192.168.111.1"]
# }
#]'
spec:
nodeSelector:
# Note: Replace hostname or remove selector altogether
-kubernetes.io/hostname: ubuntu00
+kubernetes.io/hostname: worker02
restartPolicy: OnFailure
containers:
- image: mellanox/cuda-perftest
4 changes: 2 additions & 2 deletions example/rdma-test-pod1.yml
@@ -6,15 +6,15 @@ metadata:
k8s.v1.cni.cncf.io/networks: rdma-net-ipam
# If a network with static IPAM is used replace network annotation with the below.
#k8s.v1.cni.cncf.io/networks: '[
-# { "name": "rmda-net",
+# { "name": "rdma-net",
# "ips": ["192.168.111.101/24"],
# "gateway": ["192.168.111.1"]
# }
#]'
spec:
nodeSelector:
# Note: Replace hostname or remove selector altogether
-kubernetes.io/hostname: ubuntu
+kubernetes.io/hostname: worker01
restartPolicy: OnFailure
containers:
- image: mellanox/rping-test
6 changes: 3 additions & 3 deletions example/rdma-test-pod2.yml
@@ -6,15 +6,15 @@ metadata:
k8s.v1.cni.cncf.io/networks: rdma-net-ipam
# If a network with static IPAM is used replace network annotation with the below.
#k8s.v1.cni.cncf.io/networks: '[
-# { "name": "rmda-net",
-# "ips": ["192.168.111.101/24"],
+# { "name": "rdma-net",
+# "ips": ["192.168.111.102/24"],
# "gateway": ["192.168.111.1"]
# }
#]'
spec:
nodeSelector:
# Note: Replace hostname or remove selector altogether
-kubernetes.io/hostname: ubuntu00
+kubernetes.io/hostname: worker02
restartPolicy: OnFailure
containers:
- image: mellanox/rping-test
5 changes: 5 additions & 0 deletions manifests/stage-nv-peer-mem-driver/0010_service-account.yaml
@@ -0,0 +1,5 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nv-peer-mem-driver
  namespace: {{ .RuntimeSpec.Namespace }}
14 changes: 14 additions & 0 deletions manifests/stage-nv-peer-mem-driver/0020_role.yaml
@@ -0,0 +1,14 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nv-peer-mem-driver
  namespace: {{ .RuntimeSpec.Namespace }}
rules:
- apiGroups:
  - security.openshift.io
  resources:
  - securitycontextconstraints
  verbs:
  - use
  resourceNames:
  - privileged
16 changes: 16 additions & 0 deletions manifests/stage-nv-peer-mem-driver/0030_rolebinding.yaml
@@ -0,0 +1,16 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nv-peer-mem-driver
  namespace: {{ .RuntimeSpec.Namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nv-peer-mem-driver
  namespace: {{ .RuntimeSpec.Namespace }}
subjects:
- kind: ServiceAccount
  name: nv-peer-mem-driver
  namespace: {{ .RuntimeSpec.Namespace }}
userNames:
- system:serviceaccount:{{ .RuntimeSpec.Namespace }}:nv-peer-mem-driver
49 changes: 49 additions & 0 deletions manifests/stage-nv-peer-mem-driver/0040_scc.openshift.yaml
@@ -0,0 +1,49 @@
{{if eq .RuntimeSpec.OSName "rhcos"}}
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: false
allowHostPID: true
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: true
allowedCapabilities:
- '*'
allowedUnsafeSysctls:
- '*'
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups:
- system:cluster-admins
- system:nodes
- system:masters
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: 'privileged allows access to all privileged and host
      features and the ability to run as any user, any group, any fsGroup, and with
      any SELinux context. WARNING: this is the most relaxed SCC and should be used
      only for cluster administration. Grant with caution.'

  name: nv-peer-mem-driver
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities: null
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
seccompProfiles:
- '*'
supplementalGroups:
  type: RunAsAny
users:
- system:serviceaccount:{{ .RuntimeSpec.Namespace }}:nv-peer-mem-driver
volumes:
- '*'
{{end}}
@@ -43,6 +43,7 @@ spec:
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
+serviceAccountName: nv-peer-mem-driver
hostNetwork: true
containers:
- image: {{ .CrSpec.Repository }}/{{ .CrSpec.Image }}-{{ .CrSpec.Version }}:{{ .RuntimeSpec.CPUArch }}-{{ .RuntimeSpec.OSName }}{{ .RuntimeSpec.OSVer }}
5 changes: 5 additions & 0 deletions manifests/stage-ofed-driver/0010_service-account.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ofed-driver
  namespace: {{ .RuntimeSpec.Namespace }}
14 changes: 14 additions & 0 deletions manifests/stage-ofed-driver/0020_role.yaml
@@ -0,0 +1,14 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ofed-driver
  namespace: {{ .RuntimeSpec.Namespace }}
rules:
- apiGroups:
  - security.openshift.io
  resources:
  - securitycontextconstraints
  verbs:
  - use
  resourceNames:
  - privileged
16 changes: 16 additions & 0 deletions manifests/stage-ofed-driver/0030_rolebinding.yaml
@@ -0,0 +1,16 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ofed-driver
  namespace: {{ .RuntimeSpec.Namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ofed-driver
  namespace: {{ .RuntimeSpec.Namespace }}
subjects:
- kind: ServiceAccount
  name: ofed-driver
  namespace: {{ .RuntimeSpec.Namespace }}
userNames:
- system:serviceaccount:{{ .RuntimeSpec.Namespace }}:ofed-driver
49 changes: 49 additions & 0 deletions manifests/stage-ofed-driver/0040_scc.openshift.yaml
@@ -0,0 +1,49 @@
{{if eq .RuntimeSpec.OSName "rhcos"}}
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: false
allowHostPID: true
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: true
allowedCapabilities:
- '*'
allowedUnsafeSysctls:
- '*'
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups:
- system:cluster-admins
- system:nodes
- system:masters
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: 'privileged allows access to all privileged and host
      features and the ability to run as any user, any group, any fsGroup, and with
      any SELinux context. WARNING: this is the most relaxed SCC and should be used
      only for cluster administration. Grant with caution.'

  name: ofed-driver
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities: null
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
seccompProfiles:
- '*'
supplementalGroups:
  type: RunAsAny
users:
- system:serviceaccount:{{ .RuntimeSpec.Namespace }}:ofed-driver
volumes:
- '*'
{{end}}
@@ -43,6 +43,7 @@ spec:
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
+serviceAccountName: ofed-driver
hostNetwork: true
containers:
- image: {{ .CrSpec.Repository }}/{{ .CrSpec.Image }}-{{ .CrSpec.Version }}:{{ .RuntimeSpec.OSName }}{{ .RuntimeSpec.OSVer }}-{{ .RuntimeSpec.CPUArch }}
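
The two driver DaemonSets build their image references from the same fields but order the tag differently: the OFED template uses `<OSName><OSVer>-<CPUArch>`, while the nv-peer-mem template uses `<CPUArch>-<OSName><OSVer>`. The following is a minimal sketch that renders both template strings with Go's `text/template`. The struct types and `renderImage` are hypothetical; the `CrSpec` values come from the example CR in this commit, and the `RuntimeSpec` values (`amd64`, `rhcos`, `4.6`) are assumptions chosen for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Hypothetical stand-ins; only the field names come from the manifests.
type crSpec struct {
	Repository string
	Image      string
	Version    string
}

type runtimeSpec struct {
	CPUArch string
	OSName  string
	OSVer   string
}

type imageData struct {
	CrSpec      crSpec
	RuntimeSpec runtimeSpec
}

// Image templates copied from the two DaemonSet manifests.
const (
	ofedImageTmpl   = "{{ .CrSpec.Repository }}/{{ .CrSpec.Image }}-{{ .CrSpec.Version }}:{{ .RuntimeSpec.OSName }}{{ .RuntimeSpec.OSVer }}-{{ .RuntimeSpec.CPUArch }}"
	nvPeerImageTmpl = "{{ .CrSpec.Repository }}/{{ .CrSpec.Image }}-{{ .CrSpec.Version }}:{{ .RuntimeSpec.CPUArch }}-{{ .RuntimeSpec.OSName }}{{ .RuntimeSpec.OSVer }}"
)

// renderImage executes one image template against the given data.
func renderImage(tmplText string, d imageData) (string, error) {
	tmpl, err := template.New("image").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, d); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Assumed runtime values, not taken from the commit.
	rt := runtimeSpec{CPUArch: "amd64", OSName: "rhcos", OSVer: "4.6"}

	ofed, err := renderImage(ofedImageTmpl, imageData{
		CrSpec:      crSpec{Repository: "mellanox", Image: "mofed", Version: "5.2-1.0.4.0"},
		RuntimeSpec: rt,
	})
	if err != nil {
		panic(err)
	}
	nvPeer, err := renderImage(nvPeerImageTmpl, imageData{
		CrSpec:      crSpec{Repository: "mellanox", Image: "nv-peer-mem-driver", Version: "1.0-9"},
		RuntimeSpec: rt,
	})
	if err != nil {
		panic(err)
	}

	fmt.Println(ofed)   // mellanox/mofed-5.2-1.0.4.0:rhcos4.6-amd64
	fmt.Println(nvPeer) // mellanox/nv-peer-mem-driver-1.0-9:amd64-rhcos4.6
}
```

Whether images with these exact tags exist in a registry is not something the sketch checks; it only shows how the strings are assembled.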
5 changes: 5 additions & 0 deletions manifests/stage-rdma-device-plugin/0010_service-account.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rdma-shared
  namespace: {{ .RuntimeSpec.Namespace }}
14 changes: 14 additions & 0 deletions manifests/stage-rdma-device-plugin/0020_role.yaml
@@ -0,0 +1,14 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rdma-shared
  namespace: {{ .RuntimeSpec.Namespace }}
rules:
- apiGroups:
  - security.openshift.io
  resources:
  - securitycontextconstraints
  verbs:
  - use
  resourceNames:
  - privileged
16 changes: 16 additions & 0 deletions manifests/stage-rdma-device-plugin/0030_rolebinding.yaml
@@ -0,0 +1,16 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rdma-shared
  namespace: {{ .RuntimeSpec.Namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rdma-shared
  namespace: {{ .RuntimeSpec.Namespace }}
subjects:
- kind: ServiceAccount
  name: rdma-shared
  namespace: {{ .RuntimeSpec.Namespace }}
userNames:
- system:serviceaccount:{{ .RuntimeSpec.Namespace }}:rdma-shared