Commit 8e6ac74

runtime: config: linux: add cgroups informations
- add information to cgroup resources controllers with examples
- add pids cgroup information and example
- reflect kernel types

Signed-off-by: Antonio Murdaca <[email protected]>
1 parent 96bcd04 commit 8e6ac74

File tree

2 files changed

+155
-41
lines changed


runtime-config-linux.md

Lines changed: 140 additions & 28 deletions
@@ -4,11 +4,24 @@ A namespace wraps a global system resource in an abstraction that makes it appea
44
Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes.
55
For more information, see [the man page](http://man7.org/linux/man-pages/man7/namespaces.7.html).
66

7-
Namespaces are specified in the spec as an array of entries.
8-
Each entry has a type field with possible values described below and an optional path element.
7+
Namespaces are specified as an array of entries inside the `namespaces` root field.
8+
The following parameters can be specified to setup namespaces:
9+
10+
* **`type`** *(string, required)* - namespace type. The following namespaces types are supported:
11+
* **`pid`** processes inside the container will only be able to see other processes inside the same container
12+
* **`network`** the container will have its own network stack
13+
* **`mount`** the container will have an isolated mount table
14+
* **`ipc`** processes inside the container will only be able to communicate to other processes inside the same container via system level IPC
15+
* **`uts`** the container will be able to have its own hostname and domain name
16+
* **`user`** the container will be able to remap user and group IDs from the host to local users and groups within the container
17+
18+
* **`path`** *(string, optional)* - path to namespace file
19+
920
If a path is specified, that particular file is used to join that type of namespace.
1021
Also, when a path is specified, a runtime MUST assume that the setup for that particular namespace has already been done and error out if the config specifies anything else related to that namespace.
1122

23+
###### Example
24+
1225
```json
1326
"namespaces": [
1427
{
@@ -34,32 +47,27 @@ Also, when a path is specified, a runtime MUST assume that the setup for that pa
3447
]
3548
```
3649

37-
#### Namespace types
50+
## Devices
3851

39-
* **pid** processes inside the container will only be able to see other processes inside the same container.
40-
* **network** the container will have its own network stack.
41-
* **mount** the container will have an isolated mount table.
42-
* **ipc** processes inside the container will only be able to communicate to other processes inside the same
43-
container via system level IPC.
44-
* **uts** the container will be able to have its own hostname and domain name.
45-
* **user** the container will be able to remap user and group IDs from the host to local users and groups
46-
within the container.
52+
`devices` is an array specifying the list of devices to be created in the container.
4753

48-
## Devices
54+
The following parameters can be specified:
55+
56+
* **`path`** *(string, required)* - full path to device inside container
57+
58+
* **`type`** *(char, required)* - type of device: `c`, `b`, `u` or `p`. More info in `man mknod`.
4959

50-
Devices is an array specifying the list of devices to be created in the container.
51-
Next parameters can be specified:
60+
* **`major, minor`** *(int64, optional)* - major, minor numbers for device. More info in `man mknod`. There is a special value: `-1`, which means `*` for `device` cgroup setup.
5261

53-
* **type** - type of device: `c`, `b`, `u` or `p`. More info in `man mknod`
54-
* **path** - full path to device inside container
55-
* **major, minor** - major, minor numbers for device. More info in `man mknod`.
56-
There is special value: `-1`, which means `*` for `device`
57-
cgroup setup.
58-
* **permissions** - cgroup permissions for device. A composition of `r`
59-
(read), `w` (write), and `m` (mknod).
60-
* **fileMode** - file mode for device file
61-
* **uid** - uid of device owner
62-
* **gid** - gid of device owner
62+
* **`permissions`** *(string, optional)* - cgroup permissions for device. A composition of `r` (*read*), `w` (*write*), and `m` (*mknod*).
63+
64+
* **`fileMode`** *(uint32, optional)* - file mode for device file
65+
66+
* **`uid`** *(uint32, optional)* - uid of device owner
67+
68+
* **`gid`** *(uint32, optional)* - gid of device owner
69+
70+
###### Example
6371

6472
```json
6573
"devices": [
@@ -152,12 +160,38 @@ For example, to run a new process in an existing container without updating limi
152160

153161
#### Disable out-of-memory killer
154162

163+
`disableOOMKiller` contains a flag (`true` or `false`) that enables or disables
164+
the Out of Memory killer for a cgroup. If enabled (`false`), tasks that attempt
165+
to consume more memory than they are allowed are immediately killed by the OOM killer.
166+
The OOM killer is enabled by default in every cgroup using the `memory` subsystem.
167+
To disable it, specify a value of `true`.
168+
For more information, see [the memory cgroup man page](https://www.kernel.org/doc/Documentation/cgroups/memory.txt).
169+
170+
###### Example
171+
155172
```json
156173
"disableOOMKiller": false
157174
```
158175

159176
#### Memory
160177

178+
`memory` represents the cgroup subsystem `memory` and it's used to set limits on memory use of the container.
179+
For more information, see [the memory cgroup man page](https://www.kernel.org/doc/Documentation/cgroups/memory.txt).
180+
181+
The following parameters can be specified to setup the memory controller:
182+
183+
* **`limit`** *(uint64, optional)* - set limit of memory usage
184+
185+
* **`reservation`** *(uint64, optional)* - set soft limit of memory usage
186+
187+
* **`swap`** *(uint64, optional)* - set limit of memory+Swap usage
188+
189+
* **`kernel`** *(uint64, optional)* - set hard limit for kernel memory
190+
191+
* **`swappiness`** *(uint64, optional)* - set swappiness parameter of vmscan (See sysctl's vm.swappiness)
192+
193+
###### Example
194+
161195
```json
162196
"memory": {
163197
"limit": 0,
@@ -170,6 +204,27 @@ For example, to run a new process in an existing container without updating limi
170204

171205
#### CPU
172206

207+
`cpu` represents the cgroup subsystems `cpu` and `cpusets`.
208+
For more information, see [the cpusets cgroup man page](https://www.kernel.org/doc/Documentation/cgroups/cpusets.txt).
209+
210+
The following parameters can be specified to setup the cpu controller:
211+
212+
* **`shares`** *(uint64, optional)* - specifies a relative share of CPU time available to the tasks in a cgroup
213+
214+
* **`quota`** *(uint64, optional)* - specifies the total amount of time in microseconds for which all tasks in a cgroup can run during one period (as defined by **`period`** below)
215+
216+
* **`period`** *(uint64, optional)* - specifies a period of time in microseconds for how regularly a cgroup's access to CPU resources should be reallocated (cfs scheduler only)
217+
218+
* **`realtimeRuntime`** *(uint64, optional)* - specifies a period of time in microseconds for the longest continuous period in which the tasks in a cgroup have access to CPU resources
219+
220+
* **`realtimePeriod`** *(uint64, optional)* - same as **`period`** but applies on realtime scheduler only
221+
222+
* **`cpus`** *(cpus, optional)* - list of CPUs the container will run in
223+
224+
* **`mems`** *(mems, optional)* - list of Memory Nodes the container will run in
225+
226+
###### Example
227+
173228
```json
174229
"cpu": {
175230
"shares": 0,
@@ -185,7 +240,7 @@ For example, to run a new process in an existing container without updating limi
185240
#### Block IO Controller
186241

187242
`blockIO` represents the cgroup subsystem `blkio` which implements the block io controller.
188-
For more information, see the [kernel cgroups documentation about `blkio`](https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt).
243+
For more information, see [the kernel cgroups documentation about blkio](https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt).
189244

190245
The following parameters can be specified to setup the block io controller:
191246

@@ -195,8 +250,8 @@ The following parameters can be specified to setup the block io controller:
195250

196251
* **`blkioWeightDevice`** *(array, optional)* - specifies the list of devices which will be bandwidth rate limited. The following parameters can be specified per-device:
197252
* **`major, minor`** *(int64, required)* - major, minor numbers for device. More info in `man mknod`.
198-
* **`weight`** *(uint16, optional)* - bandwidth rate for the device, range is from 10 to 1000.
199-
* **`leafWeight`** *(uint16, optional)* - bandwidth rate for the device while competing with the cgroup's child cgroups, range is from 10 to 1000, cfq scheduler only.
253+
* **`weight`** *(uint16, optional)* - bandwidth rate for the device, range is from 10 to 1000
254+
* **`leafWeight`** *(uint16, optional)* - bandwidth rate for the device while competing with the cgroup's child cgroups, range is from 10 to 1000, cfq scheduler only
200255

201256
You must specify at least one of `weight` or `leafWeight` in a given entry, and can specify both.
202257

@@ -242,6 +297,18 @@ The following parameters can be specified to setup the block io controller:
242297

243298
#### Huge page limits
244299

300+
`hugepageLimits` represents the `hugetlb` controller which allows to limit the
301+
HugeTLB usage per control group and enforces the controller limit during page fault.
302+
For more information, see the [kernel cgroups documentation about HugeTLB](https://www.kernel.org/doc/Documentation/cgroups/hugetlb.txt).
303+
304+
`hugepageLimits` is an array of entries, each having the following structure:
305+
306+
* **`pageSize`** *(string, required)* - hugepage size
307+
308+
* **`limit`** *(uint64, required)* - limit of *hugepagesize* hugetlb usage
309+
310+
###### Example
311+
245312
```json
246313
"hugepageLimits": [
247314
{
@@ -253,9 +320,23 @@ The following parameters can be specified to setup the block io controller:
253320

254321
#### Network
255322

323+
`network` represents the cgroup subsystems `net_cls` and `net_prio`.
324+
For more information, see [the net\_cls cgroup man page](https://www.kernel.org/doc/Documentation/cgroups/net_cls.txt) and [the net\_prio cgroup man page](https://www.kernel.org/doc/Documentation/cgroups/net_prio.txt).
325+
326+
The following parameters can be specified to setup these cgroup controllers:
327+
328+
* **`classID`** *(string, optional)* - network class identifier the cgroup's network packets will be tagged with
329+
330+
* **`priorities`** *(array, optional)* - specifies a list of map of the priorities assigned to traffic originating from
331+
processes in the group and egressing the system on various interfaces. The following parameters can be specified per-map:
332+
* **`name`** *(string, required)* - interface name
333+
* **`priority`** *(uint32, required)* - priority applied to the interface
334+
335+
###### Example
336+
256337
```json
257338
"network": {
258-
"classId": "ClassId",
339+
"classID": "0x100001",
259340
"priorities": [
260341
{
261342
"name": "eth0",
@@ -269,11 +350,31 @@ The following parameters can be specified to setup the block io controller:
269350
}
270351
```
271352

353+
#### Pids
354+
355+
`pids` represents the cgroup subsystem `pids`.
356+
For more information, see [the pids cgroup man page](https://www.kernel.org/doc/Documentation/cgroups/pids.txt
357+
).
358+
359+
The following parameters can be specified to setup the pids controller:
360+
361+
* **`limit`** *(int, required)* - specifies the maximum number of tasks in the cgroup
362+
363+
###### Example
364+
365+
```json
366+
"pids": {
367+
"limit": 32771
368+
}
369+
```
370+
272371
## Sysctl
273372

274373
sysctl allows kernel parameters to be modified at runtime for the container.
275374
For more information, see [the man page](http://man7.org/linux/man-pages/man8/sysctl.8.html)
276375

376+
###### Example
377+
277378
```json
278379
"sysctl": {
279380
"net.ipv4.ip_forward": "1",
@@ -287,6 +388,8 @@ rlimits allow setting resource limits.
287388
`type` is a string with a value from those defined in [the man page](http://man7.org/linux/man-pages/man2/setrlimit.2.html).
288389
The kernel enforces the `soft` limit for a resource while the `hard` limit acts as a ceiling for that value that could be set by an unprivileged process.
289390

391+
###### Example
392+
290393
```json
291394
"rlimits": [
292395
{
@@ -301,6 +404,9 @@ The kernel enforces the `soft` limit for a resource while the `hard` limit acts
301404

302405
SELinux process label specifies the label with which the processes in a container are run.
303406
For more information about SELinux, see [Selinux documentation](http://selinuxproject.org/page/Main_Page)
407+
408+
###### Example
409+
304410
```json
305411
"selinuxProcessLabel": "system_u:system_r:svirt_lxc_net_t:s0:c124,c675"
306412
```
@@ -310,6 +416,8 @@ For more information about SELinux, see [Selinux documentation](http://selinuxp
310416
Apparmor profile specifies the name of the apparmor profile that will be used for the container.
311417
For more information about Apparmor, see [Apparmor documentation](https://wiki.ubuntu.com/AppArmor)
312418

419+
###### Example
420+
313421
```json
314422
"apparmorProfile": "acme_secure_profile"
315423
```
@@ -321,6 +429,8 @@ Seccomp configuration allows one to configure actions to take for matched syscal
321429
For more information about Seccomp, see [Seccomp kernel documentation](https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt)
322430
The actions and operators are strings that match the definitions in seccomp.h from [libseccomp](https://github.com/seccomp/libseccomp) and are translated to corresponding values.
323431

432+
###### Example
433+
324434
```json
325435
"seccomp": {
326436
"defaultAction": "SCMP_ACT_ALLOW",
@@ -339,6 +449,8 @@ rootfsPropagation sets the rootfs's mount propagation.
339449
Its value is either slave, private, or shared.
340450
[The kernel doc](https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt) has more information about mount propagation.
341451

452+
###### Example
453+
342454
```json
343455
"rootfsPropagation": "slave",
344456
```
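
Note on the `classID` example in the Network section: the value is carried as a hex string such as `"0x100001"`, so a consumer has to parse it before writing it to the `net_cls` controller. The following is a minimal sketch of that step, not code from this commit. `parseClassID` is a hypothetical helper, and the split into a 16-bit major and 16-bit minor handle follows the `0xAAAABBBB` layout described in the kernel's `net_cls` documentation.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseClassID is a hypothetical helper (not part of the spec or this commit).
// It parses a classID string such as "0x100001" and splits it into the
// major and minor handles of the net_cls 0xAAAABBBB layout.
func parseClassID(s string) (major, minor uint16, err error) {
	// Base 0 lets strconv infer the base from the prefix, so "0x..." is read as hex.
	v, err := strconv.ParseUint(s, 0, 32)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid classID %q: %w", s, err)
	}
	return uint16(v >> 16), uint16(v & 0xFFFF), nil
}

func main() {
	major, minor, err := parseClassID("0x100001")
	if err != nil {
		panic(err)
	}
	fmt.Printf("major=%#x minor=%#x\n", major, minor) // prints "major=0x10 minor=0x1"
}
```

Keeping the field as a string moves validation to the runtime, so a malformed value is only detected when the configuration is applied rather than when the JSON is decoded.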

runtime_config_linux.go

Lines changed: 15 additions & 13 deletions
@@ -103,7 +103,7 @@ type InterfacePriority struct {
103103
// Name is the name of the network interface
104104
Name string `json:"name"`
105105
// Priority for the interface
106-
Priority int64 `json:"priority"`
106+
Priority uint32 `json:"priority"`
107107
}
108108

109109
// blockIODevice holds major:minor format supported in blkio cgroup
@@ -151,29 +151,29 @@ type BlockIO struct {
151151
// Memory for Linux cgroup 'memory' resource management
152152
type Memory struct {
153153
// Memory limit (in bytes)
154-
Limit int64 `json:"limit"`
154+
Limit uint64 `json:"limit"`
155155
// Memory reservation or soft_limit (in bytes)
156-
Reservation int64 `json:"reservation"`
156+
Reservation uint64 `json:"reservation"`
157157
// Total memory usage (memory + swap); set `-1' to disable swap
158-
Swap int64 `json:"swap"`
158+
Swap uint64 `json:"swap"`
159159
// Kernel memory limit (in bytes)
160-
Kernel int64 `json:"kernel"`
160+
Kernel uint64 `json:"kernel"`
161161
// How aggressive the kernel will swap memory pages. Range from 0 to 100. Set -1 to use system default
162-
Swappiness int64 `json:"swappiness"`
162+
Swappiness uint64 `json:"swappiness"`
163163
}
164164

165165
// CPU for Linux cgroup 'cpu' resource management
166166
type CPU struct {
167167
// CPU shares (relative weight vs. other cgroups with cpu shares)
168-
Shares int64 `json:"shares"`
168+
Shares uint64 `json:"shares"`
169169
// CPU hardcap limit (in usecs). Allowed cpu time in a given period
170-
Quota int64 `json:"quota"`
170+
Quota uint64 `json:"quota"`
171171
// CPU period to be used for hardcapping (in usecs). 0 to use system default
172-
Period int64 `json:"period"`
172+
Period uint64 `json:"period"`
173173
// How many time CPU will use in realtime scheduling (in usecs)
174-
RealtimeRuntime int64 `json:"realtimeRuntime"`
174+
RealtimeRuntime uint64 `json:"realtimeRuntime"`
175175
// CPU period to be used for realtime scheduling (in usecs)
176-
RealtimePeriod int64 `json:"realtimePeriod"`
176+
RealtimePeriod uint64 `json:"realtimePeriod"`
177177
// CPU to use within the cpuset
178178
Cpus string `json:"cpus"`
179179
// MEM to use within the cpuset
@@ -183,13 +183,15 @@ type CPU struct {
183183
// Pids for Linux cgroup 'pids' resource management (Linux 4.3)
184184
type Pids struct {
185185
// Maximum number of PIDs. A value < 0 implies "no limit".
186-
Limit int64 `json:"limit"`
186+
Limit int `json:"limit"`
187187
}
188188

189189
// Network identification and priority configuration
190190
type Network struct {
191191
// Set class identifier for container's network packets
192-
ClassID string `json:"classId"`
192+
// this is actually a string instead of a uint64 to overcome the json
193+
// limitation of specifying hex numbers
194+
ClassID string `json:"classID"`
193195
// Set priority of network traffic for container
194196
Priorities []InterfacePriority `json:"priorities"`
195197
}
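
Note on the `int64` to `uint64` changes above: several field comments still mention `-1` as a special value (for example `Swap` and `Swappiness`), but Go's `encoding/json` refuses to decode a negative number into an unsigned field. The sketch below illustrates that behaviour under stated assumptions. `memoryAfter` is a hypothetical stand-in that copies only two fields of the `Memory` struct as it looks after this commit, and `decodeMemory` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// memoryAfter is a hypothetical stand-in mirroring two fields of the
// Memory struct after this commit (uint64 instead of int64).
type memoryAfter struct {
	Limit uint64 `json:"limit"`
	Swap  uint64 `json:"swap"`
}

// decodeMemory decodes a JSON fragment into memoryAfter and returns any
// error reported by encoding/json.
func decodeMemory(data string) (memoryAfter, error) {
	var m memoryAfter
	err := json.Unmarshal([]byte(data), &m)
	return m, err
}

func main() {
	// A negative value cannot be represented in a uint64 field,
	// so the decoder reports a type error for "swap".
	if _, err := decodeMemory(`{"limit": 536870912, "swap": -1}`); err != nil {
		fmt.Println("decode failed:", err)
	}

	// Non-negative values decode as expected.
	m, err := decodeMemory(`{"limit": 536870912, "swap": 1073741824}`)
	fmt.Println(m.Limit, m.Swap, err)
}
```

If the `-1` sentinel is meant to remain valid, the affected fields would need a signed type or a separate way to express "unset"; which of these the spec intends is not stated in this commit.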
