MBM MBA how to guide
Intel® Resource Director Technology (RDT) is designed to monitor and manage CPU resources and to maintain the performance of applications and VMs sharing those resources.
Intel® RDT includes monitoring and control technologies. Monitoring technologies include CMT (Cache Monitoring Technology), which monitors occupancy of last level cache, and MBM (Memory Bandwidth Monitoring). Control technologies include CAT (Cache Allocation Technology), CDP (Code Data Prioritization) and MBA (Memory Bandwidth Allocation).
MBA limits the memory bandwidth available to specified cores or processes.
MBM monitors memory traffic (to and from RAM) for specified cores or processes.
MBA and MBM can be used to identify and manage applications that overutilize memory bandwidth and thereby degrade other applications competing for this resource.
MBA was introduced in Intel® Xeon® Scalable processors, and MBM in Intel® Xeon® D-15XX processors.
To see which software release is needed for each feature, visit https://github.com/intel/intel-cmt-cat/wiki#msr-interface-feature-support and https://github.com/intel/intel-cmt-cat/wiki#os-interface-feature-support
PQoS can monitor memory bandwidth per core or per process/task.
Local memory bandwidth is bandwidth sourced from local memory controllers, on the same package. Remote memory bandwidth is bandwidth sourced from remote memory controllers.
For example, in a two-socket server, each socket has its own memory controller. If a core in the first socket reads data through the controller located in the same socket, the traffic counts as local memory bandwidth. If it reads memory through the controller located in the second socket, the traffic counts as remote memory bandwidth. This guide uses MBL and MBR as abbreviations for local and remote memory bandwidth, respectively.
To monitor memory bandwidth usage for specified cores, run the following command:
sudo pqos [-I] -m 'mbl:cores;mbr:cores'
For example:
sudo pqos -m 'mbl:1,3-4;mbr:1,3-4'
or
sudo pqos -I -m 'mbl:1,3-4;mbr:1,3-4'
This prints the current local and remote memory bandwidth usage:
$ sudo pqos -m 'mbl:1,3-4;mbr:1,3-4'
TIME 2019-05-07 08:01:16
CORE IPC MISSES MBL[MB/s] MBR[MB/s]
1 1.17 124254k 15161.8 0.1
3 1.54 38010k 4595.0 0.1
4 1.41 135584k 8210.6 0.2
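The per-core table above can be post-processed with standard tools. The following is a minimal sketch, not part of pqos: `sum_bandwidth` is a hypothetical helper that assumes the five-column layout shown in the sample output and prints local plus remote bandwidth for each core.

```shell
# Hypothetical helper: reads pqos monitoring output on stdin and prints
# "core total_MB/s" for every data row. Assumes the five-column layout
# shown above: CORE IPC MISSES MBL[MB/s] MBR[MB/s].
sum_bandwidth() {
    awk 'NF == 5 && $1 ~ /^[0-9]+$/ { printf "%s %.1f\n", $1, $4 + $5 }'
}

# Sample rows copied from the output above.
sample='TIME 2019-05-07 08:01:16
CORE IPC MISSES MBL[MB/s] MBR[MB/s]
1 1.17 124254k 15161.8 0.1
3 1.54 38010k 4595.0 0.1
4 1.41 135584k 8210.6 0.2'

printf '%s\n' "$sample" | sum_bandwidth
```

Header and timestamp lines are skipped because their first field is not a plain core number. If the column layout differs on your system, the field numbers need to be adjusted.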
PQoS can also monitor memory bandwidth usage per process/task. This requires the OS interface (-I). To monitor memory bandwidth usage for a process/task, run the following command:
sudo pqos -I -p 'mbl:pids;mbr:pids'
For example:
sudo pqos -I -p 'mbl:2357;mbr:2357'
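When an application runs as several processes, the monitoring argument can be built from a PID list. The sketch below uses a hypothetical helper, `pid_mon_arg`, and assumes that pqos accepts a comma-separated PID list in the same way it accepts core lists in the earlier examples.

```shell
# Hypothetical helper: turns a list of PIDs (for example, as printed by
# pidof) into the argument passed to "pqos -I -p".
pid_mon_arg() {
    pids=$(printf '%s' "$*" | tr ' ' ',')
    printf 'mbl:%s;mbr:%s\n' "$pids" "$pids"
}

# Example with fixed PIDs. On a live system: pid_mon_arg $(pidof membw)
pid_mon_arg 2357 2401
```

The printed string can then be passed on, for example `sudo pqos -I -p "$(pid_mon_arg 2357 2401)"`.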
PQoS configures MBA per class of service (COS). A COS can be associated with multiple cores, whereas each core can have only one COS.
Before allocating memory bandwidth, it is useful to check how many classes of service are available for MBA. The PQoS utility prints this number when run with the -d flag:
sudo pqos -d
Hardware capabilities
    Monitoring
        Cache Monitoring Technology (CMT) events:
            LLC Occupancy (LLC)
        Memory Bandwidth Monitoring (MBM) events:
            Total Memory Bandwidth (TMEM)
            Local Memory Bandwidth (LMEM)
            Remote Memory Bandwidth (RMEM) (calculated)
        PMU events:
            Instructions/Clock (IPC)
            LLC misses
    Allocation
        Cache Allocation Technology (CAT)
            L3 CAT
                CDP: enabled
                Num COS: 16
        Memory Bandwidth Allocation (MBA)
            Num COS: 8
Please note that the number of classes of service available for Cache Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA) might be different.
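Because "Num COS" appears once per allocation technology, a script has to pick the value that follows the MBA heading. A minimal sketch, assuming the output layout shown above; `mba_num_cos` is a hypothetical helper name:

```shell
# Hypothetical helper: reads "pqos -d" output on stdin and prints the number
# of classes of service listed under the MBA section.
mba_num_cos() {
    awk '
        /Memory Bandwidth Allocation \(MBA\)/ { in_mba = 1; next }
        in_mba && /Num COS:/ { print $NF; exit }
    '
}

# Allocation part of the sample output above.
sample='Allocation
Cache Allocation Technology (CAT)
L3 CAT
CDP: enabled
Num COS: 16
Memory Bandwidth Allocation (MBA)
Num COS: 8'

printf '%s\n' "$sample" | mba_num_cos
```

On a live system the same helper could be fed directly, for example `sudo pqos -d | mba_num_cos`.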
Before memory bandwidth can be allocated to a core, the core must first be associated with a COS. This can be done by running:
sudo pqos [-I] -a 'core:cos=cores'
For example:
sudo pqos -a 'core:1=3'
or
sudo pqos -I -a 'core:1=3'
This command associates COS 1 with core 3. Note that the choice of interface matters: MSR (the default) or OS (-I). Do not mix the two interfaces, as this can corrupt the RDT configuration.
$ sudo pqos -a 'core:1=3'
NOTE: Mixed use of MSR and kernel interfaces to manage
CAT or CMT & MBM may lead to unexpected behavior.
Allocation configuration altered.
After cores are associated with a COS, MBA can be configured for that COS by running:
sudo pqos [-I] -e 'mba:cos=mba'
For example:
sudo pqos -e 'mba:1=20'
or
sudo pqos -I -e 'mba:1=20'
The command above restricts the memory bandwidth of COS 1 to 20%. In other words, an 80% delay is added to cores associated with COS 1.
$ sudo pqos -e 'mba:1=20'
NOTE: Mixed use of MSR and kernel interfaces to manage
CAT or CMT & MBM may lead to unexpected behavior.
SOCKET 0 MBA COS1 => 20% requested, 20% applied
SOCKET 1 MBA COS1 => 20% requested, 20% applied
Allocation configuration altered.
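The relationship between the MBA percentage and the added delay is simple arithmetic. The sketch below also illustrates why the "requested" and "applied" values can differ: it rounds a request to a granularity step. The step of 10 and the round-to-nearest rule are assumptions made for illustration only; the value that pqos reports as applied is authoritative.

```shell
# Delay implied by an MBA percentage, as described above: 20% -> 80% delay.
mba_delay() {
    echo $((100 - $1))
}

# Hypothetical: round a requested percentage to the nearest multiple of a
# granularity step. The default step of 10 is an assumed value, not one
# taken from this guide.
mba_round() {
    req=$1
    step=${2:-10}
    echo $(( (req + step / 2) / step * step ))
}

mba_delay 20      # prints 80
mba_round 25      # prints 30 with the assumed step of 10
```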
Instead of configuring MBA as a percentage, it is also possible to allocate a precise value in megabytes per second (MB/s). For that, MBA works together with MBM to dynamically adjust memory bandwidth to the level specified in MB/s. Please note that the operating system must support MBA CTRL. To check whether your environment is able to use MBA CTRL, please refer to https://github.com/intel/intel-cmt-cat/blob/master/README#L248. MBA CTRL works only with the OS interface.
It requires resctrl to be mounted with the mba_MBps option. It can also be enabled by resetting allocation settings with the mbaCtrl-on option:
sudo pqos -I -R mbaCtrl-on
Note: This will reset all allocation configuration including COS-cores association.
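To see whether resctrl is already mounted with the required option, the mount table can be inspected. This is a sketch under the assumption that the kernel lists mba_MBps among the mount options in /proc/mounts; `resctrl_has_mba_mbps` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the given mounts file (default
# /proc/mounts) lists a resctrl filesystem mounted with the mba_MBps option.
resctrl_has_mba_mbps() {
    awk '$3 == "resctrl" && $4 ~ /(^|,)mba_MBps(,|$)/ { found = 1 }
         END { exit !found }' "${1:-/proc/mounts}" 2>/dev/null
}

if resctrl_has_mba_mbps; then
    echo "MBA CTRL is enabled"
else
    echo "MBA CTRL is not enabled (or resctrl is not mounted)"
fi
```

The optional file argument exists only so the check can be tried against a saved copy of the mount table.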
MBA can be configured in MB/s by running:
sudo pqos -I -e 'mba_max:cos=mba'
For example:
sudo pqos -I -e 'mba_max:1=4000'
This sets the cap (maximum level) of memory bandwidth for COS 1 to 4000 MB/s.
$ sudo pqos -I -e 'mba_max:1=4000'
NOTE: Mixed use of MSR and kernel interfaces to manage
CAT or CMT & MBM may lead to unexpected behavior.
SOCKET 0 MBA COS1 => 4000 MBps
SOCKET 1 MBA COS1 => 4000 MBps
Allocation configuration altered.
membw is a tool that simulates memory usage by performing memory-intensive operations. It is used in the examples below. To generate memory traffic, run this command:
sudo membw -c <CPU> -b <BANDWIDTH> <OPERATION>
For example:
sudo ./membw -c 3 -b 10000 --read
This generates memory traffic of approximately 10000 MB/s on core 3 using x86 loads.
$ sudo ./membw -c 3 -b 10000 --read
- THREAD logical core id: 3, memory bandwidth [MB]: 10000, starting...
1. Run a task on a specified core. As an example, you may run membw from tools/ on core 3:
sudo ./membw -c 3 -b 10000 --read
2. Monitor MBL and MBR on core 3:
sudo pqos -m 'mbl:3;mbr:3'
3. Associate COS 1 with core 3:
sudo pqos -a 'core:1=3'
4. Configure MBA on COS 1:
sudo pqos -e 'mba:1=20'
5. Monitor MBL and MBR on core 3:
sudo pqos -m 'mbl:3;mbr:3'
It is expected that pqos reports lower memory bandwidth usage than in step 2.
Note: The above commands can also be run with the -I flag to use the OS interface.
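The association and configuration steps above can be wrapped in a small script that keeps the interface choice in one place, which helps avoid mixing the MSR and OS interfaces. This is a sketch only: `run` and `limit_core` are hypothetical helper names, and commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
# Commands are only printed unless DRY_RUN=0 is set, so the sequence can be
# reviewed first. IFACE selects the interface for every call:
# empty for MSR (the default), "-I" for the OS interface.
DRY_RUN=${DRY_RUN:-1}
IFACE=${IFACE:-}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# limit_core CORE COS PERCENT (hypothetical helper)
# Associates the COS with the core, then sets the MBA percentage on the COS.
limit_core() {
    run sudo pqos $IFACE -a "core:$2=$1"
    run sudo pqos $IFACE -e "mba:$2=$3"
}

limit_core 3 1 20
```

Monitoring is left out of the helper so that it can be started separately before and after the limit is applied.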
1. Reset allocation configuration and enable MBA CTRL:
sudo pqos -I -R mbaCtrl-on
2. Run a task on a specified core. As an example, you may run membw from tools/ on core 3:
sudo ./membw -c 3 -b 10000 --read
3. Monitor MBL and MBR on core 3:
sudo pqos -I -m 'mbl:3;mbr:3'
4. Associate COS 1 with core 3:
sudo pqos -I -a 'core:1=3'
5. Configure MBA using MBA CTRL on COS 1:
sudo pqos -I -e 'mba_max:1=4000'
6. Monitor MBL and MBR on core 3:
sudo pqos -I -m 'mbl:3;mbr:3'
It is expected that pqos reports much lower memory bandwidth usage than in step 3 (approximately 4000 MB/s). The value can sometimes be greater than the MBA setting in MB/s because the underlying mechanism must take into account the MBA granularity the platform provides.
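Since the measured value may sit slightly above the configured cap, a scripted check is better written with a tolerance than as an exact comparison. A minimal sketch: `within_cap` is a hypothetical helper, the 10% default tolerance is an assumed value, and the readings are illustrative rather than measured.

```shell
# Hypothetical check: succeeds if a measured bandwidth (MB/s) does not exceed
# the mba_max cap by more than a tolerance given in percent (default 10).
within_cap() {
    awk -v m="$1" -v cap="$2" -v tol="${3:-10}" \
        'BEGIN { exit !(m <= cap * (1 + tol / 100)) }'
}

# Illustrative readings, not measured values.
within_cap 4150.3 4000 && echo "within tolerance"
within_cap 5200.0 4000 || echo "above cap"
```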
1. Run an application on a specified core. As an example, you may run membw from tools/ on core 3:
sudo ./membw -c 3 -b 10000 --read
2. Find the PID of the running application:
pidof membw
Let’s assume the application has PID 1234.
3. Monitor MBL and MBR for PID 1234:
sudo pqos -I -p 'mbl:1234;mbr:1234'
4. Associate COS 1 with PID 1234:
sudo pqos -I -a 'pid:1=1234'
5. Configure MBA on COS 1:
sudo pqos -I -e 'mba:1=20'
6. Monitor MBL and MBR for PID 1234:
sudo pqos -I -p 'mbl:1234;mbr:1234'
It is expected that pqos reports much lower memory bandwidth usage than in step 3 (about 20% of the previously reported values).
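To compare the two readings, the bandwidth after throttling can be expressed as a percentage of the bandwidth before throttling. A minimal sketch with a hypothetical helper, `bw_ratio`; the readings are illustrative, not measured values.

```shell
# Hypothetical helper: prints the bandwidth after throttling as a percentage
# of the bandwidth before throttling. Fails if "before" is not positive.
bw_ratio() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        if (before <= 0) exit 1
        printf "%.1f\n", after / before * 100
    }'
}

# Illustrative readings in MB/s.
bw_ratio 10000.0 2000.0
```

A result close to the configured MBA percentage indicates that the limit is taking effect.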
1. Reset allocation configuration and enable MBA CTRL:
sudo pqos -I -R mbaCtrl-on
2. Run an application on a specified core. As an example, you may run membw from tools/ on core 3:
sudo ./membw -c 3 -b 10000 --read
3. Find the PID of the running application:
pidof membw
Let’s assume the application has PID 1234.
4. Monitor MBL and MBR for PID 1234:
sudo pqos -I -p 'mbl:1234;mbr:1234'
5. Associate COS 1 with PID 1234:
sudo pqos -I -a 'pid:1=1234'
6. Configure MBA using MBA CTRL on COS 1:
sudo pqos -I -e 'mba_max:1=4000'
7. Monitor MBL and MBR for PID 1234:
sudo pqos -I -p 'mbl:1234;mbr:1234'
It is expected that pqos reports much lower memory bandwidth usage than in step 4 (approximately 4000 MB/s). The value can sometimes be greater than the MBA setting in MB/s because the underlying mechanism must take into account the MBA granularity the platform provides.
MBA can also be configured using the rdtset utility instead of pqos. rdtset is a taskset-like application that provides an easy-to-use interface for running applications and configuring RDT for them. Once the application is no longer running, rdtset automatically reverts the RDT configuration.
MBA configuration via pqos can be replaced with rdtset. Each rdtset command below is followed by the pqos commands it replaces:
sudo rdtset -t 'mba=20;cpu=3' -c 3 ./membw -c 3 -b 10000 --read
replaces
sudo pqos -a 'core:<COS>=3'
sudo pqos -e 'mba:<COS>=20'
sudo ./membw -c 3 -b 10000 --read
sudo rdtset -t 'mba=20' -I -p 1234
replaces
sudo pqos -I -a 'pid:<COS>=1234'
sudo pqos -I -e 'mba:<COS>=20'
sudo ./membw -c 3 -b 10000 --read # assuming this process has PID 1234
sudo rdtset -t 'mba_max=4000;cpu=3' -c 3 ./membw -c 3 -b 10000 --read
replaces
sudo pqos -I -a 'core:<COS>=3'
sudo pqos -I -e 'mba_max:<COS>=4000'
sudo ./membw -c 3 -b 10000 --read
sudo rdtset -t 'mba_max=4000' -I -p 1234
replaces
sudo pqos -I -a 'pid:<COS>=1234'
sudo pqos -I -e 'mba_max:<COS>=4000'
sudo ./membw -c 3 -b 10000 --read # assuming this process has PID 1234