Installation instructions for the AWS Cloud Digital Interface (CDI) SDK on Linux instances.
In addition to filing AWS CDI SDK bugs/issues, please use the discussion pages for Q&A, Ideas, Show and Tell or other General topics so the whole community can benefit.
- Linux Installation Guide
- Upgrading from previous releases
- Create an EFA enabled instance
- Install EFA driver
- Install Package Dependencies
- Install AWS CDI SDK
- Install AWS CloudWatch and AWS CLI
- Build CDI libraries and test applications
- Enable huge pages
- Validate the EFA environment
- Build the HTML documentation
- Creating additional instances
- Pinning AWS CDI SDK Poll Threads to Specific CPU Cores
Upgrading from CDI SDK 2.4 or earlier
- Must install the latest EFA driver and REBOOT the system.
- Must download and install a second version of libfabric, which requires rdma-core. See steps in the libfabric section of Install the AWS CDI SDK.
Note: When adding CDI-SDK libraries to your application's build process, ensure you don't link directly with the libfabric libraries. You should only be linking to libcdisdk.so.x. The CDI-SDK dynamically loads the libfabric libraries.
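For example, a link line like the following sketch (the application name and paths are illustrative, and it assumes the build created the usual libcdisdk.so symlink under build/debug/lib) links only the CDI SDK library:
gcc my_app.o -o my_app -L<install_dir>/aws-cdi-sdk/build/debug/lib -lcdisdk -Wl,-rpath,'$ORIGIN'
Note that no -lfabric option appears; the SDK loads the libfabric libraries at run time.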
Follow the steps in create an EFA-enabled instance.
For Linux installations, follow step 3 in launch an Elastic Fabric Adapter (EFA)-capable instance, with the following additions to the step Install the EFA software:
Note: EFA installer version v1.22.1 is the last installer version that will support Ubuntu 18.04 LTS (Bionic Beaver), as this distribution reaches the end of its 5-year standard support in May 2023. For details, see the Ubuntu lifecycle documentation. Future versions of the EFA installer will not contain support for Ubuntu 18.04 LTS.
- During Connect to the instance you launched, once your instance has booted, you can find the public IP you requested earlier by clicking on the instance and looking for “IPv4 Public IP” in the pane below your instance list. Use that IP address to SSH to your new instance.
- If you cannot connect (connection times out), you may have forgotten to add an SSH rule to your security group, or you may need to set up an internet gateway for your Virtual Private Cloud (VPC) and add a route to your subnet. You can find more information about setting up SSH access and connecting to the instance at accessing Linux instances.
- The default user name for Amazon Linux 2 instances is `ec2-user`, on CentOS it's `centos`, and on Ubuntu it's `ubuntu`.
- During Install the EFA software, install the minimum version of the EFA software using the command shown below. This will not install libfabric; the AWS CDI SDK uses its own versions.
sudo ./efa_installer.sh -y --minimal
Note: The SDK may be installed on a system that does not contain an EFA adapter. However, such a system can only be used for building EFA-enabled applications or for testing applications that use the socket adapters. Use this command to skip EFA kernel module installation and device verification:
sudo ./efa_installer.sh -y --minimal --skip-kmod --no-verify
- During Confirm that the EFA software components were successfully installed, note that the `fi_info` command does not work when installing the minimum version of the EFA software. You will perform this check later, after installing the AWS CDI SDK.
Installation of dependent packages is required before building the AWS CDI SDK:
- CentOS 7 and Amazon Linux 2:
sudo yum update -y
sudo yum -y install gcc-c++ make cmake3 curl-devel openssl-devel autoconf automake libtool doxygen ncurses-devel unzip git
- Rocky Linux 8:
sudo dnf update -y
sudo dnf install epel-release -y
sudo dnf config-manager --set-enabled powertools
sudo dnf -y install gcc-c++ make cmake3 curl-devel openssl-devel autoconf automake libtool doxygen ncurses-devel unzip git
- Rocky Linux 9:
sudo dnf update -y
sudo dnf install epel-release -y
sudo dnf config-manager --set-enabled crb
sudo dnf -y install gcc-c++ make cmake3 curl-devel openssl-devel autoconf automake libtool doxygen ncurses-devel unzip git
- Ubuntu:
sudo apt update
sudo apt-get upgrade -y
sudo apt-get -y install build-essential libncurses-dev autoconf automake libtool cmake doxygen libcurl4-openssl-dev libssl-dev uuid-dev zlib1g-dev libpulse-dev unzip git
- Download AWS CDI SDK from GitHub.
Note: Instructions to install git can be found at Getting Started Installing Git.
Caution: Do not install a new version of the AWS CDI SDK over an old one.
mkdir <install_dir>
cd <install_dir>
git clone https://github.com/aws/aws-cdi-sdk
- Install the libfabric versions. The folder `libfabric` is used for libfabric v1.9, which is required to support AWS CDI SDK versions prior to 3.x.x. The folder `libfabric_new` is used for a later libfabric version, which is required to support AWS CDI SDK versions 3.x.x and later.
git clone --single-branch --branch v1.9.x-cdi https://github.com/aws/libfabric libfabric
git clone --single-branch --branch v1.15.2 https://github.com/ofiwg/libfabric libfabric_new
Note: libfabric_new also requires the development version of rdma-core v27 or later, which is installed as part of the EFA driver installation described above using `efa_installer.sh`.
The <install_dir> should now contain the folder hierarchy as shown below:
<install_dir>/aws-cdi-sdk
<install_dir>/libfabric
<install_dir>/libfabric_new
- libfabric is a customized version of the open-source libfabric project based on libfabric v1.9.x.
- libfabric_new is a mainline version of the open-source libfabric project.
- aws-cdi-sdk is the directory that contains the source code for the AWS CDI SDK and its test application. The contents of the AWS CDI SDK include the following directories: doc, include, src, and proj.
- The root folder contains an overall Makefile that builds libfabric, the AWS CDI SDK, the test applications, and the Doxygen-generated HTML documentation. The builds of libfabric and the AWS CDI SDK produce shared libraries, `libfabric.so.x` and `libcdisdk.so.x.x`, along with the test applications: `cdi_test`, `cdi_test_min_rx`, `cdi_test_min_tx`, and `cdi_test_unit`.
- The doc folder contains Doxygen source files used to generate the AWS CDI SDK HTML documentation.
- The documentation builds to this path: aws-cdi-sdk/build/documentation
- The include directory exposes the API to the AWS CDI SDK in C header files.
- The src directory contains the source code for the implementation of the AWS CDI SDK.
- AWS CDI SDK: `aws-cdi-sdk/src/cdi`
- Common utilities: `aws-cdi-sdk/src/common`
- Test application: `aws-cdi-sdk/src/test`
- Minimal test applications: `aws-cdi-sdk/src/test_minimal`
- Common test application utilities: `aws-cdi-sdk/src/test_common`
- The proj directory contains the Microsoft Visual Studio project solution for Windows development.
- The build directory is generated after a make of the project is performed. The build folder contains the generated libraries listed above along with the generated HTML documentation.
AWS CloudWatch is required to build the AWS CDI SDK and is provided in the AWS SDK for C++.
The AWS CLI is required to set up configuration files for AWS CloudWatch.
- Run the following command to determine if the AWS CLI is installed:
aws --version
If AWS CLI is installed, you will see a response that looks something like this:
aws-cli/2.8.9 Python/3.9.11 Linux/5.15.0-1019-aws exe/x86_64.ubuntu.22 prompt/off
- If the AWS CLI is not installed, perform the steps in install AWS CLI (version 2).
- Create an IAM user with CloudWatch and performance metrics permissions.
- Navigate to the AWS console IAM Policies.
- Select Create policy and then select JSON.
- The minimum security IAM policy is below:
- Note: You may receive an IAM Policy editor warning such as `Errors: Invalid Action` on the line with `"mediaconnect:PutMetricGroups"`; this can be safely ignored.
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "cloudwatch:PutMetricData", "mediaconnect:PutMetricGroups" ], "Resource": "*" } ] }
- To create an IAM user, click Users under Access management.
- Select Add user and provide a name and select Programmatic access.
- Select Next: Permissions and then select Create group to create a new user group.
- Put in a Group name for the new group and select the policies for the group.
- Select the policy that was made in the step above for CloudWatch access.
- In order for the AWS CDI SDK to connect to the performance metrics service, you must also add the `mediaconnect:PutMetricGroups` permission, as in the example policy above. Note: This may result in an IAM warning such as `IAM does not recognize one or more actions. The action name might include a typo or might be part of a previewed or custom service`, which can be safely ignored.
- Select Create group.
- Select Next: Tags, then select Next: Review.
- Select Create user
- Save your Access Key ID and Secret Access Key from this IAM user creation for use when configuring the AWS CLI below.
- Next, configure the AWS CLI:
aws configure
- When prompted for the Access Key and Secret Access Key, enter the keys from the IAM user you created above.
- If successful, two files are created in the `~/.aws/` directory: `config` and `credentials`. Verify they exist by using:
ls ~/.aws
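The generated files look something like the following sketch; the region and output format reflect your answers to the prompts, and the key values are placeholders:
~/.aws/config:
[default]
region = us-west-2
output = json
~/.aws/credentials:
[default]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>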
AWS SDK C++ will be compiled during the build process of AWS CDI SDK, so it is only necessary to download it.
Note: The AWS SDK for C++ is essential for metrics gathering functions of AWS CDI SDK to operate properly. Although not recommended, see these instructions to learn how to optionally disable metrics gathering.
- Verify that the necessary requirements are met and libraries installed for AWS SDK for C++.
- Download AWS SDK for C++ source code.
- Note: This procedure replaces these instructions: "Setting Up AWS SDK for C++".
- Commands to clone the AWS SDK for C++ from git for Amazon Linux 2 and Linux CentOS 7 are listed below:
cd <install_dir>
git clone --recurse-submodules https://github.com/aws/aws-sdk-cpp
- The <install_dir> should now contain the folder hierarchy as shown below:
<install_dir>/aws-cdi-sdk
<install_dir>/aws-sdk-cpp
<install_dir>/libfabric
<install_dir>/libfabric_new
Run the Makefile in aws-cdi-sdk to build the static libraries and test application in debug mode. This step also automatically builds the necessary shared library files from the AWS SDK for C++ and links them to the AWS CDI SDK, and it generates the HTML documentation. Use the debug build for ALL initial development, as it will catch many problems (e.g., asserts). However, performance testing should always be done with the release build, which is built by omitting the DEBUG=y Make option.
Note: You need to specify the location of the AWS SDK for C++ when building the AWS CDI SDK through the value of the AWS_SDK make variable.
The following commands build the DEBUG variety of the SDK:
cd aws-cdi-sdk/
make DEBUG=y AWS_SDK=<path to AWS SDK C++>
Note: A trailing `/` may be required on the path given in <path to AWS SDK C++> above. For example:
cd aws-cdi-sdk/
make DEBUG=y AWS_SDK=../aws-sdk-cpp/
Note: Pipe stdout/stderr to a log file for future review and debugging:
cd aws-cdi-sdk/
make DEBUG=y AWS_SDK=../aws-sdk-cpp/ 2>&1 | tee build.log
Note: If you experience "library not found" errors during linking, you may have to change the rpath in the Makefile from using `$$ORIGIN` to an absolute path that points to the AWS CDI SDK lib folder (e.g., <install path>/build/debug/lib).
After a successful compile, the locations for the results are at:
- Test application: `cdi_test` is placed at aws-cdi-sdk/build/debug/bin
- Minimal test applications: `cdi_test_min_tx` and `cdi_test_min_rx` are placed at aws-cdi-sdk/build/debug/bin
- AWS CDI SDK, libcdi_libfabric_api, and libfabric shared libraries: `libcdisdk.so.x.x`, `libfabric.so.x`, and `libfabric_new.so.x` are placed at aws-cdi-sdk/build/debug/lib
- HTML documentation can be found at aws-cdi-sdk/build/documentation
To disable the display of performance metrics to your Amazon CloudWatch account:
- In the file `src/cdi/configuration.h`, comment out `#define CLOUDWATCH_METRICS_ENABLED`.
Note: For the change to take effect, the CDI SDK library and related applications must be rebuilt.
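For example, after the edit the relevant line in src/cdi/configuration.h looks like this:
//#define CLOUDWATCH_METRICS_ENABLED  // Commented out to disable CloudWatch metrics.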
Applications that use AWS CDI SDK see a performance benefit when using huge pages. To enable huge pages:
- Edit `/etc/sysctl.conf` (you will likely need to use sudo with your edit command) and add the line `vm.nr_hugepages = 1024`. Note: If using more than 6 connections, you may have to use a larger value such as 2048.
- Issue the command `sudo sysctl -p`.
- Check that huge pages have updated by issuing the command `cat /proc/meminfo | grep Huge`.
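If the setting took effect, the output should include lines like the following (the values vary with the system and with the vm.nr_hugepages value used):
HugePages_Total:    1024
HugePages_Free:     1024
Hugepagesize:       2048 kB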
This section helps you to verify that the EFA interface is operational. Note: This assumes that you followed the EFA installation guide, and that both the aws-efa-installer and the CDI version of libfabric are in the following locations:
- $HOME/aws-efa-installer
- `path/to/dir/libfabric`, where `path/to/dir` is the location where you have installed the libfabric directory.
Run the following commands to verify that the EFA interface is operational, replacing path/to/dir with your actual path:
PATH=path/to/dir/libfabric/build/debug/util:$PATH
fi_info -p efa -t FI_EP_RDM
This command should return information about the Libfabric EFA interface. The following example shows the command output:
provider: efa
fabric: EFA-fe80::4dd:7eff:fe99:4620
domain: rdmap0s6-rdm
version: 2.0
type: FI_EP_RDM
protocol: FI_PROTO_EFA
If successful, proceed with the next command:
cd $HOME/aws-efa-installer
./efa_test.sh
If the EFA is working properly, the following output displays in the console:
Starting server...
Starting client...
bytes #sent #ack total time MB/sec usec/xfer Mxfers/sec
64 10 =10 1.2k 0.03s 0.05 1362.35 0.00
256 10 =10 5k 0.00s 12.86 19.90 0.05
1k 10 =10 20k 0.00s 58.85 17.40 0.06
4k 10 =10 80k 0.00s 217.29 18.85 0.05
64k 10 =10 1.2m 0.00s 717.02 91.40 0.01
1m 10 =10 20m 0.01s 2359.00 444.50 0.00
If you get an error, please review these Troubleshooting steps and check this issue: #48
Note: Although not required, the same test can be performed using the libfabric version located at `libfabric_new`.
The normal build process creates the documentation; however, it is possible to build just the documentation alone. To do this, use the following command:
make docs docs_api
After this completes, use a web browser to navigate to `build/documentation/index.html`.
To create a new instance, create an Amazon Machine Image (AMI) of the existing instance and use it to launch additional instances as described below:
- To create an AMI, select the previously created instance in the EC2 console, and then in the Action menu, select Image and Templates -> Create Image. For details, see Create AMI.
- In the EC2 console under Images, select AMIs. Locate the AMI and wait until its Status shows available; it may take several minutes for the AMI to be created and become available for use. After it becomes available, select the AMI and use the Action menu to select Launch.
- Select the same instance type as the existing instance and select Next: Configure Instance Details.
- To Configure Instance Details, Add Storage, Add Tags, and Configure Security Group, follow the steps at Create an EFA-enabled instance.
- If access to this instance from outside the Amazon network is needed, enable Auto-assign public IP.
- Make sure to enable EFA by checking the Elastic Fabric Adapter checkbox here. Note: To enable the checkbox, you must select the subnet, even if using the default subnet value.
- Amazon recommends putting EFA-enabled instances using the AWS CDI SDK in a placement group, so select or create one under Placement Group – Add instance to placement group. The Placement Group Strategy should be set to cluster.
- The new instance will contain the host name stored in the AMI and not the private IP name used by the new instance. Edit /etc/hostname. The private IP name can be obtained from the AWS Console or by using ifconfig on the instance. For example, if ifconfig shows 1.2.3.4, then the name should look something like this (the region may differ):
ip-1-2-3-4.us-west-2.compute.internal
This change requires a reboot. Depending on OS variant, there are commands that can be used to avoid a reboot. For example:
sudo hostname <new-name>
On Linux, the transmit and receive poll threads should be pinned to specific CPU cores to prevent thread starvation, which results in poor packet transmission performance and other problems.
The CPU used for the poll thread is defined when creating a new CDI connection through the AWS CDI SDK API using a configuration setting called thread_core_num. For transmit connections the value is in the CdiTxConfigData structure when using the CdiAvmTxCreate(), CdiRawTxCreate() or CdiAvmTxStreamEndpointCreate() APIs. For receive connections the value is in the CdiRxConfigData structure when using the CdiAvmRxCreate() or CdiRawRxCreate() APIs.
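As a minimal sketch, a transmit connection pinned to core 1 might be created as shown below. Only thread_core_num, CdiTxConfigData, and CdiAvmTxCreate() are named in this guide; the callback stub and the omitted fields are placeholders to be completed from the headers in the include directory.
#include "cdi_avm_api.h"
#include "cdi_core_api.h"

// User-defined callback invoked when a transmitted payload completes (stub).
static void MyTxCallback(const CdiAvmTxCbData* cb_data_ptr)
{
    (void)cb_data_ptr;
}

// Sketch: create an AVM transmit connection whose poll thread is pinned to CPU core 1.
static CdiReturnStatus CreatePinnedTxConnection(CdiConnectionHandle* ret_handle_ptr)
{
    CdiTxConfigData config_data = { 0 };
    config_data.thread_core_num = 1; // Pin this connection's poll thread to core 1 (-1 = don't pin).
    // ... fill in the adapter handle, destination IP/port, and other required fields here ...
    return CdiAvmTxCreate(&config_data, MyTxCallback, ret_handle_ptr);
}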
In addition to defining which CPU core to use, the CDI-enabled application must be launched using cset. If cset is not already installed, the installation steps shown below are for Amazon Linux 2 but should be similar for other distributions.
Note: cset cannot be used with Docker. See the next section for information on thread pinning with Docker.
- Obtain the cpuset package and install it. NOTE: It can be built from source at: https://github.com/lpechacek/cpuset/archive/refs/tags/v1.6.tar.gz
- Then run these steps:
sudo yum install -y cpuset-1.6-1.noarch.rpm
sudo yum install -y python-pip
sudo pip install future configparser
Make sure the command shown below has been run first. It moves kernel threads so that the CDI-enabled application can use a specific set of CPU cores. In the example shown below, the CDI-enabled application will use CPU cores 1-24.
sudo cset shield -k on -c 1-24
On a system with 72 CPU cores (for example), output should look something like this:
cset: --> activating shielding:
cset: moving 1386 tasks from root into system cpuset...
[==================================================]%
cset: kthread shield activated, moving 45 tasks into system cpuset...
[==================================================]%
cset: **> 35 tasks are not movable, impossible to move
cset: "system" cpuset of CPUSPEC(0,25-71) with 1396 tasks running
cset: "user" cpuset of CPUSPEC(1-24) with 0 tasks running
To run the CDI-enabled application, launch it using cset. An example command line is shown below:
sudo cset shield -e <application>
NOTE: The use of sudo requires root privileges and may not be desired for the application.
To list cpusets, use this command:
cset set -l
NOTE: If docker shows up in the list, you MUST remove it; otherwise, any of the shield commands will fail. Use this command to remove it:
sudo cset set --destroy docker
Afterwards, this is required in order to use docker again:
sudo cset shield --reset
Below are some general tips and results of experimentation when using thread pinning in CDI-enabled applications within Docker containers.
- Don't use hyper-threading. Intermittent problems will occur when hyper-threading is enabled. The first half of the available cores are the real cores and the second half are the hyper-threading cores; only use the first half.
- We tried using isolcpus to do thread pinning at the kernel level, but were unsuccessful.
- We tried using pthread_set_affinity to pin all other threads away from the CDI cores, but this caused application instability.
- We tried pinning CDI threads to a unique core, but this resulted in kernel lockups and CDI crashes.
- We tried multiple custom AffinityManager experiments, one utilizing a file-based method of pinning non-CDI threads. All efforts were unsuccessful.
- Any time we tried pinning CDI threads to fewer than three cores we ran into issues, so three is the minimum.
What worked: Once the AWS CDI SDK poll threads were pinned to specific cores, we pinned all other application threads away from those cores using the AWS CDI SDK API CdiOsThreadCreatePinned(), as sketched below.
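A sketch of that approach follows. The CdiOsThreadCreatePinned() parameter list and the CDI_THREAD/CdiThreadID names are assumptions based on the OS abstraction in include/cdi_os_api.h; verify them against your SDK version.
#include "cdi_os_api.h"

// Application worker thread body (stub) that must stay off the CDI poll-thread cores.
static CDI_THREAD MyWorkerThread(void* arg_ptr)
{
    (void)arg_ptr;
    // ... application work here ...
    return 0;
}

// Create the worker pinned to core 4, assuming the CDI poll threads use cores 1-3.
CdiThreadID thread_id;
CdiOsThreadCreatePinned(MyWorkerThread, // Thread entry point.
                        &thread_id,     // Returned thread identifier.
                        "worker",       // Thread name (used for logging).
                        NULL,           // Argument passed to the thread function.
                        NULL,           // Optional thread start signal (none).
                        4);             // CPU core to pin the thread to.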
When launching a Docker container, we typically use command lines such as the one shown below:
docker run --rm --shm-size=8g --security-opt seccomp=unconfined --cap-add net_raw --cap-add NET_ADMIN --tty --name [my_container] --tmpfs=/var/your_generic_tmp --ulimit rtprio=100 --ulimit core=-1 --cpu-rt-runtime=30645 --cap-add SYS_NICE
NOTE: The docker command shown above was run with docker version 20.10.4.
If you have additional findings, please start a Show and Tell Discussion so others may also benefit.