Linux Installation Guide

Installation instructions for the AWS Cloud Digital Interface (CDI) SDK on Linux instances.

In addition to filing AWS CDI SDK bugs/issues, please use the discussion pages for Q&A, Ideas, Show and Tell or other General topics so the whole community can benefit.



Upgrading from previous releases

Upgrading from CDI SDK 2.4 or earlier

  • You must install the latest EFA driver and then REBOOT the system.
  • You must also download and install a second version of libfabric, which requires rdma-core. See the libfabric step in the Install AWS CDI SDK section.

Note: When adding CDI-SDK libraries to your application's build process, do not link directly against the libfabric libraries. Link only against libcdisdk.so.x; the CDI SDK loads the libfabric libraries dynamically.
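One quick way to confirm that your own application follows this rule is to inspect its load-time dependencies with ldd. The helper below is a sketch, not part of the SDK (the name check_no_libfabric_link is hypothetical). It only sees libraries resolved when the program starts, which is the point here, because the CDI SDK loads libfabric itself at run time.

```shell
# Hypothetical helper: fail if a binary pulls in libfabric at load time.
# Only libcdisdk.so.x should appear among your application's dependencies;
# the CDI SDK loads the libfabric libraries dynamically on its own.
check_no_libfabric_link() {
    if ldd "$1" 2>/dev/null | grep -q 'libfabric'; then
        echo "ERROR: $1 loads libfabric at startup; link only against libcdisdk" >&2
        return 1
    fi
    echo "OK: $1 does not link libfabric"
}
```

For example: check_no_libfabric_link ./my_cdi_app (where my_cdi_app stands for your own binary).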


Create an EFA enabled instance

Follow the steps in create an EFA-enabled instance.

Install EFA driver

For Linux installations, follow step 3 in launch an Elastic Fabric Adapter (EFA)-capable instance, with the following additions to the step Install the EFA software:

Note: EFA installer version v1.22.1 is the last installer version that supports Ubuntu 18.04 LTS (Bionic Beaver), because this distribution reaches the end of its five-year standard support in May 2023. For details, see the Ubuntu lifecycle documentation. Future versions of the EFA installer will not support Ubuntu 18.04 LTS.

  • During Connect to the instance you launched, once your instance has booted, you can find the public IP you requested earlier by clicking on the instance and looking for “IPv4 Public IP” in the pane below your instance list. Use that IP address to SSH to your new instance.

    • If you cannot connect (connection times out), you may have forgotten to add an SSH rule to your security group, or you may need to set up an internet gateway for your Virtual Private Cloud (VPC) and add a route to your subnet. You can find more information about setting up SSH access and connecting to the instance at accessing Linux instances.
    • The default user name for Amazon Linux 2 instances is ec2-user, on CentOS it’s centos, and on Ubuntu, it’s ubuntu.
  • During Install the EFA software, install the minimal version of the EFA software using the command shown below. This does not install libfabric; the AWS CDI SDK uses its own versions.

    sudo ./efa_installer.sh -y --minimal

    Note: The SDK may be installed on a system that does not contain an EFA adapter. However, such a system can only be used for building EFA-enabled applications or for testing applications that use the socket adapters. Use this command to skip the EFA kernel module installation and device verification:

    sudo ./efa_installer.sh -y --minimal --skip-kmod --no-verify

  • During Confirm that the EFA software components were successfully installed, note that the fi_info command does not work when only the minimal EFA software is installed. You will perform this check later, after installing the AWS CDI SDK.
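If you script instance setup, the choice between the two installer command lines above can be automated. This is a minimal sketch, not part of the EFA installer. The helper name efa_installer_args is hypothetical, and it assumes that EFA adapters appear in sysfs with the Amazon PCI vendor ID 0x1d0f and a device ID beginning with 0xefa, which you should confirm on your instance type.

```shell
# Hypothetical helper: print the efa_installer.sh arguments to use, depending
# on whether an EFA PCI device is present. The PCI root can be overridden.
efa_installer_args() {
    local pci_root="${1:-/sys/bus/pci/devices}"
    local dev vendor device
    for dev in "$pci_root"/*; do
        # Skip entries that do not look like PCI device directories.
        [ -r "$dev/vendor" ] && [ -r "$dev/device" ] || continue
        vendor=$(cat "$dev/vendor")
        device=$(cat "$dev/device")
        # Assumption: EFA uses Amazon's vendor ID 0x1d0f and a 0xefa* device ID.
        if [ "$vendor" = "0x1d0f" ]; then
            case "$device" in
                0xefa*) printf '%s\n' "-y --minimal"; return 0 ;;
            esac
        fi
    done
    # No EFA adapter found: skip the kernel module and device verification.
    printf '%s\n' "-y --minimal --skip-kmod --no-verify"
}
```

For example: sudo ./efa_installer.sh $(efa_installer_args). The substitution is intentionally unquoted so that the flags become separate arguments.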

Install Package Dependencies

Installation of dependent packages is required before building the AWS CDI SDK:

  • CentOS 7 and Amazon Linux 2:

    sudo yum update -y
    sudo yum -y install gcc-c++ make cmake3 curl-devel openssl-devel autoconf automake libtool doxygen ncurses-devel unzip git
  • Rocky Linux 8:

    sudo dnf update -y
    sudo dnf install epel-release -y
    sudo dnf config-manager --set-enabled powertools
    sudo dnf -y install gcc-c++ make cmake3 curl-devel openssl-devel autoconf automake libtool doxygen ncurses-devel unzip git
  • Rocky Linux 9:

    sudo dnf update -y
    sudo dnf install epel-release -y
    sudo dnf config-manager --set-enabled crb
    sudo dnf -y install gcc-c++ make cmake3 curl-devel openssl-devel autoconf automake libtool doxygen ncurses-devel unzip git
  • Ubuntu:

    sudo apt update
    sudo apt-get upgrade -y
    sudo apt-get -y install build-essential libncurses-dev autoconf automake libtool cmake doxygen libcurl4-openssl-dev libssl-dev uuid-dev zlib1g-dev libpulse-dev unzip git
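The correct dependency list depends on the distribution. The following sketch reports which of the recipes above applies. The helper cdi_deps_recipe is hypothetical, and it assumes the standard /etc/os-release fields ID and VERSION_ID.

```shell
# Hypothetical helper: map os-release to one of the dependency recipes above.
# Prints: yum | dnf-powertools | dnf-crb | apt | unsupported
cdi_deps_recipe() {
    local os_release="${1:-/etc/os-release}" id version
    # os-release uses KEY=value shell syntax; source it in a subshell so the
    # variables do not leak. Pass a path that contains a slash.
    id=$(. "$os_release" && printf '%s' "${ID:-}")
    version=$(. "$os_release" && printf '%s' "${VERSION_ID:-}")
    version=${version%%.*}   # keep the major version, e.g. "8.7" becomes "8"
    case "$id:$version" in
        amzn:2|centos:7) echo yum ;;
        rocky:8)         echo dnf-powertools ;;
        rocky:9)         echo dnf-crb ;;
        ubuntu:*)        echo apt ;;
        *)               echo unsupported ;;
    esac
}
```

You could then branch on the result, for example case "$(cdi_deps_recipe)" in apt) ... ;; esac, and run the matching commands from the list above.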

Install AWS CDI SDK

  1. Download AWS CDI SDK from GitHub.

    Note: Instructions to install git can be found at Getting Started Installing Git.

    Caution: Do not install a new version of the AWS CDI SDK over an old one.

    mkdir <install_dir>
    cd <install_dir>
    git clone https://github.com/aws/aws-cdi-sdk
  2. Download the two libfabric versions. The libfabric folder holds libfabric v1.9, which is required to support AWS CDI SDK versions prior to 3.x.x. The libfabric_new folder holds a libfabric version later than v1.9, which is required to support AWS CDI SDK versions 3.x.x and later.

    git clone --single-branch --branch v1.9.x-cdi https://github.com/aws/libfabric libfabric
    git clone --single-branch --branch v1.15.2 https://github.com/ofiwg/libfabric libfabric_new

    Note: libfabric_new also requires the development version of rdma-core v27 or later, which is installed as part of the EFA Driver installation described above using efa_installer.sh.
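The clone commands from steps 1 and 2 can be wrapped so that they refuse to run over an existing checkout, following the caution in step 1. This is a hypothetical convenience function, not something shipped with the SDK. It uses only the repositories and branches listed above.

```shell
# Hypothetical helper: clone the AWS CDI SDK and both libfabric trees into
# <install_dir>, refusing to touch a directory that already has a checkout.
clone_cdi_sources() {
    local install_dir="$1" d
    for d in aws-cdi-sdk libfabric libfabric_new; do
        if [ -e "$install_dir/$d" ]; then
            echo "ERROR: $install_dir/$d already exists; use a new <install_dir>" >&2
            return 1
        fi
    done
    mkdir -p "$install_dir" || return 1
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$install_dir" &&
        git clone https://github.com/aws/aws-cdi-sdk &&
        git clone --single-branch --branch v1.9.x-cdi https://github.com/aws/libfabric libfabric &&
        git clone --single-branch --branch v1.15.2 https://github.com/ofiwg/libfabric libfabric_new
    )
}
```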

The <install_dir> should now contain the folder hierarchy as shown below:

  <install_dir>/aws-cdi-sdk
  <install_dir>/libfabric
  <install_dir>/libfabric_new
  • libfabric is a customized version of the open-source libfabric project based on libfabric v1.9.x.
  • libfabric_new is a mainline version of the open-source libfabric project.
  • aws-cdi-sdk is the directory that contains the source code for the AWS CDI SDK and its test applications. The contents of the AWS CDI SDK include the following directories: doc, include, src, and proj.
    • The root folder contains an overall Makefile that builds libfabric, the AWS CDI SDK, the test applications, and the Doxygen-generated HTML documentation. The build of libfabric and the AWS CDI SDK produces shared libraries, libfabric.so.x and libcdisdk.so.x.x, along with the test applications: cdi_test, cdi_test_min_rx, cdi_test_min_tx, and cdi_test_unit.
      • The doc folder contains Doxygen source files used to generate the AWS CDI SDK HTML documentation.
        • The documentation builds to this path: aws-cdi-sdk/build/documentation
      • The include directory exposes the API to the AWS CDI SDK in C header files.
      • The src directory contains the source code for the implementation of the AWS CDI SDK.
        • AWS CDI SDK: aws-cdi-sdk/src/cdi
        • Common utilities: aws-cdi-sdk/src/common
        • Test application: aws-cdi-sdk/src/test
        • Minimal test applications: aws-cdi-sdk/src/test_minimal
        • Common test application utilities: aws-cdi-sdk/src/test_common
      • The proj directory contains the Microsoft Visual Studio project solution for Windows development.
      • The build directory is created when the project is built with make. It contains the generated libraries listed above along with the generated HTML documentation.

Install AWS CloudWatch and AWS CLI

The AWS CloudWatch client library is required to build the AWS CDI SDK and is provided by the AWS SDK for C++.

Install AWS CLI

AWS CLI is required to set up configuration files for AWS CloudWatch.

  1. Run the following command to determine if AWS CLI is installed:

    aws --version

    If AWS CLI is installed, you will see a response that looks something like this:

    aws-cli/2.8.9 Python/3.9.11 Linux/5.15.0-1019-aws exe/x86_64.ubuntu.22 prompt/off
  2. If AWS CLI is not installed, perform the steps in install AWS CLI (version 2).

  3. Create an IAM User with CloudWatch and performance metrics permissions.

    • Navigate to the AWS console IAM Policies

      • Select Create policy and then select JSON.
      • The minimum IAM policy required is shown below:
      • Note: You may receive an IAM Policy editor warning such as: Errors: Invalid Action on the line with "mediaconnect:PutMetricGroups", which can be safely ignored.
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": [
                      "cloudwatch:PutMetricData",
                      "mediaconnect:PutMetricGroups"
                  ],
                  "Resource": "*"
              }
          ]
      }
    • To create an IAM user, select Users under Access management.

      • Select Add user and provide a name and select Programmatic access.
      • Select Next: Permissions and then select Create group to create a new user group.
      • Enter a Group name for the new group and select the policies for the group.
        • Select the policy that was made in the step above for CloudWatch access.
        • In order for the AWS CDI SDK to be able to connect to the performance metrics service, you must also add mediaconnect:PutMetricGroups permission as per the example policy above. Note: This may result in an IAM warning such as: IAM does not recognize one or more actions. The action name might include a typo or might be part of a previewed or custom service, which can be safely ignored.
        • Select Create group
          • Select Next: Tags, then select Next: Review.
          • Select Create user
      • Save your Access Key ID and Secret Access Key from this IAM User creation for use in step 5.
  4. Next, configure AWS CLI:

    aws configure
  5. When prompted for the Access Key ID and Secret Access Key, enter the keys from the IAM user you created in step 3.

  6. If successful, two files are created in the ~/.aws/ directory: config and credentials. Verify they exist by using:

    ls ~/.aws
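After step 6, a short check can confirm both the CLI and the configuration files in one pass. This is a sketch that assumes the default ~/.aws location unless another directory is passed. The helper name check_aws_cli_setup is hypothetical.

```shell
# Hypothetical helper: confirm the AWS CLI is on PATH and that `aws configure`
# created both files (default directory: ~/.aws). Returns non-zero on problems.
check_aws_cli_setup() {
    local aws_dir="${1:-$HOME/.aws}" f status=0
    if ! command -v aws >/dev/null 2>&1; then
        echo "MISSING: aws CLI is not on PATH"
        status=1
    fi
    for f in config credentials; do
        if [ ! -f "$aws_dir/$f" ]; then
            echo "MISSING: $aws_dir/$f (run 'aws configure')"
            status=1
        fi
    done
    return "$status"
}
```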

Download AWS SDK

AWS SDK C++ will be compiled during the build process of AWS CDI SDK, so it is only necessary to download it.

Note: The AWS SDK for C++ is essential for metrics gathering functions of AWS CDI SDK to operate properly. Although not recommended, see these instructions to learn how to optionally disable metrics gathering.

  1. Verify that the requirements for the AWS SDK for C++ are met and that its required libraries are installed.
  2. Download AWS SDK for C++ source code.
    • Note: This procedure replaces these instructions: "Setting Up AWS SDK for C++".

    • The commands to clone the AWS SDK for C++ from GitHub for Amazon Linux 2 and CentOS 7 are listed below:

      cd <install_dir>
      git clone --recurse-submodules https://github.com/aws/aws-sdk-cpp

The <install_dir> should now contain the folder hierarchy as shown below:

<install_dir>/aws-cdi-sdk
<install_dir>/aws-sdk-cpp
<install_dir>/libfabric
<install_dir>/libfabric_new
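To double-check this layout before building, you could use a helper like the one below. It is a hypothetical sketch that assumes the directory names used in the clone commands above, and it lists any of the four expected directories that are missing.

```shell
# Hypothetical helper: report any expected source directory missing from
# <install_dir>. Prints nothing and returns 0 when the layout is complete.
check_cdi_layout() {
    local install_dir="$1" d missing=0
    for d in aws-cdi-sdk aws-sdk-cpp libfabric libfabric_new; do
        if [ ! -d "$install_dir/$d" ]; then
            echo "missing: $install_dir/$d"
            missing=1
        fi
    done
    return "$missing"
}
```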

Build CDI libraries and test applications

Run the Makefile in aws-cdi-sdk to build the libraries and test applications in debug mode. This step also automatically builds the necessary shared library files from the AWS SDK for C++, links them to the AWS CDI SDK, and generates the HTML documentation. Use the debug build for ALL initial development because it catches many problems (e.g., asserts). However, always do performance testing with the release build, which is built by omitting the DEBUG=y make option.

Note: You need to specify the location of AWS SDK C++ when building AWS CDI SDK through the value of the AWS_SDK make variable.

The following commands build the DEBUG variety of the SDK:

cd aws-cdi-sdk/
make DEBUG=y AWS_SDK=<path to AWS SDK C++>

Note: A trailing / may be required on the path given in <path to AWS SDK C++> above. For example:

cd aws-cdi-sdk/
make DEBUG=y AWS_SDK=../aws-sdk-cpp/

Note: Pipe the StdOut/Err to a log file for future review/debug:

cd aws-cdi-sdk/
make DEBUG=y AWS_SDK=../aws-sdk-cpp/ 2>&1 | tee build.log
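Because the AWS_SDK value and the DEBUG=y option are easy to mistype, it can be convenient to generate the make arguments. This minimal sketch uses a hypothetical helper, cdi_make_args. It always adds the trailing / mentioned above and omits DEBUG=y for a release build.

```shell
# Hypothetical helper: print make arguments for a debug or release build,
# normalizing the AWS SDK C++ path to end with exactly one "/".
cdi_make_args() {
    local mode="$1" sdk="${2%/}"
    case "$mode" in
        debug)   printf 'DEBUG=y AWS_SDK=%s/\n' "$sdk" ;;
        release) printf 'AWS_SDK=%s/\n' "$sdk" ;;
        *)       echo "usage: cdi_make_args debug|release <path to AWS SDK C++>" >&2
                 return 1 ;;
    esac
}
```

For example: make $(cdi_make_args debug ../aws-sdk-cpp) 2>&1 | tee build.log. The substitution is unquoted so the arguments split, which assumes the path contains no spaces.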

Note: If you experience library-not-found errors during linking, you may have to change the rpath in the Makefile from $$ORIGIN to an absolute path that points to the AWS CDI SDK lib folder (i.e., <install path>/build/debug/lib).
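When diagnosing library-not-found errors, it can help to see which dependencies the loader cannot resolve. The sketch below uses ldd (the helper name missing_libs is hypothetical). It reports only load-time dependencies, not the libfabric libraries that the CDI SDK loads dynamically.

```shell
# Hypothetical helper: print the names of shared libraries that the dynamic
# loader cannot resolve for a binary. Empty output means all were found.
missing_libs() {
    # ldd prints "libname => not found" for each unresolved dependency.
    ldd "$1" 2>/dev/null | awk '/not found/ {print $1}'
}
```

For example, run missing_libs build/debug/bin/cdi_test from the aws-cdi-sdk directory. To see the rpath that was actually linked into the binary, run readelf -d build/debug/bin/cdi_test | grep -E 'RPATH|RUNPATH'.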

After a successful compile, the locations for the results are at:

  • Test application: cdi_test is placed at aws-cdi-sdk/build/debug/bin
  • Minimal test applications: cdi_test_min_tx and cdi_test_min_rx are placed at aws-cdi-sdk/build/debug/bin
  • Shared libraries: the AWS CDI SDK library libcdisdk.so.x.x, libcdi_libfabric_api, and the libfabric libraries libfabric.so.x and libfabric_new.so.x are placed at aws-cdi-sdk/build/debug/lib.
  • HTML documentation can be found at aws-cdi-sdk/build/documentation
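Here is a sketch of a quick post-build check of these locations. The helper is hypothetical, and because the library version suffix varies, it matches libcdisdk.so.* with a glob.

```shell
# Hypothetical helper: confirm the debug build produced the test applications
# and the CDI SDK shared library. Default build dir assumes you are in <install_dir>.
check_cdi_build_outputs() {
    local build_dir="${1:-aws-cdi-sdk/build/debug}" f status=0
    for f in bin/cdi_test bin/cdi_test_min_tx bin/cdi_test_min_rx; do
        # Each test application should exist and be executable.
        [ -x "$build_dir/$f" ] || { echo "missing: $build_dir/$f"; status=1; }
    done
    # The version suffix varies between releases, so match it with a glob.
    if ! ls "$build_dir"/lib/libcdisdk.so.* >/dev/null 2>&1; then
        echo "missing: $build_dir/lib/libcdisdk.so.*"
        status=1
    fi
    return "$status"
}
```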

(Optional) Disable the display of performance metrics to your Amazon CloudWatch account

To disable the display of performance metrics to your Amazon CloudWatch account:

  • In the file src/cdi/configuration.h, comment out #define CLOUDWATCH_METRICS_ENABLED.

Note: For the change to take effect, the CDI SDK library and related applications must be rebuilt.
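If you prefer to script this edit, a single sed command is enough. This sketch assumes that the define appears at the start of its own line in src/cdi/configuration.h and that a sed with -i support (GNU or BusyBox) is available. The helper name is hypothetical.

```shell
# Hypothetical helper: comment out CLOUDWATCH_METRICS_ENABLED in place.
# Safe to run twice: an already commented line no longer matches "^#define".
disable_cloudwatch_metrics() {
    local config_h="${1:-src/cdi/configuration.h}"
    sed -i 's|^#define CLOUDWATCH_METRICS_ENABLED|// #define CLOUDWATCH_METRICS_ENABLED|' "$config_h"
}
```

Run it from the aws-cdi-sdk directory, then rebuild with the same make command you used before.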


Enable huge pages

Applications that use AWS CDI SDK see a performance benefit when using huge pages. To enable huge pages:

  1. Edit /etc/sysctl.conf (you will likely need to use sudo with your edit command) and add the line vm.nr_hugepages = 1024

     Note: If using more than 6 connections, you may have to use a larger value such as 2048.
  2. Issue the command sudo sysctl -p
  3. Verify that the huge page settings took effect by issuing the command cat /proc/meminfo | grep Huge.
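The edit in step 1 can be made idempotent so that re-running a setup script does not add duplicate lines. The sketch below uses two hypothetical helpers. The first updates or appends the vm.nr_hugepages entry, and the second reads HugePages_Total back from /proc/meminfo. With the default 2 MiB huge page size on x86_64, 1024 pages reserve 2 GiB of memory and 2048 pages reserve 4 GiB.

```shell
# Hypothetical helper: set vm.nr_hugepages in a sysctl file, replacing an
# existing entry instead of appending a duplicate.
set_nr_hugepages() {
    local file="$1" pages="$2"
    if grep -q '^[[:space:]]*vm\.nr_hugepages[[:space:]]*=' "$file" 2>/dev/null; then
        # Replace the existing entry in place.
        sed -i "s/^[[:space:]]*vm\.nr_hugepages[[:space:]]*=.*/vm.nr_hugepages = $pages/" "$file"
    else
        # No entry yet: append one.
        echo "vm.nr_hugepages = $pages" >> "$file"
    fi
}

# Print HugePages_Total from a meminfo-format file (default /proc/meminfo).
hugepages_total() {
    awk '/^HugePages_Total:/ {print $2}' "${1:-/proc/meminfo}"
}
```

Because /etc/sysctl.conf is owned by root, run set_nr_hugepages /etc/sysctl.conf 1024 from a root shell, then sudo sysctl -p. After that, hugepages_total should print 1024 if the kernel was able to reserve all of the pages.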

Validate the EFA environment

This section helps you verify that the EFA interface is operational. Note: This assumes that you followed the EFA installation guide and that both the aws-efa-installer and the CDI version of libfabric are in the following locations:

  • $HOME/aws-efa-installer
  • path/to/dir/libfabric, where path/to/dir is the location where you have installed the libfabric directory.

Run the following commands to verify that the EFA interface is operational, replacing path/to/dir with your actual path:

PATH=path/to/dir/libfabric/build/debug/util:$PATH
fi_info -p efa -t FI_EP_RDM

This command should return information about the Libfabric EFA interface. The following example shows the command output:

provider: efa
    fabric: EFA-fe80::4dd:7eff:fe99:4620
    domain: rdmap0s6-rdm
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_EFA
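To run this check in a script instead of reading the output by eye, you can filter the fi_info output. This sketch uses a hypothetical helper, has_efa_rdm, and assumes the key: value layout shown above.

```shell
# Hypothetical helper: read fi_info output on stdin and succeed only if an
# "efa" provider entry lists the FI_EP_RDM endpoint type.
has_efa_rdm() {
    awk '
        /provider:[ \t]*efa[ \t]*$/             { in_efa = 1; next }
        /provider:/                             { in_efa = 0 }
        in_efa && /type:[ \t]*FI_EP_RDM[ \t]*$/ { found = 1 }
        END { exit !found }
    '
}
```

For example: fi_info -p efa -t FI_EP_RDM | has_efa_rdm && echo "EFA provider found"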

If successful, proceed with the next command:

cd $HOME/aws-efa-installer
./efa_test.sh

If the EFA is working properly, the following output displays in the console:

Starting server...
Starting client...
bytes   #sent   #ack     total       time     MB/sec    usec/xfer   Mxfers/sec
64      10      =10      1.2k        0.03s      0.05    1362.35       0.00
256     10      =10      5k          0.00s     12.86      19.90       0.05
1k      10      =10      20k         0.00s     58.85      17.40       0.06
4k      10      =10      80k         0.00s    217.29      18.85       0.05
64k     10      =10      1.2m        0.00s    717.02      91.40       0.01
1m      10      =10      20m         0.01s   2359.00     444.50       0.00

If you get an error, please review these Troubleshooting steps and check this issue: #48

Note: Although not required, the same test can be performed using the libfabric version located at libfabric_new.


Build the HTML documentation

The normal build process creates the documentation; however, you can also build the documentation on its own with the following command:

make docs docs_api

After this completes, use a web browser to navigate to build/documentation/index.html.


Creating additional instances

To create a new instance, create an Amazon Machine Image (AMI) of the existing instance and use it to launch additional instances as described below:

  1. To create an AMI, select the previously created instance in the EC2 console, and then in the Action menu, select Image and Templates -> Create Image. For details see Create AMI.

  2. In the EC2 console, under Images, select AMIs. Locate the AMI and wait until its Status shows available. It may take several minutes for the AMI to be created and become available for use. After it becomes available, select the AMI and use the Action menu to select Launch.

  3. Select the same instance type as the existing instance and select Next: Configure Instance Details.

  4. To Configure Instance Details, Add Storage, Add Tags and Configure Security Group follow the steps at Create an EFA-enabled instance.

  5. If access to this instance from outside the Amazon network is needed, enable Auto-assign public IP.

  6. Make sure to enable EFA by checking the Elastic Fabric Adapter checkbox here. Note: To enable the checkbox, you must select the subnet even if using the default subnet value.

  7. Amazon recommends putting EFA-enabled instances using AWS CDI SDK in a placement group, so select or create one under Placement Group – Add instance to placement group. The Placement Group Strategy should be set to cluster.

  8. The new instance keeps the host name stored in the AMI rather than the private IP name of the new instance, so edit /etc/hostname to set the correct name. You can obtain the private IP address from the AWS Console or by running ifconfig on the instance. For example, if ifconfig shows 1.2.3.4, the name should look something like this (the region may differ):

    ip-1-2-3-4.us-west-2.compute.internal
    

    This change requires a reboot. Depending on the OS variant, there are commands you can use to avoid a reboot. For example:

    sudo hostname <new-name>
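You can compute the expected name from the private IP address. This sketch uses a hypothetical helper, ec2_private_hostname, and follows the EC2 IP-based naming pattern shown above. In us-east-1, the domain is ec2.internal rather than <region>.compute.internal.

```shell
# Hypothetical helper: build the EC2 IP-based private host name for an IPv4
# address and region, e.g. 1.2.3.4 in us-west-2.
ec2_private_hostname() {
    local ip="$1" region="$2" name
    name="ip-$(printf '%s' "$ip" | tr . -)"
    if [ "$region" = "us-east-1" ]; then
        printf '%s.ec2.internal\n' "$name"
    else
        printf '%s.%s.compute.internal\n' "$name" "$region"
    fi
}
```

For example: name=$(ec2_private_hostname 1.2.3.4 us-west-2); echo "$name" | sudo tee /etc/hostname; sudo hostname "$name"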
    

Pinning AWS CDI SDK Poll Threads to Specific CPU Cores

On Linux, the transmit and receive poll threads should be pinned to specific CPU cores in order to prevent thread starvation resulting in poor packet transmission performance and other problems.

The CPU used for the poll thread is defined when creating a new CDI connection through the AWS CDI SDK API using a configuration setting called thread_core_num. For transmit connections the value is in the CdiTxConfigData structure when using the CdiAvmTxCreate(), CdiRawTxCreate() or CdiAvmTxStreamEndpointCreate() APIs. For receive connections the value is in the CdiRxConfigData structure when using the CdiAvmRxCreate() or CdiRawRxCreate() APIs.

In addition to defining which CPU core to use, you must launch the CDI-enabled application using cset. If cset is not already installed, follow the steps shown below. They are written for Amazon Linux 2 but should be similar for other distributions.

Note: cset cannot be used with Docker. See the next section for information on thread pinning with Docker.

  1. Obtain the cpuset package and install it. NOTE: The package can also be built from source at: https://github.com/lpechacek/cpuset/archive/refs/tags/v1.6.tar.gz

  2. Then run these steps:

    sudo yum install -y cpuset-1.6-1.noarch.rpm
    sudo yum install -y python-pip
    sudo pip install future configparser
    

Before launching the application, make sure the command shown below has been run. It moves kernel threads off the selected cores so that the CDI-enabled application can use a dedicated set of CPU cores. In the example shown below, the CDI-enabled application will use CPU cores 1-24.

sudo cset shield -k on -c 1-24

On a system with 72 CPU cores (for example), output should look something like this:

cset: --> activating shielding:
cset: moving 1386 tasks from root into system cpuset...
[==================================================]%
cset: kthread shield activated, moving 45 tasks into system cpuset...
[==================================================]%
cset: **> 35 tasks are not movable, impossible to move
cset: "system" cpuset of CPUSPEC(0,25-71) with 1396 tasks running
cset: "user" cpuset of CPUSPEC(1-24) with 0 tasks running
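The output shows that every CPU outside the shielded range is left in the "system" cpuset. To plan a shield, you can compute the remaining set with a sketch like the one below. The helper system_cpuset is hypothetical, and it assumes CPUs are numbered 0 to N-1 with a single contiguous shielded range.

```shell
# Hypothetical helper: given the total CPU count and the FIRST-LAST range
# passed to `cset shield -c`, print the CPUs left in the "system" cpuset.
system_cpuset() {
    local ncpu="$1" first="$2" last="$3" below="" above=""
    # CPUs numbered below the shielded range.
    if [ "$first" -eq 1 ]; then
        below="0"
    elif [ "$first" -gt 1 ]; then
        below="0-$((first - 1))"
    fi
    # CPUs numbered above the shielded range.
    if [ "$last" -eq $((ncpu - 2)) ]; then
        above="$((ncpu - 1))"
    elif [ "$last" -lt $((ncpu - 2)) ]; then
        above="$((last + 1))-$((ncpu - 1))"
    fi
    if [ -n "$below" ] && [ -n "$above" ]; then
        printf '%s,%s\n' "$below" "$above"
    else
        printf '%s%s\n' "$below" "$above"
    fi
}
```

For example, system_cpuset 72 1 24 prints 0,25-71, which matches the CPUSPEC reported above.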

To run the CDI enabled application, launch it using cset. An example command line is shown below:

sudo cset shield -e <application>

NOTE: The use of sudo requires root privileges and may not be desired for the application.

Additional Notes/Commands when using cset

Display current cpusets

To list cpusets, use this command:

cset set -l

NOTE: If docker shows up in the list, you MUST remove it; otherwise, any of the shield commands will fail. Use this command to remove it:

sudo cset set --destroy docker
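In a setup script, you can combine the check and the removal. This sketch uses a hypothetical helper, has_docker_cpuset, and assumes that cset set -l prints one cpuset per row with the name in the first column.

```shell
# Hypothetical helper: read `cset set -l` output on stdin and succeed if a
# cpuset named "docker" is listed (name assumed to be the first column).
has_docker_cpuset() {
    awk '$1 == "docker" { found = 1 } END { exit !found }'
}
```

For example: if cset set -l | has_docker_cpuset; then sudo cset set --destroy docker; fi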

Disable Thread Pinning (stop the shield)

This is required in order to use docker:

sudo cset shield --reset

Thread pinning applications running within Docker Containers

Below are some general tips and results of experimentation when using thread-pinning in CDI enabled applications within Docker containers.

  • Don't use hyper-threading. Intermittent problems will occur when hyper-threading is enabled. The first half of the available cores are the real cores, and the second half are the hyper-threading cores. Only use the first half.
  • We tried using isolcpus to do thread pinning at the kernel level, but were unsuccessful.
  • We tried using pthread_setaffinity_np() to pin all other threads away from the CDI cores, but this caused application instability.
  • We tried pinning CDI threads to a unique core, but this resulted in kernel lockups and CDI crashes.
  • We tried multiple custom AffinityManager experiments, one of which used the file-based method of pinning non-CDI threads. All efforts were unsuccessful.
  • Any time we tried pinning CDI threads to less than three cores we ran into issues, so three is the minimum.

What worked: Once the AWS CDI SDK poll-threads were pinned to specific cores, we pinned all other application threads away from those cores using the AWS CDI SDK API CdiOsThreadCreatePinned().
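You can check the two numeric rules above (at least three cores, and only the first half of the logical CPUs) before choosing thread_core_num values. This sketch assumes hyper-threading is enabled and numbered as described above. The helper check_cdi_core_range is hypothetical.

```shell
# Hypothetical helper: validate a CDI core range FIRST-LAST against the tips
# above, given the number of logical CPUs (hyper-threading assumed enabled).
check_cdi_core_range() {
    local ncpu="$1" first="$2" last="$3" count physical
    count=$((last - first + 1))
    physical=$((ncpu / 2))   # assumption: the first half are the physical cores
    if [ "$count" -lt 3 ]; then
        echo "need at least 3 cores for CDI threads, got $count" >&2
        return 1
    fi
    if [ "$last" -ge "$physical" ]; then
        echo "core $last is in the hyper-threaded half; use cores below $physical" >&2
        return 1
    fi
    return 0
}
```

For example, check_cdi_core_range "$(nproc --all)" 1 24 succeeds on a 72-CPU instance. A range of only two cores, or one that extends into the upper half, is rejected.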

Launching Docker Containers

When launching a Docker container, we typically use command lines such as the one shown below:

docker run --rm --shm-size=8g --security-opt seccomp=unconfined \
    --cap-add net_raw --cap-add NET_ADMIN --tty --name [my_container] \
    --tmpfs=/var/your_generic_tmp --ulimit rtprio=100 --ulimit core=-1 \
    --cpu-rt-runtime=30645 --cap-add SYS_NICE

NOTE: The docker command shown above was run with docker version 20.10.4.

If you have additional findings, please start a Show and Tell Discussion so others may also benefit.

[Return to README]