Add support for basic features on CentOS 8
* Enable PowerTools Repo so *-devel packages can be installed with DNF
* Install Python3 as system python for CentOS 8
* Do not enforce the kernel_devel version, because no kernel_devel package matching the running kernel release version can be found
* Install iptables
* Enable EPEL repo by default

## IntelMPI

The `environment-modules` package installation automatically creates
the `/usr/share/Modules/` folder, required by the intel_mpi recipe.

References:
* https://forums.centos.org/viewtopic.php?t=74035

## NFS

* Fix nfs logic in base_install by calling nfs::server4 recipe and providing correct idmap service name, nfs-idmapd
* Workaround to only run nfs::server instead of nfs::server4 for CentOS 8 due to issue: sous-chefs/nfs#116

## EBS

* Modify logic to get EBS device to volume id mapping. Specifically ec2_dev_2_volid.py and parallelcluster-ebsnvme-id are modified for CentOS 8 to use nvme-cli to retrieve volume id for a device following this guide: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
* parallelcluster-ebsnvme-id needs to accept the options -v/-b/-u to output volume id and block device information when called from ec2_dev_2_volid.py and attachVolume.py
* Modify the CentOS 8 specific parallelcluster-ebsnvme-id to output the correct info based on the option specified
* The CentOS 8 specific ec2_dev_2_volid.py is no longer needed and is removed, since the new parallelcluster-ebsnvme-id script accepts the -v option to output the volume id
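
The parsing described above can be sketched in plain Ruby. This is only an illustration of the two extractions the shell script performs with `grep` and `sed`, not the script itself; the sample text is a hypothetical, trimmed stand-in for `nvme id-ctrl -v <device>` output, and the method names `volume_id` and `block_device` are assumptions made for this sketch.

```ruby
# Hypothetical, trimmed sample of `nvme id-ctrl -v <device>` output.
# Real output contains many more fields; only these lines matter here.
SAMPLE_ID_CTRL = <<~OUTPUT
  NVME Identify Controller:
  sn      : vol01234567890abcdef
  mn      : Amazon Elastic Block Store
  0000: 2f 64 65 76 2f 73 64 66 20 20 20 20 20 20 20 20 "/dev/sdf..."
OUTPUT

# Read the serial number and insert '-' after 'vol' (what -v reports).
def volume_id(id_ctrl_output)
  serial = id_ctrl_output[/^sn\s+:\s+(\S+)/, 1]
  serial && serial.sub(/\Avol/, 'vol-')
end

# Read the device name from the quoted text of the vendor-specific area
# (what -b reports); with udev: true the /dev/ prefix is stripped (-u).
def block_device(id_ctrl_output, udev: false)
  name = id_ctrl_output[/^0000:.+"([\/\w]+)\.*"$/, 1]
  return nil unless name

  udev ? name.sub(%r{\A/dev/}, '') : name
end

puts volume_id(SAMPLE_ID_CTRL)
puts block_device(SAMPLE_ID_CTRL)
puts block_device(SAMPLE_ID_CTRL, udev: true)
```

Both methods return `nil` when the expected line is missing, which a caller could use to detect a device that is not an EBS NVMe volume.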

## DNS configuration

* Configure DNS settings for CentOS 8. dhclient is not enabled by default, so a modified NetworkManager config is provided to enable it. After that, the same logic as CentOS 7 can be used.

## Torque

Build Torque with the same compilation flags (`c++03` standard and `-fpermissive`) already used for Ubuntu 18 and Amazon Linux 2.

`c++03` is the 1998 ISO C++ standard plus the 2003 technical corrigendum and some additional defect reports.
Note: The compilation succeeded even without the `c++03` flag. It is added for consistency with the other OSes.
Source: https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html

`-fpermissive` downgrades some diagnostics about nonconformant code from errors to warnings.
Source: https://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/C_002b_002b-Dialect-Options.html#index-fpermissive-140

## FSx

* Use `package` in place of `yum_package` to support `dnf`.
* Added `gdisk` package, required by `update initramfs` action, called by `kernel_module 'lnet'` resource.
* Use `['platform_version'].to_f` to compare the OS version as major.minor, in place of `['platform_version'].split('.')[1].to_i`
  which only looks at the minor version, to be compatible with multiple major OS values.
* Explicitly added `x86_64` at the end of the `base_url` parameter, like we have for CentOS 7.
* Improved check for CentOS 7.5 and 7.6.
* Use `package` in place of `apt_package` to be aligned to other OSes.
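
The difference between the two version checks can be sketched in plain Ruby. The version strings below are assumed sample values in the shape CentOS reports (`major.minor.build`), and the method names are hypothetical, chosen only for this illustration.

```ruby
# Old check: looks at the minor version only, so it ignores the major one.
def repo_branch_old?(version)
  version.split('.')[1].to_i >= 7
end

# New check: String#to_f parses the leading "major.minor" part as a Float.
def repo_branch_new?(version)
  version.to_f >= 7.7
end

# Check for the releases that install Lustre from downloaded RPMs.
def rpm_branch?(version)
  [7.5, 7.6].include?(version.to_f)
end

# Assumed sample values, not taken from the commit.
%w[7.5.1804 7.6.1810 7.7.1908 8.2.2004].each do |v|
  puts format('%-9s old=%-5s new=%-5s rpm=%s',
              v, repo_branch_old?(v), repo_branch_new?(v), rpm_branch?(v))
end
```

With the old check, a CentOS 8 version with a minor number below 7 would not select the repository branch, which is the case the `to_f` comparison covers. One caveat of the Float approach is that a two-digit minor version such as `7.10` would parse as `7.1`.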

References:
* Lustre installation guide: https://docs.aws.amazon.com/fsx/latest/LustreGuide/install-lustre-client.html
* gdisk issue: https://www.spinics.net/lists/centos-devel/msg18766.html
* package resource: https://docs.chef.io/resources/package/

## EFA

Mark CentOS 8 as unsupported OS for EFA. Supported AMIs are: Amazon Linux, Amazon Linux 2,
RHEL 7.6, RHEL 7.7, RHEL 7.8, CentOS 7, Ubuntu 16.04, and Ubuntu 18.04.
Source: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-working-with.html
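
As a rough illustration, the updated support check can be exercised outside Chef by passing a plain Hash in place of the `node` object. The Hash-based signature and the sample values are assumptions for this sketch; in the cookbook the helper reads the Chef `node` directly.

```ruby
# Standalone variant of the helper: `node` is a plain Hash here (assumption).
def platform_supports_efa?(node)
  major = node['platform_version'].to_i
  [node['platform'] == 'centos' && major >= 7 && major < 8,
   node['platform'] == 'amazon',
   node['platform'] == 'ubuntu'].any?
end

# Assumed sample nodes.
centos7 = { 'platform' => 'centos', 'platform_version' => '7.8.2003' }
centos8 = { 'platform' => 'centos', 'platform_version' => '8.2.2004' }

puts platform_supports_efa?(centos7)
puts platform_supports_efa?(centos8)
```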

Signed-off-by: Enrico Usai <[email protected]>
Signed-off-by: Rex <[email protected]>
enrico-usai committed Nov 3, 2020
1 parent 75bd85b commit 505c22d
Showing 14 changed files with 245 additions and 44 deletions.
17 changes: 16 additions & 1 deletion attributes/default.rb
@@ -206,6 +206,20 @@
libical-devel postgresql-devel postgresql-server sendmail libxml2-devel libglvnd-devel mdadm python python-pip
libssh2-devel libgcrypt-devel libevent-devel glibc-static bind-utils]
end
if node['platform_version'].to_i == 8
# Install python3 instead of unversioned python
default['cfncluster']['base_packages'].delete('python')
default['cfncluster']['base_packages'].delete('python-pip')
# Install iptables used in configure-pat.sh
# Install nvme-cli package used to retrieve info about EBS volumes in parallelcluster-ebsnvme-id
default['cfncluster']['base_packages'].push(%w[python3 python3-pip iptables nvme-cli])
end
if node['platform_version'].to_i >= 8
# gdisk required for FSx
# environment-modules required for IntelMPI
default['cfncluster']['base_packages'].push('gdisk', 'environment-modules')
end

default['cfncluster']['kernel_devel_pkg']['name'] = "kernel-lt-devel" if node['platform'] == 'centos' && node['platform_version'].to_i >= 6 && node['platform_version'].to_i < 7
default['cfncluster']['rhel']['extra_repo'] = 'rhui-REGION-rhel-server-releases-optional' if node['platform'] == 'redhat' && node['platform_version'].to_i >= 6 && node['platform_version'].to_i < 7
default['cfncluster']['rhel']['extra_repo'] = 'rhui-REGION-rhel-server-optional' if node['platform'] == 'redhat' && node['platform_version'].to_i >= 7
@@ -292,7 +306,8 @@
)
default['cfncluster']['lustre']['base_url'] = value_for_platform(
'centos' => {
'>=7.7' => "https://fsx-lustre-client-repo.s3.amazonaws.com/el/7.#{get_rhel7_kernel_minor_version}/x86_64/"
'>=8' => "https://fsx-lustre-client-repo.s3.amazonaws.com/el/8/x86_64/",
'default' => "https://fsx-lustre-client-repo.s3.amazonaws.com/el/7.#{get_rhel7_kernel_minor_version}/x86_64/"
},
'ubuntu' => { 'default' => "https://fsx-lustre-client-repo.s3.amazonaws.com/ubuntu" }
)
51 changes: 51 additions & 0 deletions files/centos-8/NetworkManager.conf
@@ -0,0 +1,51 @@
# Configuration file for NetworkManager.
#
# See "man 5 NetworkManager.conf" for details.
#
# The directories /usr/lib/NetworkManager/conf.d/ and /run/NetworkManager/conf.d/
# can contain additional configuration snippets installed by packages. These files are
# read before NetworkManager.conf and have thus lowest priority.
# The directory /etc/NetworkManager/conf.d/ can contain additional configuration
# snippets. Those snippets are merged last and overwrite the settings from this main
# file.
#
# The files within one conf.d/ directory are read in asciibetical order.
#
# If /etc/NetworkManager/conf.d/ contains a file with the same name as
# /usr/lib/NetworkManager/conf.d/, the latter file is shadowed and thus ignored.
# Hence, to disable loading a file from /usr/lib/NetworkManager/conf.d/ you can
# put an empty file to /etc with the same name. The same applies with respect
# to the directory /run/NetworkManager/conf.d where files in /run shadow
# /usr/lib and are themselves shadowed by files under /etc.
#
# If two files define the same key, the one that is read afterwards will overwrite
# the previous one.

[main]
plugins = ifcfg-rh,
dhcp = dhclient


[logging]
# When debugging NetworkManager, enabling debug logging is of great help.
#
# Logfiles contain no passwords and little sensitive information. But please
# check before posting the file online. You can also personally hand over the
# logfile to a NM developer to treat it confidential. Meet us on #nm on freenode.
# Please post full logfiles except minimal modifications of private data.
#
# You can also change the log-level at runtime via
# $ nmcli general logging level TRACE domains ALL
# However, usually it's cleaner to enable debug logging
# in the configuration and restart NetworkManager so that
# debug logging is enabled from the start.
#
# You will find the logfiles in syslog, for example via
# $ journalctl -u NetworkManager
#
# Note that debug logging of NetworkManager can be quite verbose. Some messages
# might be rate-limited by the logging daemon (see RateLimitIntervalSec, RateLimitBurst
# in man journald.conf). Please disable rate-limiting before collecting debug logs.
#
#level=TRACE
#domains=ALL
6 changes: 6 additions & 0 deletions files/centos-8/ganglia-webfrontend.conf
@@ -0,0 +1,6 @@
Alias /ganglia /usr/share/ganglia

<Directory "/usr/share/ganglia">
AllowOverride All
Require all granted
</Directory>
90 changes: 90 additions & 0 deletions files/centos-8/parallelcluster-ebsnvme-id
@@ -0,0 +1,90 @@
#!/bin/bash

# Copyright (C) 2020 Amazon.com, Inc. or its affiliates.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
# OF ANY KIND, either express or implied. See the License for the
# specific language governing permissions and limitations under the
# License.

# Usage:
# Read EBS device information using nvme-cli and provide information about the volume.
display_help() {
echo "Usage: $0 [options] {device_name}" >&2
echo
echo " -v, --volume Return volume-id"
echo " -b, --block-dev Return block device mapping"
echo " -u, --udev Output data in format suitable for udev rules, i.e. /dev/sdb -> sdb"
echo " -h, --help Print usage info"
echo
}

print_all=1
print_volume=0
print_block_device_mapping=0
print_udev_format=0

# Parse arguments
for i in "$@"
do
case $i in
-h|--help)
display_help
exit 0
;;
-v|--volume)
print_volume=1
print_all=0
shift
;;
-b|--block-dev)
print_block_device_mapping=1
print_all=0
shift
;;
-u|--udev)
print_udev_format=1
print_all=0
shift
;;
*)
;;
esac
done

# Check if device argument is provided
if [[ "$#" -ne 1 ]]; then
display_help
exit 1
fi

if [[ $print_all -eq 1 || $print_volume -eq 1 ]]; then
# Sample volume info from nvme-cli:
# sn : vol01234567890abcdef
# See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
# Insert '-' after 'vol' so that output looks like vol-067f083a4f6xxxxx
vol_id=$(sudo nvme id-ctrl -v "${1}" | grep -oP "sn\s+:\s\K(.+)" | sed 's/^vol/&-/')
echo "Volume ID: ${vol_id}"
fi

if [[ $print_all -eq 1 || $print_block_device_mapping -eq 1 || $print_udev_format -eq 1 ]]; then
# Sample device name info from nvme-cli:
# 0000: 2f 64 65 76 2f 73 64 6a 20 20 20 20 20 20 20 20 "/dev/sdj..."
# See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
if [[ $print_udev_format -eq 1 ]]; then
# Strip /dev/ prefix if -u option is specified
device_name=$(sudo nvme id-ctrl -v "${1}" | grep -oP '^0000:.+"\K([\/\w]+)(?=\.*"$)' | sed "s/^\/dev\///")
else
device_name=$(sudo nvme id-ctrl -v "${1}" | grep -oP '^0000:.+"\K([\/\w]+)(?=\.*"$)')
fi

echo "${device_name}"
fi
2 changes: 1 addition & 1 deletion files/default/attachVolume.py
@@ -11,7 +11,7 @@
def convert_dev(dev):
# Translate the device name as provided by the OS to the one used by EC2
# FIXME This approach could be broken in some OS variants, see
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
if '/nvme' in dev:
return '/dev/' + os.popen('sudo /usr/local/sbin/parallelcluster-ebsnvme-id -u -b ' + dev).read().strip()
elif '/hd' in dev:
2 changes: 1 addition & 1 deletion libraries/helpers.rb
@@ -277,7 +277,7 @@ def arm_instance?
# Check if this is an OS on which EFA is supported
#
def platform_supports_efa?
[node['platform'] == 'centos' && node['platform_version'].to_i >= 7,
[node['platform'] == 'centos' && node['platform_version'].to_i >= 7 && node['platform_version'].to_i < 8,
node['platform'] == 'amazon',
node['platform'] == 'ubuntu'].any?
end
21 changes: 12 additions & 9 deletions recipes/_lustre_install.rb
@@ -17,8 +17,9 @@

return unless node['conditions']['lustre_supported']

# Install only on Centos 7.6 and 7.5
if node['platform'] == 'centos' && (5..6).cover?(node['platform_version'].split('.')[1].to_i)
if node['platform'] == 'centos' && [7.5, 7.6].include?(node['platform_version'].to_f)
# Centos 7.6 and 7.5

lustre_kmod_rpm = "#{node['cfncluster']['sources_dir']}/kmod-lustre-client-#{node['cfncluster']['lustre']['version']}.x86_64.rpm"
lustre_client_rpm = "#{node['cfncluster']['sources_dir']}/lustre-client-#{node['cfncluster']['lustre']['version']}.x86_64.rpm"

@@ -41,28 +42,30 @@
end

# Install lustre mount drivers
yum_package 'lustre_kmod' do
package 'lustre_kmod' do
source lustre_kmod_rpm
end

# Install lustre mount drivers
yum_package 'lustre_client' do
package 'lustre_client' do
source lustre_client_rpm
end

kernel_module 'lnet'
elsif node['platform'] == 'centos' && node['platform_version'].split('.')[1].to_i >= 7

elsif node['platform'] == 'centos' && node['platform_version'].to_f >= 7.7
# Centos 8 and >= 7.7

# add fsx lustre repository
yum_repository "aws-fsx" do
description "AWS FSx Packages - $basearch"
description "AWS FSx Packages - $basearch"
baseurl node['cfncluster']['lustre']['base_url']
gpgkey node['cfncluster']['lustre']['public_key']
retries 3
retry_delay 5
end

yum_package %w[kmod-lustre-client lustre-client] do
package %w[kmod-lustre-client lustre-client] do
retries 3
retry_delay 5
end
@@ -86,12 +89,12 @@

apt_update

apt_package "lustre-client-modules-#{node['kernel']['release']}" do
package "lustre-client-modules-#{node['kernel']['release']}" do
retries 3
retry_delay 5
end

apt_package "lustre-client-modules-aws" do
package "lustre-client-modules-aws" do
retries 3
retry_delay 5
end
10 changes: 8 additions & 2 deletions recipes/_update_packages.rb
@@ -19,8 +19,14 @@
# not CentOS6
case node['platform_family']
when 'rhel', 'amazon'
execute 'yum-update' do
command "yum -y update && package-cleanup -y --oldkernels --count=1"
if node['platform'] == 'centos' && node['platform_version'].to_i == 8
execute 'dnf-update' do
command "dnf -y update"
end
else
execute 'yum-update' do
command "yum -y update && package-cleanup -y --oldkernels --count=1"
end
end
when 'debian'
apt_update
1 change: 0 additions & 1 deletion recipes/base_config.rb
@@ -38,7 +38,6 @@
# EFA runtime configuration
include_recipe "aws-parallelcluster::efa_config"

# case node['cfncluster']['cfn_node_type']
case node['cfncluster']['cfn_node_type']
when 'MasterServer'
include_recipe 'aws-parallelcluster::_master_base_config'
37 changes: 24 additions & 13 deletions recipes/base_install.rb
@@ -26,12 +26,18 @@
include_recipe "yum-epel"
end


unless node['platform_version'].to_i < 7
execute 'yum-config-manager_skip_if_unavail' do
command "yum-config-manager --setopt=\*.skip_if_unavailable=1 --save"
end
end
if node['platform'] == 'centos' && node['platform_version'].to_i == 8
# Enable PowerTools Repo so *-devel packages can be installed with DNF
# Enable EPEL repos
execute 'dnf enable powertools and EPEL repos' do
command "dnf config-manager --set-enabled PowerTools && dnf install -y epel-release"
end
end

if node['platform'] == 'redhat'
execute 'yum-config-manager-rhel' do
@@ -66,18 +72,19 @@
end
end

case node['platform_family']
when 'rhel', 'amazon'
yum_package node['cfncluster']['kernel_devel_pkg']['name'] do
version node['cfncluster']['kernel_devel_pkg']['version']
retries 3
retry_delay 5
end
when 'debian'
apt_package node['cfncluster']['kernel_generic_pkg'] do
retries 3
retry_delay 5
package "install kernel packages" do
case node['platform_family']
when 'rhel', 'amazon'
package_name node['cfncluster']['kernel_devel_pkg']['name']
if node['platform'] == 'centos' && node['platform_version'].to_i < 8
# Do not enforce kernel_devel version on CentOS8 because kernel_devel package with same version as kernel release version cannot be found
version node['cfncluster']['kernel_devel_pkg']['version']
end
when 'debian'
package_name node['cfncluster']['kernel_generic_pkg']
end
retries 3
retry_delay 5
end

bash "install awscli" do
@@ -86,7 +93,7 @@
set -e
curl --retry 5 --retry-delay 5 "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
unzip awscli-bundle.zip
./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
#{node['cfncluster']['cookbook_virtualenv_path']}/bin/python awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
CLI
not_if { ::File.exist?("/usr/local/bin/aws") }
end
@@ -120,6 +127,10 @@
# FIXME: https://github.com/atomic-penguin/cookbook-nfs/issues/93
include_recipe "nfs::server"
end
if node['platform'] == 'centos' && node['platform_version'].to_i == 8
# Workaround for issue: https://github.com/atomic-penguin/cookbook-nfs/issues/116
node.force_override['nfs']['service']['idmap'] = 'nfs-idmapd'
end
include_recipe "nfs::server4"

# Put configure-pat.sh onto the host
12 changes: 12 additions & 0 deletions recipes/dns_config.rb
@@ -54,6 +54,18 @@
line "append domain-name \" #{node['cfncluster']['cfn_dns_domain']}\";"
end
end

if platform?('centos') && node['platform_version'].to_i == 8
# On CentOS8 dhclient is not enabled by default
# Put pcluster version of NetworkManager.conf in place
# dhcp = dhclient needs to be added under [main] section to enable dhclient
cookbook_file 'NetworkManager.conf' do
path '/etc/NetworkManager/NetworkManager.conf'
user 'root'
group 'root'
mode '0644'
end
end
end
restart_network_service
end
2 changes: 1 addition & 1 deletion recipes/ganglia_config.rb
@@ -62,7 +62,7 @@

# For ComputeFleet and MasterServer

if node['platform_family'] == 'rhel' && node['platform_version'].to_i == 7 || node['platform'] == 'amazon' && node['platform_version'].to_i == 2
if node['platform'] == 'centos' && node['platform_version'].to_i >= 7 || node['platform'] == 'amazon' && node['platform_version'].to_i == 2
# Fix circular dependency multi-user.target -> cloud-init-> gmond -> multi-user.target
# gmond is started by chef during cloud-init, but gmond service is configured to start after multi-user.target
# which doesn't start until cloud-init run is finished. So gmond service is stuck into starting, which keep
1 change: 1 addition & 0 deletions recipes/tests.rb
@@ -259,6 +259,7 @@
###################
if node['conditions']['efa_supported'] && node['conditions']['intel_mpi_supported']
case node['cfncluster']['os']
# TODO add centos8 once EFA package is available
when 'alinux', 'centos7', 'alinux2'
execute 'check efa rpm installed' do
command "rpm -qa | grep libfabric && rpm -qa | grep efa-"