Add CentOS 8 support
# Packer and kitchen

* Create a Packer file for CentOS 8 with logic similar to CentOS 7
* Use `dnf` instead of `yum` as the package manager everywhere
* Add kitchen tests for CentOS 8
* Install the `python3` version of the `aws-cfn-bootstrap` scripts to support CentOS 8

# Basic features

* Enable `PowerTools` repo so `*-devel` packages can be installed with DNF
* Install Python 3 as the system Python on CentOS 8
* Do not pin the `kernel_devel` version, because a `kernel_devel` package matching the running kernel release version cannot be found
* Install `iptables`
* Enable `EPEL` repo by default
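The repository bullets above boil down to a few `dnf` invocations. The sketch below only builds the command lines (it does not run them), assuming the `config-manager` subcommand from `dnf-plugins-core` is available and that the repository id is `PowerTools` as written above; the exact id can differ between CentOS 8 point releases. The helper name is hypothetical.

```python
import shlex
from typing import List


def centos8_repo_setup_commands(powertools_repo_id: str = "PowerTools") -> List[List[str]]:
    """Hypothetical helper: build (not run) the dnf commands for PowerTools and EPEL."""
    return [
        # `dnf config-manager` is provided by the dnf-plugins-core package.
        ["dnf", "install", "-y", "dnf-plugins-core"],
        # Enable PowerTools so that *-devel packages can be resolved by dnf.
        ["dnf", "config-manager", "--set-enabled", powertools_repo_id],
        # EPEL is enabled by installing its release package.
        ["dnf", "install", "-y", "epel-release"],
    ]


if __name__ == "__main__":
    for argv in centos8_repo_setup_commands():
        print(shlex.join(argv))
```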

# IntelMPI

Installing the `environment-modules` package automatically creates
the `/usr/share/Modules/` folder required by the `intel_mpi` recipe.

References:
* https://forums.centos.org/viewtopic.php?t=74035

# NFS

* Fix the NFS logic in `base_install` by calling the `nfs::server4` recipe and providing the correct idmap service name, `nfs-idmapd`
* Work around sous-chefs/nfs#116 by running only `nfs::server` instead of `nfs::server4` on CentOS 8
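The CentOS 8 workaround is a simple platform check. A minimal Python sketch of that decision follows; the function is hypothetical, and the real logic lives in the Chef cookbook.

```python
def nfs_server_recipe(platform: str, platform_version: str) -> str:
    """Hypothetical: pick the NFS server recipe for a given platform and version."""
    major = int(platform_version.split(".")[0])
    if platform == "centos" and major == 8:
        # Workaround for sous-chefs/nfs#116: nfs::server4 does not work on CentOS 8.
        return "nfs::server"
    return "nfs::server4"


# idmap service name given in the commit message.
NFS_IDMAP_SERVICE = "nfs-idmapd"
```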

# EBS

* Modify the logic that maps EBS devices to volume IDs.
  Specifically, `ec2_dev_2_volid.py` and `parallelcluster-ebsnvme-id` are modified for CentOS 8
  to use `nvme-cli` to retrieve the volume ID of a device, following this guide:
  https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
* `parallelcluster-ebsnvme-id` needs to accept the `-v/-b/-u` options to output volume ID and block device
  information when called from `ec2_dev_2_volid.py` and `attachVolume.py`
* A CentOS 8-specific `ec2_dev_2_volid.py` is not needed, as the new `parallelcluster-ebsnvme-id` script accepts the `-v` option to output the volume ID
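Per the AWS guide linked above, `nvme id-ctrl` on an EBS NVMe device reports the volume ID in the `sn` (serial number) field without the dash, e.g. `vol01234567890abcdef`. The sketch below parses that field from the command's text output; the function name and the sample output are illustrative assumptions, not the actual `parallelcluster-ebsnvme-id` code.

```python
import re
from typing import Optional


def volume_id_from_id_ctrl(output: str) -> Optional[str]:
    """Hypothetical: extract the EBS volume ID from `nvme id-ctrl` text output."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "sn":
            serial = value.strip()
            # EBS reports the serial as "vol" followed by hex digits, without the dash.
            match = re.fullmatch(r"vol-?([0-9a-f]+)", serial)
            if match:
                return "vol-" + match.group(1)
    return None


if __name__ == "__main__":
    # Illustrative output, modeled on the AWS documentation example.
    sample = "NVME Identify Controller:\nvid     : 0x1d0f\nsn      : vol01234567890abcdef\n"
    print(volume_id_from_id_ctrl(sample))
```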

# DNS

* Configure DNS settings for CentOS 8. Note that dhclient is not enabled by default, so a modified NetworkManager configuration must be provided. After that, the same logic as CentOS 7 can be used
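NetworkManager selects its DHCP client through the `dhcp` key of the `[main]` section, and it reads drop-in `*.conf` files from `/etc/NetworkManager/conf.d/`. A minimal sketch of such a drop-in follows, assuming a hypothetical file name and that the `dhclient` binary is installed; it is not the cookbook's actual template.

```python
from pathlib import Path

# Hypothetical drop-in file name.
NM_DROPIN = Path("/etc/NetworkManager/conf.d/99-dhclient.conf")


def render_dhclient_dropin() -> str:
    """Return a NetworkManager drop-in that makes dhclient the DHCP client."""
    return "[main]\ndhcp=dhclient\n"


def write_dhclient_dropin(path: Path = NM_DROPIN) -> None:
    # NetworkManager must be restarted afterwards (e.g. `systemctl restart NetworkManager`).
    path.write_text(render_dhclient_dropin())
```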

# Torque

I'm passing some compilation flags, as we already do for Ubuntu 18.04 and Amazon Linux 2.

* `-std=c++03` selects the 1998 ISO C++ standard plus the 2003 technical corrigendum and some additional defect reports.
  Note: the compilation succeeded even without the `c++03` flag; I'm adding it for consistency.
  Source: https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
* `-fpermissive` downgrades some diagnostics about nonconformant code from errors to warnings.
  Source: https://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/C_002b_002b-Dialect-Options.html#index-fpermissive-140
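The commit does not show how the flags reach the compiler. One common approach, assumed here, is to append them to `CXXFLAGS` before running Torque's `configure`. The hypothetical helper below builds such an environment without duplicating flags that are already set.

```python
import os
import shlex
from typing import Dict, Optional

# Flags named in the commit message.
TORQUE_CXX_FLAGS = ["-std=c++03", "-fpermissive"]


def torque_build_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical: return a copy of `base` with the Torque flags appended to CXXFLAGS."""
    env = dict(os.environ if base is None else base)
    existing = shlex.split(env.get("CXXFLAGS", ""))
    extra = [flag for flag in TORQUE_CXX_FLAGS if flag not in existing]
    env["CXXFLAGS"] = shlex.join(existing + extra)
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` when invoking `./configure`.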

# FSx

* Use `package` in place of `yum_package` to support `dnf`.
* Add the `gdisk` package, required by the `update initramfs` action called by the `kernel_module 'lnet'` resource.
* Use `['platform_version'].to_f` in place of `['platform_version'].split('.')[1].to_i` to compare the OS minor version,
  so that the comparison also works across different major versions.
* Explicitly add `x86_64` at the end of the `base_url` parameter, as we do for CentOS 7.
* Improve the check for CentOS 7.5 and 7.6.
* Use `package` in place of `apt_package` to align with the other OSes.
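Ruby's `String#to_f` parses only the leading decimal number, so `'7.6.1810'.to_f` is `7.6`. The Python sketch below emulates that behavior (both helper names are hypothetical) to show why the old `split('.')[1].to_i` check cannot tell CentOS 7.2 from 8.2, while the float comparison can.

```python
import re


def ruby_to_f(version: str) -> float:
    """Approximate Ruby's String#to_f on a dotted version: parse the leading decimal."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", version)
    return float(match.group(1)) if match else 0.0


def minor_via_split(version: str) -> int:
    """Previous approach: ['platform_version'].split('.')[1].to_i"""
    return int(version.split(".")[1])
```

One limitation of comparing floats is that a two-digit minor version would compare incorrectly (`7.10` parses as `7.1`).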

References:
* Lustre installation guide: https://docs.aws.amazon.com/fsx/latest/LustreGuide/install-lustre-client.html
* gdisk issue: https://www.spinics.net/lists/centos-devel/msg18766.html
* package resource: https://docs.chef.io/resources/package/

# NICE DCV

* Disable Wayland, the display server used by default by GDM (GNOME Display Manager) on CentOS 8, because it is not supported by DCV
* Add a default value for `dcv_port`
* Add CentOS 8 to `pcluster_dcv_connect.sh`
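GDM on CentOS 8 reads `/etc/gdm/custom.conf`, and setting `WaylandEnable=false` in its `[daemon]` section makes it fall back to Xorg. The sketch below applies that setting with Python's `configparser`; the helper is hypothetical and not the cookbook's implementation. Note that `configparser` drops comments when it rewrites the file.

```python
import configparser
import io


def disable_wayland(custom_conf: str) -> str:
    """Hypothetical: return GDM custom.conf content with WaylandEnable=false in [daemon]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case: GDM expects "WaylandEnable"
    parser.read_string(custom_conf)
    if not parser.has_section("daemon"):
        parser.add_section("daemon")
    parser.set("daemon", "WaylandEnable", "false")
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```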

## SELinux notes
Default SELinux policies in RHEL 8 can cause failures in `xdm` processes that use NVIDIA drivers/libraries,
so, for example, `gnome-shell` and the `dcv` system agent (both children of `gdm`) can be affected.
SELinux is disabled by default, so this doesn't impact our installation.
If SELinux is re-enabled, custom policies must be defined to grant `xdm_t` processes the permissions they need.

## Tests
* Verified the gdm configuration file
* Started a session and listed it

## References
DCV guide: https://docs.aws.amazon.com/dcv/latest/adminguide/setting-up-installing-linux-server.html

# CloudWatch logging

* Use the Red Hat ARM CloudWatch agent, since no official CentOS 8 ARM agent is available

# Intel HPC

* Prepare the recipe to install Intel HPC packages on CentOS 8
* Install Intel PSXE 2020, which does not require yum4
* Set `keepcache=True` so that downloaded packages are not removed after a successful installation
* Add retries to the Intel install
* The recipes are ready, but Intel HPC is not officially supported
  because CentOS 8 is not supported by Intel(R) Cluster Checker 2019 Update 9 (build 20200609).
* The tool looks for `libstdc++.so.5` and does not detect the `libstdc++.so.6` present in CentOS 8
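Chef resources expose retry behavior through the common `retries` and `retry_delay` properties. The Python sketch below only illustrates those semantics (run an action, retry a fixed number of extra times with a pause, re-raise the last error); it is a hypothetical helper, not the recipe's code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], retries: int = 3, delay_seconds: float = 5.0) -> T:
    """Run `action`, retrying up to `retries` extra times if it raises."""
    for attempt in range(retries + 1):
        try:
            return action()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")
```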

Current status: the packages can be installed and the
recipes are ready to use, but Cluster Checker doesn't support CentOS 8.

## References
* https://software.intel.com/content/www/us/en/develop/tools/parallel-studio-xe/choose-download/free-trial-cluster-linux.html
* https://software.intel.com/content/www/us/en/develop/documentation/cluster-checker-user-guide/top/installation.html

# RAID

Use the version-1 superblock format for RAID on CentOS 8.

Linux RAID reserves a bit of space (called a superblock) on each component device.
This space holds metadata about the RAID device and allows correct assembly of the array.

The Linux kernel RAID subsystem recognizes the version-0.90 and version-1 superblock formats.

Old Linux kernels can only autodetect arrays with superblock version 0.90.
The older version-0.90 format was the default until 2009, but it has several limitations
that restrict its use on large arrays or arrays with many component devices.
The newer version-1 format is the default as of mdadm v3.1.1; more specifically, 1.2 is used as of v3.1.2.

The default value for the `metadata` property of the Chef `mdadm` resource is `0.90`,
which causes failures on CentOS 8. We're changing this value to `1.2`.
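The Chef resource's `metadata` property maps onto the `--metadata` option of `mdadm`. The hypothetical helper below builds (without running) an equivalent `mdadm --create` command line with an explicit `1.2` superblock; the device names in the usage are placeholders.

```python
from typing import List, Sequence


def mdadm_create_argv(md_device: str, level: int, devices: Sequence[str],
                      metadata: str = "1.2") -> List[str]:
    """Hypothetical: build an `mdadm --create` command with an explicit superblock version."""
    if not devices:
        raise ValueError("at least one component device is required")
    return [
        "mdadm", "--create", md_device,
        f"--level={level}",
        f"--raid-devices={len(devices)}",
        # Explicit version-1.2 superblock instead of the resource's 0.90 default.
        f"--metadata={metadata}",
        *devices,
    ]


if __name__ == "__main__":
    print(" ".join(mdadm_create_argv("/dev/md0", 0, ["/dev/nvme1n1", "/dev/nvme2n1"])))
```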

## References
* https://raid.wiki.kernel.org/index.php/Superblock
* https://raid.wiki.kernel.org/index.php/RAID_superblock_formats
* https://man7.org/linux/man-pages/man4/md.4.html
* https://docs.chef.io/resources/mdadm/

# SGE

## SGE code patches
1. Patch the SGE source code to support OpenSSL 1.1, which replaces OpenSSL 1.0 in CentOS 8
1. Patch the TCSH 3rd-party library bundled with SGE to support newer versions of glibc
1. Patch the qmake (gmake-based) 3rd-party component bundled with SGE to build with newer versions of automake

Source:
* OpenSSL: https://copr-dist-git.fedorainfracloud.org/cgit/loveshack/SGE/gridengine.git/tree/?h=epel8
* TCSH: https://copr-dist-git.fedorainfracloud.org/cgit/loveshack/SGE/gridengine.git/tree/sge-tcsh.patch?h=epel8
* Qmake: https://copr-dist-git.fedorainfracloud.org/cgit/loveshack/SGE/gridengine.git/tree/sge-qmake.patch?h=epel8

## Compilation flags
I'm installing the libtirpc and libtirpc-devel libraries and passing compilation flags, because their default path
is `/usr/include/` instead of `/usr/local/include`.

To pass these compilation flags to `aimk`, you must use `SGE_INPUT_CFLAGS` and `SGE_INPUT_LDFLAGS`,
as described in the aimk documentation: https://arc.liv.ac.uk/trac/SGE/browser/sge/source/README.aimk
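A minimal sketch of the `aimk` build environment follows, assuming the libtirpc headers live in `/usr/include/tirpc` and that linking needs `-ltirpc`. Both values are assumptions, not taken from the recipe, and the helper name is hypothetical.

```python
import os
from typing import Dict, Optional

# Assumed flags for libtirpc (headers under /usr/include/tirpc on CentOS 8).
TIRPC_CFLAGS = "-I/usr/include/tirpc"
TIRPC_LDFLAGS = "-ltirpc"


def aimk_build_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical: return a copy of `base` with the aimk input flags set."""
    env = dict(os.environ if base is None else base)
    env["SGE_INPUT_CFLAGS"] = TIRPC_CFLAGS
    env["SGE_INPUT_LDFLAGS"] = TIRPC_LDFLAGS
    return env
```

The resulting mapping can be passed as `env=` to `subprocess.run` when invoking `aimk`.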

# EFA

Mark CentOS 8 as an unsupported OS for EFA. Supported AMIs are: Amazon Linux, Amazon Linux 2,
RHEL 7.6, RHEL 7.7, RHEL 7.8, CentOS 7, Ubuntu 16.04, and Ubuntu 18.04.
Source: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-working-with.html
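The hypothetical check below encodes the supported list quoted above using the OS identifiers from `build_ami.sh`; RHEL is omitted because it is not one of those identifiers. It is a sketch, not the cookbook's actual check.

```python
# OS identifiers as used by build_ami.sh; CentOS 8 is intentionally absent.
EFA_SUPPORTED_OS = frozenset({"alinux", "alinux2", "centos7", "ubuntu1604", "ubuntu1804"})


def efa_supported(os_name: str) -> bool:
    """Hypothetical: return True if EFA can be installed on the given OS."""
    return os_name in EFA_SUPPORTED_OS
```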

Signed-off-by: Enrico Usai <[email protected]>
Signed-off-by: Rex <[email protected]>
rexcsn authored and enrico-usai committed Nov 4, 2020
1 parent de61252 commit 7e5bf7a
Showing 29 changed files with 1,038 additions and 83 deletions.
61 changes: 61 additions & 0 deletions .kitchen.cloud.yml
@@ -273,6 +273,67 @@ platforms:
transport:
username: centos
ssh_key: <%= ENV['EC2_SSH_KEY_PATH'] %>
+- name: centos-8-minimal
+driver_plugin: ec2
+driver_config:
+image_id: <%= ENV['CENTOS8_IMAGE_ID'] %>
+block_device_mappings:
+- device_name: /dev/sda1
+ebs:
+volume_size: <%= ENV['VOLUME_SIZE'] || 25 %>
+volume_type: gp2
+delete_on_termination: true
+- device_name: /dev/xvdba
+virtual_name: ephemeral0
+- device_name: /dev/xvdbb
+virtual_name: ephemeral1
+- device_name: /dev/xvdbc
+virtual_name: ephemeral2
+- device_name: /dev/xvdbd
+virtual_name: ephemeral3
+- device_name: /dev/xvdbe
+virtual_name: ephemeral4
+- device_name: /dev/xvdbf
+virtual_name: ephemeral5
+- device_name: /dev/xvdbg
+virtual_name: ephemeral6
+- device_name: /dev/xvdbh
+virtual_name: ephemeral7
+- device_name: /dev/xvdbi
+virtual_name: ephemeral8
+- device_name: /dev/xvdbj
+virtual_name: ephemeral9
+- device_name: /dev/xvdbk
+virtual_name: ephemeral10
+- device_name: /dev/xvdbl
+virtual_name: ephemeral11
+- device_name: /dev/xvdbm
+virtual_name: ephemeral12
+- device_name: /dev/xvdbn
+virtual_name: ephemeral13
+- device_name: /dev/xvdbo
+virtual_name: ephemeral14
+- device_name: /dev/xvdbp
+virtual_name: ephemeral15
+- device_name: /dev/xvdbq
+virtual_name: ephemeral16
+- device_name: /dev/xvdbr
+virtual_name: ephemeral17
+- device_name: /dev/xvdbs
+virtual_name: ephemeral18
+- device_name: /dev/xvdbt
+virtual_name: ephemeral19
+- device_name: /dev/xvdbu
+virtual_name: ephemeral20
+- device_name: /dev/xvdbv
+virtual_name: ephemeral21
+- device_name: /dev/xvdbw
+virtual_name: ephemeral22
+- device_name: /dev/xvdbx
+virtual_name: ephemeral23
+transport:
+username: centos
+ssh_key: <%= ENV['EC2_SSH_KEY_PATH'] %>
- name: ubuntu-16-04-lts
driver_plugin: ec2
driver_config:
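The 24 instance-store entries in the `centos-8-minimal` platform follow a regular pattern: `/dev/xvdba` through `/dev/xvdbx` map to `ephemeral0` through `ephemeral23`. As an illustration only, the hypothetical helper below regenerates that list, which can help check that no entry was skipped; the kitchen file itself lists the entries explicitly.

```python
import string
from typing import Dict, List


def ephemeral_mappings(count: int = 24) -> List[Dict[str, str]]:
    """Hypothetical: generate /dev/xvdb<letter> -> ephemeral<N> block device mappings."""
    if not 0 <= count <= 26:
        raise ValueError("only single-letter suffixes a..z are supported")
    return [
        {"device_name": f"/dev/xvdb{letter}", "virtual_name": f"ephemeral{index}"}
        for index, letter in enumerate(string.ascii_lowercase[:count])
    ]
```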
10 changes: 5 additions & 5 deletions amis/build_ami.sh
@@ -16,7 +16,7 @@
#
# Usage: build_ami.sh --os <os> --region <region> --partition <partition> [--public] [--custom]
# [--build-date <build-date>] [--arch <arch>]
-# os: the os to build (supported values: all|centos6|centos7|alinux|alinux2|ubuntu1604|ubuntu1804)
+# os: the os to build (supported values: all|centos6|centos7|centos8|alinux|alinux2|ubuntu1604|ubuntu1804)
# partition: partition to build in (supported values: commercial|govcloud|china|region)
# region: region to copy ami too (supported values: all|us-east-1|us-gov-west-1|...)
# custom: specifies to create the AMI from a custom AMI-id, which must be specified by variable CUSTOM_AMI_ID in the environment (optional)
@@ -114,7 +114,7 @@ parse_options() {
check_options() {
set -e

-available_arm_os="ubuntu1804 alinux2" # subset of supported OSes for which ARM AMIs are available
+available_arm_os="ubuntu1804 alinux2 centos8" # subset of supported OSes for which ARM AMIs are available
available_os="centos6 centos7 alinux ubuntu1604 ${available_arm_os}"
cwd="$(dirname $0)"
export COOKBOOK_PATH="$(cd ${cwd}/..; pwd)"
@@ -184,8 +184,8 @@ check_options() {
esac

# Ensure the specified architecture-OS combination is valid
-if [ "${_arch}" == "arm64" ] && [[ "${_os}" =~ ^centos[0-9]+ ]]; then
-echo "Currently there are no CentOS arm64 AMIs available."
+if [ "${_arch}" == "arm64" ] && [[ "${_os}" =~ ^centos[6-7]$ ]]; then
+echo "Currently there are no arm64 AMIs available for ${_os}."
exit 1
elif [ "${_arch}" == "arm64" ] && [ "${_os}" == "alinux" ]; then
echo "Currently there are no alinux (AL1) arm64 AMIs available."
@@ -222,7 +222,7 @@ do_command() {
RC=$?
done
;;
-centos6|centos7|alinux|ubuntu1604|ubuntu1804|alinux2)
+centos6|centos7|centos8|alinux|ubuntu1604|ubuntu1804|alinux2)
packer build -color=false -var-file="${cwd}/packer_variables.json" -only=${only} "${cwd}/packer_${_os}.json"
RC=$?
;;
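The arch/OS guard changed in `check_options()` can be mirrored in Python to see which combinations it now accepts. This sketch covers only the two branches visible in the diff, and the function name is hypothetical. The old pattern `^centos[0-9]+` also rejected `centos8`, while the new `^centos[6-7]$` lets it through.

```python
import re


def arm64_combination_valid(arch: str, os_name: str) -> bool:
    """Hypothetical mirror of the visible arch/OS checks in build_ami.sh."""
    if arch != "arm64":
        return True
    if re.fullmatch(r"centos[6-7]", os_name):
        return False  # no arm64 AMIs for centos6/centos7
    if os_name == "alinux":
        return False  # no alinux (AL1) arm64 AMIs
    return True
```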
