# Packer and kitchen

* Create packer file for CentOS 8 with similar logic to CentOS 7
* Use `dnf` instead of `yum` as package manager everywhere
* Add kitchen tests for CentOS 8
* Install `python3` version of `aws-cfn-bootstrap` scripts to support CentOS 8

# Basic features

* Enable `PowerTools` repo so `*-devel` packages can be installed with DNF
* Install Python3 as system python for CentOS 8
* Do not enforce `kernel_devel` version because a `kernel_devel` package with the same version as the kernel release cannot be found
* Install `iptables`
* Enable `EPEL` repo by default

# IntelMPI

The `environment-modules` package installation automatically creates the `/usr/share/Modules/` folder, required by the intel_mpi recipe.

References:
* https://forums.centos.org/viewtopic.php?t=74035

# NFS

* Fix NFS logic in `base_install` by calling the `nfs::server4` recipe and providing the correct idmap service name, `nfs-idmapd`
* Workaround to only run `nfs::server` instead of `nfs::server4` for CentOS 8 due to issue: sous-chefs/nfs#116

# EBS

* Modify logic to get the EBS device to volume id mapping. Specifically, `ec2_dev_2_volid.py` and `parallelcluster-ebsnvme-id` are modified for CentOS 8 to use `nvme-cli` to retrieve the volume id for a device, following this guide: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
* `parallelcluster-ebsnvme-id` needs to accept the options `-v/-b/-u` to output volume id and block device information when called from `ec2_dev_2_volid.py` and `attachVolume.py`
* A CentOS 8 specific `ec2_dev_2_volid.py` is not needed, as the new `parallelcluster-ebsnvme-id` script accepts the `-v` option to output the volume id

# DNS

* Configure DNS settings for CentOS 8. Note that dhclient is not enabled by default, so a modified NetworkManager config must be provided. Afterwards the same logic as CentOS 7 can be used

# Torque

I'm using some compilation flags, like we're already doing for Ubuntu 18 and Amazon Linux 2.
* `c++03` is the 1998 ISO C++ standard plus the 2003 technical corrigendum and some additional defect reports. Note: the compilation succeeded even without the `c++03` flag; I'm adding it for consistency. Source: https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
* `-fpermissive` downgrades some diagnostics about nonconformant code from errors to warnings. Source: https://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/C_002b_002b-Dialect-Options.html#index-fpermissive-140

# FSx

* Use `package` in place of `yum_package` to support `dnf`.
* Add `gdisk` package, required by the `update initramfs` action, called by the `kernel_module 'lnet'` resource.
* Use `['platform_version'].to_f` to compare the minor version of the OS in place of `['platform_version'].split('.')[1].to_i`, to be compatible with multiple major OS values.
* Explicitly add `x86_64` at the end of the `base_url` parameter, like we have for CentOS 7.
* Improve check for CentOS 7.5 and 7.6.
* Use `package` in place of `apt_package` to be aligned with other OSes.

References:
* Lustre installation guide: https://docs.aws.amazon.com/fsx/latest/LustreGuide/install-lustre-client.html
* gdisk issue: https://www.spinics.net/lists/centos-devel/msg18766.html
* package resource: https://docs.chef.io/resources/package/

# NICE DCV

* Disable Wayland, the default display server for GNOME on CentOS 8, which is not supported by DCV
* Add default value for `dcv_port`
* Add CentOS 8 to `pcluster_dcv_connect.sh`

## SELinux notes

Default SELinux policies in RHEL 8 can lead to failures in `xdm` processes using NVIDIA drivers/libraries, so for example the `gnome-shell` and the `dcv` system agent (being children of `gdm`) can be impacted. SELinux is disabled by default, so this does not affect our installation. If SELinux is re-enabled, custom policies must be defined to grant `xdm_t` processes the permissions they need.
## Tests

* Verified gdm configuration file
* Started session and listed

## References

DCV guide: https://docs.aws.amazon.com/dcv/latest/adminguide/setting-up-installing-linux-server.html

# CloudWatch logging

* Use the RedHat ARM CloudWatch agent, since there is no official CentOS 8 ARM agent available

# Intel HPC

* Prepare recipe to install Intel HPC packages on CentOS 8
* Install Intel PSXE 2020, which does not require yum4
* Set `keepcache=True` so downloaded packages are not removed after a successful package installation
* Add retries to Intel install
* The recipes are ready but IntelHPC is not officially supported because CentOS 8 is not supported by Intel(R) Cluster Checker 2019 Update 9 (build 20200609).
  * The tool looks for `libstdc++.so.5` and is unable to detect the `libstdc++.so.6` present in CentOS 8

The current status is: the packages can be installed and the recipes are ready to be used, but Cluster Checker doesn't support CentOS 8.

## References

* https://software.intel.com/content/www/us/en/develop/tools/parallel-studio-xe/choose-download/free-trial-cluster-linux.html
* https://software.intel.com/content/www/us/en/develop/documentation/cluster-checker-user-guide/top/installation.html

# RAID

Use version-1 superblock format for RAID on CentOS 8.

Linux RAID reserves a bit of space (called a superblock) on each component device. This space holds metadata about the RAID device and allows correct assembly of the array. The Linux kernel RAID subsystem recognizes the version-0.90 and version-1 superblock formats. Old Linux kernels can only autodetect arrays with superblock version 0.90. The older version-0.90 was the default format until 2009, but it has several limitations that restrict its use on large arrays or arrays with many component devices. The newer version-1 is the default as of mdadm v3.1.1. More specifically, 1.2 is used as of v3.1.2.
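The superblock format is selected when an array is created. As a minimal sketch (not the cookbook's code, which goes through the Chef `mdadm` resource), the Python helper below only builds an `mdadm --create` argument list with an explicit `--metadata` value. The function name, device paths, and RAID level are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def mdadm_create_command(
    raid_device: str,
    members: Sequence[str],
    level: int = 0,
    metadata: str = "1.2",
) -> List[str]:
    """Build (but do not run) an `mdadm --create` argument list.

    `metadata` selects the superblock format, e.g. "0.90" or "1.2".
    """
    if not members:
        raise ValueError("at least one component device is required")
    return [
        "mdadm",
        "--create",
        raid_device,
        f"--level={level}",
        f"--raid-devices={len(members)}",
        f"--metadata={metadata}",
        *members,
    ]


if __name__ == "__main__":
    # Hypothetical device names; nothing is executed, the command is only printed.
    print(" ".join(mdadm_create_command("/dev/md0", ["/dev/nvme1n1", "/dev/nvme2n1"])))
```

Keeping the metadata version as an explicit argument makes the chosen format visible at the call site instead of relying on a tool or resource default.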
The default value for the `metadata` property of the `mdadm` Chef resource is `0.90`, which causes failures on CentOS 8. We're changing this value to `1.2`.

## References

* https://raid.wiki.kernel.org/index.php/Superblock
* https://raid.wiki.kernel.org/index.php/RAID_superblock_formats
* https://man7.org/linux/man-pages/man4/md.4.html
* https://docs.chef.io/resources/mdadm/

# SGE

## SGE code patches

1. Patch to the SGE source code to support OpenSSL 1.1, which replaces OpenSSL 1.0 in CentOS 8
2. Patch for the TCSH 3rd party library included in SGE to support newer versions of glibc
3. Patch for the qmake (gmake-based) 3rd party library to build with newer versions of automake

Source:
* OpenSSL: https://copr-dist-git.fedorainfracloud.org/cgit/loveshack/SGE/gridengine.git/tree/?h=epel8
* TCSH: https://copr-dist-git.fedorainfracloud.org/cgit/loveshack/SGE/gridengine.git/tree/sge-tcsh.patch?h=epel8
* Qmake: https://copr-dist-git.fedorainfracloud.org/cgit/loveshack/SGE/gridengine.git/tree/sge-qmake.patch?h=epel8

## Compilation flags

I'm installing the `libtirpc` and `libtirpc-devel` libraries and using compilation flags because the default path is `/usr/include/` instead of `/usr/local/include`.

To pass these compilation flags to `aimk`, `SGE_INPUT_CFLAGS` and `SGE_INPUT_LDFLAGS` must be used, as described in the aimk documentation: https://arc.liv.ac.uk/trac/SGE/browser/sge/source/README.aimk

# EFA

Mark CentOS 8 as an unsupported OS for EFA.

Supported AMIs are: Amazon Linux, Amazon Linux 2, RHEL 7.6, RHEL 7.7, RHEL 7.8, CentOS 7, Ubuntu 16.04, and Ubuntu 18.04.

Source: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-working-with.html

Signed-off-by: Enrico Usai <[email protected]>
Signed-off-by: Rex <[email protected]>