3.7.2: Lustre kmod modprobe breaks custom AMI based on RHEL8 #5913
Comments
A workaround could be to skip the OS upgrade during the image build by setting this config option to false: https://docs.aws.amazon.com/parallelcluster/latest/ug/Build-v3.html#Build-v3-UpdateOsPackages
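For illustration, here is a minimal sketch of what that setting might look like in a build image configuration, written out from the shell. The file path, instance type, AMI ID, and image ID are placeholders and not values from this issue; only the `UpdateOsPackages` / `Enabled` key is the option linked above.

```shell
# Placeholder output path; adjust to your environment.
config_file="${TMPDIR:-/tmp}/image-config.yaml"

cat > "$config_file" <<'EOF'
Build:
  InstanceType: c5.xlarge        # placeholder build instance type
  ParentImage: ami-EXAMPLE       # placeholder: your RHEL8 base AMI ID
  UpdateOsPackages:
    Enabled: false               # skip the OS package update during the build
EOF

# Then, with the pcluster CLI installed and AWS credentials configured
# (the image ID below is a made-up example):
#   pcluster build-image --image-id my-rhel8-image --image-configuration "$config_file"
```

With the update disabled, the image should stay on the kernel shipped in the parent AMI, which only helps if that kernel already has a matching Lustre client.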
@hgreebe The documentation says that option is false by default.
Can an option be added to the image builder to NOT include lustre/fsx support at all? Many setups do not require it and it would make it way easier to support many custom AMIs as it is the biggest sticking point in version compatibility.
Hi @coderforlife, @hgreebe suggested that @nyetsche set it to
and the Anyway, we have tracked internally the feature to avoid installing the FSx for Lustre drivers and to support updated kernels when the client is not yet available. Enrico
My organization requires using RHEL8 (a supported OS) from the privately shared Red Hat licensed base. We then use pcluster build-image to make it ready for ParallelCluster.

The pcluster build-image task has started failing for us recently. The initial AMI starts with RHEL-8.8 (I also tried 8.7), but is updated to RHEL 8.9 from the redhat-release RPM during build:

That comes from the UpdateOS section of the playbook:

The yum -y update brings every package in the OS to its most recent version, including redhat-release and kernel-*.

The failure occurs later, during a kernel_module 'lnet': https://github.com/aws/aws-parallelcluster-cookbook/blob/v3.7.2/cookbooks/aws-parallelcluster-environment/resources/lustre/partial/_install_lustre_centos_redhat.rb#L36

That is, there's no module in /lib/modules/4.18.0-513.5.1.el8_9.x86_64.

The kernel compatibility matrix in this document https://docs.aws.amazon.com/fsx/latest/LustreGuide/install-lustre-client.html indeed doesn't mention 4.18.0-513, and the upstream at https://downloads.whamcloud.com/public/lustre/latest-2.12-release/el8/client/ doesn't include it either. So I realize this is actually a Lustre packaging issue, but I'm not sure how to get in touch with the FSx for Lustre team. Even so, it'd be great to have a workaround. Right now we can't use new AMIs for compute nodes.

I'm unsure of the best way forward here - blacklist redhat-release* and/or kernel-* from the build-image process? Ignore errors from modprobe lnet?
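The two options raised in the question could be sketched roughly as below. This is only an illustration under assumptions: both function names are hypothetical, neither is part of ParallelCluster or its cookbook, and how such steps would be hooked into the build-image process is not shown. The only external feature relied on is the standard yum `--exclude` option.

```shell
# Hypothetical helpers; not part of ParallelCluster or its cookbook.

# Update all packages except the kernel and redhat-release, so the image
# stays on a kernel release that has a matching Lustre client module.
# Prints the command instead of running it unless RUN=1 is set.
update_without_kernel() {
    set -- yum -y update --exclude='kernel*' --exclude='redhat-release*'
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Return 0 if an lnet module file exists for the given kernel release.
# The optional second argument (modules root) is there to ease testing.
has_lnet_module() {
    kernel_release="$1"
    modules_root="${2:-/lib/modules}"
    moddir="$modules_root/$kernel_release"
    [ -d "$moddir" ] || return 1
    # Modules may be compressed (for example lnet.ko.xz), so match the prefix.
    [ -n "$(find "$moddir" -name 'lnet.ko*' -print 2>/dev/null | head -n 1)" ]
}

# Example: skip the modprobe instead of failing the whole build.
#   has_lnet_module "$(uname -r)" || echo "no lnet module for $(uname -r)"
```

Checking for the module file first makes the failure mode explicit, whereas ignoring the modprobe error outright would also hide unrelated load failures.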