RHCOS via ibm_pi_instance timeout waiting for network #1620
Comments
@surajsub Can you have a look and update with your findings?
One possible solution would be to let users opt out of the health check if they are OK to start using the instance right away (in my case, to get the network details). If the health check is not so important, could we do away with it? With it, the instance resource takes 20+ minutes to complete. Without the health check it takes around 3-5 minutes, plus another 5-7 minutes for the SSH connection to the server, so skipping it also saves time.
Setting the status to OK is important because other functions like CPU mods rely on the LPAR being in the OK state. Let me evaluate.
I forgot to add: it will turn active when the LPAR is able to make a connection to NovaLink.
It is not working from the RHCOS point of view, where I would like the details. I am sure there could be other support scenarios. What I am saying is that TF times out, which could be avoided, since the instance status is Active but the health status is Warning. I have tested the plugin by removing the health check from here and it works fine for RHEL and RHCOS as well. Not sure about the dependencies.
Are you using cloud-init? If so, add these 3 lines to cloud-init in the right format
RHCOS works on Ignition config files.
@surajsub one of the challenges is that RHCOS (CoreOS) doesn't have rsct yet and hence the In the meantime, would it be possible to evaluate whether we can introduce a flag to ignore the health status? The default can be set to false to retain the existing behaviour, while still giving users the flexibility to skip the check when desired. For our use case we just want to get the allocated MAC and IP address of the instance so that we can use this info to set up a private DHCP server for the instances. I understand there is no easy way. Just sharing a few thoughts to start the discussion and figure out a way forward.
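The flag proposed above could look roughly like the following Go sketch. It is not the provider's actual code: the type `instanceState`, the functions `isReady` and `waitForInstance`, the `ignoreHealth` parameter, and the status strings are all hypothetical names chosen for illustration. It only shows the idea of a wait loop that, by default, requires both an active instance and an OK health status, and optionally accepts an active instance regardless of health.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// instanceState is a hypothetical snapshot of an LPAR as reported by the API.
// The field names and status strings are assumptions for this sketch.
type instanceState struct {
	Status string // e.g. "BUILD", "ACTIVE"
	Health string // e.g. "WARNING", "OK"
}

// isReady reports whether the wait loop may stop. With ignoreHealth set to
// false (the proposed default) the existing behaviour is kept: the instance
// must be ACTIVE and its health must be OK.
func isReady(s instanceState, ignoreHealth bool) bool {
	if s.Status != "ACTIVE" {
		return false
	}
	return ignoreHealth || s.Health == "OK"
}

// waitForInstance polls fetch until the instance is ready or the attempts
// are exhausted, sleeping for delay between polls.
func waitForInstance(fetch func() (instanceState, error), ignoreHealth bool, attempts int, delay time.Duration) (instanceState, error) {
	var last instanceState
	for i := 0; i < attempts; i++ {
		s, err := fetch()
		if err != nil {
			return s, err
		}
		last = s
		if isReady(s, ignoreHealth) {
			return s, nil
		}
		time.Sleep(delay)
	}
	return last, errors.New("timed out waiting for instance")
}

func main() {
	// Simulated RHCOS instance: it becomes ACTIVE but health stays WARNING.
	fetch := func() (instanceState, error) {
		return instanceState{Status: "ACTIVE", Health: "WARNING"}, nil
	}

	_, err := waitForInstance(fetch, false, 3, time.Millisecond)
	fmt.Println("default behaviour:", err)

	s, err := waitForInstance(fetch, true, 3, time.Millisecond)
	fmt.Println("ignoring health:", s.Status, err)
}
```

Keeping the default at false means existing configurations that depend on the OK state (such as the CPU modifications mentioned earlier in the thread) would behave as before.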
Absolutely. I'm working on a fix and testing it. Give me a day or two please.
Code fix is in. Waiting to be merged by the Cloud team.
Awesome. Thanks @surajsub
Cool. Let me know if there are additional issues. Appreciate the patience.
Thanks @surajsub for the patch. I tried it today and it works perfectly for our use case. There is another issue I am facing after this fix. I am trying to create around 6 LPARs in parallel and get the error below for a random number of instances.
But the instance is actually created and running in WARNING status. To re-create them I have to delete them from the console manually, since TF does not pick them up. Let me know if you want me to create another issue for this.
I have the actual error message printed after recompiling the code...
I have seen this issue happen with cloud, and sometimes with PowerVC as well. This is because if we provision multiple instances, PowerVC chokes at the back end and the cloud API responds with the context deadline exceeded message.
Thanks for confirming. I am trying on "frankfurt1". Just to add, I have tried multiple deployments on PowerVC with similar automation where we create around 10 VMs in one go, and have never seen such an issue there. However, I have seen "context deadline exceeded" errors in one of the private OpenStack setups.
@yussufsh There will be other users provisioning in the same region as well, so it could very well be that we are hitting the limits.
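If the back end is hitting limits under parallel provisioning, one general mitigation is to cap how many create requests are in flight at once. With the Terraform CLI, the `-parallelism` flag on `terraform apply` controls this. The Go sketch below illustrates the same idea with a counting semaphore; `runLimited` and the LPAR names are hypothetical, the create callback is a placeholder, and whether a lower limit would actually avoid the "context deadline exceeded" errors in this region is an untested assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// runLimited calls create once per name, with at most limit calls running
// at the same time, and returns the error (or nil) recorded for each name.
func runLimited(names []string, limit int, create func(string) error) map[string]error {
	if limit < 1 {
		limit = 1 // a zero-capacity semaphore would block forever
	}
	results := make(map[string]error, len(names))
	var mu sync.Mutex
	var wg sync.WaitGroup
	sem := make(chan struct{}, limit) // counting semaphore

	for _, name := range names {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when done

			err := create(n)
			mu.Lock()
			results[n] = err
			mu.Unlock()
		}(name)
	}
	wg.Wait()
	return results
}

func main() {
	names := []string{"lpar-1", "lpar-2", "lpar-3", "lpar-4", "lpar-5", "lpar-6"}
	results := runLimited(names, 2, func(n string) error {
		// Placeholder for the real provisioning call.
		return nil
	})
	for _, n := range names {
		fmt.Println(n, "error:", results[n])
	}
}
```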
I presume the original issue has been resolved. Can you close this out please?
Thanks a lot for helping us get this done!
Hi team,
I am trying to ignite an RHCOS instance using the Power Systems resources. When creating the resource, it times out waiting for a network connection.
Console:
I am trying to configure a DHCP server to provide an address to the machine. But for that to work I need the network information of the instance, such as ip and macaddress. Terraform won't allow me to fetch these details because the instance resource is not completed. The status of the instance is in the 'Warning' state and never turns 'Active'.
Terraform Version
Terraform v0.12.20
Affected Resource(s)
Please list the resources as a list, for example:
Terraform Configuration Files
Debug Output
RHCOS node times out waiting for network.
Panic Output
Expected Behavior
The resource needs to complete so that the network information can be read and fed to an internal DHCP server.
Actual Behavior
Steps to Reproduce
Important Factoids
References