CreatingLoadBalancerFailed on AKS cluster with advanced networking #357
Comments
Someone asked me to elaborate on the workaround. I guess they figured it out, since they deleted their comment, but here are the more detailed steps anyway:
|
Thanks for catching this, we overlooked it in docs. Will update instructions for existing VNet. |
I'm also experiencing this behavior on an AKS cluster with Advanced Networking (using a new VNet). |
I tried this and can reproduce the issue. My steps were:
I also checked the events, and noticed the following:
By the way, another mitigation that works for me is to provide a pre-created SP instead of creating a new SP in place. I will continue to investigate this and post an update. |
I'm having the same issue here, the command
Update: I fixed my issue by granting Contributor access on the subnet of the VNet to the app account nameofyoucluster-somenumbers. After that, the load balancer connection works fine, a new public IP is generated, and the service is deployed correctly (even the NSG is updated properly). |
OK, I believe the issue is that if "(new) default service principal" is chosen when creating the AKS cluster, the newly created SP only has the "Contributor" role on the created AKS resource group "MC_xxx". There is no additional role assignment for this SP on the VNet/subnet resource. When creating an external LB, this SP does not have permission to interact with the subnet, which results in the permission error we saw. I see two solutions to correct this issue programmatically:
I will bring this to the team and discuss for a solution. |
We decided to pick solution 1 from the above comment. The requests have been sent to the Portal and CLI teams. For now, please follow @nphmuller's mitigation steps: for the subnet, add the "Contributor" role for the newly created SP. |
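The mitigation above could be scripted with the Azure CLI roughly as follows. This is a minimal sketch, not the maintainers' procedure: the subscription ID, resource group, VNet, subnet, and service principal app ID are hypothetical placeholders, and `grant_role` is a helper introduced here for illustration. It prints the `az role assignment create` command by default and only runs it when `APPLY=1` is set.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholders; substitute the values for your own cluster.
SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
VNET_RG="my-vnet-rg"
VNET_NAME="my-vnet"
SUBNET_NAME="aks-subnet"
SP_APP_ID="11111111-1111-1111-1111-111111111111"

# Role to assign. Reports in this thread differ on whether Contributor
# or Owner was needed, so treat this value as an assumption.
ROLE="Contributor"

# ARM resource ID of the subnet the AKS nodes are attached to.
SCOPE="/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/${VNET_RG}/providers/Microsoft.Network/virtualNetworks/${VNET_NAME}/subnets/${SUBNET_NAME}"

# grant_role APP_ID ROLE SCOPE
# Prints the command unless APPLY=1, in which case it calls the Azure CLI.
grant_role() {
  if [ "${APPLY:-0}" = "1" ]; then
    az role assignment create --assignee "$1" --role "$2" --scope "$3"
  else
    echo "az role assignment create --assignee $1 --role $2 --scope $3"
  fi
}

grant_role "$SP_APP_ID" "$ROLE" "$SCOPE"
```

Commenters here report success at different scopes (the subnet, the whole VNet, or the resource group containing the VNet), so `SCOPE` can be shortened to the VNet or resource group ID if the subnet-level assignment is not enough.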
+1 here having this issue. |
When using the portal to create an AKS cluster with advanced networking and a custom subnet, it would help if the tooltip explicitly mentioned that the SP provided when creating the cluster needs Contributor rights on the subnet. |
I have the same issue. I followed @nphmuller's workaround (with the Owner role), deleted and re-created the service, and I get this:
|
Adding Contributor permission to the whole VNet (not only the subnet) seems to work (no more warnings), but I still cannot reach the exposed service (timeout).
I checked the NSG created by AKS (in the MC* RG) and I can see an entry allowing traffic to that IP on the correct port. I tried several services in different namespaces, even letting AKS create the public IP, but I am still unable to communicate with the service. Note that "HTTP application routing" was disabled at AKS creation time. |
So I destroyed the AKS cluster, created it again, followed the workaround (before creating any service), and now it works. Timing issue? |
I assigned the Contributor permission on the whole resource group where I created the AKS cluster and the VNet. It is working now. When you run When you run |
It worked for me with Contributor, but I assigned this permission on the resource group where my cluster and VNet are, not just on the subnet as you described. |
@lmcarreiro |
Assigning the SP the Owner role for the subnet didn't work for me; I still got the same permission error. Assigning Contributor to the resource group containing the VNet did. Since the VNet is the only resource in this RG, it might also work to assign Contributor to the VNet itself, though I didn't try that. |
The Portal has added a fix to automatically add the SP as Contributor on the subnet used when creating an AKS cluster. I confirmed the fix and will close this issue now. |
Just ran into this issue with the AKS CLI. Assigning the Owner permission to the service principal for the MC_* group and the subnet did not seem to work for our existing cluster, and the addon-http-application-routing-nginx-ingress-controller pods still seem to be in CrashLoop. (Note: we did not pass a service principal to the CLI; the az CLI created the principal for us.) |
@JunSun17 Surely the issue should remain open until the CLI has been updated as well? |
@iMartyn I've created a new AKS cluster via the CLI (version You can check it via Azure Portal by going to Virtual Networks -> YourVnet -> Subnets -> YourAksVnet -> Manage Users. The Service Principal should be in that list under |
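The same check could probably be done from the command line instead of the Portal. The sketch below is an assumption-laden illustration: the app ID and subnet resource ID are hypothetical placeholders, and `list_sp_roles` is a helper named here for convenience. By default it only prints the `az role assignment list` command; set `APPLY=1` to run it.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholders; substitute your own service principal and subnet.
SP_APP_ID="11111111-1111-1111-1111-111111111111"
SUBNET_ID="/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-vnet-rg/providers/Microsoft.Network/virtualNetworks/my-vnet/subnets/aks-subnet"

# list_sp_roles APP_ID SCOPE
# Lists the role names the service principal holds at the given scope.
# Prints the command unless APPLY=1, in which case it calls the Azure CLI.
list_sp_roles() {
  if [ "${APPLY:-0}" = "1" ]; then
    az role assignment list --assignee "$1" --scope "$2" \
      --query "[].roleDefinitionName" --output tsv
  else
    echo "az role assignment list --assignee $1 --scope $2 --query \"[].roleDefinitionName\" --output tsv"
  fi
}

list_sp_roles "$SP_APP_ID" "$SUBNET_ID"
```

If the output is empty when run for real, the service principal has no role assigned directly at that scope, which would match the permission error discussed in this thread.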
I can confirm that it does not give the SP Owner permission. There is even an error message whenever the CLI is asked to create a cluster with Advanced Networking: This was observed from two different machines on two different days and two different OSes (Mac and Linux), so I'm pretty sure it's not actually a timeout. It does seem very similar to Azure/azure-cli#5190, which was closed, but other people are saying that bug is back. |
Thanks. Seems like something broke since yesterday, but it looks like a different issue than this one. I recommend creating a new GitHub issue so the appropriate person can look at it. (Or even creating an Azure support ticket or Twitter message. You'll probably get a quicker response that way.) |
I highly doubt that, my experience of Azure support is not to the level that I would expect any kind of useful response. |
Hi all, I will pass this request to the CLI devs, and will check back a bit later to verify the issue and the fix. Thanks! |
@JunSun17 Which release will this fix be in? |
The fix should already be deployed worldwide. Please report back if your issue is not resolved. |
@logcorner Are you using advanced networking? If so, have you tried the steps in: #357 (comment) If not, can you file an incident through support? |
Repro:
kubectl run nginx --image=nginx --replicas=1 --port=80
kubectl expose deployment nginx --port=80 --target-port=80 --type=LoadBalancer
kubectl get service nginx -w: EXTERNAL-IP stuck at <pending>
kubectl describe service nginx will show the following events:
Workaround:
Manually give Owner permission (Contributor doesn't work) to the service principal for the subnet.
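To check whether the workaround took effect, the repro's service can be polled non-interactively. This is a rough sketch rather than part of the original report: `describe_ip` is a helper introduced here, and the script skips the live check when `kubectl` is not on the PATH.

```shell
#!/bin/sh
set -eu

SERVICE_NAME="nginx"  # the service created in the repro above

# describe_ip IP
# An empty value means the load balancer has not been provisioned yet.
describe_ip() {
  if [ -z "$1" ]; then
    echo "pending"
  else
    echo "assigned $1"
  fi
}

if command -v kubectl >/dev/null 2>&1; then
  # Read the external IP from the service status (empty while pending).
  IP="$(kubectl get service "$SERVICE_NAME" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || true)"
  echo "EXTERNAL-IP: $(describe_ip "$IP")"
  # Show only the events attached to this service, such as load balancer errors.
  kubectl get events \
    --field-selector "involvedObject.kind=Service,involvedObject.name=$SERVICE_NAME" || true
else
  echo "kubectl not found; skipping live check"
fi
```

After changing the role assignment, deleting and re-creating the service (as several commenters did) forces the load balancer creation to be retried.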