Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support shared LoadBalancerIP for multiple Services #6480

Merged
merged 1 commit into from
Jun 28, 2024

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Jun 24, 2024

Users want to share LoadBalancerIP between multiple Services when they face external IP shortage. It's possible to do it when the Services sharing an IP meet the requirements:

  • The Services use different ports
  • The Services use the Cluster external traffic policy, or they have identical Endpoints.

However, the ability of using any IP that is already allocated to another Service may incur a security risk that a Service can "steal" LoadBalancer traffic intended for another Service.

To support the use case without introducing the security risk, we use the annotation service.antrea.io/allow-shared-load-balancer-ip: true on Services to restrict IPs that can be shared. Services without the annotation will continue to have their LoadBalancerIPs exclusively used. Services with the annotation can share an IP between themselves when requesting the same IP.

Ideally, we should also check if the Services meet the first two requirements before assigning the IP to them. However, it's difficult to prevent Services from being changed to not meet the requirements after they get the IP assigned. Therefore, we assume that users using the feature know how to configure Services properly.

Fixes #4309

@tnqn tnqn added the action/release-note Indicates a PR that should be included in release notes. label Jun 24, 2024
@tnqn tnqn force-pushed the shared-ip branch 3 times, most recently from bcb27d8 to 9fc3e4e Compare June 25, 2024 10:52
@tnqn
Copy link
Member Author

tnqn commented Jun 25, 2024

/test-all

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only minor comments and questions from me
Are you planning on adding an e2e test for the "normal" case in a follow-up PR?

pkg/agent/types/annotations.go Outdated Show resolved Hide resolved
@@ -30,6 +30,9 @@ const (
// ServiceExternalIPPoolAnnotationKey is the key of the Service annotation that specifies the Service's desired external IP pool.
ServiceExternalIPPoolAnnotationKey string = "service.antrea.io/external-ip-pool"

// ServiceAllowSharedIPAnnotationKey is the key of the Service annotation that specifies whether the Service is allowed to use shared LoadBalancer IP.
ServiceAllowSharedIPAnnotationKey string = "service.antrea.io/allow-shared-ip"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think it would make sense to add lb (for LoadBalancer) in the annotation, e.g. "service.antrea.io/allow-shared-lb-ip"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it and didn't add it because it would be very similiar to the MetalLB annotation but with a tiny difference, which may make it harder to remember if people is already familiar with the MetalLB one.

Another reason is we already use full name (load-balancer) in other annotation (load-balancer-mode), I didn't want to use different writing in this annotation while "allow-shared-load-balancer-ip" may be too long. But maybe "allow-shared-load-balancer-ip" is fine?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to allow-shared-load-balancer-ip, please let me know if you prefer allow-shared-lb-ip compared with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either one is fine (allow-shared-load-balancer-ip or allow-shared-ip to match metalLB). In the case of metalLB I would say there is a little less ambiguity because it is a dedicated LB solution, so it makes sense that "IP" would refer to LB IP in this context.
I am not really concerned about the length of the annotation (allow-shared-load-balancer-ip).

docs/service-loadbalancer.md Outdated Show resolved Hide resolved
docs/service-loadbalancer.md Outdated Show resolved Hide resolved
Comment on lines 217 to 218
* The Services use the `Cluster` external traffic policy, or they have
identical Endpoints.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a question from me: do the Endpoints need to be identical or do they just need to be located on the same subset of Nodes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually the latter, I thought the distribution of Pods may be out of user's control. I could change to the latter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you and I don't think there is a need to change it to the more "accurate" version. If anything, we should probably make it clear that we discouraged using the annotation for Services with the Local external traffic policy.

pkg/controller/serviceexternalip/controller.go Outdated Show resolved Hide resolved
Comment on lines 219 to 226
if knownIPsByPool[ipPool].Has(ipStr) {
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we log something if this happens but the current service is not sharable? Or will it anyway happen in syncService?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it will be released by syncService, but we can log unexpected events here.

I'm also considering preserving the IP, just like before restart. Then we can say the annotation affects IP allocation, allocated IP won't be revoked by changing the annotation. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on other discussions, I think my preference would be to preserve IPs just like before restart, and let the user correct the situation. I also don't think it is a good practice for us to mutate the Service and remove a previously-allocated LB IP just because the Agent restarted. Did I understand your question correctly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we are on the same page.

Comment on lines 328 to 346
// Update other Services as they may be affected by the release of this IP.
c.enqueueServicesWithoutIPs(allocation.ipPool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that always true or only when the IP was used exclusively by the Service and actually released here? Probably not a worthy "optimization" though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for this case: a service sharing an IP with other Services changes its annotation from shareable to unsharable, in which case we didn't release its IP as I want to limit the annotation for requesting IP only, at the moment new Services can no longer get this IP. After deleting this Service, new Services can get this IP. Maybe I should add a comment for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, the new comment helps. This is because we can get in situations where Services are sharing the same IP even though not all Services have the annotation, based on user actions.

Comment on lines +462 to +481
if allowSharedIP != prevIPAllocation.sharable {
c.updateIPAllocation(key, prevIPAllocation.ipPool, prevIPAllocation.ip, allowSharedIP)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will not be an error log if the annotation is removed while the IP was already shared?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to limit the annotation for IP request stage only. Checking whether an annotation change is valid requires more code, but we can't do anything to prevent it, so perhaps not worth?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I think we can start off like this. If it seems that users of this feature keep running into situations that are difficult to troubleshoot without extra logs, we can look into adding more validation / logs.

@tnqn
Copy link
Member Author

tnqn commented Jun 26, 2024

Are you planning on adding an e2e test for the "normal" case in a follow-up PR?

I'm working e2e test, will add it to this PR.

@tnqn tnqn force-pushed the shared-ip branch 2 times, most recently from 4dae453 to f653c81 Compare June 26, 2024 17:16
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More of a theoretical question for you: if the ServiceExternalIP feature was not itself Alpha, would you have introduced a feature gate for this? I know that the feature only becomes "active" if the annotation is used, but I remember you saying in the past that we were too liberal with introducing new functionality without a corresponding feature gate (e.g., FQDN support in Network Policies). I guess this may not apply here given that ServiceExternalIP is still Alpha and we can consider this an integral part of the ServiceExternalIP feature.

Comment on lines +1989 to +1991
for _, mutator := range mutators {
mutator(&service)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea for a future PR: it may be convenient to have a ServiceBuilder for tests, like we have a PodBuilder

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I could follow up with a PR.

@@ -368,6 +369,68 @@ func testServiceWithExternalIPCRUD(t *testing.T, data *TestData) {
}
}

func testServiceSharingLoadBalancerIP(t *testing.T, data *TestData) {
pool := data.createExternalIPPool(t, "pool-", v1beta1.IPRange{CIDR: "172.30.0.0/28"}, nil, nil, nil)
defer data.crdClient.CrdV1beta1().ExternalIPPools().Delete(context.TODO(), pool.Name, metav1.DeleteOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do have a preference for defining ctx := context.Background() once at the beginning of the test function, since we have not intention of replacing these "TODO"s in the future

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 405 to 406
require.NoError(t, data.waitForServiceLoadBalancerIP(svc2, "172.30.0.1"), "svc2 should get the shared IP assigned")
require.NoError(t, data.waitForServiceLoadBalancerIP(svc3, ""), "svc3 should not get the shared IP assigned")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would use assert here since we could check the second assertion even if the first one fails and that could provide extra information in case of failure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, done

return false, err
}
} else {
if len(service.Status.LoadBalancer.Ingress) == 0 || service.Status.LoadBalancer.Ingress[0].IP != expectedLoadBalancerIP {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we fail immediately with a hand-crafted error if expectedLoadBalancerIP != "" and service.Status.LoadBalancer.Ingress[0].IP != expectedLoadBalancerIP?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code can also be used to validate if changing requested IP works, though we don't have this test case yet.
I agree failing immediately under that condition can help detect some transient issue, but if that could happen, it may also happen that the IP is changed from expected IP to unexpected IP, and returning true immedialy when expected IP is seen is also not enough..

Users want to share LoadBalancerIP between multiple Services when they
face external IP shortage. It's possible to do it when the Services
sharing an IP meet the requirements:

* The Services use different ports
* The Services use the `Cluster` external traffic policy, or they have
  identical Endpoints.

However, the ability of using any IP that is already allocated to another
Service may incur a security risk that a Service can "steal" LoadBalancer
traffic intended for another Service.

To support the use case without introducing the security risk, we use
the annotation `service.antrea.io/allow-shared-load-balancer-ip: true`
on Services to restrict IPs that can be shared. Services without the
annotation will continue to have their LoadBalancerIPs exclusively used.
Services with the annotation can share an IP between themselves when
requesting the same IP.

Ideally, we should also check if the Services meet the first two
requirements before assigning the IP to them. However, it's difficult to
prevent Services from being changed to not meet the requirements after
they get the IP assigned. Therefore, we assume that users using the
feature know how to configure Services properly.

Signed-off-by: Quan Tian <[email protected]>
@tnqn
Copy link
Member Author

tnqn commented Jun 27, 2024

More of a theoretical question for you: if the ServiceExternalIP feature was not itself Alpha, would you have introduced a feature gate for this? I know that the feature only becomes "active" if the annotation is used, but I remember you saying in the past that we were too liberal with introducing new functionality without a corresponding feature gate (e.g., FQDN support in Network Policies). I guess this may not apply here given that ServiceExternalIP is still Alpha and we can consider this an integral part of the ServiceExternalIP feature.

Yes, if ServiceExternalIP were enabled by default, I may add a new feature gate. I feel currently this doesn't require a new feature gate because of the same reason you think: this can be considered a part of the ServiceExternalIP feature, and the graduation of ServiceExternalIP feature gate will cover the capacity to share IPs.

@tnqn
Copy link
Member Author

tnqn commented Jun 27, 2024

/test-all

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn tnqn merged commit a77f1e6 into antrea-io:main Jun 28, 2024
52 of 55 checks passed
@tnqn tnqn deleted the shared-ip branch June 28, 2024 02:43
@luolanzone luolanzone added this to the Antrea v2.1 release milestone Jul 16, 2024
@tnqn tnqn added kind/bug Categorizes issue or PR as related to a bug. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ServiceExternalIP: Allow multiple services sharing one IP
3 participants