Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhanced gateway api topology #751

Merged
merged 5 commits into from
Jul 22, 2024
Merged

enhanced gateway api topology #751

merged 5 commits into from
Jul 22, 2024

Conversation

eguzki
Copy link
Contributor

@eguzki eguzki commented Jul 9, 2024

What

  • new topology option WithAcceptedRoutesLinkedOnly(): only accepted routes will be linked to the gateway
  • new topology option WithProgrammedGatewaysOnly(): only programmed (valid) gateways can have routes and attached policies. Not programmed gateways will be isolated nodes in the topology.
  • new public method Policies() that returns all policies included in the topology. Returned objects are PolicyNodes that allow to jump to topology nodes targeted by the policy (aka PolicyTargetNode).

Work being done as part of #530

This is part of a series of PR's that will end with a new controller that will only reconcile limitador limits configuration.

Copy link

codecov bot commented Jul 9, 2024

Codecov Report

Attention: Patch coverage is 81.13208% with 30 lines in your changes missing coverage. Please review.

Project coverage is 82.90%. Comparing base (ece13e8) to head (e3527f8).
Report is 138 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #751      +/-   ##
==========================================
+ Coverage   80.20%   82.90%   +2.69%     
==========================================
  Files          64       77      +13     
  Lines        4492     6078    +1586     
==========================================
+ Hits         3603     5039    +1436     
- Misses        600      690      +90     
- Partials      289      349      +60     
Flag Coverage Δ
bare-k8s-integration 4.47% <0.00%> (?)
controllers-integration 72.61% <67.29%> (?)
gatewayapi-integration 10.89% <0.00%> (?)
integration ?
istio-integration 56.72% <67.29%> (?)
unit 33.82% <71.69%> (+3.78%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
api/v1beta1 (u) 71.42% <ø> (ø)
api/v1beta2 (u) 92.17% <94.11%> (+0.75%) ⬆️
pkg/common (u) 88.13% <ø> (-0.70%) ⬇️
pkg/istio (u) 73.88% <ø> (-0.03%) ⬇️
pkg/log (u) 94.73% <ø> (ø)
pkg/reconcilers (u) ∅ <ø> (∅)
pkg/rlptools (u) 84.27% <ø> (+4.81%) ⬆️
controllers (i) 82.20% <84.15%> (+5.39%) ⬆️
Files Coverage Δ
controllers/authpolicy_status.go 86.81% <100.00%> (+1.09%) ⬆️
...lers/ratelimitpolicy_enforced_status_controller.go 83.42% <100.00%> (ø)
controllers/ratelimitpolicy_limits.go 81.35% <100.00%> (-0.91%) ⬇️
controllers/target_status_controller.go 89.47% <100.00%> (ø)
pkg/library/gatewayapi/topology_indexes.go 91.02% <100.00%> (ø)
pkg/library/mappers/gateway.go 81.25% <ø> (-18.75%) ⬇️
pkg/rlptools/overrides.go 100.00% <100.00%> (ø)
pkg/rlptools/topology_index.go 74.35% <100.00%> (ø)
pkg/library/gatewayapi/utils.go 87.15% <81.81%> (ø)
pkg/library/gatewayapi/topology.go 76.74% <78.29%> (ø)

... and 38 files with indirect coverage changes

@eguzki eguzki mentioned this pull request Jul 9, 2024
12 tasks
@eguzki eguzki marked this pull request as ready for review July 9, 2024 21:36
@eguzki eguzki requested a review from a team as a code owner July 9, 2024 21:36
if route == nil {
return nil
}
return utils.Filter(route.Spec.ParentRefs, IsParentGateway)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been using github.com/samber/lo for this kind of thing. It's quite good IMHO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see any reason to change when utils.Filter() does what it is expected to do. Anything I could miss?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

github.com/samber/lo is immensely popular and IMO very good at this kind of generic widely used functions. Wherever we can leverage something like this from the community – provided it suits our needs of course – instead of keeping our own home-made version of the same thing, I personally would be in favour of.

At the same time, I don't want to create a fuss. I understand this is not the main goal of this PR. Just something to think about.

@@ -37,7 +39,8 @@ func ApplyOverrides(topology *kuadrantgatewayapi.Topology, gateway *gatewayapiv1

for _, route := range topology.Routes() {
if !slices.Contains(kuadrantgatewayapi.GetRouteAcceptedGatewayParentKeys(route.HTTPRoute), client.ObjectKeyFromObject(gateway)) {
overriddenPolicies = append(overriddenPolicies, route.AttachedPolicies()...)
routePolicies := utils.Map(route.AttachedPolicies(), func(pNode kuadrantgatewayapi.PolicyNode) kuadrantgatewayapi.Policy { return pNode.Policy })
Copy link
Contributor

@guicassolato guicassolato Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid adding another dependency:

Suggested change
routePolicies := utils.Map(route.AttachedPolicies(), func(pNode kuadrantgatewayapi.PolicyNode) kuadrantgatewayapi.Policy { return pNode.Policy })
routePolicies := lo.Map(route.AttachedPolicies(), func(pNode kuadrantgatewayapi.PolicyNode, _ int) kuadrantgatewayapi.Policy { return pNode.Policy })

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as #751 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, for other PR, we could remove all utils.Map and utils.Filter and rewrite using github.com/samber/lo (or something else). Then we can remove utils code and have the external dependency only.

}

// TopologyOpts allows to manipulate topologyOptions.
type TopologyOpts func(*topologyOptions)

func WithAcceptedRoutesLinkedOnly() TopologyOpts {
return func(o *topologyOptions) {
o.linkAcceptedRoutesOnly = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we don't ever want non-accepted routes in the topology as we may occasionally want non-programmed gateways? I actually wonder what would be the use case for adding non-programmed gateways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some use cases I found that non-accepted routes should be linked to parent gateways: mappers. It allows getting events for "disconnected" policies, which status can be affected from the "disconnection".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually wonder what would be the use case for adding non-programmed gateways.

this is a use case #751 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some use cases I found that non-accepted routes should be linked to parent gateways

So why not giving the option here then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. To clarify...

My point is that we have WithProgrammedGatewaysOnly(bool) for when we want to enforce programmed gateways only, i.e. WithProgrammedGatewaysOnly(true). I don't see why one would use WithProgrammedGatewaysOnly(false) though – versus simply not calling WithProgrammedGatewaysOnly at all.

I guess that was your thinking for WithAcceptedRoutesLinkedOnly. Nobody will ever need WithAcceptedRoutesLinkedOnly(false). The effect in this case is virtually the same as not using WithAcceptedRoutesLinkedOnly, right?

So why not designing WithProgrammedGatewaysOnly the same way? If there's a reason I struggling to see, then there may be for WithAcceptedRoutesLinkedOnly too perhaps?

IOW, why are the signatures of these 2 functions different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes all the sense. Fixing it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -200,6 +200,8 @@ func (r *RateLimitPolicyEnforcedStatusReconciler) buildTopology(ctx context.Cont
}

return kuadrantgatewayapi.NewTopology(
kuadrantgatewayapi.WithAcceptedRoutesLinkedOnly(),
kuadrantgatewayapi.WithProgrammedGatewaysOnly(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand a gateway that transitions from Programmed: True to Programmed: False will NOT get the statuses of its affecting policies updated then?

I'm thinking this line invoked for a gateway that has not been added to the topology:

policies := indexes.PoliciesFromGateway(gw)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -169,6 +169,8 @@ func (r *AuthPolicyReconciler) generateTopology(ctx context.Context) (*kuadrantg
})

return kuadrantgatewayapi.NewTopology(
kuadrantgatewayapi.WithAcceptedRoutesLinkedOnly(),
kuadrantgatewayapi.WithProgrammedGatewaysOnly(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what problem this is solving, other than skipping applying the getaway overrides on top of route policies when the getaway is not in a programmed state. The getaway policy will still be reconciled as valid, the gateway AuthConfig will still be created (along and possibly clashing with the route one if it exists), and all the policy statuses will be reported as fine.

I'm afraid we may be here hiding the actual problem which in this case is that the gateway is not ready, therefore we shouldn't configure policies for it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, the described problem is not introduced by this PR, but neither it's fixed by it though. In fact, I'm afraid the change may even aggravate the problem.

The more fundamental problem is of course that, in this case, the topology is only marginal to the decisions we make. Most of the business logic for reconciling AuthPolicies still won't leverage the topology graph built by this function, while some already do.

By trying to fix only the topology graph, we're effectively providing two views of world, thus splitting the business logic into the portion that creates configuration based on one reality and another based on a different reality.

I fear that, by trying to do the right thing here, we're actually bringing the system to an inconsistent state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤯

I am confused. I will kindly ask you to elaborate more on this because I do not get it.

I will try to clarify some of your statements:

BTW, the described problem is not introduced by this PR, but neither it's fixed by it though.

This PR does not pretend to change a bit the behavior of any of the controllers. All the tests pass. There is no problem being described and thus, no problem being fixed.

This PR is adding few more options to the DAG topology, yet keeping the controllers behavior as they are. The intention is to add more tools/capabilities on the topology lib to enable following up PR's introducing the Limitador limits controller.

The authcontroller in current main branch is building a topology with only programmed gateways and linking only accepted routes. So is doing the code with the new changes in this PR. Nothing changes. And if something changes, let me know and I will fix it.

In fact, I'm afraid the change may even aggravate the problem.

First of all, what problem are you talking about?

Configuration for "programmed gateways only" option was introduced by #715. This PR is adding (among other public methods) yet another configuration option which is opt out from "link accepted routes only". As you know, currently the topology has hard-coded linking only accepted routes. This may not be always desirable. This feature is not needed now, but will be needed by the limits controller. The reasoning being that the controller will need a topology in its mappers which routes are linked to the gateway no matter they are accepted or not. But this is something I can explain further when I open PR for the limits controller.

I do not see any problem being aggravated. Again, this is not changing any controller behavior. And if it does, I will fix it.

By trying to fix only the topology graph, we're effectively providing two views of world, thus splitting the business logic into the portion that creates configuration based on one reality and another based on a different reality.

This is by no means fixing the topology. AFAIK, there is no issue with the current topology for the needs of the current controllers. This is about adding more configurability to the topology. Some business logic requires a type of topology with not programmed gateways as well as linked routes that have not been accepted. Specifically I am thinking on the mappers. Mappers need to walk the topology. And having disconnections because routes are not accepted prunes events to reconcile kuadrant policies.

Copy link
Contributor

@guicassolato guicassolato Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Configuration for "programmed gateways only" option was introduced by #715. […] As you know, currently the topology has hard-coded linking only accepted routes.

This indeed confirms the PR does not aggravate the problem, and it's enough for me to approve it. Thanks for explaining it and for collecting the evidences in the code to support it, @eguzki .

To elaborate on my main criticism, which I still stand for...

Disclaimer for all readers: please try to read in between the lines and focus on the generalized form of the problem I'm trying to describe, rather than the individual examples I use.

I missed #715. I think this is a very good example of how a well-intended change, that we needed indeed and that, looked individually, made our code better than before, but, at the same time, if observed holistically, emerges nonetheless as a change that creates a problem (or aggravates one) we didn't have before (or that simply had a smaller surface before.)

We spotted we were treating non-programmed gateways as if they were ready and thus we were signaling the policies configured on those gateways as enforced. This was clearly a bug and had to be fixed. We have tons of code that implement business logic based on a "view of the world" provided by this topology struct. We made that topology struct to only account for programmed gateways. Done. Problem solved. We fixed all those business logic with one move.

However, unfortunately there are controllers (possibly all of them AFAIK!) that mix business logic based on the topology struct with business logic that does not use the topology struct, where sometimes the status of the Gateways and HTTPRoutes are properly taken into account, sometimes not. (I mentioned the reconciliation of AuthConfigs before, but look at ComputeGatewayDiff for plenty of more examples.)

With the topology struct now doing what seems to be the right thing, we created two "views of the world" – one that we believe is right and another one that may or may not be right. These two views (as well as the decisions based on them) are often intertwined together, leading to a situation where we can no longer say "Kuadrant overlooks the state of the gateways. As a user, it's up to you to make sure to bring all your gateways to a programmed state. Until then, do not trust status of Kuadrant policies." Unfortunately, neither we can say today, after #715, "No worries! Kuadrant makes sure to always configure and report status of your policies right, according to the state of the gateways." Between the two statements, obviously we want to issue the latter, but if that's not true, then we're probably better off with the former IMO.

I used "programmed gateways" as example, but we can run over again replacing it with "accepted routes", or "listener status", and possibly others that do not occur to me right now. The important part is, while we are not consistent about the views of the world that the code bases its decisions upon, it becomes very hard to debug behaviors such as:

  • why a policy with status enforced doesn't seem to be enforced at all
  • why an override that seems not have been accepted appears to be taking over my route-level rules (or the other way around)
  • etc

To sum up, let's be careful with changes that seem to be improving the code but until they are rolled out completely they may actually put us (as well as our users) in a difficult position. This is not the case of this PR as I'm now convinced, and neither it is a negative criticism to the DAG/topology approach in general. Rather, I'm concerned with how we roll out these changes.

@eguzki eguzki merged commit 7745a19 into main Jul 22, 2024
26 checks passed
@eguzki eguzki deleted the enhanced-topology branch July 22, 2024 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants