Changes on HTTPScaledObjects don't update routing table #605

t0rr3sp3dr0 · 2023-02-18T01:14:16Z

Report

The current implementation of the HTTP Add-on allows users to modify all attributes of the HTTPScaledObject, but the internal routing table implementation also assumes some of these fields are immutable. Modifying these fields causes unexpected behaviour, the routing table is never updated but the K8s objects are. I experienced that after changing the value of .spec.targetPendingRequests and having the scaler not behave as expected.

Expected Behavior

Either .spec.scaleTargetRef and .spec.targetPendingRequests should be immutable, or we should update the routing table when these fields get updated on the K8s object.

Actual Behavior

Changes to .spec.scaleTargetRef and .spec.targetPendingRequests are successfully applied to K8s, but the internal routing table is not updated to reflect these new values.

Steps to Reproduce the Problem

Install Keda and the HTTP Add-on to a brand new K8s cluster and also deploy some example HTTP application to it
Check the routing table of the HTTP Add-on is empty (http://keda-add-ons-http-interceptor-admin.svc.cluster.local:9090/routing_table)
Create a valid HTTPScaledObject pointing to the application you deployed and with .spec.targetPendingRequests set to 2
Check the routing table was updated with the details of the HTTPScaledObject created (http://keda-add-ons-http-interceptor-admin.svc.cluster.local:9090/routing_table)
Edit the HTTPScaledObject and set .spec.targetPendingRequests set to 1
Check the routing table was NOT updated and TargetPendingRequests is still 2, not 1 (http://keda-add-ons-http-interceptor-admin.svc.cluster.local:9090/routing_table)

Logs from KEDA HTTP operator

2023-02-17T22:14:46Z	INFO	controllers.HTTPScaledObject	Reconciliation start	{"HTTPScaledObject.Namespace": "default", "HTTPScaledObject.Name": "awesome-project"}

2023-02-17T22:14:46Z	INFO	controllers.HTTPScaledObject	Reconciling HTTPScaledObject	{"HTTPScaledObject.Namespace": "default", "HTTPScaledObject.Name": "awesome-project", "Namespace": "default", "DeploymentName": "awesome-project"}

2023-02-17T22:14:46Z	INFO	controllers.HTTPScaledObject	Creating scaled objects{"HTTPScaledObject.Namespace": "default", "HTTPScaledObject.Name": "awesome-project", "reconciler.appObjects": "addObjects", "HTTPScaledObject.name": "awesome-project", "HTTPScaledObject.namespace": "default", "external scaler host name": "keda-add-ons-http-external-scaler.keda:9090"}

2023-02-17T22:14:46Z	INFO	controllers.HTTPScaledObject	Creating App ScaledObject	{"HTTPScaledObject.Namespace": "default", "HTTPScaledObject.Name": "awesome-project", "reconciler.appObjects": "addObjects", "HTTPScaledObject.name": "awesome-project", "HTTPScaledObject.namespace": "default", "ScaledObject": {"Object":{"apiVersion":"keda.sh/v1alpha1","kind":"ScaledObject","metadata":{"labels":{"app":"kedahttp-awesome-project-app","name":"awesome-project-app"},"name":"awesome-project-app","namespace":"default"},"spec":{"advanced":{"restoreToOriginalReplicaCount":true},"maxReplicaCount":100,"minReplicaCount":0,"pollingInterval":1,"scaleTargetRef":{"kind":"Deployment","name":"awesome-project"},"triggers":[{"metadata":{"host":"awesome-project.t0rr3sp3dr0.dev","scalerAddress":"keda-add-ons-http-external-scaler.keda:9090"},"type":"external-push"}]}}}}

2023-02-17T22:14:47Z	ERROR	controllers.HTTPScaledObject.addAndUpdateRoutingTable	could not add host to routing table, progressing anyway	{"HTTPScaledObject.Namespace": "default", "HTTPScaledObject.Name": "awesome-project", "reconciler.appObjects": "addObjects", "HTTPScaledObject.name": "awesome-project", "HTTPScaledObject.namespace": "default", "host": "awesome-project.t0rr3sp3dr0.dev", "error": "host awesome-project.t0rr3sp3dr0.dev is already registered in the routing table"}
github.com/kedacore/http-add-on/operator/controllers.addAndUpdateRoutingTable
	/workspace/operator/controllers/routing_table.go:46
github.com/kedacore/http-add-on/operator/controllers.createOrUpdateApplicationResources
	/workspace/operator/controllers/app.go:132
github.com/kedacore/http-add-on/operator/controllers.(*HTTPScaledObjectReconciler).Reconcile
	/workspace/operator/controllers/httpscaledobject_controller.go:123
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:122
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:323
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235

2023-02-17T22:14:47Z	INFO	controllers.HTTPScaledObject	Updating status on HTTPScaledObject	{"HTTPScaledObject.Namespace": "default", "HTTPScaledObject.Name": "awesome-project", "resource version": "1941396"}

2023-02-17T22:14:47Z	INFO	controllers.HTTPScaledObject	Updated status on HTTPScaledObject	{"HTTPScaledObject.Namespace": "default", "HTTPScaledObject.Name": "awesome-project", "resource version": "1941399"}

2023-02-17T22:14:47Z	INFO	controllers.HTTPScaledObject	Updating status on HTTPScaledObject	{"HTTPScaledObject.Namespace": "default", "HTTPScaledObject.Name": "awesome-project", "resource version": "1941399"}

2023-02-17T22:14:47Z	INFO	controllers.HTTPScaledObject	Updated status on HTTPScaledObject	{"HTTPScaledObject.Namespace": "default", "HTTPScaledObject.Name": "awesome-project", "resource version": "1941400"}

2023-02-17T22:14:47Z	INFO	controllers.HTTPScaledObject	Reconcile success	{"HTTPScaledObject.Namespace": "default", "HTTPScaledObject.Name": "awesome-project"}

What version of the KEDA HTTP Add-on are you running?

0.4.0

Kubernetes Version

1.24

Platform

Microsoft Azure

Anything else?

I think the current behaviour is related to our lack of support for multiple HTTPScaledObject to have the same host. This implementation prevents a new HTTPScaledObject that uses the host of an existing HTTPScaledObject to override the configuration of the latter. This would also be a security issue, as users that have access to some namespace on a cluster could after resources on other namespaces.

I took a look at the implementation details of the routing table and propose we move from having it stored in a ConfigMap to compute its state based on the HTTPScaledObject present on the cluster and have the built routing table cached in-memory, having it refreshed periodically. Another option is to keep the current ConfigMap-based implementation, but it has some drawbacks that I'll list bellow.

Proposal 1 - HTTPScaledObject-based Routing Table

Rework the routing table implementation by relying on HTTPScaledObjects. Interceptors would compute the routing table based on these objects and cache the result in-memory. The cache would be updated based on K8s events and a full reconciliation performed periodically.

Benefits

Have the K8s API Server as the single source of truth
Internal routing table state isn't stored on user-writable storage and is not affected by external factors
Interceptors no longer have to rely on the controller to have accurate routing table information
Single place to have input validations performed

Drawbacks

Interceptors become watchers of all HTTPScaledObjects
Increased number of API calls to the K8s API Server (distributed routing table computation on interceptors)

Proposal 2 - ConfigMap-based Routing Table

Fix the current routing table implementation based on ConfigMap to have it updated when HTTPScaledObjects change.

Benefits

Reuse existing code/architecture
Reduced number of API calls to the K8s API Server (centralised routing table computation on controller)

Drawbacks

Makes harder for users to debug if something goes wrong (users need to know the source of truth is the ConfigMap, not the HTTPScaledObject instances)
Need to maintain the ConfigMap structure in sync with the HTTPScaledObject definition (developers need to know they need to make changes in both places for everything to work as expected)
The ConfigMap is basically an aggregate view of all HTTPScaledObjects that can get out-of-sync due to business logic bugs or outage of the controller-manager
Additional work to make ConfigMap solution more resilient (the HTTP Add-on is currently unable to self-heal in case the ConfigMap gets deleted or corrupted)

Proposal 3 - Make fields immutable

Update the CRD to make immutable the fields the routing table rely on.

Benefits

Quick fix

Drawbacks

Degraded user experience (users would have to delete HTTPScaledObject and create a new one HTTPScaledObject to update the settings)
Unable to update configuration without downtime (deletion of an HTTPScaledObject make the service unreachable)

The text was updated successfully, but these errors were encountered:

t0rr3sp3dr0 · 2023-02-21T21:04:33Z

@JorTurFer and @tomkerkhove, could I get some input here so I can start working on a fix for this issue?

JorTurFer · 2023-02-22T08:00:16Z

Hi,
I think that we shouldn't store it in configMap. I mean, (IMO), the interceptors should watch changes in HTTPScaledObjects and configure themselves on any change. This makes the HTTPScaledObject the source of the truth. The problem of this is that we need to validate them to ensure that they don't modify others.
If we use the kubeclient properly, it has a cache and we won't increase a lot the amount of queries, they will increase for sure, but the approach of configuring on the fly is what we are doing in KEDA for example.

WDYT @tomkerkhove ?

tomkerkhove · 2023-02-22T13:00:44Z

I think the current behaviour is related to our lack of support for multiple HTTPScaledObject to have the same host. This implementation prevents a new HTTPScaledObject that uses the host of an existing HTTPScaledObject to override the configuration of the latter.

I think this is good that we do not support this, similarly to core where you can only have 1 scaledObject for a scale target.

With regards to the proposed designs, I think watching the objects is only acceptable if it's done by the operator. If we'll start querying Kubernetes for it then it adds load we can avoid and flaky scenarios where the routing table is outdated because it didn't receive the message.

What is the information is kept in the operator that the interceptor queries?

t0rr3sp3dr0 · 2023-02-22T20:45:52Z

The current implementation of the interceptor doesn't fetch data directly from the controller, the controller writes data to a ConfigMap and all interceptor instances watch that ConfigMap. The information contained on the ConfigMap is a list of all registered hosts and for each host we have their service name, port number, deployment name, namespace name, and target pending requests. That's used by the interceptor proxy to route requests properly. What I would like to do is to remove this indirection and grab the data directly from HTTPScaledObjects instead of relying on this ConfigMap containing a single big JSON.

We can have a fourth option where the interceptors fetch the data from the controller, instead of using the K8s API. But I worry that in case something goes wrong with the controller, we won't be able to route traffic, causing an outage on services using the add-on. With the current implementation, when the controller is down, new changes don't take effect but the interceptors are able to use existing routes until the problem is addressed.

t0rr3sp3dr0 · 2023-02-24T00:46:13Z

Some extra information here. I was taking a look at the RBAC configuration for the add-on and noticed that both the scaler and interceptor are watching K8s resources and it's not only the routing table ConfigMap. Both watch Deployments and the scaler also watches Endpoints. Right now, all three components share the same ServiceAccount with way too much access.

We may decide to change that as well and grab all the information from the controller or rely on DNS information for the readiness data, but I don't know if that's a good idea. I believe it's okay to watch K8s resources as long as you have well-scoped read-only access.

tomkerkhove · 2023-02-24T07:07:14Z

Some extra information here. I was taking a look at the RBAC configuration for the add-on and noticed that both the scaler and interceptor are watching K8s resources and it's not only the routing table ConfigMap. Both watch Deployments and the scaler also watches Endpoints. Right now, all three components share the same ServiceAccount with way too much access.

This should be fixed with #612 so let's keep that out-of-scope here.

I am fine with going the HTTPScaledObject route; let's do that as @JorTurFer agreed on it as well - Thanks for the proposals!

JorTurFer · 2023-02-25T00:55:22Z

Thanks for the proposals!

+1000

let's do that as @JorTurFer agreed on it as wel

I prefer this way indeed. Configuring the interceptors based on HTTPScaledObjects we will be always up to date, not have sync issues because new interceptors will read all the HTTPScaledObject and configure themselves.

t0rr3sp3dr0 · 2023-02-25T04:23:32Z

Perfect! I'll start working on working this on Monday.

stale · 2023-04-26T13:18:46Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

t0rr3sp3dr0 · 2023-04-26T18:58:46Z

Don’t close

t0rr3sp3dr0 added the bug Something isn't working label Feb 18, 2023

tomkerkhove added this to Roadmap - KEDA HTTP Add-On Feb 24, 2023

github-project-automation bot moved this to To Triage in Roadmap - KEDA HTTP Add-On Feb 24, 2023

tomkerkhove mentioned this issue Mar 13, 2023

Random restarts on external-scaler #630

Closed

stale bot added the stale All issues that are marked as stale due to inactivity label Apr 26, 2023

stale bot removed the stale All issues that are marked as stale due to inactivity label Apr 26, 2023

tomkerkhove mentioned this issue May 4, 2023

Support routing and scaling based on path #338

Closed

t0rr3sp3dr0 mentioned this issue May 9, 2023

HTTPSO-based Routing Table #669

Merged

3 tasks

JorTurFer closed this as completed in #669 Jun 14, 2023

github-project-automation bot moved this from To Triage to Done in Roadmap - KEDA HTTP Add-On Jun 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changes on HTTPScaledObjects don't update routing table #605

Changes on HTTPScaledObjects don't update routing table #605

t0rr3sp3dr0 commented Feb 18, 2023 •

edited

Loading

t0rr3sp3dr0 commented Feb 21, 2023

JorTurFer commented Feb 22, 2023

tomkerkhove commented Feb 22, 2023

t0rr3sp3dr0 commented Feb 22, 2023

t0rr3sp3dr0 commented Feb 24, 2023

tomkerkhove commented Feb 24, 2023

JorTurFer commented Feb 25, 2023

t0rr3sp3dr0 commented Feb 25, 2023

stale bot commented Apr 26, 2023

t0rr3sp3dr0 commented Apr 26, 2023

Changes on HTTPScaledObjects don't update routing table #605

Changes on HTTPScaledObjects don't update routing table #605

Comments

t0rr3sp3dr0 commented Feb 18, 2023 • edited Loading

Report

Expected Behavior

Actual Behavior

Steps to Reproduce the Problem

Logs from KEDA HTTP operator

What version of the KEDA HTTP Add-on are you running?

Kubernetes Version

Platform

Anything else?

Proposal 1 - HTTPScaledObject-based Routing Table

Benefits

Drawbacks

Proposal 2 - ConfigMap-based Routing Table

Benefits

Drawbacks

Proposal 3 - Make fields immutable

Benefits

Drawbacks

t0rr3sp3dr0 commented Feb 21, 2023

JorTurFer commented Feb 22, 2023

tomkerkhove commented Feb 22, 2023

t0rr3sp3dr0 commented Feb 22, 2023

t0rr3sp3dr0 commented Feb 24, 2023

tomkerkhove commented Feb 24, 2023

JorTurFer commented Feb 25, 2023

t0rr3sp3dr0 commented Feb 25, 2023

stale bot commented Apr 26, 2023

t0rr3sp3dr0 commented Apr 26, 2023

t0rr3sp3dr0 commented Feb 18, 2023 •

edited

Loading