helm: forward values to elasti through env variables #178

Merged
ramantehlan merged 7 commits into KubeElasti:main from rethil:issue_175 on Sep 23, 2025

Conversation

@rethil (Contributor) commented Aug 27, 2025

Description

This allows the operator to work in any namespace, not only in `elasti`.
It also allows forwarding values from the Helm chart to the elasti pods.

Fixes #175, #182

I had to update a few third-party dependencies, as the unit tests and e2e tests didn't pass (probably due to Go 1.25.1 or Helm 3.18.6).

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • e2e
  • local k3s cluster

Checklist:

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • A PR is open to update the [helm chart](https://github.com/truefoundry/elasti/tree/main/charts/elasti) if applicable
  • I have updated the CHANGELOG.md file with the changes I made

Summary by CodeRabbit

  • New Features

    • Operator and Resolver now fully configurable via environment variables/Helm values, including dynamic ports and addresses.
    • Automatic runtime resolution of service endpoints; no hardcoded names or ports.
    • Stricter CRD validation: spec now requires scaleTargetRef and service.
  • Refactor

    • Consolidated environment injection in Helm templates to a common helper.
    • Removed hardcoded configuration across components.
  • Chores

    • Upgraded controller tools to v0.19.0.
  • Tests

    • Updated tests to use environment-driven configuration.

@coderabbitai bot (Contributor) commented Aug 27, 2025

Walkthrough

Introduces a centralized config package to source operator and resolver settings from environment variables; updates operator, resolver, and Helm charts to consume these values; removes hard-coded names/ports; bumps controller-tools version; and adjusts CRD schemas and tests accordingly.

Changes

Cohort / File(s) Summary
Helm: common env injection
charts/elasti/templates/_helpers.tpl, charts/elasti/templates/deployment.yaml
Adds elasti.commonEnvValues helper emitting operator/resolver env vars (names, services, ports, KUBERNETES_CLUSTER_DOMAIN). Injects helper into operator and resolver containers; removes explicit KUBERNETES_CLUSTER_DOMAIN envs.
Operator: adopt config package
operator/cmd/main.go, operator/internal/informer/informer.go, operator/internal/controller/opsDeployment.go, operator/internal/controller/opsEndpointslices.go, operator/internal/controller/opsInformer.go, operator/internal/controller/elastiservice_controller.go
Imports config; replaces hard-coded resolver/operator identifiers and ports with config-driven values; updates informer namespace/name selectors; uses configured reverse proxy port in EndpointSlices; removes unused resolver constants.
Config module (new)
pkg/config/config.go
Adds env-driven configuration: Config, ResolverConfig, getters for resolver/operator config and cluster domain; strict parsing with panic on missing/invalid envs; defines env var constants.
Resolver: dynamic ports and operator URL
resolver/cmd/main.go, resolver/internal/operator/RPCClient.go
Replaces fixed ports with config-derived ports; constructs operator URL from service/namespace and cluster domain at runtime.
CRDs and tooling version bump
charts/elasti/templates/elastiservice-crd.yaml, operator/config/crd/bases/elasti.truefoundry.com_elastiservices.yaml, operator/Makefile
Updates controller-gen version to v0.19.0; CRD schemas now declare spec.required: [scaleTargetRef, service].
Tests: env setup for config
operator/internal/controller/elastiservice_controller_test.go
Sets resolver-related env vars in test setup to satisfy new config requirements.
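
The Helm consolidation described above can be pictured with a hypothetical reconstruction of the `elasti.commonEnvValues` helper. The env var names come from this review; the `.Values` paths and the `elasti.fullname` naming convention are assumptions, not the chart's actual contents:

```yaml
{{/* Sketch only: actual values paths and name templates may differ. */}}
{{- define "elasti.commonEnvValues" -}}
- name: ELASTI_RESOLVER_NAMESPACE
  value: {{ .Release.Namespace | quote }}
- name: ELASTI_RESOLVER_DEPLOYMENT_NAME
  value: {{ include "elasti.fullname" . }}-resolver
- name: ELASTI_RESOLVER_SERVICE_NAME
  value: {{ include "elasti.fullname" . }}-resolver-service
- name: ELASTI_RESOLVER_PORT
  value: {{ .Values.elastiResolver.service.port | quote }}
- name: ELASTI_RESOLVER_PROXY_PORT
  value: {{ .Values.elastiResolver.reverseProxyService.port | quote }}
- name: KUBERNETES_CLUSTER_DOMAIN
  value: {{ .Values.kubernetesClusterDomain | default "cluster.local" | quote }}
{{- end }}
```

Ports are quoted because env values must be strings; the consuming Go code parses them back to integers.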

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant K8s as Kubernetes
    participant Resolver as Resolver Pod
    participant Operator as Operator Controller
    participant API as K8s API Server

    Note over Resolver,Operator: Startup (env-driven)
    Operator->>Operator: Load operator/resolver config from env
    Resolver->>Resolver: Load resolver config from env

    Note over Resolver,API: Resolver restart
    Resolver-->>API: Pod recreated (new IP)
    Operator->>API: Watch Deployment (namespace/name from config)
    API-->>Operator: Deployment update event
    Operator->>API: List/Watch Endpoints for Service (from config)
    Operator->>API: Reconcile EndpointSlice ports (reverse proxy port from config)
    API-->>Operator: EndpointSlice updated
    Operator-->>Resolver: Traffic via Service now targets new Pod IP
```
```mermaid
sequenceDiagram
    autonumber
    participant Resolver as Resolver
    participant DNS as Cluster DNS
    participant OperatorSvc as Operator Service

    Note over Resolver: Build Operator URL at runtime
    Resolver->>Resolver: Get operator config (service, namespace, port)
    Resolver->>DNS: Compose FQDN using cluster domain
    Resolver->>OperatorSvc: HTTP RPC to http://<svc>.<ns>.svc.<domain>:<port>
```
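
The URL composition in the second diagram can be sketched in a few lines of Go. This is an illustrative standalone version, not the resolver's actual code; the service name and port below are placeholders:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// buildOperatorURL composes the in-cluster HTTP URL for the operator
// service, following the http://<svc>.<ns>.svc.<domain>:<port> pattern
// from the sequence diagram above.
func buildOperatorURL(service, namespace, clusterDomain string, port int) string {
	host := fmt.Sprintf("%s.%s.svc.%s", service, namespace, clusterDomain)
	// net.JoinHostPort also handles IPv6 literals correctly.
	return "http://" + net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	// Placeholder values for illustration only.
	fmt.Println(buildOperatorURL("op-svc", "elasti", "cluster.local", 8013))
}
```

Running this prints `http://op-svc.elasti.svc.cluster.local:8013`.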

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A whisk of ports, a hop of names,
I tuned my burrow’s env-fed frames.
When resolvers nap and pods awake,
New slices find the paths they take.
I thump the ground—config in sync!
Carrots compiled—deployments blink.
Hop on, release is pink! 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Linked Issues Check (⚠️ Warning): Although the PR centralizes resolver names and ports into a dynamic configuration, the summary does not describe changes to the controller logic that detect resolver pod restarts and explicitly recreate or update EndpointSlices, so the primary objective of ensuring stale slices are refreshed on resolver restart appears unaddressed. Resolution: implement or document controller changes that watch resolver pod lifecycle events and regenerate EndpointSlices when resolver pods restart, and include tests that verify stale slice entries are removed and recreated with the new pod IP.
  • Out of Scope Changes Check (⚠️ Warning): The PR encompasses extensive changes beyond the endpoint slice bug fix (a new config package, Helm template helpers, dynamic operator and resolver main updates, CRD version bumps, and Makefile edits) which are unrelated to the linked issue's primary goal of refreshing EndpointSlices on resolver restart. Resolution: split the implementation into separate PRs by isolating the endpoint slice bug fix into one change and grouping the config/env forwarding, CRD, and chart updates into another to maintain clear scope boundaries.
✅ Passed checks (3 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title clearly describes forwarding Helm chart values to Elasti pods via environment variables, which aligns with the core change of exposing configuration through env vars.
  • Docstring Coverage (✅ Passed): No functions found in the changes; docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8814d80 and a1dfadd.

📒 Files selected for processing (15)
  • charts/elasti/templates/_helpers.tpl (1 hunks)
  • charts/elasti/templates/deployment.yaml (2 hunks)
  • charts/elasti/templates/elastiservice-crd.yaml (3 hunks)
  • operator/Makefile (1 hunks)
  • operator/cmd/main.go (2 hunks)
  • operator/config/crd/bases/elasti.truefoundry.com_elastiservices.yaml (2 hunks)
  • operator/internal/controller/elastiservice_controller.go (0 hunks)
  • operator/internal/controller/elastiservice_controller_test.go (2 hunks)
  • operator/internal/controller/opsDeployment.go (2 hunks)
  • operator/internal/controller/opsEndpointslices.go (3 hunks)
  • operator/internal/controller/opsInformer.go (2 hunks)
  • operator/internal/informer/informer.go (2 hunks)
  • pkg/config/config.go (1 hunks)
  • resolver/cmd/main.go (3 hunks)
  • resolver/internal/operator/RPCClient.go (2 hunks)
💤 Files with no reviewable changes (1)
  • operator/internal/controller/elastiservice_controller.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-14T12:33:30.800Z
Learnt from: rethil
PR: truefoundry/KubeElasti#178
File: operator/internal/controller/opsEndpointslices.go:21-23
Timestamp: 2025-09-14T12:33:30.800Z
Learning: In the KubeElasti project, the resolver component in `getIPsForResolver()` method must list EndpointSlices across all namespaces, not restricted to its own namespace. This is by design and required for proper resolver functionality.

Applied to files:

  • operator/internal/controller/opsEndpointslices.go
🧬 Code graph analysis (8)
operator/internal/controller/opsDeployment.go (1)
pkg/config/config.go (1)
  • GetResolverConfig (43-54)
resolver/cmd/main.go (1)
pkg/config/config.go (1)
  • GetResolverConfig (43-54)
operator/cmd/main.go (1)
pkg/config/config.go (1)
  • GetOperatorConfig (57-64)
operator/internal/controller/opsInformer.go (3)
pkg/config/config.go (1)
  • GetResolverConfig (43-54)
operator/internal/controller/elastiservice_controller.go (1)
  • ElastiServiceReconciler (31-40)
operator/internal/informer/informer.go (1)
  • KeyParams (337-342)
operator/internal/controller/elastiservice_controller_test.go (1)
pkg/config/config.go (5)
  • EnvResolverNamespace (10-10)
  • EnvResolverDeploymentName (11-11)
  • EnvResolverServiceName (12-12)
  • EnvResolverPort (13-13)
  • EnvResolverProxyPort (14-14)
operator/internal/informer/informer.go (1)
pkg/config/config.go (1)
  • GetResolverConfig (43-54)
operator/internal/controller/opsEndpointslices.go (1)
pkg/config/config.go (1)
  • GetResolverConfig (43-54)
resolver/internal/operator/RPCClient.go (1)
pkg/config/config.go (2)
  • GetOperatorConfig (57-64)
  • GetKubernetesClusterDomain (38-40)
🔇 Additional comments (21)
operator/Makefile (1)

152-152: LGTM! Controller-tools version update aligns with PR scope.

The bump from v0.14.0 to v0.19.0 is consistent with the CRD schema changes throughout the PR.

operator/config/crd/bases/elasti.truefoundry.com_elastiservices.yaml (2)

6-6: LGTM! Controller-gen version updated consistently.

The version bump to v0.19.0 matches the Makefile change and other CRD updates in this PR.


99-101: LGTM! Required fields alignment addresses validation gap.

Based on the past review discussion, these fields were already required at the code level but not enforced in the CRD schema. This change aligns the schema with the actual implementation requirements.

operator/internal/controller/elastiservice_controller_test.go (2)

25-25: LGTM! Test now uses centralized configuration.

The import aligns with the PR's goal of centralizing configuration management.


58-62: Add missing cluster domain environment variable.

The test should set the cluster domain to prevent panics if reconciliation code begins reading GetKubernetesClusterDomain().

Apply this diff to add the missing environment variable:

```diff
 		GinkgoT().Setenv(config.EnvResolverNamespace, namespace)
 		GinkgoT().Setenv(config.EnvResolverDeploymentName, "resolver-deployment")
 		GinkgoT().Setenv(config.EnvResolverServiceName, "resolver-service")
 		GinkgoT().Setenv(config.EnvResolverPort, "1234")
 		GinkgoT().Setenv(config.EnvResolverProxyPort, "4321")
+		GinkgoT().Setenv(config.EnvKubernetesClusterDomain, "cluster.local")
```

Based on the search results, controller-tools v0.19.0 was released on August 28, 2025, and includes bumps to k8s.io/* v0.34, new CRD markers like AtMostOneOf and ExactlyOneOf constraints, and various bug fixes. The version update appears to be routine maintenance rather than introducing breaking changes that would affect testing.

charts/elasti/templates/elastiservice-crd.yaml (2)

`6-6`: **LGTM! Controller-gen version consistent across CRDs.**

The version bump aligns with other CRD files in the PR.

---

`100-102`: **LGTM! Required fields enforcement matches operator CRD.**

This ensures consistency between the Helm chart CRD and the base CRD in the operator directory.

pkg/config/config.go (1)

`1-91`: **LGTM! Well-designed centralized configuration package.**

The implementation follows good practices:
- Clear panic-on-missing semantics for fail-fast behavior
- Proper port range validation (1-65535)
- Go naming conventions followed
- Comprehensive documentation
- Structured types for different component configurations

The panic behavior is appropriate for configuration that should be validated at startup time.

operator/internal/controller/opsDeployment.go (2)

`10-10`: **LGTM! Centralized configuration import.**

The import enables dynamic configuration instead of hard-coded values.

---

`46-46`: **LGTM! Dynamic resolver deployment name.**

The change enables flexible deployment naming through environment variables, supporting the PR's goal of running operators in any namespace.

operator/cmd/main.go (2)

`29-29`: **LGTM! Configuration package import.**

The import supports dynamic port configuration for the operator.

---

`193-194`: **LGTM! Dynamic operator port configuration.**

The change replaces hard-coded port with environment-driven configuration, enabling flexible deployment.

operator/internal/controller/opsEndpointslices.go (3)

`7-7`: **LGTM! Configuration package imported.**

The import enables dynamic configuration for EndpointSlice management.

---

`22-22`: **LGTM! Dynamic resolver service name.**

The change uses environment-driven configuration instead of hard-coded service name.

---

`100-100`: **LGTM! Dynamic reverse proxy port.**

The change correctly uses the resolver's reverse proxy port from configuration for EndpointSlice creation.

operator/internal/informer/informer.go (1)

`90-102`: **LGTM: resolver informer now honors env-driven namespace/name.**

Switch to config.GetResolverConfig() removes hard-coded values and unblocks multi-namespace installs.


Please ensure the Helm chart always sets ELASTI_RESOLVER_NAMESPACE and ELASTI_RESOLVER_DEPLOYMENT_NAME in all templates, including test charts/manifests.

charts/elasti/templates/deployment.yaml (2)

`43-44`: **Good consolidation via elasti.commonEnvValues.**

Environment values are now sourced centrally; reduces drift between components.

---

`128-129`: **Env consolidation for resolver looks correct.**

Ports and names now flow from values; aligns with runtime config in code.


Confirm that values.yaml sets sensible defaults for both service.port and reverseProxyService.port consistent with resolver/cmd/main.go.

charts/elasti/templates/_helpers.tpl (1)

`56-77`: **Helper looks solid; cluster domain is quoted.**

Central set of envs enables install-anywhere goal and avoids name coupling.


If you foresee operator and resolver living in different namespaces, consider making ELASTI_RESOLVER_NAMESPACE independently configurable (not just .Release.Namespace).

operator/internal/controller/opsInformer.go (1)

`65-66`: **Include resolver namespace in deletion log.**

Add namespace to aid multi-namespace debugging. This was requested earlier.


Apply this diff:

```diff
-            r.Logger.Warn("Resolver deployment deleted", zap.String("deployment_name", config.GetResolverConfig().DeploymentName))
+            rc := config.GetResolverConfig()
+            r.Logger.Warn("Resolver deployment deleted",
+                zap.String("deployment_name", rc.DeploymentName),
+                zap.String("namespace", rc.Namespace))
```

resolver/internal/operator/RPCClient.go (1)

38-46: Make operator host FQDN-safe and set HTTP client timeout.

  • The current code always appends “.<namespace>.svc.<cluster-domain>”, which breaks if ServiceName is already an FQDN.
  • http.Client has no timeout; network issues can hang indefinitely.

Apply this diff:

```diff
-    operatorConfig := config.GetOperatorConfig()
-    operatorHost := operatorConfig.ServiceName + "." + operatorConfig.Namespace + ".svc." + config.GetKubernetesClusterDomain()
-    operatorHostPort := net.JoinHostPort(operatorHost, strconv.Itoa(int(operatorConfig.Port)))
+    operatorConfig := config.GetOperatorConfig()
+    host := operatorConfig.ServiceName
+    if !strings.Contains(host, ".") {
+        cd := config.GetKubernetesClusterDomain()
+        host = fmt.Sprintf("%s.%s.svc.%s", host, operatorConfig.Namespace, cd)
+    }
+    operatorHostPort := net.JoinHostPort(host, strconv.Itoa(int(operatorConfig.Port)))
@@
-        client:                  http.Client{},
+        client:                  http.Client{Timeout: 5 * time.Second},
```


@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
operator/internal/informer/informer.go (3)

5-10: Prepare imports for label-based watch

If you adopt label selection (next comment), add:

 import (
   "context"
   "errors"
   "fmt"
   "os"
   "strings"
   "sync"
   "time"
   "truefoundry/elasti/operator/internal/prom"
 
   "github.com/truefoundry/elasti/pkg/values"
   "go.uber.org/zap"
+  "k8s.io/apimachinery/pkg/labels"
   metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

102-117: Use label selector instead of name selector to tolerate Helm release name changes

The chart labels the resolver deployment with app: elasti-resolver. Watching by label is resilient across release names.

 m.resolver.Informer = cache.NewSharedInformer(
   &cache.ListWatch{
     ListFunc: func(_ metav1.ListOptions) (kRuntime.Object, error) {
-      return m.dynamicClient.Resource(deploymentGVR).Namespace(resolverNamespace).List(context.Background(), metav1.ListOptions{
-        FieldSelector: "metadata.name=" + resolverDeploymentName,
-      })
+      return m.dynamicClient.Resource(deploymentGVR).Namespace(resolverNamespace).List(
+        context.Background(),
+        metav1.ListOptions{
+          LabelSelector: labels.SelectorFromSet(labels.Set{"app": "elasti-resolver"}).String(),
+        },
+      )
     },
     WatchFunc: func(_ metav1.ListOptions) (watch.Interface, error) {
-      return m.dynamicClient.Resource(deploymentGVR).Namespace(resolverNamespace).Watch(context.Background(), metav1.ListOptions{
-        FieldSelector: "metadata.name=" + resolverDeploymentName,
-      })
+      return m.dynamicClient.Resource(deploymentGVR).Namespace(resolverNamespace).Watch(
+        context.Background(),
+        metav1.ListOptions{
+          LabelSelector: labels.SelectorFromSet(labels.Set{"app": "elasti-resolver"}).String(),
+        },
+      )
     },
   },
   &unstructured.Unstructured{},
   m.resyncPeriod,
 )

128-133: Fix potential nil dereference in error log

m.resolver.Req is never set in InitializeResolverInformer; logging via m.getKeyFromRequestWatch(m.resolver.Req) can panic on sync failure.

- if !cache.WaitForCacheSync(m.resolver.StopCh, m.resolver.Informer.HasSynced) {
-   m.logger.Error("Failed to sync informer", zap.String("key", m.getKeyFromRequestWatch(m.resolver.Req)))
+ if !cache.WaitForCacheSync(m.resolver.StopCh, m.resolver.Informer.HasSynced) {
+   m.logger.Error("Failed to sync resolver informer",
+     zap.String("deployment", resolverDeploymentName),
+     zap.String("namespace", resolverNamespace),
+   )
   return errors.New("failed to sync resolver informer")
 }
operator/internal/controller/elastiservice_controller.go (1)

71-131: Hard-coded resolver identifiers still present – fix incomplete

A repository-wide search shows numerous hard-coded uses of “elasti-resolver” and “elasti-resolver-service” across both code and Helm charts. This will break when users override the Helm release name or deploy into non-default namespaces:

• operator/internal/controller/elastiservice_controller.go
– constants resolverDeploymentName = "elasti-resolver", resolverServiceName = "elasti-resolver-service" (lines 48–49)
• operator/internal/controller/opsDeployment.go
– comparison against resolverDeploymentName (line 45)
• operator/internal/controller/opsInformer.go
– log warning using resolverDeploymentName (line 64)
• operator/internal/controller/opsEndpointslices.go
– client.MatchingLabels with kubernetes.io/service-name: resolverServiceName (line 21)
• operator/internal/informer/informer.go
– constants and FieldSelector usage tied to resolverDeploymentName (lines 34–37, 106–112)
• charts/elasti/values.yaml
– image repository/tag locked to tfy-images/elasti-resolver:0.1.15
• charts/elasti/templates/{resolver-service.yaml,deployment.yaml,resolver-metrics-monitor.yaml}
– selector and label templates fixed to app: elasti-resolver or app.kubernetes.io/name: elasti-resolver

Until these identifiers are parameterized (e.g. injected from .Release.Name via Helm templates or derived from the CRD/Environment), the operator will not correctly detect resolver restarts in arbitrary namespaces or under custom release names.

Additionally, no RBAC rules for apps/deployments or discovery.k8s.io/endpointslices were found—ensure you include:

// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch
// +kubebuilder:rbac:groups=discovery.k8s.io,resources=endpointslices,verbs=get;list;watch

Please refactor these constants into configurable parameters and add the missing RBAC rules before marking #175 as resolved.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between dd07422 and 2051213.

📒 Files selected for processing (3)
  • charts/elasti/templates/deployment.yaml (1 hunks)
  • operator/internal/controller/elastiservice_controller.go (2 hunks)
  • operator/internal/informer/informer.go (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
  • GitHub Check: Run Kuttl E2E Tests
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
charts/elasti/templates/deployment.yaml (1)

36-39: Good: namespace via downward API wired for operator

ELASTI_POD_NAMESPACE from metadata.namespace is the right approach for “install anywhere.”
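
For reference, a minimal sketch of that downward-API wiring (container context and surrounding fields omitted):

```yaml
# Downward API: expose the pod's own namespace to the container,
# so the operator works wherever the chart is installed.
env:
  - name: ELASTI_POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
```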

@ramantehlan (Collaborator) commented:

Hi @rethil,

Do you think issue #182 is part of this scope?
Maybe not the namespace, but supporting any name too.

@rethil (Contributor, Author) commented Aug 27, 2025

Hi @rethil,

Do you think issue #182 is part of this scope? Maybe not the namespace, but supporting any name too.

Hey @ramantehlan,
If you are fine with the solution proposed by CodeRabbit rather than going with a ConfigMap, then I could scope that in along with the rest of the constants.

@ramantehlan (Collaborator) commented:

Hey @rethil,

Thank you for adding it to the scope!
Yes, putting the resource names into env vars sounds better to me. We don't need to use a ConfigMap for this.

@ramantehlan moved this from Todo to In Progress in KubeElasti Roadmap, Sep 4, 2025
@ramantehlan (Collaborator) commented:

@rethil is this PR ready to review?

@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
operator/internal/informer/informer.go (2)

102-114: Avoid name coupling in resolver watch; use label selector or env override.

Watching a fixed name breaks with non-default Helm release names. Prefer label-based watch on the resolver, or read the name from env.

Option A (label-based; matches chart label app: elasti-resolver):

-      ListFunc: func(_ metav1.ListOptions) (kRuntime.Object, error) {
-        return m.dynamicClient.Resource(deploymentGVR).Namespace(resolverNamespace).List(context.Background(), metav1.ListOptions{
-          FieldSelector: "metadata.name=" + resolverDeploymentName,
-        })
-      },
-      WatchFunc: func(_ metav1.ListOptions) (watch.Interface, error) {
-        return m.dynamicClient.Resource(deploymentGVR).Namespace(resolverNamespace).Watch(context.Background(), metav1.ListOptions{
-          FieldSelector: "metadata.name=" + resolverDeploymentName,
-        })
-      },
+      ListFunc: func(_ metav1.ListOptions) (kRuntime.Object, error) {
+        return m.dynamicClient.Resource(deploymentGVR).Namespace(resolverNamespace).List(context.Background(), metav1.ListOptions{
+          LabelSelector: "app=elasti-resolver",
+        })
+      },
+      WatchFunc: func(_ metav1.ListOptions) (watch.Interface, error) {
+        return m.dynamicClient.Resource(deploymentGVR).Namespace(resolverNamespace).Watch(context.Background(), metav1.ListOptions{
+          LabelSelector: "app=elasti-resolver",
+        })
+      },

Option B (env override): keep FieldSelector but read resolverDeploymentName from ELASTI_RESOLVER_DEPLOYMENT_NAME (chart to set it).


128-131: Nil deref on error path (m.resolver.Req is nil).

On sync failure, logging calls getKeyFromRequestWatch(m.resolver.Req) dereferences nil.

Apply this diff:

-  m.logger.Error("Failed to sync informer", zap.String("key", m.getKeyFromRequestWatch(m.resolver.Req)))
+  m.logger.Error("Failed to sync resolver informer",
+    zap.String("deployment", resolverDeploymentName),
+    zap.String("namespace", resolverNamespace))
operator/internal/controller/elastiservice_controller.go (2)

159-168: Initialize(): ensure resolver watch covers non-default release names.

If resolver name differs (non-default release), current watch will miss events. Confirm label-based watch or env-provided names are wired end-to-end.

I can produce a small e2e check list/manifests to validate: install chart with a custom release name and non-elasti namespace, restart resolver, verify EndpointSlice IP updates.


58-61: Add RBAC for apps/Deployments (EndpointSlices already present)

operator/internal/controller/elastiservice_controller.go (around lines 58–61): add a kubebuilder RBAC marker for apps/deployments; endpointSlices are already declared in operator/config/rbac/role_additional.yaml.

 //+kubebuilder:rbac:groups=elasti.truefoundry.com,resources=elastiservices/finalizers,verbs=update
+//+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch
♻️ Duplicate comments (4)
charts/elasti/templates/deployment.yaml (1)

36-39: Propagate resolver resource names to avoid name coupling (align with non-default release names).

Expose resolver Deployment/Service names so code can read them (or prefer label-based discovery). This prevents breakage when release name changes.

Apply this diff:

         - name: ELASTI_POD_NAMESPACE
           valueFrom:
             fieldRef:
               fieldPath: metadata.namespace
+        - name: ELASTI_RESOLVER_DEPLOYMENT_NAME
+          value: {{ include "elasti.fullname" . }}-resolver
+        - name: ELASTI_RESOLVER_SERVICE_NAME
+          value: {{ include "elasti.fullname" . }}-resolver-service
operator/internal/informer/informer.go (1)

39-42: Fail-fast, centralized namespace resolution (avoid empty env, remove duplication).

Resolve operator namespace once via an internal config helper with SA-file fallback; reuse across packages.

Apply this diff:

-import (
+import (
   "context"
   "errors"
   "fmt"
-	"os"
   "strings"
   "sync"
   "time"
   "truefoundry/elasti/operator/internal/prom"
+  "truefoundry/elasti/operator/internal/config"
-var (
-	// TODO: move this to configmap as const above
-	resolverNamespace = os.Getenv("ELASTI_POD_NAMESPACE")
-)
+var (
+	// TODO: move this to configmap as const above
+	resolverNamespace = config.GetOperatorNamespace()
+)
operator/internal/controller/elastiservice_controller.go (2)

6-6: Replace direct os import with shared config helper.

Apply this diff:

-import (
+import (
   "context"
   "fmt"
-  "os"
   "sync"
   "time"
+  "truefoundry/elasti/operator/internal/config"

53-56: Centralize and validate operator namespace (fail-fast, no duplication).

Apply this diff:

-var (
-  // TODO: move this to configmap as const above
-  resolverNamespace = os.Getenv("ELASTI_POD_NAMESPACE")
-)
+var (
+  // TODO: move this to configmap as const above
+  resolverNamespace = config.GetOperatorNamespace()
+)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2051213 and bcc5151.

📒 Files selected for processing (3)
  • charts/elasti/templates/deployment.yaml (1 hunks)
  • operator/internal/controller/elastiservice_controller.go (2 hunks)
  • operator/internal/informer/informer.go (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
  • GitHub Check: Run Kuttl E2E Tests
  • GitHub Check: Analyze (go)

@rethil (Contributor, Author) commented Sep 14, 2025

@rethil is this PR ready to review?

@ramantehlan not yet, test 02 fails and I need to debug it

@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
tests/e2e/Makefile (1)

72-77: Fix race: background installs need a single shell with wait

apply-keda and apply-ingress run in the background but the recipe doesn’t wait before proceeding to apply-elasti, causing nondeterministic failures. Use one shell and wait.

Apply this diff:

-apply-deps-parallel:
-	@$(MAKE) apply-keda &
-	@$(MAKE) apply-ingress &
-	@$(MAKE) apply-prometheus
-	@echo "All dependencies installed successfully."
+apply-deps-parallel:
+	@set -e; \
+	  $(MAKE) apply-keda & keda_pid=$$!; \
+	  $(MAKE) apply-ingress & ingress_pid=$$!; \
+	  $(MAKE) apply-prometheus; \
+	  wait $$keda_pid; \
+	  wait $$ingress_pid; \
+	  echo "All dependencies installed successfully."
operator/internal/elastiserver/elastiServer.go (1)

54-67: Validate port range upfront

Guard against misconfiguration (e.g., 0 or >65535). Fail fast with a clear error.

Apply this diff:

 func (s *Server) Start(port int) error {
+	if port <= 0 || port > 65535 {
+		return fmt.Errorf("invalid port: %d (must be 1..65535)", port)
+	}
 	mux := http.NewServeMux()
pkg/config/config.go (1)

1-62: Avoid repeated env parsing; add port range checks

Repeated os.Getenv/strconv.Atoi on every call is unnecessary. Cache per struct (resolver/operator) with sync.Once and validate ports.

Apply this diff:

 package config

 import (
 	"os"
 	"strconv"
+	"sync"
 )
 
 type Config struct {
 	Namespace      string
 	DeploymentName string
 	ServiceName    string
 	Port           int
 }
 
 type ResolverConfig struct {
 	Config
 
 	ReverseProxyPort int
 }
 
+var (
+	onceResolver   sync.Once
+	cachedResolver ResolverConfig
+	onceOperator   sync.Once
+	cachedOperator Config
+)
+
 func GetResolverConfig() ResolverConfig {
-	return ResolverConfig{
-		Config: Config{
-			Namespace:      getEnvStringOrPanic("ELASTI_RESOLVER_NAMESPACE"),
-			DeploymentName: getEnvStringOrPanic("ELASTI_RESOLVER_DEPLOYMENT_NAME"),
-			ServiceName:    getEnvStringOrPanic("ELASTI_RESOLVER_SERVICE_NAME"),
-			Port:           getEnvIntOrPanic("ELASTI_RESOLVER_PORT"),
-		},
-
-		ReverseProxyPort: getEnvIntOrPanic("ELASTI_RESOLVER_PROXY_PORT"),
-	}
+	onceResolver.Do(func() {
+		port := getEnvIntOrPanic("ELASTI_RESOLVER_PORT")
+		if port <= 0 || port > 65535 {
+			panic("invalid ELASTI_RESOLVER_PORT")
+		}
+		rp := getEnvIntOrPanic("ELASTI_RESOLVER_PROXY_PORT")
+		if rp <= 0 || rp > 65535 {
+			panic("invalid ELASTI_RESOLVER_PROXY_PORT")
+		}
+		cachedResolver = ResolverConfig{
+			Config: Config{
+				Namespace:      getEnvStringOrPanic("ELASTI_RESOLVER_NAMESPACE"),
+				DeploymentName: getEnvStringOrPanic("ELASTI_RESOLVER_DEPLOYMENT_NAME"),
+				ServiceName:    getEnvStringOrPanic("ELASTI_RESOLVER_SERVICE_NAME"),
+				Port:           port,
+			},
+			ReverseProxyPort: rp,
+		}
+	})
+	return cachedResolver
 }
 
 func GetOperatorConfig() Config {
-	return Config{
-		Namespace:      getEnvStringOrPanic("ELASTI_OPERATOR_NAMESPACE"),
-		DeploymentName: getEnvStringOrPanic("ELASTI_OPERATOR_DEPLOYMENT_NAME"),
-		ServiceName:    getEnvStringOrPanic("ELASTI_OPERATOR_SERVICE_NAME"),
-		Port:           getEnvIntOrPanic("ELASTI_OPERATOR_PORT"),
-	}
+	onceOperator.Do(func() {
+		port := getEnvIntOrPanic("ELASTI_OPERATOR_PORT")
+		if port <= 0 || port > 65535 {
+			panic("invalid ELASTI_OPERATOR_PORT")
+		}
+		cachedOperator = Config{
+			Namespace:      getEnvStringOrPanic("ELASTI_OPERATOR_NAMESPACE"),
+			DeploymentName: getEnvStringOrPanic("ELASTI_OPERATOR_DEPLOYMENT_NAME"),
+			ServiceName:    getEnvStringOrPanic("ELASTI_OPERATOR_SERVICE_NAME"),
+			Port:           port,
+		}
+	})
+	return cachedOperator
 }
operator/internal/controller/opsEndpointslices.go (1)

113-119: Update fails: resourceVersion missing on new object

You’re calling Update with a freshly constructed EndpointSlice (no resourceVersion). Either mutate the fetched sliceToResolver and Update it, or copy resourceVersion.

Minimal fix:

-	if isResolverSliceFound {
-		if err := r.Update(ctx, newEndpointSlice); err != nil {
+	if isResolverSliceFound {
+		newEndpointSlice.ResourceVersion = sliceToResolver.ResourceVersion
+		if err := r.Update(ctx, newEndpointSlice); err != nil {

Prefer mutating the existing object:

-	sliceToResolver := &networkingv1.EndpointSlice{}
+	sliceToResolver := &networkingv1.EndpointSlice{}
 ...
-	newEndpointSlice := &networkingv1.EndpointSlice{
+	newEndpointSlice := &networkingv1.EndpointSlice{
 ...
-	if isResolverSliceFound {
-		if err := r.Update(ctx, newEndpointSlice); err != nil {
+	if isResolverSliceFound {
+		sliceToResolver.Labels = newEndpointSlice.Labels
+		sliceToResolver.AddressType = newEndpointSlice.AddressType
+		sliceToResolver.Ports = newEndpointSlice.Ports
+		sliceToResolver.Endpoints = newEndpointSlice.Endpoints
+		if err := r.Update(ctx, sliceToResolver); err != nil {
operator/internal/controller/opsInformer.go (1)

71-76: Use the ElastiService namespace in the metrics key (not resolver namespace).

The key should reflect the public Service’s actual namespace to avoid cross-namespace key collisions and misleading metrics. Use req.Namespace.

Apply this diff:

-    Namespace:    config.GetResolverConfig().Namespace,
+    Namespace:    req.Namespace,
resolver/internal/operator/RPCClient.go (1)

98-100: Avoid logging resp.Body; log the request URL or omit body.

resp.Body here is an io.ReadCloser; logging it prints a pointer and may tempt future reads after close.

Apply this diff:

-  o.logger.Info("Request sent to controller", zap.Int("statusCode", resp.StatusCode), zap.Any("body", resp.Body))
+  o.logger.Info("Request sent to controller", zap.Int("statusCode", resp.StatusCode), zap.String("url", url))
operator/internal/informer/informer.go (2)

118-121: Potential nil deref when logging key on sync failure.

m.resolver.Req can be nil in InitializeResolverInformer; getKeyFromRequestWatch(nil) will panic.

Apply this defensive change:

-  m.logger.Error("Failed to sync informer", zap.String("key", m.getKeyFromRequestWatch(m.resolver.Req)))
+  key := ""
+  if m.resolver.Req != nil {
+    key = m.getKeyFromRequestWatch(m.resolver.Req)
+  }
+  m.logger.Error("Failed to sync informer", zap.String("key", key))

161-169: Avoid slicing without bounds check; use strings.HasPrefix.

key[:len(crdName)] panics if key length < len(crdName).

Apply this diff:

-      // Check if key starts with the crdName
-      if key.(string)[:len(crdName)] == crdName {
+      // Check if key starts with the crdName
+      if strings.HasPrefix(key.(string), crdName) {

Remember to import strings at top if not already.

♻️ Duplicate comments (1)
operator/internal/informer/informer.go (1)

90-103: Good move to centralized resolver config.

Replaces hard-coded resolver identifiers with config-driven values as previously requested.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between bcc5151 and ce8c83d.

📒 Files selected for processing (14)
  • charts/elasti/templates/_helpers.tpl (1 hunks)
  • charts/elasti/templates/deployment.yaml (2 hunks)
  • operator/cmd/main.go (2 hunks)
  • operator/internal/controller/elastiservice_controller.go (0 hunks)
  • operator/internal/controller/opsDeployment.go (2 hunks)
  • operator/internal/controller/opsEndpointslices.go (3 hunks)
  • operator/internal/controller/opsInformer.go (2 hunks)
  • operator/internal/elastiserver/elastiServer.go (2 hunks)
  • operator/internal/informer/informer.go (2 hunks)
  • pkg/config/config.go (1 hunks)
  • resolver/cmd/main.go (3 hunks)
  • resolver/internal/operator/RPCClient.go (2 hunks)
  • tests/e2e/Makefile (3 hunks)
  • tests/e2e/kind-config.yaml (1 hunks)
💤 Files with no reviewable changes (1)
  • operator/internal/controller/elastiservice_controller.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-09T11:01:35.692Z
Learnt from: ramantehlan
PR: truefoundry/KubeElasti#184
File: tests/e2e/manifest/global/values-elasti.yaml:19-21
Timestamp: 2025-09-09T11:01:35.692Z
Learning: Test files (like those in tests/e2e/manifest/) can use hardcoded image tags like "v1alpha1" for consistency and predictability in test environments, even when production code might require dynamic tag injection.

Applied to files:

  • tests/e2e/kind-config.yaml
🧬 Code graph analysis (7)
operator/internal/controller/opsDeployment.go (1)
pkg/config/config.go (1)
  • GetResolverConfig (21-32)
operator/internal/informer/informer.go (1)
pkg/config/config.go (1)
  • GetResolverConfig (21-32)
operator/cmd/main.go (1)
pkg/config/config.go (1)
  • GetOperatorConfig (34-41)
operator/internal/controller/opsInformer.go (3)
pkg/config/config.go (1)
  • GetResolverConfig (21-32)
operator/internal/controller/elastiservice_controller.go (1)
  • ElastiServiceReconciler (31-40)
operator/internal/informer/informer.go (1)
  • KeyParams (337-342)
operator/internal/controller/opsEndpointslices.go (1)
pkg/config/config.go (1)
  • GetResolverConfig (21-32)
resolver/internal/operator/RPCClient.go (1)
pkg/config/config.go (1)
  • GetOperatorConfig (34-41)
resolver/cmd/main.go (1)
pkg/config/config.go (1)
  • GetResolverConfig (21-32)
🪛 GitHub Check: golangci-lint (/home/runner/work/KubeElasti/KubeElasti/resolver)
resolver/internal/operator/RPCClient.go

[failure] 43-43:
host:port in url should be constructed with net.JoinHostPort and not directly with fmt.Sprintf (nosprintfhostport)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
  • GitHub Check: Analyze (go)
  • GitHub Check: Run Kuttl E2E Tests
🔇 Additional comments (9)
charts/elasti/templates/deployment.yaml (2)

43-45: LGTM: centralized env injection for operator

The include reduces duplication and aligns with the config package.


126-128: LGTM: centralized env injection for resolver

Env names are present only in charts/elasti/templates/_helpers.tpl (lines ~57–75); no other hard-coded ELASTI_* or KUBERNETES_CLUSTER_DOMAIN occurrences in charts/elasti/templates.

operator/internal/controller/opsDeployment.go (1)

46-48: No change required — informer already namespaced

Informer is created with .Namespace(resolverConfig.Namespace) plus a FieldSelector for the deployment name, so events are already limited to that namespace (operator/internal/informer/informer.go:94–101).

tests/e2e/Makefile (1)

6-6: Verified — Istio 1.26.4 supports Kubernetes v1.32.
Istio 1.26.x (including 1.26.4) officially supports Kubernetes 1.29–1.32, so the pinned kind v1.32.0 is supported.

tests/e2e/kind-config.yaml (1)

10-12: Pinned kind node image digest — verified

sha256:c48c62eac5da28cdadcf560d1d8616cfa6783b58f0d94cf63ad1bf49600cb027 is listed for kindest/node:v1.32.0 in the kind v0.26.0 release notes (kubernetes-sigs/kind on GitHub).

Cross-check the Istio/Kubernetes support matrix for v1.32 compatibility before merging.

operator/internal/elastiserver/elastiServer.go (1)

54-55: All elastiServer Start call sites use an int port

operator/cmd/main.go:193 calls eServer.Start(config.GetOperatorConfig().Port); pkg/config/config.go defines Port as int. No Start calls passing string ports were found.

operator/internal/controller/opsEndpointslices.go (1)

95-102: Port/name consistency check — verify Service port name matches resolver

EndpointSlice sets Port from resolver config while Name is taken from service.Spec.Ports[0].Name; if they differ, clients that select ports by name can be routed to the wrong port. Ensure the Service’s first port name equals the name clients expect (or make the EndpointSlice use a matching name or numeric port).

File: operator/internal/controller/opsEndpointslices.go Lines: 95-102

		Ports: []networkingv1.EndpointPort{
			{
				Name:     ptr.To(service.Spec.Ports[0].Name),
				Protocol: ptr.To(v1.ProtocolTCP),
				// Make this dynamic too
				Port: ptr.To(int32(config.GetResolverConfig().Port)),
			},
		},
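The mismatch warned about here can be made concrete with a small, self-contained sketch; the structs are simplified stand-ins for the real corev1/discoveryv1 types, and the helper name is illustrative:

```go
package main

import "fmt"

// Simplified stand-ins for corev1.ServicePort / discoveryv1.EndpointPort.
type servicePort struct {
	Name string
	Port int32
}

type endpointPort struct {
	Name string
	Port int32
}

// checkPortConsistency flags the situation the review describes: an
// EndpointSlice that borrows the Service's first port *name* but carries a
// different port *number*, so name-based port selection routes incorrectly.
func checkPortConsistency(svc servicePort, eps endpointPort) error {
	if svc.Name == eps.Name && svc.Port != eps.Port {
		return fmt.Errorf("port %q resolves to %d on the Service but %d on the EndpointSlice",
			eps.Name, svc.Port, eps.Port)
	}
	return nil
}

func main() {
	svc := servicePort{Name: "http", Port: 80}
	eps := endpointPort{Name: "http", Port: 1234}
	if err := checkPortConsistency(svc, eps); err != nil {
		fmt.Println("warning:", err)
	}
}
```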
resolver/cmd/main.go (1)

113-141: LGTM: ports sourced from resolver config.

Dynamic reverse proxy and internal ports align with the PR’s “install anywhere” goal.

operator/cmd/main.go (1)

193-197: LGTM — ELASTI_OPERATOR_PORT Helm wiring is consistent.

ELASTI_OPERATOR_PORT is set from .Values.elastiController.service.port in charts/elasti/templates/_helpers.tpl; the operator Service (charts/elasti/templates/operator-controller-service.yaml) targetPort and the Deployment containerPort (charts/elasti/templates/deployment.yaml) reference the same value (charts/elasti/values.yaml: elastiController.service.port = 8013).

@ramantehlan
Collaborator

@rethil is this PR ready to review?

@ramantehlan not yet, test 02 fails and I need to debug it

yes, saw that, it gets a 502 when sending traffic. It doesn't happen always, but sometimes.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
operator/config/crd/bases/elasti.truefoundry.com_elastiservices.yaml (4)

77-80: scaleTargetRef.kind enum should use Kubernetes Kind names, not plural/lowercase resource paths.

Use singular, PascalCase: Deployment/Rollout. Current values will reject valid CRs or force incorrect values.

-                  kind:
-                    enum:
-                    - deployments
-                    - rollouts
-                    type: string
+                  kind:
+                    enum:
+                    - Deployment
+                    - Rollout
+                    type: string

63-65: minTargetReplicas minimum=1 blocks scale‑to‑zero.

Issue #175 relies on scaling to 0. Allow 0 to avoid schema‑level rejection.

-              minTargetReplicas:
-                format: int32
-                minimum: 1
+              minTargetReplicas:
+                format: int32
+                minimum: 0
                 type: integer

89-91: triggers[].metadata lacks type: object; violates structural schema.

For x-kubernetes-preserve-unknown-fields=true, type: object is required.

-                    metadata:
-                      x-kubernetes-preserve-unknown-fields: true
+                    metadata:
+                      type: object
+                      x-kubernetes-preserve-unknown-fields: true

84-85: Validate service as a DNS‑1123 label (Service name).

Add regex and length to prevent bad inputs; keeps UX tight.

               service:
-                type: string
+                description: Kubernetes Service name in the same namespace (DNS-1123 label, max 63 chars).
+                type: string
+                pattern: '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
+                maxLength: 63
♻️ Duplicate comments (2)
charts/elasti/templates/_helpers.tpl (1)

53-77: Excellent centralization of environment configuration.

The new elasti.commonEnvValues template centralizes all configuration environment variables in one place, which is exactly what's needed to support installation in any namespace. The template covers all necessary components (operator, resolver, ports, namespaces).

However, the cluster domain value should be quoted to avoid YAML parsing issues:

- value: {{ .Values.global.kubernetesClusterDomain }}
+ value: {{ .Values.global.kubernetesClusterDomain | quote }}
resolver/internal/operator/RPCClient.go (1)

38-44: Excellent network and configuration improvements!

The code correctly uses net.JoinHostPort to construct the URL, which resolves the linting issue and handles IPv6 addresses properly. The dynamic configuration via config.GetOperatorConfig() enables namespace-flexible operations.

However, consider adding a timeout to the HTTP client to prevent indefinite hangs:

-	client:                  http.Client{},
+	client:                  http.Client{Timeout: 30 * time.Second},
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between ce8c83d and 9128a7a.

📒 Files selected for processing (9)
  • charts/elasti/templates/_helpers.tpl (1 hunks)
  • charts/elasti/templates/elastiservice-crd.yaml (3 hunks)
  • operator/Makefile (1 hunks)
  • operator/cmd/main.go (2 hunks)
  • operator/config/crd/bases/elasti.truefoundry.com_elastiservices.yaml (2 hunks)
  • operator/internal/controller/elastiservice_controller_test.go (2 hunks)
  • operator/internal/controller/opsEndpointslices.go (3 hunks)
  • pkg/config/config.go (1 hunks)
  • resolver/internal/operator/RPCClient.go (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-14T12:33:30.782Z
Learnt from: rethil
PR: truefoundry/KubeElasti#178
File: operator/internal/controller/opsEndpointslices.go:21-23
Timestamp: 2025-09-14T12:33:30.782Z
Learning: In the KubeElasti project, the resolver component in `getIPsForResolver()` method must list EndpointSlices across all namespaces, not restricted to its own namespace. This is by design and required for proper resolver functionality.

Applied to files:

  • operator/internal/controller/opsEndpointslices.go
🧬 Code graph analysis (4)
operator/internal/controller/opsEndpointslices.go (1)
pkg/config/config.go (1)
  • GetResolverConfig (33-44)
operator/internal/controller/elastiservice_controller_test.go (1)
pkg/config/config.go (5)
  • ENV_RESOLVER_NAMESPACE (16-16)
  • ENV_RESOLVER_DEPLOYMENT_NAME (17-17)
  • ENV_RESOLVER_SERVICE_NAME (18-18)
  • ENV_RESOLVER_PORT (19-19)
  • ENV_RESOLVER_PROXY_PORT (20-20)
operator/cmd/main.go (1)
pkg/config/config.go (1)
  • GetOperatorConfig (46-53)
resolver/internal/operator/RPCClient.go (1)
pkg/config/config.go (1)
  • GetOperatorConfig (46-53)
🪛 GitHub Check: golangci-lint (/home/runner/work/KubeElasti/KubeElasti/pkg)
pkg/config/config.go

[failure] 18-18:
var-naming: don't use ALL_CAPS in Go names; use CamelCase (revive)


[failure] 17-17:
var-naming: don't use ALL_CAPS in Go names; use CamelCase (revive)


[failure] 16-16:
var-naming: don't use ALL_CAPS in Go names; use CamelCase (revive)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
  • GitHub Check: Run Kuttl E2E Tests
  • GitHub Check: Analyze (go)
🔇 Additional comments (13)
operator/Makefile (1)

152-152: LGTM! Controller-tools version upgrade is consistent.

The version bump from v0.14.0 to v0.19.0 matches the CRD files and ensures consistency across the toolchain.

charts/elasti/templates/elastiservice-crd.yaml (2)

6-6: LGTM! Kubebuilder annotation upgrade is consistent.

The version upgrade from v0.14.0 to v0.19.0 matches the operator CRD and maintains consistency.


100-102: CRD validation improvement is well-structured.

The addition of top-level required fields for scaleTargetRef and service strengthens the API contract and ensures proper validation.

pkg/config/config.go (2)

33-53: LGTM! Clean configuration API design.

The configuration functions provide a clean, centralized way to access runtime configuration. The panic-on-missing-env approach is appropriate for required configuration values.


55-73: LGTM! Robust environment variable handling.

The helper functions properly validate environment variables and provide clear error messages. The int32 parsing with proper error handling is well-implemented.

operator/internal/controller/opsEndpointslices.go (3)

7-7: LGTM! Proper use of centralized configuration.

The import of the config package enables dynamic configuration resolution instead of hard-coded values.


22-22: LGTM! Dynamic service name resolution is correct.

Using config.GetResolverConfig().ServiceName instead of hard-coded values enables namespace-agnostic operations. The cross-namespace listing is intentional as per the retrieved learning.


100-100: LGTM! Dynamic port configuration implemented correctly.

Using config.GetResolverConfig().Port allows the port to be configured at runtime rather than being hard-coded, supporting the goal of namespace-flexible deployments.

resolver/internal/operator/RPCClient.go (1)

7-7: LGTM! Proper imports for network configuration.

The addition of net and config imports enables proper host:port construction and dynamic configuration.

Also applies to: 14-14

operator/cmd/main.go (2)

29-29: LGTM! Proper import of configuration package.

The addition of the config package import enables dynamic port configuration.


193-193: LGTM! Dynamic port configuration implemented correctly.

Using config.GetOperatorConfig().Port instead of a hard-coded port enables flexible deployment in any namespace. The port formatting is appropriate for the server startup.

operator/internal/controller/elastiservice_controller_test.go (1)

19-39: Incorrect — imports match declared module paths; no unification required

go.mod shows separate modules: truefoundry/elasti/operator, github.com/truefoundry/elasti/pkg, and github.com/truefoundry/elasti/resolver. The test imports (github.com/truefoundry/elasti/pkg/config and truefoundry/elasti/operator/api/v1alpha1) align with those module declarations, so the mixed prefixes are intentional and correct.

Likely an incorrect or invalid review comment.

operator/config/crd/bases/elasti.truefoundry.com_elastiservices.yaml (1)

6-6: Controller-gen bumped to v0.19.0 — verified, no action required.

operator/Makefile pins CONTROLLER_TOOLS_VERSION ?= v0.19.0 and both operator/config/crd/bases/elasti.truefoundry.com_elastiservices.yaml and charts/elasti/templates/elastiservice-crd.yaml include controller-gen.kubebuilder.io/version: v0.19.0 (CRDs regenerated).

@rethil rethil changed the title operator: support installation in any namespace helm: forward values to elasti trough env variables Sep 14, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
operator/internal/controller/elastiservice_controller_test.go (1)

48-52: Scope and restore env vars to avoid cross-spec leakage; use BeforeAll + DeferCleanup (or GinkgoT().Setenv).

Direct os.Setenv at definition time is process‑global and sticky; this causes flakiness under parallel runs.

Apply:

-		os.Setenv(config.EnvResolverNamespace, namespace)
-		os.Setenv(config.EnvResolverDeploymentName, "resolver-deployment")
-		os.Setenv(config.EnvResolverServiceName, "resolver-service")
-		os.Setenv(config.EnvResolverPort, "1234")
-		os.Setenv(config.EnvResolverProxyPort, "4321")
+		BeforeAll(func() {
+			setEnv := func(key, val string) {
+				old, had := os.LookupEnv(key)
+				Expect(os.Setenv(key, val)).To(Succeed())
+				DeferCleanup(func() {
+					if had {
+						Expect(os.Setenv(key, old)).To(Succeed())
+					} else {
+						Expect(os.Unsetenv(key)).To(Succeed())
+					}
+				})
+			}
+			setEnv(config.EnvResolverNamespace, namespace)
+			setEnv(config.EnvResolverDeploymentName, "resolver-deployment")
+			setEnv(config.EnvResolverServiceName, "resolver-service")
+			setEnv(config.EnvResolverPort, "1234")
+			setEnv(config.EnvResolverProxyPort, "4321")
+		})

If you switch to GinkgoT().Setenv in a BeforeEach, you can drop the os import.

Scan for other unscoped env mutations:

#!/bin/bash
rg -nP --type go -g '*_test.go' 'os\.Setenv\(' -C2
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 9128a7a and 5a6c6eb.

📒 Files selected for processing (2)
  • operator/internal/controller/elastiservice_controller_test.go (2 hunks)
  • pkg/config/config.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
operator/internal/controller/elastiservice_controller_test.go (1)
pkg/config/config.go (5)
  • EnvResolverNamespace (16-16)
  • EnvResolverDeploymentName (17-17)
  • EnvResolverServiceName (18-18)
  • EnvResolverPort (19-19)
  • EnvResolverProxyPort (20-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Kuttl E2E Tests
🔇 Additional comments (1)
pkg/config/config.go (1)

15-25: LGTM — env constant names are CamelCase and no stale ENV_* identifiers remain.
Verified: pkg/config/config.go defines EnvResolver*/EnvOperator* constants and a repo search found no identifiers starting with ENV_.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
operator/internal/controller/elastiservice_controller_test.go (1)

152-183: Prefer GinkgoWriter over fmt.Println for test logs.

Improves integration with Ginkgo’s output and parallel runs.

Apply:

-				fmt.Println("error deleting elastiService: ", err)
+				GinkgoWriter.Printf("error deleting elastiService: %v\n", err)
@@
-				fmt.Println("error deleting deployment: ", err)
+				GinkgoWriter.Printf("error deleting deployment: %v\n", err)
@@
-				fmt.Println("error deleting service: ", err)
+				GinkgoWriter.Printf("error deleting service: %v\n", err)
♻️ Duplicate comments (3)
operator/internal/controller/elastiservice_controller_test.go (1)

58-69: Env scoping with DeferCleanup is spot‑on.

Addresses the prior leakage concern; this is the right pattern for Ginkgo v2.

pkg/config/config.go (1)

76-83: Parse error message should include the bad value; fix comment nit.

Current panic hides the offending value; include it for fast diagnosis. Also “unset of invalid” -> “unset or invalid”.

Apply:

-// getEnvPortOrPanic parses env value as tcp port or panics if value is unset of invalid.
+// getEnvPortOrPanic parses env value as TCP port or panics if unset or invalid.
@@
-	port, err := strconv.ParseInt(envValue, 10, 32)
+	port, err := strconv.ParseInt(envValue, 10, 32)
 	if err != nil {
-		panic("required env value is not integer: " + envName)
+		panic(fmt.Sprintf("invalid int32 for %s: %q (%v)", envName, envValue, err))
 	}
resolver/internal/operator/RPCClient.go (1)

38-48: Don't append namespace/domain when service name is already FQDN; set an HTTP timeout.

Helm templates define ELASTI_OPERATOR_SERVICE_NAME / ELASTI_OPERATOR_NAMESPACE / KUBERNETES_CLUSTER_DOMAIN — service name can be an FQDN. Avoid double‑appending namespace/domain and set http.Client.Timeout to prevent hangs. Apply:

@@
-import (
+import (
   "bytes"
   "encoding/json"
   "io"
-  "net"
   "net/http"
   "strconv"
   "time"
 
   "github.com/truefoundry/elasti/resolver/internal/prom"
 
   "github.com/truefoundry/elasti/pkg/config"
   "github.com/truefoundry/elasti/pkg/messages"
   "go.uber.org/zap"
 
   "sync"
+  "net"
+  "strings"
 )
@@
 func NewOperatorClient(logger *zap.Logger, retryDuration time.Duration) *Client {
-  operatorConfig := config.GetOperatorConfig()
-  operatorHost := operatorConfig.ServiceName + "." + operatorConfig.Namespace + ".svc." + config.GetKubernetesClusterDomain()
-  operatorHostPort := net.JoinHostPort(operatorHost, strconv.Itoa(int(operatorConfig.Port)))
+  operatorConfig := config.GetOperatorConfig()
+  host := operatorConfig.ServiceName
+  if !strings.Contains(host, ".") {
+    host = host + "." + operatorConfig.Namespace + ".svc." + config.GetKubernetesClusterDomain()
+  }
+  operatorHostPort := net.JoinHostPort(host, strconv.Itoa(int(operatorConfig.Port)))
@@
   return &Client{
     logger:                  logger.With(zap.String("component", "operatorRPC")),
     retryDuration:           retryDuration,
     operatorURL:             "http://" + operatorHostPort,
     incomingRequestEndpoint: "/informer/incoming-request",
-    client:                  http.Client{},
+    client:                  http.Client{Timeout: retryDuration},
   }
 }
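The suggested host construction can be sketched as a standalone helper; the function name and the dot-based "already an FQDN" heuristic are illustrative assumptions, not the project's actual API:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// operatorHostPort builds the operator's host:port. The namespace and cluster
// domain are appended only when the service name is not already qualified
// (heuristic: no dot present), mirroring the diff above.
func operatorHostPort(serviceName, namespace, clusterDomain string, port int) string {
	host := serviceName
	if !strings.Contains(host, ".") {
		host = host + "." + namespace + ".svc." + clusterDomain
	}
	// net.JoinHostPort also brackets IPv6 literals correctly.
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(operatorHostPort("elasti-operator-service", "elasti", "cluster.local", 8013))
	fmt.Println(operatorHostPort("operator.example.internal", "elasti", "cluster.local", 8013))
}
```

With a short name the helper expands to `<name>.<namespace>.svc.<domain>:<port>`; a name that already contains a dot is passed through untouched, which avoids the double-append problem the review calls out.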
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 5a6c6eb and 2248c83.

📒 Files selected for processing (3)
  • operator/internal/controller/elastiservice_controller_test.go (2 hunks)
  • pkg/config/config.go (1 hunks)
  • resolver/internal/operator/RPCClient.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
resolver/internal/operator/RPCClient.go (1)
pkg/config/config.go (2)
  • GetOperatorConfig (57-64)
  • GetKubernetesClusterDomain (38-40)
operator/internal/controller/elastiservice_controller_test.go (1)
pkg/config/config.go (5)
  • EnvResolverNamespace (10-10)
  • EnvResolverDeploymentName (11-11)
  • EnvResolverServiceName (12-12)
  • EnvResolverPort (13-13)
  • EnvResolverProxyPort (14-14)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
  • GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
  • GitHub Check: golangci-lint (/home/runner/work/KubeElasti/KubeElasti/operator)
  • GitHub Check: Analyze (go)
  • GitHub Check: Run Kuttl E2E Tests
🔇 Additional comments (2)
pkg/config/config.go (2)

9-20: LGTM: centralized env config, proper naming, and port range checks.

Good consolidation and CamelCase constants; explicit [1,65535] validation is correct.

Also applies to: 85-87


37-41: Incorrect — Helm chart already defaults the cluster domain; change not required.
charts/elasti/values.yaml sets global.kubernetesClusterDomain: cluster.local and charts/elasti/templates/_helpers.tpl injects it into KUBERNETES_CLUSTER_DOMAIN, so the resolver won't panic when deployed with the chart.

Likely an incorrect or invalid review comment.

@ramantehlan
Collaborator

@rethil We have fixed the 02 e2e failure, I think the fix should work.
We can merge that PR into this one once it is merged.

Before that, please see if there are any other changes required in this PR. Will merge this ASAP.

@rethil
Contributor Author

rethil commented Sep 17, 2025

@rethil We have fixed the 02 e2e failure, I think the fix should work. We can merge that PR into this one once it is merged.

Before that, please see if there are any other changes required in this PR. Will merge this ASAP.

Hey @ramantehlan, that's great to hear!
This PR should be ready for review now. The only issue I can see right now is the failing test.

ramantehlan
ramantehlan previously approved these changes Sep 22, 2025
@ramantehlan
Collaborator

@rethil The 02 test is failing, let's fix that first before merging

@ramantehlan ramantehlan dismissed their stale review September 22, 2025 01:19

02 test is failing, requesting fix for that.

@rethil
Contributor Author

rethil commented Sep 22, 2025

@rethil The 02 test is failing, let's fix that first before merging

@ramantehlan should be fine now

@ramantehlan
Collaborator

Thank you @rethil, will have a look.

@ramantehlan ramantehlan merged commit e770d77 into KubeElasti:main Sep 23, 2025
11 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in KubeElasti Roadmap Sep 23, 2025
@ramantehlan ramantehlan mentioned this pull request Oct 7, 2025

Development

Development

Successfully merging this pull request may close these issues.

[BUG] EndpointSlices are not recreated on Resolver restart

2 participants