Skip to content

Conversation

@julienmancuso
Copy link
Contributor

@julienmancuso julienmancuso commented Oct 9, 2025

Overview:

setup planner serviceaccount in cluster-wide deployment

Summary by CodeRabbit

  • New Features

    • Optional namespace-restricted install mode for the planner with scoped ServiceAccounts and RoleBindings.
    • New configuration flag to set the planner ClusterRole name; helm now conditionally manages cluster-role vs namespace-scoped RBAC.
  • Improvements

    • Automatic, idempotent RBAC provisioning for cluster-wide installs.
    • Standardized labels on created RBAC resources.
  • Tests

    • Comprehensive tests covering RBAC creation, idempotency, multi-namespace behavior, and edge cases.
  • Chores

    • Helm chart dependency bumps and updated chart metadata.

@julienmancuso julienmancuso requested a review from a team as a code owner October 9, 2025 15:07
@julienmancuso
Copy link
Contributor Author

@coderabbitai full review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 9, 2025

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 9, 2025

Walkthrough

Adds conditional Helm and operator RBAC handling: templates now choose namespaced or cluster-scoped planner RBAC; operator gains an RBAC manager, a planner-cluster-role flag, wiring into the reconciler, and runtime logic to ensure planner ServiceAccount/RoleBinding in cluster-wide mode.

Changes

Cohort / File(s) Summary of changes
Helm deployment arg toggle
deploy/cloud/helm/platform/components/operator/templates/deployment.yaml
Conditionally injects manager arg --planner-cluster-role-name={{ include "dynamo-operator.fullname" . }}-planner when namespaceRestriction.enabled is false.
Helm planner RBAC templates
deploy/cloud/helm/platform/components/operator/templates/planner.yaml
Adds branching: when namespaceRestriction.enabled is true, emit namespaced ServiceAccount/Role/RoleBinding in {{ .Release.Namespace }} with labels; else emit a ClusterRole {{ include "dynamo-operator.fullname" . }}-planner. Adjusts namespace references and labels.
Operator CLI & wiring
deploy/cloud/operator/cmd/main.go
Adds plannerClusterRoleName flag/variable, imports internal/rbac, constructs NewManager, wires RBACConfig.PlannerClusterRoleName and injects RBACMgr into the reconciler.
Reconciler RBAC step
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go
Adds rbacManager interface and RBACMgr field; in reconcileResources, when cluster-wide mode is in use, calls EnsureServiceAccountWithRBAC(ctx, ns, plannerSA, PlannerClusterRoleName) and wraps/logs errors.
Shared config surface
deploy/cloud/operator/internal/controller_common/predicate.go
Adds exported RBACConfig with PlannerClusterRoleName and adds RBAC RBACConfig to exported Config struct.
RBAC manager implementation
deploy/cloud/operator/internal/rbac/manager.go
New Manager type with NewManager and EnsureServiceAccountWithRBAC to create/ensure ServiceAccount and RoleBinding referring to a ClusterRole; handles Get/Create/Update and immutable RoleRef considerations.
RBAC manager tests
deploy/cloud/operator/internal/rbac/manager_test.go
New tests covering creation, idempotency, update paths, multi-namespace behavior, partial preconditions, and RoleBinding/ServiceAccount validations using a fake client.
Chart & docs dependency bumps
deploy/cloud/helm/platform/Chart.yaml, deploy/cloud/helm/platform/README.md, deploy/cloud/operator/go.mod
Bumped dependencies: kai-scheduler v0.9.2→v0.9.4, grove-charts / grove operator API v0.1.0-alpha.1→v0.1.0-alpha.2; README and go.mod updated accordingly.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Helm
  participant K8s
  participant Operator
  participant Reconciler
  participant RBACMgr

  rect rgba(230,240,255,0.5)
    note right of Helm: Helm install/upgrade
    User->>Helm: helm install/upgrade
    alt namespaceRestriction.enabled == true
      Helm->>K8s: Create SA, Role, RoleBinding (namespaced)
    else
      Helm->>K8s: Create ClusterRole "...-planner"
      note right of Helm: Inject manager arg --planner-cluster-role-name
    end
  end

  rect rgba(240,255,240,0.5)
    note right of Operator: Runtime reconcile flow
    Operator->>Reconciler: Start reconcile(DynamoGraphDeployment)
    alt Cluster-wide mode (namespaceRestriction disabled)
      Reconciler->>RBACMgr: EnsureServiceAccountWithRBAC(ns, planner-SA, clusterRoleName)
      RBACMgr->>K8s: Get/Create ServiceAccount
      RBACMgr->>K8s: Get/Create/Update RoleBinding -> ClusterRole
      RBACMgr-->>Reconciler: success / error
      opt error
        Reconciler-->>Operator: return wrapped error "failed to ensure planner RBAC"
      end
    else Namespaced mode
      note over Reconciler: Skip RBAC ensure (Helm provided namespaced RBAC)
    end
    Reconciler->>K8s: Reconcile existing PVCs (unchanged flow)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A rabbit checks the cluster gate,
Flags flipped—roles align with fate.
ServiceAccounts hop in line,
Bindings stitched, reprovisioned fine.
Charts and manager hum: RBAC set—hop on, all's inline. 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description only fills out the Overview section and omits the required Details, Where should the reviewer start, and Related Issues sections defined by the repository template, leaving reviewers without guidance on specific changes, key files, or issue context. Please expand the description to include a Details section outlining the specific changes, a Where should the reviewer start section highlighting critical files, and a Related Issues section linking any relevant issue numbers.
Docstring Coverage ⚠️ Warning Docstring coverage is 6.25% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title succinctly summarizes the primary change of configuring the planner ServiceAccount in cluster-wide deployment mode, which matches the modifications in both the Helm templates and controller code. It is clear, concise, and directly relates to the main intent of the pull request.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deploy/cloud/operator/cmd/main.go (1)

187-236: Fail fast when the planner cluster role name is missing in cluster-wide mode.

When we run without --restrictedNamespace, plannerClusterRoleName defaults to "". The reconciler then calls EnsureServiceAccountWithRBAC with an empty cluster role name, and the RoleBinding creation fails (spec.roleRef.name: Required value). Every reconciliation in cluster-wide mode will keep erroring, effectively breaking planner provisioning for default installs that have not yet set the new flag. Please validate the flag after parsing and exit with a clear message before we start the manager.

Apply this diff after flag.Parse():

 	flag.Parse()
 
+	if restrictedNamespace == "" && plannerClusterRoleName == "" {
+		setupLog.Error(nil, "planner-cluster-role-name is required when running cluster-wide")
+		os.Exit(1)
+	}
+
 	// Validate modelExpressURL if provided
 	if modelExpressURL != "" {
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f712653 and adf0ca9.

📒 Files selected for processing (7)
  • deploy/cloud/helm/platform/components/operator/templates/deployment.yaml (1 hunks)
  • deploy/cloud/helm/platform/components/operator/templates/planner.yaml (3 hunks)
  • deploy/cloud/operator/cmd/main.go (5 hunks)
  • deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (3 hunks)
  • deploy/cloud/operator/internal/controller_common/predicate.go (1 hunks)
  • deploy/cloud/operator/internal/rbac/manager.go (1 hunks)
  • deploy/cloud/operator/internal/rbac/manager_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 9, 2025

Walkthrough

Adds namespace-restricted vs cluster-wide planner RBAC modes. Helm templates conditionally render SA/Role/RoleBinding or ClusterRole and pass planner-cluster-role-name to the operator. Operator introduces an RBAC manager, wires it into the DynamoGraphDeploymentReconciler, and ensures planner ServiceAccount and RoleBinding to a ClusterRole during reconcile. Includes tests.

Changes

Cohort / File(s) Summary of changes
Helm: operator deployment args
deploy/cloud/helm/platform/components/operator/templates/deployment.yaml
Adds conditional manager arg to set --planner-cluster-role-name={{ include "dynamo-operator.fullname" . }}-planner when .Values.namespaceRestriction.enabled is false.
Helm: planner RBAC templates
deploy/cloud/helm/platform/components/operator/templates/planner.yaml
Adds conditional RBAC: namespace-restricted mode creates Namespace ServiceAccount/Role/RoleBinding with labels and imagePullSecrets; else creates ClusterRole. Standardizes use of .Release.Namespace and labels.
Operator wiring and config
deploy/cloud/operator/cmd/main.go
Introduces planner-cluster-role-name flag; adds RBAC config with PlannerClusterRoleName; constructs rbac.Manager and injects into DynamoGraphDeploymentReconciler.
Controller: reconcile RBAC step
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go
Adds rbacManager interface and RBACMgr field. In reconcileResources, when not restricted, ensures planner ServiceAccount and RoleBinding to specified ClusterRole before PVC steps; errors surface on failure.
Common config struct
deploy/cloud/operator/internal/controller_common/predicate.go
Adds RBACConfig type and Config.RBAC with PlannerClusterRoleName. No logic changes.
RBAC manager implementation
deploy/cloud/operator/internal/rbac/manager.go
New Manager with NewManager and EnsureServiceAccountWithRBAC(ctx, ns, sa, clusterRoleName) to create/ensure SA and RoleBinding binding SA to ClusterRole; handles create/update and logging.
RBAC manager tests
deploy/cloud/operator/internal/rbac/manager_test.go
Adds tests covering creation, idempotency, subject correction, multi-namespace behavior, partial-resource cases, different ClusterRoles, and edge cases using fake client.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Helm as Helm Charts
  participant K8s as Kubernetes
  participant Oper as Operator (Manager)
  participant Recon as DynamoGraphDeploymentReconciler
  participant RBAC as RBAC Manager

  rect rgb(235, 245, 255)
  note over Helm,K8s: Install/Upgrade
  Helm->>K8s: Render templates
  alt namespaceRestriction.enabled
    Helm-->>K8s: SA/Role/RoleBinding (namespace-scoped)
  else cluster-wide
    Helm-->>K8s: ClusterRole (planner)
    Helm-->>K8s: Operator Deployment arg --planner-cluster-role-name
  end
  end

  rect rgb(240, 255, 240)
  note over Oper,RBAC: Operator startup
  K8s-->>Oper: Start operator pod
  Oper->>Recon: Initialize with Config.RBAC.PlannerClusterRoleName<br/>and RBACMgr
  end

  rect rgb(255, 250, 235)
  note over Recon,RBAC: Reconcile flow (cluster-wide mode)
  Recon->>Recon: Check RestrictedNamespace
  alt not restricted
    Recon->>RBAC: EnsureServiceAccountWithRBAC(ns, plannerSA, clusterRoleName)
    RBAC->>K8s: Get/Create ServiceAccount
    RBAC->>K8s: Get/Create/Update RoleBinding -> ClusterRole
    RBAC-->>Recon: Result (error/success)
    opt on error
      Recon-->>Oper: Requeue/Error
    end
  else restricted
    Recon->>Recon: Skip RBAC ensure
  end
  Recon->>K8s: Continue with resource reconciliation (e.g., PVC)
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I thump my paws on cluster ground,
New roles and bindings hop around.
When fences rise, I stay in lane;
When fields are wide, I cross the plain.
A ServiceAccount nibbles clover—
RBAC in place, the warren’s sober. 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description only includes the Overview section and omits the required Details, Where should the reviewer start?, and Related Issues sections specified in the repository’s description template. Please expand the description to include the Details of the changes, call out which files reviewers should focus on, and link any related issues under a Related Issues heading to comply with the template.
Docstring Coverage ⚠️ Warning Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly and concisely describes the primary change of configuring the planner service account for cluster-wide deployments and matches the main focus of the PR without including unrelated details.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (4)
deploy/cloud/helm/platform/components/operator/templates/planner.yaml (2)

39-42: Add watch (and possibly delete) verbs for robustness.

Planner likely needs watch for informer/list-watch patterns; delete may be needed for cleanup. Align both Role and ClusterRole.

Apply this diff to both blocks:

-  verbs: ["get", "list", "create", "update", "patch"]
+  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

Also applies to: 68-70


26-29: Parity gap: imagePullSecrets only in namespaced mode.

Cluster-wide mode creates SAs via the RBAC manager without imagePullSecrets, which can break private image pulls. Align behavior by either:

  • Adding optional imagePullSecrets handling in the RBAC manager (preferred), or
  • Dropping imagePullSecrets from this SA and ensuring pods specify them directly.
deploy/cloud/operator/internal/rbac/manager_test.go (2)

419-452: Strengthen test for RoleRef immutability handling.

After changing ClusterRole, assert the RoleBinding is replaced (delete+create), not updated in place.

Example adjustment:

// Second call with a different ClusterRole
err = manager.EnsureServiceAccountWithRBAC(ctx, "test-namespace", "test-sa", "cluster-role-2")
if err != nil {
  t.Fatalf("Second call failed: %v", err)
}
// Verify RoleBinding now points to cluster-role-2
rb := &rbacv1.RoleBinding{}
_ = fakeClient.Get(ctx, client.ObjectKey{Namespace: "test-namespace", Name: "test-sa-binding"}, rb)
if rb.RoleRef.Name != "cluster-role-2" {
  t.Errorf("Expected RoleRef name cluster-role-2, got %s", rb.RoleRef.Name)
}

454-483: Add negative test: empty namespace should error.

Once input validation is added in manager.go, assert EnsureServiceAccountWithRBAC returns an error for empty namespace.

Example:

if err == nil {
  t.Fatalf("expected error on empty namespace, got nil")
}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f712653 and adf0ca9.

📒 Files selected for processing (7)
  • deploy/cloud/helm/platform/components/operator/templates/deployment.yaml (1 hunks)
  • deploy/cloud/helm/platform/components/operator/templates/planner.yaml (3 hunks)
  • deploy/cloud/operator/cmd/main.go (5 hunks)
  • deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (3 hunks)
  • deploy/cloud/operator/internal/controller_common/predicate.go (1 hunks)
  • deploy/cloud/operator/internal/rbac/manager.go (1 hunks)
  • deploy/cloud/operator/internal/rbac/manager_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
🔇 Additional comments (2)
deploy/cloud/helm/platform/components/operator/templates/deployment.yaml (1)

127-129: LGTM: planner ClusterRole flag is correctly gated and named.

Matches the ClusterRole name rendered by Helm.

deploy/cloud/operator/internal/controller_common/predicate.go (1)

69-76: Config surface extension looks good.

RBACConfig and PlannerClusterRoleName integrate cleanly.

@julienmancuso
Copy link
Contributor Author

@coderabbitai help

@julienmancuso
Copy link
Contributor Author

@CodeRabbit full review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 9, 2025

✅ Actions performed

Full review triggered.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
deploy/cloud/operator/internal/rbac/manager.go (1)

8-18: Harden input validation: trim whitespace for names

Guard against whitespace-only values; import strings and use TrimSpace.

 import (
   "context"
   "fmt"
+  "strings"
@@
-  if targetNamespace == "" {
+  if strings.TrimSpace(targetNamespace) == "" {
     return fmt.Errorf("target namespace is required")
   }
-  if serviceAccountName == "" {
+  if strings.TrimSpace(serviceAccountName) == "" {
     return fmt.Errorf("service account name is required")
   }
-  if clusterRoleName == "" {
+  if strings.TrimSpace(clusterRoleName) == "" {
     return fmt.Errorf("cluster role name is required")
   }

Also applies to: 59-67

deploy/cloud/operator/cmd/main.go (1)

188-191: LGTM - Validation correctly implemented.

Based on past review comments, this validation was already reviewed and confirmed to be correct. It ensures that planner-cluster-role-name is required when running in cluster-wide mode (when restrictedNamespace == ""), preventing RoleBindings from being created with an empty RoleRef.

deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1)

167-184: LGTM - Cluster-wide RBAC logic correctly implemented.

Based on past review comments, this logic was already reviewed and confirmed to be correct. It includes:

  • ✅ Check for cluster-wide mode (RestrictedNamespace == "")
  • ✅ Nil check for RBACMgr with clear error message
  • ✅ Empty check for PlannerClusterRoleName with validation
  • ✅ Call to EnsureServiceAccountWithRBAC with proper error handling

The guards prevent panics and invalid bindings in cluster-wide mode.

🧹 Nitpick comments (1)
deploy/cloud/operator/internal/rbac/manager.go (1)

93-97: Reconcile required labels on existing SA/RB

Ensure labels converge when objects pre-exist; update only required keys to avoid clobbering user labels.

@@
-  logger.V(1).Info("ServiceAccount already exists",
-    "serviceAccount", serviceAccountName,
-    "namespace", targetNamespace)
+  // Ensure required labels are present on existing SA
+  updated := false
+  if sa.Labels == nil {
+    sa.Labels = map[string]string{}
+  }
+  for k, v := range map[string]string{
+    "app.kubernetes.io/managed-by": "dynamo-operator",
+    "app.kubernetes.io/component":  "rbac",
+    "app.kubernetes.io/name":       serviceAccountName,
+  } {
+    if sa.Labels[k] != v {
+      sa.Labels[k] = v
+      updated = true
+    }
+  }
+  if updated {
+    if err := m.client.Update(ctx, sa); err != nil {
+      return fmt.Errorf("failed to update service account labels: %w", err)
+    }
+    logger.V(1).Info("ServiceAccount labels updated",
+      "serviceAccount", serviceAccountName,
+      "namespace", targetNamespace)
+  } else {
+    logger.V(1).Info("ServiceAccount already exists",
+      "serviceAccount", serviceAccountName,
+      "namespace", targetNamespace)
+  }
@@
-} else if needsUpdate {
+} else if needsUpdate {
   // Only Subjects changed, can update in-place
-  existingRB.Subjects = rb.Subjects
+  existingRB.Subjects = rb.Subjects
+  // Keep labels in sync too
+  if existingRB.Labels == nil {
+    existingRB.Labels = map[string]string{}
+  }
+  for k, v := range rb.Labels {
+    existingRB.Labels[k] = v
+  }
   if err := m.client.Update(ctx, existingRB); err != nil {
     return fmt.Errorf("failed to update role binding: %w", err)
   }
   logger.V(1).Info("RoleBinding subjects updated",
     "roleBinding", roleBindingName,
     "namespace", targetNamespace)
-} else {
+} else {
+  // Update labels if they drifted
+  labelsChanged := false
+  if existingRB.Labels == nil {
+    existingRB.Labels = map[string]string{}
+  }
+  for k, v := range rb.Labels {
+    if existingRB.Labels[k] != v {
+      existingRB.Labels[k] = v
+      labelsChanged = true
+    }
+  }
+  if labelsChanged {
+    if err := m.client.Update(ctx, existingRB); err != nil {
+      return fmt.Errorf("failed to update role binding labels: %w", err)
+    }
+    logger.V(1).Info("RoleBinding labels updated",
+      "roleBinding", roleBindingName,
+      "clusterRole", clusterRoleName,
+      "namespace", targetNamespace)
+    return nil
+  }
   logger.V(1).Info("RoleBinding already up-to-date",
     "roleBinding", roleBindingName,
     "clusterRole", clusterRoleName,
     "namespace", targetNamespace)
 }

Also applies to: 175-189

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f712653 and 8377658.

⛔ Files ignored due to path filters (1)
  • deploy/cloud/operator/go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
  • deploy/cloud/helm/platform/Chart.yaml (1 hunks)
  • deploy/cloud/helm/platform/README.md (1 hunks)
  • deploy/cloud/helm/platform/components/operator/templates/deployment.yaml (1 hunks)
  • deploy/cloud/helm/platform/components/operator/templates/planner.yaml (3 hunks)
  • deploy/cloud/operator/cmd/main.go (5 hunks)
  • deploy/cloud/operator/go.mod (1 hunks)
  • deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (3 hunks)
  • deploy/cloud/operator/internal/controller_common/predicate.go (1 hunks)
  • deploy/cloud/operator/internal/rbac/manager.go (1 hunks)
  • deploy/cloud/operator/internal/rbac/manager_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
📚 Learning: 2025-09-17T22:35:40.674Z
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#3100
File: deploy/cloud/operator/cmd/main.go:186-190
Timestamp: 2025-09-17T22:35:40.674Z
Learning: The mpiRunSecretName validation in deploy/cloud/operator/cmd/main.go is safe for Helm-based upgrades because the chart automatically provides default values (secretName: "mpi-run-ssh-secret", sshKeygen.enabled: true) and the deployment template conditionally injects the --mpi-run-ssh-secret-name flag, ensuring existing installations get the required configuration during upgrades.

Applied to files:

  • deploy/cloud/helm/platform/components/operator/templates/deployment.yaml
🪛 YAMLlint (1.37.1)
deploy/cloud/helm/platform/components/operator/templates/planner.yaml

[error] 16-16: syntax error: expected the node content, but found '-'

(syntax)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: sglang
  • GitHub Check: trtllm (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (17)
deploy/cloud/operator/go.mod (1)

9-9: Dep bump looks good; run tidy/tests

Upgrade to grove/operator/api v0.1.0-alpha.2 LGTM. Please run go mod tidy and the operator tests to catch any API drift.

deploy/cloud/helm/platform/Chart.yaml (1)

38-38: LGTM on chart dependency bumps

Versions updated consistently.

Also applies to: 43-43

deploy/cloud/helm/platform/README.md (1)

92-93: Docs sync looks correct

Dependency versions align with Chart.yaml.

deploy/cloud/operator/internal/controller_common/predicate.go (1)

69-77: RBACConfig addition looks good

Public config extension is straightforward.

Please confirm PlannerClusterRoleName is wired from flags/values and used by the reconciler (search for usages and CLI wiring).

deploy/cloud/helm/platform/components/operator/templates/planner.yaml (5)

16-17: LGTM - False positive from static analysis.

The YAML linter error is a false positive. The {{- if .Values.namespaceRestriction.enabled }} syntax is valid Helm template syntax for conditional rendering.


23-25: LGTM - Correct namespace reference.

The change from .Values.namespace to .Release.Namespace ensures the ServiceAccount is created in the operator's deployment namespace, which is the correct behavior for namespace-restricted mode.


35-37: LGTM - Consistent changes.

The namespace and labels changes are consistent with the ServiceAccount updates and correctly implement namespace-restricted mode.


47-53: LGTM - Correct RoleBinding configuration.

The RoleBinding correctly references both its own namespace and the subject's namespace as .Release.Namespace, ensuring the binding is scoped to the operator's deployment namespace.


58-71: LGTM - Cluster-wide mode correctly implemented.

The ClusterRole is properly defined for cluster-wide mode. Note that in cluster-wide mode, the ServiceAccount and RoleBinding are dynamically created in target namespaces by the RBAC manager (as seen in dynamographdeployment_controller.go), while this ClusterRole provides the cluster-wide permissions template.

deploy/cloud/helm/platform/components/operator/templates/deployment.yaml (1)

127-129: LGTM - Correct CLI argument injection.

The --planner-cluster-role-name argument is correctly injected only in cluster-wide mode (when namespaceRestriction.enabled is false) and uses the same naming convention as the ClusterRole defined in planner.yaml.

deploy/cloud/operator/internal/rbac/manager_test.go (1)

1-765: LGTM - Comprehensive test coverage.

The test suite provides excellent coverage of the RBAC manager's functionality:

  • ✅ Creation scenarios
  • ✅ Idempotency verification
  • ✅ Update and reconciliation logic
  • ✅ Multi-namespace handling
  • ✅ Edge cases and validation

The tests properly document edge cases where fake client behavior differs from real Kubernetes (e.g., RoleRef immutability, empty namespace handling).

deploy/cloud/operator/cmd/main.go (4)

63-63: LGTM - Required import added.

The RBAC package import is correctly added to support the new RBAC manager functionality.


141-141: LGTM - Flag properly defined.

The plannerClusterRoleName flag is correctly defined with appropriate help text indicating it's for cluster-wide mode only.

Also applies to: 180-181


237-239: LGTM - RBAC config properly wired.

The PlannerClusterRoleName is correctly added to the RBACConfig and will be available to the reconciler for cluster-wide RBAC provisioning.


436-437: LGTM - RBAC manager initialized and wired.

The RBAC manager is properly initialized with the Kubernetes client and passed to the DynamoGraphDeploymentReconciler, enabling dynamic ServiceAccount and RoleBinding creation in cluster-wide mode.

Also applies to: 446-446

deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (2)

66-69: LGTM - Interface properly defined.

The rbacManager interface is well-defined with a single, focused method for ensuring ServiceAccount with RBAC. This abstraction enables testability and follows good design practices.


79-79: LGTM - Field added to reconciler.

The RBACMgr field is correctly added to the DynamoGraphDeploymentReconciler struct and will be initialized from main.go.

@julienmancuso julienmancuso merged commit bfbcae7 into main Oct 9, 2025
20 of 21 checks passed
@julienmancuso julienmancuso deleted the jsm/dep-492 branch October 9, 2025 20:22
ziqifan617 pushed a commit that referenced this pull request Oct 20, 2025
nv-tusharma pushed a commit that referenced this pull request Oct 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants