75 changes: 75 additions & 0 deletions modules/manage/examples/kubernetes/cluster.feature
@@ -0,0 +1,75 @@
# This file contains some tests originally ported from our e2e-v2 tests.
# We should really evaluate whether or not to just delete these.
Feature: Basic cluster tests
  @skip:gke @skip:aks @skip:eks
  Scenario: Updating admin ports
    # replaces e2e-v2 "upgrade-values-check"
    Given I apply Kubernetes manifest:
      """
      ---
      apiVersion: cluster.redpanda.com/v1alpha2
      kind: Redpanda
      metadata:
        name: upgrade
      spec:
        clusterSpec:
          statefulset:
            replicas: 1
          listeners:
            admin:
              external:
                default:
                  port: 9645
      """
    And cluster "upgrade" is stable with 1 nodes
    And service "upgrade-external" has named port "admin-default" with value 9645
    And rpk is configured correctly in "upgrade" cluster
    When I apply Kubernetes manifest:
      """
      ---
      apiVersion: cluster.redpanda.com/v1alpha2
      kind: Redpanda
      metadata:
        name: upgrade
      spec:
        clusterSpec:
          statefulset:
            replicas: 1
          listeners:
            admin:
              external:
                default:
                  port: 9640
      """
    Then cluster "upgrade" is stable with 1 nodes
    And service "upgrade-external" should have named port "admin-default" with value 9640
    And rpk is configured correctly in "upgrade" cluster


  @skip:gke @skip:aks @skip:eks
  Scenario: Rack Awareness
    Given I apply Kubernetes manifest:
      # NB: You wouldn't actually use kubernetes.io/os for the value of rack;
      # it's just a value that we know is both present and deterministic for the
      # purpose of testing. A more realistic sketch follows this scenario.
"""
---
apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
name: rack-awareness
spec:
clusterSpec:
console:
enabled: false
statefulset:
replicas: 1
rackAwareness:
enabled: true
nodeAnnotation: 'kubernetes.io/os'
"""
And cluster "rack-awareness" is stable with 1 nodes
Then running `cat /etc/redpanda/redpanda.yaml | grep -o 'rack: .*$'` will output:
"""
rack: linux
"""
80 changes: 80 additions & 0 deletions modules/manage/examples/kubernetes/console.feature
@@ -0,0 +1,80 @@
@cluster:basic
Feature: Console CRDs
  Background: Cluster available
    Given cluster "basic" is available

  Scenario: Using clusterRef
    When I apply Kubernetes manifest:
      ```yaml
      ---
      apiVersion: cluster.redpanda.com/v1alpha2
      kind: Console
      metadata:
        name: console
      spec:
        cluster:
          clusterRef:
            name: basic
      ```
    Then Console "console" will be healthy
    # These steps demonstrate that console is correctly connected to Redpanda (Kafka, Schema Registry, and Admin API).
    And I exec "curl localhost:8080/api/schema-registry/mode" in a Pod matching "app.kubernetes.io/instance=console", it will output:
      ```
      {"mode":"READWRITE"}
      ```
    And I exec "curl localhost:8080/api/topics" in a Pod matching "app.kubernetes.io/instance=console", it will output:
      ```
      {"topics":[{"topicName":"_schemas","isInternal":false,"partitionCount":1,"replicationFactor":1,"cleanupPolicy":"compact","documentation":"NOT_CONFIGURED","logDirSummary":{"totalSizeBytes":117}}]}
      ```
    And I exec "curl localhost:8080/api/console/endpoints | grep -o '{[^{}]*DebugBundleService[^{}]*}'" in a Pod matching "app.kubernetes.io/instance=console", it will output:
      ```
      {"endpoint":"redpanda.api.console.v1alpha1.DebugBundleService","method":"POST","isSupported":true}
      ```

  Scenario: Using staticConfig
    When I apply Kubernetes manifest:
      ```yaml
      ---
      apiVersion: cluster.redpanda.com/v1alpha2
      kind: Console
      metadata:
        name: console
      spec:
        cluster:
          staticConfiguration:
            kafka:
              brokers:
                - basic-0.basic.${NAMESPACE}.svc.cluster.local.:9093
⚠️ Potential issue | 🔴 Critical

Fix DNS syntax error in Kafka broker address.

The DNS name has an extra period before the port: basic-0.basic.${NAMESPACE}.svc.cluster.local.:9093. The correct format should have the colon directly after local without an additional period.

Apply this fix:

             brokers:
-            - basic-0.basic.${NAMESPACE}.svc.cluster.local.:9093
+            - basic-0.basic.${NAMESPACE}.svc.cluster.local:9093

              tls:
                caCertSecretRef:
                  name: "basic-default-cert"
                  key: "ca.crt"
            admin:
              urls:
                - https://basic-0.basic.${NAMESPACE}.svc.cluster.local.:9644
⚠️ Potential issue | 🔴 Critical

Fix DNS syntax error in Admin API URL.

The DNS name has an extra period before the port: https://basic-0.basic.${NAMESPACE}.svc.cluster.local.:9644. The correct format should have the colon directly after local without an additional period.

Apply this fix:

             urls:
-            - https://basic-0.basic.${NAMESPACE}.svc.cluster.local.:9644
+            - https://basic-0.basic.${NAMESPACE}.svc.cluster.local:9644

              tls:
                caCertSecretRef:
                  name: "basic-default-cert"
                  key: "ca.crt"
            schemaRegistry:
              urls:
                - https://basic-0.basic.${NAMESPACE}.svc.cluster.local.:8081
⚠️ Potential issue | 🔴 Critical

Fix DNS syntax error in Schema Registry URL.

The DNS name has an extra period before the port: https://basic-0.basic.${NAMESPACE}.svc.cluster.local.:8081. The correct format should have the colon directly after local without an additional period.

Apply this fix:

             urls:
-            - https://basic-0.basic.${NAMESPACE}.svc.cluster.local.:8081
+            - https://basic-0.basic.${NAMESPACE}.svc.cluster.local:8081

Note: Verify the correct Schema Registry port. Typically it's 8081, but ensure it matches your Redpanda configuration.

              tls:
                caCertSecretRef:
                  name: "basic-default-cert"
                  key: "ca.crt"
      ```
    Then Console "console" will be healthy
    # These steps demonstrate that console is correctly connected to Redpanda (Kafka, Schema Registry, and Admin API).
    And I exec "curl localhost:8080/api/schema-registry/mode" in a Pod matching "app.kubernetes.io/instance=console", it will output:
      ```
      {"mode":"READWRITE"}
      ```
    And I exec "curl localhost:8080/api/topics" in a Pod matching "app.kubernetes.io/instance=console", it will output:
      ```
      {"topics":[{"topicName":"_schemas","isInternal":false,"partitionCount":1,"replicationFactor":1,"cleanupPolicy":"compact","documentation":"NOT_CONFIGURED","logDirSummary":{"totalSizeBytes":117}}]}
      ```
    And I exec "curl localhost:8080/api/console/endpoints | grep -o '{[^{}]*DebugBundleService[^{}]*}'" in a Pod matching "app.kubernetes.io/instance=console", it will output:
      ```
      {"endpoint":"redpanda.api.console.v1alpha1.DebugBundleService","method":"POST","isSupported":true}
      ```
13 changes: 13 additions & 0 deletions modules/manage/examples/kubernetes/decommissioning.feature
@@ -0,0 +1,13 @@
Feature: Decommissioning brokers
  # Note that this test requires both the broker decommissioner and the PVC unbinder
  # to be running in order to pass (see the values sketch after this scenario).
  @skip:gke @skip:aks @skip:eks
  Scenario: Pruning brokers on failed nodes
    Given I create a basic cluster "decommissioning" with 3 nodes
    And cluster "decommissioning" is stable with 3 nodes
    When I physically shutdown a kubernetes node for cluster "decommissioning"
    And cluster "decommissioning" is unhealthy
    And cluster "decommissioning" has only 2 remaining nodes
    And I prune any kubernetes node that is now in a NotReady status
    Then cluster "decommissioning" should recover
    And cluster "decommissioning" should be stable with 3 nodes
39 changes: 39 additions & 0 deletions modules/manage/examples/kubernetes/helm-chart.feature
@@ -0,0 +1,39 @@
@operator:none
Feature: Redpanda Helm Chart

  Scenario: Tolerating Node Failure
    Given I helm install "redpanda" "../charts/redpanda/chart" with values:
      ```yaml
      nameOverride: foobar
      fullnameOverride: bazquux

      statefulset:
        sideCars:
          image:
            tag: dev
            repository: localhost/redpanda-operator
          pvcUnbinder:
            enabled: true
            unbindAfter: 15s
          brokerDecommissioner:
            enabled: true
            decommissionAfter: 15s
      ```
    When I stop the Node running Pod "bazquux-2"
    And Pod "bazquux-2" is eventually Pending
    Then Pod "bazquux-2" will eventually be Running
    And kubectl exec -it "bazquux-0" "rpk redpanda admin brokers list | sed -E 's/\s+/ /gm' | cut -d ' ' -f 1,6" will eventually output:
      ```
      ID MEMBERSHIP
      0 active
      1 active
      3 active
      ```
    And kubectl exec -it "bazquux-0" "rpk redpanda admin brokers list --include-decommissioned | sed -E 's/\s+/ /gm' | cut -d ' ' -f 1,6" will eventually output:
      ```
      ID MEMBERSHIP
      0 active
      1 active
      3 active
      2 -
      ```
24 changes: 24 additions & 0 deletions modules/manage/examples/kubernetes/metrics.feature
@@ -0,0 +1,24 @@
Feature: Metrics endpoint has authentication and authorization

  @skip:gke @skip:aks @skip:eks
  Scenario: Reject request without TLS
    Given the operator is running
    Then its metrics endpoint should reject http request with status code "400"

  @skip:gke @skip:aks @skip:eks
  Scenario: Reject unauthenticated token
    Given the operator is running
    Then its metrics endpoint should reject authorization random token request with status code "500"

  @skip:gke @skip:aks @skip:eks
  Scenario: Accept request
    Given the operator is running
    When I apply Kubernetes manifest:
      """
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: testing
      """
    And "testing" service account has bounded "redpanda-operator-.*-metrics-reader" regexp cluster role name
Comment on lines +16 to +23
⚠️ Potential issue | 🔴 Critical

Missing ClusterRoleBinding in the manifest.

The manifest creates a ServiceAccount named "testing" but line 23 expects this account to be bound to a ClusterRole matching the regex redpanda-operator-.*-metrics-reader. However, no ClusterRoleBinding resource is defined in the manifest to establish this binding. This will cause the test to fail unless the binding is created elsewhere.

Add a ClusterRoleBinding to complete the authorization setup:

 apiVersion: v1
 kind: ServiceAccount
 metadata:
   name: testing
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: testing-metrics-reader
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: redpanda-operator-<namespace>-metrics-reader  # Replace <namespace> with actual namespace
+subjects:
+- kind: ServiceAccount
+  name: testing
+  namespace: <namespace>  # Replace with actual namespace

    Then its metrics endpoint should accept https request with "testing" service account token
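
For reference, a self-contained sketch of the manifest the reviewer describes above, combining the ServiceAccount with a ClusterRoleBinding. The namespace and the concrete ClusterRole name are assumptions and must match the operator's actual release; the name only has to satisfy the `redpanda-operator-.*-metrics-reader` regexp used by the step above:

```yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: testing
  namespace: redpanda  # assumed namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: testing-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: redpanda-operator-redpanda-metrics-reader  # assumed; match your operator release
subjects:
- kind: ServiceAccount
  name: testing
  namespace: redpanda  # assumed namespace
```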
38 changes: 38 additions & 0 deletions modules/manage/examples/kubernetes/migration.feature
@@ -0,0 +1,38 @@
Feature: Helm chart to Redpanda Operator migration

  @skip:gke @skip:aks @skip:eks
  Scenario: Migrate from a Helm chart release to a Redpanda custom resource
    Given I helm install "redpanda-migration-example" "../charts/redpanda/chart" with values:
      """
      # tag::helm-values[]
      fullnameOverride: name-override
      # end::helm-values[]
      # Without the values below, the operator would have to modify the cluster after the migration.
      # Because this block is test-specific (we use a locally built operator image), it is excluded from the helm-values tag above.
      statefulset:
        sideCars:
          image:
            repository: localhost/redpanda-operator
            tag: dev
      """
    And I store "{.metadata.generation}" of Kubernetes object with type "StatefulSet.v1.apps" and name "name-override" as "generation"
    When I apply Kubernetes manifest:
      """
      # tag::redpanda-custom-resource-manifest[]
      ---
      apiVersion: cluster.redpanda.com/v1alpha2
      kind: Redpanda
      metadata:
        name: redpanda-migration-example
      spec:
        # This manifest is a copy of Redpanda release Helm values
        clusterSpec:
          fullnameOverride: name-override
      # end::redpanda-custom-resource-manifest[]
      """
    Then cluster "redpanda-migration-example" is available
    And the Kubernetes object of type "StatefulSet.v1.apps" with name "name-override" has an OwnerReference pointing to the cluster "redpanda-migration-example"
    And the helm release for "redpanda-migration-example" can be deleted by removing its stored secret
    And the cluster "redpanda-migration-example" is healthy
    # This winds up being incremented because we forcibly swap the cluster's StatefulSets to leverage OnDelete semantics.
    And the recorded value "generation" is one less than "{.metadata.generation}" of the Kubernetes object with type "StatefulSet.v1.apps" and name "name-override"
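
For reference, a sketch of what the custom resource would look like if clusterSpec mirrored every value from the Helm install above, including the test-specific sidecar image block that the tagged excerpt omits (assembled from the two value blocks in this scenario; not part of the tagged documentation example):

```yaml
---
apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda-migration-example
spec:
  clusterSpec:
    fullnameOverride: name-override
    statefulset:
      sideCars:
        image:
          repository: localhost/redpanda-operator
          tag: dev
```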
40 changes: 40 additions & 0 deletions modules/manage/examples/kubernetes/operator-upgrades.feature
@@ -0,0 +1,40 @@
@operator:none @vcluster
Feature: Upgrading the operator
  @skip:gke @skip:aks @skip:eks
  Scenario: Operator upgrade from 25.1.3
    Given I helm install "redpanda-operator" "redpanda/operator" --version v25.1.3 with values:
      """
      crds:
        enabled: true
      """
    And I apply Kubernetes manifest:
      """
      ---
      apiVersion: cluster.redpanda.com/v1alpha2
      kind: Redpanda
      metadata:
        name: operator-upgrade
      spec:
        clusterSpec:
          console:
            enabled: false
          statefulset:
            replicas: 1
            sideCars:
              image:
                tag: dev
                repository: localhost/redpanda-operator
      """
    # use just a Ready status check here since that's all the
    # old operator supports
    And cluster "operator-upgrade" is available
    Then I can helm upgrade "redpanda-operator" "../operator/chart" with values:
      """
      image:
        tag: dev
        repository: localhost/redpanda-operator
      crds:
        experimental: true
      """
    # use the new status as this will eventually get set
    And cluster "operator-upgrade" should be stable with 1 nodes