Merged
142 commits
ed1bdb1
wip
naemono Oct 7, 2025
b439a1b
Adding skeleton
naemono Dec 1, 2025
a24a3c6
renaming crds
naemono Dec 1, 2025
8716505
skeleton of controller logic.
naemono Dec 1, 2025
193a0de
Adding reconciliation logic.
naemono Dec 1, 2025
f32b643
Update config parsing
naemono Dec 1, 2025
8fc5d45
Adding unit tests
naemono Dec 1, 2025
5c81c1a
Optimization
naemono Dec 1, 2025
1d0456a
Nearly functional without ssl verification
naemono Dec 1, 2025
99c90f0
Fixing indentation
naemono Dec 2, 2025
2b925b3
Cleanup. deploys in same ns as policy.
naemono Dec 2, 2025
6cb9347
functional autoops using file-realm users
naemono Dec 3, 2025
4c8c1ba
Functional api keys autoops integration
naemono Dec 4, 2025
a52095f
Cleanup. Adding additional unit tests
naemono Dec 4, 2025
75e717f
Merge branch 'main' into ccm-integration
naemono Dec 4, 2025
d77a53b
make generate
naemono Dec 4, 2025
edadd1c
Better handle the state/status.
naemono Dec 4, 2025
db9cb3b
cleanup
naemono Dec 4, 2025
578c3a9
More cleanup
naemono Dec 4, 2025
275e1a1
lowercase
naemono Dec 4, 2025
a7e3620
cleanup
naemono Dec 4, 2025
335bc67
const for label name
naemono Dec 4, 2025
222589b
cleanup
naemono Dec 4, 2025
0c190d4
more cleanup
naemono Dec 4, 2025
68ca200
Skip api keys in remote cluster controller managed by autoops
naemono Dec 4, 2025
2204c47
Add helm charts
naemono Dec 4, 2025
7e75762
Re-enable license checks
naemono Dec 4, 2025
065f3d7
Changes from initial review.
naemono Dec 5, 2025
e687945
Cleanup some of the reconcile logic.
naemono Dec 5, 2025
8b9a871
Adjust to allow error counts to be calculated.
naemono Dec 5, 2025
f4e1734
Nolint
naemono Dec 5, 2025
25ff367
Lint issues
naemono Dec 5, 2025
6271eed
re-enable temp_resource_id, but generate it
naemono Dec 5, 2025
36b8fe5
remove temp resource id
naemono Dec 5, 2025
db322a5
adjust description of autoops crd
naemono Dec 8, 2025
0ba6849
ensuring deployment name max length isnt exceeded
naemono Dec 8, 2025
2e5ccd1
adjust naming to avoid long names
naemono Dec 8, 2025
83f31b5
update charts
naemono Dec 9, 2025
4110a01
review adjustments
naemono Dec 9, 2025
2e48e42
ensuring secrets are cleaned up when missing es
naemono Dec 9, 2025
86de699
fix comments
naemono Dec 9, 2025
82cb695
cleanup deployment code
naemono Dec 9, 2025
35da2a6
update helm charts adding additional options
naemono Dec 9, 2025
475348d
also watch the autoops ca secret
naemono Dec 9, 2025
a6b70ff
dont check ready status in called func
naemono Dec 9, 2025
e863ad6
adjust helm chart to allow existing secret
naemono Dec 9, 2025
6797418
also hash secret data
naemono Dec 9, 2025
7a0b47f
fix unit tests
naemono Dec 9, 2025
458cf32
fix helm tests for autoops
naemono Dec 9, 2025
70ca228
make generate
naemono Dec 9, 2025
c49cdc2
fix annotation size
naemono Dec 10, 2025
55c95cc
use create, not apply
naemono Dec 10, 2025
153b6c7
add patch for crd; revert create vs apply change
naemono Dec 10, 2025
8331edb
adding back additional hashing
naemono Dec 10, 2025
16d1e7a
using namers for all
naemono Dec 10, 2025
c71af79
reduce size of namer file name
naemono Dec 10, 2025
e196050
centralize autoops naming
naemono Dec 10, 2025
a08eda6
fix linting
naemono Dec 10, 2025
f6209d5
better control namer length
naemono Dec 10, 2025
7dc93ad
fix updatewithphase transitions
naemono Dec 10, 2025
f5d35fe
making the naming consistent for secrets
naemono Dec 10, 2025
f8e6e25
again; consistent naming
naemono Dec 10, 2025
d2a34e5
adding better error handling
naemono Dec 10, 2025
46eca6d
add event when not reconciled
naemono Dec 10, 2025
e982d40
use correct configmap name
naemono Dec 10, 2025
c4913a2
cleanup logger; pass back errors via result
naemono Dec 11, 2025
63b79df
make configref.secret required in helm chart
naemono Dec 11, 2025
e8b10f6
add config sample for autoops
naemono Dec 11, 2025
cbfdc0e
use extract nsn func
naemono Dec 11, 2025
9f0729f
fix linter, unit tests
naemono Dec 11, 2025
4e41ee8
try linter values file.
naemono Dec 11, 2025
ece73b5
Fix lint values
naemono Dec 11, 2025
5e51bbb
Revert config changes
naemono Dec 11, 2025
067dab3
Add back autoops policy
naemono Dec 11, 2025
5738efe
Remove unneeded file
naemono Dec 11, 2025
ffa255a
remove unneeded helm values
naemono Dec 11, 2025
f710577
remove unused var
naemono Dec 11, 2025
4219c51
Fix comment
naemono Dec 11, 2025
5b058aa
Fix linting issue
naemono Dec 11, 2025
3bf4607
Temp disable enterprise check
naemono Dec 11, 2025
c28db04
Review changes.
naemono Dec 12, 2025
f62cee3
Use the common funcs in common/apikey
naemono Dec 12, 2025
4384554
Move parseSecret -> validateSecret
naemono Dec 12, 2025
3a3c083
rename reconciler.
naemono Dec 12, 2025
17b69cf
Adjust cleanup on delete logic.
naemono Dec 12, 2025
f38db24
Adding unit tests for internal reconcile.
naemono Dec 12, 2025
4ab8be8
unexport newstate
naemono Dec 12, 2025
739bfdd
Update name of reconciler.
naemono Dec 12, 2025
ad239f5
Allow ability to override the configuration
naemono Dec 12, 2025
a8b69ae
make generate
naemono Dec 12, 2025
1dd58b8
Fix json tags
naemono Dec 13, 2025
be1db23
add gc; disable config merging
naemono Dec 15, 2025
e0a39a3
add license builder to e2e test
naemono Dec 15, 2025
6c25a13
remove duplicative funcs. make "get api_keys calls consistent"
naemono Dec 15, 2025
db28e81
adjust crd for config. fix unit tests. remove owner from apikey secret
naemono Dec 15, 2025
17518ba
Add cleanup of resources when matchlabels change.
naemono Dec 16, 2025
a3e2b3c
revert license changes
naemono Dec 16, 2025
26f9900
Fix comment
naemono Dec 16, 2025
362591a
remove unused const
naemono Dec 16, 2025
d4bcfb6
Fix linter
naemono Dec 16, 2025
c00bb50
Fix validation
naemono Dec 16, 2025
15373d5
Fix secret name
naemono Dec 16, 2025
9921a5e
Fit unit tests
naemono Dec 16, 2025
1e8e0ad
cleanup from review
naemono Dec 16, 2025
4088eae
optimize initial creation flow avoiding additional gets
naemono Dec 16, 2025
55275f3
remove unused ctx
naemono Dec 16, 2025
5c5cc16
remove redundant name label
naemono Dec 16, 2025
1557204
fix imports
naemono Dec 16, 2025
ac634a7
making things consistent
naemono Dec 16, 2025
fa260ec
requeue until all is ready
naemono Dec 16, 2025
cc1958c
adjust how phase is handled from result
naemono Dec 16, 2025
c896b23
adjust cleanup to happen even when no instances match selector
naemono Dec 16, 2025
9421b3c
cleanup
naemono Dec 16, 2025
a4f7358
remove dynamic watches for all secrets
naemono Dec 17, 2025
91bc15b
Review changes
naemono Dec 17, 2025
a63aa6b
Revert license changes
naemono Dec 17, 2025
3d36a53
Rename short name for CRD.
naemono Dec 17, 2025
074763a
Rename helm chart.
naemono Dec 17, 2025
1bc72aa
Fix helm charts
naemono Dec 17, 2025
fbee875
remove dupe labels
naemono Dec 17, 2025
7ad1ec7
ensure secret cleaned up in e2e test.
naemono Dec 17, 2025
1e7f32e
optimize user role for autoops
naemono Dec 17, 2025
e49890f
Wording in crd definition
naemono Dec 17, 2025
0d0c2c0
Avoid conversion
naemono Dec 17, 2025
eeb4c7b
Add serviceaccount for rbac purposes.
naemono Dec 18, 2025
14553fa
Adjust manager/main.go for changes to autoops add func
naemono Dec 18, 2025
890d730
actually add accessreviewer
naemono Dec 18, 2025
680c8a4
Enhance RBAC handling in AutoOps reconciliation process. Filter Elast…
pebrc Dec 18, 2025
574ecf8
Implement GC for orphaned AutoOps API keys at startup
pebrc Dec 18, 2025
fbb7c14
Optimization from review.
naemono Dec 18, 2025
81a3cab
Added aop telemetry
naemono Dec 18, 2025
2a0c295
Review changes
naemono Dec 19, 2025
19e2051
Also watch deployments.
naemono Dec 19, 2025
b16d057
Fix unit tests
naemono Dec 19, 2025
7821df5
Fix imports
naemono Dec 19, 2025
b184143
don't execute e2e test on unsupported stack version
naemono Dec 22, 2025
66a5e3a
revert rename of func
naemono Dec 22, 2025
220d5fd
register autoops with e2e scheme
naemono Dec 23, 2025
8af8cd1
Merge branch 'main' into ccm-integration
naemono Dec 28, 2025
766c5c1
Fix notice.txt
naemono Dec 28, 2025
ccd7ce7
Merge branch 'ccm-integration' of github.com:naemono/cloud-on-k8s int…
naemono Dec 28, 2025
acb8a74
make generate
naemono Dec 28, 2025
27 changes: 26 additions & 1 deletion cmd/manager/main.go
@@ -44,6 +44,7 @@ import (
	agentv1alpha1 "github.com/elastic/cloud-on-k8s/v3/pkg/apis/agent/v1alpha1"
	apmv1 "github.com/elastic/cloud-on-k8s/v3/pkg/apis/apm/v1"
	apmv1beta1 "github.com/elastic/cloud-on-k8s/v3/pkg/apis/apm/v1beta1"
	autoopsv1alpha1 "github.com/elastic/cloud-on-k8s/v3/pkg/apis/autoops/v1alpha1"
	beatv1beta1 "github.com/elastic/cloud-on-k8s/v3/pkg/apis/beat/v1beta1"
	esv1 "github.com/elastic/cloud-on-k8s/v3/pkg/apis/elasticsearch/v1"
	esv1beta1 "github.com/elastic/cloud-on-k8s/v3/pkg/apis/elasticsearch/v1beta1"
@@ -59,11 +60,13 @@ import (
	"github.com/elastic/cloud-on-k8s/v3/pkg/controller/apmserver"
	"github.com/elastic/cloud-on-k8s/v3/pkg/controller/association"
	associationctl "github.com/elastic/cloud-on-k8s/v3/pkg/controller/association/controller"
	"github.com/elastic/cloud-on-k8s/v3/pkg/controller/autoops"
	"github.com/elastic/cloud-on-k8s/v3/pkg/controller/autoscaling"
	esavalidation "github.com/elastic/cloud-on-k8s/v3/pkg/controller/autoscaling/elasticsearch/validation"
	"github.com/elastic/cloud-on-k8s/v3/pkg/controller/beat"
	"github.com/elastic/cloud-on-k8s/v3/pkg/controller/common/certificates"
	"github.com/elastic/cloud-on-k8s/v3/pkg/controller/common/container"
	commonesclient "github.com/elastic/cloud-on-k8s/v3/pkg/controller/common/esclient"
	commonlicense "github.com/elastic/cloud-on-k8s/v3/pkg/controller/common/license"
	"github.com/elastic/cloud-on-k8s/v3/pkg/controller/common/operator"
	"github.com/elastic/cloud-on-k8s/v3/pkg/controller/common/password"
@@ -758,7 +761,7 @@ func startOperator(ctx context.Context) error {

	disableTelemetry := viper.GetBool(operator.DisableTelemetryFlag)
	telemetryInterval := viper.GetDuration(operator.TelemetryIntervalFlag)
-	go asyncTasks(ctx, mgr, cfg, managedNamespaces, operatorNamespace, operatorInfo, disableTelemetry, telemetryInterval, tracer)
+	go asyncTasks(ctx, mgr, cfg, managedNamespaces, operatorNamespace, operatorInfo, disableTelemetry, telemetryInterval, tracer, dialer)

	log.Info("Starting the manager", "uuid", operatorInfo.OperatorUUID,
		"namespace", operatorNamespace, "version", operatorInfo.BuildInfo.Version,
@@ -817,6 +820,7 @@ func asyncTasks(
	disableTelemetry bool,
	telemetryInterval time.Duration,
	tracer *apm.Tracer,
	dialer net.Dialer,
) {
	<-mgr.Elected() // wait for this operator instance to be elected

@@ -843,13 +847,18 @@ func asyncTasks(
	// Garbage collect orphaned secrets leftover from deleted resources while the operator was not running
	// - association user secrets
	gcCtx := tracing.NewContextTransaction(ctx, tracer, tracing.RunOnceTxType, "garbage-collection", nil)
	gcCtx = logconf.AddToContext(gcCtx, logf.Log.WithName("garbage-collection"))
	err := garbageCollectUsers(gcCtx, cfg, managedNamespaces)
	if err != nil {
		log.Error(err, "exiting due to unrecoverable error")
		os.Exit(1)
	}
	// - soft-owned secrets
	garbageCollectSoftOwnedSecrets(gcCtx, mgr.GetClient())
	// - autoops orphaned resources (API key secrets without owner references)
	if err := garbageCollectAutoOpsResources(gcCtx, mgr.GetClient(), dialer); err != nil {
		log.Error(err, "AutoOps garbage collection failed, will be attempted again at next operator restart")
	}
	tracing.EndContextTransaction(gcCtx)
}

@@ -932,6 +941,9 @@ func registerControllers(mgr manager.Manager, params operator.Parameters, access
		name         string
		registerFunc func(manager.Manager, rbac.AccessReviewer, operator.Parameters) error
	}{
		// AutoOps isn't technically an association controller, but it's closely related and
		// its Add function signature is the same as the association controllers.
		{name: "AutoOpsAgentPolicy", registerFunc: autoops.Add},
		{name: "RemoteCA", registerFunc: remotecluster.Add},
		{name: "APM-ES", registerFunc: associationctl.AddApmES},
		{name: "APM-KB", registerFunc: associationctl.AddApmKibana},
@@ -999,6 +1011,7 @@ func garbageCollectSoftOwnedSecrets(ctx context.Context, k8sClient k8s.Client) {
		emsv1alpha1.Kind: &emsv1alpha1.ElasticMapsServer{},
		eprv1alpha1.Kind: &eprv1alpha1.PackageRegistry{},
		policyv1alpha1.Kind: &policyv1alpha1.StackConfigPolicy{},
		autoopsv1alpha1.Kind: &autoopsv1alpha1.AutoOpsAgentPolicy{},
		logstashv1alpha1.Kind: &logstashv1alpha1.Logstash{},
	}); err != nil {
		log.Error(err, "Orphan secrets garbage collection failed, will be attempted again at next operator restart.")
@@ -1007,6 +1020,17 @@ func garbageCollectSoftOwnedSecrets(ctx context.Context, k8sClient k8s.Client) {
	log.Info("Orphan secrets garbage collection complete")
}

func garbageCollectAutoOpsResources(ctx context.Context, k8sClient k8s.Client, dialer net.Dialer) error {
	span, ctx := apm.StartSpan(ctx, "gc_autoops_resources", tracing.SpanTypeApp)
	defer span.End()

	gc := autoops.NewGarbageCollector(k8sClient, commonesclient.NewClient, dialer)
	if err := gc.DoGarbageCollection(ctx); err != nil {
		return fmt.Errorf("AutoOps garbage collection failed: %w", err)
	}
	return nil
}

func setupWebhook(
	ctx context.Context,
	mgr manager.Manager,
@@ -1044,6 +1068,7 @@ func setupWebhook(
		&emsv1alpha1.ElasticMapsServer{},
		&eprv1alpha1.PackageRegistry{},
		&policyv1alpha1.StackConfigPolicy{},
		&autoopsv1alpha1.AutoOpsAgentPolicy{},
	}
	for _, obj := range webhookObjects {
		commonwebhook.SetupValidatingWebhookWithConfig(&commonwebhook.Config{
176 changes: 176 additions & 0 deletions config/crds/v1/all-crds.yaml
@@ -2355,6 +2355,182 @@ spec:
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.19.0
  name: autoopsagentpolicies.autoops.k8s.elastic.co
spec:
  group: autoops.k8s.elastic.co
  names:
    categories:
    - elastic
    kind: AutoOpsAgentPolicy
    listKind: AutoOpsAgentPolicyList
    plural: autoopsagentpolicies
    shortNames:
    - aop
    singular: autoopsagentpolicy
  scope: Namespaced
  versions:
  - additionalPrinterColumns:
    - description: Ready resources
      jsonPath: .status.ready
      name: Ready
      type: string
    - jsonPath: .status.phase
      name: Phase
      type: string
    - jsonPath: .metadata.creationTimestamp
      name: Age
      type: date
    name: v1alpha1
    schema:
      openAPIV3Schema:
        description: AutoOpsAgentPolicy represents an Elastic AutoOps Policy resource
          in a Kubernetes cluster.
        properties:
          apiVersion:
            description: |-
              APIVersion defines the versioned schema of this representation of an object.
              Servers should convert recognized schemas to the latest internal value, and
              may reject unrecognized values.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
            type: string
          kind:
            description: |-
              Kind is a string value representing the REST resource this object represents.
              Servers may infer this from the endpoint the client submits requests to.
              Cannot be updated.
              In CamelCase.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
            type: string
          metadata:
            type: object
          spec:
            properties:
              autoOpsRef:
                description: AutoOpsRef defines a reference to a secret containing
                  connection details for AutoOps via Cloud Connect.
                properties:
                  secretName:
                    description: |-
                      SecretName references a Secret containing connection details for external AutoOps.
                      Required when connecting via Cloud Connect. The secret must contain:
                      - `cloud-connected-mode-api-key`: Cloud Connected Mode API key
                      - `autoops-otel-url`: AutoOps OpenTelemetry endpoint URL
                      - `autoops-token`: AutoOps authentication token
                      - `cloud-connected-mode-api-url`: (optional) Cloud Connected Mode API URL
                      This field cannot be used in combination with `name`.
                    type: string
                type: object
              image:
                description: Image is the AutoOps Agent Docker image to deploy.
                type: string
              podTemplate:
                description: PodTemplate provides customisation options (labels, annotations,
                  affinity rules, resource requests, and so on) for the Agent pods
                type: object
                x-kubernetes-preserve-unknown-fields: true
              resourceSelector:
                description: |-
                  ResourceSelector is a label selector for the resources to be configured.
                  Any Elasticsearch instances that match the selector will be configured to send data to AutoOps.
                properties:
                  matchExpressions:
                    description: matchExpressions is a list of label selector requirements.
                      The requirements are ANDed.
                    items:
                      description: |-
                        A label selector requirement is a selector that contains values, a key, and an operator that
                        relates the key and values.
                      properties:
                        key:
                          description: key is the label key that the selector applies
                            to.
                          type: string
                        operator:
                          description: |-
                            operator represents a key's relationship to a set of values.
                            Valid operators are In, NotIn, Exists and DoesNotExist.
                          type: string
                        values:
                          description: |-
                            values is an array of string values. If the operator is In or NotIn,
                            the values array must be non-empty. If the operator is Exists or DoesNotExist,
                            the values array must be empty. This array is replaced during a strategic
                            merge patch.
                          items:
                            type: string
                          type: array
                          x-kubernetes-list-type: atomic
                      required:
                      - key
                      - operator
                      type: object
                    type: array
                    x-kubernetes-list-type: atomic
                  matchLabels:
                    additionalProperties:
                      type: string
                    description: |-
                      matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
                      map is equivalent to an element of matchExpressions, whose key field is "key", the
                      operator is "In", and the values array contains only "value". The requirements are ANDed.
                    type: object
                type: object
                x-kubernetes-map-type: atomic
              revisionHistoryLimit:
                description: RevisionHistoryLimit is the number of revisions to retain
                  to allow rollback in the underlying Deployment.
                format: int32
                type: integer
              serviceAccountName:
                description: |-
                  ServiceAccountName is used to check access to Elasticsearch resources in different namespaces.
                  Can only be used if ECK is enforcing RBAC on references (--enforce-rbac-on-refs flag).
                  The service account must have "get" permission on elasticsearch.k8s.elastic.co/elasticsearches
                  in the target namespaces.
                type: string
              version:
                description: Version of the AutoOpsAgentPolicy.
                type: string
            required:
            - version
            type: object
          status:
            properties:
              errors:
                description: Errors is the number of resources that are in an error
                  state.
                type: integer
              observedGeneration:
                description: ObservedGeneration is the most recent generation observed
                  for this AutoOpsAgentPolicy.
                format: int64
                type: integer
              phase:
                description: Phase is the phase of the AutoOpsAgentPolicy.
                type: string
              ready:
                description: Ready is the number of resources that are in a ready
                  state.
                type: integer
              resources:
                description: Resources is the number of resources that match the ResourceSelector.
                type: integer
            required:
            - errors
            - ready
            - resources
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.19.0
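The resourceSelector schema above describes standard label selector semantics: every matchLabels pair and every matchExpressions requirement must hold (they are ANDed), with the operators In, NotIn, Exists and DoesNotExist. A standard-library-only sketch of that evaluation is shown below to make the rules concrete. The selector and requirement types and the matches method are illustrative names, and treating an empty selector as matching everything is an assumption of this sketch; the controller itself presumably relies on the Kubernetes apimachinery implementation instead.

```go
package main

import "fmt"

// requirement mirrors one matchExpressions entry from the CRD schema.
type requirement struct {
	Key      string
	Operator string // In, NotIn, Exists or DoesNotExist
	Values   []string
}

// selector mirrors the resourceSelector fields shown in the schema.
type selector struct {
	MatchLabels      map[string]string
	MatchExpressions []requirement
}

// contains reports whether v is one of values.
func contains(values []string, v string) bool {
	for _, candidate := range values {
		if candidate == v {
			return true
		}
	}
	return false
}

// matches reports whether labels satisfy every requirement (all are ANDed).
// An empty selector matches any label set in this sketch.
func (s selector) matches(labels map[string]string) (bool, error) {
	for k, want := range s.MatchLabels {
		if got, ok := labels[k]; !ok || got != want {
			return false, nil
		}
	}
	for _, r := range s.MatchExpressions {
		got, ok := labels[r.Key]
		switch r.Operator {
		case "In":
			if !ok || !contains(r.Values, got) {
				return false, nil
			}
		case "NotIn":
			// A missing key satisfies NotIn.
			if ok && contains(r.Values, got) {
				return false, nil
			}
		case "Exists":
			if !ok {
				return false, nil
			}
		case "DoesNotExist":
			if ok {
				return false, nil
			}
		default:
			return false, fmt.Errorf("unsupported operator %q", r.Operator)
		}
	}
	return true, nil
}

func main() {
	sel := selector{
		MatchLabels: map[string]string{"env": "prod"},
		MatchExpressions: []requirement{
			{Key: "tier", Operator: "NotIn", Values: []string{"test"}},
		},
	}
	ok, err := sel.matches(map[string]string{"env": "prod", "tier": "hot"})
	fmt.Println(ok, err)
}
```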
8 changes: 8 additions & 0 deletions config/crds/v1/patches/autoopsagentpolicy-patches.yaml
@@ -0,0 +1,8 @@
# Using `kubectl apply` stores the complete CRD file as an annotation,
# which may be too big for the annotations size limit.
# One way to mitigate this problem is to remove the (huge) podTemplate properties from the CRD.
# It also avoids the problem of having any k8s-version specific field in the Pod schema,
# that would maybe not match the user's k8s version.
- op: remove
  path: /spec/versions/0/schema/openAPIV3Schema/properties/spec/properties/podTemplate/properties
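The patch above is a JSON Patch `remove` operation whose path is a JSON Pointer into the CRD document. As a rough illustration of what such an operation does, the sketch below walks a decoded document (nested `map[string]any` and `[]any` values) and deletes the addressed object member. It is a simplified stand-in written for this note, not what Kustomize runs internally: the removeAt helper is hypothetical and only supports removing object members, not array elements.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unescape decodes JSON Pointer escapes: "~1" is "/" and "~0" is "~".
var unescape = strings.NewReplacer("~1", "/", "~0", "~")

// removeAt deletes the object member addressed by a JSON Pointer path.
// Like a JSON Patch remove, it fails when the target does not exist.
func removeAt(doc any, path string) error {
	if !strings.HasPrefix(path, "/") {
		return fmt.Errorf("path %q must start with /", path)
	}
	tokens := strings.Split(path[1:], "/")
	cur := doc
	for i, raw := range tokens {
		tok := unescape.Replace(raw)
		last := i == len(tokens)-1
		switch node := cur.(type) {
		case map[string]any:
			next, ok := node[tok]
			if !ok {
				return fmt.Errorf("path %q: member %q not found", path, tok)
			}
			if last {
				delete(node, tok)
				return nil
			}
			cur = next
		case []any:
			idx, err := strconv.Atoi(tok)
			if err != nil || idx < 0 || idx >= len(node) {
				return fmt.Errorf("path %q: bad array index %q", path, tok)
			}
			if last {
				return fmt.Errorf("path %q: removing array elements is not supported in this sketch", path)
			}
			cur = node[idx]
		default:
			return fmt.Errorf("path %q: cannot descend into %T at %q", path, cur, tok)
		}
	}
	return nil
}

func main() {
	// A tiny stand-in for a CRD version entry with a podTemplate schema.
	podTemplate := map[string]any{
		"type":       "object",
		"properties": map[string]any{"spec": map[string]any{"type": "object"}},
	}
	doc := map[string]any{
		"versions": []any{map[string]any{"podTemplate": podTemplate}},
	}
	err := removeAt(doc, "/versions/0/podTemplate/properties")
	fmt.Println(err, podTemplate)
}
```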

9 changes: 8 additions & 1 deletion config/crds/v1/patches/kustomization.yaml
@@ -70,13 +70,20 @@ patches:
      kind: CustomResourceDefinition
      name: elasticmapsservers.maps.k8s.elastic.co
    path: maps-patches.yaml
-  # custom patches for Logstash
+  # custom patches for Logstash
  - target:
      group: apiextensions.k8s.io
      version: v1
      kind: CustomResourceDefinition
      name: logstashes.logstash.k8s.elastic.co
    path: logstash-patches.yaml
  # custom patches for AutoOpsAgentPolicy
  - target:
      group: apiextensions.k8s.io
      version: v1
      kind: CustomResourceDefinition
      name: autoopsagentpolicies.autoops.k8s.elastic.co
    path: autoopsagentpolicy-patches.yaml
  # custom patches for PackageRegistry
  - target:
      group: apiextensions.k8s.io