This repository has been archived by the owner on Jul 26, 2022. It is now read-only.

Standardize CRD Spec #477

Closed
wants to merge 16 commits into from
254 changes: 254 additions & 0 deletions keps/20200901-standard-crd.md
@@ -0,0 +1,254 @@
```yaml
---
title: Standardize ExternalSecret CRD
authors: all of us
creation-date: 2020-09-01
status: draft
---
```

# Standardize ExternalSecret CRD

## Table of Contents

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Terminology](#terminology)
- [Use-Cases](#use-cases)
- [Proposal](#proposal)
- [API](#api)
- [Alternatives](#alternatives)
<!-- /toc -->

## Summary

This is a proposal to standardize the `ExternalSecret` CRD in a combined effort across all projects that deal with syncing external secrets. This proposal aims to find the _common denominator_ for all users of `ExternalSecrets`.

## Motivation

There are a lot of different projects in the wild that essentially do the same thing: sync secrets with Kubernetes. The idea is to unify efforts into a single project that serves the needs of all users in this problem space.

As a starting point I would like to define a **common denominator** for a CustomResourceDefinition that serves all known use-cases. This CRD should follow the standard alpha -> beta -> GA feature process.

Once the CRD API is defined we can move on with more delicate discussions about technology, organization and processes.

List of Projects known so far or related:
* https://github.com/godaddy/kubernetes-external-secrets
* https://github.com/itscontained/secret-manager
* https://github.com/ContainerSolutions/externalsecret-operator
* https://github.com/mumoshu/aws-secret-operator
* https://github.com/cmattoon/aws-ssm
* https://github.com/tuenti/secrets-manager
* https://github.com/kubernetes-sigs/k8s-gsm-tools

### Goals

- Define an alpha CRD
- Fully document the Spec and use-cases

### Non-Goals

This KEP proposes the CRD Spec and documents the use-cases, not the choice of technology or migration path towards implementing the CRD.

We do not want to sync secrets into a `ConfigMap`.

## Terminology

* Kubernetes External Secrets `KES`: An application that runs a control loop which syncs secrets
* KES `instance`: A single entity that runs a control loop.
* ExternalSecret `ES`: A CustomResource that declares which secrets should be synced
* Store: A **source** for secrets. The Store is external to KES. It can be a hosted service like Alibabacloud SecretsManager, AWS SystemsManager, Azure KeyVault...
* Frontend: A **sink** for the synced secrets. Usually a `Secret`
* Secret: Credentials that act as a key to sensitive information

## Use-Cases
* one global KES instance that manages ES in **all namespaces**, which gives access to **all stores**, with ACL
* multiple global KES instances, each manages access to a single or multiple stores (e.g.: shard by stage or team...)
* one KES per namespace (a user manages his/her own KES instance)

### User definitions
* `operator :=` I manage one or multiple `KES` instances
* `user :=` I only create `ES`; KES is managed by someone else

### User Stories
From that we can derive the following requirements or user-stories:
1. As a KES operator I want to run multiple KES instances per cluster (e.g. one KES instance per DEV/PROD)
2. As a KES operator or user I want to integrate **multiple stores** from a **single KES instance** (e.g. dev namespace has access only to dev secrets)
3. As a KES user I want to control the sink for the secrets (aka frontend: store secret as `kind=Secret`)
4. As a KES user I want to fetch **from multiple** stores and store the secrets **in a single** Frontend
5. As a KES operator I want to limit the access to certain stores or subresources (e.g. having one central KES instance that handles all ES - similar to `iam.amazonaws.com/permitted` annotation per namespace)
6. As a KES user I want to provide an application with a configuration that contains a secret

### Stores

These stores are relevant:
* AWS Systems Manager Parameter Store
* AWS Secrets Manager
* Hashicorp Vault
* Azure Key Vault
* Alibaba Cloud KMS Secret Manager
* Google Cloud Platform Secret Manager
* Kubernetes (see #422)
* noop (see #476)

### Frontends

@Flydiverny I like the idea of a universal injector in addition to the secret sync to allow injection via multiple sources to runtime env or file, without the need of using (potentially) 3+ separate upstream injectors with varying installation and usage


* Kind=Secret
* *potentially* we could sync store to store

## Proposal

### API

### External Secret

The `ExternalSecret` CustomResourceDefinition is **namespaced**. It defines the following:
1. source for the secret (store)
2. sink for the secret (frontend)
3. and a mapping to translate the keys

```yaml
apiVersion: external-secrets.k8s.io/v1alpha1
kind: ExternalSecret
metadata: {...}

spec:

  # the amount of time before the values will be read again from the store
  refreshInterval: "1h"

Should we have a default value for refreshInterval? This can be a sensitive field, since it might make users reach their provider API limits.

If not, we need to mark this as required (and might give some advice on how to estimate it).

Member

Default reconcile loops happen every 10h (if we don't change anything in the generated stuff), right?


  # there can only be one target per ES
  target:
    # The secret name of the resource
    # defaults to .metadata.name of the ExternalSecret. immutable.
    name: my-secret

    # Enum with values: 'Owner', 'Merge', or 'None'
    # Default value of 'Owner'
    # Owner creates the secret and sets .metadata.ownerReferences of the resource
    # Merge does not create the secret, but merges in the data fields to the secret
    # None does not create a secret (future use with injector)
    creationPolicy: 'Merge'

    # specify a blueprint for the resulting Kind=Secret
    template:
      type: kubernetes.io/dockerconfigjson # or TLS...

      metadata:
        annotations: {}
        labels: {}

      # use inline templates to construct your desired config file that contains your secret
      data:
        config.yml: |
          endpoints:
          - https://{{ .data.user }}:{{ .data.password }}@api.example.com

      # Uses an existing template from configmap
      # secret is fetched, merged and templated within the referenced configMap data
      # It does not update the configmap, it creates a secret with: data["alertmanager.yaml"] = ...result...
      templateFrom:
      - configMap:
          name: alertmanager
          items:
          - key: alertmanager.yaml

  # data contains key/value pairs which correspond to the keys in the resulting secret
  data:

    # EXAMPLE 1: simple mapping
    # one key from a store may hold multiple values
    # we need a way to map the values to the frontend
    # it is the responsibility of the store implementation to know how to extract a value
    tls.crt:
      key: /corp.org/dev/certs/ingress
      property: pubcert
    tls.key:
      key: /corp.org/dev/certs/ingress
      property: privkey

  # used to fetch all properties from a secret.
  # if multiple dataFrom are specified, secrets are merged in the specified order
  dataFrom:
  - key: /user/all-creds

# status holds the timestamp and status of the last sync
status:
  lastSync: 2020-09-01T18:19:17.263Z # ISO 8601 date string
  status: success # or failure

A message field with the stringified error would be great for diagnostics.

Agree, any reason not to go with a condition array block here for readiness?

Member Author @moolen, Sep 28, 2020

Sure, conditions would fit in here nicely!

What about:

status:
  # represents the current phase of secret sync:
  # * Pending | ES created, controller did not (yet) sync the ES or other dependencies are missing (e.g. secret store or configmap template)
  # * Syncing | ES is being actively synced according to spec
  # * Failing | Secret can not be synced, this might require user intervention
  # * Failed  | ES can not be synced right now and will not be able to
  # * Completed | ES was synced successfully (one-time use only)
  phase: Syncing
  lastSyncTime: "2020-09-23T16:27:53Z"
  failureReason: "..."
  failureMessage: "..."
  
  conditions:
  - type: InSync
    status: "True" # False if last sync was not successful
    reason: "SecretSynced"
    message: "Secret was synced"
    lastTransitionTime: "2019-08-12T12:33:02Z"
    lastHeartbeatTime: "2020-09-23T16:27:53Z"

Looks good, a few comments

  • lastHeartbeatTime doesn't need to be used here, that is generally only for conditions which need explicit heartbeats
  • failureReason, failureMessage already fit within the condition, those would better fit with events in this case rather than explicitly in the base status.

Member Author

Regarding lastHeartbeatTime: I think this is the case here. The InSync condition should be updated with every sync. We should consider renaming this to lastSyncTime.

failure* as attributes are duplicates, true. I'll remove them.

```

This API makes the options more explicit rather than having annotations.
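
To make the semantics of `data`, `dataFrom` and the inline template more concrete, here is a minimal Python sketch. It is not part of the proposal: `FAKE_STORE`, `resolve_data` and `render` are hypothetical names, the store is an in-memory dict, and the regex substitution only stands in for a real template engine. The sketch also assumes that explicit `data` mappings are applied after `dataFrom` and take precedence, which the spec above does not state.

```python
import re
from typing import Dict

# Hypothetical in-memory stand-in for a store: each store key holds a map of
# properties. A real store implementation decides how a value is extracted.
FAKE_STORE: Dict[str, Dict[str, str]] = {
    "/corp.org/dev/certs/ingress": {"pubcert": "CERT", "privkey": "KEY"},
    "/user/all-creds": {"user": "admin", "password": "s3cr3t"},
}


def resolve_data(spec: dict) -> Dict[str, str]:
    """Collect the key/value pairs described by spec.dataFrom and spec.data."""
    values: Dict[str, str] = {}
    # dataFrom: fetch all properties of each key and merge them in the listed
    # order, so a later entry overwrites an earlier one.
    for source in spec.get("dataFrom", []):
        values.update(FAKE_STORE[source["key"]])
    # data: map one property of a store key to one key of the Secret.
    # Assumption: explicit mappings are applied after dataFrom and win.
    for secret_key, ref in spec.get("data", {}).items():
        values[secret_key] = FAKE_STORE[ref["key"]][ref["property"]]
    return values


# Matches placeholders such as {{ .data.user }}.
_PLACEHOLDER = re.compile(r"\{\{\s*\.data\.([\w.-]+)\s*\}\}")


def render(template: str, values: Dict[str, str]) -> str:
    """Substitute placeholders; a stand-in for a real template engine."""
    return _PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


example_spec = {
    "data": {
        "tls.crt": {"key": "/corp.org/dev/certs/ingress", "property": "pubcert"},
        "tls.key": {"key": "/corp.org/dev/certs/ingress", "property": "privkey"},
    },
    "dataFrom": [{"key": "/user/all-creds"}],
}
secret_values = resolve_data(example_spec)
config = render(
    "https://{{ .data.user }}:{{ .data.password }}@api.example.com",
    secret_values,
)
```

Here `secret_values` holds the four keys that would end up in the Secret, and `config` is the rendered endpoint line from the `config.yml` template.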


### External Secret Store

The store configuration in an `ExternalSecret` may contain a lot of redundancy, which can be factored out into its own CRD.
These stores are defined in a particular namespace using `SecretStore` **or** cluster-wide using `ClusterSecretStore`.

```yaml
apiVersion: external-secrets.k8s.io/v1alpha1
kind: SecretStore # or ClusterSecretStore
metadata:
  name: vault
  namespace: example-ns
spec:

  # optional.
  # used to select the correct KES controller (think: ingress.ingressClassName)
  # The KES controller is instantiated with a specific controller name
  # and filters ES based on this property
  controller: "dev"

  store:
    # store implementation
    vault:
      server: "https://vault.example.com"
      path: secret/data
      auth:
        kubernetes:
          path: kubernetes
          role: example-role
          secretRef:
            name: vault-secret

Should SecretStore have a status? W/ secret stores implementing some sort of liveliness check so you can validate a SecretStore is healthy before adding secrets?

I think this would be useful but slightly challenging. Some SecretStore types may not have permissions to do inspection, for instance if Vault role isn't granted lookup-self permissions then the SecretStore may report failed liveness but still have permission to fetch ExternalSecrets.

I think my recommendation would be to implement this with best effort and documenting the potential permissions store backends may need to support the liveness check.

Member Author

Agree, determining the status of a SecretStore is challenging. I think it would be great to have Events on the SecretStore CRD; this way it is easier to observe whether a single ES or multiple ES are having problems.

A basic ready/running status could look like this:

status:
  # * Pending: prerequisites missing / e.g. referenced secret containing credentials
  # * Running: all dependencies are met
  phase: Running
  conditions:
  - type: Ready
    status: "False"
    reason: "ErrorConfig"
    message: "Unable to assume role arn:xxxx"
    lastTransitionTime: "2019-08-12T12:33:02Z"
    lastHeartbeatTime: "2020-09-23T16:27:53Z"

```
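
The review thread above suggests a `conditions` array for the store status. As a rough illustration of how a controller might maintain such an entry, the Python sketch below follows the common Kubernetes convention that `lastTransitionTime` only changes when the condition's `status` flips. `set_condition` is a hypothetical helper, not an API of any existing library, and conditions are plain dicts for brevity.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def set_condition(
    conditions: List[Dict[str, str]],
    cond_type: str,
    status: str,
    reason: str,
    message: str,
    now: Optional[datetime] = None,
) -> List[Dict[str, str]]:
    """Insert or update the condition of the given type in place."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    for cond in conditions:
        if cond["type"] == cond_type:
            # Only a change of status counts as a transition.
            if cond["status"] != status:
                cond["lastTransitionTime"] = timestamp
            cond.update(status=status, reason=reason, message=message)
            return conditions
    # First time this condition type is reported.
    conditions.append(
        {
            "type": cond_type,
            "status": status,
            "reason": reason,
            "message": message,
            "lastTransitionTime": timestamp,
        }
    )
    return conditions
```

Calling `set_condition(conditions, "Ready", "False", "ErrorConfig", "...")` repeatedly keeps a single `Ready` entry and refreshes its reason and message without touching the transition time until the status changes.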

Example `ExternalSecret` that references a store:
```yaml
apiVersion: external-secrets.k8s.io/v1alpha1
kind: ExternalSecret
metadata:
  name: foo
spec:
  storeRef:
    kind: SecretStore # ClusterSecretStore
    name: my-store
  frontend:
    secret:
      name: my-secret
      template:
        type: kubernetes.io/tls
  data:
    tls.crt:
      key: /corp.org/dev/certs/ingress
      property: pubcert
    tls.key:
      key: /corp.org/dev/certs/ingress
      property: privkey
```

Workflow in a KES instance:
1. A user creates a Store with a certain `spec.controller`
2. A KES instance picks up an `ExternalSecret` if the `controller` field of the store it references matches the controller name the instance was started with
3. The controller fetches the secret from the provider and stores it as kind=Secret in the same namespace as the ES
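
The selection step of this workflow can be sketched in Python as follows. This is an illustration under assumptions, not a proposed implementation: `Store`, `ExternalSecret` and `select_external_secrets` are hypothetical names, only namespaced stores are modelled (no `ClusterSecretStore`), and a store without a `controller` value is assumed to be handled only by an instance that was started without a controller name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Store:
    name: str
    namespace: str
    controller: Optional[str] = None  # spec.controller (optional)


@dataclass
class ExternalSecret:
    name: str
    namespace: str
    store_name: str  # spec.storeRef.name


def select_external_secrets(
    instance_controller: Optional[str],
    stores: List[Store],
    external_secrets: List[ExternalSecret],
) -> List[ExternalSecret]:
    """Return the ES objects this KES instance is responsible for."""
    by_key = {(store.namespace, store.name): store for store in stores}
    selected = []
    for es in external_secrets:
        store = by_key.get((es.namespace, es.store_name))
        if store is None:
            continue  # referenced store does not exist (yet)
        if store.controller == instance_controller:
            selected.append(es)
    return selected
```

An instance started with the controller name `dev` would then only reconcile `ExternalSecret` objects whose store carries `controller: "dev"`, which is how several instances can share one cluster without stepping on each other.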


## Backlog

We have a bunch of features which are not relevant for the MVP implementation. We keep them in this backlog, in no particular order:

1. Secret injection with a mutating Webhook [#81](https://github.com/godaddy/kubernetes-external-secrets/issues/81)