chore: ADR for creation and distribution of secrets #141

Merged
merged 8 commits into from
Aug 12, 2020
1 change: 1 addition & 0 deletions docs_src/design/TOC.md
@@ -8,4 +8,5 @@
| [0004 Feature Flags](./adr/0004-Feature-Flags.md) | Feature Flag Implementation |
| [0005 Service Self Config Init](./adr/0005-Service-Self-Config.md) | Service Self Config Init & Config Seed Removal |
| [0007 Release Automation](./adr/devops/0007-Release-Automation.md) | Overview of Release Automation Flow for EdgeX |
| [0008 Secret Distribution](./adr/security/0008-Secret-Creation-and-Distribution.md) | Creation and Distribution of Secrets |
| [0011 Device Service REST API](./adr/device-service/0011-DeviceService-Rest-API.md) | The REST API for Device Services in EdgeX v2.x |
363 changes: 363 additions & 0 deletions docs_src/design/adr/security/0008-Secret-Creation-and-Distribution.md
@@ -0,0 +1,363 @@
# Creation and Distribution of Secrets

## Status

**proposed**

## Context

This ADR seeks to clarify and prioritize the secret handling approach taken by EdgeX.

EdgeX microservices need a number of secrets to be created and distributed
in order to create a functional, secure system.
Among these secrets are:

- Privileged administrator passwords (such as a database superuser password)
- Service account passwords (e.g. non-privileged database accounts)
- PKI private keys

There is a lack of consistency in how secrets are created and distributed to EdgeX microservices,
and when developers need to add new components to the system,
it is unclear what the preferred approach should be.

This document assumes a threat model wherein the EdgeX services are sandboxed
(such as in a snap or a container) and the host system is trusted,

Review comment (Member):

the host system is trusted

Is this an OK posture to take?

Review comment (Collaborator Author):

This is not an OK position to take if one is actually building a production system based on EdgeX. However, as a middleware it is impossible for EdgeX to know the environment into which it is being deployed, nor does EdgeX mandate an underlying infrastructure. This statement basically declares the host OS as out of scope.

Review comment (Collaborator Author):

I agree with the compliant / non-compliant designation but I think we should preface those decisions with what quantifies something as compliant / non-compliant.

Added language to describe what these terms mean.

I think a section we're missing is the addition of some processes going forward so we avoid regressing. Stand, walk, run: first version maybe as a wiki page?

Please elaborate.

I'm kind of surprised / disappointed there isn't an OWASP cheatsheet for this already. Closest I could find was OWASP/CheatSheetSeries#124

Yes, it is a broad problem to solve in a generic way across the industry.

Review comment (Member):

I think a section we're missing is the addition of some processes going forward so we avoid regressing.

The document has a priority list of how to create / distribute secrets and an audit in time of now. Do we need a process in place such that going forward a secret doesn't get hard coded (PR template updated, scheduled audits, tooling to detect passwords being committed etc)?

Review comment (Collaborator Author):

The document has a priority list of how to create / distribute secrets and an audit in time of now. Do we need a process in place such that going forward a secret doesn't get hard coded (PR template updated, scheduled audits, tooling to detect passwords being committed etc)?

I am open to suggestions, but I won't want to propose anything that I'm unwilling to do myself. (e.g. watch every PR)

and all services running in a single snap share a trust boundary.

### Terms

The following terms will be helpful for understanding the subsequent discussion:

- _SECRETSLOC_ is a protected file system path where bootstrapping secrets are stored.

While EdgeX implements a sophisticated secret handling mechanism,
that mechanism itself requires secrets.
For example, every microservice that talks to Vault
must have its own unique secret to authenticate:
Vault itself cannot be used to distribute these secrets.
_SECRETSLOC_ fulfills the role that the non-routable
instance data IP address, 169.254.169.254,
fulfills in the public cloud:
delivery of bootstrapping secrets.
As EdgeX does not have a hypervisor nor virtual machines for this purpose,
a protected file system path is used instead.

_SECRETSLOC_ is implementation-dependent.
A desirable feature of _SECRETSLOC_ would be that data written here
is kept in RAM and is not persisted to storage media.
This property is not achievable in all circumstances.

Review comment (Member):

As an alternative to a file system and something kept in memory with optional persistence, Redis Labs uses Redis instances for this purpose. The runtime overhead of another Redis instance is a couple of MB and storing a few key/value pairs won't change that. Simple shell scripts can return the values.

Review comment (Collaborator Author, @bnevis-i, Aug 5, 2020):

While the Redis Labs approach is good, many open source projects can only consume data from files, making it the lowest-common denominator. I will try to fit this into the description of secretsloc.

Review comment (Collaborator Author):

@andresrinivasan The way I have explained this is that SECRETSLOC is the equivalent of 169.254.169.254 in the public cloud -- it delivers bootstrapping secrets.

For Docker, a list of suggested paths--in preference order--is:

* `/run/edgex/secrets` (a `tmpfs` volume on a Linux host)
* `/tmp/edgex/secrets` (a temporary file area on Linux and MacOS hosts)
* A persistent docker volume (use when host bind mounts are not available)

For snaps, a list of suggested paths--in preference order--is:
* `/run/snap.`_$SNAP_NAME_`/` (a `tmpfs` volume on a Linux host)
* _$SNAP_DATA_`/secrets` (a snap-specific persistent data area)
* _TBD_ (a content interface that allows for sharing of secrets from the core snap)
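
The preference ordering above can be sketched as a simple probe: walk the candidate paths in order and use the first one that exists. This is only an illustration; the helper name `firstExistingDir` is hypothetical and not part of any EdgeX module, and a real implementation would also need to check ownership and permissions.

```go
package main

import (
	"fmt"
	"os"
)

// firstExistingDir returns the first candidate path that exists and is a
// directory. The helper is hypothetical; the caller supplies the candidates
// in preference order.
func firstExistingDir(candidates []string) (string, bool) {
	for _, path := range candidates {
		info, err := os.Stat(path)
		if err == nil && info.IsDir() {
			return path, true
		}
	}
	return "", false
}

func main() {
	// The two host paths suggested for Docker, in preference order.
	candidates := []string{"/run/edgex/secrets", "/tmp/edgex/secrets"}

	if dir, ok := firstExistingDir(candidates); ok {
		fmt.Println("using SECRETSLOC:", dir)
	} else {
		fmt.Println("no host path found; a persistent Docker volume would be the fallback")
	}
}
```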


### Current practices survey

A survey of the existing EdgeX secrets reveals the following approaches.

A designation of "compliant" means that the current implementation
is aligned with the recommended practices documented in the next section.
A designation of "non-compliant" means that the current implementation
uses an implementation mechanism outside of the recommended practices documented in the next section.
A "non-compliant" implementation is a candidate for refactoring
to bring the implementation into conformance with the recommended practices.

#### System-managed secrets

- PKI private keys
* Docker: PKI generated by standalone utility every cold start of the framework. Distribution via _SECRETSLOC_. (Compliant.)
* Snaps: PKI generated by standalone utility every cold start of the framework. Deployed to _SECRETSLOC_. (Compliant.)

- Secret store master password
* Docker: Distribution via persistent docker volume. (Non-compliant.)
* Snaps: Stored in `$SNAP_DATA/config/security-secrets-setup/res`. (Non-compliant.)

- Secret store per-service authentication tokens
* Docker: Distribution via _SECRETSLOC_ generated every cold start of the framework. (Compliant.)
* Snaps: Distribution via _SECRETSLOC_, generated every cold start of the framework. (Compliant.)

- Postgres superuser password
* Docker: Hard-coded into docker-compose file, checked in to source control. (Non-compliant.)
* Snaps: Generated at snap install time via "apg" ("automatic password generator") tool, installed into Postgres, cached to `$SNAP_DATA/config/postgres/kongpw` (non-compliant), and passed to Kong via `$KONG_PG_PASSWORD`.

- MongoDB service account passwords

Review comment (Contributor):

Should this be deprecated since we are using Redis from now on by default?

Review comment (Collaborator Author):

Should this be deprecated since we are using Redis from now on by default?

Hard to tell what lines were highlighted, but:

Postgres: Kong only works with Postgres and Cassandra not Redis so Postgres has to stay.
MongoDB: We reverted the PR to remove PW generation for MongoDB just in case someone wants to keep using it, so I think this should stay as well.

* Docker: Direct consumption from secret store. (Compliant.)
* Snaps: Direct consumption from secret store. (Compliant.)

- Redis authentication password
* Docker: Server--staged to secrets volume and injected via command line. (Non-compliant.) Clients--direct consumption from secret store. (Compliant.)
* Snaps: Server--staged to `$SNAP_DATA/secrets/edgex-redis/redis5-password` and injected via command line. (Non-compliant.) Clients--direct consumption from secret store. (Compliant.)

- Kong client authentication tokens
* Docker: System of reference is unencrypted Postgres database. (Non-compliant.)
* Snaps: System of reference is unencrypted Postgres database. (Non-compliant.)

Note: in the current implementation,
Consul is being operated as a public service.
Consul will be a subject of a future "bootstrapping ADR"
due to its role in service location.

#### User-managed secrets

User-managed secrets functionality is provided by `app-functions-sdk-go`.

If security is enabled, secrets are retrieved from Vault.
If security is disabled, secrets are retrieved from the configuration provider.
If the configuration provider is not available, secrets are read from the underlying `.toml`.
It is taken for granted in this ADR that secrets originating in the configuration provider
or from `.toml` configuration files are not secret.
The fallback mechanism is provided as a convenience to the developer,
who would otherwise have to litter their code with "if (isSecurityEnabled())" logic leading to implementation inconsistencies.
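
The fallback described above can be pictured as choosing a secret source once, so that calling code never branches on the security mode. The sketch below is illustrative only: `secretSource`, `mapSource`, and `chooseSource` are hypothetical names, not the actual `app-functions-sdk-go` types, and in-memory maps stand in for Vault, the configuration provider, and the `.toml` file.

```go
package main

import (
	"errors"
	"fmt"
)

// secretSource is a hypothetical abstraction over the places a secret can
// come from; it is not the SDK's real interface.
type secretSource interface {
	Lookup(path, key string) (string, error)
}

// mapSource is an in-memory stand-in for a real backend.
type mapSource map[string]string

func (m mapSource) Lookup(path, key string) (string, error) {
	value, ok := m[path+"/"+key]
	if !ok {
		return "", errors.New("secret not found: " + path + "/" + key)
	}
	return value, nil
}

// chooseSource applies the order from the text: the secret store when
// security is enabled, otherwise the configuration provider if it is
// available, otherwise the local .toml configuration.
func chooseSource(securityEnabled, providerAvailable bool, vault, provider, toml secretSource) secretSource {
	if securityEnabled {
		return vault
	}
	if providerAvailable {
		return provider
	}
	return toml
}

func main() {
	vault := mapSource{"mqtt/password": "from-vault"}
	provider := mapSource{"mqtt/password": "from-provider"}
	toml := mapSource{"mqtt/password": "from-toml"}

	// Security disabled and no configuration provider: the .toml value is used.
	source := chooseSource(false, false, vault, provider, toml)
	value, err := source.Lookup("mqtt", "password")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(value)
}
```

Only the first source is a real secret store; as the text notes, values served by the other two are not secret.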

The central database credential is supplied by `GetDatabaseCredentials()`
and returns the database credential assigned to `app-service-configurable`.
If security is enabled, database credentials are retrieved using the standard flow.
If security is disabled, secrets are retrieved from the configuration provider
from a special section called `[Writable.InsecureSecrets]`.
If not found there, the configuration provider is searched
for credentials stored in the legacy `[Databases.Primary]` section
using the `Username` and `Password` keys.
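
The security-disabled lookup order for the database credential can be sketched as an ordered search over two configuration sections. The `config` type below is a hypothetical flattened view of the configuration provider, and the key names inside `[Writable.InsecureSecrets]` are assumed to mirror `Username` and `Password` purely for illustration; the real layout may differ.

```go
package main

import "fmt"

// credentials holds a database username and password.
type credentials struct {
	Username string
	Password string
}

// config is a hypothetical view of the configuration provider:
// section name -> key -> value.
type config map[string]map[string]string

// databaseCredentials searches the sections in the order given in the text:
// [Writable.InsecureSecrets] first, then the legacy [Databases.Primary].
func databaseCredentials(cfg config) (credentials, bool) {
	for _, section := range []string{"Writable.InsecureSecrets", "Databases.Primary"} {
		values := cfg[section] // reading from a missing (nil) section is safe
		user, hasUser := values["Username"]
		pass, hasPass := values["Password"]
		if hasUser && hasPass {
			return credentials{Username: user, Password: pass}, true
		}
	}
	return credentials{}, false
}

func main() {
	// Only the legacy section is populated, so it is used as the fallback.
	cfg := config{
		"Databases.Primary": {"Username": "legacy-user", "Password": "legacy-pass"},
	}
	if creds, ok := databaseCredentials(cfg); ok {
		fmt.Println("found credentials for", creds.Username)
	}
}
```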

Each user application has its own exclusive-use area of the secret store
that is accessed via `GetSecrets()`.
If security is enabled, secret requests are passed along to `go-mod-secrets`
using an application-specific access token.
If security is disabled, secret requests are made to the configuration provider
from the `[Writable.InsecureSecrets]` section.
There is no fallback configuration location.

As user-managed secrets have no framework support for initialization,
a special `StoreSecrets()` method is made available
to the application for the application to initialize its own secrets.
This method is only available in security-enabled mode.

No changes to user-managed secrets are being proposed in this ADR.

Review comment (Member):

Should this ADR mention that similar support for user-managed secrets should be added to the device SDKs?

Review comment (Collaborator Author):

Should this ADR mention that similar support for user-managed secrets should be added to the device SDKs?

No opinion. Hard to write about when no design has been proposed. @lenny-intel do you have input on the above?



## Decision

### Creation of secrets

Management of hardware-bound secrets is platform-specific
and out-of-scope for the EdgeX framework.
EdgeX open source will contain only the necessary hooks to integrate
platform-specific functionality.

For software-managed secrets, the
[_system of reference_](https://www.dqglossary.com/record%20of%20reference.html)
of secrets in EdgeX is the EdgeX secret store.
The EdgeX secret store provides for encryption of secrets at rest.

Review comment (Member):

some naive questions:

  • is the edgex secret store in security enabled mode only?
  • is the edgex secret store backend hashicorp vault for all secrets or just some?

Review comment (Collaborator Author):

  • is the edgex secret store in security enabled mode only?

Yes.

  • is the edgex secret store backend hashicorp vault for all secrets or just some?

Yes, the backend is Vault. It is not possible to store all secrets in Vault, but as many as possible should be stored in Vault.
As a counter example, the access token needed to connect to Vault itself cannot be stored in Vault. As a fallback mechanism when security is not enabled but a registry is used, secrets will be fetched from Consul. IMHO this is worse than just leaving them on the file system, as Consul is a networked service. However, security disabled is security disabled.

This term means that if a secret is replicated,
the EdgeX secret store is the authoritative source of truth of the secret.
Whenever possible, the EdgeX secret store should also be the
[_record of origin_](https://www.dqglossary.com/record%20of%20origin.html)
of a secret as well.
This means creating secrets inside of the EdgeX secret store
is preferable to importing an externally-created secret into the secret store.
This can often be done for framework-managed secrets,
but is not possible for user-managed secrets.

### Choosing between alternative forms of secrets

When given a choice between plain-text secrets
and cryptographic keys,
cryptographic keys should be preferred.

An example situation would be the introduction of an MQTT message broker.
A broker may support both TLS client authentication as well as username/password authentication.
In such a situation, TLS client authentication would be preferred:

- The cryptographic key is typically longer in bits than a plain-text secret.
- A plain-text secret will require transport encryption in order to protect confidentiality of the secret, such as server-side TLS.
- Use of TLS client authentication typically eliminates the need for additional assets on the server side (such as a password database) to authenticate the client, by relying on digital signature instead.

TLS client authentication **should not be used**
unless there is a capability to revoke a compromised certificate,
such as by replacing the certificate authority,
or providing a certificate revocation list to the server.
If certificate revocation is not supported,
plain-text secrets (such as username/password) should be used instead,
as they are typically easier to revoke.
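
To make the revocation requirement concrete, here is a minimal sketch of a Go TLS server configuration that requires client certificates and rejects any whose serial number appears on a revocation list. It is not taken from EdgeX: the helper names are hypothetical, and the revocation list is simplified to an in-memory set of hexadecimal serial numbers rather than a parsed CRL.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// isRevoked reports whether a certificate serial number (lowercase hex)
// appears in the revocation set. The set stands in for a real CRL.
func isRevoked(serialHex string, revoked map[string]bool) bool {
	return revoked[serialHex]
}

// rejectRevoked builds a callback for tls.Config.VerifyPeerCertificate that
// fails the handshake when the leaf certificate of any verified chain is revoked.
func rejectRevoked(revoked map[string]bool) func([][]byte, [][]*x509.Certificate) error {
	return func(_ [][]byte, verifiedChains [][]*x509.Certificate) error {
		for _, chain := range verifiedChains {
			if len(chain) == 0 {
				continue
			}
			leaf := chain[0]
			if isRevoked(leaf.SerialNumber.Text(16), revoked) {
				return errors.New("client certificate has been revoked")
			}
		}
		return nil
	}
}

// serverTLSConfig requires a client certificate signed by clientCAs and
// applies the revocation check after normal chain verification.
func serverTLSConfig(clientCAs *x509.CertPool, revoked map[string]bool) *tls.Config {
	return &tls.Config{
		MinVersion:            tls.VersionTLS12,
		ClientAuth:            tls.RequireAndVerifyClientCert,
		ClientCAs:             clientCAs,
		VerifyPeerCertificate: rejectRevoked(revoked),
	}
}

func main() {
	revoked := map[string]bool{"2a": true} // example serial number only
	cfg := serverTLSConfig(x509.NewCertPool(), revoked)
	fmt.Println("client certificates required:", cfg.ClientAuth == tls.RequireAndVerifyClientCert)
	fmt.Println("serial 2a revoked:", isRevoked("2a", revoked))
}
```

If no such check (or an equivalent, such as replacing the certificate authority) can be put in place, the guidance above falls back to username/password authentication.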

### Distribution and consumption of secrets

#### Prohibited practices

Use of hard-coded secrets is an instance of
[CWE-798: Use of hard-coded credentials](https://cwe.mitre.org/data/definitions/798.html)
and is not allowed.
A hard-coded secret is a secret that is the same across multiple EdgeX instances.
Hard-coded secrets make devices susceptible to BORE (break-once-run-everywhere) attacks,
where collections of machines can be compromised by a single replicated secret.
Specific cases where this is likely to come up are:

- Secrets embedded in source control

EdgeX is an open-source project.
Any secret that is present in an EdgeX repository is public to the world,
and therefore not a secret, by definition.
Configuration files, such as .toml files, .json files, .yaml files
(including `docker-compose.yml`) are specific instances of this practice.

Review comment (Member):

Should we do a survey of existing code? There are quite a few instances of username/password being defined in existing EdgeX services:

https://github.com/edgexfoundry/device-mqtt-go/blob/master/cmd/res/configuration.toml#L60

Review comment (Collaborator Author):

Should we do a survey of existing code? There are quite a few instances of username/password being defined in existing EdgeX services

Yes, but I am not sure that we should do it RIGHT NOW. Besides, there is a thorn in the side of EdgeX--EDGEX_SECURITY_SECRET_STORE=false--which if set essentially mandates this practice.


- Secrets embedded in binaries

Binaries are usually not protected against confidentiality threats,
and binaries can be easily reverse-engineered to find any secrets therein.
Binaries include compiled executables as well as Docker images.
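
The alternative to a hard-coded secret is one generated per instance, for example at install time or on cold start. A minimal sketch using the Go standard library's cryptographic random source follows; the function name and the 32-byte length are illustrative choices, not EdgeX requirements.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// generateSecret returns numBytes of cryptographically secure random data,
// encoded as unpadded URL-safe base64 so it can be used as a password.
func generateSecret(numBytes int) (string, error) {
	buf := make([]byte, numBytes)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

func main() {
	secret, err := generateSecret(32)
	if err != nil {
		fmt.Println("could not generate secret:", err)
		return
	}
	// Print only the length; a real secret should never be logged.
	fmt.Println("generated a secret of length", len(secret))
}
```

Because each instance generates its own value, compromising one device does not reveal the secret used by any other.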

#### Recommended practices

1. Direct consumption from process-to-process interaction with secret store

This approach is only possible for components that have native support for
[Hashicorp Vault](https://www.vaultproject.io/).
This includes any EdgeX service that links to go-mod-secrets.

For example, if secretClient is an instance of the go-mod-secrets
secret store client:

```go
secrets, err := secretClient.GetSecrets("myservice", "username", "password")
```

The above code will retrieve the `username` and `password` properties
of the `myservice` secret.

2. Dynamic injection of secret into process environment space

Environment variables are part of a process' environment block
and are mapped into a process' memory.
In this scenario,
an intermediary makes a connection to the secret store to fetch a secret,
stores it in an environment variable,
and then launches a target executable,
thereby passing the secret _in-memory_ to the target process.

Existing examples of this functionality include
[vaultenv](https://github.com/channable/vaultenv),
[envconsul](https://github.com/hashicorp/envconsul),
or [env-aws-params](https://github.com/gmr/env-aws-params).
These tools authenticate to a remote network service,
inject secrets into the process environment,
and then exec a replacement process
that inherits the secret-enriched environment block.

There are a few potential risks with this approach:

* Environment blocks are passed to child processes by default.
* Environment-variable-sniffing malware (introduced by compromised 3rd party libraries) is a proven attack method.
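
The intermediary pattern can be sketched in a few lines of Go. This is a simplified illustration, not how the tools above are implemented: the secret is a placeholder rather than a value fetched from a secret store, the variable name `EXAMPLE_DB_PASSWORD` is made up, the target is a trivial `sh` command (so a Unix-like host is assumed), and the target is started as a child process instead of replacing the intermediary via exec.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sort"
)

// withSecrets returns a copy of base with one NAME=value entry appended per
// secret. Names are sorted so the result is deterministic.
func withSecrets(base []string, secrets map[string]string) []string {
	env := make([]string, 0, len(base)+len(secrets))
	env = append(env, base...)

	names := make([]string, 0, len(secrets))
	for name := range secrets {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		env = append(env, name+"="+secrets[name])
	}
	return env
}

func main() {
	// Placeholder: a real intermediary would fetch this from the secret store.
	secrets := map[string]string{"EXAMPLE_DB_PASSWORD": "fetched-at-runtime"}

	// The target only confirms that the variable arrived; it never prints it.
	cmd := exec.Command("sh", "-c", `test -n "$EXAMPLE_DB_PASSWORD" && echo "secret received"`)
	cmd.Env = withSecrets(os.Environ(), secrets)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "target failed:", err)
		os.Exit(1)
	}
}
```

The first risk listed above is visible here: any process the target itself launches will inherit the same variable unless the target removes it from its environment first.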

3. Dynamic injection of secret into container-scoped `tmpfs` volume

An example of this approach is [consul-template](https://github.com/hashicorp/consul-template).
This approach is useful when a secret is required to be in a configuration file
and cannot be passed via an environment variable
or directly consumed from a secret store.
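
The idea can be sketched with Go's `text/template`: render a configuration file from a template plus secrets fetched at runtime, then write the result with owner-only permissions. This is not consul-template itself; the template text and file name are made up, the secret is a placeholder, and a temporary directory stands in for the `tmpfs` volume.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"text/template"
)

// renderConfig fills a configuration template with secret values.
// Rendering fails if the template references a secret that was not supplied.
func renderConfig(tmplText string, secrets map[string]string) (string, error) {
	tmpl, err := template.New("config").Option("missingkey=error").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var out bytes.Buffer
	if err := tmpl.Execute(&out, secrets); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Placeholder secret; a real tool would fetch it from the secret store.
	rendered, err := renderConfig("password = {{.password}}\n",
		map[string]string{"password": "fetched-at-runtime"})
	if err != nil {
		fmt.Fprintln(os.Stderr, "render failed:", err)
		return
	}

	// A temporary directory stands in for the container-scoped tmpfs volume.
	dir, err := os.MkdirTemp("", "secrets-example")
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not create directory:", err)
		return
	}
	defer os.RemoveAll(dir)

	// 0600: readable and writable by the owning user only.
	path := filepath.Join(dir, "service.conf")
	if err := os.WriteFile(path, []byte(rendered), 0o600); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		return
	}
	fmt.Println("wrote", len(rendered), "bytes to", path)
}
```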

4. Distribution via _SECRETSLOC_

This option is the secret distribution mechanism most widely supported by container orchestrators.

EdgeX supports runtime environments such as standard Docker and snaps
that have no built-in secret management features.

* Generic Docker does not have a built-in secrets mechanism.
Manual configuration of a _SECRETSLOC_ should utilize either
a host file system path or
a Docker volume.

* Snaps also do not have a built-in secrets mechanism.
The options for _SECRETSLOC_ are limited
to designated snap-writable directories.

For comparison:

* Docker Swarm:
Docker Swarm mode is not officially supported by the EdgeX project.
Docker Swarm secrets are shared via the `/run/secrets` volume,
which is a Linux `tmpfs` volume created on the host and shared with the container.
For an example of Docker Swarm secrets, see the
[docker-compose secrets stanza](https://docs.docker.com/compose/compose-file/#secrets).
Secrets distributed in this manner become part of the RaftDB,
and thus it becomes necessary to enable swarm autolock mode,
which prevents the Raft database encryption key
from being stored plaintext on disk.
Swarm secrets have an additional limitation in that they are not
mutable at runtime.

* Kubernetes:
Kubernetes is not officially supported by the EdgeX project.
Kubernetes also supports the secrets volume approach,
though the secrets volume can be mounted anywhere in the container namespace.
For an example of Kubernetes secrets volumes, see the
[Kubernetes secrets documentation](https://kubernetes.io/docs/concepts/configuration/secret/).
Secrets distributed in this manner become part of the `etcd` database,
and thus it becomes necessary to specify a
[KMS provider for data encryption](https://kubernetes.io/docs/tasks/administer-cluster/kms-provider/)
to prevent `etcd` from storing plaintext versions of secrets.

## Consequences

As the existing implementation is not fully-compliant with this ADR,
significant scope will be added to current and future EdgeX releases
in order to bring the project into compliance.

List of needed improvements:

- PKI private keys
* All: Move to using Vault as system of origin for the PKI instead of the standalone `security-secrets-setup` utility.
* All: Cache the PKI for Consul and Vault on persistent disk; rotate occasionally.
* All: Investigate hardware protection of cached Consul and Vault PKI secret keys. (Vault cannot unseal its own TLS certificate.)

- Special case: Bring-your-own external Kong certificate and key
* The Kong external certificate and key is already stored in Vault,
however, additional metadata is needed
to signal whether these are auto-generated or manually-installed.
A manually-installed certificate and key
would not be overwritten by the framework bringup logic.
Installing a custom certificate and key can then be implemented by
overwriting the system-generated ones and setting a flag
indicating that they were manually-installed.

- Secret store master password
* All: Enable hooks for [hardware protection](https://github.com/edgexfoundry/edgex-go/issues/1919) of secret store master password.

- Secret store per-service authentication tokens
* No changes required.

- Postgres superuser password
* Generate at install time or on cold start of the framework.
* Cache in Vault and inject into Kong using environment variable injection.

- MongoDB service account passwords
* No changes required.

Review comment (Contributor):

will be deprecated i guess

Review comment (Collaborator Author):

Not in Hanoi release, at least. Maybe Ireland.


- Redis(v5) authentication password
* All: Implement process-to-process injection: start Redis unauthenticated, with a post-start hook to read the secret out of Vault and set the Redis password. (Short race condition between Redis starting, password being set, and dependent services starting.)
* No changes on client side.

- Redis(v6) passwords (v6 adds multiple user support)
* Interim solution: handle like MongoDB service account passwords.
Future ADR to propose use of a Vault database secrets engine.
* No changes on client side (each service accesses its own credential)

- Kong authentication tokens
* All: Implement in-transit authentication with TLS-protected Postgres interface.
(Subject to change if it is decided not to enable a Postgres backend out of the box.)
* Additional research needed as PostgreSQL does not support transparent data encryption.
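
For the bring-your-own Kong certificate item in the list above, the proposed metadata flag could behave roughly as follows. This is a sketch under stated assumptions: the `storedCert` type, its `ManuallyInstalled` field, and both functions are hypothetical, and the secret store is reduced to a value passed in by the caller.

```go
package main

import "fmt"

// storedCert is a hypothetical record for the Kong certificate and key,
// with a flag recording how the pair got there.
type storedCert struct {
	CertPEM           string
	KeyPEM            string
	ManuallyInstalled bool
}

// reconcile returns the certificate that should be in the store after
// framework bringup. A manually installed pair is never overwritten;
// otherwise a fresh pair is generated and marked as auto-generated.
func reconcile(existing *storedCert, generate func() storedCert) storedCert {
	if existing != nil && existing.ManuallyInstalled {
		return *existing
	}
	fresh := generate()
	fresh.ManuallyInstalled = false
	return fresh
}

// installCustom overwrites whatever is stored and sets the flag so that
// later bringups leave the custom pair alone.
func installCustom(certPEM, keyPEM string) storedCert {
	return storedCert{CertPEM: certPEM, KeyPEM: keyPEM, ManuallyInstalled: true}
}

func main() {
	generate := func() storedCert {
		return storedCert{CertPEM: "generated-cert", KeyPEM: "generated-key"}
	}

	// First bringup: nothing stored yet, so a pair is generated.
	current := reconcile(nil, generate)
	fmt.Println("after first bringup:", current.CertPEM)

	// An operator installs a custom pair; the next bringup keeps it.
	current = installCustom("custom-cert", "custom-key")
	current = reconcile(&current, generate)
	fmt.Println("after next bringup:", current.CertPEM)
}
```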

## References

- [ADR for secret creation and distribution](https://github.com/edgexfoundry/edgex-docs/issues/132)
- [CWE-798: Use of hard-coded credentials](https://cwe.mitre.org/data/definitions/798.html)
- [Docker Swarm secrets](https://docs.docker.com/engine/swarm/secrets/)
- [EdgeX go-mod-secrets](https://github.com/edgexfoundry/go-mod-secrets)
- [Hashicorp Vault](https://www.vaultproject.io/)