Commit 021cf6c

Added spec document
Signed-off-by: ItalyPaleAle <[email protected]>
1 parent 9efade3 commit 021cf6c

1 file changed

+137
-0
lines changed

crypto/schemes/enc/v1/README.md

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
# Dapr encryption scheme v1: `dapr.io/enc/v1`

This document contains the reference for the **Dapr encryption scheme v1**, identified by the header `dapr.io/enc/v1`.

The Dapr encryption scheme is optimized for processing data as a stream. Data is chunked into multiple parts, each of which is encrypted and authenticated independently. This allows Dapr to return data to callers as a stream, even when decrypting messages, with confidence that no unverified data is flushed to the client.

> **Sources:** The encryption scheme that Dapr uses is heavily inspired by the [Tink wire format](https://developers.google.com/tink/wire-format) (from the Tink library maintained by Google), as well as by Filippo Valsorda's [age](https://age-encryption.org/v1), and Minio's [DARE](https://github.com/minio/sio).

## Key

Each message is encrypted with a 256-bit symmetric **File Key (FK)** that Dapr generates randomly for each new message. The key must be generated as 32 bytes of output from a CSPRNG (such as Go's `crypto/rand.Reader`) and must not be reused for other messages.

Dapr wraps the FK using a **Key Encryption Key (KEK)** stored in a key vault. The result of the wrapping operation is the **Wrapped File Key (WFK)**. The algorithm used depends on the type of the KEK as well as the algorithms supported by the component:

- For symmetric keys:
  - AES-KW ([RFC 3394](https://www.rfc-editor.org/rfc/rfc3394.html)): `AES-KW`
- For RSA keys:
  - RSA OAEP with SHA-256: `RSA-OAEP-256`

> Other key wrapping algorithms can be implemented in the future.

## Ciphertext format

The ciphertext is formatted as:

```text
header || binary_payload
```

## Header

The **header** is human-readable and contains 3 items, each terminated by a line feed (`0x0A`) character:

1. Name and version of the encryption scheme used. In this version of the spec, this is always `dapr.io/enc/v1`.
2. The manifest, which is a JSON object.
3. The MAC for the header, base64-encoded.

> Base64 encoding follows [RFC 4648 §4](https://datatracker.ietf.org/doc/html/rfc4648#section-4) ("standard" format, with padding included)

For example:

```text
dapr.io/enc/v1
{"k":"mykey","kw":1,"wfk":"hGYjwDpWEXEymSTFZ95zgX8krElb3Gqyls67R8zJA3k=","cph":1,"np":"Y3J5cHRvIQ=="}
pBDKLrhAWL7IAvDKBV/v7lmbTG6AEZbf3srUN0Pnn30=
```

### Manifest

The second line in the header is the **manifest**, which is a compact JSON object.

Its corresponding Go struct is:

```go
type Manifest struct {
	// Name of the key that can be used to decrypt the message.
	// This is optional, and if specified can be in the format `key` or `key/version`.
	KeyName string `json:"k,omitempty"`
	// ID of the wrapping algorithm used.
	// 0x01 = AES-KW
	// 0x02 = RSA-OAEP-256
	KeyWrappingAlgorithm int `json:"kw"`
	// The Wrapped File Key
	WFK []byte `json:"wfk"`
	// ID of the cipher used.
	// 0x01 = AES-GCM
	// 0x02 = ChaCha20-Poly1305
	Cipher int `json:"cph"`
	// Random sequence of 7 bytes generated by a CSPRNG.
	NoncePrefix []byte `json:"np"`
}
```

- **`KeyName`** is the name of the key that can be used to decrypt the message.
  Usually this is the same as the name of the key used to encrypt the message, but when asymmetric ciphers are used, it could be different.
  Including a `KeyName` in the manifest is not required, but when it's present, it's used as the default value for the key name while decrypting the document (however, users can override this value by passing a custom one while decrypting the document).
- **`Cipher`** indicates the cipher used to encrypt the actual data, and it must be an [AEAD](https://en.wikipedia.org/wiki/Authenticated_encryption#Authenticated_encryption_with_associated_data_(AEAD)) symmetric cipher.
  - Dapr will choose AES-GCM as cipher by default.
  - ChaCha20-Poly1305 is offered as an option for users that work with hardware that doesn't support AES-NI (such as Raspberry Pi), and needs to be enabled explicitly.
  - Other AEAD ciphers can be supported in the future if needed.

### MAC

The third and final line of the plaintext header is the MAC for the header: an HMAC-SHA-256 computed over the previous 2 lines (including the final newline character).

The HMAC key is derived from the (plain-text) File Key with HKDF-SHA-256:

```text
mac-key = HKDF-SHA-256(ikm = file key, salt = empty, info = "header")
MAC = HMAC-SHA-256(key = mac-key, message = first 2 lines of the header, including the trailing newline character)
```

> HKDF-SHA-256 is a key derivation function based on HMAC with SHA-256. See [RFC 5869 ("HMAC-based Extract-and-Expand Key Derivation Function (HKDF)")](https://www.rfc-editor.org/rfc/rfc5869.html). Being based on HMAC, it's not vulnerable to length-extension attacks, so we do not consider it necessary to use SHA-512 and truncate the output to 256 bits.

Note that there's one newline character (`0x0A`) at the end of the MAC, which concludes the header.

> Because each JSON encoder could produce a slightly different output, when verifying the manifest the MAC should be computed on the exact manifest string as included in the header. Verifiers should not re-encode the manifest as JSON themselves.
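
The MAC computation described in this section can be sketched as follows. To stay within the standard library, the example hand-rolls the single-block case of HKDF from RFC 5869 (extract, then one expand step), which is enough for a 32-byte output and assumes the derived key length is 32 bytes; real code would more likely use a vetted implementation such as `golang.org/x/crypto/hkdf`. The function names, the all-zero demo key, and the sample manifest are illustrative assumptions.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// hkdfSHA256 derives a 32-byte key per RFC 5869. Because the requested length
// equals the SHA-256 output size, a single expand block T(1) is sufficient.
func hkdfSHA256(ikm, salt, info []byte) []byte {
	// Extract: PRK = HMAC(salt, IKM). An empty salt behaves like the RFC's
	// default of 32 zero bytes, because HMAC zero-pads short keys.
	ext := hmac.New(sha256.New, salt)
	ext.Write(ikm)
	prk := ext.Sum(nil)
	// Expand: T(1) = HMAC(PRK, info || 0x01).
	exp := hmac.New(sha256.New, prk)
	exp.Write(info)
	exp.Write([]byte{0x01})
	return exp.Sum(nil)
}

// headerMAC computes the MAC over the first two header lines, which must
// include the newline that terminates the manifest line.
func headerMAC(fileKey, firstTwoLines []byte) []byte {
	macKey := hkdfSHA256(fileKey, nil, []byte("header"))
	m := hmac.New(sha256.New, macKey)
	m.Write(firstTwoLines)
	return m.Sum(nil)
}

// verifyHeaderMAC compares the expected and received MACs in constant time.
func verifyHeaderMAC(fileKey, firstTwoLines, mac []byte) bool {
	return hmac.Equal(headerMAC(fileKey, firstTwoLines), mac)
}

func main() {
	fileKey := make([]byte, 32) // all-zero demo key; real File Keys come from a CSPRNG
	header := []byte("dapr.io/enc/v1\n" + `{"kw":1,"wfk":"AAAA","cph":1,"np":"Y3J5cHRvIQ=="}` + "\n")
	mac := headerMAC(fileKey, header)
	fmt.Println(base64.StdEncoding.EncodeToString(mac))
	fmt.Println(verifyHeaderMAC(fileKey, header, mac))
}
```

Note that the verifier operates on the raw header bytes as received, which matches the guidance above about not re-encoding the manifest.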

## Binary payload

The binary payload begins immediately after the header (after the 3rd newline character) and consists of the encrypted segments, concatenated in order:

```text
segment_0 || segment_1 || ... || segment_k
```

### Segments

The plaintext is chunked into segments of 64KB (65,536 bytes) each; the last segment may be shorter. Segments must never be empty, unless the entire file is empty.

> Because segments are 64KB each, and we can have up to 2^32 segments, the maximum size of a message that can be encrypted is 2^32 × 2^16 = 2^48 bytes (256TB).

Each segment of plaintext is encrypted independently and stored together with its authentication tag at the end:

```text
encrypted_chunk || tag
```

> Tag size is 16 bytes for AES-GCM and ChaCha20-Poly1305, so each encrypted segment has an overhead of 16 bytes.

Segments are encrypted with a **Payload Key (PK)** that is derived from the plain-text File Key and the nonce prefix:

```text
payload-key = HKDF-SHA-256(ikm = file key, salt = nonce prefix, info = "payload")
```

Each segment is encrypted using a different 12-byte nonce:

```text
nonce_prefix || i || last_segment
```

Where:

- `nonce_prefix` (7 bytes) is the nonce prefix from the header.
- `i` (4 bytes) is the sequence number, as a 32-bit unsigned integer counter, encoded as big-endian. The first segment has sequence number 0, and the counter increases by 1 for each subsequent segment.
- `last_segment` (1 byte) is `0x01` if this is the last segment, or `0x00` otherwise.
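
The nonce layout and segment framing can be sketched in Go as shown below, using AES-GCM from the standard library. This is an illustration under several assumptions: the function names are hypothetical, the payload key is taken as an already-derived 32-byte input, no associated data is passed to the AEAD (this document does not mention any), and the whole message is held in memory for brevity, whereas a real implementation would process segments as a stream.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"errors"
	"fmt"
)

const segmentSize = 64 << 10 // 65,536 bytes of plaintext per segment

// segmentNonce returns nonce_prefix (7 bytes) || i (4 bytes, big-endian) || last_segment (1 byte).
func segmentNonce(prefix []byte, i uint32, last bool) ([]byte, error) {
	if len(prefix) != 7 {
		return nil, errors.New("nonce prefix must be 7 bytes")
	}
	nonce := make([]byte, 12)
	copy(nonce, prefix)
	binary.BigEndian.PutUint32(nonce[7:11], i)
	if last {
		nonce[11] = 0x01
	}
	return nonce, nil
}

func newAESGCM(payloadKey []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(payloadKey)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptPayload seals each plaintext segment and appends "encrypted_chunk || tag".
// An empty plaintext yields a single empty segment. In-memory inputs cannot
// reach 2^32 segments, so counter overflow is not handled here.
func encryptPayload(payloadKey, prefix, plaintext []byte) ([]byte, error) {
	aead, err := newAESGCM(payloadKey)
	if err != nil {
		return nil, err
	}
	var out []byte
	for i := uint32(0); ; i++ {
		end := segmentSize
		if end >= len(plaintext) {
			end = len(plaintext)
		}
		last := end == len(plaintext)
		nonce, err := segmentNonce(prefix, i, last)
		if err != nil {
			return nil, err
		}
		out = aead.Seal(out, nonce, plaintext[:end], nil)
		if last {
			return out, nil
		}
		plaintext = plaintext[end:]
	}
}

// decryptPayload reverses encryptPayload. The final segment is authenticated
// with last_segment = 0x01, so a payload truncated at a segment boundary fails.
func decryptPayload(payloadKey, prefix, ciphertext []byte) ([]byte, error) {
	aead, err := newAESGCM(payloadKey)
	if err != nil {
		return nil, err
	}
	segLen := segmentSize + aead.Overhead()
	var out []byte
	for i := uint32(0); ; i++ {
		end := segLen
		if end >= len(ciphertext) {
			end = len(ciphertext)
		}
		last := end == len(ciphertext)
		nonce, err := segmentNonce(prefix, i, last)
		if err != nil {
			return nil, err
		}
		if out, err = aead.Open(out, nonce, ciphertext[:end], nil); err != nil {
			return nil, err
		}
		if last {
			return out, nil
		}
		ciphertext = ciphertext[end:]
	}
}

func main() {
	key := make([]byte, 32) // demo key; a real Payload Key is derived with HKDF
	ct, err := encryptPayload(key, []byte("crypto!"), make([]byte, 100000))
	if err != nil {
		panic(err)
	}
	fmt.Println("ciphertext bytes:", len(ct))
}
```

Encoding the `last_segment` flag in the nonce is what lets a decryptor detect truncation: dropping trailing segments forces the final remaining segment to be opened with a nonce it was never sealed with.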
