-
Notifications
You must be signed in to change notification settings - Fork 26
chore: introduce metrics change #307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
ccd27f7
402d3cd
0fcc10e
97380ef
a42a891
61fe175
e95b0f2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,105 @@ | ||||||
[//]: # "Copyright Amazon.com Inc. or its affiliates. All Rights Reserved." | ||||||
[//]: # "SPDX-License-Identifier: CC-BY-SA-4.0" | ||||||
|
||||||
# Interacting with a Metrics Interface in the AWS Encryption SDK family of products (Background) | ||||||
|
||||||
## Definitions | ||||||
|
||||||
### Conventions used in this document | ||||||
|
||||||
The key words | ||||||
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||||||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" | ||||||
in this document are to be interpreted as described in | ||||||
[RFC 2119](https://tools.ietf.org/html/rfc2119). | ||||||
|
||||||
## Issues and Alternatives | ||||||
|
||||||
Crypto Tools (CT) publishes software libraries. The latest | ||||||
versions of these libraries have no logging or metrics publishing | ||||||
to either a local application or to an observability service like AWS CloudWatch. | ||||||
|
||||||
As client side encryption libraries emitting metrics must be done carefully as | ||||||
to avoid accidentally [leaking](https://github.com/aws/aws-encryption-sdk-python/pull/105/files) any information related to the plaintext that could lead to a | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure if we want to link to this :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am planning on removing it before merging but wanted it here for folks to get context. |
||||||
loss of customer trust. | ||||||
|
||||||
A popular feature request has been for in depth insights into CT libraries. Many customers | ||||||
ask for suggestions on how to reduce network calls to AWS Key Management Service (AWS KMS) and | ||||||
followup questions around cache performance. | ||||||
|
||||||
CT offers solutions to reduce network calls to AWS KMS through the Caching CMM and the AWS KMS Hierarchical Keyring. | ||||||
Today, there is no CT solution for customers to extract the performance metrics customers are looking for. | ||||||
This can lead to frustrating debugging sessions and escalations that | ||||||
could have been resolved with additional information. | ||||||
|
||||||
Recent customer demand has allowed CT to re-evaluate client side metrics to offer | ||||||
a better customer experience. | ||||||
|
||||||
### Issue 1: What will be the default behavior? | ||||||
|
||||||
As a client-side encryption library CT should be as cautious as possible. | ||||||
Customers of CT libraries should be on the driver seat and determine for | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
english is silly |
||||||
themselves if their application could benefit from emitting metrics. | ||||||
Making that decision for customers can erode customer trust. | ||||||
|
||||||
For CT to be comfortable with allowing metrics, CT must consider that | ||||||
this process must not affect the availability of the consumer of the library. | ||||||
|
||||||
#### Opt-In (recommended) | ||||||
|
||||||
By not emitting metrics by default existing customer workflows do not change. | ||||||
|
||||||
This allows customers to test how their applications behave when they start to emit | ||||||
metrics. Customers can then ask for updates to the implementations | ||||||
CT provides or customers can go an implement their own interfaces that are fine-tuned | ||||||
to their use cases. | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a middle ground? For example, logging information locally without uploading it anywhere. |
||||||
#### Always | ||||||
|
||||||
This option implies that CT guarantees that the availability of an application | ||||||
will not change. Perhaps a bold implication this is ultimately what the customer | ||||||
will feel like; getting no choice on the matter and opting to not upgrade. | ||||||
Going from never emitting metrics to always emitting them says to customers | ||||||
that their application no matter its use case will always benefit from metrics. | ||||||
Without letting customers make that choice, CT looses hard earned customer trust. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
This also forces customers to make a choice, start collecting metrics and pick up | ||||||
additional updates CT provides or get stuck in a version of the library that will | ||||||
become unsupported. | ||||||
|
||||||
Additionally, requiring customers to start emitting metrics | ||||||
almost certainly guarantees a breaking change across supported libraries. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this is true There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How so? If we require that all customer facing APIs now require a metric agent, then you can't pick up this change in a minor version because you are required to make a code change. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, not that I am advocating for this, but we could just provide a metric agent. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. oh sure, I apparently forgot this was in the context of specific API changes, I retract my objection |
||||||
|
||||||
### Issue 2: Should Data Plane APIs fail if metrics fail to publish? | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about latency? This is somewhat relevant to the discussion around availability, particularly in technical terms. What I'm getting at is, are we planning to make blocking calls to CT or wherever, or have a separate thread/pool that does so periodically? Do we get this for free from the CT metrics agent (or whatever) or do we have to write this ourselves? In any case we should specify what the bar is here for performance. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 . We should delegate the whole metric stuff to some known frame work. Maintaining this would be difficult and in any case we shouldn't be reinventing the wheel. |
||||||
|
||||||
#### No (recommended) | ||||||
|
||||||
Metrics publishing must not impact application availability. | ||||||
|
||||||
CT should allow for a fail-open approach when metrics fail to publish. | ||||||
This will prevent metric publishing issues from impacting the | ||||||
core functionality of the application. | ||||||
|
||||||
CT can consider this a two-way door with initially not attempting to retry | ||||||
to publish failed metrics and add this functionality later on. | ||||||
|
||||||
#### Yes | ||||||
|
||||||
This will become a problem for the libraries and will undoubtedly result | ||||||
in customer friction and failing adoption rates. | ||||||
Failing operations due to metrics not being published leaves the availability | ||||||
of the application to rest on the implementation of the metrics interface. | ||||||
This should not be the case, metrics should aid the customer application | ||||||
not restrict it. | ||||||
|
||||||
### Issue 3: How will customers interact with the libraries to emit metrics? | ||||||
|
||||||
#### Provide an Interface | ||||||
|
||||||
Keeping in line with the rest of CT features, a well defined interface with out | ||||||
of the box implementations should satisfy the feature request. | ||||||
|
||||||
Out of the box implementations should cover publishing metrics to an | ||||||
existing observability service like AWS CloudWatch and to the local file system. | ||||||
These implementations should offer customers a guide into implementing their own | ||||||
if they wish to do so. |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,169 @@ | ||||||
[//]: # "Copyright Amazon.com Inc. or its affiliates. All Rights Reserved." | ||||||
[//]: # "SPDX-License-Identifier: CC-BY-SA-4.0" | ||||||
|
||||||
# Adding a Metrics Interface | ||||||
|
||||||
**_NOTE: This document will be used to gain alignment on | ||||||
this interface should look like and how it could be integrated with | ||||||
existing operations. This document will not seek to specify | ||||||
a Metrics implementation or specify which metrics will be collected | ||||||
from impacted APIs or configurations._** | ||||||
|
||||||
## Affected APIs or Client Configurations | ||||||
|
||||||
This serves as a reference of all APIs and Client Configurations that this change affects. | ||||||
This list is not exhaustive. Any downstream consumer of any API or client configuration SHOULD | ||||||
also be updated as part of this proposed changed. | ||||||
|
||||||
| API/ Configuration | | ||||||
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||||
| [Encrypt](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/client-apis/encrypt.md) | | ||||||
| [Decrypt](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/client-apis/decrypt.md) | | ||||||
| [GetEncryptionMaterials](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/framework/cmm-interface.md#get-encryption-materials) | | ||||||
| [DecryptionMaterials](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/framework/cmm-interface.md#decrypt-materials) | | ||||||
| [OnEncrypt](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/framework/keyring-interface.md#onencrypt) | | ||||||
| [OnDecrypt](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/framework/keyring-interface.md#ondecrypt) | | ||||||
| [DynamoDB Table Encryption Config](https://github.com/aws/aws-database-encryption-sdk-dynamodb/blob/main/specification/dynamodb-encryption-client/ddb-table-encryption-config.md) | | ||||||
|
||||||
## Affected Libraries | ||||||
|
||||||
| Library | Version Introduced | Implementation | | ||||||
| ------- | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | ||||||
| ESDK | T.B.D | [ESDK.smithy](https://github.com/aws/aws-encryption-sdk/blob/mainline/AwsEncryptionSDK/dafny/AwsEncryptionSdk/Model/esdk.smithy) | | ||||||
| MPL | T.B.D | [material-provider.smithy](https://github.com/aws/aws-cryptographic-material-providers-library/blob/main/AwsCryptographicMaterialProviders/dafny/AwsCryptographicMaterialProviders/Model/material-provider.smithy) | | ||||||
| DB-ESDK | T.B.D | [DynamoDbEncryption.smithy](https://github.com/aws/aws-database-encryption-sdk-dynamodb/blob/main/DynamoDbEncryption/dafny/DynamoDbEncryption/Model/DynamoDbEncryption.smithy) | | ||||||
|
||||||
## Definitions | ||||||
|
||||||
### Conventions used in this document | ||||||
|
||||||
The key words | ||||||
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||||||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" | ||||||
in this document are to be interpreted as described in | ||||||
[RFC 2119](https://tools.ietf.org/html/rfc2119). | ||||||
|
||||||
## Summary | ||||||
|
||||||
Existing users of Crypto Tools (CT) libraries do no have any insights as to | ||||||
how the librar(y/ies) behave(s) in their application. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
simplify, we know what you mean |
||||||
This can lead to frustrating debugging sessions where users | ||||||
are required to have explicit tests to assert they are using a particular feature | ||||||
correctly, or if customers are using any of the KMS keyrings users have to have | ||||||
AWS Cloudwatch open to verify their application is sending values users expect. | ||||||
This can be seen as a best practice and users may find this a good exercise; | ||||||
however, CT's libraries do not make debugging an easy task. | ||||||
|
||||||
A feature which allows customers to get real-time telemetry of their application's | ||||||
integration with CT's libraries would be welcomed by users and will deliver on the | ||||||
"easy to use and hard to misuse" tenet. | ||||||
|
||||||
Introducing a new interface that defines the operations that must be | ||||||
implemented in order to build a specification compliant MetricsAgent. | ||||||
|
||||||
## Requirements | ||||||
|
||||||
The interface should have three requirements. | ||||||
|
||||||
1. MUST be simple. | ||||||
1. MUST be extensible. | ||||||
|
||||||
The following is documented to signify its importance | ||||||
even though the interface is not able to make this guarantee. | ||||||
Every implementation of the proposed interface must | ||||||
ensure the following. | ||||||
|
||||||
1. MUST NOT block the application's execution thread. | ||||||
|
||||||
## Points of Integration | ||||||
|
||||||
To collect metrics across CT's library stack multiple points of integration | ||||||
are needed in order to collect metrics across CT's libraries. | ||||||
Generally, CT's libraries work as follows: | ||||||
|
||||||
_Note: Not every Client supports the Material Provider Library. | ||||||
The Diagram below assumes it to help the mental model._ | ||||||
|
||||||
```mermaid | ||||||
sequenceDiagram | ||||||
participant Client | ||||||
box Material Provider Library | ||||||
participant CMM | ||||||
participant Keyring | ||||||
end | ||||||
loop Content Encryption | ||||||
Client->>Client: Content Encryption | ||||||
end | ||||||
Client<<->>CMM: GetEncryption/Decryption Materials | ||||||
CMM<<->>Keyring: OnEncrypt/OnDecrypt | ||||||
``` | ||||||
|
||||||
To optionally emit metrics from a top level client, | ||||||
all customer facing APIs MUST be changed to optionally accept | ||||||
a Metrics Agent. This will allow customers to define and supply one top | ||||||
level Metrics Agent; this agent will get plumbed throughout CT's stack. | ||||||
|
||||||
For example, in the ESDK for Java this would look like: | ||||||
|
||||||
```java | ||||||
final AwsCrypto crypto = AwsCrypto.builder().build(); | ||||||
// Create a Keyring | ||||||
final MaterialProviders matProv = | ||||||
MaterialProviders.builder() | ||||||
.MaterialProvidersConfig(MaterialProvidersConfig.builder().build()) | ||||||
.build(); | ||||||
|
||||||
final IKeyring rawAesKeyring = matProv.CreateRawAesKeyring(keyringInput); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why are we implementing our own metric agent? I'm pretty sure there exists some framework which we can leverage. Customers can optionally provide the agent, or we do no-op (like logging frameworks such as slf4j etc). |
||||||
final Map<String, String> encryptionContext = | ||||||
Collections.singletonMap("ExampleContextKey", "ExampleContextValue"); | ||||||
|
||||||
// Create a Metrics Agent | ||||||
final IMetricsAgent metrics = matProv.CreateSimpleMetricsAgent(metricsAgentInput); | ||||||
// 4. Encrypt the data | ||||||
final CryptoResult<byte[], ?> encryptResult = | ||||||
crypto.encryptData(rawAesKeyring, metrics, EXAMPLE_DATA, encryptionContext); | ||||||
final byte[] ciphertext = encryptResult.getResult(); | ||||||
``` | ||||||
|
||||||
This change will allow Crypto Tools to introduce a Metrics Agent in a | ||||||
non-breaking way as the Metrics Agent will be an optional parameter | ||||||
at customer facing API call sites. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we support providing a metrics agent to the client constructor? Such that the same instance would then be used for each API call. I don't necessarily think we should but it might be worth discussing here. |
||||||
|
||||||
Currently, the ESDK client APIs models are defined [here](https://github.com/aws/aws-encryption-sdk/blob/mainline/AwsEncryptionSDK/dafny/AwsEncryptionSdk/Model/esdk.smithy#L60-L126). | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I assume that the DB-ESDK item encryptor would be similar. |
||||||
This change would see that the client APIs be changed as follows: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This can't be the only change. I like the detail, but we should explicitly add the CMM/Keyring inputs to this list. |
||||||
|
||||||
```diff | ||||||
structure EncryptInput { | ||||||
@required | ||||||
plaintext: Blob, | ||||||
|
||||||
encryptionContext: aws.cryptography.materialProviders#EncryptionContext, | ||||||
|
||||||
// One of keyring or CMM are required | ||||||
materialsManager: aws.cryptography.materialProviders#CryptographicMaterialsManagerReference, | ||||||
keyring: aws.cryptography.materialProviders#KeyringReference, | ||||||
|
||||||
algorithmSuiteId: aws.cryptography.materialProviders#ESDKAlgorithmSuiteId, | ||||||
|
||||||
frameLength: FrameLength, | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess this probably isn't possible in Smithy/Smithy-Dafny but we might want to consider having a more open-ended "request override" option for things like this. (This is the same way the SDK does it.) |
||||||
+ metricsAgent: aws.cryptography.materialProviders#MetricsAgentReference | ||||||
} | ||||||
... | ||||||
structure DecryptInput { | ||||||
@required | ||||||
ciphertext: Blob, | ||||||
|
||||||
// One of keyring or CMM are required | ||||||
materialsManager: aws.cryptography.materialProviders#CryptographicMaterialsManagerReference, | ||||||
keyring: aws.cryptography.materialProviders#KeyringReference, | ||||||
//= aws-encryption-sdk-specification/client-apis/keyring-interface.md#onencrypt | ||||||
//= type=implication | ||||||
//# The following inputs to this behavior MUST be OPTIONAL: | ||||||
// (blank line for duvet) | ||||||
//# - [Encryption Context](#encryption-context) | ||||||
encryptionContext: aws.cryptography.materialProviders#EncryptionContext, | ||||||
|
||||||
+ metricsAgent: aws.cryptography.materialProviders#MetricsAgentReference | ||||||
} | ||||||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit/Issue: We have not defined metrics.
While the definition might seem obvious to some, metrics can either be a description of the library's performance in the customer's application OR they could be telementry on the usage of Crypto Tools products.
Having read the docs, it is clear that is NOT telementry, and maybe it is only Tony Knapp who gets the two confused, but we could clarify this.