awslabs · josecorella · Sep 5, 2025 · Sep 9, 2025 · Sep 13, 2025 · Sep 14, 2025
@@ -0,0 +1,105 @@
+[//]: # "Copyright Amazon.com Inc. or its affiliates. All Rights Reserved."
+[//]: # "SPDX-License-Identifier: CC-BY-SA-4.0"
+
+# Interacting with a Metrics Interface in the AWS Encryption SDK family of products (Background)
+
+## Definitions
+
+### Conventions used in this document
+
+The key words
+"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
+in this document are to be interpreted as described in
+[RFC 2119](https://tools.ietf.org/html/rfc2119).
+
+## Issues and Alternatives
+
+Crypto Tools (CT) publishes software libraries. The latest
+versions of these libraries have no logging or metrics publishing
+to either a local application or to an observability service like AWS CloudWatch.
+
+As client side encryption libraries emitting metrics must be done carefully as
+to avoid accidentally [leaking](https://github.com/aws/aws-encryption-sdk-python/pull/105/files) any information related to the plaintext that could lead to a
+loss of customer trust.
+
+A popular feature request has been for in depth insights into CT libraries. Many customers
+ask for suggestions on how to reduce network calls to AWS Key Management Service (AWS KMS) and
+followup questions around cache performance.
+
+CT offers solutions to reduce network calls to AWS KMS through the Caching CMM and the AWS KMS Hierarchical Keyring.
+Today, there is no CT solution for customers to extract the performance metrics customers are looking for.
+This can lead to frustrating debugging sessions and escalations that
+could have been resolved with additional information.
+
+Recent customer demand has allowed CT to re-evaluate client side metrics to offer
+a better customer experience.
+
+### Issue 1: What will be the default behavior?
+
+As a client-side encryption library CT should be as cautious as possible.
+Customers of CT libraries should be on the driver seat and determine for
-Customers of CT libraries should be on the driver seat and determine for
+Customers of CT libraries should be in the driver's seat and determine for
-Customers of CT libraries should be on the driver seat and determine for
+Customers of CT libraries should be in the driver's seat and determine for
+themselves if their application could benefit from emitting metrics.
+Making that decision for customers can erode customer trust.
+
+For CT to be comfortable with allowing metrics, CT must consider that
+this process must not affect the availability of the consumer of the library.
+
+#### Opt-In (recommended)
+
+By not emitting metrics by default existing customer workflows do not change.
+
+This allows customers to test how their applications behave when they start to emit
+metrics. Customers can then ask for updates to the implementations
+CT provides or customers can go an implement their own interfaces that are fine-tuned
+to their use cases.
+
+#### Always
+
+This option implies that CT guarantees that the availability of an application
+will not change. Perhaps a bold implication this is ultimately what the customer
+will feel like; getting no choice on the matter and opting to not upgrade.
+Going from never emitting metrics to always emitting them says to customers
+that their application no matter its use case will always benefit from metrics.
+Without letting customers make that choice, CT looses hard earned customer trust.
-Without letting customers make that choice, CT looses hard earned customer trust.
+Without letting customers make that choice, CT loses hard earned customer trust.
-Without letting customers make that choice, CT looses hard earned customer trust.
+Without letting customers make that choice, CT loses hard earned customer trust.
+
+This also forces customers to make a choice, start collecting metrics and pick up
+additional updates CT provides or get stuck in a version of the library that will
+become unsupported.
+
+Additionally, requiring customers to start emitting metrics
+almost certainly guarantees a breaking change across supported libraries.
+
+### Issue 2: Should Data Plane APIs fail if metrics fail to publish?
+
+#### No (recommended)
+
+Metrics publishing must not impact application availability.
+
+CT should allow for a fail-open approach when metrics fail to publish.
+This will prevent metric publishing issues from impacting the
+core functionality of the application.
+
+CT can consider this a two-way door with initially not attempting to retry
+to publish failed metrics and add this functionality later on.
+
+#### Yes
+
+This will become a problem for the libraries and will undoubtedly result
+in customer friction and failing adoption rates.
+Failing operations due to metrics not being published leaves the availability
+of the application to rest on the implementation of the metrics interface.
+This should not be the case, metrics should aid the customer application
+not restrict it.
+
+### Issue 3: How will customers interact with the libraries to emit metrics?
+
+#### Provide an Interface
+
+Keeping in line with the rest of CT features, a well defined interface with out
+of the box implementations should satisfy the feature request.
+
+Out of the box implementations should cover publishing metrics to an
+existing observability service like AWS CloudWatch and to the local file system.
+These implementations should offer customers a guide into implementing their own
+if they wish to do so.
@@ -0,0 +1,169 @@
+[//]: # "Copyright Amazon.com Inc. or its affiliates. All Rights Reserved."
+[//]: # "SPDX-License-Identifier: CC-BY-SA-4.0"
+
+# Adding a Metrics Interface
+
+**_NOTE: This document will be used to gain alignment on
+this interface should look like and how it could be integrated with
+existing operations. This document will not seek to specify
+a Metrics implementation or specify which metrics will be collected
+from impacted APIs or configurations._**
+
+## Affected APIs or Client Configurations
+
+This serves as a reference of all APIs and Client Configurations that this change affects.
+This list is not exhaustive. Any downstream consumer of any API or client configuration SHOULD
+also be updated as part of this proposed changed.
+
+| API/ Configuration                                                                                                                                                                |
+| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| [Encrypt](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/client-apis/encrypt.md)                                                                         |
+| [Decrypt](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/client-apis/decrypt.md)                                                                         |
+| [GetEncryptionMaterials](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/framework/cmm-interface.md#get-encryption-materials)                             |
+| [DecryptionMaterials](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/framework/cmm-interface.md#decrypt-materials)                                       |
+| [OnEncrypt](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/framework/keyring-interface.md#onencrypt)                                                     |
+| [OnDecrypt](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/framework/keyring-interface.md#ondecrypt)                                                     |
+| [DynamoDB Table Encryption Config](https://github.com/aws/aws-database-encryption-sdk-dynamodb/blob/main/specification/dynamodb-encryption-client/ddb-table-encryption-config.md) |
+
+## Affected Libraries
+
+| Library | Version Introduced | Implementation                                                                                                                                                                                                     |
+| ------- | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| ESDK    | T.B.D              | [ESDK.smithy](https://github.com/aws/aws-encryption-sdk/blob/mainline/AwsEncryptionSDK/dafny/AwsEncryptionSdk/Model/esdk.smithy)                                                                                   |
+| MPL     | T.B.D              | [material-provider.smithy](https://github.com/aws/aws-cryptographic-material-providers-library/blob/main/AwsCryptographicMaterialProviders/dafny/AwsCryptographicMaterialProviders/Model/material-provider.smithy) |
+| DB-ESDK | T.B.D              | [DynamoDbEncryption.smithy](https://github.com/aws/aws-database-encryption-sdk-dynamodb/blob/main/DynamoDbEncryption/dafny/DynamoDbEncryption/Model/DynamoDbEncryption.smithy)                                     |
+
+## Definitions
+
+### Conventions used in this document
+
+The key words
+"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
+in this document are to be interpreted as described in
+[RFC 2119](https://tools.ietf.org/html/rfc2119).
+
+## Summary
+
+Existing users of Crypto Tools (CT) libraries do no have any insights as to
+how the librar(y/ies) behave(s) in their application.
-how the librar(y/ies) behave(s) in their application.
+how the libraries behave in their application.
-how the librar(y/ies) behave(s) in their application.
+how the libraries behave in their application.
+This can lead to frustrating debugging sessions where users
+are required to have explicit tests to assert they are using a particular feature
+correctly, or if customers are using any of the KMS keyrings users have to have
+AWS Cloudwatch open to verify their application is sending values users expect.
+This can be seen as a best practice and users may find this a good exercise;
+however, CT's libraries do not make debugging an easy task.
+
+A feature which allows customers to get real-time telemetry of their application's
+integration with CT's libraries would be welcomed by users and will deliver on the
+"easy to use and hard to misuse" tenet.
+
+Introducing a new interface that defines the operations that must be
+implemented in order to build a specification compliant MetricsAgent.
+
+## Requirements
+
+The interface should have three requirements.
+
+1. MUST be simple.
+1. MUST be extensible.
+
+The following is documented to signify its importance
+even though the interface is not able to make this guarantee.
+Every implementation of the proposed interface must
+ensure the following.
+
+1. MUST NOT block the application's execution thread.
+
+## Points of Integration
+
+To collect metrics across CT's library stack multiple points of integration
+are needed in order to collect metrics across CT's libraries.
+Generally, CT's libraries work as follows:
+
+_Note: Not every Client supports the Material Provider Library.
+The Diagram below assumes it to help the mental model._
+
+```mermaid
+sequenceDiagram
+    participant Client
+    box Material Provider Library
+      participant CMM
+      participant Keyring
+    end
+    loop Content Encryption
+        Client->>Client: Content Encryption
+    end
+    Client<<->>CMM: GetEncryption/Decryption Materials
+    CMM<<->>Keyring: OnEncrypt/OnDecrypt
+```
+
+To optionally emit metrics from a top level client,
+all customer facing APIs MUST be changed to optionally accept
+a Metrics Agent. This will allow customers to define and supply one top
+level Metrics Agent; this agent will get plumbed throughout CT's stack.
+
+For example, in the ESDK for Java this would look like:
+
+```java
+final AwsCrypto crypto = AwsCrypto.builder().build();
+// Create a Keyring
+final MaterialProviders matProv =
+  MaterialProviders.builder()
+      .MaterialProvidersConfig(MaterialProvidersConfig.builder().build())
+      .build();
+
+final IKeyring rawAesKeyring = matProv.CreateRawAesKeyring(keyringInput);
+final Map<String, String> encryptionContext =
+    Collections.singletonMap("ExampleContextKey", "ExampleContextValue");
+
+// Create a Metrics Agent
+final IMetricsAgent metrics = matProv.CreateSimpleMetricsAgent(metricsAgentInput);
+// 4. Encrypt the data
+final CryptoResult<byte[], ?> encryptResult =
+    crypto.encryptData(rawAesKeyring, metrics, EXAMPLE_DATA, encryptionContext);
+final byte[] ciphertext = encryptResult.getResult();
+```
+
+This change will allow Crypto Tools to introduce a Metrics Agent in a
+non-breaking way as the Metrics Agent will be an optional parameter
+at customer facing API call sites.
+
+Currently, the ESDK client APIs models are defined [here](https://github.com/aws/aws-encryption-sdk/blob/mainline/AwsEncryptionSDK/dafny/AwsEncryptionSdk/Model/esdk.smithy#L60-L126).
+This change would see that the client APIs be changed as follows:
+
+```diff
+structure EncryptInput {
+  @required
+  plaintext: Blob,
+
+  encryptionContext: aws.cryptography.materialProviders#EncryptionContext,
+
+  // One of keyring or CMM are required
+  materialsManager: aws.cryptography.materialProviders#CryptographicMaterialsManagerReference,
+  keyring: aws.cryptography.materialProviders#KeyringReference,
+
+  algorithmSuiteId: aws.cryptography.materialProviders#ESDKAlgorithmSuiteId,
+
+  frameLength: FrameLength,
+
++  metricsAgent: aws.cryptography.materialProviders#MetricsAgentReference
+}
+...
+structure DecryptInput {
+  @required
+  ciphertext: Blob,
+
+  // One of keyring or CMM are required
+  materialsManager: aws.cryptography.materialProviders#CryptographicMaterialsManagerReference,
+  keyring: aws.cryptography.materialProviders#KeyringReference,
+  //= aws-encryption-sdk-specification/client-apis/keyring-interface.md#onencrypt
+  //= type=implication
+  //# The following inputs to this behavior MUST be OPTIONAL:
+  // (blank line for duvet)
+  //# - [Encryption Context](#encryption-context)
+  encryptionContext: aws.cryptography.materialProviders#EncryptionContext,
+
++  metricsAgent: aws.cryptography.materialProviders#MetricsAgentReference
+}
+```