Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
105 changes: 105 additions & 0 deletions changes/2025-09-12_introduce-metrics-interface/background.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
[//]: # "Copyright Amazon.com Inc. or its affiliates. All Rights Reserved."
[//]: # "SPDX-License-Identifier: CC-BY-SA-4.0"

# Interacting with a Metrics Interface in the AWS Encryption SDK family of products (Background)

## Definitions

### Conventions used in this document

The key words
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in
[RFC 2119](https://tools.ietf.org/html/rfc2119).

## Issues and Alternatives

Crypto Tools (CT) publishes software libraries. The latest
versions of these libraries have no logging or metrics publishing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit/Issue: We have not defined metrics.
While the definition might seem obvious to some, metrics can either be a description of the library's performance in the customer's application OR they could be telementry on the usage of Crypto Tools products.

Having read the docs, it is clear that is NOT telementry, and maybe it is only Tony Knapp who gets the two confused, but we could clarify this.

to either a local application or to an observability service like AWS CloudWatch.

As client side encryption libraries emitting metrics must be done carefully as
to avoid accidentally [leaking](https://github.com/aws/aws-encryption-sdk-python/pull/105/files) any information related to the plaintext that could lead to a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we want to link to this :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am planning on removing it before merging but wanted it here for folks to get context.

loss of customer trust.

A popular feature request has been for in depth insights into CT libraries. Many customers
ask for suggestions on how to reduce network calls to AWS Key Management Service (AWS KMS) and
followup questions around cache performance.

CT offers solutions to reduce network calls to AWS KMS through the Caching CMM and the AWS KMS Hierarchical Keyring.
Today, there is no CT solution for customers to extract the performance metrics customers are looking for.
This can lead to frustrating debugging sessions and escalations that
could have been resolved with additional information.

Recent customer demand has allowed CT to re-evaluate client side metrics to offer
a better customer experience.

### Issue 1: What will be the default behavior?

As a client-side encryption library CT should be as cautious as possible.
Customers of CT libraries should be on the driver seat and determine for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Customers of CT libraries should be on the driver seat and determine for
Customers of CT libraries should be in the driver's seat and determine for

english is silly

themselves if their application could benefit from emitting metrics.
Making that decision for customers can erode customer trust.

For CT to be comfortable with allowing metrics, CT must consider that
this process must not affect the availability of the consumer of the library.

#### Opt-In (recommended)

By not emitting metrics by default existing customer workflows do not change.

This allows customers to test how their applications behave when they start to emit
metrics. Customers can then ask for updates to the implementations
CT provides or customers can go an implement their own interfaces that are fine-tuned
to their use cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a middle ground? For example, logging information locally without uploading it anywhere.

#### Always

This option implies that CT guarantees that the availability of an application
will not change. Perhaps a bold implication this is ultimately what the customer
will feel like; getting no choice on the matter and opting to not upgrade.
Going from never emitting metrics to always emitting them says to customers
that their application no matter its use case will always benefit from metrics.
Without letting customers make that choice, CT looses hard earned customer trust.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Without letting customers make that choice, CT looses hard earned customer trust.
Without letting customers make that choice, CT loses hard earned customer trust.


This also forces customers to make a choice, start collecting metrics and pick up
additional updates CT provides or get stuck in a version of the library that will
become unsupported.

Additionally, requiring customers to start emitting metrics
almost certainly guarantees a breaking change across supported libraries.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so? If we require that all customer facing APIs now require a metric agent, then you can't pick up this change in a minor version because you are required to make a code change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, not that I am advocating for this, but we could just provide a metric agent.
I am against it, but there is probably a way to do this in a non-breaking way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh sure, I apparently forgot this was in the context of specific API changes, I retract my objection


### Issue 2: Should Data Plane APIs fail if metrics fail to publish?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about latency?

This is somewhat relevant to the discussion around availability, particularly in technical terms.

What I'm getting at is, are we planning to make blocking calls to CT or wherever, or have a separate thread/pool that does so periodically? Do we get this for free from the CT metrics agent (or whatever) or do we have to write this ourselves?

In any case we should specify what the bar is here for performance.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 . We should delegate the whole metric stuff to some known frame work. Maintaining this would be difficult and in any case we shouldn't be reinventing the wheel.


#### No (recommended)

Metrics publishing must not impact application availability.

CT should allow for a fail-open approach when metrics fail to publish.
This will prevent metric publishing issues from impacting the
core functionality of the application.

CT can consider this a two-way door with initially not attempting to retry
to publish failed metrics and add this functionality later on.

#### Yes

This will become a problem for the libraries and will undoubtedly result
in customer friction and failing adoption rates.
Failing operations due to metrics not being published leaves the availability
of the application to rest on the implementation of the metrics interface.
This should not be the case, metrics should aid the customer application
not restrict it.

### Issue 3: How will customers interact with the libraries to emit metrics?

#### Provide an Interface

Keeping in line with the rest of CT features, a well defined interface with out
of the box implementations should satisfy the feature request.

Out of the box implementations should cover publishing metrics to an
existing observability service like AWS CloudWatch and to the local file system.
These implementations should offer customers a guide into implementing their own
if they wish to do so.
169 changes: 169 additions & 0 deletions changes/2025-09-12_introduce-metrics-interface/change.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,169 @@
[//]: # "Copyright Amazon.com Inc. or its affiliates. All Rights Reserved."
[//]: # "SPDX-License-Identifier: CC-BY-SA-4.0"

# Adding a Metrics Interface

**_NOTE: This document will be used to gain alignment on
this interface should look like and how it could be integrated with
existing operations. This document will not seek to specify
a Metrics implementation or specify which metrics will be collected
from impacted APIs or configurations._**

## Affected APIs or Client Configurations

This serves as a reference of all APIs and Client Configurations that this change affects.
This list is not exhaustive. Any downstream consumer of any API or client configuration SHOULD
also be updated as part of this proposed changed.

| API/ Configuration |
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [Encrypt](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/client-apis/encrypt.md) |
| [Decrypt](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/client-apis/decrypt.md) |
| [GetEncryptionMaterials](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/framework/cmm-interface.md#get-encryption-materials) |
| [DecryptionMaterials](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/framework/cmm-interface.md#decrypt-materials) |
| [OnEncrypt](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/framework/keyring-interface.md#onencrypt) |
| [OnDecrypt](https://github.com/awslabs/aws-encryption-sdk-specification/blob/master/framework/keyring-interface.md#ondecrypt) |
| [DynamoDB Table Encryption Config](https://github.com/aws/aws-database-encryption-sdk-dynamodb/blob/main/specification/dynamodb-encryption-client/ddb-table-encryption-config.md) |

## Affected Libraries

| Library | Version Introduced | Implementation |
| ------- | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| ESDK | T.B.D | [ESDK.smithy](https://github.com/aws/aws-encryption-sdk/blob/mainline/AwsEncryptionSDK/dafny/AwsEncryptionSdk/Model/esdk.smithy) |
| MPL | T.B.D | [material-provider.smithy](https://github.com/aws/aws-cryptographic-material-providers-library/blob/main/AwsCryptographicMaterialProviders/dafny/AwsCryptographicMaterialProviders/Model/material-provider.smithy) |
| DB-ESDK | T.B.D | [DynamoDbEncryption.smithy](https://github.com/aws/aws-database-encryption-sdk-dynamodb/blob/main/DynamoDbEncryption/dafny/DynamoDbEncryption/Model/DynamoDbEncryption.smithy) |

## Definitions

### Conventions used in this document

The key words
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in
[RFC 2119](https://tools.ietf.org/html/rfc2119).

## Summary

Existing users of Crypto Tools (CT) libraries do no have any insights as to
how the librar(y/ies) behave(s) in their application.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
how the librar(y/ies) behave(s) in their application.
how the libraries behave in their application.

simplify, we know what you mean

This can lead to frustrating debugging sessions where users
are required to have explicit tests to assert they are using a particular feature
correctly, or if customers are using any of the KMS keyrings users have to have
AWS Cloudwatch open to verify their application is sending values users expect.
This can be seen as a best practice and users may find this a good exercise;
however, CT's libraries do not make debugging an easy task.

A feature which allows customers to get real-time telemetry of their application's
integration with CT's libraries would be welcomed by users and will deliver on the
"easy to use and hard to misuse" tenet.

Introducing a new interface that defines the operations that must be
implemented in order to build a specification compliant MetricsAgent.

## Requirements

The interface should have three requirements.

1. MUST be simple.
1. MUST be extensible.

The following is documented to signify its importance
even though the interface is not able to make this guarantee.
Every implementation of the proposed interface must
ensure the following.

1. MUST NOT block the application's execution thread.

## Points of Integration

To collect metrics across CT's library stack multiple points of integration
are needed in order to collect metrics across CT's libraries.
Generally, CT's libraries work as follows:

_Note: Not every Client supports the Material Provider Library.
The Diagram below assumes it to help the mental model._

```mermaid
sequenceDiagram
participant Client
box Material Provider Library
participant CMM
participant Keyring
end
loop Content Encryption
Client->>Client: Content Encryption
end
Client<<->>CMM: GetEncryption/Decryption Materials
CMM<<->>Keyring: OnEncrypt/OnDecrypt
```

To optionally emit metrics from a top level client,
all customer facing APIs MUST be changed to optionally accept
a Metrics Agent. This will allow customers to define and supply one top
level Metrics Agent; this agent will get plumbed throughout CT's stack.

For example, in the ESDK for Java this would look like:

```java
final AwsCrypto crypto = AwsCrypto.builder().build();
// Create a Keyring
final MaterialProviders matProv =
MaterialProviders.builder()
.MaterialProvidersConfig(MaterialProvidersConfig.builder().build())
.build();

final IKeyring rawAesKeyring = matProv.CreateRawAesKeyring(keyringInput);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we implementing our own metric agent? I'm pretty sure there exists some framework which we can leverage. Customers can optionally provide the agent, or we do no-op (like logging frameworks such as slf4j etc).

final Map<String, String> encryptionContext =
Collections.singletonMap("ExampleContextKey", "ExampleContextValue");

// Create a Metrics Agent
final IMetricsAgent metrics = matProv.CreateSimpleMetricsAgent(metricsAgentInput);
// 4. Encrypt the data
final CryptoResult<byte[], ?> encryptResult =
crypto.encryptData(rawAesKeyring, metrics, EXAMPLE_DATA, encryptionContext);
final byte[] ciphertext = encryptResult.getResult();
```

This change will allow Crypto Tools to introduce a Metrics Agent in a
non-breaking way as the Metrics Agent will be an optional parameter
at customer facing API call sites.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we support providing a metrics agent to the client constructor? Such that the same instance would then be used for each API call.

I don't necessarily think we should but it might be worth discussing here.


Currently, the ESDK client APIs models are defined [here](https://github.com/aws/aws-encryption-sdk/blob/mainline/AwsEncryptionSDK/dafny/AwsEncryptionSdk/Model/esdk.smithy#L60-L126).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that the DB-ESDK item encryptor would be similar.
What about the DDB client interface?
Also, are there plans for S3EC?

This change would see that the client APIs be changed as follows:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be the only change. I like the detail, but we should explicitly add the CMM/Keyring inputs to this list.


```diff
structure EncryptInput {
@required
plaintext: Blob,

encryptionContext: aws.cryptography.materialProviders#EncryptionContext,

// One of keyring or CMM are required
materialsManager: aws.cryptography.materialProviders#CryptographicMaterialsManagerReference,
keyring: aws.cryptography.materialProviders#KeyringReference,

algorithmSuiteId: aws.cryptography.materialProviders#ESDKAlgorithmSuiteId,

frameLength: FrameLength,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this probably isn't possible in Smithy/Smithy-Dafny but we might want to consider having a more open-ended "request override" option for things like this. (This is the same way the SDK does it.)

+ metricsAgent: aws.cryptography.materialProviders#MetricsAgentReference
}
...
structure DecryptInput {
@required
ciphertext: Blob,

// One of keyring or CMM are required
materialsManager: aws.cryptography.materialProviders#CryptographicMaterialsManagerReference,
keyring: aws.cryptography.materialProviders#KeyringReference,
//= aws-encryption-sdk-specification/client-apis/keyring-interface.md#onencrypt
//= type=implication
//# The following inputs to this behavior MUST be OPTIONAL:
// (blank line for duvet)
//# - [Encryption Context](#encryption-context)
encryptionContext: aws.cryptography.materialProviders#EncryptionContext,

+ metricsAgent: aws.cryptography.materialProviders#MetricsAgentReference
}
```
Loading
Loading