RFC: Logging in the Presence of Sensitive Data #1536

hlbarber · 2022-07-06T14:33:43Z

hlbarber · 2022-07-07T20:28:42Z

design/src/rfcs/rfc0017_logging_sensitive.md

+
+## Background
+
+### HTTP Binding Traits


Are there any other ways that @sensitive fields can be bound/leak?

hlbarber · 2022-07-07T20:31:50Z

design/src/rfcs/rfc0017_logging_sensitive.md

+
+Note that:
+- We are not required to deserialize the entire request before we can make judgments on what data is sensitive or not - only which operation it has been routed to.
+- We are permitted to emit logs prior to routing when:


I think this allows us to do important logging in the Router and the remainder can be done within the logging layer.

design/src/rfcs/rfc0017_logging_sensitive.md

hlbarber · 2022-07-07T20:35:51Z

design/src/rfcs/rfc0017_logging_sensitive.md

+        };
+
+        // Instrument the future with a span
+        let span = span!(%method, %uri, headers = request.headers(), "received request");


This might be a lot of data to include in a span? We might want a way to reduce the amount placed in here and instead log a `debug!(?headers)`, for example.

The benefit of having the span is that every log within this will also have access to this data.

This is a fair callout. I think I am in favour of having as much information as possible. What do you think, people?

design/src/rfcs/rfc0017_logging_sensitive.md

hlbarber · 2022-07-07T20:39:21Z

design/src/rfcs/rfc0017_logging_sensitive.md

+
+### Routing
+
+The sensitivity and HTTP bindings are declared within specific structures/operations. For this reason, in the general case, it's unknowable whether or not any given part of a request is sensitive until we determine which operation is tasked with handling the request and hence which fields are bound. Implementation wise, this means that any `Layer` applied _before_ routing has taken place cannot log anything sensitive without performing routing logic itself.


in the general case

This is noted because, when the model applies no `@sensitive` traits at all, we can log freely anywhere.

design/src/rfcs/rfc0017_logging_sensitive.md

crisidev

Nice work Harry, I left some comments and thoughts.

design/src/rfcs/rfc0017_logging_sensitive.md

crisidev · 2022-07-08T09:16:13Z

design/src/rfcs/rfc0017_logging_sensitive.md

+        };
+
+        // Instrument the future with a span
+        let span = span!(%method, %uri, headers = request.headers(), "received request");


This is a fair callout. I think I am in favour of having as much information as possible. What do you think, people?

design/src/rfcs/rfc0017_logging_sensitive.md

crisidev · 2022-07-08T09:19:45Z

design/src/rfcs/rfc0017_logging_sensitive.md

+  - Sensitive data leaking from middleware applied to the `Router`.
+- How to use the `Sensitive` struct and the `debug-logging` feature flag described in [Debug Logging](#debug-logging).
+
+## Alternative Proposals


I think I lean towards the main solution you proposed, as the code generation allows us to precisely tailor the logging on a per-operation basis, while using runtime switches feels more complicated and we could probably lose control without noticing in the future.

design/src/rfcs/rfc0017_logging_sensitive.md

Co-authored-by: Matteo Bigoi <[email protected]>

github-actions · 2022-07-18T16:25:17Z

A new generated diff is ready to view.

AWS SDK (ignoring whitespace)
No codegen difference in the Server Test
No codegen difference in the Server Test Python

A new doc preview is ready to view.

jdisanti

Great work on this RFC! It's very well written. I have just a couple questions.

jdisanti · 2022-07-18T23:53:52Z

design/src/rfcs/rfc0017_logging_sensitive.md

+
+### Scope and Guidelines
+
+It is unfeasible to make the logging of sensitive data forbidden a type theoretic invariant. With the current API, the customer will always have an opportunity to log a request containing sensitive data before it enters the `Service<Request<B>>` that we provide to them.


This first sentence doesn't make a lot of sense. Is the word "forbidden" in the wrong place?

jdisanti · 2022-07-19T00:03:03Z

design/src/rfcs/rfc0017_logging_sensitive.md

+the following wrapper should be provided
+
+```rust
+pub struct Sensitive<T>(T);


Where should this struct live?

Good question, there's conversation to be had here - I've raised it here #1550 (comment). For the purpose of the RFC, I think probably it's specific enough to say that it'll live within the runtime crates?

jdisanti

LGTM!

design/src/rfcs/rfc0017_logging_sensitive.md

Co-authored-by: John DiSanti <[email protected]>

hlbarber · 2022-07-22T16:55:46Z

@drganjoo mentioned a good point, echoing it here for posterity.

It might be desirable for `{redacted}` to instead be some sort of hash of the underlying value. This would obey the spec while allowing developers to identify equivalent values when debugging.

The current implementation of this RFC allows for this to be added in the future without much pain.
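As a rough illustration of the point above, here is a minimal, std-only sketch of the wrapper alongside a hash-printing variant. `Sensitive<T>` has the same shape as the struct in the RFC; `HashedSensitive<T>`, its output format, and the use of `DefaultHasher` are assumptions made only for this example and are not part of the RFC or of smithy-rs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fmt;
use std::hash::{Hash, Hasher};

/// Wrapper marking a value as sensitive (same shape as the RFC's struct).
pub struct Sensitive<T>(pub T);

impl<T> fmt::Debug for Sensitive<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Never print the inner value, only a fixed marker.
        f.write_str("{redacted}")
    }
}

/// Hypothetical variant: prints a digest of the value instead of a fixed
/// marker, so equal values can be recognised across log lines.
pub struct HashedSensitive<T>(pub T);

impl<T: Hash> fmt::Debug for HashedSensitive<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `DefaultHasher::new()` uses fixed keys, so equal inputs give
        // equal digests within a process. It is not a cryptographic hash.
        let mut hasher = DefaultHasher::new();
        self.0.hash(&mut hasher);
        write!(f, "{{redacted:{:016x}}}", hasher.finish())
    }
}
```

An unsalted digest of a low-entropy value (a short PIN, say) can be brute-forced, so a real implementation would probably want a keyed hash; that choice is left open here.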

github-actions · 2022-07-22T17:10:03Z

A new generated diff is ready to view.

AWS SDK (ignoring whitespace)
No codegen difference in the Server Test
No codegen difference in the Server Test Python

A new doc preview is ready to view.

david-perez · 2022-07-26T15:14:31Z

design/src/rfcs/rfc0018_logging_sensitive.md

+
+### Routing
+
+The sensitivity and HTTP bindings are declared within specific structures/operations. For this reason, in the general case, it's unknowable whether or not any given part of a request is sensitive until we determine which operation is tasked with handling the request and hence which fields are bound. Implementation wise, this means that any middleware applied _before_ routing has taken place cannot log anything sensitive without performing routing logic itself.


cannot log anything sensitive without performing

cannot log anything potentially sensitive without performing

david-perez · 2022-07-26T15:16:46Z

design/src/rfcs/rfc0018_logging_sensitive.md

+
+## Proposal
+
+This proposal serves to honor the sensitivity specification via code generation of a logging middleware which is aware of the sensitivity, together with a developer contract disallowing logging potentially sensitive data in the runtime crates. An internal and external guideline should be provided in addition to the middleware.


internal and external guideline

Why would there be a need for two guidelines?

I read the sections for the guidelines below and still don't understand why we need two guidelines. Aren't all the bullet points in those sections useful for both internal and external developers?

Good point.

To clarify, by external developers I mean someone who has a model and wants to use smithy-rs to construct a server. In this case the external developer would be composing/initializing `Subscriber`s, whereas the internal developer wouldn't be. This means internal developers, while just using the `tracing` crate, aren't concerned with:

- Sensitive data leaking from third-party dependencies.
- Sensitive data leaking from middleware applied to the `Router`.

There might also be a slight difference in emphasis about certain things:

- The internal developer needs to be very familiar with the concept of "potentially sensitive data" because, outside of codegen, they cannot make assumptions about the model.
- The external developer knows the model they're using and therefore knows exactly which data is sensitive.

But yeah, these differences are probably not large enough to warrant a separate guide.

david-perez · 2022-07-26T15:18:36Z

design/src/rfcs/rfc0018_logging_sensitive.md

+
+### Code Generated Logging Middleware
+
+Using the smithy model, for each operation, a logging middleware should be generated. Through the model, the code generation knows which fields are sensitive and which HTTP bindings exist, therefore the logging middleware can be careful crafted to avoid leaking sensitive data.


carefully crafted

david-perez · 2022-07-26T15:30:42Z

design/src/rfcs/rfc0018_logging_sensitive.md

+
+### Logging within the Router
+
+There is need for logging within the `Router` implementation - this is a crucial area of business logic. As mentioned in the [Routing](#routing) section, we are permitted to log potentially sensitive data in cases where requests fail to get routed to an operation.


Will we be able to correlate the log statement of a request that failed to be routed successfully (where we log potentially sensitive data) with previous log statements pertaining to the same request (where we are not allowed to log anything potentially sensitive)?

I don't think there is a general solution we can enforce to provide this correlation. The customer is always free to apply middleware to the outside of the `Router` and log in it any way they want.

For example:

`let svc = router.map_request(|request| { if request.len() > 5 { info!("hello world"); } request });`

What the customer could do is attach a unique ID to a span which is open during a request's lifetime. The unique ID would be adjoined to all the logging messages and so would provide the correlation. You'd then get a log like

span(message = "request", id = 342432) -> info(message = "hello world") -> error(message = "failed to route", uri = "amazon.com")

in the error case and

span(message = "request", id = 342433) -> info(message = "hello world") -> span(headers = "{redacted}", uri = "amazon.com") -> info(message = "logged from a handler perhaps") -> debug(message = "response", status_code = 3)

in the happy path.

I'm unsure whether we should provide them with middleware for opening these "correlation" spans.

@drganjoo was alluding to this general concern the other day. #1536 (comment) relates to it.
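To make the correlation idea in this thread concrete, here is a minimal, std-only sketch. It is not the `tracing` API and not smithy-rs code: `RequestSpan`, the atomic counter, and the line format are hypothetical stand-ins for a span that stays open for one request and stamps its ID on every event.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter used to mint request IDs (hypothetical scheme).
static NEXT_REQUEST_ID: AtomicU64 = AtomicU64::new(1);

/// Stand-in for a span that is open during one request's lifetime.
pub struct RequestSpan {
    id: u64,
    lines: Vec<String>,
}

impl RequestSpan {
    /// Open a span with a fresh ID, unique within this process.
    pub fn open() -> Self {
        let id = NEXT_REQUEST_ID.fetch_add(1, Ordering::Relaxed);
        Self { id, lines: Vec::new() }
    }

    pub fn id(&self) -> u64 {
        self.id
    }

    /// Record an event; the span's ID is adjoined to every message.
    pub fn event(&mut self, level: &str, message: &str) {
        self.lines
            .push(format!("request{{id={}}} {}: {}", self.id, level, message));
    }

    /// All lines recorded so far, in order.
    pub fn lines(&self) -> &[String] {
        &self.lines
    }
}
```

With something along these lines, a pre-routing "hello world" event and a later "failed to route" error share the same `id`, which is what allows the two to be matched up afterwards. In a real service the span would come from `tracing` rather than a hand-rolled struct.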

* Add documentation covering instrumentation approaches for Smithy Rust.
* Tweak the logging in the Pokemon service to better exemplify instrumentation.
* Remove `TraceLayer` which violates sensitivity contract.
* Switch to [Pretty](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/fmt/format/struct.Pretty.html) logs to better showcase the outputs.
* Update [Logging in the Presence of Sensitive Data](#1536) checklist.
* Rename `logging` module to `instrumentation` to improve coherence across struct names and documentation.

Harry Barber added 6 commits July 6, 2022 14:11

RFC introduction

bf519a0

Improvements to introduction

c9dbb3b

Add proposal

2308877

Add alternative proposals

138602e

Tweaks

ef5c4d3

Fix bullet points

4c91fa4

hlbarber commented Jul 7, 2022

smithy-lang deleted a comment from github-actions bot Jul 7, 2022

hlbarber force-pushed the harryb/logging-rfc branch from 7a645be to 953d632 Compare July 7, 2022 20:47

Fixes

c35938a

hlbarber force-pushed the harryb/logging-rfc branch from 953d632 to c35938a Compare July 7, 2022 20:52

Tweaks

fedc254

hlbarber commented Jul 7, 2022

hlbarber marked this pull request as ready for review July 7, 2022 21:09

smithy-lang deleted a comment from github-actions bot Jul 7, 2022

crisidev reviewed Jul 8, 2022

jdisanti reviewed Jul 11, 2022

hlbarber mentioned this pull request Jul 14, 2022

Add server side HTTP logging layer #1550

Merged

Harry Barber and others added 10 commits July 14, 2022 16:07

Change debug-logging to unredacted-logging

09a2427

Change _redacted_ to {redacted}

0e519d5

Remove http trait from bindings table

cb422c9

Improve runtime crate description

596714d

Co-authored-by: Matteo Bigoi <[email protected]>

Improve wording

ba32005

Improve logging layer description

fd40c84

Improve wording around Layer/middleware

024bf9b

Emphasize middleware positioning concerns

4c7b1b5

Add changes checklist

69056e5

Various improvements/spelling fixes

11cd052

smithy-lang deleted a comment from github-actions bot Jul 15, 2022

Merge branch 'main' into harryb/logging-rfc

e6018c7

smithy-lang deleted a comment from github-actions bot Jul 18, 2022

hlbarber added the needs-sdk-review label Jul 18, 2022

jdisanti reviewed Jul 19, 2022

Harry Barber added 3 commits July 19, 2022 08:55

Improve scope wording

9ec9af8

Improve wrapping documentation

5549038

Improve wording

e723022

smithy-lang deleted a comment from github-actions bot Jul 19, 2022

jdisanti approved these changes Jul 19, 2022

jdisanti removed the needs-sdk-review label Jul 19, 2022

Fix feature flag spelling

c8490a4

Co-authored-by: John DiSanti <[email protected]>

hlbarber requested a review from a team as a code owner July 19, 2022 18:59

Harry Barber added 4 commits July 22, 2022 16:21

Negate condition

3579354

Rename to 18

d8cb8f6

Merge branch 'main' into harryb/logging-rfc

1a8b6b1

Add references

925f4d6

smithy-lang deleted a comment from github-actions bot Jul 22, 2022

Fix grammar

d8ac0ea

smithy-lang deleted a comment from github-actions bot Jul 22, 2022

hlbarber merged commit 5edd9d2 into main Jul 22, 2022

hlbarber deleted the harryb/logging-rfc branch July 22, 2022 20:33

crisidev approved these changes Jul 22, 2022

david-perez reviewed Jul 26, 2022

hlbarber mentioned this pull request Jul 28, 2022

RFC(edit): Logging in the Presence of Sensitive Data #1591

Merged

david-perez mentioned this pull request Aug 3, 2022

Add minimal tracing to server framework #1314

Closed

hlbarber mentioned this pull request Sep 27, 2022

Add Instrumentation documentation #1772

Merged
