Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Logging in the Presence of Sensitive Data #1536

Merged
merged 29 commits into from
Jul 22, 2022
Merged

Conversation

hlbarber
Copy link
Contributor

@hlbarber hlbarber commented Jul 6, 2022


## Background

### HTTP Binding Traits
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any other ways that @sensitive fields can be bound/leak?


Note that:
- We are not required to deserialize the entire request before we can make judgments on what data is sensitive or not - only which operation it has been routed to.
- We are permitted to emit logs prior to routing when:
Copy link
Contributor Author

@hlbarber hlbarber Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this allows us to do important logging in the Router and the remainder can be done within the logging layer.

design/src/rfcs/rfc0017_logging_sensitive.md Outdated Show resolved Hide resolved
};

// Instrument the future with a span
let span = span!(%method, %uri, headers = request.headers(), "received request");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a lot of data to include in a span? We might want a way to reduce the amount placed in here and instead log an debug!(?headers) for example.

The benefit of having the span is that every log within this will also have access to this data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fair callout. I think I am in favour of have as many infos as possible. What do you think people?

design/src/rfcs/rfc0017_logging_sensitive.md Outdated Show resolved Hide resolved

### Routing

The sensitivity and HTTP bindings are declared within specific structures/operations. For this reason, in the general case, it's unknowable whether or not any given part of a request is sensitive until we determine which operation is tasked with handling the request and hence which fields are bound. Implementation wise, this means that any `Layer` applied _before_ routing has taken place cannot log anything sensitive without performing routing logic itself.
Copy link
Contributor Author

@hlbarber hlbarber Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the general case

Is noted because in the case where there are absolutely no @sensitives applied in the model we can log freely anywhere.

@smithy-lang smithy-lang deleted a comment from github-actions bot Jul 7, 2022
@smithy-lang smithy-lang deleted a comment from github-actions bot Jul 7, 2022
@hlbarber hlbarber force-pushed the harryb/logging-rfc branch from 7a645be to 953d632 Compare July 7, 2022 20:47
@hlbarber hlbarber force-pushed the harryb/logging-rfc branch from 953d632 to c35938a Compare July 7, 2022 20:52
@hlbarber hlbarber marked this pull request as ready for review July 7, 2022 21:09
@smithy-lang smithy-lang deleted a comment from github-actions bot Jul 7, 2022
Copy link
Contributor

@crisidev crisidev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work Harry, I put some comments and thoughts.

design/src/rfcs/rfc0017_logging_sensitive.md Outdated Show resolved Hide resolved
design/src/rfcs/rfc0017_logging_sensitive.md Outdated Show resolved Hide resolved
design/src/rfcs/rfc0017_logging_sensitive.md Outdated Show resolved Hide resolved
design/src/rfcs/rfc0017_logging_sensitive.md Outdated Show resolved Hide resolved
design/src/rfcs/rfc0017_logging_sensitive.md Outdated Show resolved Hide resolved
};

// Instrument the future with a span
let span = span!(%method, %uri, headers = request.headers(), "received request");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fair callout. I think I am in favour of have as many infos as possible. What do you think people?

design/src/rfcs/rfc0017_logging_sensitive.md Outdated Show resolved Hide resolved
- Sensitive data leaking from middleware applied to the `Router`.
- How to use the `Sensitive` struct and the `debug-logging` feature flag described in [Debug Logging](#debug-logging).

## Alternative Proposals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I lean towards the main solution you proposed as the code-generation allows us to precisely tailor the logging on a per-operation basis, while using runtime switches feels more complicated and we could probably loose control without noticing in the future.

@smithy-lang smithy-lang deleted a comment from github-actions bot Jul 15, 2022
@smithy-lang smithy-lang deleted a comment from github-actions bot Jul 18, 2022
@github-actions
Copy link

A new generated diff is ready to view.

A new doc preview is ready to view.

Copy link
Collaborator

@jdisanti jdisanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this RFC! It's very well written. I have just a couple questions.


### Scope and Guidelines

It is unfeasible to make the logging of sensitive data forbidden a type theoretic invariant. With the current API, the customer will always have an opportunity to log a request containing sensitive data before it enters the `Service<Request<B>>` that we provide to them.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This first sentence doesn't make a lot of sense. Is the word "forbidden" in the wrong place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the following wrapper should be provided

```rust
pub struct Sensitive<T>(T);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should this struct live?

Copy link
Contributor Author

@hlbarber hlbarber Jul 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, there's conversation to be had here - I've raised it here #1550 (comment). For the purpose of the RFC, I think probably it's specific enough to say that it'll live within the runtime crates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smithy-lang smithy-lang deleted a comment from github-actions bot Jul 19, 2022
Copy link
Collaborator

@jdisanti jdisanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

design/src/rfcs/rfc0017_logging_sensitive.md Outdated Show resolved Hide resolved
Co-authored-by: John DiSanti <[email protected]>
@hlbarber hlbarber requested a review from a team as a code owner July 19, 2022 18:59
@smithy-lang smithy-lang deleted a comment from github-actions bot Jul 22, 2022
@hlbarber
Copy link
Contributor Author

hlbarber commented Jul 22, 2022

@drganjoo Mentioned a good point, echoing it here for posterity.

There might be a want for {redacted} to instead be some sort of hash of the underlying value. This would obey the spec but allow for developers to identify equivalent values while debugging.

The current implementation of this RFC allows for this to be added in the future without much pain.

@smithy-lang smithy-lang deleted a comment from github-actions bot Jul 22, 2022
@github-actions
Copy link

A new generated diff is ready to view.

A new doc preview is ready to view.

@hlbarber hlbarber merged commit 5edd9d2 into main Jul 22, 2022
@hlbarber hlbarber deleted the harryb/logging-rfc branch July 22, 2022 20:33

### Routing

The sensitivity and HTTP bindings are declared within specific structures/operations. For this reason, in the general case, it's unknowable whether or not any given part of a request is sensitive until we determine which operation is tasked with handling the request and hence which fields are bound. Implementation wise, this means that any middleware applied _before_ routing has taken place cannot log anything sensitive without performing routing logic itself.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cannot log anything sensitive without performing

cannot log anything potentially sensitive without performing


## Proposal

This proposal serves to honor the sensitivity specification via code generation of a logging middleware which is aware of the sensitivity, together with a developer contract disallowing logging potentially sensitive data in the runtime crates. An internal and external guideline should be provided in addition to the middleware.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal and external guideline

Why would there be a need for two guidelines?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the sections for the guidelines below and still don't understand why we need two guidelines. Aren't all the bullet points in those sections useful for both internal and external developers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

To clarify, by external developers I mean someone who has a model and wants to use smithy-rs to construct a server. In this case the external developer would be composing/initializing Subscribers, whereas the internal developer wouldn't be. This means internal developers, while just using tracing crate, aren't concerned with:

  • Sensitive data leaking from third-party dependencies.
  • Sensitive data leaking from middleware applied to the Router.

There might also be a slight difference in emphasis about certain things:

  • The internal developer needs to very familiar with the concept of "potentially sensitive data" because, outside of codegen, they cannot make assumptions about the model.
  • The external developer knows the model they're using and therefore knows exactly which data is sensitive.

But yeah, these differences are probably not large enough to warrant a separate guide.


### Code Generated Logging Middleware

Using the smithy model, for each operation, a logging middleware should be generated. Through the model, the code generation knows which fields are sensitive and which HTTP bindings exist, therefore the logging middleware can be careful crafted to avoid leaking sensitive data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

carefully crafted


### Logging within the Router

There is need for logging within the `Router` implementation - this is a crucial area of business logic. As mentioned in the [Routing](#routing) section, we are permitted to log potentially sensitive data in cases where requests fail to get routed to an operation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we be able to correlate the log statement of a request that failed to be routed successfully (where we log potentially sensitive data) with previous log statements pertaining to the same request (where we are not allowed to log anything potentially sensitive)?

Copy link
Contributor Author

@hlbarber hlbarber Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a general solution we can enforce to provide this correlation. The customer is always free to apply middleware to the outside of the Router and log in it anyway they want.

For example:

let svc = router
    .map_request(|request| {
            if request.len() > 5 { info!("hello world"); }
            request
    });

What the customer could do is to attach a unique ID to a span which is open during a requests lifetime. The unique ID would adjoined to all the logging messages and so would provide the correlation. You'd then get a log like

span(message = "request", id = 342432)
    -> info(message = "hello world")
    -> error(message = "failed to route", uri = "amazon.com")

in the error case and

span(message = "request", id = 342433)
    -> info(message = "hello world")
    -> span(headers = "{redacted}", uri = "amazon.com")
        -> info(message = "logged from a handler perhaps")
        -> debug(message = "response", status_code = 3)

in the happy path.

I'm unsure whether we should provide them with middleware for opening these "correlation" spans.

@drganjoo was eluding to this general concern the other day. #1536 (comment) relates to it.

hlbarber added a commit that referenced this pull request Sep 28, 2022
* Add documentation covering instrumentation approaches for Smithy Rust.

* Tweak the logging in the Pokemon service to better exemplify instrumentation.
    * Remove `TraceLayer` which violates sensitivity contract.
    * Switch to [Pretty](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/fmt/format/struct.Pretty.html) logs to better showcase the outputs.

* Update [Logging in the Presence of Sensitive Data](#1536) checklist.
* Rename `logging` module to `instrumentation` to improve coherence across struct names and documentation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants