-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Logging in the Presence of Sensitive Data #1536
Conversation
|
||
## Background | ||
|
||
### HTTP Binding Traits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any other ways that @sensitive
fields can be bound/leak?
|
||
Note that: | ||
- We are not required to deserialize the entire request before we can make judgments on what data is sensitive or not - only which operation it has been routed to. | ||
- We are permitted to emit logs prior to routing when: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this allows us to do important logging in the Router
and the remainder can be done within the logging layer.
}; | ||
|
||
// Instrument the future with a span | ||
let span = span!(%method, %uri, headers = request.headers(), "received request"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a lot of data to include in a span? We might want a way to reduce the amount placed in here and instead log an debug!(?headers)
for example.
The benefit of having the span
is that every log within this will also have access to this data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fair callout. I think I am in favour of have as many infos as possible. What do you think people?
|
||
### Routing | ||
|
||
The sensitivity and HTTP bindings are declared within specific structures/operations. For this reason, in the general case, it's unknowable whether or not any given part of a request is sensitive until we determine which operation is tasked with handling the request and hence which fields are bound. Implementation wise, this means that any `Layer` applied _before_ routing has taken place cannot log anything sensitive without performing routing logic itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the general case
Is noted because in the case where there are absolutely no @sensitive
s applied in the model we can log freely anywhere.
7a645be
to
953d632
Compare
953d632
to
c35938a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work Harry, I put some comments and thoughts.
}; | ||
|
||
// Instrument the future with a span | ||
let span = span!(%method, %uri, headers = request.headers(), "received request"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fair callout. I think I am in favour of have as many infos as possible. What do you think people?
- Sensitive data leaking from middleware applied to the `Router`. | ||
- How to use the `Sensitive` struct and the `debug-logging` feature flag described in [Debug Logging](#debug-logging). | ||
|
||
## Alternative Proposals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I lean towards the main solution you proposed as the code-generation allows us to precisely tailor the logging on a per-operation basis, while using runtime switches feels more complicated and we could probably loose control without noticing in the future.
Co-authored-by: Matteo Bigoi <[email protected]>
A new generated diff is ready to view.
A new doc preview is ready to view. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work on this RFC! It's very well written. I have just a couple questions.
|
||
### Scope and Guidelines | ||
|
||
It is unfeasible to make the logging of sensitive data forbidden a type theoretic invariant. With the current API, the customer will always have an opportunity to log a request containing sensitive data before it enters the `Service<Request<B>>` that we provide to them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This first sentence doesn't make a lot of sense. Is the word "forbidden" in the wrong place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the following wrapper should be provided | ||
|
||
```rust | ||
pub struct Sensitive<T>(T); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where should this struct live?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, there's conversation to be had here - I've raised it here #1550 (comment). For the purpose of the RFC, I think probably it's specific enough to say that it'll live within the runtime crates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Co-authored-by: John DiSanti <[email protected]>
@drganjoo Mentioned a good point, echoing it here for posterity.
The current implementation of this RFC allows for this to be added in the future without much pain. |
A new generated diff is ready to view.
A new doc preview is ready to view. |
|
||
### Routing | ||
|
||
The sensitivity and HTTP bindings are declared within specific structures/operations. For this reason, in the general case, it's unknowable whether or not any given part of a request is sensitive until we determine which operation is tasked with handling the request and hence which fields are bound. Implementation wise, this means that any middleware applied _before_ routing has taken place cannot log anything sensitive without performing routing logic itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cannot log anything sensitive without performing
cannot log anything potentially sensitive without performing
|
||
## Proposal | ||
|
||
This proposal serves to honor the sensitivity specification via code generation of a logging middleware which is aware of the sensitivity, together with a developer contract disallowing logging potentially sensitive data in the runtime crates. An internal and external guideline should be provided in addition to the middleware. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal and external guideline
Why would there be a need for two guidelines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the sections for the guidelines below and still don't understand why we need two guidelines. Aren't all the bullet points in those sections useful for both internal and external developers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
To clarify, by external developers I mean someone who has a model and wants to use smithy-rs
to construct a server. In this case the external developer would be composing/initializing Subscribers
, whereas the internal developer wouldn't be. This means internal developers, while just using tracing
crate, aren't concerned with:
- Sensitive data leaking from third-party dependencies.
- Sensitive data leaking from middleware applied to the
Router
.
There might also be a slight difference in emphasis about certain things:
- The internal developer needs to very familiar with the concept of "potentially sensitive data" because, outside of codegen, they cannot make assumptions about the model.
- The external developer knows the model they're using and therefore knows exactly which data is sensitive.
But yeah, these differences are probably not large enough to warrant a separate guide.
|
||
### Code Generated Logging Middleware | ||
|
||
Using the smithy model, for each operation, a logging middleware should be generated. Through the model, the code generation knows which fields are sensitive and which HTTP bindings exist, therefore the logging middleware can be careful crafted to avoid leaking sensitive data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
carefully crafted
|
||
### Logging within the Router | ||
|
||
There is need for logging within the `Router` implementation - this is a crucial area of business logic. As mentioned in the [Routing](#routing) section, we are permitted to log potentially sensitive data in cases where requests fail to get routed to an operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we be able to correlate the log statement of a request that failed to be routed successfully (where we log potentially sensitive data) with previous log statements pertaining to the same request (where we are not allowed to log anything potentially sensitive)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is a general solution we can enforce to provide this correlation. The customer is always free to apply middleware to the outside of the Router
and log in it anyway they want.
For example:
let svc = router
.map_request(|request| {
if request.len() > 5 { info!("hello world"); }
request
});
What the customer could do is to attach a unique ID to a span which is open during a requests lifetime. The unique ID would adjoined to all the logging messages and so would provide the correlation. You'd then get a log like
span(message = "request", id = 342432)
-> info(message = "hello world")
-> error(message = "failed to route", uri = "amazon.com")
in the error case and
span(message = "request", id = 342433)
-> info(message = "hello world")
-> span(headers = "{redacted}", uri = "amazon.com")
-> info(message = "logged from a handler perhaps")
-> debug(message = "response", status_code = 3)
in the happy path.
I'm unsure whether we should provide them with middleware for opening these "correlation" spans.
@drganjoo was eluding to this general concern the other day. #1536 (comment) relates to it.
* Add documentation covering instrumentation approaches for Smithy Rust. * Tweak the logging in the Pokemon service to better exemplify instrumentation. * Remove `TraceLayer` which violates sensitivity contract. * Switch to [Pretty](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/fmt/format/struct.Pretty.html) logs to better showcase the outputs. * Update [Logging in the Presence of Sensitive Data](#1536) checklist. * Rename `logging` module to `instrumentation` to improve coherence across struct names and documentation.
Rendered