feat: Add the Tracing logging system #28
Conversation
merino-web/src/suggest.rs
Outdated
@@ -42,21 +44,33 @@ async fn suggest(
    suggesters: Data<SuggesterSet>,
    settings: Data<Settings>,
) -> Result<HttpResponse, HandlerError> {
    let request_id = Uuid::new_v4();
Have you considered using tracing-actix-web, which comes with the concept of a Request ID?
That crate was on my list, but I haven't looked at it yet. It seems useful. This was some of the remaining work I alluded to in the initial comment.
It took a bit of bending to get it into a format that will eventually work for MozLog, but this is integrated now, and it seems to be an improvement.
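For reference, the basic shape of that integration is roughly the following sketch. This is not the exact merino-web code: the route and handler are illustrative, and whether the middleware is written `TracingLogger` or `TracingLogger::default()` depends on the tracing-actix-web version.

```rust
use actix_web::{web, App, HttpServer};
use tracing_actix_web::TracingLogger;

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new()
            // One root span per request; tracing-actix-web attaches a
            // generated request_id to it, so handlers no longer need to
            // call Uuid::new_v4() themselves.
            .wrap(TracingLogger::default())
            .route("/suggest", web::get().to(|| async { "ok" }))
    })
    .bind(("127.0.0.1", 8080))?
    .run()
    .await
}
```

Events emitted inside a handler then inherit the request span's context automatically.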
@@ -27,6 +27,7 @@ impl RemoteSettingsSuggester {
    ///
    /// This must be called at least once before any suggestions will be provided
    pub async fn sync(&mut self, settings: &Settings) -> Result<()> {
        tracing::info!("Syncing quicksuggest records from Remote Settings");
Looks like we have multiple logging calls in this function; shall we add a #[tracing::instrument] span to it?
Looks like we have a deviation on logging between Contile and this PR; the two use different logging crates. I'm inclined to pick one approach and standardize on it.
I don't have a strong preference either way about what Contile uses. I'll write up an ADR in this PR to explain why I chose tracing.
Honestly, so long as the following criteria are met, I'm fine with using whatever:
This might be nice to have, but it isn't totally necessary, as we can always call into the Sentry client directly.
@mythmon Thanks for preparing this detailed ADR! Given the feedback (thanks JR and Phil!), I'd propose that we:

Thoughts?
Here is a preview of the things I added. Notably, this is focused on showing that the logs are configurable. The exact set of logs, and the default levels of different libraries, are not set in stone (obviously).

(Collapsed previews: Docs, Default logging, Logging can be customized.)
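Assuming the setup follows tracing-subscriber's EnvFilter conventions (an assumption; this PR may wire configuration differently), per-crate verbosity could be tuned at startup like:

```shell
# More detail from merino's own crate, less noise from actix-web.
# The crate names and the RUST_LOG variable follow the common
# EnvFilter convention; they are assumptions, not taken from the diff.
RUST_LOG="info,merino_web=debug,actix_web=warn" cargo run
```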
I don't think I'd qualify Sentry support as a "nice to have", since it's kind of how we monitor and track issues associated with the service. If this is using a different mechanism, that's fine, so long as Ops, WebSec and others are aware of it and are using it. I don't think we need to do the same "middleware" element that we use in other projects, since that can be hairy, but yeah, we do need to make sure that trackable, actionable errors are logged to Sentry in some consistent manner. If we call directly, that's fine.
merino-web/src/suggest.rs
Outdated
@@ -37,6 +38,7 @@ async fn setup_suggesters(settings: &Settings) -> Result<Vec<Box<dyn Suggester>>
/// Suggest content in response to the queried text.
#[get("")]
#[tracing::instrument(skip(settings, suggesters))]
I think we should also skip `query` here, unless we find a use case for it and legal approves doing so.
Good call. I skipped the other two because they were noisy. We should write down a policy about what we can/can't log.
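With that suggestion applied, the attribute from the diff would gain `query` in its skip list, something like the sketch below. The `web::Query<SuggestQuery>` extractor type is an assumed name for illustration, not taken from the diff.

```rust
/// Suggest content in response to the queried text.
#[get("")]
#[tracing::instrument(skip(query, settings, suggesters))]
async fn suggest(
    query: web::Query<SuggestQuery>,
    suggesters: Data<SuggesterSet>,
    settings: Data<Settings>,
) -> Result<HttpResponse, HandlerError> {
    // `query` is now excluded from the span's recorded fields, so
    // user-typed text never lands in the logs.
    // ...
}
```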
@@ -0,0 +1,75 @@
//! Loggers for the request/response cycle.
Wow, looks like it's not trivial to hook up with tracing_actix_web, unlike what I initially thought. Super cool that you got it working!
The basic integration was very simple; I just needed to add an extra middleware. The complication came from wanting to customize the context that was reported, and to get back the per-request events that we had before. That is what most of the code is doing.
This looks great! I left one comment on skipping for your consideration.
Thanks!
That's a fair point, we definitely value the observability of our services. I've rephrased my proposal to drop the "nice-to-have" bit. Let's closely track the progress of that integration.
I have a question for you all (@jrconlin @pjenvey @mythmon): are we going to rely completely on Sentry for system monitoring and error/panic logging? To my understanding, Sentry is a great choice for the latter, but I've never used it for system monitoring, such as recording request counters or measuring other system metrics/events. As we'd like extensive real-time coverage of system monitoring for both Contile and Merino, if Sentry is capable of both, that would be pretty awesome.
I'm going to say "it depends". For metric monitoring, we should use the known metric monitoring stuff we have (that goes to Grafana and gets picked up by various tooling). The Sentry bits are more about indicating error tendencies as well as action resolutions. There is also some external tooling that allows Ops to monitor those and flag on spikes. We've found it very useful for spotting and managing issues with other services, and it reduces the number of tools that we have to check.

I know that there is a log capture system that uses GCP and sends logging data through, but as I understand it, we need to be very mindful of not producing a lot of data for that, because the costs can grow very high very quickly. (We hit an event where we had verbose logs with Push and accumulated a shocking bill after only a few days, so we're a bit gun-shy about what data we log in production.)
I think JR and I are in agreement here. Having data sent from Merino to Sentry to monitor and debug errors is very useful, and we shouldn't consider it optional. On the other hand, the Sentry PR to add first-party support isn't strictly needed; we can send our own Sentry alerts if needed.

Having some amount of integration between Tracing and Sentry is good because it allows Sentry's view of errors to include richer context and makes things easier to debug. Of course we'll need to be careful about the volume we send, as JR points out. Sending all of our observational metrics to Sentry would probably get very expensive, very fast. We have better tools for that.

I'm going to fix up the remaining review issues, and then squash and merge this. I don't want it growing too much more.
* Logs HTTP response data
* Includes HTTP request information in log context
* Can assert log lines during tests

Note, this is not compatible with MozLog yet.

ref #15
Includes functionality to assert log lines during tests.
Note, this is not compatible with MozLog yet. There is some more work that should be done to produce nice logs out of this, but this seems like a sizable enough chunk that I should get it reviewed before going forward.
ref #15