
feat: Add the Tracing logging system #28

Merged
merged 1 commit into main from logging on May 13, 2021

Conversation

mythmon
Contributor

@mythmon mythmon commented May 7, 2021

Includes functionality to assert log lines during tests.

Note, this is not compatible with MozLog yet. There is some more work that should be done to produce nice logs out of this, but this seems like a sizable enough chunk that I should get it reviewed before going forward.

ref #15
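The PR ships its own helpers for asserting on log lines in tests. As a rough illustration of the idea only (this sketch uses the third-party tracing-test crate, not the helpers added in this PR):

```rust
use tracing_test::traced_test;

// `#[traced_test]` installs a capturing subscriber for the duration of the
// test and brings a `logs_contain` helper into scope.
#[traced_test]
#[test]
fn emits_a_sync_log_line() {
    tracing::info!("Syncing quicksuggest records from Remote Settings");
    assert!(logs_contain("Syncing quicksuggest records"));
}
```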

@@ -42,21 +44,33 @@ async fn suggest(
suggesters: Data<SuggesterSet>,
settings: Data<Settings>,
) -> Result<HttpResponse, HandlerError> {
let request_id = Uuid::new_v4();
Contributor

@ncloudioj ncloudioj May 10, 2021

Have you considered using tracing-actix-web, which comes with the concept of a request ID?

Contributor Author

That crate was on my list, but I haven't looked at it yet. It seems useful. This was some of the remaining work I alluded to in the initial comment.

Contributor Author

It took a bit of bending to get it into a format that will eventually work for MozLog, but this is integrated now, and it seems to be an improvement.
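For reference, the basic wiring looks roughly like this (a minimal sketch assuming a recent tracing-actix-web; the setup is illustrative, not the exact code in this PR):

```rust
use actix_web::{App, HttpServer};
use tracing_actix_web::TracingLogger;

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Emit structured logs for each event.
    tracing_subscriber::fmt::init();

    HttpServer::new(|| {
        App::new()
            // Creates one root span per request, carrying request metadata
            // (and, in recent versions of the crate, a generated request id).
            .wrap(TracingLogger::default())
    })
    .bind(("127.0.0.1", 8080))?
    .run()
    .await
}
```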

@@ -27,6 +27,7 @@ impl RemoteSettingsSuggester {
///
/// This must be called at least once before any suggestions will be provided
pub async fn sync(&mut self, settings: &Settings) -> Result<()> {
tracing::info!("Syncing quicksuggest records from Remote Settings");
Contributor

Looks like we have multiple logging calls in this function; shall we add a #[tracing::instrument] span to it?
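For illustration, the span could look roughly like this (a sketch with a placeholder Settings type standing in for the real one, not the actual Merino code):

```rust
use anyhow::Result;

/// Placeholder for the real settings type, just to keep the sketch self-contained.
struct Settings;

/// Stand-in for `RemoteSettingsSuggester::sync`: the attribute wraps the whole
/// call in a `sync` span, so every event emitted inside shares that context.
#[tracing::instrument(skip(settings))]
async fn sync(settings: &Settings) -> Result<()> {
    let _ = settings;
    tracing::info!("Syncing quicksuggest records from Remote Settings");
    tracing::debug!("stored updated suggestions");
    Ok(())
}
```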

@ncloudioj
Contributor

Looks like we have a divergence in logging between Contile (slog) and Merino (tracing). To reduce the complexity of development and maintenance, let's see if we can unify them by choosing one of the two.

I am inclined to pick tracing over slog for Contextual Services, mainly for its affinity to the Tokio ecosystem. My biggest uncertainty with it is its MozLog support; we might need to write our own tracing subscriber for that.

I believe slog predates tracing and has been adopted by many existing CS projects. @pjenvey @jrconlin Do you think it's worth the switch for Contile, since it doesn't carry any legacy burden yet? (Note: we don't need to do this for the current milestone.)

@mythmon
Contributor Author

mythmon commented May 11, 2021

Looks like we have a divergence in logging between Contile (slog) and Merino (tracing). To reduce the complexity of development and maintenance, let's see if we can unify them by choosing one of the two.

I don't have a strong preference either way about what Contile uses. I'll write up an ADR in this PR to explain why I chose tracing here.

@mythmon mythmon self-assigned this May 11, 2021
@jrconlin
Member

jrconlin commented May 11, 2021

Honestly, so long as the following criteria are met, I'm fine with using whatever:

  • integrates with Sentry for log reporting
  • provides ops with MozLog formatted output to STDERR
  • provides a "human friendly" output format for local debugging (which I believe tracing does already)

@pjenvey
Member

pjenvey commented May 11, 2021

Honestly, so long as the following criteria are met, I'm fine with using whatever:

- [ ]  integrates with Sentry for log reporting

This might be nice to have, but it isn't strictly necessary, since we can always call into the Sentry client directly.

@ncloudioj
Contributor

ncloudioj commented May 12, 2021

@mythmon Thanks for preparing this detailed ADR!

Given the feedback (thanks JR and Phil!), I'd propose that we:

  • Continue exploring the tracing logging for Merino
    • Add MozLog support (P1)
    • See if we can extract the common parts (boilerplate & utilities) into a reusable logging crate
    • Add Sentry integration (P2)
  • Keep Contile on slog logging for now
    • It will switch over to the tracing logging once the approach stabilizes in Merino

Thoughts?

@mythmon
Contributor Author

mythmon commented May 12, 2021

Here is a preview of the things I added. Notably, this is focused on showing that the logs are configurable. The exact set of logs, and the default levels for different libraries, are not set in stone (obviously).

Docs

(screenshot)

Default logging

(screenshot)

Logging can be customized

With RUST_LOG=viaduct=WARN,remote_settings_client=WARN

(screenshot)
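For context, RUST_LOG-based filtering like the example above is typically wired up through tracing-subscriber's EnvFilter; a minimal sketch, not necessarily the exact setup in this PR:

```rust
use tracing_subscriber::EnvFilter;

fn main() {
    // Honor RUST_LOG (e.g. RUST_LOG=viaduct=WARN,remote_settings_client=WARN),
    // falling back to INFO for everything when the variable is unset.
    tracing_subscriber::fmt()
        .with_env_filter(
            EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info")),
        )
        .init();

    tracing::info!("logging initialized");
}
```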

@jrconlin
Member

jrconlin commented May 12, 2021

I don't think I'd qualify sentry support as a "nice to have", since it's kind of how we monitor and track issues associated with the service. If this is using a different mechanism, that's fine, so long as Ops, WebSec and others are aware of it and are using it. I don't think we need to do the same "middleware" element that we use in other projects, since that can be hairy, but yeah, we do need to make sure that trackable, actionable errors are logged to sentry in some consistent manner. If we call directly, that's fine.

@@ -37,6 +38,7 @@ async fn setup_suggesters(settings: &Settings) -> Result<Vec<Box<dyn Suggester>>

/// Suggest content in response to the queried text.
#[get("")]
#[tracing::instrument(skip(settings, suggesters))]
Contributor

@ncloudioj ncloudioj May 13, 2021

I think we should also skip query here, unless we find a use case for it and legal approves doing so.

Contributor Author

Good call. I skipped the other two because they were noisy. We should write down a policy about what we can/can't log.

@@ -0,0 +1,75 @@
//! Loggers for the request/response cycle.
Contributor

Wow, it looks like hooking things up with tracing_actix_web is not as trivial as I initially thought. Super cool that you got it worked out!

Contributor Author

The basic integration was very simple; I just needed to add an extra middleware. The complication came from wanting to customize the context that was reported and to get back the per-request events that we had before. That is what most of the code is doing.
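To give a flavor of what that involves, here is a hand-rolled sketch of the general idea using actix-web's wrap_fn and plain tracing (an illustration only, not the middleware added in this PR):

```rust
use actix_web::{dev::Service as _, web, App, HttpResponse, HttpServer};
use tracing::Instrument;

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    tracing_subscriber::fmt::init();

    HttpServer::new(|| {
        App::new()
            .wrap_fn(|req, srv| {
                // Root span carrying the request context; the fields are illustrative.
                let span = tracing::info_span!(
                    "request",
                    method = %req.method(),
                    path = %req.path(),
                );
                let fut = srv.call(req);
                async move {
                    let res = fut.await?;
                    // Per-request event recording the response status.
                    tracing::info!(status = res.status().as_u16(), "request handled");
                    Ok(res)
                }
                .instrument(span)
            })
            .route("/", web::get().to(|| async { HttpResponse::Ok().finish() }))
    })
    .bind(("127.0.0.1", 8080))?
    .run()
    .await
}
```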

Contributor

@ncloudioj ncloudioj left a comment

This looks great! I left one comment on skipping for your consideration.

Thanks!

@ncloudioj
Contributor

I don't think I'd qualify sentry support as a "nice to have", since it's kind of how we monitor and track issues associated with the service.

That's a fair point; we definitely value the observability of our services. I've rephrased my proposal to drop the "nice-to-have" bit. Let's closely track the progress of that integration.

If this is using a different mechanism, that's fine, so long as Ops, WebSec and others are aware of it and are using it. I don't think we need to do the same "middleware" element that we use in other projects, since that can be hairy, but yeah, we do need to make sure that trackable, actionable errors are logged to sentry in some consistent manner. If we call directly, that's fine.

I have a question for you all (@jrconlin @pjenvey @mythmon): are we going to rely completely on Sentry for system monitoring and error/panic logging? To my understanding, Sentry is a great choice for the latter, but I've never used it for system monitoring before, such as recording request counters or measuring other system metrics/events. Since we'd like extensive real-time coverage of system monitoring for both Contile and Merino, it would be pretty awesome if Sentry could handle both.

@jrconlin
Member

I'm going to say "it depends".

For metric monitoring, I'm going to say we should use the known metric monitoring stuff we have (which goes to Grafana and gets picked up by various tooling).

The Sentry bits are more about indicating error tendencies as well as action resolutions. There is also some external tooling that allows ops to monitor those and flag spikes. We've found it very useful for spotting and managing issues with other services, and it reduces the number of tools that we have to check.

I know that there is a log capture system that uses GCP and sends logging data through, but as I understand it, we need to be very mindful of not producing a lot of data for that because the costs can grow very high very quickly. (We hit an event where we had verbose logs with Push and accumulated a shocking bill after only a few days, so we're a bit gun-shy about what data we log in production.)

@mythmon
Contributor Author

mythmon commented May 13, 2021

I think JR and I are in agreement here. Having data sent from Merino to Sentry to monitor and debug errors is very useful, and we shouldn't consider it optional. On the other hand, the Sentry PR to add first-party support isn't strictly needed; we can send our own Sentry alerts if necessary.

Having some amount of integration between Tracing and Sentry is good because it allows Sentry's view of errors to include richer context and makes things easier to debug. Of course, we'll need to be careful about the volume we send, as JR points out.
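One possible shape for that integration, assuming the sentry and sentry-tracing crates (this PR does not include it):

```rust
use tracing_subscriber::prelude::*;

fn main() {
    // No DSN is set here, so this runs as a no-op locally; a real deployment
    // would configure one through ClientOptions or the environment.
    let _guard = sentry::init(sentry::ClientOptions {
        release: sentry::release_name!(),
        ..Default::default()
    });

    // Forward tracing events to Sentry alongside the normal formatted logs.
    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer())
        .with(sentry_tracing::layer())
        .init();

    tracing::error!("this event would show up in Sentry with its span context");
}
```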

Sending all of our observational metrics to Sentry would probably get very expensive, very fast. We have better tools for that.

I'm going to fix up the remaining review issues, and then squash and merge this. I don't want it growing too much more.

* Logs HTTP response data
* Includes HTTP request information in log context
* Can assert log lines during tests

Note, this is not compatible with MozLog yet.

ref #15
@mythmon mythmon enabled auto-merge May 13, 2021 17:01
@mythmon mythmon merged commit 1e494e9 into main May 13, 2021
@mythmon mythmon deleted the logging branch May 13, 2021 17:19