
feat: Add the Tracing logging system #28

Merged
merged 1 commit into main from logging on May 13, 2021

Conversation

mythmon
Contributor

@mythmon mythmon commented May 7, 2021

Includes functionality to assert log lines during tests.

Note, this is not compatible with MozLog yet. There is some more work that should be done to produce nice logs out of this, but this seems like a sizable enough chunk that I should get it reviewed before going forward.

ref #15
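The PR ships its own helpers for asserting on log lines in tests. As a rough illustration of the idea only (this sketch uses the third-party tracing-test crate, not the helpers added in this PR):

```rust
use tracing_test::traced_test;

// `#[traced_test]` installs a capturing subscriber for the duration of the
// test and brings a `logs_contain` helper into scope.
#[traced_test]
#[test]
fn emits_a_sync_log_line() {
    tracing::info!("Syncing quicksuggest records from Remote Settings");
    assert!(logs_contain("Syncing quicksuggest records"));
}
```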

@@ -42,21 +44,33 @@ async fn suggest(
suggesters: Data<SuggesterSet>,
settings: Data<Settings>,
) -> Result<HttpResponse, HandlerError> {
let request_id = Uuid::new_v4();
Contributor

@ncloudioj ncloudioj May 10, 2021

Have you considered using tracing-actix-web, which comes with the concept of a request ID?

Contributor Author

That crate was on my list, but I haven't looked at it yet. It seems useful. This was some of the remaining work I alluded to in the initial comment.

Contributor Author

It took a bit of bending to get it into a format that will eventually work for MozLog, but this is integrated now, and it seems to be an improvement.
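For reference, the basic wiring looks roughly like this (a minimal sketch assuming a recent tracing-actix-web; the setup is illustrative, not the exact code in this PR):

```rust
use actix_web::{App, HttpServer};
use tracing_actix_web::TracingLogger;

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Emit structured logs for each event.
    tracing_subscriber::fmt::init();

    HttpServer::new(|| {
        App::new()
            // Creates one root span per request, carrying request metadata
            // (and, in recent versions of the crate, a generated request id).
            .wrap(TracingLogger::default())
    })
    .bind(("127.0.0.1", 8080))?
    .run()
    .await
}
```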

@@ -27,6 +27,7 @@ impl RemoteSettingsSuggester {
///
/// This must be called at least once before any suggestions will be provided
pub async fn sync(&mut self, settings: &Settings) -> Result<()> {
tracing::info!("Syncing quicksuggest records from Remote Settings");
Contributor

Looks like we have multiple logging calls in this function; shall we add a #[tracing::instrument] span to it?
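For illustration, the span could look roughly like this (a sketch with a placeholder Settings type standing in for the real one, not the actual Merino code):

```rust
use anyhow::Result;

/// Placeholder for the real settings type, just to keep the sketch self-contained.
struct Settings;

/// Stand-in for `RemoteSettingsSuggester::sync`: the attribute wraps the whole
/// call in a `sync` span, so every event emitted inside shares that context.
#[tracing::instrument(skip(settings))]
async fn sync(settings: &Settings) -> Result<()> {
    let _ = settings;
    tracing::info!("Syncing quicksuggest records from Remote Settings");
    tracing::debug!("stored updated suggestions");
    Ok(())
}
```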

@ncloudioj
Contributor

Looks like we have a divergence in logging between Contile (slog) and Merino (tracing). To reduce the complexity of development and maintenance, let's see if we can unify them by choosing one of the two.

I am inclined to pick tracing over slog for Contextual Services, mainly for its affinity to the Tokio ecosystem. My biggest uncertainty with it is its MozLog support; we might need to write our own tracing subscriber for that.

I believe slog predates tracing and has been adopted by many existing CS projects. @pjenvey @jrconlin Do you think it's worth the switch for Contile, since it doesn't carry any legacy burden yet? (Note: we don't need to do this for the current milestone.)

@mythmon
Contributor Author

mythmon commented May 11, 2021

Looks like we have a divergence in logging between Contile (slog) and Merino (tracing). To reduce the complexity of development and maintenance, let's see if we can unify them by choosing one of the two.

I don't have a strong preference either way about what Contile uses. I'll write up an ADR in this PR to explain why I chose tracing here.

@mythmon mythmon self-assigned this May 11, 2021
@jrconlin
Member

jrconlin commented May 11, 2021

Honestly, so long as the following criteria are met, I'm fine with using whatever:

  • integrates with Sentry for log reporting
  • provides ops with MozLog formatted output to STDERR
  • provides a "human friendly" output format for local debugging (which I believe tracing does already)

@pjenvey
Member

pjenvey commented May 11, 2021

Honestly, so long as the following criteria are met, I'm fine with using whatever:

- [ ]  integrates with Sentry for log reporting

This might be nice to have, but it isn't strictly necessary, since we can always call into the Sentry client directly.

@ncloudioj
Contributor

ncloudioj commented May 12, 2021

@mythmon Thanks for preparing this detailed ADR!

Given the feedback (thanks JR and Phil!), I'd propose that we:

  • Continue exploring the tracing logging for Merino
    • Add MozLog support (P1)
    • See if we can extract the common parts (boilerplate & utilities) into a reusable logging crate
    • Add Sentry integration (P2)
  • Keep Contile on slog logging for now
    • It will switch over to the tracing logging once the approach stabilizes in Merino

Thoughts?

@mythmon
Contributor Author

mythmon commented May 12, 2021

Here is a preview of the things I added. Notably, this is focused on showing that the logs are configurable. The exact set of logs, and the default levels for different libraries, are not set in stone (obviously).

Docs

(screenshot)

Default logging

(screenshot)

Logging can be customized

With RUST_LOG=viaduct=WARN,remote_settings_client=WARN

(screenshot)
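For context, RUST_LOG-based filtering like the example above is typically wired up through tracing-subscriber's EnvFilter; a minimal sketch, not necessarily the exact setup in this PR:

```rust
use tracing_subscriber::EnvFilter;

fn main() {
    // Honor RUST_LOG (e.g. RUST_LOG=viaduct=WARN,remote_settings_client=WARN),
    // falling back to INFO for everything when the variable is unset.
    tracing_subscriber::fmt()
        .with_env_filter(
            EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info")),
        )
        .init();

    tracing::info!("logging initialized");
}
```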

@jrconlin
Member

jrconlin commented May 12, 2021

I don't think I'd qualify sentry support as a "nice to have", since it's kind of how we monitor and track issues associated with the service. If this is using a different mechanism, that's fine, so long as Ops, WebSec and others are aware of it and are using it. I don't think we need to do the same "middleware" element that we use in other projects, since that can be hairy, but yeah, we do need to make sure that trackable, actionable errors are logged to sentry in some consistent manner. If we call directly, that's fine.

@@ -37,6 +38,7 @@ async fn setup_suggesters(settings: &Settings) -> Result<Vec<Box<dyn Suggester>>

/// Suggest content in response to the queried text.
#[get("")]
#[tracing::instrument(skip(settings, suggesters))]
Contributor

@ncloudioj ncloudioj May 13, 2021

I think we should also skip query here, unless we find a use case for it and legal approves doing so.

Contributor Author

Good call. I skipped the other two because they were noisy. We should write down a policy about what we can/can't log.

@@ -0,0 +1,75 @@
//! Loggers for the request/response cycle.
Contributor

Wow, it looks like hooking things up with tracing_actix_web is not as trivial as I initially thought. Super cool that you got it worked out!

Contributor Author

The basic integration was very simple; I just needed to add an extra middleware. The complication came from wanting to customize the context that was reported and to get back the per-request events that we had before. That is what most of the code is doing.
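To give a flavor of what that involves, here is a hand-rolled sketch of the general idea using actix-web's wrap_fn and plain tracing (an illustration only, not the middleware added in this PR):

```rust
use actix_web::{dev::Service as _, web, App, HttpResponse, HttpServer};
use tracing::Instrument;

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    tracing_subscriber::fmt::init();

    HttpServer::new(|| {
        App::new()
            .wrap_fn(|req, srv| {
                // Root span carrying the request context; the fields are illustrative.
                let span = tracing::info_span!(
                    "request",
                    method = %req.method(),
                    path = %req.path(),
                );
                let fut = srv.call(req);
                async move {
                    let res = fut.await?;
                    // Per-request event recording the response status.
                    tracing::info!(status = res.status().as_u16(), "request handled");
                    Ok(res)
                }
                .instrument(span)
            })
            .route("/", web::get().to(|| async { HttpResponse::Ok().finish() }))
    })
    .bind(("127.0.0.1", 8080))?
    .run()
    .await
}
```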

Contributor

@ncloudioj ncloudioj left a comment

This looks great! I left one comment on skipping for your consideration.

Thanks!

@ncloudioj
Contributor

I don't think I'd qualify sentry support as a "nice to have", since it's kind of how we monitor and track issues associated with the service.

That's a fair point; we definitely value the observability of our services. I've rephrased my proposal to drop the "nice-to-have" bit. Let's closely track the progress of that integration.

If this is using a different mechanism, that's fine, so long as Ops, WebSec and others are aware of it and are using it. I don't think we need to do the same "middleware" element that we use in other projects, since that can be hairy, but yeah, we do need to make sure that trackable, actionable errors are logged to sentry in some consistent manner. If we call directly, that's fine.

I have a question for you all (@jrconlin @pjenvey @mythmon): are we going to rely completely on Sentry for system monitoring and error/panic logging? To my understanding, Sentry is a great choice for the latter, but I've never used it for system monitoring before, such as recording request counters or measuring other system metrics/events. Since we'd like extensive real-time coverage of system monitoring for both Contile and Merino, it would be pretty awesome if Sentry could handle both.

@jrconlin
Member

I'm going to say "it depends".

For metric monitoring, I'm going to say we should use the known metric monitoring stuff we have (which goes to Grafana and gets picked up by various tooling).

The Sentry bits are more about indicating error tendencies as well as action resolutions. There is also some external tooling that allows ops to monitor those and flag spikes. We've found it very useful for spotting and managing issues with other services, and it reduces the number of tools that we have to check.

I know that there is a log capture system that uses GCP and sends logging data through, but as I understand it, we need to be very mindful of not producing a lot of data for that because the costs can grow very high very quickly. (We hit an event where we had verbose logs with Push and accumulated a shocking bill after only a few days, so we're a bit gun-shy about what data we log in production.)

@mythmon
Contributor Author

mythmon commented May 13, 2021

I think JR and I are in agreement here. Having data sent from Merino to Sentry to monitor and debug errors is very useful, and we shouldn't consider it optional. On the other hand, the Sentry PR to add first-party support isn't strictly needed; we can send our own Sentry alerts if necessary.

Having some amount of integration between Tracing and Sentry is good because it allows Sentry's view of errors to include richer context and makes things easier to debug. Of course, we'll need to be careful about the volume we send, as JR points out.
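One possible shape for that integration, assuming the sentry and sentry-tracing crates (this PR does not include it):

```rust
use tracing_subscriber::prelude::*;

fn main() {
    // No DSN is set here, so this runs as a no-op locally; a real deployment
    // would configure one through ClientOptions or the environment.
    let _guard = sentry::init(sentry::ClientOptions {
        release: sentry::release_name!(),
        ..Default::default()
    });

    // Forward tracing events to Sentry alongside the normal formatted logs.
    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer())
        .with(sentry_tracing::layer())
        .init();

    tracing::error!("this event would show up in Sentry with its span context");
}
```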

Sending all of our observational metrics to Sentry would probably get very expensive, very fast. We have better tools for that.

I'm going to fix up the remaining review issues, and then squash and merge this. I don't want it growing too much more.

* Logs HTTP response data
* Includes HTTP request information in log context
* Can assert log lines during tests

Note, this is not compatible with MozLog yet.

ref #15
@mythmon mythmon enabled auto-merge May 13, 2021 17:01
@mythmon mythmon merged commit 1e494e9 into main May 13, 2021
@mythmon mythmon deleted the logging branch May 13, 2021 17:19