Skip to content

Replace NewRelic browser instrumentation with custom error handler#8950

Merged
aduth merged 27 commits intomainfrom
aduth-try-custom-error-logger
Sep 14, 2023
Merged

Replace NewRelic browser instrumentation with custom error handler#8950
aduth merged 27 commits intomainfrom
aduth-try-custom-error-logger

Conversation

@aduth
Copy link
Contributor

@aduth aduth commented Aug 7, 2023

🛠 Summary of changes

Implements a frontend error logger, replacing the NewRelic browser instrumentation.

Why?

  • Removes the large NewRelic loader snippet from every page
    • The snippet exceeds 50% of the markup size of the Sign In page (15kb of 27kb, before compression)
    • The snippet is out-of-date, in part because the latest version of the snippet is even larger in size
    • The snippet is rendered inline and is therefore not cacheable
    • The snippet is render-blocking
    • The practical impact here is a reduction of 98.2%, from 6.84kb of uncacheable gzip-compressed JavaScript, to 124 bytes of cacheable brotli-compressed JavaScript.† Edit (after revisions): 99.0%, from 6.84kb to 67 bytes gzipped
  • Consolidates logging
  • Reduce number of third-party JavaScript scripts present in the page, to reduce vulnerability surface area
  • Allows for greater control over how errors are logged, e.g. excluding errors occurring from browser extensions
  • We don't use most of the frontend NewRelic features, and are largely interested in uncaught errors

📜 Testing Plan

  1. Go to http://localhost:3000
  2. tail -n1 log/events.log
  3. See "Frontend error" log result with name, message, and stack properties

@zachmargolis
Copy link
Contributor

  • We don't use most of the frontend NewRelic features, and are largely interested in uncaught errors

I'll admit I don't look at it often, but I do like knowing we have access to some of the page load times, as reported by the browser. I worry about the number of logs we'd create if we tried to log that ourselves, but some of that could be nice (loading time, time to first paint, etc etc)

@aduth
Copy link
Contributor Author

aduth commented Aug 8, 2023

  • We don't use most of the frontend NewRelic features, and are largely interested in uncaught errors

I'll admit I don't look at it often, but I do like knowing we have access to some of the page load times, as reported by the browser. I worry about the number of logs we'd create if we tried to log that ourselves, but some of that could be nice (loading time, time to first paint, etc etc)

It's a paradox where NewRelic's tooling to diagnose slow page loads would inevitably lead us to the conclusion that NewRelic's own tooling contributes most significantly to it? 😄

For this, I wasn't sure how much value we'd get out of this data vs. something we could debug or simulate locally, since I wouldn't really expect a lot of variability between usage in the wild. Alternatively, I'd wondered if the synthetic monitors would give us similar data in NewRelic, but act externally without needing to be part of each user's request.

@zachmargolis
Copy link
Contributor

For this, I wasn't sure how much value we'd get out of this data vs. something we could debug or simulate locally, since I wouldn't really expect a lot of variability between usage in the wild. Alternatively, I'd wondered if the synthetic monitors would give us similar data in NewRelic, but act externally without needing to be part of each user's request.

My understanding is that people's internet connections (which vary a lot) would definitely contribute to things like asset loading, etc etc, so we'd get much more value of the overall sense (and the long tail) by measuring real user usage. Maybe we could randomly sample 1/1000 pageviews and send that data so we get something but not an avalanche of everything

@aduth
Copy link
Contributor Author

aduth commented Aug 8, 2023

My understanding is that people's internet connections (which vary a lot) would definitely contribute to things like asset loading, etc etc, so we'd get much more value of the overall sense (and the long tail) by measuring real user usage. Maybe we could randomly sample 1/1000 pageviews and send that data so we get something but not an avalanche of everything

I'm sure it's not 100% accurate, but we can simulate network conditions and even CPU conditions with Chrome's tooling:

Looks like it's even built-in to the new "Performance insights" tab:

image

@aduth aduth force-pushed the aduth-try-custom-error-logger branch from 39cc667 to 547c42b Compare August 29, 2023 19:22
Comment on lines +41 to +45
if error_event?
NewRelic::Agent.notice_error(FrontendError.new, custom_params: log_params[:payload].to_h)
else
frontend_logger.track_event(log_params[:event], log_params[:payload].to_h)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we brought back the "symbol or proc" logic we used to have before #7110 in EVENT_MAP and then supply a proc for this event?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we brought back the "symbol or proc" logic we used to have before #7110 in EVENT_MAP and then supply a proc for this event?

Ooh, that's a very interesting idea! Let me give that a shot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had fun and gave it a shot here #9117

interface NewRelicGlobals {
newrelic?: NewRelicAgent;
}
export { default as isTrackableErrorEvent } from './is-trackable-error-event';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up task: Would like to move trackEvent and trackError to dedicated files track-event.ts and track-error.ts so this index.ts can actually be an index.

false,
) %>
<%= javascript_packs_tag_once('application', prepend: true) %>
<%= javascript_packs_tag_once('track-errors') if BrowserSupport.supported?(request.user_agent) %>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Can we defer this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Can we defer this?

I think we can and should make this <script async defer>, with the benefit being that the page can finish being considered "loaded" without being blocked by the tracker.

I still need to get some clarification about async script load order when combined with non-async scripts (related Slack discussion), and this'll require some revisions to ScriptHelper / AssetSources to make sure the extra attributes are carried across to the rendered script tag.

I'll plan to explore this as a follow-on task.

@aduth
Copy link
Contributor Author

aduth commented Sep 5, 2023

I think this is in mostly good shape as-is. One idea I'd been contemplating is checking to see if it's feasible enough to directly call to NewRelic's browser error logging endpoints, i.e. treat this as a tiny drop-in replacement for their existing browser agent for the error logging in particular. That's quite a departure from the initial direction and not sure it's worth pursuing, but figured I should at least check.

@zachmargolis
Copy link
Contributor

One idea I'd been contemplating is checking to see if it's feasible enough to directly call to NewRelic's browser error logging endpoints, i.e. treat this as a tiny drop-in replacement for their existing browser agent for the error logging in particular.

I think that would be a great follow-up PR!

Honestly I think it would be worth considering having this log to events.log only first, just to understand how many of these we'd be seeing/sending, before sending to NewRelic

Or maybe set expected: true (see api docs) so that we don't immediately affect monitoring, etc

@aduth
Copy link
Contributor Author

aduth commented Sep 13, 2023

I investigated how JavaScript error logging is currently handled in the NewRelic client, and it does seem pretty straight forward as far as POST-ing to a URL with the application ID and an embedded payload of error details, but it might take some sleuthing through the internals of their browser agent code to fully understand what all of the parameters mean.

I think to minimize the changes necessary to get this merged, I'll go ahead and add the expected: true argument so that we will log this to NewRelic APM, but avoid any unnecessary alerting until we get a handle on how expected these truly are.

@mitchellhenke mitchellhenke marked this pull request as ready for review September 13, 2023 17:12
@mitchellhenke mitchellhenke marked this pull request as draft September 13, 2023 17:12
zachmargolis and others added 9 commits September 14, 2023 10:30
… its thing

- bind_call calls a specific method implementation, so it doesn't go through
  the Enhancer implementation
The method signature is lost if read directly from analytics instance, since it's overridden by AnalyticsEventsEnhancer. We need the original method reference
@aduth aduth force-pushed the aduth-try-custom-error-logger branch from 62f9721 to c9d605c Compare September 14, 2023 14:36
@aduth aduth marked this pull request as ready for review September 14, 2023 14:41
@aduth
Copy link
Contributor Author

aduth commented Sep 14, 2023

I tested this using @mitchellhenke 's personal environment and verified that the errors are being logged as expected errors to NewRelic with the custom parameters https://onenr.io/08wogAPAbwx

Copy link
Contributor

@zachmargolis zachmargolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I tested this using @mitchellhenke 's personal environment and verified that the errors are being logged as expected errors to NewRelic with the custom parameters https://onenr.io/08wogAPAbwx

This didn't quite match my expecations:

  • name and message are the same?
  • the thing I expected to be message was inside stack?

    Error: Example error at https://..../packs/js/password_toggle_component-fba98070.digested.js:1:360

Either way, I think it's good enough for now, and we can always iterate and improve later

@aduth
Copy link
Contributor Author

aduth commented Sep 14, 2023

This didn't quite match my expecations:

That is strange! What I'm seeing sent from the browser matches my expectations, so not sure where it's getting lost along the way. Maybe NewRelic is doing its own thing with those parameters, since I could see message being pretty common.

Example payload I see in testing:

message: "Example error"
name: "Error"
stack: "Error: Example error\n    at https://[...]/packs/js/track-errors-02b3022a.digested.js:1:456"

Easier to see grouped in alphabetical order, avoid conflicting names
@aduth
Copy link
Contributor Author

aduth commented Sep 14, 2023

I think it's definitely a naming conflict thing. Also, namespacing helps make it more easy to see the properties together anyways, so this should be improved now after 4bdf111 . See https://onenr.io/0Owvg5MzaRv

image

@aduth aduth merged commit a7ed678 into main Sep 14, 2023
@aduth aduth deleted the aduth-try-custom-error-logger branch September 14, 2023 20:05
@aduth aduth changed the title Try replacing NewRelic browser instrumentation with custom error handler Replace NewRelic browser instrumentation with custom error handler Sep 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants