Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] applicationinsights-web is leaking memory in NextJS SSR #59

Closed
mauron85 opened this issue Oct 6, 2023 · 17 comments
Closed

[BUG] applicationinsights-web is leaking memory in NextJS SSR #59

mauron85 opened this issue Oct 6, 2023 · 17 comments
Assignees

Comments

@mauron85
Copy link

mauron85 commented Oct 6, 2023

Description/Screenshot

image

image

Steps to Reproduce

  • OS/Browser: AppService Container/ Windows 10 (Browser is N/A since leak happens in NextJS nodeJs server)
  • React Version: 18
  • SDK Version [e.g. 22]:
    @microsoft/applicationinsights-react-js: 17.0.1
    @microsoft/applicationinsights-web: 3.0.3
  • NodeJs Version: 18.17.1
  • NextJs Version: 13.5.2
  • How you initialized the SDK:
import {ApplicationInsights, ITelemetryItem} from '@microsoft/applicationinsights-web';
import {ReactPlugin} from '@microsoft/applicationinsights-react-js';

const reactPlugin = new ReactPlugin();
const appInsights = new ApplicationInsights({
 config: {
   connectionString: "instrumentationKey=test",
   extensions: [reactPlugin],
   enableAutoRouteTracking: true,
   disableAjaxTracking: false,
   autoTrackPageVisitTime: true,
   enableCorsCorrelation: true,
   enableRequestHeaderTracking: true,
   enableResponseHeaderTracking: true,
 }
});
appInsights.loadAppInsights();

appInsights.addTelemetryInitializer((env:ITelemetryItem) => {
   env.tags = env.tags || [];
   env.tags["ai.cloud.role"] = "testTag";
});

export { reactPlugin, appInsights };

Heap Snapshots for analysis in Chrome:
heapsnaps.zip

Min repo:

https://github.com/mauron85/react-appinsights

Install clinic.js and autocannon for testing:

npm install -g autocannon
npm install -g clinic

Build example app:

npm run build
cd .next/
cd .\.next\standalone\
clinic doctor -- node server.js

test load:

autocannon -c 100 -d 600 http://127.0.0.1:3000/

Autocannon will run 100 concurrent connections for 10 min.
After 10min CTRL+C to to stop server and clinicjs will generate mem usage report.

Expected behavior

Memory is not leaking

Additional context

App is constantly leaking memory on server side NextJS server. After removal of appinsights, memory consumption is flat.

image

In real world app, effect is applified greatly.

MicrosoftTeams-image

Tasks

No tasks being tracked yet.
@MSNev MSNev self-assigned this Oct 6, 2023
@MSNev MSNev added the investigating Investigating the issue label Oct 6, 2023
@MSNev
Copy link
Contributor

MSNev commented Oct 6, 2023

Ok, I've not been able to repo your issue so far, because the repo steps above don't work for the min example

npm run build
cd .next/
cd .\.next\standalone\
clinic doctor -- node server.js

But looking at the heap dumps and the pattern that is occurring, it seems to me that on the server side it is creating a "new" Application Insights for every request, which is initializing (and never unloading), so every request we keep creating new instances as there are "lots" of importModuleDynamically.

If this is true then there will be 2 side effects

  1. Excessive and growing memory usage (as reported)
  2. Every initialized instance will "hook" the fetch (which is possibly why its listed as a retained) as part of this hooking it will keep references to the current instance (that just hooked the code) so that when a fetch request occurs it can report to the relevant instance
  • This WILL also cause 1 fetch request to "report" multiple events (to every initialized instance) -- which will get worse the more instances that are initialized causing more memory and CPU load (not including the batching timers)

We support (and need to support) having multiple instances loaded and running in a single runtime (usually a browser) as each instance (is normally) for a different component (instrumentationKey) and therefore is reporting to different people.

Still assuming that my assumption (about the multiple instances being initialized) is correct you have 2 options

  1. Update your code to either
  • Use a shared single instance, if supporting multiple concurrent requests this could get challenging and the existing react wrapper probably won't work.
  • Use a "pool" of instances (assuming you want to handle multiple concurrent inbound requests on your server) and when the server has finished with the request it can return the instance back to the pool. Thus you should only ever have X instances based on how many received concurrent requests you received. There (might) be event flushing issues depending on how the runtime "handles" timers for objects where the requests are no longer active.
  1. We have an unload function, which is designed to fully unload the SDK, remove all of it hooks and flush all telemetry (freeing up memory, resources and CPU).
  • This also causes all batch events to be immediately flushed (sent) (which you may not want / need) because this will cause a new outbound network request.
  • You can initialize (loadAppInsights) and unload the same instance multiple times if you wish (this would save loading the same "module" dynamically for every request (so I would suggest a pooling system for this -- assuming the runtime doesn't handle this already). By default the unload is asynchronous, so you can't call unload and then immediately call loadAppInsights on the same instance, But you can request a synchronous unload(false) as well as either give it a callback or use await https://github.com/microsoft/ApplicationInsights-JS/blob/4c326f7cd858c37fef71904e08ccccdd4e1e08d9/AISKU/src/AISku.ts#L454-L480
  • When using synchronous unload this WILL attempt to flush all batched events synchronously (which for a browser would be either sendBeacon or a synchronous XHR request, for node (and therefore fetch (node 20+) it will try and use fetch (with the keep-alive flag), but it will need to "assume" that the request is successful (so any send errors are ignored).

Still assuming this is the root issue, I would suggest that you implement a combination of the above

  • A "pool" of instances where you "borrow (or create)" instances and "return" to the pool for each server request
  • Have a "timeout" for the "pool" that "frees up (calls unload on each instance) if the instance is stale (been in the pool and unused for some period of time)
  • You should also probably have a "limit" on the number of entries, either indirectly based on the number of inbound requests or when you "borrow" (fetch from the pool) this is asynchronous so that it can "wait" for any previous request to finish and return a previous instance to the pool before creating a new one.

Note: I'm not a Next.Js developer and I've not used it for a server side "node" implementation. The above is derived (and assumed) from my previous history with other server side environments / runtimes (mostly Java, C++ and .Net -- at massive scales)

@MSNev
Copy link
Contributor

MSNev commented Oct 6, 2023

Can you either please provide a repo for your simple example, or confirm that the above assumption is likely the issue?

@MSNev MSNev added information requested Issue is waiting on additional information from the user and removed investigating Investigating the issue labels Oct 6, 2023
@mauron85
Copy link
Author

mauron85 commented Oct 6, 2023

Thank you @MSNev for investigation. I'm will answer all your questions, but let me provide instructions for min repo:
(I just tried to clone and was working on fresh clone)

git clone https://github.com/mauron85/react-appinsights.git
cd .\react-appinsights\
npm i
npm run build
cd .\.next\standalone\
clinic doctor -- node server.js

But make sure you have clinic js installed globally.

@mauron85
Copy link
Author

mauron85 commented Oct 6, 2023

But looking at the heap dumps and the pattern that is occurring, it seems to me that on the server side it is creating a "new" Application Insights for every request, which is initializing (and never unloading), so every request we keep creating new instances as there are "lots" of importModuleDynamically.

yes, this is probably the case. I'm not nextJS internals expert. But I believe AppInsights context - https://github.com/mauron85/react-appinsights/blob/main/pages/_app.tsx#L7 is initializing for every request creating new instance of reactPlugin.

I understand your recommendations about creating shared pool or unload, however not sure if there is any request lifecycle method in NextJS I can hook into. At least not officially in their docs.

@MSNev MSNev removed the information requested Issue is waiting on additional information from the user label Oct 6, 2023
@mauron85
Copy link
Author

mauron85 commented Oct 6, 2023

I put console log into components\ApplicationInsightsService.ts to detect if NextJS is creating new instances of ApplicationInsights and ReactPlugin for every request. But it is not. It is only called once, meaning there is only single instance of both.

@MSNev
Copy link
Contributor

MSNev commented Oct 6, 2023

Hmm, I'm not sure whats going on there as this definitely "looks" like a multiple instance situation.

You could try adding disableAjaxTracking with true to stop the SDK from hooking the fetch (this will also stop reporting any and all "RemoteDependency" requests (which you may not need / want or getting anyway) as this will stop the "hooking" of the fetch API and therefore the references in the heap captures should not exist in the first place and then all "forgotten" references (should) be eligible for Garbage collection.

(If) this stops the memory from consistently increasing then that "would" suggest that it is creating / loading multiple instances....

Then "perhaps" the location with the console.log is not in the place that is getting duplicated or is the runtime doing some optimization where if the same message is logged it just "increases" a count of ignores the value??? I know in the latest browsers the console (in F12 dev tools) only lists repeated messages once but has a "tag" saying the number of repeated times that line is displayed -- Stabbing in the dark on this.

@mauron85
Copy link
Author

mauron85 commented Oct 6, 2023

The console.log is for sure not deduplicated. It's actually writes to windows cmd/terminal (since it's running in nodejs process due SSR).

I did change disableAjaxTracking: true but still memory is leaking. Will try to disable all settings one by one.

EDIT: still leaking with config, but mem profile is bit different

{
    connectionString: "instrumentationKey=test",
    extensions: [reactPlugin],
    enableAutoRouteTracking: false,
    disableAjaxTracking: true,
    autoTrackPageVisitTime: false,
    enableCorsCorrelation: false,
    enableRequestHeaderTracking: false,
    enableResponseHeaderTracking: false,
  }

image

@MSNev
Copy link
Contributor

MSNev commented Oct 9, 2023

@siyuniu-ms can you please try and to reproduce this locally

@siyuniu-ms
Copy link
Contributor

Hi, I cloned the repo locally to reproduce the problem.
I follow the steps
git clone https://github.com/mauron85/react-appinsights.git
cd .\react-appinsights\
npm i
npm run build
cd .\.next\standalone\
clinic doctor -- node server.js
Then open another cmd and run
autocannon -c 100 -d 600 http://127.0.0.1:3000/

After it finished, I press CLRT+C and get following error:
image

Could you take a look if there is any step I miss here?

@mauron85
Copy link
Author

@siyuniu-ms what node version are you using? Please try with Node 18.17.1 (or later).

Maybe you are experiencing this bug: clinicjs/node-clinic#430

@siyuniu-ms
Copy link
Contributor

@siyuniu-ms what node version are you using? Please try with Node 18.17.1 (or later).

Maybe you are experiencing this bug: clinicjs/node-clinic#430

Thanks. That helps work.

@siyuniu-ms
Copy link
Contributor

investigate ongoing......

@siyuniu-ms
Copy link
Contributor

Hi @mauron85 ! I've replicated the issue at the doctor clinic, and it seems to yield the same result as the image you provided. However, since the tool used there doesn't offer detailed information about memory usage, it's proving challenging to conduct a deeper investigation.

Could you please share the method you used to generate the screenshot (https://user-images.githubusercontent.com/359450/273190291-3d9f7a94-773f-44fe-9b68-96c017040d40.png)? Additionally, I'd greatly appreciate it if you could provide instructions on how to generate Heap Snapshots for analysis in Chrome using the testapp. I attempted to ping/call track locally after running the server, but I couldn't spot any requests using the Chrome DevTools.

@mauron85
Copy link
Author

The image was captured from app live production app on appservice monitoring site. So I assume this is not an option for you.

Regarding chrome snapshots. You can:

npm run build
node --inspect .\.next\standalone\server.js

And connect to debugging instance of nodejs from chrome by visiting chrome://inspect - and then clicking link "Open dedicated devtools for Node"

@siyuniu-ms
Copy link
Contributor

siyuniu-ms commented Jan 9, 2024

Hi @mauron85, my apologies for the delayed response. It took us a considerable amount of time to replicate the issue, but during my local testing, I didn't observe any memory leaks. Here's a summary of the steps I took:

  1. Ran the provided repository in inspect mode and exported the app-insights for external calls.
  2. Used the Chrome DevTools to perform allocation instrumentation on the timeline, specifically designed to isolate memory leaks.
  3. Generated 100 requests to the local server every second and recorded telemetry every 5 seconds.
  4. Stored the memory graph and compared.
    I conducted experiments for 1 hour, 5 hours, and 24 hours, and the memory graph consistently showed that garbage collection is functioning well.

@MSNev MSNev removed their assignment Mar 4, 2024
@MSNev MSNev added the information requested Issue is waiting on additional information from the user label Mar 4, 2024
@siyuniu-ms
Copy link
Contributor

may related to microsoft/ApplicationInsights-JS#2311

@MSNev
Copy link
Contributor

MSNev commented Jul 2, 2024

I believe that this issue is now addressed by the above fix from @microsoft/ApplicationInsights-JS

@MSNev MSNev closed this as completed Jul 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants