Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] unhandledException TypeError: segment.transaction.isActive is not a function #281

Closed
ali-essam opened this issue Jun 20, 2019 · 23 comments

Comments

@ali-essam
Copy link

newrelic causes unhandledException and crashes the process

TypeError: segment.transaction.isActive is not a function
    at Shim.getActiveSegment (/usr/src/app/node_modules/newrelic/lib/shim/shim.js:1329:61)
    at AsyncHook.afterHook (/usr/src/app/node_modules/newrelic/lib/instrumentation/core/async_hooks.js:88:31)
    at emitAfterNative (internal/async_hooks.js:164:40)
    at <anonymous>
    at process._tickDomainCallback [as _tickCallback] (internal/process/next_tick.js:229:7)
@astormnewrelic
Copy link
Contributor

@ali-essam Thanks for reporting this issue, and for the workaround PR you submitted.

Do you happen to know what/how this is being triggered? The fact that

  1. segment.transaction IS set
  2. But it's isActive property is not a function

is super confusing, and shouldn't be possible. Obviously it is possible, but we're curious as to what might be causing it. The fix you provided will prevent this error, but it also means a lot of data might be silently dropped on the floor, and we're not sure that's the right long term solution. (Although folks running into this issue are free to give your PR a try)

Thinking this through out loud ...

  1. Are you able to reproduce this problem at will, or does it just happen randomly? If so how often?
  2. Can you reproduce this issue with a small test case, or does it require your entire application to be loaded?
  3. The stack trace says segment.transaction.isActive isn't a function -- can you use a debugger (or simply temporary logging) to figure out what segment.transaction.isActive is?

We basically want to find out what's replacing the isActive function, and then come up with a fix that will allow us to continue recording data.

@ali-essam
Copy link
Author

Hi @astormnewrelic,

Thanks for the quick reply.

I totally understand your concern regarding losing data silently, this hack was just to prevent the application from crashing.

  • I could only see this issue in one of our components in production, but it's a huge one with many modules loaded, so replicating it with a smaller app might not be trivial.
  • Also it doesn't happen all the time, it randomly happens after receiving a number of requests. I'm suspecting that it happens with a specific request.

However I'll try logging in production all the info I can about the segment and the transaction (in addition to the hack to prevent crashing), and will get back to you with the results.

@ali-essam
Copy link
Author

I'm currently running newrelic in prod from this branch master...ali-essam:tmp/debug-is-active. Now we wait and see if this issue happens again

@ali-essam
Copy link
Author

So I was trying to log the segment object console.log('NEWRELICDEBUG segment', stringify(segment)), but this itself fails with this error in the overridden toJSON function in the TraceSegment with the following stacktrace.

Seems like segment.transaction.trace is undefined but the code assumes it just exists.

 TypeError: Cannot read property 'root' of undefined
    at TraceSegment.toJSON (/usr/src/app/node_modules/newrelic/lib/transaction/trace/segment.js:416:77)
    at JSON.stringify (<anonymous>)
    at stringify (/usr/src/app/node_modules/json-stringify-safe/stringify.js:5:15)
    at Shim.getActiveSegment (/usr/src/app/node_modules/newrelic/lib/shim/shim.js:1340:44)
    at AsyncHook.afterHook (/usr/src/app/node_modules/newrelic/lib/instrumentation/core/async_hooks.js:88:31)
    at emitAfterNative (internal/async_hooks.js:164:40)
    at <anonymous>
    at process._tickDomainCallback [as _tickCallback] (internal/process/next_tick.js:229:7)

@ali-essam
Copy link
Author

ali-essam commented Jun 25, 2019

Okay, so that's what I got from the logs

NEWRELICDEBUG segment.transaction "[Circular]"
NEWRELICDEBUG segment.transaction.isActive undefined
NEWRELICDEBUG typeof segment.transaction.isActive "undefined"

However, I also noticed a crash with the following stacktrace. Seems like this.tracer is undefined too

TypeError: Cannot read property 'segment' of undefined
    at WebFrameworkShim.applySegment (/usr/src/app/node_modules/newrelic/lib/shim/shim.js:1431:28)
    at wrapper (/usr/src/app/node_modules/newrelic/lib/shim/shim.js:2084:17)
    at /usr/src/app/lib/express_async.js:9:7
    at <anonymous>
    at process._tickDomainCallback [as _tickCallback] (internal/process/next_tick.js:229:7)

I will keep monitoring to see the rate.

@astormnewrelic Is there any piece of info I might log to help us debug? Since I'm unable to log the whole segment

@ali-essam
Copy link
Author

Here are some findings, let me know if you have some thoughts

Screenshot from 2019-06-25 13-48-20
Note: The numbers are multiplied by 3 because the log appears 3 times for each occurance.

This seems to happen in bursts over the course of the last 1 hour. Logs from each burst have the same exact timestamp. After each burst of logs, I see a log of the crash mentioned before (TypeError: Cannot read property 'segment' of undefined)

@astormnewrelic
Copy link
Contributor

@ali-essam Interesting -- like most debugging it leads to even more questions :)

Can you tie the bursting to particular routes (or code paths if this isn't a web application/service) being hit? If so, are those routes doing anything fancy/clever that your other routes are not doing?

Talking out loud, it looks like somehow the segment's current transaction ends up being being assigned as a circular reference.

segment.transaction = segment

Obviously a bug, but the trick will be figuring out what code paths trigger it so we can reproduce the behavior and figure out which assignment is the problematic one.

@ali-essam
Copy link
Author

ali-essam commented Jul 1, 2019

So I added some more logs, changed the json stringify library to one that handles throws and circular in a bit better way, also disabled the overridden toJSON implementation with undefined before serialization to avoid the bug in the serializer.

master...ali-essam:tmp/debug-is-active-3

NEWRELICDEBUG segment "[Throws: Cannot read property 'root' of undefined]"
NEWRELICDEBUG segment (no toJSON) {"name":"Nodejs/Middleware/Expressjs/<anonymous>","transaction":"[Circular]","attributes":{"limit":null,"attributes":{}},"children":["[Throws: Cannot read property 'root' of undefined]","[Throws: Cannot read property 'root' of undefined]","[Circular]"],"id":"bc419693fb4de8","timer":{"state":2,"touched":true,"hrDuration":[0,186400],"hrstart":[1092088,976960667],"start":1561976405726,"durationInMillis":null},"internal":false,"opaque":false,"shim":{"_logger":{"extra":{"component":"WebFrameworkShim","module":"express","framework":"Expressjs"},"options":{"_level":30,"enabled":true}},"_agent":"[Circular]","_toExport":null,"_debug":false,"moduleName":"express","_moduleRoot":"/usr/src/app/node_modules/express","_metrics":{"PREFIX":"Expressjs/","FRAMEWORK":"Expressjs","MIDDLEWARE":"Nodejs/Middleware/"}},"partialName":null,"_exclusiveDuration":null,"_collect":true,"host":null,"port":null,"state":"EXTERNAL","async":true,"ignore":false}
NEWRELICDEBUG segment.transaction "[Circular]"
NEWRELICDEBUG typeof segment.transaction "string"
NEWRELICDEBUG segment.transaction.isActive undefined
NEWRELICDEBUG typeof segment.transaction.isActive "undefined"

After some debugging, it doesn't actually seem that segment.transaction === segment. Because when I'm printing safeJsonStringify(segment.transaction) itself, I get "[Circular]", which means somehow that the object itself is circular on itself, I was only able to reproduce this kind of output in stringify by overriding the toJSON function to return the object itself

> const safeJsonStringify = require('safe-json-stringify')
undefined
> var x = {}
undefined
> x.toJSON = () => x
[Function]
> safeJsonStringify(x)
'"[Circular]"'

What I'm probably going to try now is to remove all the custom toJSON from the prototypes themselves so I can better debug the output

@mweibel
Copy link

mweibel commented Jul 16, 2019

I just had the same issue seemingly on production.
I don't have a big stacktrace unfortunately, error reporting could use some improvements there..
What I have is just this:

Screenshot 2019-07-16 at 17 09 13

// Edit:
I think it happened when starting the app, i.e. at the point when require('newrelic') is called. Not 100% sure, but it looks like that.

@castocolina
Copy link

/bug

@ali-essam
Copy link
Author

@astormnewrelic It's been sometime since I commented the custom TraceSegment.toJSON and since then the loggers that I added before don't fire anymore. However I can still see some crashes (less frequent) coming from a different part of the shim.js file.

TypeError: Cannot read property 'segment' of undefined
    at WebFrameworkShim.applySegment (/usr/src/app/node_modules/newrelic/lib/shim/shim.js:1434:28)
    at wrapper (/usr/src/app/node_modules/newrelic/lib/shim/shim.js:2087:17)
    at /usr/src/app/lib/express_async.js:9:7
    at <anonymous>
    at process._tickDomainCallback [as _tickCallback] (internal/process/next_tick.js:229:7)
1433|  var tracer = this.tracer
1434|  var prevSegment = tracer.segment
1435|  tracer.segment = segment

You can see the modified code here: master...ali-essam:tmp/debug-is-active-4

I will try to add more logs to this part and see what I will get

@bkapus2
Copy link

bkapus2 commented Jul 26, 2019

So my team had this same issue occurring.

Steps to resolve:
1 - determine when the segment that throws the error is being set
Screen Shot 2019-07-26 at 4 08 36 PM
2 - add a proxy getter/setter to that segments transaction
Screen Shot 2019-07-26 at 4 18 16 PM
3 - on that debugger statement look at the stack trace
Screen Shot 2019-07-26 at 4 20 21 PM
4 - find your culprit
Screen Shot 2019-07-26 at 4 10 59 PM

For us we were logging a Hapi request.response. The response had an .agent which was the newrelic Agent, which went into the pino logger library which used fast-safe-stringify which traversed the Agent object deeply and converted any circular references into the string '[Circular]'.

@wszgxa
Copy link

wszgxa commented Aug 27, 2019

@bkapus2 @ali-essam

Do you guys use the logger package: winston

I have the same problem in my graphql service, when I send multiple requests to another server. I . There will be an error:

this._segment && this._segment.probe('Segment removed from tracer')
                                   ^

TypeError: this._segment.probe is not a function

So I start to debug this one. Because we will send the request to another elastic search server, and the Elastic search client will print so much log which will affect me to see the log. So I remove the log.

const client = new elasticsearch.Client({
  host: ELASTICSEARCH_HOST,
  // log: LoggerForES 
})

After I remove it, the error never show again.

@ali-essam
Copy link
Author

@wszgxa Yes we do use winston, mainly for http request and error logging. But are you totally sure this is the issue not just correlation? If it is, do you have any thoughts how could this lead to this behavior?

@bkapus2
Copy link

bkapus2 commented Aug 28, 2019

If you want to know what exactly is happening in your system you will have to open up the node inspector and see what is happening (assuming you are reliably able to reproduce an error):
https://nodejs.org/de/docs/guides/debugging-getting-started/

I've attempted to reproduce with pino and winston just now, but am unable to reproduce what I was seeing before. My best guess is that that fast-safe-stringify mutates our new relic objects (https://github.com/davidmarkclements/fast-safe-stringify/blob/master/index.js#L23). The new relic objects have some setters (would need more time to look into this) that are causing some sort of side-effect when this happens. Normally fast-safe-stringify will 'reset' the object after it is done stringifying it, but if these objects have custom setters (and a lot of them do, and they are all different from one another), it could be causing fast-safe-stringify to not fully reset these objects and leave in the string '[Circular]' value.

I have not looked into winston too deeply yet. Do you guys use winston with some sort of pretty print functionality?

@lykkin
Copy link
Contributor

lykkin commented Aug 28, 2019

Hey all,

I've pushed up a best guess for a fix to this issue on this branch. It seems like this issue is somewhat problematic to track down and reproduce, so we can at least try testing the potential fix against the environment that is most consistent in (i.e. your environments).

If/when you find some time to do some more testing, try this change out and gimme a +1/-1 on whether it makes a difference.

@bkapus2
Copy link

bkapus2 commented Aug 28, 2019

This should in theory work because it prevents the property from being enumerated in an Object.keys(...). In my troubleshooting I did something similar, but it only led to other properties being overwritten with a '[Circular]'.

I will see if I can reproduce this and report back when I get the chance.

@jgradzki
Copy link

jgradzki commented Sep 4, 2019

Hi, i have same problem and i can confirm 'fast-safe-stringify' is causing this:

function ElasticsearchLoggerAdapter() {
  this.trace = (method, requestUrl, body, responseBody, responseStatus) => {
  logger.debug(stringfyJSON({
      method,
      requestUrl,
      body,
      responseBody,
      responseStatus,
    }));
  };
/opt/agregator/node_modules/newrelic/lib/transaction/trace/segment.js:416
    const start = segment.timer.startedRelativeTo(segment.transaction.trace.root.timer)
                                                                            ^
TypeError: Cannot read property 'root' of undefined
    at TraceSegment.toJSON (/opt/agregator/node_modules/newrelic/lib/transaction/trace/segment.js:416:77)
    at JSON.stringify (<anonymous>)
    at stringify (/opt/agregator/node_modules/fast-safe-stringify/index.js:11:18)
    at ElasticsearchLoggerAdapter.trace (/opt/agregator/src/Modules/Core/Adapters/ElasticsearchLoggerAdapter.js:21:16)
    at HttpConnector.<anonymous> (/opt/agregator/node_modules/elasticsearch/src/lib/connectors/http.js:161:9)
    at IncomingMessage.wrapper (/opt/agregator/node_modules/lodash/lodash.js:4935:19)
    at emitNone (events.js:111:20)
    at IncomingMessage.emit (events.js:208:7)
    at IncomingMessage.wrapped (/opt/agregator/node_modules/newrelic/lib/transaction/tracer/index.js:188:22)
    at IncomingMessage.wrappedResponseEmit (/opt/agregator/node_modules/newrelic/lib/instrumentation/core/http-outbound.js:212:24)

@michaelgoin
Copy link
Member

With version 5.13.0 we made the segment reference we add to the output request non-enumerable to hopefully avoid these issues.

Has anyone upgraded and are they still having issues?

@michaelgoin
Copy link
Member

We haven't heard any more cases of this recently, so I'm going to close this one out.

Feel free to reach out if that is not the case.

Thank you,

Michael

@daviddutch
Copy link

Hi. I'm still having this issue in version 6.2.0

TypeError: this.segment.transaction.isActive is not a function
    at Tracer.getTransaction (D:\Dev\server\node_modules\newrelic\lib\transaction\tracer\index.js:53:76)
    at Agent.getTransaction (D:\Dev\server\node_modules\newrelic\lib\agent.js:807:22)
    at AsyncHook.initHook (D:\Dev\server\node_modules\newrelic\lib\instrumentation\core\async_hooks.js:53:31)
    at emitInitNative (internal/async_hooks.js:137:43)
    at emitInitScript (internal/async_hooks.js:336:3)
    at initAsyncResource (internal/timers.js:50:5)
...

I've got this after changing from Bunyan to Pino logs. Is there any help I can get/provide?
Thanks

@michaelgoin michaelgoin reopened this Jan 10, 2020
@michaelgoin
Copy link
Member

@daviddutch thanks for letting us know. I've reopened the issue.

So far, all the cases we've found of this problem there has been a library deleting/modifying properties on objects we pass around to keep track of state for our instrumentation. Where we can, we are attempting to limit/prevent that.

Are you logging out request information? Or other similar framework objects?

We can do some pinot experimentation but if you are able to narrow down a simple repro at all, since you can reproduce now, that would be super useful. It is often hard to manufacture just the right scenario. Also useful to know if you changed any other module/dependency usages during that time.

@astormnewrelic
Copy link
Contributor

Closing this out due to inactivity. If anyone's still running into this issue and can provide details (or better yet, a reproduction) please don't hesitate to reopen and/or contact support directly.

bizob2828 pushed a commit to bizob2828/node-newrelic that referenced this issue Apr 19, 2024
bizob2828 pushed a commit to bizob2828/node-newrelic that referenced this issue Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants