Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timers: fix unref() memory leak #1152

Closed
wants to merge 2 commits into from
Closed

Conversation

Fishrock123
Copy link
Contributor

Port of nodejs/node-v0.x-archive@0d05123

Fixes: #1151 (Not really. It's complicated. #1152 (comment))

Not sure what to call the test, or even if this test is particularly good. (or if it even needs a test)

R=@trevnorris

@Fishrock123 Fishrock123 added the timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout. label Mar 14, 2015
@@ -0,0 +1,14 @@
var assert = require('assert')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the original test didn't have semicolons, but I think that we should add them to conform to style, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, haha yeah. I'm used to not really caring about semicolons. Will fix.

@Fishrock123
Copy link
Contributor Author

@trevnorris I think the original test here tested something different, I'll add that one soon too.

Any suggestions on the first test? I'm not super confident about it relying on the require('http') (or similar modules that cause it) sort of thing.

@trevnorris
Copy link
Contributor

@Fishrock123 Not really. Seems you got that test from #1151, which is unfortunate because the test is a race condition. But doing the best we can here. I'd stick with the test you have and add the additional changes from the aforementioned additional patch in Node.

@Fishrock123
Copy link
Contributor Author

Problem: the original test doesn't pass without unrefdHandle(), but it does pass with

function unrefdHandle() {
  this.owner._onTimeout();
  // if (!this.owner._repeat)
  //   this.owner.close();
}

@trevnorris
Copy link
Contributor

Original test, as in the one that uses 'http'?

@Fishrock123
Copy link
Contributor Author

Correct, regress-GH-io-1151. The new unrefd-interval-still-fires that Julian wrote only fails between the original commit and the patch.

@trevnorris
Copy link
Contributor

@Fishrock123 Can you go ahead and update this PR with the additional patch and I'll take a look at the failures.

@Fishrock123
Copy link
Contributor Author

@trevnorris updated with original commits and the regress-1151 test, which fails under the conditions I mentioned above.

@trevnorris
Copy link
Contributor

@Fishrock123 I can't get test-regress-GH-io-1151.js to fail. Here's a one-liner I used to test:

$ for i in `seq 1000`; do ./iojs test/parallel/test-regress-GH-io-1151.js; if [ $? -ne 0 ]; then break; fi; done

@Fishrock123
Copy link
Contributor Author

@trevnorris Without 9ab35aa it fails instantly for me. OS X, if that matters.

@trevnorris
Copy link
Contributor

@Fishrock123 Sorry, I'm missing the issue. This PR as it stands at the moment passes all tests. Why do we care if a test fails without one of the patches?

@Fishrock123
Copy link
Contributor Author

Because it also passes with these lines commented out, which seems to be worrisome?:

function unrefdHandle() {
  this.owner._onTimeout();
  // if (!this.owner._repeat)
  //   this.owner.close();
}

@trevnorris
Copy link
Contributor

@Fishrock123 That conditional will only fire on setTimeout, and the test in question only uses a setInterval. So it makes sense it wouldn't affect it.

@Fishrock123
Copy link
Contributor Author

Sure, but that still doesn't answer why this patch fixes the regress-1151 test:

diff --git a/lib/timers.js b/lib/timers.js
index e5c0c9e..5af6df3 100644
--- a/lib/timers.js
+++ b/lib/timers.js
@@ -313,6 +313,14 @@ const Timeout = function(after) {
   this._repeat = false;
 };

+
+function unrefdHandle() {
+  this.owner._onTimeout();
+  // without the close()
+}
+
+
 Timeout.prototype.unref = function() {
   if (this._handle) {
     this._handle.unref();
@@ -323,7 +331,8 @@ Timeout.prototype.unref = function() {
     if (delay < 0) delay = 0;
     exports.unenroll(this);
     this._handle = new Timer();
-    this._handle[kOnTimeout] = this._onTimeout;
+    this._handle.owner = this;
+    this._handle[kOnTimeout] = unrefdHandle;
     this._handle.start(delay, 0);
     this._handle.domain = this.domain;
     this._handle.unref();

@trevnorris
Copy link
Contributor

It's because _onTimeout() for a setInterval() uses a wrapper() callback that uses callback.call(this). So if the execution context is incorrect then it can't properly be cleaned up. Before the patch the call was equivalent to _onTimeout.call(timer._handle), instead of _onTimeout.call(timer).

That's still confusing. Let me know if you have any additional questions.

@trevnorris
Copy link
Contributor

@Fishrock123 LGTM. Merge yourself, or I can merge. Just let me know.

@Fishrock123
Copy link
Contributor Author

I can merge myself. I'll try looking at it again tomorrow. Mostly I'm just not happy with the state of testing what is going on.

@thlorenz
Copy link
Contributor

Started investigating if this fixes all problems. I tested on OSX at this point.
Instead it seems to make them just less likely. When using the debug build I still get the following about 90% of the time:

➝  ./iojs_g test-regress-GH-io-1151.js

assert.js:88
  throw new assert.AssertionError({
        ^
AssertionError: [ { '0': [Function: listOnTimeout],
    _idleNext: [Circular],
    _idlePrev: [Circular],
    msecs: 10 } ] deepStrictEqual []
    at process.<anonymous> (/Volumes/d/dev/js/io.js/test-regress-GH-io-1151.js:13:10)
    at emitOne (events.js:77:13)
    at process.emit (events.js:166:7)
    at Immediate.process.exit (node.js:681:17)
    at processImmediate [as _immediateCallback] (timers.js:366:17)

In Release mode that test runs fine.

Fo the other test it is worse (getting Segmentation 11 fault in Release mode), Debug mode shows the problem:

➝  ./iojs_g test-timers-unrefd-interval-still-fires.js
Assertion failed: (cb_v->IsFunction()), function MakeCallback, file ../../src/async-wrap-inl.h, line 85.
Abort trap: 6

To be clear I rebased this on the latest v1.x work:

➝  git remote -v
origin  [email protected]:thlorenz/io.js.git (fetch)
origin  [email protected]:thlorenz/io.js.git (push)
upstream        [email protected]:iojs/io.js.git (fetch)
upstream        [email protected]:iojs/io.js.git (push)

➝  git lg
* b2df34a Jeremiah Senkpiel (HEAD, origin/timer-fix, timer-fix) test: add test-regress-GH-io-1151 (15 minutes ago)
* 2977fdf Julien Gilli timers: don't close interval timers when unrefd (15 minutes ago)
* 0875e3f Trevor Norris timers: fix unref() memory leak (15 minutes ago)
* 8a94581 Brian White (upstream/v1.x, origin/v1.x, origin/HEAD, v1.x) string_decoder: optimize write() (11 hours ago)
* 3d46fef Bert Belder win,node-gyp: allow node.exe/iojs.exe to be renamed (11 hours ago)
* c9207f7 Olov Lassus fs: fix corruption in writeFile and writeFileSync (14 hours ago)
* 4e9bf93 Roman Reiss doc: clarify real name requirement (17 hours ago)
* e84dd5f Roman Reiss doc: document repl on-demand module loading (18 hours ago)
* 1832743 Mayhem lib: add missing `new` for errors lib/*.js (20 hours ago)

When running the latter test from v1.x things are fine:

➝  git br
  timer-fix  b2df34a test: add test-regress-GH-io-1151
* v1.x       8a94581 string_decoder: optimize write()

/Volumes/d/dev/js/io.js
➝  ./iojs_g test-timers-unrefd-interval-still-fires.js

/Volumes/d/dev/js/io.js
➝  ./iojs test-timers-unrefd-interval-still-fires.js

So clearly that problem is introduced with this change.

The regress test fails as expected on the v1.x branch.

Additionally the issue @brycebaril logged is not fixed either.

Going to investigate on Linux as well, but at this point we shouldn't merge this IMHO.

@thlorenz
Copy link
Contributor

Ok, tried this on an Arch Linux box with similar results:

With this Fix

➝  uname -a
Linux localhost 3.16.1-1-ARCH #1 SMP PREEMPT Thu Aug 14 07:40:19 CEST 2014 x86_64 GNU/Linux

arch:~/dev/js/io.js (timer-fix %)
➝  git lg | head
* b2df34a Jeremiah Senkpiel (HEAD, origin/timer-fix, timer-fix) test: add test-regress-GH-io-1151 (63 minutes ago)
* 2977fdf Julien Gilli timers: don't close interval timers when unrefd (63 minutes ago)
* 0875e3f Trevor Norris timers: fix unref() memory leak (63 minutes ago)
* 8a94581 Brian White (origin/v1.x, origin/HEAD, v1.x) string_decoder: optimize write() (12 hours ago)
* 3d46fef Bert Belder win,node-gyp: allow node.exe/iojs.exe to be renamed (12 hours ago)
* c9207f7 Olov Lassus fs: fix corruption in writeFile and writeFileSync (15 hours ago)
* 4e9bf93 Roman Reiss doc: clarify real name requirement (18 hours ago)
* e84dd5f Roman Reiss doc: document repl on-demand module loading (19 hours ago)
* 1832743 Mayhem lib: add missing `new` for errors lib/*.js (21 hours ago)
* 7dd5e82 Alex Yursha assert: simplify logic of testing buffer equality (2 days ago)

arch:~/dev/js/io.js (timer-fix %)
➝  ./iojs test-timers-unrefd-interval-still-fires.js
Segmentation fault (core dumped)

arch:~/dev/js/io.js (timer-fix %)
➝  ./iojs_g test-timers-unrefd-interval-still-fires.js
iojs_g: ../../src/async-wrap-inl.h:85: v8::Handle<v8::Value> node::AsyncWrap::MakeCallback(uint32_t, int, v8::Handle<v8::Value> *): Assertion `cb_v->IsFunction()' failed.
Aborted (core dumped)

arch:~/dev/js/io.js (timer-fix %)
➝  ./iojs test-regress-GH-io-1151.js

arch:~/dev/js/io.js (timer-fix %)
➝  ./iojs_g test-regress-GH-io-1151.js

Without this Fix

arch:~/dev/js/io.js (v1.x %)
➝  git lg | head
* 8a94581 Brian White (HEAD, origin/v1.x, origin/HEAD, v1.x) string_decoder: optimize write() (12 hours ago)
* 3d46fef Bert Belder win,node-gyp: allow node.exe/iojs.exe to be renamed (12 hours ago)
* c9207f7 Olov Lassus fs: fix corruption in writeFile and writeFileSync (15 hours ago)
* 4e9bf93 Roman Reiss doc: clarify real name requirement (18 hours ago)
* e84dd5f Roman Reiss doc: document repl on-demand module loading (19 hours ago)
* 1832743 Mayhem lib: add missing `new` for errors lib/*.js (21 hours ago)
* 7dd5e82 Alex Yursha assert: simplify logic of testing buffer equality (2 days ago)
* 269e46b cjihrig deps: make node-gyp work with io.js (2 days ago)
* b542fb9 Forrest L Norvell deps: upgrade npm to 2.7.3 (2 days ago)
* 2e5b87a Ben Noordhuis src: remove unnecessary environment lookups (2 days ago)

arch:~/dev/js/io.js (v1.x %)
➝  ./iojs test-timers-unrefd-interval-still-fires.js

arch:~/dev/js/io.js (v1.x %)
➝  ./iojs_g test-timers-unrefd-interval-still-fires.js

arch:~/dev/js/io.js (v1.x %)
➝  ./iojs test-regress-GH-io-1151.js

assert.js:88
  throw new assert.AssertionError({
        ^
AssertionError: [ { '0': [Function: listOnTimeout],
    _idleNext:
     { _idleTimeout: 10,
       _idlePrev: [Circular],
       _idleNext: [Ci deepStrictEqual []
    at process.<anonymous> (/home/thlorenz/dev/js/io.js/test-regress-GH-io-1151.js:13:10)
    at emitOne (events.js:77:13)
    at process.emit (events.js:166:7)
    at Immediate.process.exit (node.js:681:17)
    at processImmediate [as _immediateCallback] (timers.js:357:17)

arch:~/dev/js/io.js (v1.x %)
➝  ./iojs_g test-regress-GH-io-1151.js

assert.js:88
  throw new assert.AssertionError({
        ^
AssertionError: [ { '0': [Function: listOnTimeout],
    _idleNext:
     { _idleTimeout: 10,
       _idlePrev: [Circular],
       _idleNext: [Ci deepStrictEqual []
    at process.<anonymous> (/home/thlorenz/dev/js/io.js/test-regress-GH-io-1151.js:13:10)
    at emitOne (events.js:77:13)
    at process.emit (events.js:166:7)
    at Immediate.process.exit (node.js:681:17)
    at processImmediate [as _immediateCallback] (timers.js:357:17)

So basically the test-regress-GH-io-1151.js seemingly gets fixed(I ran this in a tight loop for a 2 minutes and saw it fail twice).
So also here not fixed, just much less likely to happen.
The test-timers-unrefd-interval-still-fires.js starts segfaulting.

@Fishrock123
Copy link
Contributor Author

The segfault must be related to a more recent change in v1.x -- test-timers-unrefd-interval-still-fires.jsis in joyent/node, presumably without issue, and It was passing fine on the original base commit.

@thlorenz can you verify this by basing the commits on 056ed4b? Perhaps we need to bisect a bug somewhere. I'll try rebasing to v1.x and see if I get similar results.

@thlorenz
Copy link
Contributor

@Fishrock123 I'll do that, but this isn't happening in v1.x ever, so isn't that enough indication that these changes here introduce the problem?

@thlorenz
Copy link
Contributor

So I rebased on 056ed4b as @Fishrock123 suggested and get no more segfaults.

OSX

Regress test still failing (only in debug mode):

➝  while :; do ./iojs_g test-regress-GH-io-1151.js; echo "." ;done
.
.
.

assert.js:87
  throw new assert.AssertionError({
        ^
AssertionError: [ { '0': [Function: listOnTimeout],
    _idleNext: [Circular],
    _idlePrev: [Circular],
    msecs: 10 } ] deepStrictEqual []
    at process.<anonymous> (/Volumes/d/dev/js/io.js/test-regress-GH-io-1151.js:13:10)
    at emitOne (events.js:77:13)
    at process.emit (events.js:166:7)
    at Immediate.process.exit (node.js:661:17)
    at processImmediate [as _immediateCallback] (timers.js:378:17)
.

assert.js:87
  throw new assert.AssertionError({
        ^
AssertionError: [ { '0': [Function: listOnTimeout],
    _idleNext: [Circular],
    _idlePrev: [Circular],
    msecs: 10 } ] deepStrictEqual []
    at process.<anonymous> (/Volumes/d/dev/js/io.js/test-regress-GH-io-1151.js:13:10)
    at emitOne (events.js:77:13)
    at process.emit (events.js:166:7)
    at Immediate.process.exit (node.js:661:17)
    at processImmediate [as _immediateCallback] (timers.js:378:17)

Linux

All seems fixed, I can't repro any issues at this point.


So looks like something got introduced in the meantime that doesn't jive with this fix to cause it to segfault :(

@Fishrock123
Copy link
Contributor Author

currently investigating pre/post 31da975

Fishrock123 referenced this pull request Mar 25, 2015
The common case is where setInterval() is called with two arguments,
the callback and the timeout.  Specifying optional arguments in
the parameter list forces common case calls to go through an arguments
adaptor stack frame.

PR-URL: #1221
Reviewed-By: Trevor Norris <[email protected]>
@brycebaril
Copy link
Contributor

@Fishrock123 Here's a new test that never completes without this patch, but still fails with this patch due to being fired multiple times:

var assert = require("assert")
var intervals = 0
var i = setInterval(function () {
  intervals++
  i.unref()
  eatTime()
}, 10)

function eatTime() {
  // the goal of this function is to take longer than 10ms, i.e. longer than the interval
  var count = 0
  while (count++ < 1e7) {
    Math.random()
  }
}

process.on("exit", function () {
  assert.equal(intervals, 1)
})

FYI of [io.js, 0.12, 0.10] this only works "correctly" with 0.10

@silverwind
Copy link
Contributor

@brycebaril #1231 should definitely fix these double callbacks. It is a bit of a bandaid fix right now but I think it works good enough to include now and reasearch why exactly it double fires later.

@brycebaril
Copy link
Contributor

@silverwind from manual patch application #1231 does not seem to be related to this one (and it didn't fix for me). @trevnorris did a git bisect to find the offending patch and it looks like nodejs/node-v0.x-archive@a2eeb43

@silverwind
Copy link
Contributor

Strange it doesn't fix it, will try to investigate this.

@Fishrock123
Copy link
Contributor Author

This patch likely broken in it's current state as exposed by 33fea6e#commitcomment-10436644.

I'll update if we find a way to fix it, but it may be better to just wait on @bnoordhuis's timer re-write.

thlorenz referenced this pull request in nodejs/node-v0.x-archive Mar 30, 2015
Unlike the 'exit' event, this event allows the user to schedule more
work and thereby postpone the exit.  That also means that the
'beforeExit' event may be emitted many times, see the attached test
case for an example.

Refs #6305.
@Olegas
Copy link
Contributor

Olegas commented Apr 2, 2015

Got yet another sample (io.js v1.6.2)

// To collect heap
require('heapdump');

var o = [];

function setup() {

   // to generate some amout of data
   var randomData = require('crypto').randomBytes(1024).toString('hex');

   var to = setTimeout(function(){
        // uncomment next line to prevent leak
        // clearTimeout( to );

        // to keep refenreces to big data chunk inside timer's scope
        o[0] = randomData;

        // monitoring
        console.log(process.memoryUsage().heapUsed / (1024 * 1024));

        // plan next execution
        setup();
   }, 10);

   // without unref() there are no leak
   to.unref();
}

// begin
setup();

// just to keep prcess running
setInterval(function(){}, 10000);

Here is the screenshot of heapdump data
timers-leak

@Fishrock123
Copy link
Contributor Author

Re: 33fea6e#commitcomment-10440550

Reason for calling close() is to make sure the destructor is called. Ah suck, though now thinking about it if there are multiple JS timers linked to the same C++ TimerWrap class then all hell could break loose. So .close() should only be run if the calling timer is the last timer attached to that class instance.

@trevnorris is there a way to check if the timer is the last one attached?

Looks like this affecting #1075...

@indutny
Copy link
Member

indutny commented Apr 2, 2015

Hooray!

@Fishrock123 Fishrock123 added the confirmed-bug Issues with confirmed bugs. label Apr 2, 2015
@indutny
Copy link
Member

indutny commented Apr 2, 2015

@Fishrock123

Please add following patch here:

commit b926a62c719677d091019164a895a2f93feea682
Author: Fedor Indutny <[email protected]>
Date:   Fri Apr 3 01:14:08 2015 +0300

    timers: do not touch timer handle after closing it

diff --git a/src/timer_wrap.cc b/src/timer_wrap.cc
index d71213f..8002f75 100644
--- a/src/timer_wrap.cc
+++ b/src/timer_wrap.cc
@@ -73,6 +73,9 @@ class TimerWrap : public HandleWrap {
   static void Start(const FunctionCallbackInfo<Value>& args) {
     TimerWrap* wrap = Unwrap<TimerWrap>(args.Holder());

+    if (!HandleWrap::IsAlive(wrap))
+      return args.GetReturnValue().Set(UV_EINVAL);
+
     int64_t timeout = args[0]->IntegerValue();
     int64_t repeat = args[1]->IntegerValue();
     int err = uv_timer_start(&wrap->handle_, OnTimeout, timeout, repeat);
@@ -82,6 +85,9 @@ class TimerWrap : public HandleWrap {
   static void Stop(const FunctionCallbackInfo<Value>& args) {
     TimerWrap* wrap = Unwrap<TimerWrap>(args.Holder());

+    if (!HandleWrap::IsAlive(wrap))
+      return args.GetReturnValue().Set(UV_EINVAL);
+
     int err = uv_timer_stop(&wrap->handle_);
     args.GetReturnValue().Set(err);
   }
@@ -89,6 +95,9 @@ class TimerWrap : public HandleWrap {
   static void Again(const FunctionCallbackInfo<Value>& args) {
     TimerWrap* wrap = Unwrap<TimerWrap>(args.Holder());

+    if (!HandleWrap::IsAlive(wrap))
+      return args.GetReturnValue().Set(UV_EINVAL);
+
     int err = uv_timer_again(&wrap->handle_);
     args.GetReturnValue().Set(err);
   }
@@ -96,6 +105,9 @@ class TimerWrap : public HandleWrap {
   static void SetRepeat(const FunctionCallbackInfo<Value>& args) {
     TimerWrap* wrap = Unwrap<TimerWrap>(args.Holder());

+    if (!HandleWrap::IsAlive(wrap))
+      return args.GetReturnValue().Set(UV_EINVAL);
+
     int64_t repeat = args[0]->IntegerValue();
     uv_timer_set_repeat(&wrap->handle_, repeat);
     args.GetReturnValue().Set(0);
@@ -104,6 +116,9 @@ class TimerWrap : public HandleWrap {
   static void GetRepeat(const FunctionCallbackInfo<Value>& args) {
     TimerWrap* wrap = Unwrap<TimerWrap>(args.Holder());

+    if (!HandleWrap::IsAlive(wrap))
+      return args.GetReturnValue().Set(0);
+
     int64_t repeat = uv_timer_get_repeat(&wrap->handle_);
     if (repeat <= 0xfffffff)
       args.GetReturnValue().Set(static_cast<uint32_t>(repeat));

It appears to be fixing all crashes.

trevnorris and others added 2 commits April 2, 2015 18:20
The destructor isn't being called for timers that have been unref'd.

Fixes: nodejs/node-v0.x-archive#8364
This change fixes a regression introduced by commit
0d05123, which contained a typo that
would cause every unrefd interval to fire only once.

Fixes: nodejs/node-v0.x-archive#8900
Reviewed-By: Timothy J Fontaine <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-by: Trevor Norris <[email protected]>
@Fishrock123
Copy link
Contributor Author

@indutny am I doing something wrong? that doesn't seem to want to apply.

Jeremiahs-MacBook-Pro:io.js Jeremiah$ pbpaste | git am --whitespace=fix
Patch does not have a valid e-mail address.
Jeremiahs-MacBook-Pro:io.js Jeremiah$ git am --abort
Jeremiahs-MacBook-Pro:io.js Jeremiah$ pbpaste | git apply
fatal: corrupt patch at line 60

Edit: line 54 should have been 60

@indutny
Copy link
Member

indutny commented Apr 2, 2015

@Fishrock123 I'll push it to new PR.

@Fishrock123
Copy link
Contributor Author

@indutny's patch fixes the memory corruption as had surfaced in the tests. Moving to his PR.

@Fishrock123 Fishrock123 closed this Apr 2, 2015
@Fishrock123 Fishrock123 deleted the fix-1151 branch April 4, 2015 00:19
@Fishrock123
Copy link
Contributor Author

Fixed by #1330

@ChALkeR ChALkeR added the memory Issues and PRs related to the memory management or memory footprint. label Aug 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. memory Issues and PRs related to the memory management or memory footprint. timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants