
Providers memory leak #2750

Closed
edt-xx opened this issue May 21, 2016 · 11 comments
Labels
kind/bug A bug in existing code (including security flaws) status/in-progress In progress topic/bitswap Topic bitswap topic/perf Performance

Comments

@edt-xx

edt-xx commented May 21, 2016

go-ipfs compiled from git on the 17th (package says 0.4.2.r2.gabeea4f-1), running for 4.5 days with 98 connections (great), but it is using 3110m as reported by top, which seems a bit extreme.

16098 ed 20 0 3122.9m 846.7m 0.7 5.3 88:13.21 S ipfs

Ran the commands from:
https://github.com/ipfs/go-ipfs/blob/master/debug-guide.md
and packaged the files (I called the heap dump ipfs.heap so as not to overwrite the ipfs.stack file).

Hope this helps
Ed

ipfs.3g.tar.gz

@edt-xx
Author

edt-xx commented May 22, 2016

About 23h later:

16098 ed 20 0 3420.7m 1.097g 2.6 7.0 118:41.41 S ipfs

ipfs_3g_23.tar.gz

Hopefully the serial debug profiles help. Also included:

curl localhost:5001/debug/metrics/prometheus > ipfs.prom

Ed

@Kubuxu
Member

Kubuxu commented May 22, 2016

It looks like there is a goroutine leak: there are over 600 active goroutines, and some of them have been in IO wait for almost 100 hours:

goroutine 5140122 [IO wait, 5692 minutes]:
net.runtime_pollWait(0x7f89df7315f8, 0x72, 0xc8252b424c)
    /usr/lib/go/src/runtime/netpoll.go:160 +0x60
net.(*pollDesc).Wait(0xc823812060, 0x72, 0x0, 0x0)
    /usr/lib/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc823812060, 0x0, 0x0)
    /usr/lib/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).Read(0xc823812000, 0xc8252b424c, 0x4, 0x4, 0x0, 0x7f89df77b050, 0xc820010220)
    /usr/lib/go/src/net/fd_unix.go:250 +0x23a
net.(*conn).Read(0xc82012b608, 0xc8252b424c, 0x4, 0x4, 0x0, 0x0, 0x0)
    /usr/lib/go/src/net/net.go:172 +0xe4
gx/ipfs/QmYp8PC6b9M3UY4awdHkdRUim68KpGSmRmz27bANHteen6/go-multiaddr-net.(*maConn).Read(0xc82efbc810, 0xc8252b424c, 0x4, 0x4, 0x41da31, 0x0, 0x0)
    <autogenerated>:3 +0x82
gx/ipfs/QmboVACz6WES3byh3q8jdYPWL9TMz7CT89V34WPtruuRoS/go-libp2p-transport.(*connWrap).Read(0xc82383cf00, 0xc8252b424c, 0x4, 0x4, 0x15f9590, 0x0, 0x0)
    <autogenerated>:24 +0x82
gx/ipfs/QmVL44QeoQDTYK8RVdpkyja7uYcK3WDNoBNHVLonf9YDtm/go-libp2p/p2p/metrics/conn.(*MeteredConn).Read(0xc82383d040, 0xc8252b424c, 0x4, 0x4, 0x0, 0x0, 0x0)
    /tmp/makepkg/go-ipfs-git/src/src/gx/ipfs/QmVL44QeoQDTYK8RVdpkyja7uYcK3WDNoBNHVLonf9YDtm/go-libp2p/p2p/metrics/conn/conn.go:28 +0x75
gx/ipfs/QmVL44QeoQDTYK8RVdpkyja7uYcK3WDNoBNHVLonf9YDtm/go-libp2p/p2p/net/conn.(*singleConn).Read(0xc82d5f2180, 0xc8252b424c, 0x4, 0x4, 0xc8252b43c0, 0x0, 0x0)
    /tmp/makepkg/go-ipfs-git/src/src/gx/ipfs/QmVL44QeoQDTYK8RVdpkyja7uYcK3WDNoBNHVLonf9YDtm/go-libp2p/p2p/net/conn/conn.go:123 +0x6b
io.ReadAtLeast(0x7f89df78ffb8, 0xc82d5f2180, 0xc8252b424c, 0x4, 0x4, 0x4, 0x0, 0x0, 0x0)
    /usr/lib/go/src/io/io.go:297 +0xe6
io.ReadFull(0x7f89df78ffb8, 0xc82d5f2180, 0xc8252b424c, 0x4, 0x4, 0x0, 0x0, 0x0)
    /usr/lib/go/src/io/io.go:315 +0x62
gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio.ReadLen(0x7f89df78ffb8, 0xc82d5f2180, 0xc8252b424c, 0x4, 0x4, 0xac5a61, 0x0, 0x0)
    /tmp/makepkg/go-ipfs-git/src/src/gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio/num.go:27 +0xb7
gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio.(*reader).nextMsgLen(0xc8237cc0a0, 0x7f89df790008, 0x0, 0x0)
    /tmp/makepkg/go-ipfs-git/src/src/gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio/msgio.go:163 +0x66
gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio.(*reader).ReadMsg(0xc8237cc0a0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /tmp/makepkg/go-ipfs-git/src/src/gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio/msgio.go:195 +0xdb
gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio.(*readWriter).ReadMsg(0xc8251367a0, 0x0, 0x0, 0x0, 0x0, 0x0)
    <autogenerated>:17 +0x70
gx/ipfs/QmbuSYB51KtX8izgab1fSyBgsyMH2hTuCfBtXm77PeXWca/go-libp2p-secio.readMsgCtx.func1(0xc825136c60, 0x7f89df7903f0, 0xc8251367a0, 0xc82e3aa420, 0x7f89df78f9d8, 0xc82e3aa2a0)
    /tmp/makepkg/go-ipfs-git/src/src/gx/ipfs/QmbuSYB51KtX8izgab1fSyBgsyMH2hTuCfBtXm77PeXWca/go-libp2p-secio/rw.go:280 +0x48
created by gx/ipfs/QmbuSYB51KtX8izgab1fSyBgsyMH2hTuCfBtXm77PeXWca/go-libp2p-secio.readMsgCtx
    /tmp/makepkg/go-ipfs-git/src/src/gx/ipfs/QmbuSYB51KtX8izgab1fSyBgsyMH2hTuCfBtXm77PeXWca/go-libp2p-secio/rw.go:285 +0xe6

There are also lots of goroutines running in yamux.

Here is full goroutine stack dump from this report: https://gist.github.com/Kubuxu/da3c88d618fc534198249ad833ad79d9
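For illustration (a hypothetical simplification, not go-ipfs code), the pattern visible in the trace above can be sketched in a few lines of Go: a reader goroutine is spawned per connection, parks in `Read` ("IO wait"), and can only exit when the connection is closed or written to. If nothing ever closes the connection, the goroutine is pinned forever:

```go
package main

import (
	"fmt"
	"net"
	"runtime"
	"time"
)

// spawnReader mimics the per-connection reader in the trace above:
// a goroutine that parks in Read until the peer writes or the
// connection is closed.
func spawnReader(c net.Conn) {
	go func() {
		buf := make([]byte, 4)
		c.Read(buf) // blocks forever if the conn is never closed or written to
	}()
}

// leakedReaders opens n in-memory connections, spawns a reader for
// each, and reports how many extra goroutines are now alive.
func leakedReaders(n int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		a, _ := net.Pipe()
		spawnReader(a)
	}
	time.Sleep(50 * time.Millisecond) // let the readers park in Read
	return runtime.NumGoroutine() - before
}

func main() {
	// Without a Close (or a context that triggers one), every reader
	// stays pinned -- the kind of accumulation a long-running daemon
	// with ~100 connections would show.
	fmt.Println("blocked reader goroutines:", leakedReaders(10))
}
```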

@Kubuxu
Member

Kubuxu commented May 22, 2016

It wasn't just goroutines.

Frequency list of goroutine positions at dump time:

    159 net.runtime_pollWait(
    106 gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).send(
    106 gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).keepalive(
    106 gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).AcceptStream(
    106 github.com/ipfs/go-ipfs/exchange/bitswap.(*msgQueue).runQueue(
      8 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess/context.CloseAfterContext.func1(
      8 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).taskWorker(
      6 sync.runtime_Semacquire(
      4 sync.runtime_Syncsemacquire(
      3 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/jbenet/go-context/io.(*ctxReader).Read(
      2 net/http.(*persistConn).writeLoop(
      2 main.merge.func1(
      2 gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Stream).Read(
      2 gx/ipfs/QmVL44QeoQDTYK8RVdpkyja7uYcK3WDNoBNHVLonf9YDtm/go-libp2p/p2p/net/swarm.(*Swarm).addConnListener.func2(
      2 gx/ipfs/QmVL44QeoQDTYK8RVdpkyja7uYcK3WDNoBNHVLonf9YDtm/go-libp2p/p2p/net/conn.(*listener).Accept(
      2 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess/periodic.callOnTicker.func1(
      2 github.com/ipfs/go-ipfs/core/corehttp.Serve(
      1 runtime/pprof.writeGoroutineStacks(
      1 runtime.gopark(
      1 runtime.goexit(
      1 os/signal.signal_recv(
      1 main.(*IntrHandler).Handle.func1(
      1 main.daemonFunc.func1(
      1 main.daemonFunc(
      1 gx/ipfs/QmVL44QeoQDTYK8RVdpkyja7uYcK3WDNoBNHVLonf9YDtm/go-libp2p/p2p/discovery.(*mdnsService).pollForEntries(
      1 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess.(*process).CloseAfterChildren(
      1 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess/periodic.Every.func1(
      1 gx/ipfs/QmeYJHEk8UjVVZ4XCRTZe6dFQrb8pGWD81LYCgeLp8CvMB/go-metrics.(*meterArbiter).tick(
      1 gx/ipfs/QmduCCgTaLnxwwf9RFQy2PMUytrKcEH9msohtVxSBZUdgu/go-peerstream.(*Swarm).connGarbageCollect(
      1 gx/ipfs/QmdhsRK1EK2fvAz2i2SH5DEfkL6seDuyMYEsxKa9Braim3/client_golang/prometheus.computeApproximateRequestSize(
      1 gx/ipfs/QmbyvM8zRFDkbFdYyt1MnevUMJ62SiSGbfDFZ3Z8nkrzr4/go-libp2p-peer/queue.(*ChanQueue).process.func1(
      1 gx/ipfs/Qmazh5oNUVsDZTs2g59rq8aYQqwpss8tcUWQzor5sCCEuH/go-log.(*MirrorWriter).logRoutine(
      1 gx/ipfs/QmaDNZ4QMdBdku1YZWBysufYyoQt1negQGNav6PLYarbY8/go-log.(*MirrorWriter).logRoutine(
      1 github.com/ipfs/go-ipfs/routing/dht.(*ProviderManager).run(
      1 github.com/ipfs/go-ipfs/routing/dht.(*IpfsDHT).Provide(
      1 github.com/ipfs/go-ipfs/routing/dht.(*IpfsDHT).Bootstrap.func1(
      1 github.com/ipfs/go-ipfs/routing/dht.(*dhtQueryRunner).spawnWorkers(
      1 github.com/ipfs/go-ipfs/routing/dht.(*dhtQueryRunner).Run(
      1 github.com/ipfs/go-ipfs/namesys/republisher.(*Republisher).Run(
      1 github.com/ipfs/go-ipfs/mfs.(*Republisher).Run(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).drain(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb.(*DB).tCompaction(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb.(*DB).mpoolDrain(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb.(*DB).mCompaction(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb.(*DB).jWriter(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb.(*DB).compactionError(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/ipfs/go-datastore/query.ResultsWithChan.func1(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/ipfs/go-datastore/namespace.(*datastore).Query.func1(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/ipfs/go-datastore/keytransform.(*ktds).Query.func1(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/ipfs/go-datastore/flatfs.(*Datastore).Query.func1.1(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/codahale/metrics.init.1.func2(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/briantigerchow/pubsub.(*PubSub).start(
      1 github.com/ipfs/go-ipfs/exchange/bitswap.(*WantManager).Run(
      1 github.com/ipfs/go-ipfs/exchange/bitswap.New.func2(
      1 github.com/ipfs/go-ipfs/exchange/bitswap/decision.(*Engine).nextEnvelope(
      1 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).rebroadcastWorker(
      1 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).provideWorker(
      1 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).providerQueryManager(
      1 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).provideCollector(
      1 github.com/ipfs/go-ipfs/blocks/blockstore.(*blockstore).AllKeysChan.func2(

Generated with:

grep -e '^goroutine' ipfs.stacks -A1 | grep -v '^goroutine' | grep -v -- '--' | sort| sed 's/[^\(]*$//' | uniq -c | sort -bg | tac
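The same tally can be produced programmatically; here is a rough Go equivalent of the pipeline (run against a tiny inline sample standing in for ipfs.stacks):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// topFrames mirrors the grep/sed/uniq pipeline: for every
// "goroutine N [state]:" header in a stack dump it takes the first
// frame line, trims the argument list after the last '(', and counts
// how many goroutines share that top frame.
func topFrames(dump string) map[string]int {
	counts := make(map[string]int)
	lines := strings.Split(dump, "\n")
	for i, l := range lines {
		if strings.HasPrefix(l, "goroutine ") && i+1 < len(lines) {
			frame := lines[i+1]
			if p := strings.LastIndex(frame, "("); p >= 0 {
				frame = frame[:p+1]
			}
			counts[frame]++
		}
	}
	return counts
}

func main() {
	// A tiny dump standing in for ipfs.stacks.
	dump := "goroutine 1 [IO wait]:\n" +
		"net.runtime_pollWait(0x7f0, 0x72)\n" +
		"goroutine 2 [IO wait]:\n" +
		"net.runtime_pollWait(0x7f1, 0x72)\n" +
		"goroutine 3 [select]:\n" +
		"yamux.(*Session).send(0xc82)\n"
	counts := topFrames(dump)
	frames := make([]string, 0, len(counts))
	for f := range counts {
		frames = append(frames, f)
	}
	// Sort by descending count, then by name for a stable listing.
	sort.Slice(frames, func(i, j int) bool {
		if counts[frames[i]] != counts[frames[j]] {
			return counts[frames[i]] > counts[frames[j]]
		}
		return frames[i] < frames[j]
	})
	for _, f := range frames {
		fmt.Printf("%7d %s\n", counts[f], f)
	}
}
```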

Looks like the yamux issue isn't really fixed. It no longer hangs the daemon, but it keeps accumulating goroutines.

@Kubuxu
Member

Kubuxu commented May 22, 2016

Can you try using this ipfs version: https://ipfs.io/ipfs/QmUj8XdoQHfEe2MwBBnCP7HsXGFAWZHWiuLYA2ynBvZyoj/ipfs It includes the newest patched version of yamux in libp2p, which might not be leaking goroutines.

@edt-xx
Author

edt-xx commented May 22, 2016

Okay trying the above binary.

@Kubuxu Kubuxu changed the title High memory usage in 0.4.2 Goroutine leak in yamux (0.4.2) May 23, 2016
@Kubuxu
Member

Kubuxu commented May 23, 2016

I was wrong. The yamux isn't leaking. The memory leak has to be somewhere else, IMO.

Looks like there are 3 yamux + 1 IO goroutine per connection.
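As a back-of-the-envelope check (my arithmetic, not from the thread), the frequency list shows 106 yamux sessions, and 4 goroutines per connection accounts for most of the ~600 observed:

```go
package main

import "fmt"

// perConnGoroutines checks the "3 yamux + 1 IO goroutine per
// connection" observation against the frequency list: 106 sessions
// were in the dump.
func perConnGoroutines(conns int) int {
	yamuxPerConn := 3 // send, keepalive, AcceptStream
	ioPerConn := 1    // reader parked in net.runtime_pollWait
	return conns * (yamuxPerConn + ioPerConn)
}

func main() {
	fmt.Println(perConnGoroutines(106)) // 424 of the ~600 goroutines
}
```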

@Kubuxu
Member

Kubuxu commented May 23, 2016

There is 400 MB of data on the heap. That is still too much, but it is not the 3 G that htop was reporting. Are you sure you looked at RES (resident memory), not virtual?

@Kubuxu Kubuxu changed the title Goroutine leak in yamux (0.4.2) Possible memory leak (0.4.2) May 23, 2016
@Kubuxu
Member

Kubuxu commented May 23, 2016

A lot of the data is provider DHT data, full graph here

@edt-xx
Author

edt-xx commented May 24, 2016

From the top line posted in the second snapshot, the RES (resident) memory was 1.097g.

@Kubuxu Kubuxu added the kind/bug A bug in existing code (including security flaws) label May 25, 2016
@whyrusleeping
Member

Yeah, this looks like a "leak" in the providers manager. This is something I'm starting to work on fixing.

@ghost ghost added topic/bitswap Topic bitswap topic/perf Performance labels May 29, 2016
@ghost ghost changed the title Possible memory leak (0.4.2) Providers memory leak May 29, 2016
@whyrusleeping whyrusleeping added this to the Ipfs 0.4.3 milestone Jun 21, 2016
@Kubuxu Kubuxu added the status/in-progress In progress label Jun 24, 2016
@whyrusleeping
Member

This should be resolved now; #2860 has been merged.
