Possible memory leak using python 2.7, zodb 5 and cacheMinimize #203

Closed
azmeuk opened this issue May 14, 2018 · 7 comments

@azmeuk
Member

azmeuk commented May 14, 2018

After upgrading from ZODB 4.4.5 to ZODB 5.4.0 I noticed steadily increasing memory consumption in a daemon process I developed. At some point it saturated the whole memory of my server, so I tried to understand what was going on, and I managed to reproduce the memory growth with this script:

import transaction
import ZODB
from BTrees.OOBTree import OOBTree

conn = ZODB.DB(None).open()
conn.root()["foobar"] = OOBTree()
transaction.commit()

for i in range(10 ** 7):
    conn.root()["foobar"]
    conn.cacheMinimize()

conn.close()

If you run this script with Python 2.7 and ZODB 5, the memory consumption of the Python process grows slowly but steadily. The increments I observed were sometimes 256 kB and sometimes 4 kB, but I do not know how to reproduce a growth of one size or the other specifically.

If you comment out either the conn.cacheMinimize() line or the conn.root()["foobar"] line, the memory usage does not grow.

I wrote a proof of concept that generates a memory-consumption chart comparison:
[chart: memory usage comparison]

I also asked other developers at my company to test how the memory consumption behaves on their computers with Python 2.7, 3.4, 3.5 and 3.6, and with ZODB 4.4.5 and ZODB 5.4.0. The only combination that always grows is Python 2 with ZODB 5. You can see the results here, along with the Linux distributions used. We did not test on Mac or Windows.

@jamadden
Member

jamadden commented May 14, 2018

I can reproduce the memory growth exactly as described with ZODB 5.4.0 and CPython 2.7.15 on macOS.

Adding a gc.collect() call to the loop stops the memory growth, so it may be a reference cycle. Using a call to objgraph.show_growth() (patched to not call gc.collect itself) shows one new tuple for each iteration of the loop. The tuple appears to contain only an OID bytestring. The tuple might be referenced by a list.
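For reference, a minimal sketch of that kind of diagnosis (assuming the objgraph package; note that the stock show_growth() calls gc.collect() internally, which is exactly why it had to be patched to expose this particular growth):

import objgraph

# conn is the open ZODB connection from the reproduction script above
for i in range(1000):
    conn.root()["foobar"]
    conn.cacheMinimize()
    if i % 100 == 0:
        # Prints the object types whose live-instance count grew since the
        # previous call; with the internal gc.collect() patched out, this
        # shows one extra tuple per loop iteration.
        objgraph.show_growth(limit=5)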

@jamadden
Member

I was able to simplify the case I was observing. The pair conn.root()['foobar'] and conn.cacheMinimize() turns out to basically be equivalent to db_root = conn.root(); conn.setstate(db_root). If we inline as many intermediate uses of conn and conn._reader as we can, we arrive at something like this:

from io import BytesIO
from ZODB._compat import PersistentUnpickler

NB_READS = 10 ** 5  # iteration count; any large value shows the growth

def load_persistent(arg):
    return None

# conn is the open ZODB connection from the original script
db_root = conn.root()
for i in range(NB_READS):
    o = conn._storage.load(db_root._p_oid)  # (data, serial); o[0] is the record pickle
    unpickler = PersistentUnpickler(None, load_persistent, BytesIO(o[0]))
    unpickler.load()  # first pickle: the class metadata
    unpickler.load()  # second pickle: the object state
    del unpickler
    del o

That still appears to leak a tuple on every iteration. (The tuple and the memory growth still go away when we call gc.collect().) The contents of the leaked tuple are the OID of the OOBTree, which suggests that the leak occurs on the BINPERSID pickle opcode (when the root pickle loads the reference to the ghost of the OOBTree).
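A rough sketch of that kind of inspection, assuming objgraph (the thread does not show the exact tooling used):

import gc
import objgraph

before = set(id(t) for t in objgraph.by_type('tuple'))
# ... run one iteration of the unpickling loop above ...
new_tuples = [t for t in objgraph.by_type('tuple') if id(t) not in before]
for t in new_tuples:
    # Show the tuple's contents and whatever still refers to it.
    print(repr(t))
    print(gc.get_referrers(t))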

It can be further simplified and still "leak" a tuple:

from io import BytesIO
import zodbpickle.fastpickle as cPickle

NB_READS = 10 ** 5

def load_persistent(ref):
    return None

# The same persistent reference pickled with protocol 3 (p3) and protocol 2 (p2)
p3 = b'\x80\x02}q\x02U\x04dataq\x03}q\x04U\x06foobarq\x05C\x08\x00\x00\x00\x00\x00\x00\x00\x01cBTrees.OOBTree\nOOBTree\nq\x06\x86Qss.'
p2 = b'\x80\x02}q\x02U\x04dataq\x03}q\x04U\x06foobarq\x05c_codecs\nencode\nq\x06X\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01q\x07U\x06latin1q\x08\x86RcBTrees.OOBTree\nOOBTree\nq\t\x86Qss.'

for i in range(NB_READS):
    unpickler = cPickle.Unpickler(BytesIO(p3))  # swap in p2 to compare
    unpickler.persistent_load = load_persistent
    unpickler.load()
    del unpickler

Here, p3 is the pickle produced by protocol 3, and p2 is the equivalent pickle produced by protocol 2. Using p3 leaks the tuple; using p2 does not leak. I likewise found that if I used zodbpickle's Python implementation of protocol 3, I did not leak.
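To see why the two pickles exercise different code paths, one can disassemble them; a small sketch (run under Python 3, whose pickletools knows the protocol 3 opcodes; p3 and p2 as defined above):

import pickletools

# p3 stores the 8-byte OID with the SHORT_BINBYTES ('C') opcode;
# p2 instead rebuilds the same bytes through a _codecs.encode call.
pickletools.dis(p3)
pickletools.dis(p2)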

Neither protocol version leaks under Python 3.

So it appears that the implementation of BINPERSID in zodbpickle's _pickle_27.c module is buggy under Python 2, although the "leaked" tuple is still found by the GC. I had a look and it's not immediately apparent to me how that's happening...but the equivalent code for Python 3 is quite a bit cleaner and simpler.

@azmeuk can you confirm that adding a call to gc.collect() in your inner loop stops the memory growth?

@azmeuk
Member Author

azmeuk commented May 15, 2018

Thank you for taking the time to look into this issue.

I can confirm that this script no longer seems to leak with Python 2 and ZODB 5 when I watch the memory consumption with system monitoring tools such as htop:

import gc, transaction, ZODB
from BTrees.OOBTree import OOBTree

conn = ZODB.DB(None).open()
conn.root()["foobar"] = OOBTree()
transaction.commit()

for i in range(10 ** 7):
    conn.root()["foobar"]
    conn.cacheMinimize()
    gc.collect()

conn.close()

However, with a slightly more complicated script it still seems to leak. For instance, here is the script I used to create a memory chart:

# tox.ini
[tox]
envlist = {py27,py35}-{collect,nocollect}
skipsdist=True

[testenv]
install_command = pip install {packages}
commands = python leak.py
deps =
    psutil
    zodb
# leak.py
import os, psutil, transaction, gc, ZODB
from BTrees.OOBTree import OOBTree

output = "{}.data".format(os.path.basename(os.environ["VIRTUAL_ENV"]))
nocollect = "nocollect" in os.environ["VIRTUAL_ENV"]
process = psutil.Process(os.getpid())

conn = ZODB.DB(None).open()
conn.root()["foobar"] = OOBTree()
transaction.commit()

with open(output, "w") as fd:
    for i in range(10 ** 5):
        conn.root()["foobar"]
        conn.cacheMinimize()
        nocollect or gc.collect()

        fd.write("{} {}\n".format(i, process.memory_full_info().rss))

conn.close()
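The chart below was generated from the .data files this script writes; a minimal plotting sketch for that step (assuming matplotlib, which the scripts above do not use):

import glob
import matplotlib.pyplot as plt

for path in glob.glob("*.data"):
    with open(path) as fd:
        rows = [line.split() for line in fd]
    plt.plot([int(r[0]) for r in rows], [int(r[1]) for r in rows], label=path)

plt.xlabel("iteration")
plt.ylabel("RSS (bytes)")
plt.legend()
plt.savefig("leak.png")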

[chart: RSS per iteration for the py27/py35, collect/nocollect tox environments]

I don't understand why the first script has steady memory consumption, while the moment I try to monitor it, it behaves differently. It feels like quantum mechanics: observing it changes the result :)

In my application daemon, the gc.collect() call does not seem effective either.

@jamadden
Member

I don't understand why the first script has steady memory consumption, while the moment I try to monitor it, it behaves differently. It feels like quantum mechanics

Indeed. I think it has to do with the resolution of the monitoring tools and the differences between RSS, USS and VMS. For me, the operating-system tools did not have fine enough resolution to detect, over the period I ran them, the memory growth that still occurs with gc.collect. If I run my script for longer and/or use the finer resolution I can get from psutil (thanks for the reminder, BTW), the leak is still apparent.

Here's the script that uses just zodbpickle to produce the problem:

from __future__ import print_function
import sys
from io import BytesIO
import gc as gc_
import os

import psutil
import zodbpickle.fastpickle as cPickle


process = psutil.Process(os.getpid())

do_gc = True
p3 = b'\x80\x02}q\x02U\x04dataq\x03}q\x04U\x06foobarq\x05C\x08\x00\x00\x00\x00\x00\x00\x00\x01cBTrees.OOBTree\nOOBTree\nq\x06\x86Qss.'
p2 = b'\x80\x02}q\x02U\x04dataq\x03}q\x04U\x06foobarq\x05c_codecs\nencode\nq\x06X\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01q\x07U\x06latin1q\x08\x86RcBTrees.OOBTree\nOOBTree\nq\t\x86Qss.'

# Assigning to output and do_gc based on environment elided...

def load_persistent(ref):
    return None


with open(output, "w") as fd:
    for i in range(50000):
        unpickler = cPickle.Unpickler(BytesIO(pickle))  # pickle is either p3 or p2 (selection elided above)
        unpickler.persistent_load = load_persistent
        unpickler.load()
        del unpickler
        if do_gc:
            gc_.collect()


        mem_info = process.memory_full_info()
        print(i, mem_info.uss, mem_info.vms, mem_info.pfaults, file=fd)
        if i % 100 == 0:
            print('.', end='', file=sys.stderr)
            sys.stderr.flush()
            fd.flush()

print()

Here's the USS growth. We can see that only Python 2.7 with pickle protocol 3 grows, whether or not we perform garbage collection:

[chart: USS per iteration; only Python 2.7 with the protocol 3 pickle grows, with or without gc.collect]

And here's VMS (only Python 2.7 is shown because the base memory usage of 2 and 3 are so different). Again, only the protocol 3 case grows (in a step function, as expected, due to the way the OS and Python internally allocate memory):

[chart: VMS per iteration for Python 2.7; only the protocol 3 case grows, in steps]

objgraph is not able to find any Python-object-level leaks.
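A sketch of the sort of check behind that conclusion, assuming objgraph (counting live objects by type before and after the loop):

import objgraph

objgraph.show_most_common_types(limit=10)  # before the loop
# ... run the unpickling loop ...
objgraph.show_most_common_types(limit=10)  # after: counts stay flat even though USS keeps growing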

@jamadden
Member

jamadden commented May 15, 2018

It seems that persistent references have nothing to do with it; it is the C implementation of an opcode (SHORT_BINBYTES, used for the OID) that leaks under Python 2. We can produce the same graphs using these much simpler pickles (pickle.dumps(b'abc', PROT)):

p3 = b'\x80\x03C\x03abcq\x00.'
p2 = b'\x80\x02c_codecs\nencode\nq\x00X\x03\x00\x00\x00abcq\x01X\x06\x00\x00\x00latin1q\x02\x86q\x03Rq\x04.'
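A stripped-down loop along these lines is enough to show the growth (a sketch; the graphs below came from the fuller psutil script above):

from io import BytesIO
import zodbpickle.fastpickle as cPickle

p3 = b'\x80\x03C\x03abcq\x00.'  # pickle.dumps(b'abc', 3)

for i in range(10 ** 6):
    # Under Python 2, each load of the SHORT_BINBYTES opcode leaked memory
    # until the fix released in zodbpickle 1.0.1.
    cPickle.Unpickler(BytesIO(p3)).load()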

[chart: memory growth with the simple pickles; only Python 2.7 with protocol 3 grows]

Examining the C code for that pickle opcode, the leak seems pretty clear. If I fix it, we get a much nicer graph with no linear growth:
[chart: memory usage with the opcode fixed; no linear growth]

@jamadden
Member

Fixed with the release of zodbpickle 1.0.1. I'll open a PR that specifies that as a minimum version for the next ZODB release.
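For illustration, such a minimum-version pin usually looks something like this in a setup.py (a sketch only, not the actual ZODB setup.py):

from setuptools import setup

setup(
    name="ZODB",
    # ... other arguments elided ...
    install_requires=[
        "zodbpickle >= 1.0.1",  # avoids the Python 2.7 memory leak (issue #203)
    ],
)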

jamadden added a commit that referenced this issue May 16, 2018
This is required to avoid a memory leak on Python 2.7.

See #203.
@azmeuk
Member Author

azmeuk commented May 16, 2018

Thank you!

jimfulton pushed a commit that referenced this issue May 16, 2018
This is required to avoid a memory leak on Python 2.7.

See #203.