Possible memory leak using python 2.7, zodb 5 and cacheMinimize #203

Closed
azmeuk opened this issue May 14, 2018 · 7 comments

@azmeuk
Member

azmeuk commented May 14, 2018

After upgrading from ZODB 4.4.5 to ZODB 5.4.0 I noticed steadily increasing memory consumption in a daemon process I developed. At some point it saturated the whole memory of my server, so I tried to understand what was going on, and I managed to reproduce the memory growth with this script:

import transaction
import ZODB
from BTrees.OOBTree import OOBTree

conn = ZODB.DB(None).open()
conn.root()["foobar"] = OOBTree()
transaction.commit()

for i in range(10 ** 7):
    conn.root()["foobar"]
    conn.cacheMinimize()

conn.close()

If you run this script with Python 2.7 and ZODB 5, the memory consumption of the Python process grows slowly but steadily. The increments I observed were sometimes 256 kB and sometimes 4 kB, but I do not know how to reproduce a growth of one size or the other specifically.

If you comment out either the conn.cacheMinimize() line or the conn.root()["foobar"] line, the memory usage does not grow.

I wrote a proof of concept that generates a memory-consumption chart comparison:
[chart: memory usage comparison]

I also asked other developers at my company to test how the memory consumption behaves on their computers with Python 2.7, 3.4, 3.5 and 3.6, and with ZODB 4.4.5 and ZODB 5.4.0. The only combination that always grows is Python 2 with ZODB 5. You can see the results here, along with the Linux distributions used. We did not test on Mac or Windows.

@jamadden
Member

jamadden commented May 14, 2018

I can reproduce the memory growth exactly as described with ZODB 5.4.0 and CPython 2.7.15 on macOS.

Adding a gc.collect() call to the loop stops the memory growth, so it may be a reference cycle. Using a call to objgraph.show_growth() (patched to not call gc.collect itself) shows one new tuple for each iteration of the loop. The tuple appears to contain only an OID bytestring. The tuple might be referenced by a list.
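For reference, a minimal sketch of that kind of diagnosis (assuming the objgraph package; note that the stock show_growth() calls gc.collect() internally, which is exactly why it had to be patched to expose this particular growth):

import objgraph

# conn is the open ZODB connection from the reproduction script above
for i in range(1000):
    conn.root()["foobar"]
    conn.cacheMinimize()
    if i % 100 == 0:
        # Prints the object types whose live-instance count grew since the
        # previous call; with the internal gc.collect() patched out, this
        # shows one extra tuple per loop iteration.
        objgraph.show_growth(limit=5)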

@jamadden
Member

I was able to simplify the case I was observing. The pair conn.root()['foobar'] and conn.cacheMinimize() turns out to basically be equivalent to db_root = conn.root(); conn.setstate(db_root). If we inline as many intermediate uses of conn and conn._reader as we can, we arrive at something like this:

from io import BytesIO
from ZODB._compat import PersistentUnpickler

NB_READS = 10 ** 5  # iteration count; any large value shows the growth

def load_persistent(arg):
    return None

# conn is the open ZODB connection from the original script
db_root = conn.root()
for i in range(NB_READS):
    o = conn._storage.load(db_root._p_oid)  # (data, serial); o[0] is the record pickle
    unpickler = PersistentUnpickler(None, load_persistent, BytesIO(o[0]))
    unpickler.load()  # first pickle: the class metadata
    unpickler.load()  # second pickle: the object state
    del unpickler
    del o

That still appears to leak a tuple on every iteration. (The tuple and the memory growth still go away when we call gc.collect().) The contents of the leaked tuple are the OID of the OOBTree, which suggests that the leak occurs on the BINPERSID pickle opcode (when the root pickle loads the reference to the ghost of the OOBTree).
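A rough sketch of that kind of inspection, assuming objgraph (the thread does not show the exact tooling used):

import gc
import objgraph

before = set(id(t) for t in objgraph.by_type('tuple'))
# ... run one iteration of the unpickling loop above ...
new_tuples = [t for t in objgraph.by_type('tuple') if id(t) not in before]
for t in new_tuples:
    # Show the tuple's contents and whatever still refers to it.
    print(repr(t))
    print(gc.get_referrers(t))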

It can be further simplified and still "leak" a tuple:

from io import BytesIO
import zodbpickle.fastpickle as cPickle

NB_READS = 10 ** 5

def load_persistent(ref):
    return None

# The same persistent reference pickled with protocol 3 (p3) and protocol 2 (p2)
p3 = b'\x80\x02}q\x02U\x04dataq\x03}q\x04U\x06foobarq\x05C\x08\x00\x00\x00\x00\x00\x00\x00\x01cBTrees.OOBTree\nOOBTree\nq\x06\x86Qss.'
p2 = b'\x80\x02}q\x02U\x04dataq\x03}q\x04U\x06foobarq\x05c_codecs\nencode\nq\x06X\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01q\x07U\x06latin1q\x08\x86RcBTrees.OOBTree\nOOBTree\nq\t\x86Qss.'

for i in range(NB_READS):
    unpickler = cPickle.Unpickler(BytesIO(p3))  # swap in p2 to compare
    unpickler.persistent_load = load_persistent
    unpickler.load()
    del unpickler

Here, p3 is the pickle produced by protocol 3, and p2 is the equivalent pickle produced by protocol 2. Using p3 leaks the tuple; using p2 does not leak. I likewise found that if I used zodbpickle's Python implementation of protocol 3, I did not leak.
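To see why the two pickles exercise different code paths, one can disassemble them; a small sketch (run under Python 3, whose pickletools knows the protocol 3 opcodes; p3 and p2 as defined above):

import pickletools

# p3 stores the 8-byte OID with the SHORT_BINBYTES ('C') opcode;
# p2 instead rebuilds the same bytes through a _codecs.encode call.
pickletools.dis(p3)
pickletools.dis(p2)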

Neither protocol version leaks under Python 3.

So it appears that the implementation of BINPERSID in zodbpickle's _pickle_27.c module is buggy under Python 2, although the "leaked" tuple is still found by the GC. I had a look and it's not immediately apparent to me how that's happening...but the equivalent code for Python 3 is quite a bit cleaner and simpler.

@azmeuk can you confirm that adding a call to gc.collect() in your inner loop stops the memory growth?

@azmeuk
Member Author

azmeuk commented May 15, 2018

Thank you for taking the time to look into this issue.

I can confirm that this script no longer seems to leak with Python 2 and ZODB 5 when I watch the memory consumption with system monitoring tools such as htop:

import gc, transaction, ZODB
from BTrees.OOBTree import OOBTree

conn = ZODB.DB(None).open()
conn.root()["foobar"] = OOBTree()
transaction.commit()

for i in range(10 ** 7):
    conn.root()["foobar"]
    conn.cacheMinimize()
    gc.collect()

conn.close()

However, with a slightly more complicated script it still seems to leak. For instance, here is the script I used to create a memory chart:

# tox.ini
[tox]
envlist = {py27,py35}-{collect,nocollect}
skipsdist=True

[testenv]
install_command = pip install {packages}
commands = python leak.py
deps =
    psutil
    zodb
# leak.py
import os, psutil, transaction, gc, ZODB
from BTrees.OOBTree import OOBTree

output = "{}.data".format(os.path.basename(os.environ["VIRTUAL_ENV"]))
nocollect = "nocollect" in os.environ["VIRTUAL_ENV"]
process = psutil.Process(os.getpid())

conn = ZODB.DB(None).open()
conn.root()["foobar"] = OOBTree()
transaction.commit()

with open(output, "w") as fd:
    for i in range(10 ** 5):
        conn.root()["foobar"]
        conn.cacheMinimize()
        nocollect or gc.collect()

        fd.write("{} {}\n".format(i, process.memory_full_info().rss))

conn.close()
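The chart below was generated from the .data files this script writes; a minimal plotting sketch for that step (assuming matplotlib, which the scripts above do not use):

import glob
import matplotlib.pyplot as plt

for path in glob.glob("*.data"):
    with open(path) as fd:
        rows = [line.split() for line in fd]
    plt.plot([int(r[0]) for r in rows], [int(r[1]) for r in rows], label=path)

plt.xlabel("iteration")
plt.ylabel("RSS (bytes)")
plt.legend()
plt.savefig("leak.png")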

[chart: RSS per iteration for the py27/py35, collect/nocollect tox environments]

I don't understand why the first script has steady memory consumption, while the moment I try to monitor it, it behaves differently. It feels like quantum mechanics: observing it changes the result :)

In my application daemon, the gc.collect() call does not seem effective either.

@jamadden
Member

I don't understand why the first script has steady memory consumption, while the moment I try to monitor it, it behaves differently. It feels like quantum mechanics

Indeed. I think it has to do with the resolution of the monitoring tools and the differences between RSS, USS and VMS. For me, the operating-system tools did not have fine enough resolution to detect, over the period I ran them, the memory growth that still occurs with gc.collect. If I run my script for longer and/or use the finer resolution I can get from psutil (thanks for the reminder, BTW), the leak is still apparent.

Here's the script that uses just zodbpickle to produce the problem:

from __future__ import print_function
import sys
from io import BytesIO
import gc as gc_
import os

import psutil
import zodbpickle.fastpickle as cPickle


process = psutil.Process(os.getpid())

do_gc = True
p3 = b'\x80\x02}q\x02U\x04dataq\x03}q\x04U\x06foobarq\x05C\x08\x00\x00\x00\x00\x00\x00\x00\x01cBTrees.OOBTree\nOOBTree\nq\x06\x86Qss.'
p2 = b'\x80\x02}q\x02U\x04dataq\x03}q\x04U\x06foobarq\x05c_codecs\nencode\nq\x06X\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01q\x07U\x06latin1q\x08\x86RcBTrees.OOBTree\nOOBTree\nq\t\x86Qss.'

# Assigning to output and do_gc based on environment elided...

def load_persistent(ref):
    return None


with open(output, "w") as fd:
    for i in range(50000):
        unpickler = cPickle.Unpickler(BytesIO(pickle))  # pickle is either p3 or p2 (selection elided above)
        unpickler.persistent_load = load_persistent
        unpickler.load()
        del unpickler
        if do_gc:
            gc_.collect()


        mem_info = process.memory_full_info()
        print(i, mem_info.uss, mem_info.vms, mem_info.pfaults, file=fd)
        if i % 100 == 0:
            print('.', end='', file=sys.stderr)
            sys.stderr.flush()
            fd.flush()

print()

Here's the USS growth. We can see that only Python 2.7 with pickle protocol 3 grows, whether or not we perform garbage collection:

[chart: USS per iteration; only Python 2.7 with the protocol 3 pickle grows, with or without gc.collect]

And here's VMS (only Python 2.7 is shown because the base memory usage of 2 and 3 are so different). Again, only the protocol 3 case grows (in a step function, as expected, due to the way the OS and Python internally allocate memory):

[chart: VMS per iteration for Python 2.7; only the protocol 3 case grows, in steps]

objgraph is not able to find any Python-object-level leaks.
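A sketch of the sort of check behind that conclusion, assuming objgraph (counting live objects by type before and after the loop):

import objgraph

objgraph.show_most_common_types(limit=10)  # before the loop
# ... run the unpickling loop ...
objgraph.show_most_common_types(limit=10)  # after: counts stay flat even though USS keeps growing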

@jamadden
Member

jamadden commented May 15, 2018

It seems that persistent references have nothing to do with it; it is the C implementation of an opcode (SHORT_BINBYTES, used for the OID) that leaks under Python 2. We can produce the same graphs using these much simpler pickles (pickle.dumps(b'abc', PROT)):

p3 = b'\x80\x03C\x03abcq\x00.'
p2 = b'\x80\x02c_codecs\nencode\nq\x00X\x03\x00\x00\x00abcq\x01X\x06\x00\x00\x00latin1q\x02\x86q\x03Rq\x04.'
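A stripped-down loop along these lines is enough to show the growth (a sketch; the graphs below came from the fuller psutil script above):

from io import BytesIO
import zodbpickle.fastpickle as cPickle

p3 = b'\x80\x03C\x03abcq\x00.'  # pickle.dumps(b'abc', 3)

for i in range(10 ** 6):
    # Under Python 2, each load of the SHORT_BINBYTES opcode leaked memory
    # until the fix released in zodbpickle 1.0.1.
    cPickle.Unpickler(BytesIO(p3)).load()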

[chart: memory growth with the simple pickles; only Python 2.7 with protocol 3 grows]

Examining the C code for that pickle opcode, the leak seems pretty clear. If I fix it, we get a much nicer graph with no linear growth:
[chart: memory usage with the opcode fixed; no linear growth]

@jamadden
Member

Fixed with the release of zodbpickle 1.0.1. I'll open a PR that specifies that as a minimum version for the next ZODB release.
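For illustration, such a minimum-version pin usually looks something like this in a setup.py (a sketch only, not the actual ZODB setup.py):

from setuptools import setup

setup(
    name="ZODB",
    # ... other arguments elided ...
    install_requires=[
        "zodbpickle >= 1.0.1",  # avoids the Python 2.7 memory leak (issue #203)
    ],
)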

jamadden added a commit that referenced this issue May 16, 2018
This is required to avoid a memory leak on Python 2.7.

See #203.
@azmeuk
Member Author

azmeuk commented May 16, 2018

Thank you!

jimfulton pushed a commit that referenced this issue May 16, 2018
This is required to avoid a memory leak on Python 2.7.

See #203.