
Memory consumption gradually grows when Operator called in loop #845

Closed
navjotk opened this issue Jun 5, 2019 · 13 comments · Fixed by #2190

Comments

@navjotk
Member

navjotk commented Jun 5, 2019

In current master, there are a few sources of increased memory consumption when operators are created in a loop through the seismic wrapper classes:

  • memoized methods in Solver
  • Entries in cache: especially the newly created symbol types
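The first bullet can be illustrated without Devito. A pure-Python stand-in (the `Solver` class and `op_fwd` method here are hypothetical, and `functools.lru_cache` stands in for Devito's own memoizer) shows how a memoized method keeps its instance, and everything the instance references, alive:

```python
import functools
import gc
import weakref


class Solver:
    @functools.lru_cache(maxsize=None)  # memoizes on (self,), so the cache holds self
    def op_fwd(self):
        return object()


s = Solver()
s.op_fwd()
r = weakref.ref(s)
del s
gc.collect()
# The lru_cache still holds a strong reference to the instance via its
# key, so the solver (and any grids/functions it references) is not freed
assert r() is not None
```

If the solver holds large arrays, this memoization cost compounds with every solver created in a loop.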
@mloubout
Contributor

mloubout commented Jun 5, 2019

Is the issue with creating an operator, or are you creating a new solver each time?

@FabioLuporini
Contributor

The issue is creating new symbolic objects.

@mloubout
Contributor

mloubout commented Aug 1, 2019

Didn't realize I hadn't posted it here. There is a small issue with sympy.solve that I am trying to fix, but the bottom line is:

if you call solve(eq), it seems like sympy creates an extra reference to all the objects in eq, so the cache won't clear. One (hacky) bypass for now is to write the stencil directly, as at:

https://github.com/opesci/devito/blob/935621d5f020b08f3a229a53632b304a9761c8e0/examples/seismic/tti/operators.py#L17
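The mechanism described above, an internal cache pinning the caller's objects, can be sketched in pure Python. `solve_like` and `_cache` below are hypothetical stand-ins, not sympy's actual internals:

```python
import gc
import weakref

_cache = {}


def solve_like(expr):
    # Stand-in for a routine that stashes an extra reference to its
    # argument in a module-level cache, as described for sympy.solve
    _cache[id(expr)] = expr
    return expr


class Expr:
    pass


e = Expr()
solve_like(e)
ref = weakref.ref(e)
del e
gc.collect()
assert ref() is not None   # the cache still pins the object

_cache.clear()             # bypassing/clearing the cache releases it
gc.collect()
assert ref() is None
```

This is why writing the stencil by hand (so the expression never passes through the caching routine) sidesteps the growth.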

@FabioLuporini
Contributor

Not sure this is related to solve. Maybe they are two different things.

I have a reproducer here

@mloubout
Contributor

Isn't this fixed by the clear_cache fix? I know #945 is referenced here, but I don't think it's the same issue.

@navjotk
Member Author

navjotk commented Oct 24, 2019

Why is it not the same issue?

@mloubout
Contributor

Sorry, I read through it again and the title/description is quite misleading: it says that calling solver.op() multiple times increases memory consumption, which is not the case.
I would rename/re-describe it, or open a different one.

@navjotk
Member Author

navjotk commented Oct 24, 2019

it says that doing solver.op() multiple times increases memory consumption

It doesn't?

@mloubout
Contributor

mloubout commented Nov 8, 2019

With the new caching infrastructure it doesn't, no. That's why I removed all the clear_cache calls and the preallocation of u0 in the FWI/RTM notebooks.

@FabioLuporini
Contributor

I think it is still an issue. Every time new symbolic objects are created, a new class type is dynamically created; that is what causes the memory increase (not the data), as apparently this dynamically created class cannot be eliminated from the Python session.

solver.op() called in a loop thousands of times creates thousands of new (Time)Functions, each of them dynamically creating a new class type, hence increasing memory consumption. Note that the per-iteration increase is totally negligible; but with O(10**4) iterations, this starts becoming a thing.

It is also possible I'm missing something at this point... what I encourage you to do is to take my MFE linked above and try it again.
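The effect described here can be illustrated without Devito; `make_class` below is a hypothetical stand-in for a dynamically created symbol type:

```python
import gc


def make_class(i):
    # Hypothetical stand-in for a dynamically created (Time)Function type
    return type(f"DynSymbol{i}", (object,), {"index": i})


classes = [make_class(i) for i in range(1000)]
del classes
# Each class sits in reference cycles (e.g. cls.__mro__ contains cls),
# so reference counting alone can never free it; only the cyclic
# collector can, and only if nothing else (e.g. a cache) still pins it
freed = gc.collect()
assert freed > 0
```

If any cache or module-level structure retains even one reference per class, the collector cannot reclaim them and memory grows with the iteration count.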

@ofmla
Contributor

ofmla commented Apr 13, 2022

I would like to make a comment on this issue. Motivated by the discussion in the thread https://devitocodes.slack.com/archives/C1HSXP6KB/p1643619000223249, I performed several 3D experiments. In a first test (test1.py),


from examples.seismic.acoustic import AcousticWaveSolver
from examples.seismic import demo_model, setup_geometry


def main():
    shape = (151, 151, 151)    # 151x151x151 grid
    spacing = (10., 10., 10.)  # spacing of 10 meters
    origin = (0., 0., 0.)      # origin
    nbl = 0                    # number of pad points
    tn = 1000
    space_order = 8
    nshots = 20

    # Use a defined preset model
    model = demo_model('layers-isotropic', spacing=spacing, space_order=space_order,
                       shape=shape, nbl=nbl)

    geometry = setup_geometry(model, tn)

    solver = AcousticWaveSolver(model, geometry, kernel='OT2',
                                space_order=space_order)

    for i in range(nshots):
        solver.forward()


if __name__ == "__main__":
    main()

I modeled 20 shots in a loop using the forward method without any arguments and could see the memory growing, although at certain intervals of time the memory is released.
(memory profile plot of test1.py, generated with mprof)

In a second experiment (test2.py)


from devito import *
from examples.seismic.acoustic import AcousticWaveSolver
from examples.seismic import demo_model, setup_geometry


def main():
    shape = (151, 151, 151)    # 151x151x151 grid
    spacing = (10., 10., 10.)  # spacing of 10 meters
    origin = (0., 0., 0.)      # origin
    nbl = 0                    # number of pad points
    tn = 1000
    space_order = 8
    nshots = 20

    # Use a defined preset model
    model = demo_model('layers-isotropic', spacing=spacing, space_order=space_order,
                       shape=shape, nbl=nbl)

    geometry = setup_geometry(model, tn)

    solver = AcousticWaveSolver(model, geometry, kernel='OT2',
                                space_order=space_order)

    # Define the wavefield with the size of the model and the time dimension
    v = TimeFunction(name="v", grid=model.grid, time_order=2,
                     space_order=space_order)

    for i in range(nshots):
        solver.forward(u=v)


if __name__ == "__main__":
    main()

I passed a TimeFunction created outside the loop as an argument to the forward method, and I also observed an increase in memory, though on a smaller scale.
(memory profile plot of test2.py, generated with mprof)
Then I verified that simply using the op_fwd method (which returns the Operator) directly, as solver.op_fwd().apply(dt=model.critical_dt), keeps the memory constant (test3.py).


from examples.seismic.acoustic import AcousticWaveSolver
from examples.seismic import demo_model, setup_geometry


def main():
    shape = (151, 151, 151)    # 151x151x151 grid
    spacing = (10., 10., 10.)  # spacing of 10 meters
    origin = (0., 0., 0.)      # origin
    nbl = 0                    # number of pad points
    tn = 1000
    space_order = 8
    nshots = 20

    # Use a defined preset model
    model = demo_model('layers-isotropic', spacing=spacing, space_order=space_order,
                       shape=shape, nbl=nbl)

    geometry = setup_geometry(model, tn)

    solver = AcousticWaveSolver(model, geometry, kernel='OT2',
                                space_order=space_order)
    
    for i in range(nshots):
        solver.op_fwd().apply(dt=model.critical_dt)


if __name__ == "__main__":
    main()

(memory profile plot of test3.py, generated with mprof)
However, I was expecting this to also be the case in the second experiment, as it is exactly what the forward method does internally (https://github.com/devitocodes/devito/blob/master/examples/seismic/acoustic/wavesolver.py#L114).

You can run the tests with a simple command, e.g. mprof run --include-children python test2.py for test 2. The plots are generated with mprof plot mprof.file, where mprof.file is the file created by the previous command.

Having thought about it a little, I came to realize that the creation of (Sparse)TimeFunctions by the forward method was the cause of the memory growth. In addition to passing the field u to forward, you should also pass rec and src; i.e., running test 2 as below, the memory profile is flat:


src = geometry.src
rec = geometry.rec
for i in range(nshots):
    solver.forward(src=src, rec=rec, u=v)

At a certain point I considered the possibility that the solver classes (i.e. AcousticWaveSolver) were not working as expected, but there are no problems with them. The important takeaway is that you must never create TimeFunction, Function, or SparseTimeFunction objects within a loop. I would also say that the forward method has to be used with special care.

@ggorman
Contributor

ggorman commented Aug 10, 2023

This bug relates to a known Python bug:
https://bugs.python.org/issue9417

ChatGPT summary:
Based on the thread from the Python repository, the issue revolves around circular references created when declaring a class, either using the class statement or using type(). The circular reference problem can potentially cause complications with garbage collection, debugging, and performance, especially in certain specialized use cases like high-traffic servers or applications like Django that create temporary classes.

While the problem has been identified and discussed in detail, the consensus among the participants, including some Python core developers, is that fixing it would be complex, possibly leading to a change in public C APIs or introducing slower lookups. Additionally, the idea of adding a new 'unlink' method was not widely accepted.

The key takeaways from the discussion include:

  1. Acknowledgement of Circular References: The circular reference in class creation is acknowledged and understood by core developers. It is seen as a design tradeoff rather than a traditional "bug" in the sense that it is not a discrepancy between documented behavior and actual behavior.

  2. Proposed Solutions and Their Challenges: Various solutions were proposed, such as using weak references or adding a new "unlink" method, but they were met with concerns about performance impact, API changes, and whether they align with Python's design principles.

  3. Potential Workarounds: Some participants suggested the possibility of creating a new metaclass that produces reference-cycle-free classes or adopting practices to manually manage references. However, these were viewed as more experimental and specific to particular use cases rather than suitable for broad adoption.

  4. Issue Status as "Won't Fix": The status of the issue being closed with the resolution "won't fix" indicates that the core developers did not find a satisfactory way to address the problem that would align with Python's broader design goals and performance needs.

  5. Implementation Specificity: The problem seems to be more related to CPython (and possibly PyPy), and the solutions would likely expose implementation details. This also contributes to the resistance to implementing a fix within the standard library.

  6. Historical Context: The thread also makes references to other related issues and discussions, showing that this isn't an isolated incident but part of a more complex ongoing conversation about reference management within Python.

In conclusion, while the circular reference in class creation is recognized, the decision was made not to pursue a fix due to the complexity of the problem, the potential performance impact, and the risk of exposing implementation details. Users experiencing specific issues related to this behavior may need to consider custom solutions tailored to their particular needs and constraints.
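The circular reference at the heart of that CPython issue is easy to see directly: a class's method resolution order tuple contains the class itself, so every class is part of a reference cycle by construction.

```python
import gc


class C:
    pass


# C references its __mro__ tuple, and that tuple references C back,
# so reference counting alone can never free a class object
assert C in C.__mro__
assert C.__mro__ in gc.get_referents(C)
```

This is why classes created dynamically in a loop are reclaimable only by the cyclic garbage collector, and not at all if any cache keeps a reference to them.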
