
Distributed pmap memory leak using metaprogramming with MWE #37560

Closed
freddycct opened this issue Sep 14, 2020 · 7 comments

Comments

@freddycct

freddycct commented Sep 14, 2020

I discovered this issue while using Flux/Zygote. I am training a large, complex model with Julia's distributed pmap. Memory usage was stable under Flux/Tracker, but after moving to Zygote it grows over time. Calling GC.gc() does not help: if you watch memory usage with top on Linux or Activity Monitor on macOS, it keeps growing until the OS kills the program for running out of memory.

It looks like pmap does not free the memory used by code generated through Julia's metaprogramming features.

The machines I have tested on:

versioninfo()

Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, broadwell)

Minimum working example

using Distributed
addprocs(4)

@everywhere mutable struct A
    a::Float32
end

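@everywhere function genprog(n, p::A)
    map(1:n) do i
        y = rand()
        # Generate a uniquely named module on every iteration.
        mdname = gensym()
        expr = :(module $mdname
            f(x) = 2*x + $y + $p.a
            end
        )
        m = eval(expr)
        # invokelatest is required because m.f was defined after genprog started running (world age).
        Base.invokelatest(m.f, p.a)
    end
end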

function main()
    i = 0
    x = A(rand())
    while true
        println("epoch $(i)")
        @everywhere GC.gc()
        
        tasks = rand(1:100, 100)
        _, timeTaken, _, _, _ = @timed let x=x
            pmap(tasks) do n
                genprog(n, x)
            end
        end
        @show timeTaken
        x.a = rand()
        i += 1
    end
end

main()

@MikeInnes, tagging you since you are the author of the Zygote/src/compiler/*.jl files.

References

  1. Probably related: Generated code memory leaks #14495
  2. https://discourse.julialang.org/t/dynamically-create-and-destoy-modules-to-avoid-memory-leaks-in-program-synthesis/39729
  3. Memory leak with function redefinition #30653

@Keno
Member

Keno commented Sep 14, 2020

As written, this is expected, since generated code does not get freed.

@freddycct
Author

@Keno Thanks for looking into this issue. Is this recognized as a bug in Flux? Is there a temporary workaround?

@simonbyrne
Contributor

Closing as a duplicate of #14495.

For your particular use case, can you redefine the method rather than define new modules/functions?
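A related option for the MWE specifically, sketched here as an assumption rather than something confirmed in this thread, is to avoid generating code at all. In the MWE, each generated module differs only in the constant y, so the same computation can be written as a closure. In Julia, every closure created at one syntactic site shares a single type (parameterized by the types of its captured variables), so its body is compiled once per argument type rather than once per eval'd module. The names genprog_closure and make_f are hypothetical, and this does not address code that Zygote generates internally.

```julia
# Hypothetical eval-free variant of genprog from the MWE.
# When used with Distributed/pmap, prefix these definitions with @everywhere.

mutable struct A
    a::Float32
end

# Build the per-iteration function as a closure instead of a new module.
# All closures created here share one concrete type, so Julia compiles
# the closure body once instead of once per generated module.
function genprog_closure(n, p::A)
    map(1:n) do i
        y = rand()
        f = x -> 2*x + y + p.a
        f(p.a)  # no invokelatest needed: f was not defined via eval
    end
end

# Helper used only to show that closures capturing different values share a type.
make_f(y, p::A) = x -> 2*x + y + p.a

p = A(1.0f0)
results = genprog_closure(3, p)
same_type = typeof(make_f(0.1, p)) == typeof(make_f(0.2, p))
```

Here same_type is true, which is why repeated calls reuse already-compiled code instead of accumulating new methods.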

@freddycct
Author

@simonbyrne This affects all the Flux/Zygote backpropagation code.

@simonbyrne
Contributor

@freddycct are you saying that Zygote requires you to define new modules or functions rather than redefine existing methods?

@freddycct
Author

freddycct commented Sep 16, 2020

@simonbyrne, Zygote internally defines new functions for each backpropagation pass. I'm not sure why the Zygote developers have not commented. The code above is just an abstraction, i.e. a minimal example.

@simonbyrne
Contributor

I suggest opening an issue on Zygote.jl then.
