Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The limit of the number of generated functions? #59

Closed
xukai92 opened this issue Mar 2, 2021 · 17 comments
Closed

The limit of the number of generated functions? #59

xukai92 opened this issue Mar 2, 2021 · 17 comments

Comments

@xukai92
Copy link

xukai92 commented Mar 2, 2021

I meet a weird situation:

My code with GG.jl works fine as long as I only generated around < 400 functions using mk_functions.
When the number of total functions generated reaches a fewer than 400, I see this error:

julia: /buildworker/worker/package_linux64/build/src/codegen.cpp:2770: jl_cgval_t emit_invoke(jl_codectx_t&, jl_expr_t*, jl_value_t*): Assertion `(((jl_value_t*)(((jl_taggedvalue_t*)((char*)(mi) - sizeof(jl_taggedvalue_t)))->header & ~(uintptr_t)15))==(jl_value_t*)(jl_method_instance_type))' failed.

signal (6): Aborted
in expression starting at /afs/inf.ed.ac.uk/user/s16/s1672897/projects/bayesian_symbolic/notebooks/revamp_dev.jl:132
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
__assert_fail_base at /lib64/libc.so.6 (unknown line)
__assert_fail at /lib64/libc.so.6 (unknown line)
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:2770
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:3671
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3339
...

(Just copying-pasting a few lines in the head of the error message; the complete version is a bit long.)

Is this a known issue of GG.jl or Julia?

@thautwarm
Copy link
Member

It seems to be a Julia bug: JuliaLang/julia#35580

I'd suggest your slightly changing your implementation accordingly, or you can provide me an MME so that I can give your advice about how-to stuffs.

@xukai92
Copy link
Author

xukai92 commented Mar 2, 2021

Thanks for the pointer. I checked related issues a bit and believe the MWE from this issue is same as my setting.
In short, I generate random expressions and want to evaluate them on some inputs.

@thautwarm
Copy link
Member

I'd suspect such an issue occurs due to overflows occurred in c part, because GG extensively uses of type level things and is likely to hit some boundaries.

You might break your expressions into smaller parts and then runtime-generate them.

@xukai92
Copy link
Author

xukai92 commented Mar 2, 2021

I feel what you propose here can potentially alleviate the issue.
In fact, when I trying to get a MWE from my side, I found that I need to use a much larger number of expressions to fail GG.jl, if the epxression themselves are small:

using GeneralizedGenerated: mk_function

module GeneratedFunctions end

function rand_ex()
    u = rand()
    c = randn()
    if u < 0.5
        return :(m1 + m2 ++ $c)
    else
        return :(m1 * m2 /* $c)
    end
end

function gen_f_mkfunc(ex)
    return mk_function(GeneratedFunctions, :((m1, m2, r²) -> $ex))
end

function main(N::Int)
    for i in 1:N
        m1, m2, r² = rand(3)
        ex = rand_ex()
        f_mkfunc = gen_f_mkfunc(ex)
        f_mkfunc(m1, m2, r²)
    end
end

main(500)   # successful
main(5_000) # failed

By in my case, it's bit hard for me to break my expression into smaller parts,
as I'm using ExprOptimization.jl to define a context free grammar and just sample an expression as a whole.
It seems to me that breaking it down requires a bit involved work here, or there is a straightforward way to do so?

@thautwarm
Copy link
Member

Hi, I usually see people use runtime-generate functions only once, which looks quite abusing to me.
Runtime generated functions are extremely slow in the first call, and become fast since the second call.
Could you explain a bit about why you call mk_function inside the loop but use each only once?

@xukai92
Copy link
Author

xukai92 commented Mar 2, 2021

The purpose of my generated function call is to evaluate how good the (random) function is, on some observed data.
I can modify the example above a bit to make it clear.

function main(N::Int)
    m1, m2, r² = rand(3)
    f = m1 * m2 /for i in 1:N
        ex = rand_ex()
        f_mkfunc = gen_f_mkfunc(ex)
        f_est = f_mkfunc(m1, m2, r²)
        loss = sqrt((f - f_est)^2)
    end
end

@thautwarm
Copy link
Member

This code runs and finishes instantly. Could you tell me if it is adequate? And if not, why?

using GeneralizedGenerated: mk_function

function rand_f()
    u = rand()
    c = randn()
    if u < 0.5
        (m1, m2, r²) -> m1 + m2 ++ c
    else
        (m1, m2, r²) -> m1 * m2 /* c
    end
end

function main(N::Int)
    for i in 1:N
        m1, m2, r² = rand(3)
        f_mkfunc = rand_f()
        f_mkfunc(m1, m2, r²)
    end
end

main(500)   # successful
main(5_000) #

@thautwarm
Copy link
Member

The purpose of my generated function call is to evaluate how good the (random) function is, on some observed data.
I can modify the example above a bit to make it clear.

function main(N::Int)
    m1, m2, r² = rand(3)
    f = m1 * m2 /for i in 1:N
        ex = rand_ex()
        f_mkfunc = gen_f_mkfunc(ex)
        f_est = f_mkfunc(m1, m2, r²)
        loss = sqrt((f - f_est)^2)
    end
end

You didn't use loss, maybe you want to print it?
I'm very sorry that I did not get your point, but I think it important for me to make sure this package is used in proper cases.

@xukai92
Copy link
Author

xukai92 commented Mar 2, 2021

This is not really what I'm looking for.
My real rand_f is defined by a probabilistic context-free grammar (PCFG), from which I can sample an expression.
It is also impossible to enumerate the possible function from the PCFG in my real application either, as the number of trees to the depth I want is way too large (>1,000,000).
I can post an example of the PCFG version using ExprOptimization.jl if that helps.

PS: This is replying to #59 (comment)

@xukai92
Copy link
Author

xukai92 commented Mar 2, 2021

You didn't use loss, maybe you want to print it?
I'm very sorry that I did not get your point, but I think it important for me to make sure this package is used in proper cases.

For a more complete example

function main(N::Int)
    m1, m2, r² = rand(3)
    f = m1 * m2 / r²
    best_loss = Inf
    best_func = nothing
    for i in 1:N
        ex = rand_ex()
        f_mkfunc = gen_f_mkfunc(ex)
        f_est = f_mkfunc(m1, m2, r²)
        loss = sqrt((f - f_est)^2)
        if loss < best_loss
            best_loss = loss
            best_func = f_mkfunc
        end
    end
    # finding the best_func is my goal
end

@thautwarm
Copy link
Member

Hi, I guess this is adequate:

function rand_f()
    u = rand()
    c = randn()
    if u < 0.5
        (m1, m2, r²) -> m1 + m2 ++ c
    else
        (m1, m2, r²) -> m1 * m2 /* c
    end
end

function main(N::Int)
    m1, m2, r² = rand(3)
    f = m1 * m2 / r²
    best_loss = Inf
    best_func = nothing
    for i in 1:N
        f_mkfunc = rand_f()
        f_est = f_mkfunc(m1, m2, r²)
        loss = sqrt((f - f_est)^2)
        if loss < best_loss
            best_loss = loss
            best_func = f_mkfunc
        end
    end
    best_func
end

f = main(4000)

@xukai92
Copy link
Author

xukai92 commented Mar 2, 2021

The issue for me is that I can't actually write down the rand_f as you do here,
which was the reason I wrote it as in my initial version.
It's something like below in practice, using ExprOptimization.jl:

using ExprOptimization
using ExprOptimization.ProbabilisticExprRules: RuleNode, mindepth_map, ProbabilisticGrammar

G = @grammar begin
    Real = m1
    Real = m2
    Real = r²
    Real = Real + Real
    Real = Real - Real
    Real = Real * Real
    Real = Real / Real
end

PCFG = ProbabilisticGrammar(G)

function rand_f()
    tree = rand(RuleNode, PCFG, :Real, mindepth_map(G), 5)
    ex = get_executable(tree, G)
    return mk_function(GeneratedFunctions, :((m1, m2, r²) -> $ex))
end

Again, in practice my G is much complex than this example and there is no way to enumerate all possible functions underG.
Hope this makes my situation a bit clear.

@cscherrer
Copy link
Collaborator

Hi @xukai92 , I hope you don't mind my jumping in on this...

In my experience, most applications are a natural fit for either GG or https://github.com/SciML/RuntimeGeneratedFunctions.jl

  • GG has a really nice ecosystem around it and is very powerful, for example allowing closures. But the approach lead to some limitations inherited from Julia compiler limitations
  • RGF is simpler, and for example cannot represent closures. But it is also very fast, and the simpler approach help it to avoid some of the Julia compiler's corner cases

I use both, choosing based on the kind of problem. Have you tried RGF for this case?

@cscherrer
Copy link
Collaborator

function rand_f()
           tree = rand(RuleNode, PCFG, :Real, mindepth_map(G), 5)
           ex = get_executable(tree, G)
           return @RuntimeGeneratedFunction :((m1, m2, r²) -> $ex)
       end

julia> [rand_f()(1,2,3) for j in 1:2000][end-10:end]
11-element Vector{Real}:
  1
  1
  1
 -2.6666666666666665
  4.0
  3
  3
  0.5
  2.0
  1
  2

I prefer GG when you're writing a few functions that will be evaluated many times. But for a very large number of functions, RGF can handle simpler cases without needing to build as many types.

@xukai92
Copy link
Author

xukai92 commented Mar 3, 2021

Thanks a lot @cscherrer for pointing RuntimeGeneratedFunctions.jl, which I didn't know before!
I give a quick try on this package

  • It doesn't have the same issue I described here (i.e. throwing errors when the number of generated function reaches a high figure).
  • It's slightly faster than GG.jl (5,584s -> 5,300s in one of my experiment).

I prefer GG when you're writing a few functions that will be evaluated many times. But for a very large number of functions, RGF can handle simpler cases without needing to build as many types.

Thanks for also making this comment here.
In fact in my setup, I need to evaluate each function k times, with the hyper-parameter k ranging from 1 to 10.
The number I reported above is from k=5 and I suspect that only with a smaller k the benefit of RGF would be more clear.

Thanks again!

@thautwarm
Copy link
Member

Thanks a lot @cscherrer !

RGF is absolutely a right way here.

Besides, you @xukai92 do not need to generate functions so many times. You only need to generate one function:

function rand_ex()
   tree = rand(RuleNode, PCFG, :Real, mindepth_map(G), 5)
   ex = get_executable(tree, G)
   return ex
end
for j = 1:2000
    f_mkfunc_ex = rand_ex()
    f_est = eval(:(
                 let (m1, m2, r²) = $(((m1, m2, r²))
                      $f_mkfunc_ex
                 end))
    loss = sqrt((f - f_est)^2)
    if loss < best_loss
        best_loss = loss
        best_ex = f_mkfunc_ex 
    end
end

best_func = @RuntimeGeneratedFunction :((m1, m2, r²) -> $best_ex)

thautwarm added a commit that referenced this issue Mar 19, 2021
@thautwarm
Copy link
Member

This is now directly supported by GG v0.3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants