
Allocation regression: global const getfield allocates in some cases #50317

Closed
NHDaly opened this issue Jun 28, 2023 · 12 comments
Labels: performance (Must go faster), regression (Regression in behavior compared to a previous version), regression 1.9 (Regression in the 1.9 release)
Milestone: 1.10

Comments
NHDaly (Member) commented Jun 28, 2023

Here's a small regression with a pretty large impact in the RAI codebase (we have a global lock in our database pager that's much slower now).

Somehow accessing the fields of a global constant has become type unstable in some cases.

```julia
struct Wrapper
    lock::ReentrantLock
end

Base.lock(f::Function, m::Wrapper) = Base.lock(f, m.lock)
Base.lock(m::Wrapper) = Base.lock(m.lock)
Base.unlock(m::Wrapper) = Base.unlock(m.lock)

const MONITOR = Wrapper(ReentrantLock())

function foo()
    Base.@lock MONITOR begin
        return 2+2
    end
end
foo()
@time foo()  # 0 allocs on 1.8, 1 alloc on 1.9
```

The allocation only shows up when referencing the global const variable. It disappears if you pass it in as an argument:

```julia
function bar(x)
    Base.@lock x begin
        return 2+2
    end
end
bar(MONITOR)
@time bar(MONITOR) # 0 allocs on 1.8, 0 allocs on 1.9
```

This might be related to #50073, but I'm not sure.

NHDaly added the performance, regression, and regression 1.9 labels on Jun 28, 2023
KristofferC (Member) commented:
Bisected to bbdee0b, cc @Keno

KristofferC added this to the 1.10 milestone on Jun 28, 2023
NHDaly (Member, Author) commented Jun 29, 2023

Thanks @KristofferC. It looks like the regressions were identified last November:
bbdee0b#commitcomment-89854661

Did we decide we will just accept these for 1.9? Is there anything that can be done to address them? They are currently having a quite significant impact on our upgrade attempts.

KristofferC (Member) commented:
> Did we decide we will just accept these for 1.9?

Not sure what you mean, 1.9 is already out.

> Is there anything that can be done to address them?

Fix it and backport the fix I would assume.

NHDaly (Member, Author) commented Jun 29, 2023

> Not sure what you mean, 1.9 is already out.

I meant: back in November, when this issue was identified, there weren't any follow up comments. So did we explicitly decide that the regression was acceptable for the (then) upcoming release? Or was it accidentally missed?

I'm asking because if it was just accidentally missed, there's still hope that it's an easy fix. If it was explicitly decided that we would live with it, then maybe there are reasons it will be hard to fix for a 1.9.2.

Sorry for my sloppy / confusing language. :)

Keno (Member) commented Jun 29, 2023

Somebody just needs to decide what it should do and what the inlining rules are.

Keno (Member) commented Jun 29, 2023

To expand on that: the issue here is that the compiler improved, so we now know the constant result of the getproperty.
However, because the value is mutable, we refuse to use the result for inlining (though we do use it for the rest of inference). So there are basically three choices here:

  1. Allow mutable values to be inlineable constants, by tweaking the code here (but is this OK, or will it blow up the serializer?): https://github.com/JuliaLang/julia/blob/master/base/compiler/utilities.jl#L88
  2. Allow some functions to inline even if their result is not inlineable (tricky, because you risk losing type information and it's not profitable in general, so how do you draw the line for the predicate).
  3. Fix the generated code for getproperty to get rid of the useless allocation (reasonably straightforward, but probably not as general).
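For reference, the extra box can be measured directly with `@allocated` rather than `@time` (a diagnostic sketch, not from the thread, using the definitions from the original report; the exact byte counts depend on the Julia version):

```julia
struct Wrapper
    lock::ReentrantLock
end

Base.lock(f::Function, m::Wrapper) = Base.lock(f, m.lock)
Base.lock(m::Wrapper) = Base.lock(m.lock)
Base.unlock(m::Wrapper) = Base.unlock(m.lock)

const MONITOR = Wrapper(ReentrantLock())

foo() = Base.@lock MONITOR 2 + 2   # field access on the global const
bar(x) = Base.@lock x 2 + 2        # same lock, but passed as an argument

foo(); bar(MONITOR)                # warm up so compilation is not measured
@show @allocated foo()             # nonzero (one box) on regressed 1.9 builds
@show @allocated bar(MONITOR)      # 0 on both 1.8 and 1.9
```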

kpamnany (Contributor) commented:
The third choice seems like the easiest way forward here, and it should also address our problem. Could someone take that fix on? It's beyond my understanding.

KristofferC (Member) commented:
> I meant: back in November, when this issue was identified, there weren't any follow up comments. So did we explicitly decide that the regression was acceptable for the (then) upcoming release? Or was it accidentally missed?

Looking at bbdee0b#commitcomment-89859406 it seems it got missed and never got "promoted" from a comment to a milestoned issue.

NHDaly (Member, Author) commented Jun 30, 2023

Thanks Kristoffer. 👍

+1 that it would be great to get this resolved. It's currently blocking our upgrade since, as I said, it makes every lock/unlock in a core component (our pager) noticeably slower.

We're seeing something like a 20%+ regression in some workloads, I think.

nsajko (Contributor) commented Jul 4, 2023

As already noted in linked issues, a workaround is to use getters:

```julia
struct Wrapper
    lock::ReentrantLock
end

get_lock(w::Wrapper) = w.lock

Base.lock(f::Function, m::Wrapper) = Base.lock(f, get_lock(m))
Base.lock(m::Wrapper) = Base.lock(get_lock(m))
Base.unlock(m::Wrapper) = Base.unlock(get_lock(m))

const MONITOR = Wrapper(ReentrantLock())

function foo()
    Base.@lock MONITOR begin
        return 2+2
    end
end

foo()
@time foo()  # zero allocs
```
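A related workaround one could try (our suggestion, not from the thread; the name `MONITOR_LOCK` is hypothetical) is to hoist the lock itself into its own global const, so hot paths never perform the regressed field access at all:

```julia
struct Wrapper
    lock::ReentrantLock
end

const MONITOR = Wrapper(ReentrantLock())

# Hypothetical helper const: reading a const binding directly is type-stable,
# so no getfield on the wrapper happens in the hot path.
const MONITOR_LOCK = MONITOR.lock

function foo()
    Base.@lock MONITOR_LOCK begin
        return 2 + 2
    end
end
```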

kpamnany (Contributor) commented Jul 5, 2023

Thanks @nsajko. We're trying this with one of our data structures but the trouble is that we'd have to make this change in a number of different places.

aviatesk (Member) commented Jul 6, 2023

Let me post a gist of the discussion we had on Slack.

@Keno said:

> For option 3, there are two ways to fix it. One is to have a version of the generic getfield function that doesn't need a box, but instead takes a separate type tag. We already do that for some other functions. The other way is to have a special case for one-field structs that just checks the field index and then emits the specialized code for that one field. Ideally we'd do both.

And I believe the latter optimization in particular is relatively easy to implement. It would basically extend this codegen-level optimization (currently applied for integer field indices) to Symbol-type field values:

julia/src/codegen.cpp, lines 3737 to 3821 at feb2988:

```cpp
else if (fld.typ == (jl_value_t*)jl_long_type) {
    if (ctx.vaSlot > 0) {
        // optimize VA tuple
        if (LoadInst *load = dyn_cast_or_null<LoadInst>(obj.V)) {
            if (load->getPointerOperand() == ctx.slots[ctx.vaSlot].boxroot && ctx.argArray) {
                Value *valen = emit_n_varargs(ctx);
                jl_cgval_t va_ary( // fake instantiation of a cgval, in order to call emit_bounds_check (it only checks the `.V` field)
                    ctx.builder.CreateInBoundsGEP(ctx.types().T_prjlvalue, ctx.argArray, ConstantInt::get(ctx.types().T_size, ctx.nReqArgs)),
                    NULL, NULL);
                Value *idx = emit_unbox(ctx, ctx.types().T_size, fld, (jl_value_t*)jl_long_type);
                idx = emit_bounds_check(ctx, va_ary, NULL, idx, valen, boundscheck);
                idx = ctx.builder.CreateAdd(idx, ConstantInt::get(ctx.types().T_size, ctx.nReqArgs));
                Instruction *v = ctx.builder.CreateAlignedLoad(ctx.types().T_prjlvalue, ctx.builder.CreateInBoundsGEP(ctx.types().T_prjlvalue, ctx.argArray, idx), Align(sizeof(void*)));
                setName(ctx.emission_context, v, "getfield");
                // if we know the result type of this load, we will mark that information here too
                jl_aliasinfo_t ai = jl_aliasinfo_t::fromTBAA(ctx, ctx.tbaa().tbaa_value);
                ai.decorateInst(maybe_mark_load_dereferenceable(v, false, rt));
                *ret = mark_julia_type(ctx, v, /*boxed*/ true, rt);
                return true;
            }
        }
    }
    if (jl_is_datatype(utt)) {
        if (jl_struct_try_layout(utt)) {
            size_t nfields = jl_datatype_nfields(utt);
            // integer index
            size_t idx;
            if (fld.constant && (idx = jl_unbox_long(fld.constant) - 1) < nfields) {
                if (!jl_has_free_typevars(jl_field_type(utt, idx))) {
                    // known index
                    *ret = emit_getfield_knownidx(ctx, obj, idx, utt, order);
                    return true;
                }
            }
            else {
                // unknown index
                Value *vidx = emit_unbox(ctx, ctx.types().T_size, fld, (jl_value_t*)jl_long_type);
                if (emit_getfield_unknownidx(ctx, ret, obj, vidx, utt, boundscheck, order)) {
                    return true;
                }
            }
        }
        Value *vidx = emit_unbox(ctx, ctx.types().T_size, fld, (jl_value_t*)jl_long_type);
        if (jl_is_tuple_type(utt) && is_tupletype_homogeneous(utt->parameters, true)) {
            // For tuples, we can emit code even if we don't know the exact
            // type (e.g. because we don't know the length). This is possible
            // as long as we know that all elements are of the same (leaf) type.
            if (obj.ispointer()) {
                if (order != jl_memory_order_notatomic && order != jl_memory_order_unspecified) {
                    emit_atomic_error(ctx, "getfield: non-atomic field cannot be accessed atomically");
                    *ret = jl_cgval_t(); // unreachable
                    return true;
                }
                // Determine which was the type that was homogeneous
                jl_value_t *jt = jl_tparam0(utt);
                if (jl_is_vararg(jt))
                    jt = jl_unwrap_vararg(jt);
                assert(jl_is_datatype(jt));
                // This is not necessary for correctness, but allows to omit
                // the extra code for getting the length of the tuple
                if (!bounds_check_enabled(ctx, boundscheck)) {
                    vidx = ctx.builder.CreateSub(vidx, ConstantInt::get(ctx.types().T_size, 1));
                }
                else {
                    vidx = emit_bounds_check(ctx, obj, (jl_value_t*)obj.typ, vidx,
                                             emit_datatype_nfields(ctx, emit_typeof(ctx, obj, false, false)),
                                             jl_true);
                }
                bool isboxed = !jl_datatype_isinlinealloc((jl_datatype_t*)jt, 0);
                Value *ptr = data_pointer(ctx, obj);
                *ret = typed_load(ctx, ptr, vidx,
                                  isboxed ? (jl_value_t*)jl_any_type : jt,
                                  obj.tbaa, nullptr, isboxed, AtomicOrdering::NotAtomic, false);
                return true;
            }
        }
        // Unknown object, but field known to be integer
        vidx = ctx.builder.CreateSub(vidx, ConstantInt::get(ctx.types().T_size, 1));
        Value *fld_val = ctx.builder.CreateCall(prepare_call(jlgetnthfieldchecked_func), { boxed(ctx, obj), vidx });
        *ret = mark_julia_type(ctx, fld_val, true, jl_any_type);
        return true;
    }
}
```
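At the Julia level, the one-field special case is plausible because for a struct with exactly one field the result type of `getfield` is statically known no matter which (in-bounds) index is used. A small illustration with the `Wrapper` type from the original report:

```julia
struct Wrapper
    lock::ReentrantLock
end

# With a single field, any in-bounds field index must be 1, so the result
# type of getfield is known statically and the load could be emitted unboxed.
@assert fieldcount(Wrapper) == 1
@assert fieldtype(Wrapper, 1) === ReentrantLock
```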

7 participants