
Allocation regression: global const getfield allocates in some cases #50317

Closed
NHDaly opened this issue Jun 28, 2023 · 12 comments
Labels: performance (Must go faster), regression (Regression in behavior compared to a previous version), regression 1.9 (Regression in the 1.9 release)
Milestone: 1.10

Comments
NHDaly (Member) commented Jun 28, 2023

Here's a small regression with a pretty large impact in the RAI codebase (we have a global lock in our database pager that's much slower now).

Somehow accessing the fields of a global constant has become type unstable in some cases.

```julia
struct Wrapper
    lock::ReentrantLock
end

Base.lock(f::Function, m::Wrapper) = Base.lock(f, m.lock)
Base.lock(m::Wrapper) = Base.lock(m.lock)
Base.unlock(m::Wrapper) = Base.unlock(m.lock)

const MONITOR = Wrapper(ReentrantLock())

function foo()
    Base.@lock MONITOR begin
        return 2+2
    end
end
foo()
@time foo()  # 0 allocs on 1.8, 1 alloc on 1.9
```

The allocation only shows up when referencing the global const variable. It disappears if you pass it in as an argument:

```julia
function bar(x)
    Base.@lock x begin
        return 2+2
    end
end
bar(MONITOR)
@time bar(MONITOR) # 0 allocs on 1.8, 0 allocs on 1.9
```

This might be related to #50073, but I'm not sure.

NHDaly added the performance, regression, and regression 1.9 labels on Jun 28, 2023
KristofferC (Member) commented:
Bisected to bbdee0b, cc @Keno

KristofferC added this to the 1.10 milestone on Jun 28, 2023
NHDaly (Member, Author) commented Jun 29, 2023

Thanks @KristofferC. It looks like the regressions were identified last November:
bbdee0b#commitcomment-89854661

Did we decide we will just accept these for 1.9? Is there anything that can be done to address them? They are currently having a quite significant impact on our upgrade attempts.

KristofferC (Member) commented:
> Did we decide we will just accept these for 1.9?

Not sure what you mean, 1.9 is already out.

> Is there anything that can be done to address them?

Fix it and backport the fix I would assume.

NHDaly (Member, Author) commented Jun 29, 2023

> Not sure what you mean, 1.9 is already out.

I meant: back in November, when this issue was identified, there weren't any follow up comments. So did we explicitly decide that the regression was acceptable for the (then) upcoming release? Or was it accidentally missed?

I'm asking because if it was just accidentally missed, there's still hope that it's an easy fix. If it was explicitly decided that we would live with it, then maybe there are reasons it will be hard to fix for a 1.9.2.

Sorry for my sloppy / confusing language. :)

Keno (Member) commented Jun 29, 2023

Somebody just needs to decide what it should do and what the inlining rules are.

Keno (Member) commented Jun 29, 2023

To expand on that: the issue here is that the compiler improved, so we now know the constant result of the getproperty.
However, because the value is mutable, we refuse to use the result for inlining (though we do use it for the rest of inference). So there are basically three choices here:

  1. Allow mutable values to be inlineable constants, by tweaking the code here (but is this OK, or will it blow up the serializer?): https://github.com/JuliaLang/julia/blob/master/base/compiler/utilities.jl#L88
  2. Allow some functions to inline even if their result is not inlineable (tricky, because you risk losing type information and it's not profitable in general, so how do you draw the line for the predicate).
  3. Fix the generated code for getproperty to get rid of the useless allocation (reasonably straightforward, but probably not as general).
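For reference, the extra box can be measured directly with `@allocated` rather than `@time` (a diagnostic sketch, not from the thread, using the definitions from the original report; the exact byte counts depend on the Julia version):

```julia
struct Wrapper
    lock::ReentrantLock
end

Base.lock(f::Function, m::Wrapper) = Base.lock(f, m.lock)
Base.lock(m::Wrapper) = Base.lock(m.lock)
Base.unlock(m::Wrapper) = Base.unlock(m.lock)

const MONITOR = Wrapper(ReentrantLock())

foo() = Base.@lock MONITOR 2 + 2   # field access on the global const
bar(x) = Base.@lock x 2 + 2        # same lock, but passed as an argument

foo(); bar(MONITOR)                # warm up so compilation is not measured
@show @allocated foo()             # nonzero (one box) on regressed 1.9 builds
@show @allocated bar(MONITOR)      # 0 on both 1.8 and 1.9
```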

kpamnany (Contributor) commented:
The third choice seems like the easiest way forward here, and it should also address our problem. Could someone take that fix on? It's beyond my understanding.

KristofferC (Member) commented:
> I meant: back in November, when this issue was identified, there weren't any follow up comments. So did we explicitly decide that the regression was acceptable for the (then) upcoming release? Or was it accidentally missed?

Looking at bbdee0b#commitcomment-89859406 it seems it got missed and never got "promoted" from a comment to a milestoned issue.

NHDaly (Member, Author) commented Jun 30, 2023

Thanks Kristoffer. 👍

+1 that it would be great to get this resolved. It's currently blocking our upgrade since, as I said, it makes every lock/unlock in a core component (our pager) noticeably slower.

We're seeing something like a 20%+ regression in some workloads, I think.

nsajko (Contributor) commented Jul 4, 2023

As already noted in linked issues, a workaround is to use getters:

```julia
struct Wrapper
    lock::ReentrantLock
end

get_lock(w::Wrapper) = w.lock

Base.lock(f::Function, m::Wrapper) = Base.lock(f, get_lock(m))
Base.lock(m::Wrapper) = Base.lock(get_lock(m))
Base.unlock(m::Wrapper) = Base.unlock(get_lock(m))

const MONITOR = Wrapper(ReentrantLock())

function foo()
    Base.@lock MONITOR begin
        return 2+2
    end
end

foo()
@time foo()  # zero allocs
```
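A related workaround one could try (our suggestion, not from the thread; the name `MONITOR_LOCK` is hypothetical) is to hoist the lock itself into its own global const, so hot paths never perform the regressed field access at all:

```julia
struct Wrapper
    lock::ReentrantLock
end

const MONITOR = Wrapper(ReentrantLock())

# Hypothetical helper const: reading a const binding directly is type-stable,
# so no getfield on the wrapper happens in the hot path.
const MONITOR_LOCK = MONITOR.lock

function foo()
    Base.@lock MONITOR_LOCK begin
        return 2 + 2
    end
end
```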

kpamnany (Contributor) commented Jul 5, 2023

Thanks @nsajko. We're trying this with one of our data structures but the trouble is that we'd have to make this change in a number of different places.

aviatesk (Member) commented Jul 6, 2023

Let me post a gist of the discussion we had on Slack.

@Keno said:

> For option 3, there are two ways to fix it. One is to have a version of the generic getfield function that doesn't need a box, but instead takes a separate type tag. We already do that for some other functions. The other way is to have a special case for one-field structs that just checks the field index and then emits the specialized code for that one field. Ideally we'd do both.

And I believe the latter optimization in particular is relatively easy to implement. It would basically extend this codegen-level optimization (currently applied for integer field indices) to Symbol-type field values:

julia/src/codegen.cpp, lines 3737 to 3821 at feb2988:

```cpp
else if (fld.typ == (jl_value_t*)jl_long_type) {
    if (ctx.vaSlot > 0) {
        // optimize VA tuple
        if (LoadInst *load = dyn_cast_or_null<LoadInst>(obj.V)) {
            if (load->getPointerOperand() == ctx.slots[ctx.vaSlot].boxroot && ctx.argArray) {
                Value *valen = emit_n_varargs(ctx);
                jl_cgval_t va_ary( // fake instantiation of a cgval, in order to call emit_bounds_check (it only checks the `.V` field)
                    ctx.builder.CreateInBoundsGEP(ctx.types().T_prjlvalue, ctx.argArray, ConstantInt::get(ctx.types().T_size, ctx.nReqArgs)),
                    NULL, NULL);
                Value *idx = emit_unbox(ctx, ctx.types().T_size, fld, (jl_value_t*)jl_long_type);
                idx = emit_bounds_check(ctx, va_ary, NULL, idx, valen, boundscheck);
                idx = ctx.builder.CreateAdd(idx, ConstantInt::get(ctx.types().T_size, ctx.nReqArgs));
                Instruction *v = ctx.builder.CreateAlignedLoad(ctx.types().T_prjlvalue, ctx.builder.CreateInBoundsGEP(ctx.types().T_prjlvalue, ctx.argArray, idx), Align(sizeof(void*)));
                setName(ctx.emission_context, v, "getfield");
                // if we know the result type of this load, we will mark that information here too
                jl_aliasinfo_t ai = jl_aliasinfo_t::fromTBAA(ctx, ctx.tbaa().tbaa_value);
                ai.decorateInst(maybe_mark_load_dereferenceable(v, false, rt));
                *ret = mark_julia_type(ctx, v, /*boxed*/ true, rt);
                return true;
            }
        }
    }
    if (jl_is_datatype(utt)) {
        if (jl_struct_try_layout(utt)) {
            size_t nfields = jl_datatype_nfields(utt);
            // integer index
            size_t idx;
            if (fld.constant && (idx = jl_unbox_long(fld.constant) - 1) < nfields) {
                if (!jl_has_free_typevars(jl_field_type(utt, idx))) {
                    // known index
                    *ret = emit_getfield_knownidx(ctx, obj, idx, utt, order);
                    return true;
                }
            }
            else {
                // unknown index
                Value *vidx = emit_unbox(ctx, ctx.types().T_size, fld, (jl_value_t*)jl_long_type);
                if (emit_getfield_unknownidx(ctx, ret, obj, vidx, utt, boundscheck, order)) {
                    return true;
                }
            }
        }
        Value *vidx = emit_unbox(ctx, ctx.types().T_size, fld, (jl_value_t*)jl_long_type);
        if (jl_is_tuple_type(utt) && is_tupletype_homogeneous(utt->parameters, true)) {
            // For tuples, we can emit code even if we don't know the exact
            // type (e.g. because we don't know the length). This is possible
            // as long as we know that all elements are of the same (leaf) type.
            if (obj.ispointer()) {
                if (order != jl_memory_order_notatomic && order != jl_memory_order_unspecified) {
                    emit_atomic_error(ctx, "getfield: non-atomic field cannot be accessed atomically");
                    *ret = jl_cgval_t(); // unreachable
                    return true;
                }
                // Determine which was the type that was homogeneous
                jl_value_t *jt = jl_tparam0(utt);
                if (jl_is_vararg(jt))
                    jt = jl_unwrap_vararg(jt);
                assert(jl_is_datatype(jt));
                // This is not necessary for correctness, but allows to omit
                // the extra code for getting the length of the tuple
                if (!bounds_check_enabled(ctx, boundscheck)) {
                    vidx = ctx.builder.CreateSub(vidx, ConstantInt::get(ctx.types().T_size, 1));
                }
                else {
                    vidx = emit_bounds_check(ctx, obj, (jl_value_t*)obj.typ, vidx,
                                             emit_datatype_nfields(ctx, emit_typeof(ctx, obj, false, false)),
                                             jl_true);
                }
                bool isboxed = !jl_datatype_isinlinealloc((jl_datatype_t*)jt, 0);
                Value *ptr = data_pointer(ctx, obj);
                *ret = typed_load(ctx, ptr, vidx,
                                  isboxed ? (jl_value_t*)jl_any_type : jt,
                                  obj.tbaa, nullptr, isboxed, AtomicOrdering::NotAtomic, false);
                return true;
            }
        }
        // Unknown object, but field known to be integer
        vidx = ctx.builder.CreateSub(vidx, ConstantInt::get(ctx.types().T_size, 1));
        Value *fld_val = ctx.builder.CreateCall(prepare_call(jlgetnthfieldchecked_func), { boxed(ctx, obj), vidx });
        *ret = mark_julia_type(ctx, fld_val, true, jl_any_type);
        return true;
    }
}
```
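At the Julia level, the one-field special case is plausible because for a struct with exactly one field the result type of `getfield` is statically known no matter which (in-bounds) index is used. A small illustration with the `Wrapper` type from the original report:

```julia
struct Wrapper
    lock::ReentrantLock
end

# With a single field, any in-bounds field index must be 1, so the result
# type of getfield is known statically and the load could be emitted unboxed.
@assert fieldcount(Wrapper) == 1
@assert fieldtype(Wrapper, 1) === ReentrantLock
```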

7 participants