Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For structs with all isbits or isbitsunion fields, allow them to be s… #32448

Merged
merged 3 commits into from
Aug 28, 2019

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Jun 28, 2019

…tored inline in arrays

This allows more structures to be stored inline in arrays, like:

struct Foo
    x::Int
    y::Union{Float64, Missing}
end
# also namedtuples
NamedTuple{(:x, :y), Tuple{Int, Union{Float64, Missing}}}

We already had an isinlinealloc flag on DataType, so we switch jl_islayout_inline to use that instead of just isbitstype. Then, when we're setting the isinlinealloc flag for a DataType, we just check that each field is isbitstype or isbitsunion.

@quinnj quinnj requested a review from vtjnash June 28, 2019 21:27
@test addr(b) == addr(b2)
@test addr(b) == addr(b3)
@test addr(b2) == addr(b3)
# @test addr(b) == addr(b2)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vtjnash, these tests are failing on this branch; is that expected? I'm not sure why the Ref b2 case would be affected by anything I've changed, but I think I understand that for b3, because it's an array, when we store it in an array (inline now), then extract it back again, it's technically a different jl_value_ptr, but still the same bits.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so it looks like in the Ref case, the alignment of B23367 is now different, which is used by datatype_pointerfree, to determine how to unsafe_convert to a Ptr. Again, it seems like we're more correct now by saying that B23367 is indeed datatype_pointerfree. I'm not sure how much people really run into this kind of thing because doing ccall w/ structs w/ Union fields probably isn't very common.

@@ -5863,7 +5863,7 @@ let
@test b.x === Int8(91)
@test b.z === Int8(23)
@test b.y === A23367((Int8(1), Int8(2), Int8(3), Int8(4), Int8(5), Int8(6), Int8(7)))
@test sizeof(b) == sizeof(Int) * 3
@test sizeof(b) == 12
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think this is more correct now? The change is via the call to jl_islayout_inline in jl_f_sizeof here.

@c42f
Copy link
Member

c42f commented Jun 30, 2019

Dumb question: what does this mean for the data layout of structs with union fields? Would it be possible/make sense to always recursively strip the selector bytes out of such structs and store them separately? That is, make the non-selector parts of arrays of unions and arrays of structs of unions ABI compatible with the C version of the same thing?

@quinnj
Copy link
Member Author

quinnj commented Jul 2, 2019

@c42f, so I'm actually not familiar with the C memory layout of structs with union fields, is it just the max size of the largest union type? I guess it doesn't track which element was put in the field, but relies on the user pulling out whatever they want?

@c42f
Copy link
Member

c42f commented Jul 2, 2019

is it just the max size of the largest union type?

I think it's ultimately defined by the platform ABI just as the struct layout is. But yes, I'd guess the union bits would just have the largest size and strictest alignment of types within the union.

I guess it doesn't track which element was put in the field, but relies on the user pulling out whatever they want?

Correct; in C you need to store the type tag info explicitly, or infer it somehow from other state in the system.

@quinnj
Copy link
Member Author

quinnj commented Jul 8, 2019

Bump. Probably best if @JeffBezanson, @vtjnash, or @Keno review this. No urgency really, but when you have a free moment.

@quinnj
Copy link
Member Author

quinnj commented Jul 17, 2019

Bump again.

@vtjnash
Copy link
Member

vtjnash commented Jul 17, 2019

Let's look at JuliaCon

@StefanKarpinski StefanKarpinski added the forget me not PRs that one wants to make sure aren't forgotten label Jul 17, 2019
@quinnj
Copy link
Member Author

quinnj commented Aug 2, 2019

Bump :)

@vtjnash
Copy link
Member

vtjnash commented Aug 5, 2019

This seems pretty good so far. It's good to see how small it is as I wasn't sure how much of the rest of the system knew to check these flags, and not assume that isbits told all.

The one aspect to finish this up that I think may need to be preserved is to ensure that Vector{Foo}(undef, 1)[1] throws an UndefRefException.

@quinnj
Copy link
Member Author

quinnj commented Aug 5, 2019

The one aspect to finish this up that I think may need to be preserved is to ensure that Vector{Foo}(undef, 1)[1] throws an UndefRefException.

Hmmm......do we really though? That's kind of a big crux of the change here is that Foo in a Vector{Foo}(undf, 1) now behaves more like if you did:

struct Foo
    x::Int
    y::Union{Missing, Float64}
    Foo() = new()
end
[Foo()] == Vector{Foo}(undef, 1)

So IMO, this almost feels more like a "bugfix" than change in behavior. How would we still do an UndefRefException? Special case these structs in isassigned and getindex?

@quinnj quinnj mentioned this pull request Aug 8, 2019
@quinnj
Copy link
Member Author

quinnj commented Aug 15, 2019

Bump, this would be nice to have for 1.3

src/datatype.c Outdated
@@ -248,7 +248,8 @@ static unsigned union_isbits(jl_value_t *ty, size_t *nbytes, size_t *align) JL_N
return 0;
return na + nb;
}
if (jl_isbits(ty)) {
if (jl_is_datatype(ty) && jl_datatype_isinlinealloc(ty))
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ on previous line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@JeffBezanson JeffBezanson added the triage This should be discussed on a triage call label Aug 15, 2019
@quinnj quinnj force-pushed the jq/inlineunionfields branch from 1995295 to d98544d Compare August 15, 2019 19:25
@quinnj
Copy link
Member Author

quinnj commented Aug 16, 2019

I see this was marked for triage; was anything concluded?

@StefanKarpinski
Copy link
Member

There was a heated discussion, I'm not sure what we concluded. @JeffBezanson, you seemed in favor.

@quinnj
Copy link
Member Author

quinnj commented Aug 21, 2019

@vtjnash, do you have any follow up here from previous discussion/questions? Anything else we need to change here?

@quinnj
Copy link
Member Author

quinnj commented Aug 27, 2019

I'll plan on merging this in 12 hours or so unless @vtjnash or anyone else has concerns.

@quinnj quinnj merged commit 8093a7c into master Aug 28, 2019
@delete-merged-branch delete-merged-branch bot deleted the jq/inlineunionfields branch August 28, 2019 04:29
@c42f
Copy link
Member

c42f commented Aug 28, 2019

If I define

struct WrappedUnion
    v::Union{Missing,Float64}
end

how does the storage layout now compare to Vector{Union{Missing,Float64}}?

I'd like to suggest that the storage layout for these two Vectors should be the same (cf. #32448 (comment)), and that the selector bits should be hoisted out in the same way in both cases.

@JeffBezanson
Copy link
Member

That's pretty difficult to do, since it means the layout of the same struct is different depending on where/how that struct is stored. It's also hard to handle things like Vector{Union{StructWithUnions, StructWithoutUnions}}.

@vtjnash vtjnash removed forget me not PRs that one wants to make sure aren't forgotten triage This should be discussed on a triage call labels Aug 29, 2019
@vtjnash
Copy link
Member

vtjnash commented Aug 29, 2019

I still think this PR was a breaking change in its current state and needed updates to the documentation.

@c42f
Copy link
Member

c42f commented Aug 30, 2019

That's pretty difficult to do, since it means the layout of the same struct is different depending on where/how that struct is stored.

I feel like with systematic hoisting of selectors you can get consistent layout when each such struct comes in two separable but consistently laid out parts: the data and the type selector bits. I've tried to write it out in more detail over at #33120.

@quinnj
Copy link
Member Author

quinnj commented Aug 30, 2019

@vtjnash, it would be helpful if you could be more specific about what exactly you consider breaking and what documentation needs updating. As I've posted a few times here, I'm happy to push forward on things, but the lack of responses/direction has left this in its current state. Personally, I still feel this is more "bugfix" than breaking given the work and docs that went in to general isbits-Union optimizations.

@vtjnash
Copy link
Member

vtjnash commented Dec 20, 2019

I still think this PR was a breaking change in its current state and needed updates to the documentation.

I still keep running into problems that this causes internally because it was a breaking change that changes assumptions made by some users and inference/codegen. I know you wanted to push ahead faster on this then I could review, and were rejecting my requests for changes to make it non-breaking. But it seems pretty yucky now that we implement isbitsunion to mean a union of things that aren't necessarily isbits (but still document it to mean the old thing).

Personally, I still feel this is more "bugfix" than breaking given the work and docs that went in to general isbits-Union optimizations.

I generally intended the isbits optimizations to be exactly that: optimizations for unions of isbits types. The documentation was intended to reflect that. Regardless of what you feel like calling it, the fact is it changed that memory model without fixing all of the places where we assume the prior memory model, both in the documentation and in the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants