We have changed ABIs for `Vector`s of `Union` and `struct` more than once now (see #32448 and #23577). The latest change got me thinking again and hoping for a stable, interoperable, and efficient ABI for these things. This would make `Union{Missing,T}` work better in more cases, help with #26309, RelationalAI-oss/Blobs.jl#2, possibly #29289, etc.
The idea is that the data part of nested `struct-and-union` should follow the C ABI when stored inline, with all type selector bits hoisted outward to be stored separately from the data. For `Array`s, the hoisting would move selectors to the end of the array, as we currently do for `Union{Missing,T}`.
This is a natural generalization of the Array-of-Union layout introduced by #23577. In the same way, it allows for memory-efficient storage of the data fields and selector bytes by respecting the natural alignment of each. It provides a simple general rule for writing C data structures that alias Julia data.
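To make the memory-efficiency point concrete, here is a minimal sketch comparing a conventional tagged union (selector stored next to the payload) with the hoisted layout (payloads first, one selector byte per element afterwards). All names here (`Payload`, `Interleaved`, `interleaved_bytes`, `hoisted_bytes`) are hypothetical and only for illustration; this is not Julia's runtime code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical names throughout; this is not Julia's actual layout code.

// Payload of a Union{UInt8,Float64} field, with no selector attached.
union Payload { uint8_t _1; double _2; };

// Conventional tagged union: the selector sits next to the payload, so the
// compiler pads the struct up to the payload's alignment.
struct Interleaved { uint8_t sel; union Payload f; };

// Bytes needed for n elements with the selector stored inline.
size_t interleaved_bytes(size_t n) {
    assert(n <= SIZE_MAX / sizeof(struct Interleaved)); // guard against overflow
    return n * sizeof(struct Interleaved);
}

// Bytes needed for n elements with selectors hoisted into a trailing
// byte array: n payloads followed by n one-byte selectors.
size_t hoisted_bytes(size_t n) {
    assert(n <= SIZE_MAX / (sizeof(union Payload) + sizeof(uint8_t)));
    return n * sizeof(union Payload) + n * sizeof(uint8_t);
}
```

On ABIs where `double` is 8-byte aligned, `struct Interleaved` would typically occupy 16 bytes per element, while the hoisted layout needs 9, because the selector bytes are packed at their own natural alignment of 1.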
As a concrete example of this hoisting, here are some structs and unions written out in terms of C data structures.
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Type for selector bits
typedef uint8_t Selector;

// Some structs containing unions
//
// struct X
//     f::Union{UInt8,Float64}
// end
//
// struct Y
//     f::Union{UInt8,UInt64}
// end

// C representation for these with selector byte split out separately
struct X { union { uint8_t _1; double _2; } f; };
struct Y { union { uint8_t _1; uint64_t _2; } f; };
struct X_selectors { Selector sel_f; };
struct Y_selectors { Selector sel_f; };

// Aggregate of structs-containing-Unions
//
// struct A
//     x::X
//     y::Y
// end
struct A {
    struct X x;
    struct Y y;
};

// Aggregate of selector bytes mirrors original struct layout
struct A_selectors {
    struct X_selectors x;
    struct Y_selectors y;
};

// A more complicated example containing Unions of structs with unions.
//
// struct D
//     x::X
//     xy::Union{X,Y}
// end
struct D {
    struct X x;
    union {
        struct X _1;
        struct Y _2;
    } xy;
};

struct D_selectors {
    struct X_selectors x;
    union {
        struct X_selectors _1;
        struct Y_selectors _2;
    } xy;
    Selector sel_xy;
    // Should `xy` or `sel_xy` come first here?
};

// Dump binary representation of data at p.
void hexdump(const char *name, void *p, size_t s) {
    uint8_t *c = (uint8_t *)p;
    printf("%s = ", name);
    for (size_t i = 0; i < s; ++i) {
        printf("%02x", c[i]);
        if ((i + 1) % 4 == 0)
            printf(" ");
    }
    printf("\n");
}

int main()
{
    struct A a;
    struct A_selectors a_sel;
    memset(&a, '\0', sizeof(a));
    memset(&a_sel, '\0', sizeof(a_sel));
    // a = A(X(123.123), Y(0xff))
    a.x.f._2 = 123.123;
    a_sel.x.sel_f = 2; // a.x.f isa Float64
    a.y.f._1 = 0xff;
    a_sel.y.sel_f = 1; // a.y.f isa UInt8
    hexdump("a", &a, sizeof(a));
    hexdump("a_sel", &a_sel, sizeof(a_sel));

    struct D d;
    struct D_selectors d_sel;
    memset(&d, '\0', sizeof(d));
    memset(&d_sel, '\0', sizeof(d_sel));
    // d = D(X(0xff), Y(0x1122334455667788))
    d.x.f._1 = 0xff;
    d_sel.x.sel_f = 1; // d.x.f isa UInt8
    d.xy._2.f._2 = 0x1122334455667788;
    d_sel.sel_xy = 2; // d.xy isa Y
    d_sel.xy._2.sel_f = 2; // d.xy.f isa UInt64
    hexdump("d", &d, sizeof(d));
    hexdump("d_sel", &d_sel, sizeof(d_sel));

    // Layout for Vector{A} is a combined allocation of `as` and `as_sel`
    struct A as[2];
    struct A_selectors as_sel[2];
    memset(&as, '\0', sizeof(as));
    memset(&as_sel, '\0', sizeof(as_sel));
    // as = A[A(X(123.123), Y(0xff)), A(X(0xff), Y(0x1122334455667788))]
    // as[1] = A(X(123.123), Y(0xff))
    as[0].x.f._2 = 123.123;
    as_sel[0].x.sel_f = 2; // as[1].x.f isa Float64
    as[0].y.f._1 = 0xff;
    as_sel[0].y.sel_f = 1; // as[1].y.f isa UInt8
    // as[2] = A(X(0xff), Y(0x1122334455667788))
    as[1].x.f._1 = 0xff;
    as_sel[1].x.sel_f = 1; // as[2].x.f isa UInt8
    as[1].y.f._2 = 0x1122334455667788;
    as_sel[1].y.sel_f = 2; // as[2].y.f isa UInt64
    hexdump("as", &as, sizeof(as));
    hexdump("as_sel", &as_sel, sizeof(as_sel));
    return 0;
}
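The example above keeps `as` and `as_sel` in separate stack arrays. As a rough sketch of what a single combined allocation with the selectors at the end might look like, the following places `n` data elements at the start of one buffer and `n` selector records right after them. The `XVector` type and the `xvec_*` helpers are hypothetical names invented for this illustration, not part of Julia's runtime, and a zero selector is assumed to mean "not yet set".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper types; not Julia's actual array implementation.
struct X { union { uint8_t _1; double _2; } f; };
struct X_selectors { uint8_t sel_f; };

// One allocation laid out as [ n * struct X | n * struct X_selectors ].
struct XVector {
    size_t n;
    void *buf;
};

// The data part starts at the beginning of the buffer.
struct X *xvec_data(struct XVector *v) {
    return (struct X *)v->buf;
}

// The selector part starts right after the last data element.
struct X_selectors *xvec_selectors(struct XVector *v) {
    return (struct X_selectors *)((uint8_t *)v->buf + v->n * sizeof(struct X));
}

// Allocate a zeroed vector of n elements; returns 0 on success, -1 on failure.
int xvec_init(struct XVector *v, size_t n) {
    v->n = n;
    v->buf = calloc(n ? n : 1, sizeof(struct X) + sizeof(struct X_selectors));
    return v->buf ? 0 : -1;
}

void xvec_free(struct XVector *v) {
    free(v->buf);
    v->buf = NULL;
    v->n = 0;
}

// Store a Float64 in element i (selector 2, as in the example above).
void xvec_set_float64(struct XVector *v, size_t i, double x) {
    assert(i < v->n);
    xvec_data(v)[i].f._2 = x;
    xvec_selectors(v)[i].sel_f = 2;
}

// Store a UInt8 in element i (selector 1, as in the example above).
void xvec_set_uint8(struct XVector *v, size_t i, uint8_t x) {
    assert(i < v->n);
    xvec_data(v)[i].f._1 = x;
    xvec_selectors(v)[i].sel_f = 1;
}

// Read element i as a double, using the selector to pick the union member.
double xvec_get_as_double(struct XVector *v, size_t i) {
    assert(i < v->n);
    const struct X *x = &xvec_data(v)[i];
    return xvec_selectors(v)[i].sel_f == 2 ? x->f._2 : (double)x->f._1;
}
```

Because the selector records here have alignment 1, they can start at any byte offset after the data. A selector type with stricter alignment would need padding between the two regions.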
I completely agree with and support that we should stabilize a C-compatible (and friendly) ABI for Unions in invariant positions (and possibly covariant ones too, eventually?).
The proposal in the OP seems consistent with what we currently do for arrays... but I also wonder if it would be nice to support other layouts (I’m speculating here that tricks like the ones we use for `VecElement` might help define a different layout to choose when we are interoperating with C code?)
Regarding other layouts, I'm not sure that's the business of the compiler? People can already define libraries like StructArrays.jl to optimize storage layout for the particular purposes of their application while viewing the data in a more convenient form.
Program output:
@vtjnash @quinnj this is a more complete explanation of my vague thought-bubble from #32448 (comment)