Store Julia objects as compound types #27
Comments
Agreed 100%. I was basically waiting for immutables to land before getting serious about Compound support, and never got back to it. Because I don't need this right away I probably won't get to this immediately; feel free to tackle it, or I'll tackle it myself in a week or two.
(I have some julia/Profile.jl bugs that need fixing first.)
I've started on this, but I'm still thinking of the best way to handle things. The first decision to be made is whether we store only immutable bits types as compound types; we store all immutable types as compound types; or we store all Julia types as compound types. There is an undeniable appeal to storing all Julia types as compound types. Reading/writing objects that contain bits type fields would be significantly faster, since those fields wouldn't need to be references. The differences between immutables and ordinary objects would just be in the way arrays are handled (i.e., as arrays of references or arrays of values). I think we could even reconstruct missing/changed types by dynamically generating a new type based on the compound type definition. The major downside is that we'd need to break compatibility with existing JLD files, leave around a method to read them, or create a converter.

I've also been thinking about how to efficiently convert HDF5 compound types to Julia types. "Efficiently" ideally means that, once the compound type is read into memory, we convert the compound type to a Julia type in place and avoid additional allocations. For immutable bits types and arrays thereof, this is easy, since we just need to add padding in the right places. It might even be possible to get the HDF5 library to perform this conversion for us.

For normal Julia types, where arrays are stored as references, there isn't necessarily a big advantage to in-place conversion, since we'll never be converting very much data at a time, although we would avoid an allocation for each object. For arrays of immutable types with pointers, we would need to convert HDF5 object references to pointers in-place to avoid allocating a second buffer. Allocating a second buffer isn't that bad, and to start with I'll probably just do this, but it limits the maximum size of an array of non-bits immutables to half of the system's available memory. It might be possible to avoid, though, either by giving the HDF5 library custom conversion functions using
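To make the "add padding in the right places" point concrete, here is a minimal sketch in current Julia syntax (`struct` rather than the `immutable` keyword in use when this thread was written). The `Sample` type and the helper names are made up for illustration, and a tightly packed on-disk layout is an assumption; an HDF5 compound type can place its members at other offsets.

```julia
# Hypothetical example type; not taken from the thread.
struct Sample
    flag::UInt8
    value::Float64
    count::Int16
end

# Total size if the fields were stored back to back with no padding,
# which is the assumed on-disk layout in this sketch.
packed_size(T) = sum(sizeof(fieldtype(T, i)) for i in 1:fieldcount(T))

# Byte offset of each field in Julia's in-memory layout (platform dependent).
memory_offsets(T) = [Int(fieldoffset(T, i)) for i in 1:fieldcount(T)]

# Byte offset of each field in the packed layout.
function packed_offsets(T)
    offsets = Int[]
    pos = 0
    for i in 1:fieldcount(T)
        push!(offsets, pos)
        pos += sizeof(fieldtype(T, i))
    end
    return offsets
end

println("packed: ", packed_offsets(Sample), " (", packed_size(Sample), " bytes)")
println("memory: ", memory_offsets(Sample), " (", sizeof(Sample), " bytes)")
```

Wherever `memory_offsets` and `packed_offsets` disagree, Julia has inserted alignment padding, and that is the gap a conversion from the packed form would have to fill.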
I haven't thought about this in ages, but how would one handle a type declaration like
and for an array of them, some are
We'd store anything that's not a bits type as a reference in the compound type, effectively mirroring the way Julia stores types in memory. If we have to reconstruct the type from the compound type definition because the Julia type changed or no longer exists, we'd just leave reference fields untyped.
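A rough sketch of that rule, written in current Julia syntax and using only Base: `field_storage` decides per field whether it would sit inline or be stored as a reference, and `reconstruct_type` builds a stand-in type in which reference fields are left untyped. The `Inner` and `Outer` types, both function names, and the convention of passing `nothing` for a reference field are assumptions made for illustration; none of this is JLD's actual code.

```julia
# Hypothetical example types; not taken from the thread.
struct Inner
    a::Int32
    b::Float32
end

struct Outer
    id::Int64
    inner::Inner              # bits type: could be stored inline
    name::String              # not a bits type: stored as a reference
    data::Vector{Float64}     # not a bits type: stored as a reference
end

# Decide, per field, whether it would live inline in the compound type
# or be stored as a reference.
function field_storage(T)
    return [fieldname(T, i) => (isbitstype(fieldtype(T, i)) ? :inline : :reference)
            for i in 1:fieldcount(T)]
end

# Rebuild a stand-in type from (name => type) pairs recovered from a file.
# A type of `nothing` marks a reference field, which is left untyped.
function reconstruct_type(name::Symbol, fields)
    decls = Any[T === nothing ? fname : Expr(:(::), fname, T) for (fname, T) in fields]
    def = Expr(:struct, false, name, Expr(:block, decls...))
    mod = Module()   # scratch module, so nothing is defined in Main
    Core.eval(mod, def)
    return Base.invokelatest(getfield, mod, name)
end

Recovered = reconstruct_type(:Recovered, [:id => Int64, :inner => Inner, :name => nothing])
println(field_storage(Outer))
println(fieldnames(Recovered), " ", fieldtype(Recovered, :name))
```

Untyped fields default to `Any`, so the reconstructed type can hold whatever object a reference resolves to, at the cost of no longer being a bits type.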
That seems very reasonable. Overall I think this sounds like a great plan. As much as it pains me to break JLD compatibility, I think the reality is that these files are not yet in heavy use, but probably will be some day (I'm just starting to make use of them in practice in my own work). So now is the time for breakage, if there ever is one. Moreover, since there is a version number, in principle we have all the information we need. The "converter," if we need one, could even be a current snapshot of jld.jl, with a different module name (e.g., JLD01). As far as efficiency goes, presumably there are places where it matters and places where it doesn't (since IO is expected to be somewhat limiting). To me it seems that you have a great plan. I agree that, ultimately, we could probably get the HDF5 library to insert padding etc. for us, but also that such optimizations can come second if they're nontrivial to get working.
Making progress: https://gist.github.com/simonster/50d282a533a76eaebbb3 Next step: reading it out.
Oo ooh! Very nice! I'm really excited about this. Aside from my
At the moment, we can write, but not read, immutables from JLD. While it would be pretty trivial to copy the code for creating new immutables from serialize.jl, I wonder if we can use compound types instead. It seems like there would be massive performance and disk space advantages to storing arrays of immutables contiguously on disk as opposed to using HDF5 references for each field, even if the on-disk representation isn't necessarily the same as the in-memory representation because of padding.