Demo of custom serialization #3

Closed
timholy opened this issue Jul 31, 2015 · 9 comments

timholy commented Jul 31, 2015

From @timholy on December 23, 2014 13:39

I needed to create a custom format for a type I'm working on, so I thought that while learning how to do this I'd create a demo for the benefit of others. I suspect the main issue worth discussing is whether this is the best approach, or whether it would be better/simpler to create custom read and write methods. CC @simonster.

Once this settles, I'll also add this material to the documentation.

Copied from original issue: JuliaIO/HDF5.jl/pull/191


timholy commented Jul 31, 2015

I've already discovered a couple of issues with this; no need to review until I fix them.


timholy commented Jul 31, 2015

I had time to fix this sooner than expected, so hopefully this version is correct.


timholy commented Jul 31, 2015

From @simonster on December 23, 2014 23:24

I haven't looked at this in detail yet, but defining after_read and _write methods as we do for Associatives will work fine as long as your object would ordinarily be stored as a reference (i.e., it's a non-bits type) and you don't mind that it's stored as a reference.

Arguably, it's not safe to store non-bits types inline, since they could be undefined. We do it anyway for ByteStrings, UTF16Strings, Symbols, BigInts, BigFloats, and Types because serializing them as references is too expensive. But you can see that this breaks:

julia> x = Array(BigFloat, 1)
1-element Array{Base.MPFR.BigFloat,1}:
 #undef

julia> @save "test.jld" x
ERROR: access to undefined reference
 in _h5convert_vals at /home/simon/.julia/HDF5/src/JLD.jl:586
 in h5convert_vals at /home/simon/.julia/HDF5/src/JLD.jl:574
 in h5convert_array at /home/simon/.julia/HDF5/src/JLD.jl:566
 in _write at /home/simon/.julia/HDF5/src/JLD.jl:525
 in write at /home/simon/.julia/HDF5/src/JLD.jl:481
 in anonymous at /home/simon/.julia/HDF5/src/JLD.jl:898

In this particular case, there's not a huge advantage to storing MyContainer inline instead of by reference, since the Matrix and Vector in MyContainerSerializer will still be stored by reference. Storing it inline will also break things if you have an Array{MyContainer} with uninitialized elements or an object with an uninitialized MyContainer field. If you don't define JLD.h5fieldtype for MyContainer, JLD should write it as a reference, but at that point I think it will write the same thing as if you defined _write and after_read, and that would be less code.
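
To make the uninitialized-element concern concrete, here is a minimal sketch in current Julia syntax (the thread predates it: Array(BigFloat, 1) is now written Vector{BigFloat}(undef, 1)). It relies only on Base's isassigned; the helper name all_assigned is hypothetical and is not part of JLD.

```julia
# Hypothetical helper, not part of JLD: report whether every slot of an
# Array holds a defined value, i.e. whether reading each element is safe.
all_assigned(a::Array) = all(i -> isassigned(a, i), eachindex(a))

x = Vector{BigFloat}(undef, 1)   # non-bits element type: the slot starts as #undef
y = BigFloat[1.5]                # fully initialized

all_assigned(x)  # false: reading x[1] would throw UndefRefError
all_assigned(y)  # true
```

A check like this is one way a writer could decide between failing early and falling back to reference storage; whether JLD should do either is exactly the open question above.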

If it commonly happens that people want to serialize one Julia type as another Julia type, we could also have something like this:

writeas(x) = x
readas(x) = x
writeas{T}(x::Type{MyContainer{T}}) = MyContainerSerializer{T}
readas{T}(x::Type{MyContainerSerializer{T}}) = MyContainer{T}

and then convert things appropriately before writing and after reading. There would be some complexity involved in making sure this does what's expected when the writeas type is stored inline, but it would be easier to use.
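
The convert-before-writing, convert-after-reading idea can be sketched without any file format at all. The following is a rough illustration in current Julia syntax, not JLD's implementation: the field layouts of MyContainer and MyContainerSerializer are assumptions (their definitions are not shown in this thread), the Dict stands in for a file, and save! and load_from are hypothetical names. The proposal above maps types to types; this sketch maps values for brevity.

```julia
# Stand-ins for the types discussed in the thread; the fields are assumed.
struct MyContainer{T}
    data::Vector{T}
end

struct MyContainerSerializer{T}
    data::Vector{T}
    count::Int
end

# Fallbacks: by default a value is written and read back unchanged.
writeas(x) = x
readas(x) = x

# Value-level conversions for the custom type.
writeas(x::MyContainer{T}) where {T} = MyContainerSerializer{T}(x.data, length(x.data))
readas(x::MyContainerSerializer{T}) where {T} = MyContainer{T}(x.data)

# A toy "file": convert before storing, convert back after loading.
store = Dict{String,Any}()
save!(store, name, x) = (store[name] = writeas(x); nothing)
load_from(store, name) = readas(store[name])

c = MyContainer([1, 2, 3])
save!(store, "c", c)
store["c"] isa MyContainerSerializer{Int}   # true: the on-disk form is the serializer type
load_from(store, "c").data == c.data        # true: the round trip restores the container
```

Because the fallbacks are identity functions, types that do not opt in pass through untouched, which is what makes the hook cheap to add at every read and write site.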


timholy commented Jul 31, 2015

From @coveralls on January 27, 2015 17:23

Coverage increased (+0.02%) to 72.58% when pulling 4aa14f10b0d3f6c9201e98865827b41ab0e4f939 on teh/customserialization into efbb3f6 on master.


timholy commented Jul 31, 2015

OK, this has been updated. Much simpler this way.

The biggest issue is whether I've added readas and writeas in every place that needs them.


timholy commented Jul 31, 2015

From @mbauman on January 27, 2015 20:8

This is great. I've been thinking about this in the context of #212 and Keno/SIUnits.jl#47 (type definitions changing in Base and other libraries), and I have a hunch you are, too. It'd be nice to do things like rename and change types with this, but how we serialize types (#204) is going to make or break this.

Adding readas in JLD.julia_type allows adding type parameters gracefully:

$ julia -q
julia> using HDF5, JLD
       type Foo{A} end
       save("test.jld", "x", Foo{1}())
       quit()

$ julia -q
julia> using HDF5, JLD
       type Foo{A,B} end
       JLD.readas{A}(::Foo{A}) = Foo{A,0}()
       JLD.readas{A}(::Type{Foo{A}}) = Foo{A,0}
       load("test.jld", "x")
Foo{1,0}()

Which is really awesome. It won't work for things like Set{Foo{A}}, though… but if parameter lists become serialized lists, it automagically will. I've not figured out how to deal with changing names yet.
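
The dispatch pattern in the session above, and its Set{Foo{A}} limitation, can be reproduced in a single session as a rough sketch. This uses current Julia syntax and a local readas rather than JLD.readas; OldFoo is a hypothetical stand-in for the one-parameter Foo that was saved, since a type cannot be redefined within one session.

```julia
struct OldFoo{A} end        # stands in for the Foo{A} that was saved
struct Foo{A,B} end         # the current definition, with an added parameter

# Identity fallback, plus value-level and type-level upgrades.
readas(x) = x
readas(::OldFoo{A}) where {A} = Foo{A,0}()
readas(::Type{OldFoo{A}}) where {A} = Foo{A,0}

readas(OldFoo{1}()) === Foo{1,0}()          # true: values are upgraded
readas(OldFoo{1}) === Foo{1,0}              # true: the type itself is upgraded
readas(Set{OldFoo{1}}) === Set{OldFoo{1}}   # true: the nested parameter is not rewritten
```

The last line is the limitation: the method only matches OldFoo at the top level, so a type that merely contains it as a parameter falls through to the identity fallback.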


timholy commented Jul 31, 2015

From @simonster on January 27, 2015 20:42

I think this may have to be hacked into jld_types.jl to work for some bits type cases. I'll look more closely later today.


timholy commented Jul 31, 2015

Thanks to both for looking.

Yes, @mbauman, it seemed sensible to start getting back into HDF5 here, since it's possible this could help with a subset of recent challenges. Here's some mildly informative goofing around:

julia> type MyType
           a::Int
       end

julia> t = MyType(3)
MyType(3)

julia> using HDF5, JLD

julia> @save "data.jld" t

Start a new session

julia> using HDF5, JLD

julia> t1 = load("data.jld", "t")
WARNING: type MyType not present in workspace; reconstructing
JLD.##MyType#32267(3)

julia> type MyType  # in the meantime, we've redefined the type
           a::Int
           b::Float32
       end

julia> eval(JLD, :(readas(x::$(typeof(t1))) = Main.MyType(x.a, 0.0f0)))

readas (generic function with 3 methods)

julia> load("data.jld", "t")
ERROR: stored type MyType does not match currently loaded type
 in jldatatype at /home/tim/.julia/v0.4/HDF5/src/jld_types.jl:646
 in read at /home/tim/.julia/v0.4/HDF5/src/JLD.jl:323
 in read at /home/tim/.julia/v0.4/HDF5/src/JLD.jl:308
 in anonymous at /home/tim/.julia/v0.4/HDF5/src/JLD.jl:990
 in jldopen at /home/tim/.julia/v0.4/HDF5/src/JLD.jl:229
 in load at /home/tim/.julia/v0.4/HDF5/src/JLD.jl:989

Obviously not there yet, but this direction seems worth chewing over a bit more.


timholy commented Jul 31, 2015

Argh! https://github-issue-mover.appspot.com/ doesn't keep a PR as a PR.
