On-demand loading of support for file formats? #7299

Closed
timholy opened this issue Jun 18, 2014 · 13 comments
Labels
speculative Whether the change will be implemented is speculative

Comments

@timholy
Member

timholy commented Jun 18, 2014

@mbauman had an interesting idea, proposed in JuliaIO/HDF5.jl#101, that may be worth discussing. He pointed out that functions named save and load conflict across packages. He suggested that one way to solve this is to dispatch on the file extension, e.g., save("mydata.jld", ...) would invoke the HDF5/JLD code, whereas save("mydata.df", ...) might invoke DataFrames routines, etc. (I don't know whether there's a standard extension for DataFrames; I just made that up.) This is especially important when the same datatype could be stored in two different on-disk formats.

Images already has something that's a little bit like this, a framework for registering support for file formats based on extension and/or magic bytes, https://github.com/timholy/Images.jl/blob/master/doc/extendingIO.md#contributing-a-file-format-to-images. Should this registration framework be generalized and added to Julia proper? The idea is that all the support code would still live in packages, but that those packages would be loaded automatically depending on extension and/or magic bytes.
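To make the on-demand idea concrete, here is a minimal sketch in current Julia syntax of a registry that maps an extension to a loader thunk and only runs that thunk the first time a file with that extension is saved. `FORMAT_REGISTRY`, `LOADED_HANDLERS`, `register_format!`, and `save_file` are hypothetical names, calling the thunk merely stands in for actually loading the support package, and magic bytes are left out.

```julia
# Hypothetical registry: extension => loader thunk. Calling the thunk stands in
# for importing the support package and returns that package's save function.
const FORMAT_REGISTRY = Dict{String,Function}()
const LOADED_HANDLERS = Dict{String,Function}()   # cache of handlers already loaded

function register_format!(ext::AbstractString, loader::Function)
    FORMAT_REGISTRY[lowercase(ext)] = loader
    return nothing
end

function save_file(path::AbstractString, data)
    ext = lowercase(splitext(path)[2])
    haskey(FORMAT_REGISTRY, ext) || error("no format registered for extension \"$ext\"")
    # The loader runs only on first use; later calls reuse the cached handler.
    handler = get!(FORMAT_REGISTRY[ext], LOADED_HANDLERS, ext)
    return handler(path, data)
end

# Usage: the counter shows that the "package" is loaded once, on demand.
load_count = Ref(0)
register_format!(".jld", () -> begin
    load_count[] += 1                          # stands in for `using JLD`
    (path, data) -> "JLD writer called for $path"
end)
save_file("mydata.jld", [1, 2, 3])
save_file("more.jld", [4, 5])
```

Caching the handler is a design choice: the cost of loading a package is paid once, by the first file that needs it.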

@JeffBezanson
Member

We could do something similar to how MIME types are handled, and have a default save that uses the file extension to try to call a specialized method.
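For comparison, this is how Base's display system already dispatches on MIME types: a MIME string is lifted into a singleton type (`MIME"text/csv"` is `MIME{Symbol("text/csv")}`), and packages add `show` methods for it. `Point` and its CSV rendering are made up for illustration; `show`, `MIME`, and `showable` are the real Base functions. A `save` fallback could lift the file extension into a type in the same way.

```julia
struct Point
    x::Int
    y::Int
end

# Specialise show on the MIME singleton type, as display-aware packages do.
Base.show(io::IO, ::MIME"text/csv", p::Point) = print(io, p.x, ",", p.y)

p = Point(1, 2)
showable(MIME("text/csv"), p)           # true: a specialised method exists
showable(MIME("application/json"), p)   # false: nothing defined for Point
sprint(show, "text/csv", p)             # the string is converted to MIME"text/csv" first
```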

@kmsquire
Member

For reading, once #3656 is merged (and perhaps after the promised IO system revamps), we could read magic bytes at the beginning of a file to determine the file type (which was part of my motivation for creating that PR in the first place).
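A minimal sketch of reading magic bytes from the start of a stream, using Base's `mark`/`reset` so the header is only peeked at and the actual loader still starts at byte 0. The `MAGIC` table and `sniff_format` are hypothetical; only the standard PNG and HDF5 signatures are included, and only offset 0 is checked.

```julia
# Signatures at byte 0 of the file. (HDF5 may also place its signature at
# offset 512, 1024, ...; this sketch ignores that.)
const MAGIC = [
    :png  => UInt8[0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a],  # "\x89PNG\r\n\x1a\n"
    :hdf5 => UInt8[0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a],  # "\x89HDF\r\n\x1a\n"
]

# Peek at the header without consuming it, so the real loader starts at byte 0.
function sniff_format(io::IO)
    nbytes = maximum(length(last(p)) for p in MAGIC)
    mark(io)
    try
        header = read(io, nbytes)      # may be shorter than nbytes for tiny files
        for (fmt, magic) in MAGIC
            if length(header) >= length(magic) && header[1:length(magic)] == magic
                return fmt
            end
        end
        return nothing
    finally
        reset(io)                      # seek back to the mark and clear it
    end
end

io = IOBuffer(vcat(last(MAGIC[1]), UInt8[0x00, 0x01]))
fmt = sniff_format(io)
```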


@nalimilan
Member

Simply detecting the extension would already be great. Magic bytes could be used later to improve the detection (using something like the XML definitions from http://www.freedesktop.org/wiki/Software/shared-mime-info/ ?).

@timholy
Member Author

timholy commented Jun 18, 2014

OK, I'll put this on my todo list.

@timholy timholy self-assigned this Jun 18, 2014
@mbauman
Member

mbauman commented Aug 8, 2014

I was thinking about this a bit more today with the flurry of activity around jls/jld (I was prompted by Jeff's comment on adding save to base… and I want to intentionally link that issue to this one).

Here's a sketch of how I'm thinking this might work:

```julia
type FileType{Extension} <: String # Parameterized by a symbol for the extension
    path::UTF8String
end
FileType(path::String) = (p = utf8(path); FileType{symbol(splitext(p)[2])}(p))
macro FileType_str(s) # For easily creating types
    :(FileType{$(Expr(:quote, symbol(s)))})
end
# … and delegate UTF8/String methods to the path

# Then library authors could define
save(::FileType".jls", args...)
save(::FileType".jld", args...)
save(::FileType".h5", args...)
# And there could be a fallback in base that users would call:
save(p::String, args...) = save(FileType(p), args...)
```

Loading would work similarly for path-based dispatch.
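The sketch above, updated so that it runs on Julia 1.x, might look like this. Two assumptions differ from the original: `FileType` wraps the path instead of subtyping `String` (which is a concrete type in current Julia), and the method bodies write to an in-memory `STORE` dictionary instead of real files. All names are illustrative.

```julia
# Path wrapper whose type parameter is the extension, e.g. FileType{Symbol(".jld")}.
struct FileType{Ext}
    path::String
end
FileType(path::AbstractString) = FileType{Symbol(splitext(path)[2])}(String(path))

# String macro so libraries can write FileType".jld" in method signatures.
macro FileType_str(s)
    return :(FileType{$(QuoteNode(Symbol(s)))})
end

# Stand-in for the file system so the sketch has no side effects.
const STORE = Dict{String,Tuple{Symbol,Any}}()

# Methods that (hypothetical) library authors would define, one per extension:
save(f::FileType".jls", data) = (STORE[f.path] = (:jls, data); f.path)
save(f::FileType".jld", data) = (STORE[f.path] = (:jld, data); f.path)
load(f::FileType".jls") = STORE[f.path][2]
load(f::FileType".jld") = STORE[f.path][2]

# Fallbacks in Base that users would call:
save(f::FileType{E}, data) where {E} = error("no save method registered for extension $E")
save(path::AbstractString, data) = save(FileType(path), data)
load(path::AbstractString) = load(FileType(path))

save("results.jld", [1, 2, 3])   # routed to the .jld method
```

Because the extension lives in the type, supporting a new format is just a matter of adding a method, and an unsupported extension reaches the generic fallback's error.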

Magic bytes are trickier; to do that I think you'd need a registration system similar to Images.jl given variable lengths and such. It could sit nicely on top of this system, however. If a filetype doesn't have a recognized extension, then Base could look through the registered magic bytes for a match, and then dispatch to the FileType registered to it.
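One way the magic-byte fallback could sit on top of extension dispatch, as a hedged sketch: formats register an extension and, optionally, a signature of any length, and signatures are consulted only when the extension is not recognized. `register!`, `detect_format`, `EXT_FORMATS`, and `MAGIC_FORMATS` are hypothetical names; the PNG and GIF signatures are the standard ones. The header is passed in as bytes to keep the sketch free of file I/O.

```julia
# Hypothetical registries: extension => format, and variable-length signatures.
const EXT_FORMATS = Dict{String,Symbol}()
const MAGIC_FORMATS = Pair{Vector{UInt8},Symbol}[]

function register!(fmt::Symbol; ext = nothing, magic = nothing)
    ext === nothing || (EXT_FORMATS[ext] = fmt)
    magic === nothing || push!(MAGIC_FORMATS, Vector{UInt8}(magic) => fmt)
    return fmt
end

function detect_format(path::AbstractString, header::AbstractVector{UInt8})
    fmt = get(EXT_FORMATS, splitext(path)[2], nothing)
    fmt === nothing || return fmt                  # a recognised extension wins
    # Longest signatures first, so a short prefix cannot shadow a longer match.
    for (magic, f) in sort(MAGIC_FORMATS; by = p -> length(first(p)), rev = true)
        n = length(magic)
        length(header) >= n && view(header, 1:n) == magic && return f
    end
    return nothing
end

register!(:png; ext = ".png", magic = UInt8[0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])
register!(:gif; ext = ".gif", magic = "GIF8")
detect_format("photo.png", UInt8[])                      # :png, from the extension
detect_format("download.bin", Vector{UInt8}("GIF89a"))   # :gif, from the magic bytes
```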

@timholy
Member Author

timholy commented Aug 9, 2014

That looks quite elegant, @mbauman!

@JeffBezanson
Member

Yes, there is a nice analogy to display, which seems appropriate.

@Keno
Member

Keno commented Aug 9, 2014

👍

@StefanKarpinski
Member

I like it – that's turning into a standard Julia pattern.

@SimonDanisch
Copy link
Contributor

By the way, I've advanced my approach a little in FileIO.
I want to use it soon for all of my file IO.
I took the feedback from FileIO, stole a little from Images.jl, and this is what came out of it:
https://gist.github.com/SimonDanisch/feea8f9afb0fe8a109e3#file-fileio_mime-jl
It would be nice to get some feedback =)
I ditched the parameter in File{ending}, as it turned out to be pretty annoying for arrays.
I still like it, so maybe it would be nice to get better support for phantom parameters, which don't influence the underlying memory representation!?
Caching, and where the MIME definitions should live, are still open questions.
I also don't have a clear picture of how to handle streams vs. paths vs. data.
Maybe the parameter of File could indicate whether it's a stream or just a path to a file.
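To illustrate both points, here is a small sketch with a phantom `fmt` parameter on hypothetical `File` and `Stream` types: the parameter does not change the memory layout, but it does make arrays of mixed formats abstractly typed. It also shows one possible way to split paths from streams, where format code implements only a stream method and a generic path method opens the file and delegates. The `:lines` format is made up.

```julia
# `fmt` is a phantom parameter: it appears only in the type, never in a field.
struct File{fmt}
    path::String
end

# The same phantom parameter for streams; the IO type is a second parameter.
struct Stream{fmt,IOT<:IO}
    io::IOT
end
Stream{fmt}(io::IOT) where {fmt,IOT<:IO} = Stream{fmt,IOT}(io)

# The parameter costs nothing in memory...
same_size = sizeof(File{:png}) == sizeof(File{:jld})
# ...but mixing formats in one array gives an abstract element type.
files = [File{:png}("a.png"), File{:jld}("b.jld")]
concrete = isconcretetype(eltype(files))

# Format code only needs a stream method...
load(s::Stream{:lines}) = readlines(s.io)
# ...and one generic path method opens the file and hands the stream over.
load(f::File{fmt}) where {fmt} = open(io -> load(Stream{fmt}(io)), f.path)

path = tempname()
write(path, "alpha\nbeta\n")
load(File{:lines}(path))                  # ["alpha", "beta"]
```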

@SimonDanisch
Copy link
Contributor

I haven't really shown how loading should work.
I think the function should specialize on the unique_identifier, e.g. ascii_ply.
It could look like this:

```julia
load(::File, ::Mime{UniqueIdentifier})
save(::T, ::Mime{UniqueIdentifier})
```
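A hedged sketch of specializing on a unique identifier rather than on the extension, using PLY as the example: ascii and binary PLY files share the `.ply` extension and are told apart by the `format` line of the header. `Format`, `PLY_IDS`, `ply_identifier`, and `load_ply` are hypothetical names; `Format{id}` plays the role of `Mime{UniqueIdentifier}` in the snippet, and the loader bodies are stand-ins for real parsers.

```julia
# Hypothetical singleton type carrying a format's unique identifier.
struct Format{id} end

# One extension (.ply) covers several encodings; the second header line says which.
const PLY_IDS = Dict("ascii" => :ascii_ply,
                     "binary_little_endian" => :binary_ply_le,
                     "binary_big_endian" => :binary_ply_be)

function ply_identifier(io::IO)
    readline(io) == "ply" || error("not a PLY file")
    fields = split(readline(io))                  # e.g. ["format", "ascii", "1.0"]
    (length(fields) >= 2 && fields[1] == "format") || error("malformed PLY header")
    enc = String(fields[2])
    return get(() -> error("unknown PLY encoding $enc"), PLY_IDS, enc)
end

# Loaders specialise on the identifier, not on the extension.
load(io::IO, ::Format{:ascii_ply}) = "ascii loader"
load(io::IO, ::Format{:binary_ply_le}) = "binary little-endian loader"

# Entry point: read the identifier from the header, then dispatch on it.
load_ply(io::IO) = load(io, Format{ply_identifier(io)}())
```

A file whose identifier has no loader (here `:binary_ply_be`) fails with a `MethodError`, which is where a package would add the missing method.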

@timholy
Copy link
Member Author

timholy commented Aug 2, 2015

OK, I finally found time to deliver some "feedback": JuliaIO/FileIO.jl#15

My sense is that this doesn't need to live in Base Julia; every package that wants to use load or save should be using FileIO. If others agree, I think we can close this.

@SimonDanisch
Copy link
Contributor

I agree! It would be odd to have a hook for every IO package in Base when using FileIO is completely acceptable!
Thanks a lot for this; I will have a look at it tomorrow!


9 participants