Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add save and load functions that operate on dictionaries #101

Merged
merged 7 commits into from
Jun 18, 2014

Conversation

mbauman
Copy link
Member

@mbauman mbauman commented Jun 17, 2014

These functions are similar to their macro counterparts, except that they operate on and return dictionaries instead of variables directly in the global namespace.

save takes a filename and a dictionary with bytestring keys. The dictionary key-value pairs are saved in the file, with the keys as the variable names and the values as their contents.

load takes a filename and (optionally) a list of variable names to read. If no variable names are given, all names in the file are read. It reads the variables into a dictionary that is returned.

Furthermore, the @load macro now returns a list of variables that it loaded.

Would fix #98. Still needs tests and documentation. And perhaps a bike-shed: Should save and load be exported?

I needed this for myself; it's very simple but I figured I may as well push it upstream, too.

These functions are similar to their macro counterparts, except that they operate on and return dictionaries instead of variables directly in the global namespace.

save takes a filename and a dictionary with bytestring keys. The dictionary key-value pairs are saved in the file, with the keys as the variable names and the values as their contents.

load takes a filename and (optionally) a list of variable names to read. If no variable names are given, all names in the file are read. It reads the variables into a dictionary that is returned.
Previously, this macro would simply return the value of the last variable it read. This is more useful.
@timholy
Copy link
Member

timholy commented Jun 17, 2014

Love it, thanks for doing this! Definitely these should be exported---I'd really rather encourage people towards the functions, because the macros are very fragile. The internal functions that support the macros can be renamed to something private so they are not exported.

I don't have any reservations about any aspect of your implementation---nice work!

"r+" requires the file to already exist. Appending data to files is tricky and not terribly intuitive (what if data with the same names are already there?). I think "w" is better.
@mbauman
Copy link
Member Author

mbauman commented Jun 17, 2014

Glad you like it. After playing with it a bit more, I reverted the "r+" mode in save as it requires that the file already exist. And the semantics there are tricky. We can just clobber and document that it clobbers. This is also the default mode for Matlab's save unless append is explicitly passed, so it shouldn't be too jarring to users.

Given that the typing of dict comprehensions is poor at the top-level, it is silly to make folks jump through hoops to make sure that they String keys really are strings.  The bytestring function will error just fine by itself, although perhaps it could use a nicer error message.
@mbauman
Copy link
Member Author

mbauman commented Jun 17, 2014

Ok, updated with documentation and tests. But I come back with two issues:

  1. There is already a load function! I missed it on my first pass. It's semantics are a little different: it takes an array of variable names and returns a tuple of values. This is admittedly very nice: it allows you to be explicit about which variables are named what in your local namespace (I actually have a custom Matlab load function that does exactly this, too). But then there's a disconnect between save and load — with the PR as is, there's the nice property that what goes in is exactly what comes out. Perhaps they could co-exist, but it's not obvious which should be used for any given signature. Thoughts?

  2. Upon running the tests:

    Warning: using DataFrames.save in module Main conflicts with an existing identifier.

    Save is a very generic function name. I'm sure there are other packages that would have a similar conflict.

@timholy
Copy link
Member

timholy commented Jun 18, 2014

  1. Good catch; it's been long enough since I've worked on this code base that I've forgotten what's there. I'm unsure whether co-existence is necessary, but it occurs to me that there's a certain sense in the following API:

    vardict = load(filename)
    var1, var2 = load(filename, "var1", "var2")
    save(filename, vardict)
    save(filename, "var1", var1, "var2", var2)
    

    The main idea is that when you don't know what variable names to request, you get name/value pairs back; when you do know enough about the file to request particular variables, there is no need to return the names again. The fourth one is basically there for logical consistency with the others. Thoughts?

  2. Ugh. My suspicion is that the tests should be run with import rather than using. We could try to test for the existence of a save function in Main, and then import it if present, but in my experiments along such lines I've never been too happy with the result---it makes it impossible to specify that you want to run a particular version. As Julia grows these kinds of name conflicts are inevitable, and I suspect we'll all just have to get used to module-scoping more of our calls. Unless you have a better idea, that is.

@mbauman
Copy link
Member Author

mbauman commented Jun 18, 2014

  1. Yup, that looks wonderful. Best of both worlds. Shouldn't be too difficult to implement.
  2. Really, these calls are akin to jldopen — they cannot extend a base function because they simply take a filename input. But calling them jldsave or JLD.load would really stink. DataFrames calls them save and loaddf, which doesn't really make too much sense, either (and should really be using JLD, not serialize!).

Well, the way operating systems do this is by the extension. Are there other top-level functions that are like this? Is it too feature-y to have Base Julia register filetypes, define save and load, and do the "dispatch" to non-exported functions? Or we could define a new string type that's parametric with its extension! Single-quote 'strings' are available, no? 😈

@timholy
Copy link
Member

timholy commented Jun 18, 2014

Your idea about the file extension is interesting. In Images there is a mechanism for on-demand loading of support-code for particular file formats. This makes me wonder if that should become more generic.

Implement save/load such that the following is possible:

    vardict = load(filename)
    var1, var2 = load(filename, "var1", "var2")
    save(filename, vardict)
    save(filename, "var1", var1, "var2", var2)

as suggested by Tim Holy.
@mbauman mbauman changed the title WIP: Add save and load functions that operate on dictionaries Add save and load functions that operate on dictionaries Jun 18, 2014
@mbauman
Copy link
Member Author

mbauman commented Jun 18, 2014

Updated and good-to-go. We can punt on the name clashes for now, I think. But this could be very powerful with some base language support. read and write are similarly not defined for strings; were this in place you wouldn't need the im prefix to imread.

@timholy
Copy link
Member

timholy commented Jun 18, 2014

Discussion started at JuliaLang/julia#7299

@mbauman
Copy link
Member Author

mbauman commented Jun 18, 2014

Very cool.

I forgot to mention that I deleted the load(filename, varnames::Array) signature. It was never exported, so I think we should be good, but we could very easily deprecate it if you want to.

timholy added a commit that referenced this pull request Jun 18, 2014
WIP: Add save and load functions that operate on dictionaries
@timholy timholy merged commit b16eeea into JuliaIO:master Jun 18, 2014
@timholy
Copy link
Member

timholy commented Jun 18, 2014

Thanks so much for tackling this; I think it was the biggest API wart in JLD.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: alternative @load with inspection
2 participants