Serialization of JSON, YAML, INI/ConfigParser #470

Closed
sdrobert opened this issue May 11, 2021 · 4 comments

Comments

@sdrobert
Contributor

Is your feature request related to a problem? Please describe.

Serialization to simple config files (rather than pickled files) is limited to JSON, and support is only partial.

Describe the solution you'd like

Integration of pydrobert-param functionality into param. It already supports such serialization, as well as a bunch of argparse and CLI stuff.

Describe alternatives you've considered

N/A

Additional context

I worked separately on this code base a while back because it seemed like it wasn't going in the direction that param was heading. With the serializer submodule, maybe this has changed? The codebase is here and would need some tailoring, but I'm happy to work on it a bit if it's worthwhile.

@jbednar
Member

jbednar commented May 12, 2021

Cool! Would you be able to summarize what pydrobert-param provides that's not already in Param? There are definitely some useful things I'd like to see in Param:

  • Serialization to YAML (though presumably easily achieved with Param's current JSON support?)
  • Serialization to .ini (but could/should that be TOML instead?)
  • Parsing command-line arguments as Parameters (though I can't quite see if that's what the argparse stuff does?)

For the Optuna stuff, I'd be happy to see some convenience function in param that gives the right kind of data structure that Optuna would need, but I wouldn't want any specific Optuna support to be in param itself. Optuna could be an example in the docs of how this data structure would be used, though! Hope that distinction makes sense.

@sdrobert
Contributor Author

Hi and thanks for spending the time!

When I wrote the code circa 2019 (I believe), I wanted a way to seamlessly translate between config files and Parameterized instances. This meant a) writing/reading parameterized values to/from a dictionary; b) (de)serializing that dictionary to a popular config file type; and c) providing argparse.ArgumentParser hooks to handle that all transparently so that, by the time I'm done parsing the command-line arguments, my Parameterized instances have been populated.
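
Steps a) and b) can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the API of param or pydrobert-param: `TrainingParams` is a hypothetical dataclass standing in for a Parameterized class, and `to_ini` / `from_ini` are made-up helper names. It handles only `int`, `float`, and `str` values.

```python
import configparser
import io
from dataclasses import asdict, dataclass


@dataclass
class TrainingParams:
    """Hypothetical stand-in for a Parameterized class (assumption)."""
    learning_rate: float = 0.01
    num_epochs: int = 10
    optimizer: str = "sgd"


def to_ini(obj, section="params"):
    """a) dump the values to a dict, then b) serialize that dict as INI text."""
    parser = configparser.ConfigParser()
    parser[section] = {name: str(value) for name, value in asdict(obj).items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def from_ini(cls, text, section="params"):
    """b) parse INI text into a dict, then a) populate a new instance from it."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    kwargs = {}
    for name, default in asdict(cls()).items():
        if parser.has_option(section, name):
            # INI stores only strings, so cast back using the default's type.
            kwargs[name] = type(default)(parser[section][name])
    return cls(**kwargs)
```

Options missing from the file simply keep their defaults, which suits hand-written config files that set only a few values.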

I might be misunderstanding the code base (in fact, I misunderstood it when I posted this last night - it appears param supports some way of (de)serializing all values - sorry!), but I think param currently has partial support for a), full support for b) as JSON, and no support for c). I can find the class for dumping the Parameterized instance to a dict, but not the other way around (though this shouldn't be too hard to implement). You're right: param's method of serializing to JSON into a schema and value file can be easily translated into INI/TOML (N.B. I wasn't aware of TOML when I wrote the code. Plus, INI support is dependency-free with configparser). pydrobert-param writes a lot of the schema stuff as comments to the value file instead of as a separate file - a method I still prefer as I believe it's more end-user-friendly - but this is by no means necessary. In pydrobert-param, the argparse stuff just manifests as a couple of lines before calling argparse.ArgumentParser.parse_args:

add_parameterized_print_group(parser, parameterized=params)
add_parameterized_read_group(parser, parameterized=params)

Once you have a) and b) all sorted out, c) is just a matter of putting them together.
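
One way the pieces might be put together is a custom `argparse.Action` that reads a config file and populates the target object while the arguments are being parsed. The sketch below is hypothetical and is not how `add_parameterized_read_group` is implemented: `ReadIniAction`, the `--read-ini` flag, the `target` keyword, and the `Params` dataclass (a stand-in for a Parameterized instance) are all assumptions made for illustration.

```python
import argparse
import configparser
from dataclasses import dataclass, fields


@dataclass
class Params:
    """Hypothetical stand-in for a Parameterized instance (assumption)."""
    learning_rate: float = 0.01
    num_epochs: int = 10


class ReadIniAction(argparse.Action):
    """Populate `target` from an INI file as soon as the option is parsed."""

    def __init__(self, option_strings, dest, target=None, section="params", **kwargs):
        super().__init__(option_strings, dest, **kwargs)
        self.target = target
        self.section = section

    def __call__(self, parser, namespace, values, option_string=None):
        config = configparser.ConfigParser()
        if not config.read(values):
            parser.error(f"could not read config file: {values}")
        for field in fields(self.target):
            if config.has_option(self.section, field.name):
                # Cast the INI string to the type of the current value.
                caster = type(getattr(self.target, field.name))
                raw = config[self.section][field.name]
                setattr(self.target, field.name, caster(raw))
        setattr(namespace, self.dest, values)


def build_parser(params):
    """Return a parser whose --read-ini option fills in `params`."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--read-ini", action=ReadIniAction, target=params, metavar="PATH")
    return parser
```

With this arrangement, `build_parser(params).parse_args(["--read-ini", "conf.ini"])` leaves `params` populated by the time `parse_args` returns, which is the behaviour described in c).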

In short, with param.serializer, I think that param is most of the way to this functionality on its own. I should've been clearer last night: I think pydrobert-param can be cannibalized, at least partially, for that remaining functionality, not that param should subscribe to pydrobert-param's way of doing things.

Re: Optuna. Naw, I think you're right, it seems out-of-scope for param. The stuff in pydrobert-param is just the way that I do it. For this feature request, I was just talking about the above.

The only other thing I'd mention is that (de)serialization doesn't necessarily have to be round-trip. Specifically, I'm thinking about things like data frames and arrays. Serialization pretty much has to mean printing the array as a list (ick), but I think a more plausible scenario for deserialization, assuming the user is writing the config by hand, is to read the array from a file. Here's how I do it for DataFrames and for NumPy arrays.
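
The asymmetry can be illustrated with a small standard-library sketch. It is not the pydrobert-param code referred to above: a plain list of floats stands in for an array, and `serialize_array` / `deserialize_array` are hypothetical names. Serialization always writes an inline list, while deserialization also accepts a path to a whitespace-delimited text file.

```python
import json
from pathlib import Path


def serialize_array(values):
    """Serialize a sequence of numbers as an inline JSON list."""
    return json.dumps([float(v) for v in values])


def deserialize_array(text):
    """Accept an inline JSON list or a path to a whitespace-delimited file."""
    text = text.strip()
    if text.startswith("["):
        return [float(v) for v in json.loads(text)]
    # Otherwise treat the config value as a file path written by hand.
    return [float(token) for token in Path(text).read_text().split()]
```

A value written by `serialize_array` reads back unchanged, but a hand-written path never comes back out as a path, so the pair is deliberately not round-trip.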

Thanks again for the consideration,
Sean

@jbednar
Member

jbednar commented May 12, 2021

> When I wrote the code circa 2019 (I believe), I wanted a way to seamlessly translate between config files and Parameterized instances. This meant a) writing/reading parameterized values to/from a dictionary; b) (de)serializing that dictionary to a popular config file type; and c) providing argparse.ArgumentParser hooks to handle that all transparently so that, by the time I'm done parsing the command-line arguments, my Parameterized instances have been populated.

Ah, I see. For c) it sounds like I have bigger goals: letting the user override individual parameters on the command line as simple CLI options, so that the rest of the program only needs to see a single clean Parameterized object while the user chooses whether to edit the config file or pass a CLI argument. We've made some progress towards arguments as Parameters but never really achieved it properly. In any case, I think what we've done is orthogonal to what you've done, as our approach did not support a .ini file. Not sure what happened to what we did, but I can dig it up if you're interested. The array and dataframe serialization stuff looks useful as well.
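
The layering described here (defaults, then config file, then individual CLI options) might look roughly like the following. This is a hypothetical sketch, not the earlier Param work mentioned above: `Params` is a dataclass standing in for a Parameterized object, and `parse_params`, the `--config` flag, and the `[params]` section name are assumptions.

```python
import argparse
import configparser
from dataclasses import dataclass, fields


@dataclass
class Params:
    """Hypothetical stand-in for a Parameterized object (assumption)."""
    learning_rate: float = 0.01
    num_epochs: int = 10


def parse_params(argv, params):
    """Apply config-file values first, then per-parameter CLI overrides."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", metavar="PATH", default=None)
    for field in fields(params):
        parser.add_argument(
            "--" + field.name.replace("_", "-"),
            dest=field.name,
            type=type(getattr(params, field.name)),
            default=argparse.SUPPRESS,  # options not given leave no attribute
        )
    args = parser.parse_args(argv)

    if args.config is not None:
        config = configparser.ConfigParser()
        config.read(args.config)
        for field in fields(params):
            if config.has_option("params", field.name):
                caster = type(getattr(params, field.name))
                setattr(params, field.name, caster(config["params"][field.name]))

    # CLI options win over the config file because they are applied last.
    for field in fields(params):
        if hasattr(args, field.name):
            setattr(params, field.name, getattr(args, field.name))
    return params
```

For example, `parse_params(["--config", "conf.ini", "--num-epochs", "7"], Params())` would take `learning_rate` from the file if it is set there and `num_epochs` from the command line, so the rest of the program sees a single populated object.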

Overall, if you'd be interested in working to get these changes reconciled with current Param and merged in, I'd be very glad to have them; seems like they address important use cases. If so we should schedule a meeting to work out a plan. You can email my github username at anaconda.com to go from here, if you have bandwidth soon. If not, just reach out whenever you do think you might have time to dive in!

@sdrobert
Contributor Author

Sounds good! I'll close this up and email you :)
