Configuration file format compatible with DVC #547

Closed
osma opened this issue Dec 21, 2021 · 2 comments · Fixed by #560

Comments

osma (Member) commented Dec 21, 2021

To make it possible to use Annif productively in a DVC workflow, it would be helpful if DVC tools could read parameters directly from the Annif configuration file. DVC can currently read parameters from YAML 1.2, JSON, TOML and Python files (see documentation for dvc params).

Annif uses INI-style syntax (supported by configparser Python standard library module) in the projects.cfg configuration file. This is similar to TOML, but not identical.

I can think of at least these options for making the Annif configuration file DVC-compatible:

  1. Support YAML configuration files as an alternative to the current format.
  2. Support JSON configuration files as an alternative to the current format.
  3. Support TOML configuration files as an alternative to the current format.
  4. Adjust the current format slightly so that it becomes a valid subset of TOML.

I think we can rule out 2., because JSON is not very nice as a configuration language: its syntax is strict and it lacks support for comments. If we want a new configuration format, either YAML (option 1) or TOML (option 3) is better.

For 4., AFAICT the main difference between the current syntax and TOML is that TOML requires string values to be quoted. So instead of this:

[tfidf-en]
language=en
backend=tfidf
analyzer=snowball(english)
limit=100
vocab=yso-en

the syntax must be

[tfidf-en]
language="en"
backend="tfidf"
analyzer="snowball(english)"
limit=100
vocab="yso-en"

(note that limit can be left as-is, as the value 100 is an integer, not a string)

This syntax doesn't work with ConfigParser currently, because it includes the quotes as part of the value. But it would be simple to change the optionxform so that it drops any quotes from the value. The file name projects.cfg could still be a problem for DVC, which would probably expect the extension .toml so that it recognizes which syntax to use.
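The quoting behaviour described above can be checked with a short sketch using only the standard library. It parses the quoted variant of the example with configparser and shows that the quotes end up inside the values. The `strip_quotes` helper is hypothetical (it is not part of Annif or configparser) and only illustrates one way the quotes could be removed after reading.

```python
import configparser

# The TOML-style (quoted) variant of the example section from this issue.
QUOTED_CFG = """
[tfidf-en]
language="en"
backend="tfidf"
analyzer="snowball(english)"
limit=100
vocab="yso-en"
"""


def strip_quotes(value):
    """Hypothetical helper: drop one pair of surrounding double quotes."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


parser = configparser.ConfigParser()
parser.read_string(QUOTED_CFG)

# configparser keeps the quotes as part of each value.
raw = dict(parser["tfidf-en"])

# Post-processing the values after reading removes them again.
cleaned = {key: strip_quotes(val) for key, val in raw.items()}

print(repr(raw["language"]))
print(repr(cleaned["language"]))
```

Note that configparser returns every value as a string, so `limit` stays the string `100` in both dictionaries, whereas a TOML parser would treat it as an integer.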

osma added this to the Short term milestone Dec 21, 2021
osma added the DVC label Dec 21, 2021
osma (Member, Author) commented Jan 28, 2022

> But it would be simple to change the optionxform so that it drops any quotes from the value.

Whoops, this doesn't work, since optionxform applies to option names, not values.
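A minimal sketch that illustrates this: it installs a recording `optionxform` on a ConfigParser and collects everything the hook is called with. The `recording_optionxform` function and the `seen` list are illustrative names, not anything from Annif.

```python
import configparser

seen = []  # every string that optionxform is called with


def recording_optionxform(name):
    """Illustrative hook: record the argument, then lowercase it like the default."""
    seen.append(name)
    return name.lower()


parser = configparser.ConfigParser()
parser.optionxform = recording_optionxform
parser.read_string("""
[tfidf-en]
language="en"
backend="tfidf"
""")

# The hook only ever receives option names such as "language", never values,
# so the quotes are still present in the stored value.
value = parser["tfidf-en"]["language"]
print(seen)
print(repr(value))
```

Because the hook never sees the values, removing the quotes has to happen somewhere else, for example when the values are read back.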

osma (Member, Author) commented Jan 28, 2022

I tried to implement option 4 (adjust the format slightly) in the branch issue547-config-toml-quotes, but wasn't very happy with the result - the name of the file projects.cfg is still a problem for DVC tools.

So instead I created PR #560 adding real support for TOML configuration files (option 3).
