Implement a schema for configs #75

Open · 12 tasks · 0 comments

talmo (Contributor) commented Aug 14, 2024

Overview

We'd like to implement a set of data classes (using attrs) that enable schema-based validation of all of the config fields parsed by OmegaConf.

Background

In core SLEAP, we rolled our own config system (sleap.nn.config) based on attrs classes, cattr for serialization/deserialization to/from dicts/JSON, and some protobuf-inspired validation utilities.

Because attrs classes don't provide the full feature set we wanted for config management, we ended up implementing a lot of it ourselves (bad). For example, enabling string-based dot-key access to config subfields resulted in the (likely haunted) ScopedKeyDict. To ensure only a single subfield of a class is set, we implemented an attrs version of protobuf's oneof. More advanced validation is scattered all over the place.

Here, we switched to using OmegaConf as a batteries-included and more standard config library. It's based around using YAML (which makes for more readable and human-editable config files than the more punishing and markup-demanding JSON).
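For example, OmegaConf provides out of the box the string-based dot-key access that ScopedKeyDict hand-rolled. A minimal sketch (the keys here are illustrative):

from omegaconf import OmegaConf

cfg = OmegaConf.create({"model": {"backbone": {"max_stride": 16}}})
OmegaConf.select(cfg, "model.backbone.max_stride")  # -> 16
OmegaConf.update(cfg, "model.backbone.max_stride", 32)  # dot-key assignment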

Right now, however, while we've documented all the fields, these are not enforced or validated at runtime.

The solution is to use a schema that specifies the field names, types, and other properties to enable validation. After investigation, we found that OmegaConf enables this through Structured Configs, which essentially take dataclass or attrs classes as input to validate a given set of OmegaConf inputs (dicts or YAML).
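As a minimal sketch of what this could look like (the fields are illustrative, not the final schema, and we use the classic attr.s API here purely for illustration):

import attr
from omegaconf import OmegaConf

@attr.s(auto_attribs=True)
class ModelConfig:
    backbone: str = "unet"
    max_stride: int = 16

@attr.s(auto_attribs=True)
class TrainingJobConfig:
    epochs: int = 100
    model: ModelConfig = attr.Factory(ModelConfig)

# The attrs class acts as the schema; user-provided values are merged into it.
schema = OmegaConf.structured(TrainingJobConfig)
user_cfg = OmegaConf.create({"epochs": 10, "model": {"max_stride": 32}})
cfg = OmegaConf.merge(schema, user_cfg)

# Wrong types or unknown keys fail at merge time:
# OmegaConf.merge(schema, {"epochs": "ten"})  # raises ValidationError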

Plan

PR 1: Basic functionality

  • Migrate over the attrs config classes from sleap.nn.config, starting with TrainingJobConfig and moving down the hierarchy. These should get migrated to a submodule under sleap-nn/sleap_nn/config.
  • Update class definitions to new attrs API
  • Replace cattr serialization with OmegaConf
  • Replace the functionality of the oneof decorator with OmegaConf-based routines if possible, or retain oneof if needed (see the first sketch after this list)
  • Figure out how to implement cross-field validation for linked attributes (e.g., the max stride of the backbone is constrained by the max stride of all the heads, and vice versa; see the second sketch after this list)
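
For the oneof behavior, one possible replacement is a plain attrs-level check at instantiation time. A hedged sketch (field names are illustrative, and note that this runs when the class is built, not on every OmegaConf merge):

from typing import Optional
from attrs import define, fields

@define
class BackboneConfig:
    # "oneof": at most one backbone sub-config may be set.
    unet: Optional[dict] = None
    convnext: Optional[dict] = None

    def __attrs_post_init__(self):
        set_fields = [f.name for f in fields(type(self)) if getattr(self, f.name) is not None]
        if len(set_fields) > 1:
            raise ValueError(f"Only one backbone may be set; got: {set_fields}")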
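
And for cross-field validation of linked attributes, a hedged sketch using __attrs_post_init__ (the names and the specific constraint are illustrative):

from typing import List
from attrs import define, field

@define
class HeadConfig:
    output_stride: int = 4

@define
class BackboneConfig:
    max_stride: int = 16

@define
class ModelConfig:
    backbone: BackboneConfig = field(factory=BackboneConfig)
    heads: List[HeadConfig] = field(factory=list)

    def __attrs_post_init__(self):
        # The backbone must reach at least the largest stride any head needs.
        required = max((h.output_stride for h in self.heads), default=1)
        if self.backbone.max_stride < required:
            raise ValueError(
                f"backbone.max_stride ({self.backbone.max_stride}) is smaller "
                f"than the largest head output_stride ({required})."
            )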

PR 2: Revise fields

  • Review old vs new config fields and figure out which ones we should deprecate/rename/etc.
  • Consider reorganizing the fields to better separate user-defined values from auto-generated ones
    • Currently we handle this by saving out an initial_config.json versus a training_config.json, the latter of which has auto-populated values, but is a lot less convenient to work with.
    • Some fields should probably not be in the config if they are better specified via the CLI or API (example: ZMQ port for training progress monitor). If we think there's a use case for keeping them in the config, then we should make the hierarchy clear (CLI > API > config?).
    • Some fields just store metadata that's useful later, but should not be changed by the user (example: skeleton)
  • Implement a versioning system for the schema so we can better support backwards compatibility (see the sketch after this list)
    • Ideally, this should also be robust to forward compatibility, e.g., fields added in newer versions should be ignored by older versions
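
A hedged sketch of what forward-compatible loading could look like; load_versioned_config is a hypothetical helper, and the top-level key filtering is an assumption, not an implemented API:

from omegaconf import OmegaConf

def load_versioned_config(path: str, schema):
    """Merge a saved config into the current schema, dropping unknown
    top-level keys so files written by newer versions still load."""
    raw = OmegaConf.to_container(OmegaConf.load(path))
    known = {k: v for k, v in raw.items() if k in schema}
    # A stored schema_version field could additionally drive migrations here.
    return OmegaConf.merge(schema, known)

# schema = OmegaConf.structured(TrainingJobConfig)  # e.g., from the sketch above
# cfg = load_versioned_config("training_config.yaml", schema)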

PR 3: Integration

PR 4: Presets

  • Re-design how we implement presets
    • If config schema changes, all the files need to be updated
    • No runtime validation if they're not Python objects; we'd like to have some config presets defined as pure Python classes that can then be configured further (e.g., UNetMediumRFConfig as a pre-filled UNetBackboneConfig with medium RF values; see the sketch after this list)
    • Right now, we have a sprawl of configs because we need to create a config file for every combination of config presets (e.g., backbone X head type). It would be great to have a more modular way to define these and combine them as needed.
    • Some fields are defined by the dataset, but these are inseparable from the rest of the config fields.
    • (Some of this should probably be handled in PR 2)
  • Replicate the example configs from core SLEAP (sleap/sleap/training_profiles)
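
A hedged sketch of a preset as a pure Python class, as referenced above (UNetMediumRFConfig and UNetBackboneConfig are named in this issue, but the fields and values here are made up):

from attrs import define

@define
class UNetBackboneConfig:
    filters: int = 16
    max_stride: int = 16

@define
class UNetMediumRFConfig(UNetBackboneConfig):
    # Preset: same schema, defaults pre-filled for a medium receptive field.
    max_stride: int = 32

cfg = UNetMediumRFConfig(filters=32)  # presets stay further configurable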

Example desired API:

import sleap_nn as snn
import sleap_io as sio

labels = sio.load_file("labels.pkg.slp")

cfg = snn.make_config(data=labels, backbone="unet_medium", model="centroid")

snn.train(cfg)  # creates a Trainer from the cfg and runs it

Other possible APIs for composability:

# Specifying splits flexibly
cfg = snn.make_config(data={"train": sio.load_slp("train.pkg.slp"), "val": sio.load_slp("val.pkg.slp")}, backbone="unet_medium", model="centroid")

# Customizing presets
cfg.optimization.epochs = 5

# Composing sub-configs
cfg = snn.make_config(
    data=labels,
    backbone=snn.config.UNetMediumConfig(filters=32),
    model="centroid",
)

# Or from files
cfg = snn.make_config(
    data=labels,
    model=snn.load_config("my/previous/trained/model"),  # if backbone is not specified, pulls it from that config
)

# For transfer learning:
cfg = snn.make_config(
    data=labels,
    model=snn.load_config("my/previous/trained/model"),
    backbone="transfer",

# More control:
cfg = snn.make_config(
    data=labels,
    model=snn.load_config("my/previous/trained/model"),
    backbone=snn.config.TransferConfig(freeze="encoder", encoder_feature_layers=["block0/relu", "block1/relu"]),
)

Related issues
