Overview
We'd like to implement a set of data classes (using attrs) that enable schema-based validation of all of the config fields parsed by OmegaConf.
Background
In core SLEAP, we rolled our own config system (sleap.nn.config) based on attrs classes, cattr for serialization/deserialization to/from dicts/JSON, and some protobuf-inspired validation utilities.
Because attrs classes don't provide the full feature set we wanted for config management, we ended up implementing a lot of it ourselves (bad). For example, enabling string-based dotted-key subfield access resulted in the (likely haunted) ScopedKeyDict. To ensure only a single subfield of a class is set, we implemented an attrs version of protobuf's oneof. More advanced validation is scattered all over the place.
Here, we switched to using OmegaConf as a batteries-included and more standard config library. It's based around using YAML (which makes for more readable and human-editable config files than the more punishing and markup-demanding JSON).
Right now, however, while we've documented all the fields, these are not enforced or validated at runtime.
The solution is to use a schema that specifies the field names, types, and other properties of the fields to enable validation. After investigation, we found that OmegaConf supports this through Structured Configs, which take dataclass or attrs classes as input and validate a given set of OmegaConf inputs (dicts or YAML) against them.
Plan
PR 1: Basic functionality
Migrate over the attrs config classes from sleap.nn.config, starting with TrainingJobConfig and moving down the hierarchy. These should get migrated to a submodule under sleap-nn/sleap_nn/config.
Update class definitions to the new attrs API
Replace cattr serialization with OmegaConf
Replace the functionality of the oneof decorator with OmegaConf-based routines if possible (or retain oneof if needed)
Figure out how to implement cross-field validation for linked attributes (e.g., the max stride of the backbone is constrained by the max strides of all the heads, and vice versa)
This is currently handled in ScopedKeyDict in a pretty ad hoc way [1][2]
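The cross-field stride check above could be factored out into a standalone validator instead of living in ScopedKeyDict. A sketch, with made-up class/field names and a hypothetical constraint (head strides must not exceed, and must evenly divide, the backbone's max stride):

```python
from dataclasses import dataclass

@dataclass
class BackboneConfig:
    max_stride: int = 16

@dataclass
class HeadConfig:
    output_stride: int = 4

def validate_strides(backbone: BackboneConfig, heads: list) -> None:
    # Hypothetical linked-attribute check: no head may ask for a stride
    # deeper than the backbone can produce, and strides must divide evenly.
    for head in heads:
        if head.output_stride > backbone.max_stride:
            raise ValueError(
                f"head output_stride={head.output_stride} exceeds "
                f"backbone max_stride={backbone.max_stride}"
            )
        if backbone.max_stride % head.output_stride != 0:
            raise ValueError("output_stride must evenly divide max_stride")

validate_strides(BackboneConfig(16), [HeadConfig(4), HeadConfig(8)])  # passes
```

A validator like this could be run once after the full config is merged, which keeps the per-class schemas simple.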
PR 2: Revise fields
Review old vs. new config fields and figure out which ones we should deprecate, rename, etc.
Consider reorganizing the fields for better separation of user-defined values from auto-generated ones
Currently we handle this by saving out an initial_config.json versus a training_config.json, the latter of which has auto-populated values, but is a lot less convenient to work with.
Some fields should probably not be in the config if they are better specified via the CLI or API (example: ZMQ port for training progress monitor). If we think there's a use case for keeping them in the config, then we should make the hierarchy clear (CLI > API > config?).
Some fields just store metadata that's useful later, but should not be changed by the user (example: skeleton)
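If we do keep such fields in the config, the CLI > API > config precedence floated above could be resolved with a small helper. This is a sketch; the names and signature are illustrative, not sleap-nn API:

```python
_UNSET = object()  # sentinel so that an explicit None still counts as "set"

def resolve_setting(cli=_UNSET, api=_UNSET, config=_UNSET, default=None):
    # Hypothetical precedence chain: CLI beats API, API beats config file.
    for value in (cli, api, config):
        if value is not _UNSET:
            return value
    return default

# e.g., a ZMQ port passed on the CLI wins over one baked into the config:
port = resolve_setting(cli=9001, config=9000)
```

Making the precedence explicit in one place avoids each consumer re-implementing (and disagreeing on) the hierarchy.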
PR 3: Integration
Implement a versioning system for the schema so we can better support backwards compatibility
Ideally, this should also be robust to forward compatibility, e.g., newer fields should be ignored by older versions
If the config schema changes, all existing config files need to be updated
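One way to sketch the versioning and forward-compatibility behavior described above (the version numbers, field names, and the rename are all made up for illustration):

```python
CURRENT_VERSION = 2  # hypothetical current schema version
KNOWN_FIELDS = {"schema_version", "max_stride", "filters"}  # made-up field set

def migrate_v1_to_v2(cfg: dict) -> dict:
    # Hypothetical migration: v2 renamed "stride" to "max_stride".
    cfg = dict(cfg)
    cfg["max_stride"] = cfg.pop("stride", 16)
    cfg["schema_version"] = 2
    return cfg

MIGRATIONS = {1: migrate_v1_to_v2}

def load_config(cfg: dict) -> dict:
    # Walk older configs forward one version at a time...
    while cfg.get("schema_version", 1) < CURRENT_VERSION:
        cfg = MIGRATIONS[cfg.get("schema_version", 1)](cfg)
    # ...and, for forward compatibility, drop fields added by newer versions.
    return {k: v for k, v in cfg.items() if k in KNOWN_FIELDS}

loaded = load_config({"schema_version": 1, "stride": 32, "from_the_future": True})
```

Keeping migrations as small per-version steps means old files only need updating lazily, at load time, rather than all at once when the schema changes.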
PR 4: Presets
There's no runtime validation if configs aren't Python objects -- we'd like to have some config presets defined as pure Python classes that can then be configured further (e.g., UNetMediumRFConfig --> a pre-filled UNetBackboneConfig with medium RF values)
Right now, we have a sprawl of configs because we need to create a config file for every combination of config presets (e.g., backbone X head type; see sleap/sleap/training_profiles). It would be great to have a more modular way to define these and combine them as needed.
Some fields are defined by the dataset, but these are inseparable from the rest of the config fields.
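The preset idea above could be sketched as plain dataclasses with pre-filled defaults. All field names and values here are placeholders; a real preset would pull its values from the existing training profiles:

```python
from dataclasses import dataclass

@dataclass
class UNetBackboneConfig:
    # Hypothetical fields and defaults, for illustration.
    filters: int = 64
    max_stride: int = 16

@dataclass
class UNetMediumRFConfig(UNetBackboneConfig):
    # Preset: override defaults with placeholder "medium RF" values.
    max_stride: int = 32

cfg = UNetMediumRFConfig()  # pre-filled...
cfg.filters = 32            # ...but still a plain object, configurable further
```

Because presets stay ordinary Python classes, they compose (a backbone preset plus a head preset) without multiplying config files, and they remain valid inputs to OmegaConf's structured-config validation.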
Example desired API:

```python
import sleap_nn as snn
import sleap_io as sio

labels = sio.load_file("labels.pkg.slp")
cfg = snn.make_config(data=labels, backbone="unet_medium", model="centroid")
snn.train(cfg)  # creates a Trainer from the cfg and runs it
```
Other possible APIs for composability:
```python
# Specifying splits flexibly
cfg = snn.make_config(
    data={"train": sio.load_slp("train.pkg.slp"), "val": sio.load_slp("val.pkg.slp")},
    backbone="unet_medium",
    model="centroid",
)

# Customizing presets
cfg.optimization.epochs = 5

# Composing sub-configs
cfg = snn.make_config(
    data=labels,
    backbone=snn.config.UNetMediumConfig(filters=32),
    model="centroid",
)

# Or from files
cfg = snn.make_config(
    data=labels,
    model=snn.load_config("my/previous/trained/model"),  # if backbone is not specified, pulls it from that config
)
```
```python
# For transfer learning:
cfg = snn.make_config(
    data=labels,
    model=snn.load_config("my/previous/trained/model"),
    backbone="transfer",
)

# More control:
cfg = snn.make_config(
    data=labels,
    model=snn.load_config("my/previous/trained/model"),
    backbone=snn.config.TransferConfig(freeze="encoder", encoder_feature_layers=["block0/relu", "block1/relu"]),
)
```